Last week, one of my hard disks crashed. I maintain a file server at home running OpenBSD with the raid(4) driver. This setup was done in August 2007. The design is quite simple: two 250 GB disks configured in RAID1.
I already had a power outage two months ago, which caused a complete parity rebuild, but this latest crash forced me to completely review my setup and my ideas about software RAID.
Storage solutions are becoming more and more critical, even for personal use: media centers, email archives. If storage is that critical, there must be countermeasures to reduce the risks. Why do we need a secure storage solution?
- For availability: The files must be available when users need them. The system needs to handle detected issues and try to recover from them transparently.
- For integrity: The system must apply mechanisms to avoid file corruption.
- For confidentiality: The system must, in all cases, restrict each user's access to their own files, even in a degraded situation.
My first solution was not able to meet those three requirements. When the power failure occurred, the RAID parity was wrong and the system decided to rebuild it. It took more than 4 hours! Availability is not achieved in this case. When one disk failed, the system crashed too, with a kernel panic. I rebooted the server, but it failed to restart the RAID properly (on a standalone disk). I was forced to perform a manual recovery. Then, during the parity rebuild, a second crash killed my file system (even fsck was unable to repair it).
I decided to completely review my setup. Instead of RAIDing my disks again, I decided to rsync them. What are the pros and cons of RAID and rsync?
| RAID | rsync |
|---|---|
| – Complex setup and maintenance | + Easy setup |
| – Parity rebuild + fsck at boot time | + Only fsck |
| – Media must be the same | + Sync to different media or servers |
| + Immediate synchronization | + Quick restore of deleted files (human error) |
| + Transparent for the user | + Media portable with other OS (ext3fs, ufs, …) |
| | + Better I/O split between devices |
| | – Delayed synchronization |
| | – Performance impact during synchronization |
The big advantage of rsync is the ability to split I/O across multiple devices. In a RAID1 setup, one disk is used and the other one holds an exact copy of your data. With rsync, it’s easy to create several file systems and choose, for each one, which disk is the master and which is the slave for replication.
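A minimal sketch of what such a replication could look like, assuming two hypothetical mount points, /disk1 and /disk2, where /disk1 is the master for media and /disk2 is the master for mail. The paths, the `mirror` function and the `RSYNC` override variable are illustrative assumptions, not a description of the actual server:

```shell
#!/bin/sh
# Sketch: replicate each file system from its master disk to the other disk.
# All paths are hypothetical examples.

# RSYNC can be overridden (e.g. RSYNC="echo rsync") to preview the commands
# instead of running them.
RSYNC=${RSYNC:-rsync}

# mirror SRC DST: make DST an exact copy of SRC.
mirror() {
    # -a        archive mode (permissions, owners, timestamps, symlinks)
    # --delete  remove files from DST that no longer exist in SRC
    # The trailing slashes copy the contents of SRC into DST.
    $RSYNC -a --delete "$1/" "$2/"
}

# Only replicate when called with the "run" argument, so the file can also
# be sourced to reuse mirror() on its own.
if [ "${1:-}" = "run" ]; then
    mirror /disk1/media /disk2/media   # disk 1 is master for media
    mirror /disk2/mail  /disk1/mail    # disk 2 is master for mail
fi
```

Because each disk is the master for a different file system, daily reads and writes are spread over both devices. Note that `--delete` also propagates deletions, so dropping that option (or running the script less often) keeps the window open for restoring a file removed by mistake.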
Finally, whether you choose RAID or rsync, do not forget to have a good backup policy! A simple solution is to rsync critical files to a remote location over ssh. This can be easily achieved with the --include-from=FILE parameter.
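As a rough sketch of that idea, assuming a hypothetical include list and a hypothetical remote account (`backup@remote.example.org`); the directory names, function names and the `RSYNC` override variable are made up for illustration:

```shell
#!/bin/sh
# Sketch: push only the critical directories to a remote host over ssh.
# Host, paths and patterns are hypothetical examples.

# RSYNC can be overridden (e.g. RSYNC="echo rsync") to preview the command.
RSYNC=${RSYNC:-rsync}

# write_include_list FILE: write one include pattern per line.
# "/dir/***" matches the directory itself and everything below it.
write_include_list() {
    cat > "$1" <<'EOF'
/documents/***
/mail/***
EOF
}

# backup_critical SRC LIST DEST: copy only what LIST includes.
backup_critical() {
    # -e ssh           use ssh as the transport
    # --include-from   read the include patterns from LIST
    # --exclude='*'    skip everything not matched by an include above
    $RSYNC -a -e ssh --include-from="$2" --exclude='*' "$1/" "$3"
}
```

Usage could look like `write_include_list /etc/backup.list` followed by `backup_critical /disk1 /etc/backup.list backup@remote.example.org:/backup/`. The order matters: rsync applies the first matching rule, so the include patterns must come before the final `--exclude='*'`.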