All Hard Disk Drives Eventually Fail
| |
This isn't necessarily because of a manufacturing defect; it's simply
because these systems contain high-performance moving parts. Just as
every car will eventually fail to start, no matter how well it is made,
every hard drive will eventually fail.
With current pricing, hard disk drives are incredibly cheap. Even
high-performance SCSI drives are becoming less and less expensive. Most of
them even come with three- or five-year warranties. So the cost of replacing
the physical drive is not a huge financial burden. However, there's another
factor in play: your data.
Everyone involved in corporate information services should perform a simple
risk assessment. How much would a day of downtime on a workstation
cost my organization? On a server? How much would it cost the company if all our data
were simply gone? In some cases, the cost might be minimal. For most
businesses, though, the cost of losing all the data on the system would be
astronomical simply in terms of re-entry, and that assumes that the data can
even be recreated from other sources.
Even assuming you are performing good, reliable server backups and that
those backups are intact, you still have the downtime associated with rebuilding
the server. In order to restore from that backup, you have to install the
new drive, format it, re-install the operating system, install service
packs, install all relevant video, tape, and SCSI drivers, install the backup
software, and then restore from the tape. You run the risk that certain key
information, such as the Windows NT Registry or the Windows 2000 System
State Information, was not properly backed up. Some databases get backed up
in an inconsistent state and must be repaired. Or, if the settings aren't
just right, those databases might not be backed up at all.
A server rebuild, even with a backup, is generally a three- to four-hour
process if everything goes smoothly, during which that server
-- and in many cases, an entire department or more -- is down for the count.
On top of that, you could easily lose as much as a full day's worth of data.
On a workstation in a network where the policy is to store all created
files on the server, the loss may not be as severe. However, it is usually
not cost-effective to install a backup device on each workstation, and
network backups are often slow and difficult. Even assuming no irreplaceable
data is stored on a workstation, you still have the time involved in
replacing the drive, re-installing the operating system and all drivers,
installing all the applications, and re-customizing the system for your
network and for the user's needs. How much time will this require? How
long will the user be unable to do his or her work?
|
The Solution: Mirroring and Striping
| |
Fortunately, there's a solution to this problem. Called 'RAID' (for
Redundant Array of Inexpensive Disks), this technology stores the
information across multiple drives in such a way that even if one drive
fails, the data remains intact and accessible. A full description of how
the technology works is beyond the scope of this document, but a couple of
points should be made.
There are basically two types of fault-tolerant RAID: mirroring and striping
with parity (referred to simply as striping in this document). Mirroring
involves simply writing the same information to two drives at once. They
become mirror images of one another. If one fails, the system can continue
running on the other one until the failed drive can be replaced. Striping
is more complex; the information is written across multiple drives, together
with parity information, in such a way that any single drive in the array can
fail, and the system can continue running until the failed drive can be
replaced. In both cases, the drives must be at least approximately the same
size -- whichever drive is smallest determines the size of the array.
Mirroring is the simplest technology. It uses two and only two drives
per 'mirror set'. You buy two drives and get the storage space of one of
them. For example, if you buy two twenty-gigabyte drives, you get twenty
gigabytes of fault-tolerant storage. On the other hand, you can boot off of a
software-mirrored drive with no real issues. This technology is also
referred to as RAID-1.
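The behaviour of a mirror set can be illustrated with a small Python sketch. This is a toy model only, not how any real RAID driver is written: the class name `MirrorSet`, the function `mirror_capacity_gb`, and the use of dictionaries to stand in for drives are all assumptions made for illustration.

```python
def mirror_capacity_gb(drive_a_gb, drive_b_gb):
    """Usable space of a two-drive mirror set: the smaller drive sets the limit."""
    return min(drive_a_gb, drive_b_gb)


class MirrorSet:
    """Toy model of RAID-1: every block is written to both drives."""

    def __init__(self):
        # Each "drive" is a dict of block number -> data; None marks a failed drive.
        self.drives = [{}, {}]

    def write(self, block, data):
        # The same data goes to every drive that is still working.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def fail(self, index):
        # Simulate a drive failure.
        self.drives[index] = None

    def read(self, block):
        # Serve the read from the first drive that is still working.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise IOError("both drives in the mirror set have failed")


mirror = MirrorSet()
mirror.write(0, b"payroll")
mirror.fail(0)                     # the first drive dies
surviving_copy = mirror.read(0)    # the data is still served from the second drive
usable_gb = mirror_capacity_gb(20, 20)
```

After the simulated failure, the read still succeeds because the second drive holds an identical copy, and two twenty-gigabyte drives yield twenty gigabytes of usable space.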
Striping is more complex. It uses from three to thirty-two drives to
form a 'stripe set'. Parity information, used to reconstruct lost data, is
spread across the drives alongside the data and takes up the equivalent of
one drive. If any one drive is lost, the data and parity on the remaining
drives can be used to reconstruct what must have been on the failed drive.
You can't boot off of a software-striped drive, however. Since one drive's
worth of space is used for parity information, you effectively lose the
storage space of that one drive. If you have five
twenty-gigabyte drives in an array, you get effectively 80 gigabytes of
fault-tolerant storage. This technology is also referred to as RAID-5.
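How parity lets a lost block be rebuilt can be shown with a short Python sketch. It assumes the parity is a byte-by-byte XOR of the data blocks, which is the usual scheme for RAID-5 but is not spelled out in this document; the function names `xor_blocks` and `stripe_capacity_gb` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_capacity_gb(drive_sizes_gb):
    """Usable space: one drive's worth goes to parity; the smallest drive sets the unit."""
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


# One stripe across five drives: four data blocks plus one parity block.
data_blocks = [b"ACCT", b"2001", b"Q3OK", b"DONE"]
parity = xor_blocks(data_blocks)

# Simulate losing the drive that held the block at index 2,
# then rebuild that block from the surviving data blocks and the parity block.
survivors = data_blocks[:2] + data_blocks[3:] + [parity]
rebuilt = xor_blocks(survivors)

usable_gb = stripe_capacity_gb([20, 20, 20, 20, 20])
```

XOR-ing the surviving blocks with the parity block cancels out everything except the missing block, so `rebuilt` equals the lost data. The capacity function reproduces the example above: five twenty-gigabyte drives give 80 gigabytes.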
|
Hardware and Software RAID
| |
RAID can either be implemented by hardware or by software. With
Hardware RAID, a special RAID controller card is installed in the machine.
It makes the drives in the array appear to be a single
drive to the operating system. Most hardware RAID controllers can perform both mirroring
and RAID-5, and traditionally have only worked on the more expensive SCSI
drives. However, there are now several brands of IDE RAID controllers on
the market, and they are becoming less and less expensive with time.
Hardware RAID is more robust, since it doesn't require the operating
system's intervention. Everything is managed directly by the card. You can
always boot off of a hardware RAID array, whether it be a stripe-set or a
mirror. Even operating systems such as Windows NT Workstation, Windows 2000
Professional, or Windows 95/98/ME, can use a hardware RAID card, even though
they don't support RAID in software. However, Hardware RAID cards typically
start at approximately $500 US for a SCSI model, and $100 for an IDE RAID
model.
Software RAID functions without any additional hardware. Typically, only
server operating systems support it, such as Windows NT Server, Windows 2000
Server, or Linux. Several problems arise, however. First, since most systems
support a maximum of four IDE devices, one of which is typically the CD-ROM,
implementing software IDE stripe-sets is typically not feasible. You can
only do mirroring, simply because you can't install enough drives. Second,
while many operating systems permit you to boot off of a mirrored drive, you
usually cannot boot from a striped array. Third, IDE is poorly suited to the
demands of mirroring or striping, since it does not handle multiple
simultaneous reads and writes as well as SCSI does. Finally, when
implementing software RAID, keep in mind that you will see a performance hit
on the system due to the demands of keeping the drives synchronized.
When you first turn a computer on, it must 'boot' the operating system.
That is, it must load its core code (called the kernel) into RAM from some
medium, whether that is a floppy disk, a CD-ROM, or most commonly, a hard
disk drive. Floppy disks usually aren't big enough to hold the core
operating system files (some versions of Linux can still boot from floppy if
needed) and CD-ROMs can't easily be changed if the components change.
Therefore, if you cannot boot from an array, you have to have some other
kind of hard disk in the system. And, if you don't put some kind of fault
tolerance onto that drive, then it becomes a potential point of
failure.
The primary situation where this arises is RAID 5 implemented via
software. Companies want to use RAID 5 because it is generally faster than
mirroring and gives you more usable space for the money spent. However, if it's
implemented in software, two more drives (a mirrored pair, so that the boot
drive is not itself a single point of failure) must be added to store the
system and boot partitions. Once the system boots, it can then access the data in
the RAID 5 array.
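The drive count for such a setup is simple arithmetic, sketched below in Python. The function name `software_raid5_plan` is hypothetical, and the sketch assumes that the two extra drives form a mirror set and that every drive is the same size; neither assumption is required by any particular operating system.

```python
def software_raid5_plan(array_drives, drive_gb, boot_drives=2):
    """Tally drives and usable space for a software RAID 5 array plus boot drives.

    Assumes the boot drives form a mirror set and all drives are the same size.
    """
    if array_drives < 3:
        raise ValueError("a stripe set with parity needs at least three drives")
    return {
        "total_drives": array_drives + boot_drives,
        # One drive's worth of the array is consumed by parity.
        "array_capacity_gb": (array_drives - 1) * drive_gb,
        # A mirrored pair offers the space of a single drive.
        "boot_capacity_gb": drive_gb,
    }


plan = software_raid5_plan(array_drives=5, drive_gb=20)
```

With five twenty-gigabyte drives in the array, the machine needs seven drives in total: 80 gigabytes of fault-tolerant data storage plus 20 gigabytes of mirrored space for the system and boot partitions.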
The following table summarizes the different options:
| | Hardware RAID | Software RAID |
|---|---|---|
| Mirroring (RAID 1) | | |
| Drives | 2 required | 2 required |
| Fault tolerance | Fully fault-tolerant | Fully fault-tolerant |
| Bootable | Yes | Yes |
| Performance | High | Moderate |
| Operating systems | Any (if drivers available) | Windows NT Server, Windows 2000 Server, Linux |
| Effective storage space | Equal to that of one of the drives | Equal to that of one of the drives |
| Interface | IDE or SCSI | IDE or SCSI |
| Striping (RAID 5) | | |
| Drives | 3 to 32 supported | 3 to 32 supported |
| Fault tolerance | Fully fault-tolerant (with parity) | Fully fault-tolerant (with parity) |
| Bootable | Yes | No |
| Performance | Best | High |
| Operating systems | Any (if drivers available) | Windows NT Server, Windows 2000 Server, Linux |
| Effective storage space | Sum of all drives, minus one used for parity | Sum of all drives, minus one used for parity |
| Interface | Usually SCSI; some IDE controllers available | SCSI strongly recommended |