Disaster Planning and Recovery
| |
Nobody wants to think about disasters. We like to think that everything
is going to be just fine, and that thinking about things we would prefer
didn't happen would be tantamount to tempting fate. Unfortunately, things
will go wrong, and often, it seems, at the most inopportune times.
Under item '2' above, I talked about the importance of fault tolerance: using
multiple drives to prevent data loss and maintain server uptime in the event
of a hard disk drive failure. Fault-tolerance is important, but it is not
the final solution. Its goal is to make it less likely that you have to
rely on your last line of defense: your backups.
|
When Fault-Tolerance Isn't Enough
| |
Keeping your server online is, of course, important. However, there are
times that fault-tolerance won't save you:
- Your system could be the victim of a malicious attack by inside
personnel.
- A technical or configuration error could leave a security hole
open.
- An administrative error or technical mistake could result in a large
quantity of data being deleted.
- An end user could mistakenly delete critical files.
- A bug in a piece of software or the operating system itself could result
in corruption of partition tables or other information, which could
be replicated to all drives.
- A failure in the controller card could cause all drives to go offline,
potentially corrupting the data stored in a stripe or mirror set.
- A trojan horse program or virus that slipped past your virus protection
could destroy critical data.
- A critical data file could be corrupted by a bug in the
application.
- Multiple drives could, potentially, fail at the same time, or a second
drive could fail before the first failed drive can be replaced.
- The completely unexpected could happen: natural disasters, fires,
vandalism, or even, given the recent changes in world events, terrorist attacks.
|
Backups: Your Informational Insurance Policy
| |
Each of these items is, by itself, relatively unlikely. However, taken
together, they present a risk that must be taken into account. A prudent
systems administrator makes plans for regular offline backups of crucial data.
A backup is, almost by definition, preparing for the unexpected. It's
like insurance: something you hope never to have to use, but that you want
to be there when you need it.
Backups, based on their nature as offline storage, are also made
periodically -- usually once per business day. This means that when a
restore is needed, a business may lose up to one day of work even if the
backup system functions perfectly. However, losing one day is usually far
less devastating than losing 'everything'.
|
Verifying Your Backup Is Working
| |
The primary problem with backups is verifying that they are, in fact,
functioning properly and backing up everything that needs to be backed up.
It's important to check the settings on the software to make sure that
information such as the system registry or system state is being backed up,
and that appropriate backups are being done of open files such as databases
or the Microsoft Exchange Message Store. In addition, it's important to make
sure that the tape device is working.
Backup procedures need to include checking the log files at least
periodically, and preferably after each backup is completed, to ensure that
errors are not occurring. My own personal experience includes one office
with no on-site IS person. The office staff was religiously swapping tapes
every night. I got called in for the first time after the server
experienced a hard disk drive failure, and I had to
deliver the news that all the tapes were blank, since the drive had started
failing six months previously and no one had noticed.
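As a rough illustration of the kind of log check described above, here is a
minimal Python sketch. The keywords and the "Files backed up: N" summary line
are assumptions made for the example; every backup package writes its log
differently, so adjust both to match yours. The key point is that an empty
log, or one that reports zero files, is treated as a failure rather than as
good news.

```python
import re

# Hypothetical keywords; adjust to whatever your backup software writes.
ERROR_PATTERN = re.compile(
    r"\b(error|failed|failure|skipped|cannot)\b", re.IGNORECASE
)

# Hypothetical summary line, e.g. "Files backed up: 1200".
SUMMARY_PATTERN = re.compile(r"files backed up:\s*(\d+)", re.IGNORECASE)


def find_problems(log_text):
    """Return (line_number, line) pairs that look like problems."""
    problems = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if ERROR_PATTERN.search(line):
            problems.append((number, line.strip()))
    return problems


def backup_looks_healthy(log_text, min_files=1):
    """Healthy means no problem lines AND evidence that files were written."""
    match = SUMMARY_PATTERN.search(log_text)
    files = int(match.group(1)) if match else 0
    return not find_problems(log_text) and files >= min_files
```

A check like this could run after each night's job and notify someone
whenever backup_looks_healthy returns False.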
Periodically, it's a good idea to attempt to restore one of your backup
tapes to confirm that everything is working properly. Do this to another
partition, or even to another server if that's an option. Many times it's
not. At the very least, turn on the 'verify after backup' feature whenever
possible; this will confirm that the data is at least being written to the
tape in an uncorrupted fashion.
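One way to check a test restore is to compare the restored files against the
originals by checksum. The sketch below is only an illustration and assumes
the tape has already been restored to a separate folder; the function names
are my own, not part of any backup product. Keep in mind that files changed
since the backup was made will legitimately show up as different.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original_root, restored_root):
    """Return (missing, different) lists of paths relative to original_root."""
    original_root = Path(original_root)
    restored_root = Path(restored_root)
    missing, different = [], []
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = restored_root / rel
        if not dst.is_file():
            missing.append(rel.as_posix())      # never made it back from tape
        elif file_digest(src) != file_digest(dst):
            different.append(rel.as_posix())    # restored, but contents differ
    return missing, different
```

Two empty lists mean every original file came back with identical contents.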
Make sure that you have all the appropriate options for your packages.
The Microsoft Exchange databases as well as MS SQL Server databases require
special backup options. Files that are consistently skipped because they are
open may also present a problem (once in a while isn't an issue, but if they
are skipped every night then they may well never get backed up). Make sure
the system registry or system state is backed up, as well.
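To tell the occasional skipped file from one that never gets backed up, it
helps to compare several nights of logs. The sketch below assumes you have
already pulled the skipped file names out of each night's log into a list
(how you do that depends on your backup software); the function names are
hypothetical.

```python
from collections import Counter


def skipped_counts(nightly_skips):
    """Count on how many nights each file was skipped."""
    counts = Counter()
    for skipped in nightly_skips:
        counts.update(set(skipped))  # count each file at most once per night
    return counts


def always_skipped(nightly_skips):
    """Return the files that were skipped on every night examined."""
    nights = len(nightly_skips)
    counts = skipped_counts(nightly_skips)
    return sorted(name for name, n in counts.items() if n == nights)
```

Any file that always_skipped returns over a week or two of logs has probably
never been backed up at all and needs an open-file option or a scheduled
shutdown of whatever holds it open.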
|
What Should Be Backed Up?
| |
Tape backup systems are usually too expensive to justify deploying them on
individual workstations. Therefore, an important part of any backup system
is training users and configuring their workstations so that
important and/or critical files are stored on the server where they can be
easily backed up. Remote backup agents can also dynamically retrieve
information from users' hard drives, but it can become difficult to locate
all the places files might be saved. Backing up the entire hard disk drive
of every workstation is also difficult, since the excess data is likely to
exceed the capacity of the tape.
Also, it's important to rotate your tapes off-site at least occasionally.
Many companies have chosen to use fire-proof safes. Unfortunately, these
safes are rated only to a certain temperature. They are also designed to
protect paper documents, which do not typically get damaged until
temperatures exceed the boiling point of water. Tape, however, can be
damaged by much lower temperatures. Most companies should consider a
once-weekly off-site rotation policy. Simply creating two Friday tapes and
sending one of the two tapes home with a trusted employee can solve the
problem. The employee brings in the previous Friday tape and leaves with
the new Friday tape. This provides a recovery procedure in the event of
fire, tornado, or in the event of a theft of the systems along with the
backup tapes (which has happened to at least one of my clients in the
past).
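The rotation described above can be written down as a simple schedule. This
is a sketch under my own assumptions: one tape per weekday, no weekend
backups, and two off-site Friday tapes labeled A and B that alternate each
week. The labels are purely illustrative.

```python
import datetime

WEEKDAY_TAPES = ["Monday", "Tuesday", "Wednesday", "Thursday"]


def tapes_for(day):
    """Return the tape labels to write on the given date."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday >= 5:
        return []  # assumption: no backups on weekends
    if weekday == 4:
        # Consecutive Fridays are 7 days apart, so this value flips weekly.
        label = "A" if (day.toordinal() // 7) % 2 == 0 else "B"
        return ["Friday on-site", "Friday off-site " + label]
    return [WEEKDAY_TAPES[weekday]]
```

On any given Friday, the off-site tape that is not being written is the one
the employee brings back from home to be reused the following week.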
Nobody wants to think about disaster recovery, but backups are your final
line of defense -- the only thing, in some circumstances, that can prevent a
complete loss of data. Depending on your company's needs, a simple tape
backup solution may not be sufficient. Some companies perform periodic or
even 'real-time' backups of crucial data to an off-site company that
archives and warehouses the data in the event that it's needed. For most
smaller companies, the steps described here will successfully protect
against many kinds of failures. Still, the backup system is not something
to be taken for granted; it should be kept ready for the day when,
unfortunately, it might be needed.
Small businesses that are looking for a backup solution may wish to
strongly consider our new FireBak system. It
provides a faster, cheaper, bigger, and more reliable alternative to tape
without losing any of the advantages that tape provides. Click the link
below to view information on this system.