Tuesday, November 27, 2012

The perils of cycling your backups

We recently had a disturbing incident with a customer whose genealogy database turned out to be corrupted. When the customer attempted to view a particular person in the database, the program crashed. Our software did not corrupt his database (it only reads the data); we simply discovered the problem. The corruption was most likely caused by an obscure bug in the database software, combined with a rare set of circumstances.

Our Trellis chart reads every person, event and relationship in a family history database, which is how we stumbled upon the corrupted data in this case. The program that created the database is no longer supported.

The problem is that this error was probably introduced months or years ago. It sat in a rarely visited corner of a large, 140,000-person database. All of the customer's backups were recent, and they all contained the error. This was a serious problem. Even the software's integrity checking feature crashed. There was no way of recovering the data using the program. A GEDCOM export was not possible either, as the program crashed when it attempted to export the corrupted record.

"Cycling" your backups means reusing the same set of media in rotation, such as external hard drives or USB flash drives ("thumb drives"). For example, you use:

Monday     Disk 1
Tuesday    Disk 2
Wednesday  Disk 3
Thursday   Disk 1  (resume cycle)
Friday     Disk 2

On the fourth day of the cycle, you are back to Disk 1.
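
For readers who script their backups, the rotation above can be sketched in a few lines of Python. This is only an illustration: the function name `disk_for_day` and the day numbering (0 for the first day of the cycle) are my own assumptions, not part of any backup product.

```python
def disk_for_day(day_index, disk_count=3):
    """Return the 1-based disk number used on a 0-based day of the rotation."""
    # The remainder wraps around, so day 3 lands on Disk 1 again.
    return day_index % disk_count + 1


days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
schedule = [(day, disk_for_day(i)) for i, day in enumerate(days)]

for day, disk in schedule:
    print(f"{day}: Disk {disk}")
```

With three disks, Thursday (day index 3) comes back to Disk 1, matching the schedule above.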

The advantage of this system is that it is economical. If you accidentally delete a file and catch the mistake at the time it is made, you can go back to yesterday's backup and retrieve a copy.

However, if you do not notice the error (as was the case with our customer), then by the time you have cycled through all your disks, the last good copy has been overwritten. Your "cloud" copy is corrupted as well. Increasing the cycle to ten or twenty disks (a significant investment) does not solve the problem.
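
To see why a longer cycle does not help, here is a small Python sketch of the reasoning. It rests on simplifying assumptions of my own: one backup is taken at the end of each day, days are numbered from 0, the corruption happens during `corruption_day`, and it is noticed on `discovery_day` before that day's backup runs. The function name `clean_backup_exists` is hypothetical.

```python
def clean_backup_exists(disk_count, corruption_day, discovery_day):
    """Return True if any disk still holds a backup made before the corruption."""
    last_written = {}  # disk index -> most recent day that disk was written
    for day in range(discovery_day):
        # Each day's backup overwrites whatever that disk held before.
        last_written[day % disk_count] = day
    # A backup is clean only if it was taken before the corruption day.
    return any(day < corruption_day for day in last_written.values())


# Corruption on day 10, noticed the next morning: a clean copy survives.
print(clean_backup_exists(3, corruption_day=10, discovery_day=11))

# Noticed 90 days later: every disk has been overwritten many times.
print(clean_backup_exists(3, corruption_day=10, discovery_day=100))

# Twenty disks only stretch the window to twenty days.
print(clean_backup_exists(20, corruption_day=10, discovery_day=100))
```

Under these assumptions, a cycle of N disks keeps a good copy for at most N days after the corruption occurs, however large N is.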

The solution I recommend is to back up to CD or DVD, with a checksum. The CD/DVD will last many years, and will be available to restore the last good copy of your genealogy database long after you have discovered the error. Detailed instructions are available in an earlier post.
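
If you want to see what the checksum step involves, the following Python sketch computes a SHA-256 checksum of a backup file, stores it next to the file, and verifies it later. It is not the procedure from the earlier post; the function names and the `.sha256` file suffix are assumptions chosen for illustration, and it uses only Python's standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Store the file's checksum alongside it, e.g. family.ged.sha256."""
    path = Path(path)
    checksum_path = path.with_name(path.name + ".sha256")
    checksum_path.write_text(sha256_of(path) + "\n")
    return checksum_path


def verify_checksum(path):
    """Return True if the file still matches its stored checksum."""
    path = Path(path)
    expected = path.with_name(path.name + ".sha256").read_text().strip()
    return sha256_of(path) == expected
```

You would call `write_checksum` on the backup file before burning both files to the disc, and `verify_checksum` on the copy read back from the disc. Note that a checksum only confirms that the copy is unchanged since it was written; it cannot tell you whether the database was already corrupted at that time.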

As an alternative, you can run your software's data integrity check, if such a feature is available. The integrity check needs to be run daily.

The story has a happy ending: another program, Ancestral Quest, was able to recover the data.

2 comments:

  1. This could have been caused by bit loss, also known as "bit rot": http://en.wikipedia.org/wiki/Bit_rot

    USB flash memory drives are extremely prone to this as they are not designed for long-term use. Internal and external hard drives of any type are susceptible as well. Unfortunately, the random, often undetectable loss of data is very difficult to combat.

    I wrote about how I approached this in my latest post: http://www.thehineks.com/preserving-family-history-records-digitally/

    Unfortunately, the technique I use at home to protect against the loss of data integrity is too technically complex for most people, but the CrashPlan service is something anyone can take advantage of. The benefit of that service is that it keeps unlimited versions of your files so that you can always go back to a copy that was working. That being said, a very frequent (at least weekly), hopefully automated integrity check should be run so that you know as soon as possible when corruption has occurred.

  2. Your point is valid, and your solution excellent.

    I was referring to corruption introduced by the application itself, due to bugs.
