Tuesday, November 27, 2012

The perils of cycling your backups

We had a disturbing incident recently with a customer whose genealogy database was found to be corrupted. When the customer attempted to view a particular person in the database, the program crashed. Of course, our software did not corrupt his database (it is read-only); we simply discovered the fact. The corruption was most likely caused by an obscure bug in the database software, combined with a rare set of circumstances.

Our Trellis chart reads every person, event and relationship in a family history database, which is how we stumbled upon the corrupted data in this particular case. The program that created the database is no longer supported.

The problem is that this error was probably introduced months or years ago. It was in a rarely-visited corner of a large, 140,000-person database. All of the customer's backups were recent, and they all contained the error. This was a serious problem. Even the software's integrity checking feature terminated abnormally (crashed). There was no way of recovering the data using the program. A GEDCOM export was not possible, as the program crashed when it attempted to export the record in error.

"Cycling" your backups means using the same media in rotation, such as external hard drives, or USB flash memory ("thumb drives"). For example, you use:

Monday      Disk 1
Tuesday     Disk 2
Wednesday   Disk 3
Thursday    Disk 1 (resume cycle)
Friday      Disk 2

On the fourth day of the cycle, you are back to Disk 1.

The advantage of this system is that it is economical. If you accidentally delete a file and catch the mistake at the time it is made, you can go back to yesterday's backup and retrieve a copy.

However, if you do not notice the error (as was the case with our customer), by the time you have cycled through all your backups, the last good copy has been overwritten. Your "Cloud" copy is also corrupted. Increasing the cycle to ten or twenty disks (a significant investment) does not solve the problem; it only delays it.
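To make the point concrete, here is a rough Python sketch of a rotation. The function names and the day numbering are my own illustration, not part of any backup product. It works out which disks still hold a backup taken before the error crept in, given how long it took to notice.

```python
def disk_for_day(day: int, num_disks: int) -> int:
    """Return the 1-based disk number used on a 0-based day of the rotation."""
    return day % num_disks + 1


def clean_copies_left(num_disks: int, corrupted_on: int, noticed_on: int) -> list:
    """List the disks that still hold a pre-corruption backup when the error is noticed.

    Days are 0-based. Any backup taken on or after `corrupted_on` contains the error.
    """
    last_backup_day = {}  # disk number -> most recent day it was written
    for day in range(noticed_on + 1):
        last_backup_day[disk_for_day(day, num_disks)] = day
    return sorted(d for d, day in last_backup_day.items() if day < corrupted_on)


if __name__ == "__main__":
    # Error introduced on day 10 in every case.
    print(clean_copies_left(3, corrupted_on=10, noticed_on=11))   # [1]  caught the next day
    print(clean_copies_left(3, corrupted_on=10, noticed_on=40))   # []   caught a month later
    print(clean_copies_left(20, corrupted_on=10, noticed_on=40))  # []   twenty disks, same delay
```

Once the delay in noticing exceeds the length of the cycle, no clean copy is left, no matter how many disks you bought.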

The solution I recommend is to back up to CD or DVD, with a checksum. The CD/DVD will last many years, and will be available to restore the last good copy of your genealogy database, even if you discover the error long after it was introduced. Detailed instructions are available in an earlier post.

As an alternative, you can run the data integrity check of your software, if such a feature is available. The integrity check needs to be done daily, so that an error is caught before the backup cycle overwrites the last good copy.

The story has a happy ending: another program, Ancestral Quest, was able to recover the data.

Wednesday, September 26, 2012

Trellis: the Chart With Everyone In It

Progeny introduces the latest innovation in genealogy - the Trellis chart, "the chart with everyone in it".

The Trellis chart is a brand new way to tell the story of your family. It shows everyone in your family, in a way that no traditional box chart can.

Based on research by a group of scientists, the Trellis is a diagonally-filled matrix, where rows are individuals and columns are nuclear families. Click here for more information.

The Trellis allows for interactive investigation of your family tree. With one click, you can highlight all the ancestors and descendants of an individual. Click a second person and you can see where their pedigrees intersect in a colorful display. Collapse the tree for a condensed view. Navigate up and down the tree with a simple click.

Charting Companion with the Trellis is available for:


Coming soon: Trellis for MyHeritage, Geni, FamilySearch.

Get Charting Companion today and see your family in a Trellis!


Tuesday, August 7, 2012

Effective backups

We all know that backups are important, right? It's not if your hard drive will fail, it's when. Yet we don't all do them regularly because it's a pain in the butt, right?

Well here's a practical solution, which I use daily. You will need a copy of Nero disk burning software, and MD5 checksum software (Advanced CheckSum Verifier is perfect for this, well worth the modest price).

Here is a Windows batch file that will perform the whole backup automatically; all you need to do is load a blank DVD every day. Just copy the batch file, modify it in Notepad with your specific folder names, save it on your hard drive and schedule it to run daily (or weekly if you don't mind losing six days' work).

:: The DVD burning command-line program
set NERO="C:\Program Files\Ahead\Nero\nerocmd.exe"
:: The MD5 checksum calculator
set ACSV=C:\Program Files\Checksum Verifier\acsv.exe
:: '/p2' = checksum file in each folder, '/s+' process subfolders
set OPT=/f0 /p2 /s+
:: ------------------------------------------------
:: 1- CALCULATE CHECKSUMS of all the directories you will backup
:: '/u' = update (create) checksum files, '/o+' disable prompt for overwriting checksum files
:: Results stored in '\temp\checksum1.log' to '\temp\checksum3.log'
"%acsv%" /u "C:\My First Folder\" /o+ %OPT% /a \temp\checksum1.log
"%acsv%" /u "C:\My Second Folder\" /o+ %OPT%  /a \temp\checksum2.log
"%acsv%" /u "C:\Users\John\" /o+ %OPT%  /a \temp\checksum3.log

:: ------------------------------------------------
:: 2- BURN DVD (or copy to disk) ('^' splits command over many lines)
%NERO% --write --drivename D --real --iso "%DATE%_MY_BACKUP" --dvd ^
    --recursive --speed 4 --no_user_interaction  ^
     "C:\My First Folder" "C:\My Second Folder" ^
     "C:\Users\John"
:: ------------------------------------------------  
:: 3- VERIFY CHECKSUMS
%NERO% --load --drivename D
"%acsv%" /v D:\ %OPT% /a \temp\checksum4.log
:: ------------------------------------------------  
:: 4- Eject disc
%NERO% --drivename D --eject
:: ------------------------------------------------ 
:: 5- Display Checksum verification results
notepad \temp\checksum4.log
pause

Here is an explanation of the commands.

The double colon '::' at the beginning of the line makes it a comment.

The 'set' commands at the beginning create shortcuts that make the rest of the commands easier to read and edit. When preceded and followed by a percent sign ('%'), the shortcut is replaced with the fully expanded equivalent. Substitute the drive and folders where you installed Nero and ACSV on your PC.

Step 1 calculates the checksums and writes the checksum files on your source (original) disk. The checksum files are small files called "md5sum.lst". There will be one written in each folder. Substitute the name of your own folders in the batch file. You can list as many as you want.
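If you are curious what a step like this does under the hood, here is a minimal Python sketch. It is not how ACSV is implemented. The file name "md5sum.lst" comes from the paragraph above, but the line format (digest, space, asterisk, file name) and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "md5sum.lst"  # name from the post; the line format below is an assumption


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifests(root: Path) -> int:
    """Write one checksum list per folder under `root`; return the number of files hashed."""
    hashed = 0
    # Collect the folder list first, so files written below do not disturb the walk.
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    for folder in folders:
        files = sorted(
            p for p in folder.iterdir() if p.is_file() and p.name != MANIFEST_NAME
        )
        if not files:
            continue
        lines = [f"{md5_of_file(p)} *{p.name}" for p in files]
        (folder / MANIFEST_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")
        hashed += len(files)
    return hashed
```

Calling write_manifests(Path(r"C:\My First Folder")) would leave one small list in that folder and in each subfolder that contains files, which is the layout the batch file relies on.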

The results of the checksum calculation or verification are written to a log file ('/a' parameter).

Step 2 burns your data, including the checksum files, to a CD or DVD. If you prefer to back up to a hard drive, or over the network, replace this step with one or more XCOPY commands. Change the drive letter in "--drivename D" to your own CD or DVD drive letter. The caret (or circumflex) '^' allows you to spread the command over many lines.

Replace "John" in "C:\Users\John" with the name that you sign on with.

Step 3 verifies the checksum on the destination (target) CD, DVD or hard drive. A checksum is a hash total of all the bytes in the file. For all practical purposes, two different files will never have the same checksum. If there were any errors during the copying, the re-calculated checksum of the copy will be different from the original checksum. The checksum verification program won't be able to tell you where in the file the error is, but it will catch it. If there is an error, you know your copy is bad, and you must re-run the backup. In three years of running daily backups over two DVDs, I have had one error.
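Here is a small Python sketch of the idea, using the standard hashlib module. The file names and the helper functions are made up for the demonstration, and it reads each file into memory in one go, which is fine for a demo but not for a full DVD. It shows that a single changed byte is enough to make the checksums disagree.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_hex(path: Path) -> str:
    """Return the hex MD5 digest of a (small) file, read into memory in one go."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def copy_is_intact(original: Path, copy: Path) -> bool:
    """True when the copy has the same MD5 checksum as the original."""
    return md5_hex(original) == md5_hex(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "family.ged"  # hypothetical file names
        good_copy = Path(tmp) / "good.ged"
        bad_copy = Path(tmp) / "bad.ged"
        original.write_bytes(b"0 HEAD\n1 CHAR UTF-8\n0 TRLR\n")
        shutil.copyfile(original, good_copy)

        # Simulate a copying error by flipping one bit of the first byte.
        damaged = bytearray(original.read_bytes())
        damaged[0] ^= 0x01
        bad_copy.write_bytes(bytes(damaged))

        print(copy_is_intact(original, good_copy))  # True
        print(copy_is_intact(original, bad_copy))   # False
```

The comparison only says whether the copy matches; it does not say which byte went wrong.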

The "--load" command closes the drive drawer, which Nero will have ejected.

Step 4 ejects the CD or DVD, letting you know the job completed. Again, replace "D" with your CD/DVD drive letter. Step 5 displays the result of the verification. You should see zero errors.

Save this file, call it "mybackup.bat", in a folder called "C:\batchjobs".

IMPORTANT: The folder names listed for ACSV must be followed by a backslash ('\'). The folder names for Nero MUST NOT have a trailing backslash.

To schedule the backup in Windows 7, click on Start, Control Panel, Administrative Tools. Double-click on Task Scheduler. In the menu, click on Action, Create Task. In the General tab, give the task a name, and Change User to Administrator. In the Actions tab, browse to your batch file. In the Triggers tab, set the time and frequency. I run my backup at 3:00 AM.

Every morning, all you have to do is pluck the disc from the CD/DVD drive, label it, and file it in a paper sleeve, the cheapest form of storage.

Another benefit of this solution is that the checksum files remain permanently in the backup directories. You can run the checksum verification any time to confirm the validity of the backup, long after the original has been modified.

You can compare the CD/DVD to the original hard drive with the best utility ever designed to do this: Beyond Compare, worth many times its price.

Occasionally, the automatic Windows update might mess up the job by rebooting in the middle of a backup. Not a problem: just run the batch job manually as Administrator.

The advantage of CD/DVD over cycling a set of hard drives or "thumb drives" (memory sticks) is that each disc is written once and kept. With cycled media, by the time you discover one of your files is corrupted, it might be too late, as the corrupted copy will have been spread over all available media.

BTW, I also back up to Mozy, I do a full disk backup to an external drive with Acronis, and I copy to another PC over the network.