Bacula had been working as our backup/restore solution for just over a year and a half. Aside from some juggling of backup pools to better accommodate growth, we'd had to do very little to keep it running. Restoration of our FreeBSD servers from bare metal never really worried me. The FreeBSD installation media itself has powerful recovery features, and based on my reading, it seemed that recovering a UNIX system would be fairly straightforward (I've done it with just dumps dozens of times).
I was far less certain about restoring our Windows 2000 systems from bare metal. The file set that can be used on a Windows 2000 system intentionally bypasses a number of system files which cannot be backed up, because the Windows kernel's file locking prevents a simple copy. Volume Shadow Copy (finally) corrected this problem, but that service was only available on Windows 2003 and above, and it's not a good idea to make that upgrade just for that feature.
So, we embarked on a project to determine if we could recover a Windows 2000 system from bare metal using only Bacula backups. There were many false starts. Several pitfalls. We did some things to these systems that would make a Microsoft Professional Support Services engineer shake with fear, anger, and revulsion. In the end though, we can recover our Windows systems from catastrophic failure fairly quickly.
This guide is divided into three sections which I'll call Bliss, Panic, and Relief.
All of the servers are up. All of the customers pay on time. All the data is intact, and available. Now is the time to plan a proper disaster recovery, before the excrement hits the air circulation device. There are two things to do here - set up your servers and build your recovery disk.
By convention, we drop/move all of the files resulting from Steps 2-4 into C:\SystemState. It's a very obvious and standard-looking place to find things.
Build a BartPE recovery disk from the instructions located at http://www.nu2.nu/pebuilder/. The guy is a genius - use that to your advantage. I tweaked my recovery disk somewhat to allow me to choose which network and Bacula server to use (I use the disk both at home and at the office) but you don't have to - the Bacula plug-in works fine once you tweak the configuration files.
One (or more!) of the servers is down. Customers are threatening to never pay you again. Data integrity is now a question mark, hovering over your head. Now is the time to not panic (despite your intense desire to do so). First - analyze the situation carefully and determine if you really, really need to do a bare metal restore. You can usually tell if the system is a smoking pile of ruin (I had an NCR Windows NT server catch on fire once in my presence. It was - exciting.) Some Windows errors cause the OS to get into such a damaged state that even if you recover it, you're not going to be sure it's okay. This is important, because after this procedure, you can't really guarantee it'll be 100% right anyway. If you want that - use UNIX, my friend. I restored a partition on one of my UNIX boxes while I was logged in and using KDE - both KDE and bacula-fd were resident on that partition, and continued to run even after rm had its way with the file system (it was late, my finger slipped, honest). If you show me a Windows install that can survive the trashing of half of the file system, I'll eat a car. But enough about me.
The system is back up. Customers stopped harassing your help desk. All of the data appears to be there. Now would probably be a good time to get some sleep or at least a meal. You're not done yet.
At this point, you're probably out of the woods.
We treat the systems and data pretty independently. System software is on C:, data is on D:, for instance. By doing this, we ensure that Bacula is always getting our customers' data (off of the D: drive) and that it's getting enough of C: to let the server come back up and do what it was doing before without a lot of manual reconfiguration. Something else we can do is restore a dead box's SystemState directory and D: to another server and "merge" them together. This can be done very efficiently, since we don't have to wait for a large amount of useless system software to come off the tape, and we don't have to laboriously go through file system selection to get there.
Windows 2003 with IIS6 has better backup/recovery tools than Windows 2000 with IIS5. However, there are a lot of caveats to upgrading which may make this impossible or impractical (search for "CDONTS Windows 2003" some time). Windows 2003 also has Volume Shadow Copy and ASR, making this entire document fairly moot.
We keep copies of our SQL database backups on two local servers. If one dies, we have the option to restore those databases to another DB server or recover the dead DB server from bare metal. Before Bacula, we were using Amanda and really didn't have this all down pat, and we ended up with 10 hours of downtime for our trouble after a RAID tossed two disks out at once, killing an 8-disk array. By storing our SQL backups on multiple servers instead of just the target and tape, we can start to recover from this situation within minutes and be done in a fraction of the time.
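As a rough illustration of the "keep the SQL backups on more than one server" idea, here is a minimal Python sketch that copies one backup file into several destination directories and checks each copy against a checksum. It is not the script we actually use: the function names (`sha256_of`, `replicate_backup`) and any paths you pass in are hypothetical, and it assumes the other servers are reachable as ordinary directories (a mapped drive or share, for instance).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(backup_file, destinations):
    """Copy backup_file into every directory in destinations.

    Each copy is re-read and compared against the source checksum, so a
    truncated or corrupted copy is caught now rather than during a restore.
    Returns the list of verified copies as Path objects.
    """
    source = Path(backup_file)
    expected = sha256_of(source)
    verified = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)  # hypothetical share layout
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError("checksum mismatch for %s" % target)
        verified.append(target)
    return verified
```

You would call it with something like `replicate_backup("D:/SQLBackups/customers.bak", ["//server-a/sqlbackups", "//server-b/sqlbackups"])`, where both the file name and the share names are made up for the example. Bacula still picks up the files from D: as usual; the extra copies just mean you are not waiting on tape when a DB server dies.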
VMware (or QEMU, or Bochs) makes testing this sort of thing very fast, because you can quickly iterate through various BartPE configurations. You can also test the backup restore process itself much faster than blasting a real box. Do the test on a real box, though, both to verify it works with your hardware and frankly to get practice doing it. People who remain calm under pressure work better, get more raises, and are more attractive to their chosen dating pool. You learn how to remain calm by knowing what you're doing, and by having done it before.
Here is our Windows File Set Configuration. Some of the items in here are specific to our setup, but you should be able to get the gist. We use "WIN*" to catch both WINNT and WINDOWS, since we use this set on both Win2K and Win2K3 boxes.
FileSet {
  Name = the-set
  IgnoreFileSetChanges = yes
  Include {
    Options {
      wilddir = "C:/Documents and Settings/*/Application Data/*/Profiles/*/*/Cache"
      wilddir = "C:/Documents and Settings/*/Desktop"
      wilddir = "C:/Documents and Settings/*/Local Settings/History"
      wilddir = "C:/Documents and Settings/*/Local Settings/Temporary Internet Files"
      wilddir = "C:/Documents and Settings/*/Local Settings/Temp"
      wilddir = "C:/WIN*/$Nt*Uninstall*"
      wilddir = "C:/WIN*/CSC"
      wilddir = "C:/WIN*/Internet Logs"
      wilddir = "C:/WIN*/Microsoft.NET/Framework/v1*/Temporary ASP.NET Files"
      wilddir = "C:/WIN*/msdownld.tmp"
      wilddir = "C:/WIN*/system32/LogFiles"
      wilddir = "C:/WIN*/system32/MsDtc/Trace"
      wilddir = "C:/WIN*/system32/Perflib*"
      wilddir = "C:/WIN*/system32/config"
      wilddir = "C:/WIN*/system32/wbem/Repository/FS"
      wilddir = "C:/WIN*/SYSVOL/domain/DO_NOT_REMOVE_NtFrs_PreInstall_Directory"
      wilddir = "C:/WIN*/SYSVOL/sysvol/*/DO_NOT_REMOVE_NtFrs_PreInstall_Directory"
      wilddir = "C:/WIN*/Temp"
      wilddir = "[A-Z]:/RECYCLER"
      wilddir = "[A-Z]:/System Volume Information"
      wilddir = "[A-Z]:/Temp"
      wilddir = "[A-Z]:/Tmp"
      wilddir = "[A-Z]:/WUTemp"
      wildfile = "C:/Documents and Settings/*/Application Data/Microsoft/CLR Security Config/v1*/security.config.cch*"
      wildfile = "C:/Documents and Settings/*/ASPNET/Application Data/Microsoft/CLR Security Config/v1*/security.config.cch*"
      wildfile = "C:/Documents and Settings/*/Local Settings/Application Data/Microsoft/Windows/USRCLASS.*"
      wildfile = "C:/Documents and Settings/*/NTUSER.*"
      wildfile = "C:/Documents and Settings/*/Cookies/*"
      wildfile = "C:/WIN*/Debug/PASSWD.LOG"
      wildfile = "C:/WIN*/Debug/NtFrs*.log"
      wildfile = "C:/WIN*/Microsoft.NET/Framework/v1*/CONFIG/enterprisesec.config.cch*"
      wildfile = "C:/WIN*/Microsoft.NET/Framework/v1*/CONFIG/security.config.cch*"
      wildfile = "C:/WIN*/NETLOGON.CHG"
      wildfile = "C:/WIN*/NTDS/edb.log"
      wildfile = "C:/WIN*/NTDS/ntds.dit"
      wildfile = "C:/WIN*/NTDS/temp.edb"
      wildfile = "C:/WIN*/ntfrs/jet/log/edb.log"
      wildfile = "C:/WIN*/ntfrs/jet/ntfrs.jdb"
      wildfile = "C:/WIN*/ntfrs/jet/temp/tmp.edb"
      wildfile = "C:/WIN*/Registration/*.crmlog"
      wildfile = "C:/WIN*/SchedLgU.Txt"
      wildfile = "C:/WIN*/security/logs/scepol.log"
      wildfile = "C:/WIN*/security/edb.log"
      wildfile = "C:/WIN*/security/edbtmp.log"
      wildfile = "C:/WIN*/security/log.edb"
      wildfile = "C:/WIN*/system32/DTCLog/MSDTC.LOG"
      wildfile = "C:/WIN*/system32/ias/*.ldb"
      wildfile = "C:/WIN*/system32/ias/*.mdb"
      wildfile = "C:/WIN*/system32/MsDtc/MSDTC.LOG"
      wildfile = "C:/WIN*/system32/inetsrv/urlscan/urlscan.*.log"
      wildfile = "C:/WIN*/system32/wbem/Repository/CIM.REP"
      wildfile = "C:/WIN*/system32/windows media/server/NamespaceDelta.xml"
      wildfile = "C:/WIN*/Tasks/SchedLgU.Txt"
      wildfile = "[A-Z]:/pagefile.sys"
      wildfile = "[A-Z]:/Program Files/APC/PowerChute Business Edition/agent/data.dat"
      wildfile = "[A-Z]:/Program Files/APC/PowerChute Business Edition/agent/DataLog"
      wildfile = "[A-Z]:/Program Files/APC/PowerChute Business Edition/server/data.dat"
      wildfile = "[A-Z]:/Program Files/APC/PowerChute Business Edition/server/debug.txt"
      Exclude = yes
    }
    Options { signature=MD5; }
    File = C:/
    File = D:/
  }
}
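
If you want a feel for what the wildcards in a file set like this will catch before you run a real backup, a quick sanity check can be sketched in Python. This is only an approximation: it uses Python's `fnmatch` module as a stand-in for Bacula's own matcher, which may differ at the edges (case handling on Windows, for one), and the helper name `is_excluded` and the sample paths are made up for the example. Only a handful of patterns from the file set are included.

```python
from fnmatch import fnmatchcase

# A few exclusion patterns copied from the file set above.
EXCLUDE_DIRS = [
    "C:/WIN*/Temp",
    "C:/WIN*/system32/config",
    "[A-Z]:/RECYCLER",
]
EXCLUDE_FILES = [
    "C:/Documents and Settings/*/NTUSER.*",
    "[A-Z]:/pagefile.sys",
]


def is_excluded(path, is_dir=False):
    """Approximate whether path would be skipped by the sample patterns.

    A file or directory is treated as excluded if it, or any directory
    above it, matches a wilddir pattern; a file is also excluded if it
    matches a wildfile pattern directly.
    """
    parts = path.split("/")
    # For a directory, check the path itself as well as its ancestors.
    depth = len(parts) + 1 if is_dir else len(parts)
    for i in range(1, depth):
        ancestor = "/".join(parts[:i])
        if any(fnmatchcase(ancestor, pat) for pat in EXCLUDE_DIRS):
            return True
    if not is_dir:
        return any(fnmatchcase(path, pat) for pat in EXCLUDE_FILES)
    return False


for sample in ["C:/WINNT/Temp/setup.tmp", "C:/WINDOWS/Temp/setup.tmp",
               "D:/pagefile.sys", "D:/data/report.doc"]:
    print(sample, "excluded" if is_excluded(sample) else "backed up")
```

The point of the exercise is the "WIN*" trick: the same pattern covers both C:/WINNT and C:/WINDOWS, so one file set can serve Win2K and Win2K3 boxes. Treat the output as a hint only, and confirm against what Bacula actually backs up.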