Projects

Backup Strategy

Project goals

Develop a low-cost, cross-platform backup system that is reasonably safe and not tedious to run, so that it suits private computer users.

Note

Let's straighten this out first: I do not need a professional setup. Such setups are easy to describe and not that hard to implement, but the costs are far too high for a private person.

For the interested reader: A professional setup would consist of at least a dedicated file server with a tape drive for backups. The tapes would be rotated, e.g. in a grandfather-father-son scheme (see Wikipedia for more information on that), and full backups would be made from time to time. You'd make two copies of each and keep them in different places. An additional copy in the mail, which you resend to yourself the moment you get it, increases protection against loss through hardware destruction, but of course your data, which may be sensitive, is then in the hands of other people.

This project will most probably be tied quite closely to the synchronization strategy, which I need to work out, too.


Project requirements

  • The solution must be low-cost, in a student's understanding of that term.
    • Initial costs: no more than half the price of a regular PC. I am talking about one for home office use, not one of those gaming machines where the video card is more expensive than my notebook. So let's make that 1 EUR per GB, because it's an easy number to calculate with.
    • Operational cost: As low as possible, of course. I'd say that 0.10 EUR per day is achievable.
    • Example calculation: 250 GB * 1 EUR/GB + 0.10 EUR/day * 365 days/year * 3 years = 359.50 EUR (about 120 EUR per year)
  • The backup system must be cross-platform.
    • It must be able to back up data I create on Windows.
    • It must be able to back up data I create on Linux.
    • Preserving file attributes would be nice, but is not necessary.
  • The solution must be reasonably safe.
    • Note: You can't have 100% security. That is a fact.
    • Data storage must be redundant in some way, so that hardware failure doesn't mean total data loss.
  • Daily usage must not be tedious.
    • Preferably the whole procedure runs fully automated, without further user interaction.
      • Fully automatic mode must be achievable for a static PC setup (i.e. a PC at a desk).
      • Semi-automatic mode must be supported for any (mobile) setup, e.g. by executing a script file. Semi-automatic means that the process is started manually, but runs automatically afterwards.
    • The backup procedure must be able to run without a GUI and be started from a script.
    • The backup procedure must be configurable, either through command line options or through a configuration file.
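
The example calculation from the cost requirement can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function and parameter names are made up for illustration, and the figures are the ones from the list above.

```python
# Sketch of the example cost calculation; all names are illustrative.

def total_cost_eur(capacity_gb, eur_per_gb, eur_per_day, years, days_per_year=365):
    """Initial cost plus operational cost over the given number of years."""
    initial = capacity_gb * eur_per_gb
    operational = eur_per_day * days_per_year * years
    return initial + operational

years = 3
total = total_cost_eur(capacity_gb=250, eur_per_gb=1.00, eur_per_day=0.10, years=years)
per_year = total / years

print(f"total: {total:.2f} EUR, per year: {per_year:.2f} EUR")
```

With 250 GB, 1 EUR per GB and 0.10 EUR per day over three years, this gives the 359.50 EUR from the example, or roughly 120 EUR per year.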

First sketch

Concept

  • USB disks are cheap and well supported under most OSs. Therefore they are our friends.
    • For the backup I bought a Trekstor 3.5" 500 GB USB 2.0 HDD with an external power supply. It cost me about 120 EUR.
    • For working data I use my old Trekstor 3.5" 120 GB USB 2.0 HDD, because it was already there.
      Note: This extra HDD is not used for backup storage, it's the backup's source. It's only for my personal convenience, because I switch computers quite often.
      I chose to work on an external drive, because it got really tedious to keep the different built-in HDDs of my computer systems in sync.
      If I had to buy one now, I'd go for either an 8 GB USB memory stick (or "pen drive" or whatever you call it) or a 2.5" USB HDD, because they can be bus powered and therefore are also highly portable. Right now (2007-12-19T23:56Z) you can buy an external 2.5" HDD with 160 GB for 82.99 EUR at my favourite local store's web site.
      I have to admit that there are external drives that support various remote file access mechanisms. Of these, SMB (or CIFS) is the most interesting, because that is what Windows can handle (Linux and Mac can use it too, of course). Maybe such a drive, hooked up to your LAN switch, would be the better idea, but a 2.5" drive is more portable in case your computers don't live on the same network.
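  • rsync is the tool of choice for copying the files. Rsync is very efficient, because it can handle file differences (also referred to as deltas):
    Using a simple copy command (e.g. from your file manager of choice) always copies the entire file. Rsync skips files that have not changed and can transfer only the part of a file that has changed. Note that the delta transfer mainly pays off over a network; between two local paths rsync copies changed files whole by default. If you are curious about the details, consult Wikipedia.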
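
To make the rsync idea a bit more concrete, here is a minimal sketch of how the copy step might be started from Python. The paths are hypothetical mount points, not my actual setup, and the choice of options is an assumption: `-a` (archive mode) copies recursively and preserves file attributes where the target file system allows it, `--delete` turns the copy into a strict mirror by removing files from the backup that no longer exist in the source, and `--dry-run` only reports what would be done.

```python
import subprocess

def build_rsync_command(source, destination, mirror=False, dry_run=False):
    """Build the rsync argument list for copying source to destination."""
    cmd = ["rsync", "-a"]          # archive mode: recursive, keeps attributes
    if mirror:
        cmd.append("--delete")     # also remove files that vanished from source
    if dry_run:
        cmd.append("--dry-run")    # report only, change nothing
    cmd += [source, destination]
    return cmd

def run_backup(source, destination, mirror=False, dry_run=False):
    """Run rsync and return its exit code (0 means success)."""
    cmd = build_rsync_command(source, destination, mirror, dry_run)
    return subprocess.run(cmd).returncode

# Hypothetical paths; the trailing slash on the source makes rsync copy
# the directory's contents rather than the directory itself.
command = build_rsync_command("/media/work/", "/media/backup/work/", dry_run=True)
print(" ".join(command))
```

The example only prints the command. Calling `run_backup` with the same arguments would actually start rsync, so a dry run first is a good idea, especially before using `--delete`.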

Now let's see how this measures up against the requirements:

  • low cost: I can back up 500 GB of data. The cost is 120 EUR. That's 0.24 EUR per GB.
  • cross-platform: rsync is available for Linux, and there is also a Windows build.
  • reasonably safe:
    • The employed technologies in both hard- and software are well tested.
    • The data is mirrored. Any single hard drive (even the backup) can go haywire without any data being lost. If that happens, you should buy a new one the next day though, so that redundancy is restored quickly.
    • This solution is not fire proof. If your house burns down, your data goes with it. But in this case I suppose you'd have worse things to worry about and the loss of data would be a minor problem.
      You could buy a fireproof safe box, deposit the backup drive in there and only fetch it for the backup once a week, but that's your own problem. Of course, automatic operation would not be possible then.
  • Not tedious: Rsync is operated from the command line. It can be run without any user interaction at all. This means that using scripts for semi-automatic backups is no problem. Fully automatic operation is possible by employing the OS's task scheduling mechanisms to start the scripts (e.g. cron on Linux).
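
A script for the semi-automatic mode could look roughly like the following sketch. It is not a finished tool: the option names `--source`, `--dest` and `--dry-run` are invented for this example, and the only rsync options used are `-a` and `--dry-run`. The point is merely that the script has no GUI, takes its settings from the command line, and returns rsync's exit code, so that a scheduler can tell whether the run succeeded.

```python
import argparse
import subprocess

def parse_options(argv):
    """Read the backup settings from command line options."""
    parser = argparse.ArgumentParser(description="Copy a directory with rsync.")
    parser.add_argument("--source", required=True, help="directory to back up")
    parser.add_argument("--dest", required=True, help="directory on the backup drive")
    parser.add_argument("--dry-run", action="store_true",
                        help="only show what would be copied")
    return parser.parse_args(argv)

def build_command(opts):
    """Translate the parsed options into an rsync argument list."""
    cmd = ["rsync", "-a"]
    if opts.dry_run:
        cmd.append("--dry-run")
    cmd += [opts.source, opts.dest]
    return cmd

def main(argv):
    """Run the backup and return rsync's exit code (0 means success)."""
    return subprocess.run(build_command(parse_options(argv))).returncode

# Demonstration with hypothetical paths: parse options and show the command.
opts = parse_options(["--source", "/media/work/", "--dest", "/media/backup/work/", "--dry-run"])
print(" ".join(build_command(opts)))
```

Saved as a file and ended with `sys.exit(main(sys.argv[1:]))` (plus `import sys`), this could be started by hand for the semi-automatic mode, or by the OS's task scheduler for the fully automatic mode.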

All in all, this concept looks sane to me. I'll post here as soon as I have any results to report.