Twiddling with tar – Differential backups on Linux

When I first mentioned the title of this post to my girlfriend, she misheard and thought there was an extra “t” at the end.
One hasty explanation later I have avoided banishment to the shed. All of which is mildly ironic as the tar command comes with a whole alphabet of options, many of which are about to get used here.

As its name suggests, the venerable tar command ( Tape ARchive) has its roots back in the time when computers were the size of a small semi in Dagenham and punch cards and tapes were the acme of the programmer’s art.

Now I’m going to use it for backing up data on my assorted Ubuntu machines.
What I want to do here is :

  • work out how much data I need to back up
  • create a full backup of all of my data
  • make sure I know what files have been backed up
  • test the restore of a file from the backup
  • make subsequent incremental backups

In the course of this odyssey, we will discover that du has a human face and that tar has a bit of a yellow streak.
There are several things that can go horribly wrong when playing around with tar, so I’m going to test everything on a small subset of files…that I have safely stored elsewhere.
Speaking of which…

How much data – Duh!

The du command tells you how much disk space is used by the files in the current directory and any sub-directories.
We need to use the “-s” switch to get a single summary total of all the space being used as opposed to that being used by each file in the directory tree. For example, if I’m in my home directory and issue the command :

$ du -s 
16667664	. 

OK, that figure is in 1-kilobyte blocks rather than bytes ( at least with GNU du, which is what Ubuntu ships). Fortunately, there’s an option to help us mere humans ( or at least, those of us who don’t know our 1024 times table) :

$ du -sh 
16G	. 
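
To check the sums, here is a quick sketch that converts the figure above by hand. It assumes the number really is in 1 KiB blocks, which is the GNU du default; the variable names are mine.

```shell
#!/bin/sh
# Convert the du -s figure above from 1 KiB blocks to GiB.
kib=16667664
gib=$(awk -v k="$kib" 'BEGIN { printf "%.1f", k / 1024 / 1024 }')
echo "$kib KiB is roughly $gib GiB"

# du -sk asks for 1 KiB units explicitly, which is handy on systems
# where the default block size is something else.
du -sk .
```

That works out at about 15.9 GiB, which is consistent with the 16G that du -sh reports.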

Making a full backup

Now for the test directory. Let’s just have a quick look at what’s there :

$ ls -l 
total 7420 
-rw-r--r-- 1 mikes mikes 5549452 2010-12-07 17:27 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 1986965 2010-12-07 17:27 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes      21 2010-12-07 17:27 phpinfo.php 
-rw-r--r-- 1 mikes mikes   33460 2010-12-07 17:27 xe_ts_notes.odt

So we’ve got one vorbis file, one jpeg ( both of which are already heavily compressed formats), a simple php text file and an Open Office Writer document.

$ du -sh 
7.3M	. 

To make a full backup of this directory :

$ tar -cvpzf test_backup.tar.gz --exclude=test_backup.tar.gz . 
./ 
./phpinfo.php 
./11 - Motörhead.oga 
./gobsmack.jpg 
./xe_ts_notes.odt 
tar: .: file changed as we read it 

In case you’re wondering, cvpzf is not simply a really bad hand in Scrabble.
These options are :
c – create an archive
v – verbose – i.e. echo the filenames to the screen as they’re added to the archive
p – preserve permissions – tar records each file’s permissions in the archive anyway; as far as I can tell, p mainly matters at restore time, when it puts them back exactly as they were rather than filtering them through your umask
z – compress the archive with gzip
f – the name of the archive file to create (in this case test_backup.tar.gz)

I also used the --exclude option to stop tar trying to back up its own archive.
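
Another way to keep the archive out of its own backup is to write it somewhere else entirely. The following is only a sketch: it builds a throwaway directory with mktemp, and the src and dest names are made up for the example.

```shell
#!/bin/sh
set -e
src=$(mktemp -d)    # hypothetical stand-in for the directory to back up
dest=$(mktemp -d)   # hypothetical backup location outside $src

echo '<?php phpinfo(); ?>' > "$src/phpinfo.php"

# -C makes tar change into $src before reading ".", so members are
# still stored as ./name, but the archive itself is written elsewhere.
tar -cpzf "$dest/test_backup.tar.gz" -C "$src" .

listing=$(tar -tzf "$dest/test_backup.tar.gz")
echo "$listing"
rm -rf "$src" "$dest"
```

Because the archive never appears in the directory being read, there is nothing to exclude and the directory is not modified while tar is working through it.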

Incidentally, the first time I ran this command, I forgot to specify the starting directory ( . or current directory in this case) and tar responded by doing the equivalent of hiding behind the sofa :

tar: Cowardly refusing to create an empty archive

Anyway, if we check the contents of the directory now…

$ ls -l 
total 14776 
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes      21 2010-11-14 18:58 phpinfo.php 
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz 
-rw-r--r-- 1 mikes mikes   33460 2010-04-04 15:57 xe_ts_notes.odt 

Not much of a compression ratio, but my hopes weren’t high.
As an aside, it is possible to use tar with bzip2 ( using the j switch instead of z). Some quick testing revealed it wouldn’t make much of a difference in this particular instance.
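
If you want to see the effect for yourself without sacrificing any Motörhead, here is a rough sketch. It uses random bytes as a stand-in for already-compressed media, which is my assumption rather than a like-for-like test.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Random bytes stand in for already-compressed media such as .oga or .jpg.
head -c 1048576 /dev/urandom > "$work/noise.bin"
tar -czf "$work/noise.tar.gz" -C "$work" noise.bin

raw=$(wc -c < "$work/noise.bin")
packed=$(wc -c < "$work/noise.tar.gz")
percent=$((packed * 100 / raw))
echo "input: $raw bytes, archive: $packed bytes (${percent}% of input)"
rm -rf "$work"
```

The archive should come out at about the same size as the input, because gzip has nothing left to squeeze.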

Finding files in the Archive

If I want to restore a file from an archive, it would be good to know whether it’s in the archive in the first place. To list the full contents of a tar archive we can do :

$ tar -tf test_backup.tar.gz 
./ 
./phpinfo.php 
./11 - Motörhead.oga 
./gobsmack.jpg 
./xe_ts_notes.odt 
$

The -t option here simply lists the contents of the archive.
If I wanted to find a specific file…

$ tar -tf test_backup.tar.gz  ./gobsmack.jpg 
./gobsmack.jpg 
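
Passing a name to tar only works if you know the path exactly as it was stored. When you only remember part of the name, piping the listing through grep does the job. A small sketch, using a scratch directory and made-up file contents:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir -p "$work/src/photos"
echo 'not really a jpeg' > "$work/src/photos/gobsmack.jpg"
echo 'notes' > "$work/src/notes.txt"
tar -czf "$work/backup.tar.gz" -C "$work/src" .

# List every member and filter; no need to know the stored path.
matches=$(tar -tzf "$work/backup.tar.gz" | grep 'gobsmack')
echo "$matches"
rm -rf "$work"
```

The line that grep prints is the exact member name to hand to tar -x later.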

Restoring a file from an Archive

Saving everything is all well and good, but what if we want to get something back ?
Let’s start by deleting one of the files in the directory ( NOTE – if you’re using your one and only copy of a file for this test, I’d suggest you rename it instead) :

$ ls -l xe_ts_notes.odt 
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt 
$ rm -i xe_ts_notes.odt 
rm: remove regular file `xe_ts_notes.odt'? y 
$ ls -l 
total 14740 
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes      21 2010-11-14 18:58 phpinfo.php 
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz

Now let’s see if we can get it back from the archive. Remember, we need to use the full path of the file relative to the starting directory used when the archive was created :

$ tar -xf test_backup.tar.gz ./xe_ts_notes.odt 
$ ls -l 
total 14776 
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes      21 2010-11-14 18:58 phpinfo.php 
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz 
-rw-r--r-- 1 mikes mikes   33460 2010-04-04 15:57 xe_ts_notes.odt 

The x switch is “extract”. As we can see, xe_ts_notes.odt has been restored to its former glory.
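
Extracting in place overwrites whatever is already there with that name. If the file still exists and you just want to look at the backed-up version, a safer pattern is to extract into a separate directory and compare. This is a sketch with hypothetical paths and a text file standing in for the real document:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir "$work/src" "$work/restore"
echo 'original notes' > "$work/src/xe_ts_notes.txt"
tar -czf "$work/backup.tar.gz" -C "$work/src" .

# Extract one member into a separate directory, then compare it
# with the live copy before deciding whether to overwrite anything.
tar -xzf "$work/backup.tar.gz" -C "$work/restore" ./xe_ts_notes.txt
if cmp -s "$work/src/xe_ts_notes.txt" "$work/restore/xe_ts_notes.txt"; then
    result=identical
else
    result=different
fi
echo "restored copy is $result"
rm -rf "$work"
```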

Incremental Backups

Messing around with a few files is one thing. Backing up the entire machine is something else.
All in all, it may not be something you want to do every time you do a backup.
Fortunately, tar lets you do incremental backups.
The first thing to do is a full “level 0” backup. You can then make incremental backups, archiving only files that have changed since the last backup – a “level 1” backup.
First off, the level 0 backup :

$ tar -cvzf archive1.tar.gz -g arch1.snar . 
./ 
./11 - Motörhead.oga 
./arch1.snar 
./gobsmack.jpg 
./lotto.sql 
./phpinfo.php 
./xe_ts_notes.odt 
$ ls -l 
total 14788 
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes     155 2010-11-30 20:28 arch1.snar 
-rw-r--r-- 1 mikes mikes 7521299 2010-11-30 20:28 archive1.tar.gz 
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes    2115 2010-11-04 13:23 lotto.sql 
-rw-r--r-- 1 mikes mikes      21 2010-11-14 18:58 phpinfo.php 
-rw-r--r-- 1 mikes mikes   33460 2010-04-04 15:57 xe_ts_notes.odt 

Note here that we’ve used the g switch to tell tar to keep a record of what it’s backed up.
If we now copy an additional file over ( funky.fnc) and run a level 1 backup against the same snapshot file :

$ tar -cvzf archive2.tar.gz -g arch1.snar --exclude=archive1.tar.gz . 
./ 
tar: .: file changed as we read it 
./arch1.snar 
./funky.fnc 
$ ls -l 
total 14796 
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 7521299 2010-11-30 20:47 archive1.tar.gz 
-rw-r--r-- 1 mikes mikes     430 2010-11-30 20:49 archive2.tar.gz 
-rw-r--r-- 1 mikes mikes     182 2010-11-30 20:49 arch1.snar 
-rw-r--r-- 1 mikes mikes      94 2010-11-30 20:48 funky.fnc 
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes    2115 2010-11-04 13:23 lotto.sql 
-rw-r--r-- 1 mikes mikes      21 2010-11-14 18:58 phpinfo.php 
-rw-r--r-- 1 mikes mikes   33460 2010-04-04 15:57 xe_ts_notes.odt 

So the level 1 backup now contains the new file ( funky.fnc) as well as our archive record.
This is because we’ve used the g switch again and told tar to look at arch1.snar and only back up files that are new or have changed since that record was last written.
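
The same two steps can be run with the snapshot file and the archives kept outside the directory being backed up, so that neither ends up inside the backup. This is a sketch only: it needs GNU tar for the g switch, and src, store, backup.snar and the level0/level1 file names are all my own inventions.

```shell
#!/bin/sh
# Sketch only: needs GNU tar for -g (--listed-incremental).
set -e
src=$(mktemp -d)     # stand-in for the directory being backed up
store=$(mktemp -d)   # archives and snapshot live outside $src

echo 'select 1 from dual;' > "$src/lotto.sql"

# Level 0: the snapshot file does not exist yet, so everything is dumped.
tar -czf "$store/level0.tar.gz" -g "$store/backup.snar" -C "$src" .

sleep 1   # keep the new file's timestamp clearly after the level 0 run
echo '-- placeholder' > "$src/funky.fnc"

# Level 1: same snapshot file, so only new or changed files are dumped.
tar -czf "$store/level1.tar.gz" -g "$store/backup.snar" -C "$src" .

level1=$(tar -tzf "$store/level1.tar.gz")
echo "$level1"
rm -rf "$src" "$store"
```

The level 1 listing should show the directory entry and funky.fnc, with no sign of lotto.sql or the snapshot file.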

On the downside, I will now need to check all of the archives ( in this case, two) to work out which one holds the file I want to restore. However, this may well be offset by the faster execution of the incremental backup.
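
One way to cap the search at two archives is to make every level 1 backup relative to the level 0, which is what is usually meant by a differential backup. The trick, as I understand it from the GNU tar manual, is to work on a copy of the level 0 snapshot file each time, so the original is never updated. A sketch with made-up names ( full.snar, work.snar and so on), again assuming GNU tar:

```shell
#!/bin/sh
# Sketch only: needs GNU tar for -g (--listed-incremental).
set -e
src=$(mktemp -d)
store=$(mktemp -d)

echo one > "$src/a.txt"
tar -czf "$store/full.tar.gz" -g "$store/full.snar" -C "$src" .

sleep 1   # make sure later files are clearly newer than the full backup
echo two > "$src/b.txt"
# Work on a copy so full.snar keeps describing the level 0 state.
cp "$store/full.snar" "$store/work.snar"
tar -czf "$store/diff1.tar.gz" -g "$store/work.snar" -C "$src" .

sleep 1
echo three > "$src/c.txt"
cp "$store/full.snar" "$store/work.snar"
tar -czf "$store/diff2.tar.gz" -g "$store/work.snar" -C "$src" .

diff2=$(tar -tzf "$store/diff2.tar.gz")
echo "$diff2"
rm -rf "$src" "$store"
```

The second differential holds both b.txt and c.txt, so a restore only ever needs the full backup plus the most recent differential. The price is that each differential grows until the next full backup.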

Backing up and retrieving changed files

Now, suppose I change funky.fnc ( which is, incidentally, a PL/SQL function) by adding some comments so it looks like this :

CREATE OR REPLACE FUNCTION funky RETURN VARCHAR2 IS 
-- 
-- random comment added to demonstrate tar treatment of a change 
-- in a file. 
BEGIN 
    RETURN 'Hey funky dude'; 
END; 
/

Look, I was working on an Extreme Programming project when I wrote this, OK ? When using XP, it’s mandatory to drink Pepsi Max and call everyone dude.

Anyway, now to create a second incremental backup…

$ tar -cvzf archive3.tar.gz -g arch1.snar --exclude='*.tar.gz' . 
./ 
tar: .: file changed as we read it 
./arch1.snar 
./funky.fnc 

If I want to retrieve the latest version…

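#!/bin/sh 
# 
# test script to recover a file (e.g. ./funky.fnc) from the newest 
# incremental backup that contains it 
# 
# usage: sh recover.sh ./funky.fnc 
if [ $# -ne 1 ]; then 
    echo "usage: $0 <path as stored in the archive>" >&2 
    exit 2 
fi 
# ls -t lists the archives newest first; this relies on the archive 
# names containing no whitespace 
for file in `ls -t *.tar.gz` 
do 
    echo "Checking archive $file ..." 
    # tar -t prints the member name if it is present; the error it 
    # reports when the member is missing is discarded 
    fexists=`tar -tf "$file" "$1" 2>/dev/null` 
    if [ "$fexists" = "$1" ]; then 
      tar -xf "$file" "$1" 
      exit 0 
    fi 
done 
echo 'File not found.' 
exit 1 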

With the script saved as recover.sh, let’s test it. First, delete the file :

$ rm funky.fnc 
$ ls -l 
total 36864 
-rw-r--r-- 1 mikes mikes  5549452 2010-12-07 17:27 11 - Motörhead.oga 
-rw-r--r-- 1 mikes mikes 15041845 2010-12-11 15:39 archive1.tar.gz 
-rw-r--r-- 1 mikes mikes 15044776 2010-12-11 15:40 archive2.tar.gz 
-rw-r--r-- 1 mikes mikes      487 2010-12-11 15:50 archive3.tar.gz 
-rw-r--r-- 1 mikes mikes      207 2010-12-11 15:50 arch.snar 
-rw-r--r-- 1 mikes mikes  1986965 2010-12-07 17:27 gobsmack.jpg 
-rw-r--r-- 1 mikes mikes       21 2010-12-07 17:27 phpinfo.php 
-rw-r--r-- 1 mikes mikes      322 2010-12-11 16:42 recover.sh 
-rw-r--r-- 1 mikes mikes      321 2010-12-11 16:41 recover.sh~ 
-rw-r--r-- 1 mikes mikes    33460 2010-12-07 17:27 xe_ts_notes.odt 

Now run the script :

$ sh ./recover.sh ./funky.fnc
Checking archive archive3.tar.gz ... 
$ ls -l funky.fnc 
-rw-r--r-- 1 mikes mikes 176 2010-12-11 15:48 funky.fnc 
$ cat funky.fnc 
CREATE OR REPLACE FUNCTION funky RETURN VARCHAR2 IS 
-- 
-- random comment added to demonstrate tar treatment of a change 
-- in a file. 
BEGIN 
    RETURN 'Hey funky dude'; 
END; 
/ 
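
The script above pulls back a single file. To rebuild the whole directory as it stood at the last backup, the approach described in the GNU tar manual is to extract every archive in order, oldest first, passing /dev/null as the snapshot file. Here is a sketch of that, with hypothetical directory and archive names; it needs GNU tar.

```shell
#!/bin/sh
# Sketch only: needs GNU tar for -g (--listed-incremental).
set -e
src=$(mktemp -d)
store=$(mktemp -d)
restore=$(mktemp -d)   # empty directory to rebuild into

echo 'version 1' > "$src/funky.fnc"
tar -czf "$store/level0.tar.gz" -g "$store/backup.snar" -C "$src" .

sleep 1   # keep the change clearly after the level 0 run
echo 'version 2' > "$src/funky.fnc"
tar -czf "$store/level1.tar.gz" -g "$store/backup.snar" -C "$src" .

# Replay the chain oldest first. -g /dev/null puts tar into
# incremental mode for the extract without touching a snapshot file.
for archive in "$store/level0.tar.gz" "$store/level1.tar.gz"
do
    tar -xzf "$archive" -g /dev/null -C "$restore"
done

restored=$(cat "$restore/funky.fnc")
echo "$restored"
rm -rf "$src" "$store" "$restore"
```

Be aware that in this mode tar also removes files from the target that did not exist when each archive was made, so it is best pointed at an empty directory rather than at live data.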

And with that, I’ll say “tar tar” for now.
