When I first mentioned the title of this post to my girlfriend, she misheard and thought there was an extra “t” at the end.
One hasty explanation later I have avoided banishment to the shed. All of which is mildly ironic as the tar command comes with a whole alphabet of options, many of which are about to get used here.
As its name suggests, the venerable tar command ( Tape ARchive) has its roots back in the time when computers were the size of a small semi in Dagenham and punch cards and tapes were the acme of the Programmer’s art.
Now I’m going to use it for backing up data on my assorted Ubuntu machines.
What I want to do here is :
- work out how much data I need to back up
- create a full backup of all of my data
- make sure I know what files have been backed up
- test the restore of a file from the backup
- make subsequent incremental backups
In the course of this odyssey, we will discover that du has a human face and that tar has a bit of a yellow streak.
There are several things that can go horribly wrong when playing around with tar, so I’m going to test everything on a small subset of files…that I have safely stored elsewhere.
Speaking of which…
How much data – Duh!
The du command tells you how much space is used in files in the current directory and any sub-directories.
We need to use the “-s” switch to get a single summary total of all the space being used as opposed to that being used by each file in the directory tree. For example, if I’m in my home directory and issue the command :
$ du -s
16667664 .
OK, that’s the amount of data in kilobytes rather than bytes ( du counts in 1K blocks by default on Ubuntu). Fortunately, there’s an option to help us mere humans ( or at least, those of us who don’t know our 1024 times table) :
$ du -sh
16G .
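If you want to see where the space is actually going, rather than just the grand total, something along these lines may help. It is only a sketch: the directory and file names are made up for the example, and it assumes the GNU versions of du and sort that ship with Ubuntu.

```shell
# Build a small throwaway tree so the example is safe to run anywhere
# (the directory and file names are invented for illustration)
demo=$(mktemp -d)
mkdir -p "$demo/music" "$demo/docs"
head -c 300000 /dev/zero > "$demo/music/track.oga"
head -c 20000 /dev/zero > "$demo/docs/notes.odt"

# Single summary total for the whole tree, in human-readable units
du -sh "$demo"

# Per-directory totals, one level deep, sorted smallest to largest
du -h --max-depth=1 "$demo" | sort -h

# The same summary in 1K blocks, which is easier to use in a script
total_kb=$(du -sk "$demo" | cut -f1)
echo "Total: ${total_kb} KB"
```

The -k switch forces 1K blocks regardless of any environment settings, so the number is safe to do arithmetic on.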
Making a full backup
Now for the test directory. Let’s just have a quick look at what’s there :
$ ls -l
total 7420
-rw-r--r-- 1 mikes mikes 5549452 2010-12-07 17:27 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 1986965 2010-12-07 17:27 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 21 2010-12-07 17:27 phpinfo.php
-rw-r--r-- 1 mikes mikes 33460 2010-12-07 17:27 xe_ts_notes.odt
So we’ve got one vorbis file, one jpeg ( both of which are already heavily compressed formats), a simple php text file and an Open Office Writer document.
$ du -sh
7.3M .
To make a full backup of this directory :
$ tar -cvpzf test_backup.tar.gz --exclude=test_backup.tar.gz .
./
./phpinfo.php
./11 - Motörhead.oga
./gobsmack.jpg
./xe_ts_notes.odt
tar: .: file changed as we read it
In case you’re wondering, cvpzf is not simply a really bad hand in Scrabble.
These options are :
c – create an archive
v – verbose – i.e. echo the filenames to the screen as they’re added to the archive
p – preserve permissions – tar records each file’s permissions in the archive anyway, so this switch really earns its keep when extracting, where it stops your umask being applied to the restored files. It does no harm here
z – compress the archive using gzip
f – the name of the archive file to create (in this case test_backup.tar.gz)
I also used the --exclude option to stop tar trying to back up its own archive.
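An alternative to --exclude is to write the archive somewhere outside the directory being backed up. That should also avoid the “file changed as we read it” warning, which appears because the directory changes as tar writes the archive into it. A rough sketch, using temporary directories as stand-ins for the real source and backup locations ( both are assumptions for the example) :

```shell
src=$(mktemp -d)    # stands in for the directory being backed up
dest=$(mktemp -d)   # hypothetical backup location outside that directory
echo '<?php phpinfo(); ?>' > "$src/phpinfo.php"
echo 'some notes' > "$src/notes.txt"

# -C makes tar change into $src first, so members are still stored as ./name
tar -cpzf "$dest/test_backup.tar.gz" -C "$src" .

# The listing shows only the source files; the archive cannot include itself
tar -tzf "$dest/test_backup.tar.gz"
```

Because the member names are still relative ( ./phpinfo.php and so on), everything later in this post about listing and restoring works the same way.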
Incidentally, the first time I ran this command, I forgot to specify the starting directory ( . or current directory in this case) and tar responded by doing the equivalent of hiding behind the sofa :
tar: Cowardly refusing to create an empty archive
Anyway, if we check the contents of the directory now…
$ ls -l
total 14776
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 21 2010-11-14 18:58 phpinfo.php
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt
Not much of a compression ratio, but my hopes weren’t high.
As an aside, it is possible to use tar with bzip2 ( using the j switch instead of z). Some quick testing revealed it wouldn’t make much of a difference in this particular instance.
Finding files in the Archive
If I want to restore a file from an archive, it would be good to know whether it’s in the archive in the first place. To list the full contents of a tar archive we can do :
$ tar -tf test_backup.tar.gz
./
./phpinfo.php
./11 - Motörhead.oga
./gobsmack.jpg
./xe_ts_notes.odt
The -t option here simply lists the contents of the archive.
If I wanted to find a specific file…
$ tar -tf test_backup.tar.gz ./gobsmack.jpg
./gobsmack.jpg
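Naming the file like this only works if you know its exact path inside the archive. If you only remember part of the name, you can pipe the listing through grep, or let GNU tar do the matching with --wildcards. A small sketch, with temporary directories and empty files standing in for the real thing :

```shell
src=$(mktemp -d)     # throwaway source directory for the example
store=$(mktemp -d)   # throwaway location for the archive
touch "$src/gobsmack.jpg" "$src/phpinfo.php" "$src/xe_ts_notes.odt"
tar -czf "$store/test_backup.tar.gz" -C "$src" .

# Case-insensitive search of the listing for any member containing "notes"
matches=$(tar -tzf "$store/test_backup.tar.gz" | grep -i 'notes')
echo "$matches"

# Alternatively, let tar match a shell-style pattern itself (GNU tar)
jpgs=$(tar -tzf "$store/test_backup.tar.gz" --wildcards '*.jpg')
echo "$jpgs"
```

Either way, what comes back is the member name exactly as stored, which is what you need to feed to tar when restoring.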
Restoring a file from an Archive
Saving everything is all well and good, but what if we want to get something back ?
Let’s start by deleting one of the files in the directory ( NOTE – if you’re using your one and only copy of a file for this test, I’d suggest you rename it instead) :
$ ls -l xe_ts_notes.odt
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt
$ rm -i xe_ts_notes.odt
rm: remove regular file `xe_ts_notes.odt'? y
$ ls -l
total 14740
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 21 2010-11-14 18:58 phpinfo.php
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz
Now let’s see if we can get it back from the archive. Remember, we need to use the full path of the file relative to the starting directory used when the archive was created :
$ tar -xf test_backup.tar.gz ./xe_ts_notes.odt
$ ls -l
total 14776
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 21 2010-11-14 18:58 phpinfo.php
-rw-r--r-- 1 mikes mikes 7520167 2010-11-27 16:49 test_backup.tar.gz
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt
The x switch is “extract”. As we can see, xe_ts_notes.odt has been restored to its former glory.
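Extracting in place will overwrite a file of the same name if one is still there. If you’d rather have a look at the backed-up copy first, you can extract it into a scratch directory with -C. A minimal sketch, again using temporary directories that are purely assumptions for the example :

```shell
src=$(mktemp -d)       # throwaway source directory
store=$(mktemp -d)     # throwaway location for the archive
scratch=$(mktemp -d)   # where the restored copy will land
echo 'original contents' > "$src/xe_ts_notes.odt"
echo 'other file' > "$src/phpinfo.php"
tar -cpzf "$store/test_backup.tar.gz" -C "$src" .

# Extract just one member into the scratch directory, not in place
tar -xpzf "$store/test_backup.tar.gz" -C "$scratch" ./xe_ts_notes.odt

# Compare with the live copy before deciding whether to copy it back
if cmp -s "$src/xe_ts_notes.odt" "$scratch/xe_ts_notes.odt"; then
    echo "Backup copy matches the current file"
fi
restored=$(cat "$scratch/xe_ts_notes.odt")
```

Only the named member is extracted, so the scratch directory ends up holding that one file and nothing else.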
Incremental Backups
Messing around with a few files is one thing. Backing up the entire machine is something else.
All in all, it may not be something you want to do every time you do a backup.
Fortunately, tar lets you do incremental backups.
The first thing to do is a full “level 0” backup. You can then make incremental backups, archiving only files that have changed since the last backup – a “level 1” backup.
First off, the level 0 backup :
$ tar -cvzf archive1.tar.gz -g arch1.snar .
./
./11 - Motörhead.oga
./arch1.snar
./gobsmack.jpg
./lotto.sql
./phpinfo.php
./xe_ts_notes.odt
$ ls -l
total 14788
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 155 2010-11-30 20:28 arch1.snar
-rw-r--r-- 1 mikes mikes 7521299 2010-11-30 20:28 archive1.tar.gz
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 2115 2010-11-04 13:23 lotto.sql
-rw-r--r-- 1 mikes mikes 21 2010-11-14 18:58 phpinfo.php
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt
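Note here that we’ve used the g switch ( short for --listed-incremental) to tell tar to keep a record of what it’s backed up in a snapshot file, arch1.snar.
If we now copy an additional file over ( funky.fnc) and run a second backup against the same snapshot file :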
$ tar -cvzf archive2.tar.gz -g arch1.snar --exclude=archive1.tar.gz .
./
tar: .: file changed as we read it
./arch.snar
./funky.fnc
$ ls -l
total 14796
-rw-r--r-- 1 mikes mikes 5549452 2010-09-26 12:46 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 7521299 2010-11-30 20:47 archive1.tar.gz
-rw-r--r-- 1 mikes mikes 430 2010-11-30 20:49 archive2.tar.gz
-rw-r--r-- 1 mikes mikes 182 2010-11-30 20:49 arch1.snar
-rw-r--r-- 1 mikes mikes 94 2010-11-30 20:48 funky.fnc
-rw-r--r-- 1 mikes mikes 1986965 2008-03-07 20:21 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 2115 2010-11-04 13:23 lotto.sql
-rw-r--r-- 1 mikes mikes 21 2010-11-14 18:58 phpinfo.php
-rw-r--r-- 1 mikes mikes 33460 2010-04-04 15:57 xe_ts_notes.odt
So the level 1 backup now contains the new file ( funky.fnc) as well as our archive record.
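This is because we’ve used the g switch again and told tar to look at arch1.snar and only back up files that are new or have changed since that record was written. Note that tar also updates arch1.snar on each run, so the next incremental backup is relative to this one.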
On the downside, I will now need to check all of the archives ( in this case, two) to work out which one holds the file I want to restore. However, this may well be offset by the faster execution of the incremental backup.
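One way to tidy this up is to keep both the archives and the snapshot file outside the directory being backed up, so the snapshot file doesn’t end up inside its own backups. The sketch below does that and also works on a copy of the level 0 snapshot, so that every level 1 backup is taken relative to the full backup. The directory and file names are invented for the example, and it assumes GNU tar.

```shell
src=$(mktemp -d)     # stands in for the directory being backed up
store=$(mktemp -d)   # hypothetical home for archives and snapshot files
echo 'first' > "$src/gobsmack.txt"

# Level 0: no snapshot file exists yet, so everything is archived
tar -czf "$store/archive0.tar.gz" -g "$store/level0.snar" -C "$src" .

# Work on a copy so the level 0 snapshot itself stays untouched
cp "$store/level0.snar" "$store/level1.snar"

# Pause so the new file's timestamp is clearly later than the snapshot
sleep 1
echo 'new file' > "$src/funky.fnc"

# Level 1: only files that are new or changed since level 0
tar -czf "$store/archive1.tar.gz" -g "$store/level1.snar" -C "$src" .

level1_listing=$(tar -tzf "$store/archive1.tar.gz")
echo "$level1_listing"
```

The level 1 listing should show the directory itself and funky.fnc, but not the unchanged gobsmack.txt.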
Backing up and retrieving changed files
Now, suppose I change funky.fnc ( which is, incidentally, a PL/SQL function), by adding some comments so it looks like this :
CREATE OR REPLACE FUNCTION funky RETURN VARCHAR2 IS
--
-- random comment added to demonstrate tar treatment of a change
-- in a file.
BEGIN
RETURN 'Hey funky dude';
END;
/
Look, I was working on an Extreme Programming project when I wrote this, OK ? When using XP, it’s mandatory to drink Pepsi Max and call everyone dude.
Anyway, now to create a second incremental backup…
$ tar -cvzf archive3.tar.gz -g arch1.snar --exclude=*.tar.gz .
./
tar: .: file changed as we read it
./arch1.snar
./funky.fnc
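If I want to retrieve the latest version of a file, I need to search the archives from newest to oldest and extract it from the first one that contains it. A quick script will do the job…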
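#!/bin/sh
#
# test script to recover a file ( e.g. ./funky.fnc) from the most recent
# incremental backup that contains it. $1 is the path as stored in the archive.
#
# ls -t lists newest first; fine here as the archive names contain no spaces
for file in `ls -t *.tar.gz`
do
    echo "Checking archive $file ..."
    # tar complains on stderr if the member is absent, so keep that quiet
    fexists=`tar -tf "$file" "$1" 2>/dev/null`
    # quote both sides so an empty result doesn't break the test
    if [ "$fexists" = "$1" ]; then
        tar -xf "$file" "$1"
        exit 0
    fi
done
echo 'File not found.'
exit 1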
With the script saved as recover.sh, let’s test it. First, delete the file :
$ rm funky.fnc
$ ls -l
total 36864
-rw-r--r-- 1 mikes mikes 5549452 2010-12-07 17:27 11 - Motörhead.oga
-rw-r--r-- 1 mikes mikes 15041845 2010-12-11 15:39 archive1.tar.gz
-rw-r--r-- 1 mikes mikes 15044776 2010-12-11 15:40 archive2.tar.gz
-rw-r--r-- 1 mikes mikes 487 2010-12-11 15:50 archive3.tar.gz
-rw-r--r-- 1 mikes mikes 207 2010-12-11 15:50 arch.snar
-rw-r--r-- 1 mikes mikes 1986965 2010-12-07 17:27 gobsmack.jpg
-rw-r--r-- 1 mikes mikes 21 2010-12-07 17:27 phpinfo.php
-rw-r--r-- 1 mikes mikes 322 2010-12-11 16:42 recover.sh
-rw-r--r-- 1 mikes mikes 321 2010-12-11 16:41 recover.sh~
-rw-r--r-- 1 mikes mikes 33460 2010-12-07 17:27 xe_ts_notes.odt
Now run the script :
$ sh ./recover.sh ./funky.fnc
Checking archive archive3.tar.gz ...
$ ls -l funky.fnc
-rw-r--r-- 1 mikes mikes 176 2010-12-11 15:48 funky.fnc
$ cat funky.fnc
CREATE OR REPLACE FUNCTION funky RETURN VARCHAR2 IS
--
-- random comment added to demonstrate tar treatment of a change
-- in a file.
BEGIN
RETURN 'Hey funky dude';
END;
/
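The script is fine for fishing out a single file. If you ever need the whole directory back, the approach described in the GNU tar manual is to extract each archive in turn, oldest first, in incremental mode with /dev/null as the snapshot file. Here’s a sketch of that, using made-up temporary directories and assuming GNU tar :

```shell
src=$(mktemp -d)       # throwaway source directory
store=$(mktemp -d)     # throwaway location for archives and snapshot
restore=$(mktemp -d)   # empty directory to restore into
echo 'v1' > "$src/funky.fnc"
echo 'keep' > "$src/lotto.sql"

# Level 0 backup, then change one file and take a level 1 backup
tar -czf "$store/archive0.tar.gz" -g "$store/arch.snar" -C "$src" .
sleep 1   # make sure the change is clearly newer than the snapshot
echo 'v2 with comments' > "$src/funky.fnc"
tar -czf "$store/archive1.tar.gz" -g "$store/arch.snar" -C "$src" .

# Replay the archives oldest first; -g /dev/null switches on incremental
# mode for the extraction without reading or writing a real snapshot file
for archive in "$store/archive0.tar.gz" "$store/archive1.tar.gz"
do
    tar -xzf "$archive" -g /dev/null -C "$restore"
done

cat "$restore/funky.fnc"
```

Be aware that in this mode tar will also remove files from the target directory that weren’t there when the archive was made, so it is best pointed at an empty directory as above.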
And with that, I’ll say “tar tar” for now.