
Poor Man's Laptop

By Gerrit Renker

Introduction

But you know nowadays
It's the old man
He's got all the money
And a young man ain't got nothin' in the world these days

- The Who `Young Man Blues'

Laptops are fine, if you can afford them. Even if you cannot afford one, the situation often arises that computer work has to be done at home - whether you are studying, preparing business charts, programming, or doing Web design.

This article presents a reliable and truly cheap alternative to buying a laptop for keeping up with work/study demands.

Scripting, not purchasing

My situation forced me to prepare work at home, but a laptop was far too expensive. Perhaps due to the popularity of laptops, it is actually much cheaper to buy a second-hand PC. A portable hard drive would have been nice, but it requires synchronising directory contents between three locations - the portable drive, the computers at home, and those at work.

The solution presented here is possibly the cheapest one available: it is based on an inexpensive USB stick, and the required `intelligence' to reliably synchronise directories is provided by a set of scripts. Unlike hardware-based solutions, this works across different distributions and even operating systems. The principle is simple, but there are a number of subtle problems that have been ironed out during three years of daily use (with my second-hand PC).

Overview

The USB directory synchroniser consists of the following:

  1. A central Perl script to track file modification times.
  2. A mode of mounting the stick which ensures file integrity.
  3. Add-ons to the script, such as timestamping, additional archives, and command execution.
  4. Subsidiary scripts for adding/reading important notes and remote command execution.

All of this is too much for one article; hence, the basic functionality (the first two items) is covered by this article, and the remaining items follow in a subsequent part.

This part starts with a reduced, but fully functional, `bare-bones' script, to explain the core functionality, how to set it up, and how to hack it. When the full script follows in part II, its functionality will then be obvious - it is not more complex, it just has more bells and whistles.

If you are a Perl hacker, you should feel at liberty to skim through the following sections and move on to the more complex version.

Reliable archives versus removable media

The daily archive has a fixed name; mine is called `actual.2' and is located in $flashdir:

$flashdir = $ENV{'VOL'} || "/vol/flash";
$tarfile  = "$flashdir/actual.2";

The first line sets the mount point, which can be overridden using the VOL environment variable. (For example, you could say export VOL=/vol/floppy and store the archive on a floppy disk.)

Mounting happens as with any other filesystem (e.g., hard drive partitions), but with one important difference: the system may be able to tell when a USB stick is inserted, but it cannot reliably detect removal or flush unwritten filesystem buffers when the stick is pulled out. As convenient as the `storage media' icons on Gnome/KDE desktops are, I have found them unsatisfactory for the purpose of archiving: more than once I have ended up with corrupt, half-written archives this way. Therefore, a different alternative is presented here, with additional provisions to make sure that the archive does indeed end up on the USB stick.

Mounting the USB stick

The safest, if tedious, bet is to always mount(8)/umount(8) the stick manually:

mount -t vfat  /dev/sda1 /vol/flash

USB storage uses the SCSI subsystem; therefore, in most cases the first USB stick plugged into your computer will appear as /dev/sda1; check this via dmesg and/or /var/log/messages.

A better alternative to manual mounting is the automounter, which mounts a directory the first time one tries to access (read) it, and unmounts it automatically after a fixed timeout.

Automatic mounting

Most, if not all, systems come with the automounter by default (man autofs(8)); it is started at boot time via /etc/init.d/autofs. The automounter is configured via so-called `map' files, which define the mapping from hardware devices (such as /dev/sda1) to mountpoints.

The first file to consider is /etc/auto.master, which here contains but one line:

/vol /etc/auto.vol --timeout=20

This instructs autofs to consult the file auto.vol in all matters of volatile media. The file /etc/auto.vol then contains the actual map; the relevant entry is the following:

flash  -fstype=auto,dirsync,user,umask=000,exec,noatime,shortname=winnt :/dev/sda1

The above line parses into three distinct sections: the mountpoint under /vol, the mount options, and the device to be mounted. (To create all the mountpoints, use mkdir -vp /vol/{flash,floppy,cdrom} under bash; the floppy/cdrom entries are configured in the same map file in the same manner.) The fstype is automatically detected, but I can only recommend sticking with vfat: using ext2/3 will trigger unpleasant fsck-ing at boot time. Important for flash-based media such as USB sticks are dirsync and noatime: noatime avoids a write for every read access (flash cells support only a limited number of write cycles), and dirsync makes directory updates synchronous, so less is lost if the stick is removed prematurely. For the remaining options, see mount(8).

After creating and editing these two files (with the correct settings for your stick), you should be able to do a `/etc/init.d/autofs restart' and see the contents of your flash directory via `ls -l /vol/flash'. If so, you are ready to experiment with the script and its configuration file (if it is copied into /etc/pack.list, make sure that you have write access). The automounter is best enabled at boot time (Fedora/RH: chkconfig autofs on; Debian: update-rc.d autofs defaults).

The script works very well with automounting: before doing anything, it will first try to access $flashdir. If it cannot, it will retry several times and then give up with a warning message.
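
This check can be pictured as a small retry loop along the following lines (a sketch only; the retry count and wording are illustrative):

my $tries = 5;
until (-d $flashdir) {        # accessing the path triggers the automounter
    if (--$tries <= 0) {
        warn "Cannot access $flashdir - is the stick plugged in?\n";
        exit 1;
    }
    sleep 2;                  # wait, then try again
}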

The remainder of the article describes how the script works. (You can also see debug output by setting the shell variable debug to a value greater than 4.)

Functioning of the core script

The main, bare-bones script, which we will now take apart, uses Perl built-ins for things that have to run fast (such as traversing directories and creating lists), and calls other programs (such as tar) for everything else.

The following requirements made a script inevitable; they also describe its functional principle:

  1. A configuration file is used to keep track of the important files and directories.
  2. To take home (on the USB stick) all tracked files that changed during the last x days, the script is called with the --days x option (the option parsing is sketched below).
  3. At home, the archive is unpacked using the --unpack option. This works in both directions (from home to work, and from work to home).
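
As a sketch of how these options might be parsed (an assumption based on the standard Getopt::Long module; the real script may differ):

use Getopt::Long;

my ($days, $unpack, $see) = (1, 0, 0);   # defaults are illustrative
GetOptions(
    'days=i' => \$days,      # --days x: archive files changed in the last x days
    'unpack' => \$unpack,    # --unpack: restore the archive on this host
    'list'   => \$see,       # --list:   only show what would be archived
) or die "usage: pack [--days x] [--unpack] [--list]\n";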

User interaction

An important program called by this script is (g)dialog, which provides the error and message boxes:

$dialog = ($ENV{'DISPLAY'} && qx#which gdialog 2>/dev/null#) ? "gdialog" : "dialog";

On the console, dialog is called; under X, gdialog is used for the same purposes. The qx#...# test makes the script fall back to dialog in both cases if gdialog is not available. On a Debian system, you can install both via apt-get install dialog zenity (gdialog is in the zenity package); similarly for other distros.
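
The script's error() helper, which appears in later excerpts, can be imagined as a small wrapper around $dialog. The following is a sketch of such a helper, not the author's actual code:

sub error {
    my $msg = "@_";
    # --msgbox is understood by both dialog and gdialog
    system qq($dialog --title Error --msgbox "$msg" 10 60);
    exit 1;
}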

For the rest of the user interaction, the pager $less is used, and we have the logging function:

sub log {
   # print the message to stderr, highlighted via ANSI colour codes
   printf STDERR "\033[1;34;7m@_\033[m\n";
}

The funny digits are ANSI escape sequences that colour the output; good examples of using these can be found elsewhere in LG. Since Perl already has a log(-arithm) function, we need to make clear which log we want; hence, the above function is invoked as ::log() (which is an abbreviation for main::log()).

Core blocks of the script

The bare-bones script can already do the essential work. In increasing order of complexity, we will consider (1) listing, (2) un-packing, (3) packing, and (4) building the file list.

1) Listing files that have changed

This is easy: $see is true when the --list option is set:

if ( $see ) {
   build_list();
   system "$less $list";
}

The build_list() function hides all the complexity of parsing, collating, and checking; it is discussed later. The third line calls our pager `$less' (chosen via the PAGER environment variable) on the newly created $list.
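
The pager variable itself is presumably initialised along these lines (an assumption, mirroring the $flashdir fallback shown earlier):

$less = $ENV{'PAGER'} || "less";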

2) Un-packing an archive

Un-packing is done by invoking the script with the --unpack option. On my system, I have found it useful to use the following alias (the pack script is in ~/bin and ~/.bashrc contains: export PATH=~/bin:$PATH):

alias unpack="pack --unpack"

Un-packing is described by the following pseudo-code:
  1. check if archive was created on this host
  2. call unpack(), a wrapper around `tar -jxpPvf $tarfile'
  3. display errors encountered in (2), if any

Step (1) is important: if you accidentally left an archive on the stick and then, unknowingly, unpack it some days later, it will silently overwrite your files with the older versions stored in the archive. This has happened to me several times, but the remedy is both simple and very effective.

Tar has a rich set of features, stemming from the old days when it was used as a Tape ARchiver: it supports storing archives on several different (tape) volumes, and with the --label=myLabel option these volume archives can be given individual names. You can view the volume name by using -t to list an archive; the volume label appears in the first line. So, in the present case, the volume name is simply set to the fully-qualified hostname(1) of the system. (This assumes that different PCs have different hostnames.)
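
Step (1) of the pseudo-code therefore boils down to comparing the archive's label with the local hostname. A minimal sketch (assuming GNU tar, which prints the volume label as the first line of the listing, and the error() dialog helper shown earlier):

chomp(my $hostname = qx(hostname --fqdn));
chomp(my $label    = qx(tar -jtf $tarfile | head -1));
error "The archive on the stick was created on this host ($hostname) -\n" .
      "unpacking it would overwrite newer files with older ones."
    if $label eq $hostname;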

3) Archiving what has changed in the past days

The inverse operation to unpacking is simple; all the complexity is in build_list():

  ::log "Creating backup archive file ...";
  system "tar -jcpPvf $tarfile --label $hostname --files-from=$list 2>$log";
  if ($? != 0) {
       unlink $tarfile;
       error "Tar Problem!\nDeleting archive $tarfile";
  }
  ::log "Syncing ...";
  system "sync; sync;";

  ::log "Testing file integrity:";
  system "bzip2 -tvv $tarfile";

The $tarfile is created with the files from $list, and the volume label is set to $hostname. In case of error, the archive is deleted (unlink-ed) and an error window pops up. Otherwise, sync(1) is called twice to flush the filesystem buffers. The subsequent file-integrity test provides additional protection against removing the USB stick before all of the data has been safely transferred. (This is a common problem with removable media, and it is good to be cautious here.)

4a) Building the file list: parsing the configuration file

The build_list() routine adds intelligent functionality around the tar program: it processes the contents of a configuration file in such a manner that only files changed in the last few days are passed on to tar, without adding unwanted subdirectories, but with full expansion of symlinks.

The complexity that this requires is hidden behind recursion (a function calling itself), which is a good working principle, since directories are themselves recursive: a sub-sub-directory is a directory in a directory is a directory in a directory ... :-)
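
As an illustration of the principle only (this is not the script's actual getLinkedFiles(), which additionally handles symlinks and cycle detection), a recursive traversal can be as short as:

sub traverse {
    my $dir = shift;
    opendir(my $dh, $dir) or return;
    foreach my $entry (grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
        my $path = "$dir/$entry";
        if (-d $path) {
            traverse($path);        # a directory: descend into it
        } else {
            print "$path\n";        # a file: print it to the list
        }
    }
    closedir $dh;
}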

Let's look at the main loop, which parses the file $config.

while (<>) {                          # iterate over the lines of $config
    s/#.*//;                          # strip comments
    next if /^\s*$/;                  # skip empty (or now-empty) lines

    my @arr = split;                  # put all single words into @arr

    if ($arr[0] =~ m/<\s*rec\s*>/i)       {  # line starts with <REC>
       shift @arr;
       getLinkedFiles(@arr);
    } elsif ($arr[0] =~ m/<\s*link\s*>/i) {  # line starts with <LINK>
       shift @arr;
       readLink(@arr);
    } else {                          # this is a `normal' line
       foreach (@arr) {
         if (m#[{*]#) {               # e.g. /home/gerrit/{.bash*,.exrc,bin/*}
             let_bash_expand_entry();
         } elsif ( -d ) {             # a single directory: traverse
             getLinkedFiles($_);
         } else {                     # a file or a link: just print to list
             print "$_\n";
         }
       }
    }
}

The configuration file contains file/directory names, one entry per line; bash shell-globbing syntax is allowed. (In this case, bash is actually used to expand the entries.) Configuration lines starting with <LINK> mean "I want this symlink to be followed, but I don't care about deep directory traversal". You get the full works of directory traversal plus symlink expansion with deep recursion by using the <REC> tag.
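
A small example configuration file (the paths are hypothetical) illustrates the principle:

# single files and bash globs (expanded by bash itself)
/home/gerrit/{.bash*,.exrc,bin/*}
# a plain directory: traversed completely
/home/gerrit/projects
# follow this symlink, but without deep directory traversal
<LINK> /home/gerrit/current
# the full works: deep traversal plus symlink expansion
<REC> /home/gerrit/src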

The while() loop iterates over each line of the $config file. The split statement separates the single words into the array @arr. Hence, when a line starts with a <LINK> or <REC> tag, we need to remove that tag from @arr before passing the rest as arguments to one of the functions; this is handled by the shift statement. All output from the invoked subroutines is redirected into the TMPLIST temporary file, which then contains the expanded list of files, resolved symlinks, and traversed directories.
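
One way this redirection might be set up (a sketch only; $tmplist is a hypothetical variable name, and the real script may do this differently):

open(TMPLIST, '>', $tmplist) or die "cannot create $tmplist: $!";
select(TMPLIST);   # make TMPLIST the default filehandle for print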

We now briefly look at getLinkedFiles(), which maintains a hash %KNOWN to avoid cycles and proceeds according to the type of each entry: directories are traversed recursively, symlinks are resolved via readlink_recurse(), and plain files are simply printed to the list.

The readlink_recurse() routine in turn calls getLinkedFiles() to resolve new file entries; it contains logic to avoid getting lost in symlink loops. This can be a bad trap otherwise (try, e.g., this: ln -s a b; ln -s b a; namei a).
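
Such a cycle check might look as follows (a sketch under assumptions: seen_before() is a hypothetical helper, and Cwd::abs_path returns undef when resolution fails, e.g., on a symlink loop):

use Cwd qw(abs_path);
our %KNOWN;

# true if the entry should be skipped: either it cannot be resolved
# (broken symlink, symlink loop) or its target has already been visited
sub seen_before {
    my $real = abs_path(shift);    # canonicalise the pathname
    return 1 unless defined $real;
    return $KNOWN{$real}++;
}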

4b) Sorting out the temporary file lists: build_list()

To pick those files that have changed during the last $days days, build_list() uses a simple trick: it rewinds the start time $^T (a Perl special variable) of the script by this amount in seconds. This means that, when file modification times are tested, Perl already thinks it is executing $days back in history:

$^T -= $days * 24 * 3600;
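
A small worked example of this arithmetic (with hypothetical values):

# with --days 2, $^T moves back by 2 * 24 * 3600 = 172800 seconds;
# a file modified one day ago then has  -M == -1.0  (negative: keep it),
# one modified three days ago has       -M == +1.0  (positive: skip it)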

The following is the final processing of the file list, refining TMPLIST into LIST:

while (<TMPLIST>) {
    chomp;
    1 while s#/[^/]+/\.\./#/#;        # normalise `..' pathname components
    error "FATAL:\n \"$_\"\n--$!\n" unless -e $_;

    next if ( -M ) >= 0;              # modified more than $days ago: skip
    print LIST "$_\n" unless $KNOWN{$_}++;
}

After chomp()ing the `\n', pathnames containing `..' are normalised: for example, /usr/local/lib/wx/../../bin/wx2 is reduced to /usr/local/bin/wx2. If a file does not exist (e.g., due to a broken symlink), an error message is produced.

The file age test `-M' returns the age of the file ($_) in days, measured relative to the script start time $^T; a negative age means that the file was modified `in the future'. Relative to the rewound start time, this means: all files modified during the last $days days up until now appear with a negative modification age, and are thus added to the list. Last, the filename is printed into LIST unless it has been encountered before (indicated by a non-zero entry in the hash %KNOWN).

That's it: the list is built, the file is closed, and the list is passed on to tar to create the archive.

Conclusion

This part of the article has described how to use a USB stick for daily directory synchronisation between two non-networked computers. The principle has been demonstrated with a scaled-down (but fully functional) script. The next part will introduce additional functionality that simplifies usage and makes the script robust in many day-to-day situations.

In summary, using USB sticks for synchronising directories between home and workplace is an efficient, workable, and very cost-effective solution. Some companies now even give away USB sticks for free, thereby contributing to a significant reduction in Total Cost of Ownership (TCO) of the solution presented in this article.


Copyright © 2006, Gerrit Renker. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 132 of Linux Gazette, November 2006
