Lab 9 Preparation

Have a look at the man pages for tar(1) and gzip(1) to prepare for Exercise 9 on using these tools.

In ages past, it was common to use a tape drive to create backups of disks, simply because storage on mountable tapes cost less; disks were relatively small and expensive. Many tools were developed for managing backups, and one of the earliest was the tape archive, tar. Because tar is still used with tape drives today, the command retains all its tape drive management facilities for /dev/tape. tar takes all its file and directory arguments and combines them into a single large file, in which each file is preceded by a header recording its name, size, and other attributes. The block size is usually large for efficiency in writing to a tape. On our Linux systems, the default size is 10K (10,240 bytes).
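To see the block padding in practice, the following sketch archives a six-byte file and prints the size of the result. The names tiny.txt and tiny.tar and the scratch directory are illustrative choices, not part of the lab. With GNU tar and its default settings the archive is expected to come out at 10240 bytes; other tar implementations may pad differently.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A tiny file: five characters plus a newline.
echo "hello" > tiny.txt

# Archive it, then measure the archive (tr strips padding some wc versions add).
tar cf tiny.tar tiny.txt
size=$(wc -c < tiny.tar | tr -d ' ')

echo "tiny.txt is $(wc -c < tiny.txt | tr -d ' ') bytes; tiny.tar is $size bytes"
```

The archive is far larger than its contents because tar writes whole blocks and fills the unused remainder with zero bytes.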

A tar file (or "tarball") is often used for distributing information between systems. To keep the size small (and to overcome the large default block size), the tar file is compressed after creation, usually with gzip (although other compression utilities such as bzip2 can also be used). gzip (and gunzip for decompression) uses common lossless algorithms to pack data into a smaller space. There are also lossy compression schemes that usually shrink data much further, but they are reserved for graphics, video, and voice, where the loss of a few bits does not normally make the data useless. As a rough rule of thumb, do not expect a lossless scheme to cut the size of typical data by much more than one-half, except for highly repetitive content such as the unused part of a 10K tape block.
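The effect of compression on the padded part of a tar block can be checked with a short sketch like the one below. The file names are illustrative, and gzip -c is used only so that the compressed size can be measured without creating another file.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small archive that is mostly zero padding.
echo "hello" > tiny.txt
tar cf tiny.tar tiny.txt

# Size of the archive as written by tar.
raw=$(wc -c < tiny.tar | tr -d ' ')

# Size after gzip; -c sends the compressed data to stdout instead of a file.
packed=$(gzip -c tiny.tar | wc -c | tr -d ' ')

echo "uncompressed: $raw bytes; compressed: $packed bytes"
```

Because the padding is a long run of identical bytes, the compressed size should be a small fraction of the original here. Real data will rarely compress this well.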

The normal usage, then, is to create a collection of files and compress it, then to send it to its destination. There it is uncompressed and untarred to restore all the files to their original form and directory structure (the -> arrow represents some form of sending the output to the second program):

a bunch of files sent to tar cf... -> gzip ... produces a single compressed file

a compressed tar file sent to gunzip ... -> tar xf ... produces the original files and directory structure

You will always use the f option of tar (unless you are using a real tape), followed immediately by the filename you wish to use. You should combine that with the key character that describes what you want to do: cf filename to create, xf filename to extract, or tf filename to list the contents (t stands for table of contents).
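The three key characters can be tried together on a throwaway directory tree. This is a minimal sketch; the names project, project.tar, and restore are made up for the illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# A small directory tree to archive.
mkdir -p project/docs
echo "read me" > project/README
echo "notes" > project/docs/notes.txt

# c: create project.tar from the project directory.
tar cf project.tar project

# t: list the names stored in the archive without extracting anything.
tar tf project.tar

# x: extract into a separate directory so the original is not overwritten.
mkdir restore
cd restore
tar xf ../project.tar

# Compare the restored tree with the original.
diff -r ../project project && echo "restored copy matches the original"
```

Listing with tf before extracting is a useful habit, since it shows where the files will land.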

File suffixes you will see are .tar for normal tar files, .tar.gz for compressed tar files, and .tgz as a handy short form for .tar.gz. You will normally add them yourself.

There are shortcuts available, but they are not supported in all implementations of the utilities. For example, GNU versions of tar support the z option to combine its operation with gzip or gunzip. You can also pipe the data between the two programs to avoid creating an extra file (the dash "-" given as the filename in tar represents stdout when creating an archive and stdin when extracting or listing, while the same character in gzip indicates stdin):

tar cf tar-file.tar <file-list>; gzip tar-file.tar

tar czf tar-file.tgz <file-list>

tar cf - <file-list> | gzip - > tar-file.tgz
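The piped form also works in the other direction. The sketch below creates a compressed archive through a pipe, then lists and extracts it the same way. The directory and file names are illustrative assumptions.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Some files to archive.
mkdir src
echo "alpha" > src/a.txt
echo "beta" > src/b.txt

# Create: tar writes the archive to stdout, gzip compresses what it reads on stdin.
tar cf - src | gzip - > src.tgz

# List: gunzip -c decompresses to stdout, tar reads the archive from stdin.
gunzip -c src.tgz | tar tf -

# Extract into a separate directory using the same pipe.
mkdir restore
gunzip -c src.tgz | (cd restore && tar xf -)

# Compare the restored files with the originals.
diff -r src restore/src && echo "round trip ok"
```

No intermediate .tar file is ever written to disk, which is the point of using the pipe.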

Note that gzip and gunzip delete the original file once the compression or decompression succeeds, leaving only the new file in its place.
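The replacement behaviour is easy to observe, and the -c option (write to stdout) is one way to keep the input file. A minimal sketch, with illustrative file names:

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "some data" > data.txt
tar cf data.tar data.txt

# Default behaviour: data.tar is replaced by data.tar.gz.
gzip data.tar
ls

# With -c the output goes to stdout, so data.tar.gz is left in place.
gunzip -c data.tar.gz > copy.tar
ls
```

After the first ls, data.tar should no longer be listed; after the second, both data.tar.gz and copy.tar should be present.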