To find the current disk usage, use the df command.
sa101$ df -Pk
Filesystem     1024-blocks     Used  Available Capacity Mounted on
/dev/sdb1          5226072  4551596     409004      92% /
/dev/sda2          1039748   823372     163560      84% /var
/dev/sdc5        120161140 64612768   49444480      57% /u
/dev/sdc6        120201332 10466152  103629280      10% /usr
tmpfs               246556       12     246544       1% /dev/shm
The output shows the device, the total capacity of the disk, the disk space used, the disk space that remains available, the proportion of disk space that is already in use (expressed as a percentage) and the current mount point.
The df command does not show information regarding any devices that have not been mounted.
A typical systems administration script would run the df command and send an alert to the systems administrator if the "Capacity" figure exceeded a certain threshold. What that threshold should be depends on many factors. It may be set higher for a system storing a few slow-growing files than for a more volatile system where usage can grow very quickly.
If only a relatively short retention period is required for the stored data, timely deletion of redundant data may be all that is needed. Where the filestore holds information that must be kept in perpetuity, it may be necessary to expand the available storage; if this involves procuring additional hardware, the lead time to resolving the capacity constraint may be considerable.
The awk command processes text data streams as lines and fields and is ideal for extracting this kind of information.
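As a sketch of this idea (the 90% threshold is an illustrative value), df's POSIX output can be piped to awk, which skips the header line, strips the % sign from the "Capacity" field and reports any filesystem over the limit:

```shell
#!/bin/sh
# Report any mounted filesystem whose capacity exceeds THRESHOLD percent.
# The threshold value here is illustrative; set it to suit the system.
THRESHOLD=90

df -Pk | awk -v limit="$THRESHOLD" '
    NR > 1 {                      # skip the header line
        sub(/%/, "", $5)          # strip the % sign from the Capacity field
        if ($5 + 0 > limit)
            printf "%s mounted on %s is %s%% full\n", $1, $6, $5
    }'
```

The output can be piped to mail, or the script extended to exit non-zero, so that an alert is raised only when a filesystem crosses the threshold.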
Once we have our script for checking the remaining file system capacity and raising the necessary alerts the process can run to an appropriate schedule using the cron daemon.
If a file system has unexpectedly exceeded our capacity threshold, it will be necessary to find out where in the file hierarchy the problem is occurring.
The du (disk usage) command will give us the disk usage in each directory.
Let us suppose that /var is being reported as 70% full and we urgently need to identify the cause of the problem.
sa101$ for d in `find /var -type d -maxdepth 1`; do du -sk $d; done
821340  /var
10728   /var/spool
16      /var/lock
188     /var/run
28      /var/yp
24372   /var/named
15548   /var/cache
16      /var/state
64      /var/db
1300    /var/man
137076  /var/tmp
63348   /var/lib
4       /var/empty
13940   /var/www
542180  /var/log
8       /var/games
204     /var/lost+found
1772    /var/squirrel
10508   /var/data
32      /var/local
4       /var/nmbd
sa101$ exit
All of these results look well within expectations, but had we identified a directory with exceptional usage, investigation would continue there. With /var, a sudden increase in disk usage is often caused by repeated iterations of the same error being logged.
In relatively small systems, identifying target directories for investigation may be done by inspection; on larger systems, or when dealing with multiple systems, it may be preferable to script the inspection process.
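A minimal scripted inspection, assuming the target is /var, sorts the per-directory usage numerically so the heaviest consumers appear first:

```shell
# Summarise each immediate subdirectory of /var in kilobytes (-sk),
# sort numerically in reverse so the largest come first, and show
# the top ten. The path and the count of ten are illustrative.
du -sk /var/* | sort -rn | head -10
```

Run across several hosts over ssh, a one-liner like this quickly narrows the search to a handful of candidate directories.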
The tools to be used for creating, deleting, shrinking or growing file systems are various and selection will depend on the local hardware and software build.
Modern Linux systems are often built with software RAID employing metadisks and logical volumes. Hardware RAID built on either local devices or on a SAN may also be used.
Even on a relatively small desktop host with a single local hard disk device, there are a number of alternatives for managing the partitions.
The most familiar is likely to be fdisk, a tool that shares its origins with the world of Microsoft DOS. Although usually used interactively with a text-based display, fdisk can also be used to list the partitions on a known device.
E.g.
sa101$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1        5226072  4567884    392716  93% /
/dev/sda2        1039748   822860    164072  84% /var
/dev/sdc5      120161140 64616312  49440936  57% /u
/dev/sdc6      120201332 10466152 103629280  10% /usr
tmpfs             246556       12    246544   1% /dev/shm
sa101$ for d in a b c; do fdisk -l /dev/sd$d; done

Disk /dev/sda: 1083 MB, 1083801600 bytes
64 heads, 63 sectors/track, 525 cylinders, total 2116800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x7d99c20d

   Device Boot      Start        End   Blocks  Id System
/dev/sda1              63       4031    1984+  82 Linux swap
/dev/sda2            4032    2116799  1056384  83 Linux

Disk /dev/sdb: 6448 MB, 6448619520 bytes
255 heads, 63 sectors/track, 784 cylinders, total 12594960 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xffffffff

   Device Boot      Start        End   Blocks  Id System
/dev/sdb1   *          63   10618964  5309451  83 Linux
/dev/sdb2        10618965   12594959   987997+ 82 Linux swap

Disk /dev/sdc: 250.1 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe7495dc5

   Device Boot      Start        End    Blocks  Id System
/dev/sdc1              63  488392064 244196001   5 Extended
/dev/sdc5             126  244155869 122077872  83 Linux
/dev/sdc6       244155933  488392064 122118066  83 Linux
sa101$ exit
Next up is cfdisk, a curses-based interactive partitioning tool that has been popular in Linux distributions for some years. cfdisk has no command line options, so it is of little use to us here.
A tool which can be used entirely from the command line is scripted fdisk, or sfdisk.
With a confident hand on the tiller sfdisk can be used to reconfigure/destroy your filesystems on the fly.
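One common non-destructive use is to dump a partition table as text, which can later be replayed onto a replacement disk. The device names below are illustrative, and the second command will overwrite the target's partition table:

```shell
# Dump the partition layout of /dev/sda to a plain text file.
sfdisk -d /dev/sda > sda.layout

# Later, replay that layout onto a replacement disk.
# DESTRUCTIVE: this rewrites /dev/sdb's partition table.
sfdisk /dev/sdb < sda.layout
```

Keeping such layout dumps with the backups makes bare-metal recovery considerably less fraught.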
sfdisk is particularly good at finding and listing all block devices. Unfortunately this process is also painfully slow and may hang if a drive (e.g. a floppy device) is installed but no medium is present.
For disk partitions larger than 2TB, fdisk, cfdisk and sfdisk will need to be discarded (for the present) in favour of a GPT-aware tool. The standard for this in recent years has been parted, which can also list devices at the command line if you have the time to spare.
parted -l ......
A better option for listing block devices is the one trick pony lsblk.
sa101$ lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
fd0        2:0    1     4K  0 disk
sda        8:0    0     1G  0 disk
|-sda1     8:1    0     2M  0 part [SWAP]
`-sda2     8:2    0     1G  0 part /var
sdb        8:16   0     6G  0 disk
|-sdb1     8:17   0   5.1G  0 part /
`-sdb2     8:18   0 964.9M  0 part [SWAP]
sr0       11:0    1  1024M  0 rom
sdc        8:32   0 232.9G  0 disk
|-sdc1     8:33   0     1K  0 part
|-sdc5     8:37   0 116.4G  0 part /u
`-sdc6     8:38   0 116.5G  0 part /usr
sdd        8:48   0 232.9G  0 disk
By default Ubuntu uses a utility called partman at install time to partition the disk. The partman tool provides an interface to parted, which actually does the on-disk partitioning.
Whilst the primary use of tar, cpio, dd and dump is for creating archives and backups, these tools may also be used for quickly transferring data to alternative devices either locally or resident on other hosts.
The tape archive and retrieval tool tar is one of those tools that has been written off many times by the new kids on the block but always makes a comeback. It is powerful, flexible and universal. Copying to tape, across physical and logical devices, and across the network are all readily achieved with tar.
Because tar processes a bit stream and writes to a file with its own tar format, it is able to backup and retrieve not just across Linux distributions but across many different operating systems.
The use of tar to package software is near ubiquitous. The files required for a software installation, together with the installation instructions and documentation, are usually bundled together and compressed with one of the GNU file compression tools, most commonly gzip. The resultant bundle is called a tarball and is often named in a fashion similar to this:
softwarepackage_1.0.1-1.tar.gz
appmenu-qt_0.2.6-1ubuntu1.debian.tar.gz
The Ubuntu Linux distribution follows Debian in the use of tar to create bundles that are downloaded, unpacked and installed using the package management tool apt-get.
A common use of tar is to rapidly recreate a directory tree on another file system either locally or across the LAN.
sa101$ tar cf - . | (cd /new/file/system; tar xf -)
NB. We can echo each path name to the screen, both when creating the archive and on extraction, with the -v option. Writing to the screen is relatively slow, so it is preferable to avoid it when creating substantial archives. If a record of the files being processed is required, use -v with redirection to a file. Do note that the verbose listing is written to standard error, not standard out, so the command would resemble the following:
tar cvSf /dev/st0 . 2>/var/log/backup`date +%d`
Tar used not to be good at handling sparse files, but recent versions have the -S (--sparse) option, which causes tar to handle sparse files properly. It is good practice always to use the -S option.
It is quite usual to combine the use of tar with one of the file compression utilities such as gzip, bzip2 (better compression but with a big speed penalty) or compress.
The -z flag tells tar to pipe the output through gzip.
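For instance (the directory and archive names here are illustrative), an archive can be created with compression and sparse-file handling in one step, then listed without being extracted:

```shell
# Create (c) a gzip-compressed (z), sparse-aware (S) archive of the
# directory "project" into a file (f). Paths are illustrative.
tar czSf /var/backup/project.tar.gz project

# List (t) the contents of the compressed archive without extracting.
tar tzf /var/backup/project.tar.gz
```

The matching extraction is tar xzf; on GNU tar, x will in fact detect gzip compression automatically.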
Sparse files are handled by default by dump, which can back up whole filesystems and create archives spanning multiple volumes.
The restore utility provides both command line and interactive tools to restore from dump files.
The downside is that the tools are filesystem-specific. Backups taken on one system may not be recoverable on another, sometimes not even across upgrades of the same Linux distribution.
The dd command copies data from one device or file to another, creating exact byte-level replicas. The syntax is rather different from most UNIX / Linux commands in that it uses key=value pairs on the command line, e.g.
dd if=/dev/sda of=/dev/sdb
As may be immediately inferred, dd can create clones of block devices and is commonly used in IT labs to image disks, clone DVDs et al. There are a number of other flags to dd that allow fine tuning of the way the copy is created, including the block size, the number of bytes copied, etc. These controls allow dd to be used in combination with other software tools to optimise network transfers, create Master Boot Record copies and perform other technical tricks.
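The Master Boot Record copy mentioned above is a case in point (device and file names illustrative): by setting the block size to 512 bytes and the count to 1, dd captures just the first sector of the disk, which holds the boot code and the partition table:

```shell
# Copy only the first 512-byte sector of /dev/sda (the MBR:
# 446 bytes of boot code plus the 64-byte partition table and
# the 2-byte signature) to a file. Names are illustrative.
dd if=/dev/sda of=/root/sda.mbr bs=512 count=1
```

Restoring is the same command with if= and of= exchanged, which makes this a quick safety net to take before any re-partitioning exercise.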
sa101# tar cSf - /usr | dd bs=4096 | \
    ssh root@archives "(cd /arc/hostname1; dd bs=4096 | tar xf -)"
Identical copies of files can be maintained over multiple hosts using rdist.
The file mode, group, owner and mtime can be preserved. Running programs can be updated using rdist.
The rsync utility can update a remote file set by just copying the data required to synchronise with the set on the local host.
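A typical invocation (the host name "archives" is illustrative) mirrors a local tree to a remote host, transferring only the differences on subsequent runs:

```shell
# Mirror /u to the same path on the host "archives":
#   -a  archive mode: recurse, preserving permissions, ownership,
#       timestamps and symbolic links
#   -z  compress data in transit
#   --delete  remove remote files no longer present locally
rsync -az --delete /u/ root@archives:/u/
```

Note the trailing slash on the source: /u/ copies the contents of /u into the target, whereas /u would create a /u subdirectory beneath it.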
There are a number of other backup utilities available with Ubuntu and other Linux distributions. Most appear to be graphical user interfaces to well known tools like tar.
KBackup, File Backup Manager, Lucky Backup and Back In Time are possibilities you may want to check out, should your life be longer than that of most systems administrators.
Deja Dup is a graphical front end to rsync. Its maintainer, Michael Terry, does not recommend using Deja Dup to maintain data across distribution upgrades. See "http://mterry.name/log/tag/deja-dup/"
Create a recursive compressed tar backup of the /etc directory.
Create a dump file archive of /etc.
/sbin/dump -0u -f /var/backup/etc_dump`date +%d` /etc
Use restore -i to find and extract /etc/mail/aliases.
Using the tools and metanotation tables, revise the material covered on the course so far.

Copyright
© 2003-2017
Clifford W Fulford.
Fulford Consulting Ltd.
Regd. Co. 4250037 in England & Wales.
Regd. office 162, Edward Rd. Nottingham NG2 5GF, England, UK.