To find the current disk usage, use the df command.
sa101$ df -Pk
Filesystem     1024-blocks     Used  Available Capacity Mounted on
/dev/sdb1          5226072  4551596     409004      92% /
/dev/sda2          1039748   823372     163560      84% /var
/dev/sdc5        120161140 64612768   49444480      57% /u
/dev/sdc6        120201332 10466152  103629280      10% /usr
tmpfs               246556       12     246544       1% /dev/shm
The output shows the device, the total capacity of the disk, the disk space used, the disk space that remains available, the proportion of disk space that is already in use (expressed as a percentage) and the current mount point.
The df command does not show information regarding any devices that have not been mounted.
A typical systems administration script would run the df command and send an alert to the systems administrator if the "Capacity" figure exceeded a certain threshold. What that threshold should be depends on many factors. It may be set higher for a system storing a few slow-growing files than for a more volatile system where usage can grow very quickly.
If only a relatively short retention period is required for the stored data, timely deletion of redundant data may be all that is needed. Where the filestore holds information that must be kept in perpetuity, it may be necessary to expand the available storage; if this involves procuring additional hardware, the lead time to resolving the capacity constraint may be considerable.
The awk command processes text data streams as lines and fields and is ideal for extracting this kind of information.
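As a sketch of this idea (the 90% threshold is an illustrative value), df's POSIX output can be piped to awk, which skips the header line, strips the % sign from the "Capacity" field and reports any filesystem over the limit:

```shell
#!/bin/sh
# Report any mounted filesystem whose capacity exceeds THRESHOLD percent.
# The threshold value here is illustrative; set it to suit the system.
THRESHOLD=90

df -Pk | awk -v limit="$THRESHOLD" '
    NR > 1 {                      # skip the header line
        sub(/%/, "", $5)          # strip the % sign from the Capacity field
        if ($5 + 0 > limit)
            printf "%s mounted on %s is %s%% full\n", $1, $6, $5
    }'
```

The output can be piped to mail, or the script extended to exit non-zero, so that an alert is raised only when a filesystem crosses the threshold.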
Once we have our script for checking the remaining file system capacity and raising the necessary alerts the process can run to an appropriate schedule using the cron daemon.
If a file system has unexpectedly exceeded our capacity threshold, it will be necessary to find out where in the file hierarchy the problem is occurring.
The du (disk usage) command will give us the disk usage in each directory.
Let us suppose that /var is being reported as 70% full and we urgently need to identify the cause of the problem.
sa101$ for d in `find /var -type d -maxdepth 1`; do du -sk $d; done
821340  /var
10728   /var/spool
16      /var/lock
188     /var/run
28      /var/yp
24372   /var/named
15548   /var/cache
16      /var/state
64      /var/db
1300    /var/man
137076  /var/tmp
63348   /var/lib
4       /var/empty
13940   /var/www
542180  /var/log
8       /var/games
204     /var/lost+found
1772    /var/squirrel
10508   /var/data
32      /var/local
4       /var/nmbd
sa101$ exit
All of these results look well within expectations, but had we identified a directory with exceptional usage, investigation would continue there. With /var, a sudden increase in disk usage is often caused by repeated iterations of the same error being logged.
In relatively small systems, identifying target directories for investigation may be done by inspection; on larger systems, or when dealing with multiple systems, it may be preferable to script the inspection process.
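A minimal scripted inspection, assuming the target is /var, sorts the per-directory usage numerically so the heaviest consumers appear first:

```shell
# Summarise each immediate subdirectory of /var in kilobytes (-sk),
# sort numerically in reverse so the largest come first, and show
# the top ten. The path and the count of ten are illustrative.
du -sk /var/* | sort -rn | head -10
```

Run across several hosts over ssh, a one-liner like this quickly narrows the search to a handful of candidate directories.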
The tools to be used for creating, deleting, shrinking or growing file systems are various and selection will depend on the local hardware and software build.
Modern Linux systems are often built with software RAID employing metadisks and logical volumes. Hardware RAID built on either local devices or on a SAN may also be used.
Even on a relatively small desktop host with a single local hard disk device, there are a number of alternatives for managing the partitions.
The most familiar is likely to be fdisk, a tool that shares its origins with the world of Microsoft DOS. Although usually used interactively with a text-based display, fdisk can also be used to list the partitions on a known device.
E.g.
sa101$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1        5226072  4567884    392716  93% /
/dev/sda2        1039748   822860    164072  84% /var
/dev/sdc5      120161140 64616312  49440936  57% /u
/dev/sdc6      120201332 10466152 103629280  10% /usr
tmpfs             246556       12    246544   1% /dev/shm
sa101$ for d in a b c; do fdisk -l /dev/sd$d; done

Disk /dev/sda: 1083 MB, 1083801600 bytes
64 heads, 63 sectors/track, 525 cylinders, total 2116800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x7d99c20d

   Device Boot      Start        End   Blocks  Id System
/dev/sda1              63       4031    1984+  82 Linux swap
/dev/sda2            4032    2116799  1056384  83 Linux

Disk /dev/sdb: 6448 MB, 6448619520 bytes
255 heads, 63 sectors/track, 784 cylinders, total 12594960 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xffffffff

   Device Boot      Start        End   Blocks  Id System
/dev/sdb1   *          63   10618964  5309451  83 Linux
/dev/sdb2        10618965   12594959   987997+ 82 Linux swap

Disk /dev/sdc: 250.1 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders, total 488397168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe7495dc5

   Device Boot      Start        End    Blocks  Id System
/dev/sdc1              63  488392064 244196001   5 Extended
/dev/sdc5             126  244155869 122077872  83 Linux
/dev/sdc6       244155933  488392064 122118066  83 Linux
sa101$ exit
Next up is cfdisk, a curses-based interactive partitioning tool that has been popular in Linux distributions for some years. cfdisk has no command line options, so it is of little use to us here.
A tool which can be used entirely from the command line is scripted fdisk, or sfdisk.
With a confident hand on the tiller sfdisk can be used to reconfigure/destroy your filesystems on the fly.
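One common non-destructive use is to dump a partition table as text, which can later be replayed onto a replacement disk. The device names below are illustrative, and the second command will overwrite the target's partition table:

```shell
# Dump the partition layout of /dev/sda to a plain text file.
sfdisk -d /dev/sda > sda.layout

# Later, replay that layout onto a replacement disk.
# DESTRUCTIVE: this rewrites /dev/sdb's partition table.
sfdisk /dev/sdb < sda.layout
```

Keeping such layout dumps with the backups makes bare-metal recovery considerably less fraught.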
sfdisk is particularly good at finding and listing all block devices. Unfortunately this process is also painfully slow and may hang if a drive (e.g. a floppy device) is installed but no medium is present.
For disk partitions larger than 2TB, fdisk, cfdisk and sfdisk will need to be discarded (for the present) in favour of a GPT-aware tool. The standard for this in recent years has been parted, which can also list devices at the command line if you have the time to spare.
parted -l ......
A better option for listing block devices is the one trick pony lsblk.
sa101$ lsblk
NAME     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
fd0        2:0    1     4K  0 disk
sda        8:0    0     1G  0 disk
|-sda1     8:1    0     2M  0 part [SWAP]
`-sda2     8:2    0     1G  0 part /var
sdb        8:16   0     6G  0 disk
|-sdb1     8:17   0   5.1G  0 part /
`-sdb2     8:18   0 964.9M  0 part [SWAP]
sr0       11:0    1  1024M  0 rom
sdc        8:32   0 232.9G  0 disk
|-sdc1     8:33   0     1K  0 part
|-sdc5     8:37   0 116.4G  0 part /u
`-sdc6     8:38   0 116.5G  0 part /usr
sdd        8:48   0 232.9G  0 disk
By default Ubuntu uses a utility called partman at install time to partition the disk. The partman tool provides an interface to parted, which actually does the on-disk partitioning.
Whilst the primary use of tar, cpio, dd and dump is for creating archives and backups, these tools may also be used for quickly transferring data to alternative devices either locally or resident on other hosts.
The tape archive and retrieval tool tar is one of those tools that has been written off many times by the new kids on the block but always makes a comeback. It is powerful, flexible and universal. Copying to tape, across physical and logical devices, and across the network are all readily achieved with tar.
Because tar processes a bit stream and writes to a file with its own tar format, it is able to backup and retrieve not just across Linux distributions but across many different operating systems.
The use of tar to package software is near ubiquitous. The files required for a software installation, together with the installation instructions and documentation, are usually bundled together and compressed with one of the GNU file compression tools, most commonly gzip. The resultant bundle is called a tarball and is often named in a fashion similar to this:
softwarepackage_1.0.1-1.tar.gz
appmenu-qt_0.2.6-1ubuntu1.debian.tar.gz
The Ubuntu Linux distribution follows Debian in the use of tar to create bundles that are downloaded, unpacked and installed using the package management tool apt-get.
A common use of tar is to rapidly recreate a directory tree on another file system either locally or across the LAN.
sa101$ tar cf - . | (cd /new/file/system; tar xf -)
NB. We can echo each path name to the screen, both when creating the archive and on extraction, with the -v option. Writing to the screen is relatively slow, so it is preferable to avoid it when creating substantial archives. If a record of the files being processed is required, use -v with redirection to a file. Do note that the verbose listing is written to standard error, not standard out, so the command would resemble the following:
tar cvSf /dev/st0 . 2>/var/log/backup`date +%d`
Tar used not to be good at handling sparse files, but recent versions have the -S (--sparse) option, which causes tar to handle sparse files properly. It is good practice always to use the -S option.
It is quite usual to combine the use of tar with one of the file compression utilities such as gzip, bzip2 (better compression but with a big speed penalty) or compress.
The -z flag tells tar to pipe the output through gzip.
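For instance (the directory and archive names here are illustrative), an archive can be created with compression and sparse-file handling in one step, then listed without being extracted:

```shell
# Create (c) a gzip-compressed (z), sparse-aware (S) archive of the
# directory "project" into a file (f). Paths are illustrative.
tar czSf /var/backup/project.tar.gz project

# List (t) the contents of the compressed archive without extracting.
tar tzf /var/backup/project.tar.gz
```

The matching extraction is tar xzf; on GNU tar, x will in fact detect gzip compression automatically.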
Sparse files are handled by default by dump, which can back up whole filesystems and create archives spanning multiple volumes.
The restore utility provides both command line and interactive tools to restore from dump files.
The downside is that the tools are filesystem-specific. Backups taken on one system may not be recoverable on another, sometimes not even across upgrades of the same Linux distribution.
The dd command copies data from one device or file to another, creating exact byte-level replicas. The syntax is rather different from most UNIX / Linux commands in that it uses key=value pairs on the command line, e.g.
dd if=/dev/sda of=/dev/sdb
As may be immediately inferred, dd can create clones of block devices and is commonly used in IT labs to image disks, clone DVDs et al. There are a number of other flags to dd that allow fine tuning of the way the copy is created, including the block size, the number of bytes copied, etc. These controls allow dd to be used in combination with other software tools to optimise network transfers, create Master Boot Record copies and perform other technical tricks.
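The Master Boot Record copy mentioned above is a case in point (device and file names illustrative): by setting the block size to 512 bytes and the count to 1, dd captures just the first sector of the disk, which holds the boot code and the partition table:

```shell
# Copy only the first 512-byte sector of /dev/sda (the MBR:
# 446 bytes of boot code plus the 64-byte partition table and
# the 2-byte signature) to a file. Names are illustrative.
dd if=/dev/sda of=/root/sda.mbr bs=512 count=1
```

Restoring is the same command with if= and of= exchanged, which makes this a quick safety net to take before any re-partitioning exercise.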
sa101# tar cSf - /usr | dd bs=4096 | \
    ssh root@archives "(cd /arc/hostname1; dd bs=4096 | tar xf -)"
Identical copies of files can be maintained over multiple hosts using rdist.
The file mode, group, owner and mtime can be preserved. Running programs can be updated using rdist.
The rsync utility can update a remote file set by just copying the data required to synchronise with the set on the local host.
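A typical invocation (the host name "archives" is illustrative) mirrors a local tree to a remote host, transferring only the differences on subsequent runs:

```shell
# Mirror /u to the same path on the host "archives":
#   -a  archive mode: recurse, preserving permissions, ownership,
#       timestamps and symbolic links
#   -z  compress data in transit
#   --delete  remove remote files no longer present locally
rsync -az --delete /u/ root@archives:/u/
```

Note the trailing slash on the source: /u/ copies the contents of /u into the target, whereas /u would create a /u subdirectory beneath it.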
There are a number of other backup utilities available with Ubuntu and other Linux distributions. Most appear to be graphical user interfaces to well known tools like tar.
KBackup, File Backup Manager, Lucky Backup and Back In Time are possibilities you may want to check out, should your life be longer than that of most systems administrators.
Deja Dup is a graphical front end to rsync. Its maintainer, Michael Terry, does not recommend using Deja Dup to maintain data across distribution upgrades. See "http://mterry.name/log/tag/deja-dup/"
Create a recursive compressed tar backup of the /etc directory.
Create a dump file archive of /etc.
/sbin/dump -0u -f /var/backup/etc_dump`date +%d` /etc
Use restore -i to find and extract /etc/mail/aliases.
Using the tools and metanotation tables, revise the material covered on the course so far.

Copyright
© 2003-2017
Clifford W Fulford.
Fulford Consulting Ltd.
Regd. Co. 4250037 in England & Wales.
Regd. office 162, Edward Rd. Nottingham NG2 5GF, England, UK.