The UNIX File System

One of the features which make UNIX especially attractive is the structure of its file system. UNIX makes use of an inverted branching-tree file structure. The tree trunk consists of a single large file, "/" (called root), which contains all the files on the system. (In this structure, all files which contain other files are called "directories"). Any number of subdirectories may branch from this main trunk. However, the hierarchical structure does not end there, for each major branch may also spawn any number of further branches (sub-subdirectories). These subdirectories may, in their turn, send forth yet more levels of files...and so on. All successive directory names are separated by `/', so you'll find directory names like "/usr/local/doc/email-addr".
To see how this file structure works in practice, consider the following example. Someone with a username of "user" creates a directory to contain her correspondence. This is one of several such high-level directories she has in her home directory; others might be entitled "projects," "dissertation," "articles," etc. Within the directory "correspondence," she creates several sub-directories: "personal," "memos," "professional." These subdirectories contain her basic files: in this case, individual letters and memos. She might also store programs (which, like most everything else in UNIX, are really files at heart) in any of her directories. With the branching structure of her files and directories (much like folders on a Macintosh, or nested folders in a real-world filing system), "user" could easily keep track of a large number of files.

Directories

All directories and files, including all home directories (the directories assigned to individual users), are ultimately branched from the root directory. If you issue the command

     > ls -F /

(meaning "list root, showing which files are directories and which are certain other special files"), you will see a listing something like this:

     News/           export/         nfs/            usenet/
     bin@            home@           pcfs/           userhome@
     boot            k1@             q1@             usr/
     core            k2@             q2@             var/
     dev/            k3@             q3@             vmUNIX*
     e1@             kadb*           q4@             vmUNIX.931007
     e2@             lib@            sbin/           vmUNIX.ORIG.Z
     e3@             lost+found/     scratch/        vmUNIX.dist.Z
     e4@             mnt/            sys@            vmUNIX.nopatch.Z
     etc/            mntl/           tmp/

Your home directory will probably be one of those beginning with a "q", an "e", or a "k".
The Academic Server Cluster is specially set up so that all four machines, Quads, Ellis, Kimbark, and Woodlawn look like directories on a single UNIX system. Because of this, you can safely ignore the leading "/nfs/ellis" or "/nfs/quads" or "/nfs/kimbark" in path names, and just write file and directory names as though "e1" and "k2" and "q3" (and their siblings) all branched directly from the root directory.
UNIX has several built-in commands which simplify working with directories and files. If you want to work with a file, you can access it in either of two ways. You may provide UNIX with its absolute or fully qualified pathname, which is a list of all the directory names between root and your destination file. If you have several subdirectories, this can mean quite a bit of typing. Consider the case of our friend "user," who wishes to list the contents of a file named "UNIXmemos" to her terminal. This file exists in a sub-directory entitled "Memos," which is itself a sub-directory of a larger directory entitled "Correspondence," which is a sub-directory of a still larger directory called "Work" -- which is part of user's directory on the disk /e3. The fully qualified name, or "pathname," for this file would be "/e3/user/Work/Correspondence/Memos/UNIXmemos".
This is obviously quite a bit of typing, and it is all too easy to make a mistake while doing it. Fortunately, it is not necessary for our long-suffering example person to type this whole path each time she wants to view or edit "UNIXmemos." First of all, UNIX automatically places everyone in their home directories when they log in to the system, here "/e3/user". The practical effect of placing someone in a default directory is that the higher levels of the path (in this example everything from '/' to 'user') need not be specified when working with files in that default directory.
To further simplify any work with directories and files further down the path, user may change her working directory with the command `cd' (discussed below in "changing from one directory to another."). As an example, let's say user intends to work with "Memos." She intends to edit not only the file "UNIXmemos," but also the files "pcmemos" and "macmemos." Rather than typing a large part of a very long pathname each time she wanted to shift from one to the other, it would behoove user to use the command
     > cd Work/Correspondence/Memos
to get closer to the files she wants to work with. Then she may enter
     > cd
to return to her home directory.
Note that a directory name given to 'cd' without a leading "/" is looked up relative to your current directory, so you cannot switch from one lower-level directory to another simply by typing the other's name. If user wanted to work with the files in another directory, under "Personal" for example, and make "Personal" the default directory, she could move to "Personal" by typing the command lines
     > cd                       [to return to her home directory]
     > cd Personal
Then she could work with any files under "Personal."
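The difference between relative and absolute path names can be tried out safely in a scratch directory. The following is a minimal sketch, written for a Bourne-style shell (sh) rather than the C shell so that it can be run as a script; the temporary directory created by 'mktemp' is an assumption standing in for user's home directory, and the variable names are made up for illustration.

```shell
# Hypothetical sandbox standing in for user's home directory (/e3/user in the text).
home=$(mktemp -d)
mkdir -p "$home/Work/Correspondence/Memos" "$home/Personal"

cd "$home"                          # start "at home", as after login
cd Work/Correspondence/Memos        # relative path: resolved from the current directory
in_memos=$(pwd)

cd "$home/Personal"                 # absolute path: works from anywhere
in_personal=$(pwd)

echo "$in_memos"
echo "$in_personal"

cd / && rm -rf "$home"              # clean up the sandbox
```

The second 'cd' succeeds only because it is typed from the sandbox's top level; the third would work from any directory, since it starts at the root.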

Directory-name abbreviations: . .. ~

Our friend "user" could also use another method to switch back and forth, because her UNIX shell reserves three special characters as abbreviations to make it easier to work with directories. These characters are '.', '..' and '~'.
     .     denotes your current directory
     ..    symbolizes the parent directory of the directory you are currently in
     ~     denotes a home directory; with no username after it, your own
So, if Personal and user's current directory (Work) were both subdirectories of the same parent, she could type
     > cd ../Personal
If Personal were a subdirectory of a directory two levels higher, she could type
     > cd ../../Personal
-- and so on. The '..' abbreviation refers only to the directory at the level immediately above the one you are currently in. So, if she ever wanted to obtain a list of the directories and files in a parent directory, user could simply type
     > ls ..
As another way to use directory-name abbreviations, consider the case of a person whose username is "pers". He wants to access and work with files in the subdirectories "food" and "cats," in the home directory of a classmate whose username is "othr." (Under the default file protections of UNIX, almost any file can be read by any user on the system. Please note that there are ethical implications of reading such files when you do not have permission from their owners. We urge you to protect your own important and/or sensitive files, by making use of the commands described in "File and directory permissions." )
To obtain a list of othr's files in the sub-directory "food", pers would normally have to enter the command line
     > ls /q2/othr/food
However, this command can be simplified by using the ~ abbreviation. Pers can obtain the same result by typing the command line
     > ls ~othr/food
If he had changed directories, was in ~othr/food, and wanted a listing of his own home directory, he could obtain one by issuing the command
     > ls ~
since '~', without a following username, is interpreted as the home directory of the person logged in.
These three directory abbreviations can also aid in other standard UNIX operations. If pers was in his home directory and wanted to copy a file named "fondue" from ~othr/food, he could do so by combining the abbreviations described above and the 'cp' utility in the command line
     > cp ~othr/food/fondue .
The copy utility 'cp' requires the location of a file and its destination. Thus, the command line first uses "~othr" to refer to othr's home directory, then the character '.' to symbolize pers's current directory. We'll discuss 'cp' in more detail in "Copying, moving, and removing files."
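The abbreviations can be exercised without touching anyone's real files. This is a sketch under stated assumptions: it runs in a Bourne-style shell (sh), it builds two pretend home directories in a temporary sandbox, and it points HOME at one of them so that '~' has something harmless to refer to. The form "~othr" needs a real account named othr, so the sketch reaches the classmate's directory through '..' instead.

```shell
# Hypothetical sandbox: two "home directories" side by side under one parent.
root=$(mktemp -d)
mkdir -p "$root/pers" "$root/othr/food"
echo "cheese, wine, bread" > "$root/othr/food/fondue"

saved_home=$HOME
HOME="$root/pers"                   # pretend pers is logged in, so ~ means $root/pers

cd "$root/othr/food"
cd ..                               # '..' is the parent directory: now in othr
parent=$(basename "$(pwd)")

cd ~                                # '~' with no username: pers's own home directory
cp ../othr/food/fondue .            # '.' as the destination: the current directory
copied=$(cat ./fondue)

echo "$parent"
echo "$copied"

HOME=$saved_home                    # put the real HOME back
cd / && rm -rf "$root"
```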

Naming conventions

UNIX naming conventions also make long, descriptive names possible; to make best use of the file structures of UNIX, you'll probably want to give your directories names which describe their contents.
Directory and file names under our version of UNIX may be any typable length, and may contain pretty much anything. However, you should probably stick to letters, numbers, periods and underscores, since many punctuation marks will have specific meanings to your shell (for more information on characters to avoid, see "Wildcards," below). Spaces should be avoided too; spaces inside filenames can be very annoying to deal with.
Some UNIX users always begin directory names with upper-case letters, to more easily differentiate them from simple filenames.
Wildcards: ? * [ ] { }
Some characters function as "wildcards" in UNIX commands involving file and directory names. Effectively, using one (or a pair) of these characters in a command creates an ambiguous reference -- a filename which may be interpreted in any of several different ways. When you use wildcards, the shell will generate a list of the filenames in your directory which match your file reference. (If you must use one of the reserved characters and don't want its special meaning, simply precede the character with a backslash ['\'].)
?
The question mark stands for a single character in a filename. In the command line
     > ls memo?
the question mark will cause 'ls' to list all filenames or directories which begin with the string "memo" and contain exactly one more character, whatever that character may be. For example, if your directory contained the files "memo1", "memo2", "memo3", "memo4", "mamo", "mimo", "m_mo" and the directories "momol" and "memok", `ls' displays the list:
     memo1 memo2 memo3 memo4 memok
(Note that memok is a directory, so its contents will be listed as well.)
If instead you replaced one of the characters within the word "memo" with the wildcard "?":
     > ls m?mo
you would receive a list of all filenames in your directory which started with the character 'm', ended with the characters 'mo', and had any single character in place of the question mark:
     mamo mimo m_mo
*
The asterisk, '*', stands for any number of characters, including zero, in a filename. The one exception is that it will not match a filename which begins with a period, '.' . Thus a command line containing the file reference 'memo*', like:
     > ls memo*
would list all filenames which begin 'memo', however many characters follow.
[ ]
The square brackets, '[ ]', are a limited form of the question mark, allowing you to search for specific filenames by specifying a list or a range of characters. Appending "[1234]" or "[1-4]" to "memo" in the command line
     > ls memo[1-4]
would display the filenames "memo1", "memo2", "memo3", and "memo4" from the list above. The brackets define a character class that includes all the characters listed within the brackets. Square brackets are especially useful when combined with other wildcards. The command line
     > ls [aeiou]*
would list all the files whose names begin with the vowels "a", "e", "i", "o", or "u". (Brackets are also useful when creating your own, more complex applications, since they enable you to feed the names of selected files to other utilities or programs one at a time.)
{ }
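Finally, curly brackets, '{ }', allow you to specify partial filenames: you give a comma-separated list of alternatives inside the brackets, and the shell builds one name from each alternative. (Unlike the other wildcards, the names are generated whether or not files with those names exist.) If you happened to be in a directory containing files numbered from 1 through 1000, for example, you could type: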
     > ls 4{37,56,74}
to get a listing of "437", "456", and "474".
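One way to check what a wildcard will match before using it in earnest is to count the names the shell generates. The sketch below assumes a Bourne-style shell (sh) and a temporary directory populated with the example names used above; the variable names are invented for illustration. Curly brackets are left out because not every Bourne-style shell supports them.

```shell
dir=$(mktemp -d)
cd "$dir"
touch memo1 memo2 memo3 memo4 mamo mimo m_mo .memo_hidden
mkdir momol memok

# 'set --' stores the generated names as $1, $2, ...; $# is how many there are.
set -- memo?;     n_question=$#     # memo1 memo2 memo3 memo4 memok
set -- m?mo;      n_middle=$#       # mamo mimo m_mo
set -- memo*;     n_star=$#         # the same five names as memo?
set -- *;         n_all=$#          # nine names; .memo_hidden begins with '.' and is skipped
set -- memo[1-4]; n_range=$#        # memo1 memo2 memo3 memo4
set -- memo\?;    n_escaped=$#      # backslash removes the special meaning: one literal word

echo "$n_question $n_middle $n_star $n_all $n_range $n_escaped"

cd / && rm -rf "$dir"
```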
Paths
As you saw above, UNIX is structured like a tree -- only upside-down.
Like MS-DOS, UNIX uses "paths," or descriptions of a certain route down the tree, to find command names. Just as you can refer to files without typing their full tree-structure names, so you can call most programs in an abbreviated fashion, since UNIX assumes it should look in certain places first. You don't need to know where the file containing the program "rm" is stored, for instance, to use the 'rm' command:
     > rm myfilename
The system administrators have set up a default path for new users; you can change this, or type an explicit path to the command you want:
     > /bin/rm myfilename
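To see where the shell will actually look for commands, you can inspect the search path itself. The lines below are a small sketch for a Bourne-style shell (sh), where the path is kept in the PATH variable and 'command -v' reports the file that would be run; the directories printed depend entirely on how your own account is set up.

```shell
# Print each directory on the search path, one per line, in the order searched.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for the command name 'rm'.
rm_path=$(command -v rm)
echo "$rm_path"
```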
Navigating around in the file system

Checking current directory: pwd
To show the full path name of your current directory, use the 'pwd' (print working directory) command:
     > pwd
     /q2/pers/dir
Changing from one directory to another: cd
The basic way to change directories in UNIX is with the 'cd' (change directory) command. You can use 'cd' with absolute path names, relative path names, and some abbreviations.
An example of absolute path names: to change from anywhere else to /usr/local/doc, you can type "cd /usr/local/doc". All absolute path names start with "/", the root directory.
An example of relative path names: to change from /usr/local/doc/news to /usr/local/doc/news/newusers, you can type "cd newusers", since newusers is a subdirectory.
You can also use special directory names, such as ".", "..", and "~" (discussed earlier in "Directories"). Remember that "." is your current directory, whatever that is at the time; ".." is the parent of your current directory; and you can specify subdirectories without giving the full path name.
So, as another example of relative path names: to switch from /mydir/dir-one to /mydir/dir-two, you could say "cd ../dir-two".
Recall that "cd ~username" should change to the home directory of person "username", regardless of where that directory is and where you are; if the directory is protected, you'll get a message saying that you can't cd there.
With no arguments, 'cd' always returns you to your home directory:
     > cd
     > pwd
     /q2/pers
Listing files: ls
To actually see what's in a directory, you use the `ls' (list) command. It has a number of useful options:
  -F      displays directory names with a "/", executable files with a "*",
          links to other files with a "@"

  -a      shows all files in the directory, including those beginning with
          "."; also lists . and .. directories

  -l      displays in long (detailed) format, including information on each
          file's owner and access privileges (see Section 4, "File and
          directory permissions," for a discussion of these things).

  -t      sorts by time, with last-written first
Don't forget that options can be combined. For example, 'ls -Fa' (or -aF; order does not matter) shows all files in the directory, with a "/" after directory names, and other characters to mark other special files. And 'ls -laF' does both these things, in detailed format.
There are other options for `ls' (lots of them, actually). Typing "man ls" will get you the detailed manual entry.
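The markers added by "-F" are easiest to recognize when you know in advance what each name is. This sketch, assuming a Bourne-style shell (sh) and a temporary directory with made-up names, creates one entry of each kind and lists it three ways.

```shell
dir=$(mktemp -d)
cd "$dir"
mkdir Letters                       # a directory
touch notes                         # an ordinary file
touch runme && chmod +x runme       # an executable file
ln -s notes shortcut                # a link to another file
touch .hidden                       # a name beginning with "."

plain=$(ls)                         # names only; .hidden is not shown
marked=$(ls -F)                     # Letters/  notes  runme*  shortcut@
all=$(ls -aF)                       # also shows .hidden, ./ and ../

echo "$marked"
echo "$all"

cd / && rm -rf "$dir"
```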
Copying, moving, and removing files: cp, mv, rm
You'll need the following three commands to carry out basic operations with files:
  cp        copy a file
  mv        rename a file; move files into a directory
  rm        scratch, delete, remove a file
To copy the contents of one file into another, use the 'cp' (copy files) command.
     > cp firstfile secondfile
Note that if "secondfile" already exists, it will be overwritten with the contents of "firstfile"; 'cp' has a "-i" (interactive) option which will warn you before overwriting files. You can uncomment some aliases defined in your .cshrc file that will force cp, mv, and rm to always use the -i option. It is highly recommended!
Both "firstfile" and "secondfile" can be either relative path names or absolute path names; "secondfile" can be a directory name instead. For example, "cp myfile .." copies the file "myfile" to the parent directory of wherever you are, if you have write privileges to that parent directory. If "secondfile" is a directory, you can replace "firstfile" with several file names, or a wildcard. To copy all files ending in ".txt" to a "Letters" directory, for example:
     > cp *.txt Letters
To move a file into another directory, use 'mv':
     > mv file1 Stuff_directory
You can think of this as working exactly like 'cp', only the original is deleted. Like 'cp', 'mv' has a "-i" option which will warn you if -- in the above example -- a file named "file1" already exists in Stuff_directory. If you don't use the -i option, you risk overwriting files, so be careful.
Since UNIX doesn't have a separate command to rename a file, 'mv' does double duty for that as well. To change file2's name to "thing":
     > mv file2 thing
Everything noted above about path names and write privileges applies equally to 'cp' and 'mv'.
To delete a file or files from a directory, use the 'rm' command.
Before you start experimenting, however, be aware that 'rm' removes files permanently. Unless you're certain that the file is on a system backup (it's older than a week, say, and you haven't touched it since), and you're willing to bribe the system administrators to get it back for you, you should treat 'rm' with a great deal of caution.
     > rm file7 oldfile *.txt
For multiple deletions, like the one above, we recommend the "-i" flag (which works the same as with 'cp' and 'mv': it asks you if you're sure you want to do that).
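The three commands are best practiced on files you can afford to lose. The following sketch assumes a Bourne-style shell (sh) and works entirely inside a temporary directory; the file names follow the examples above, and the contents are invented.

```shell
dir=$(mktemp -d)
cd "$dir"
mkdir Letters
echo "first"  > firstfile
echo "second" > secondfile
echo "note"   > a.txt

cp firstfile secondfile             # secondfile is overwritten without any warning
overwritten=$(cat secondfile)

cp a.txt Letters                    # destination is a directory: the copy goes inside it
mv firstfile thing                  # destination is a new name: a rename
mv thing Letters                    # destination is a directory: a move
rm a.txt secondfile                 # permanent; there is no undo

remaining=$(ls)
in_letters=$(ls Letters)
echo "$remaining"
echo "$in_letters"

cd / && rm -rf "$dir"
```

Adding "-i" to any of the 'cp', 'mv', or 'rm' lines would make the command stop and ask before overwriting or deleting.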
Creating and removing directories: mkdir, rmdir
To create a new directory, inside of the current directory, use the 'mkdir' command:
     > mkdir Stuff
People often start directory names with a capital letter and file names with a lowercase letter, to make them easier to distinguish.
To remove an empty directory, use 'rmdir'. This command returns an error message if there are still files inside; you must first remove them, using 'rm', or move them elsewhere using 'mv'.
     > rmdir No_more_stuff
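The refusal of 'rmdir' to remove a directory that still contains files can be seen in a short experiment. This is a sketch for a Bourne-style shell (sh), run in a temporary directory; the variable names are illustrative only.

```shell
dir=$(mktemp -d)
cd "$dir"
mkdir Stuff
touch Stuff/file1

# rmdir refuses to remove a directory that still has files in it.
if rmdir Stuff 2>/dev/null; then
    first_try=removed
else
    first_try=refused
fi

rm Stuff/file1                      # empty the directory first
rmdir Stuff                         # now it succeeds
if [ -d Stuff ]; then still_there=yes; else still_there=no; fi

echo "$first_try $still_there"

cd / && rm -rf "$dir"
```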
Finding files: find
Occasionally you may forget where you put a file, or where a command is located. With the fast-find feature of the (otherwise very complex) UNIX `find' command, you can easily find it again:
     > find which
     /usr/bin/ypwhich
     /usr/share/man/man1/which.1
     /usr/share/man/man1/ypwhich.1
     /usr/ucb/which
The `find' command, given only a filename or other string, will check the fast-find database (a list of filenames from all directories which were publicly searchable the last time the database was updated); it will return names of files on the system containing the pattern you specified. Note that the database does not contain filenames from directories which are not publicly searchable, so if you've set your own directories to be very private, the fast-find feature of 'find' can't help you locate your own files.
For more advanced uses of the 'find' command, check out its man page.
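As a taste of those more advanced uses, the general form of 'find' takes a starting directory and a test such as "-name". The sketch below assumes a Bourne-style shell (sh) and searches a temporary directory tree built for the purpose; unlike the fast-find feature, this form walks the directories themselves rather than consulting a database.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/Work/Memos" "$dir/Personal"
touch "$dir/Work/Memos/UNIXmemos" "$dir/Work/Memos/pcmemos" "$dir/Personal/letter"

# Search $dir and everything beneath it for names ending in "memos".
# The quotes keep the shell from expanding '*' before find sees it.
found=$(find "$dir" -name '*memos')
echo "$found"

rm -rf "$dir"
```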
Files

Showing contents of files: cat, more/less
You could use the 'cat' command to see what's in a text file, but since it lists the entire file without stopping, it's not useful in most situations. (You can use 'cat' to do other interesting things, though; for more information, check out its man page.)
The 'more' and 'less' utilities allow you to view a file one screenful at a time. At the bottom of each screen, both `more' and `less' will prompt you for the next screen with:
        --More--(n%)
At this point you may press the return key to display the next line, or the space bar to display the next screen. With the "-n" option to 'more', where n is an integer specifying a number of lines, you can define the size a screenful should be.
Showing beginnings/ends of files: head/tail
To display only the beginning of a file, use the `head' command; to display only the end, use `tail'. Without an option specifying how many lines you want displayed (`head -3', for instance, for the first three lines), both `head' and `tail' default to ten lines.
For some reason known only to a few UNIX wizards, 'head' works with multiple filenames, while 'tail' does not.
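A numbered test file makes it easy to see exactly which lines 'head' and 'tail' pick out. This sketch assumes a Bourne-style shell (sh) and uses the "-n 3" spelling of the line-count option, which is equivalent to the "-3" form shown above on systems that accept both.

```shell
dir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do echo "line $i"; i=$((i + 1)); done > "$dir/twenty"

first_three=$(head -n 3 "$dir/twenty")            # line 1 through line 3
last_two=$(tail -n 2 "$dir/twenty")               # line 19 and line 20
default_count=$(head "$dir/twenty" | grep -c .)   # no option given: ten lines

echo "$first_three"
echo "$last_two"
echo "$default_count"

rm -rf "$dir"
```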
Comparing files: diff
To show the differences between two similar files, use 'diff'. Let's say we have two files, one copied from another, and make a minor change to file2: we change the word "file" to "lozenge." Comparing them, we see:
     > diff file1 file2
     3c3
     < file,
     ---
     > lozenge,
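The comparison above can be reproduced with two small files. This is a sketch for a Bourne-style shell (sh); the four lines of text are invented, chosen only so that the word "file" falls on line 3. In the output, "3c3" means line 3 of the first file was changed into line 3 of the second, '<' marks the old text and '>' the new.

```shell
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' 'This is' 'a small' 'file,' 'for testing.' > file1
sed 's/file/lozenge/' file1 > file2       # file2 is a copy with one word changed

# diff exits with status 1 when the files differ; that is not an error here.
diff_status=0
changes=$(diff file1 file2) || diff_status=$?
echo "$changes"

cd / && rm -rf "$dir"
```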
Searching for strings in files: grep
To search for a certain character string or other pattern in a file, use the 'grep' command.
Some of the more useful options to 'grep' are "-n", to display the line numbers where the string is found; "-l", to list filenames containing the string, instead of displaying the lines where it is found; and "-i", to ignore case (otherwise you'll find only strings in which the uppercase and lowercase letters are exactly as you typed them!).
For a list of the files in /usr/local/doc/news/newusers which discuss "netiquette," in any capitalization:
     > grep -il netiquette /usr/local/doc/news/newusers/*
     /usr/local/doc/news/newusers/README
     /usr/local/doc/news/newusers/emily-postnews
     /usr/local/doc/news/newusers/info-postings.1
     /usr/local/doc/news/newusers/info-postings.2
     /usr/local/doc/news/newusers/info-postings.3
     /usr/local/doc/news/newusers/news-answers-intro
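The effect of each option is easier to see on files whose contents you know. The sketch below assumes a Bourne-style shell (sh) and three invented files in a temporary directory, two of which mention the word in different capitalizations.

```shell
dir=$(mktemp -d)
cd "$dir"
echo "Please observe Netiquette when posting." > README
echo "NETIQUETTE matters."                     > intro
echo "Nothing relevant here."                  > other

matching_files=$(grep -il netiquette *)   # names only, any capitalization
numbered=$(grep -n Netiquette README)     # exact capitalization, with line number

echo "$matching_files"
echo "$numbered"

cd / && rm -rf "$dir"
```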
Simple file creation
One of the strengths of UNIX is the variety of tools available for any given task, and creating files is no exception.
Using an application, such as 'script'
Sometimes you'll want to record a session, or part of a session, either to document something or to have an accurate and complete record of what happened. In such cases, you can make use of the 'script' command. When you type "script", a copy of everything that goes to your screen will also be written to a file in your current directory.
The default filename used by 'script' is "typescript". If you wish, you can specify another name:
     > script myfilename
The 'script' utility will continue merrily copying all terminal output to the file until you type a ^D (Control-D), so don't forget and leave it running.
Many applications other than 'script' allow you to create files. For example, you can save messages from within an electronic mail program into a file.
Using input/output redirection, such as 'cat >>file'
You can create a new file most simply by typing:
     > cat >> newfile
(The first '>' represents your prompt, but the others you must actually type. What you're doing here is referred to as "input/output redirection," and is one of the features that makes UNIX so flexible.)
This will create a new file named "newfile" if there isn't one in that directory, or append to an existing one if there is. After you press the return key, anything you type will be copied into the file until you type a ^D (Control-D). This method works well for short files, but since you can't go back and edit any line other than the current one, we recommend using an editor for longer files.
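The same redirection can be tried from a script. In this sketch, which assumes a Bourne-style shell (sh) and a temporary directory, a "here-document" supplies the lines that would otherwise be typed at the keyboard; the text of the lines is invented. It also contrasts '>>' with a single '>', which starts the file over instead of adding to it.

```shell
dir=$(mktemp -d)
cd "$dir"

# A here-document (<<'EOF' ... EOF) stands in for typing at the keyboard;
# the closing EOF plays the part of ^D.
cat >> newfile <<'EOF'
Dear othr,
EOF

# '>>' appends: the first line is still there afterwards.
cat >> newfile <<'EOF'
Thanks for the fondue recipe.
EOF
appended=$(cat newfile)

# '>' starts the file over: the earlier contents are lost.
cat > newfile <<'EOF'
Starting again.
EOF
overwritten=$(cat newfile)

echo "$appended"
echo "$overwritten"

cd / && rm -rf "$dir"
```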
Using an editor
The best way to create long files, and the method which allows you the most freedom, is using an editor.
Controlling disk use
Because Quads, Ellis, Kimbark, and Woodlawn are used by so many people, all of whom store things in their directories, disk space is a constant problem. Remember that disk space is a shared resource, and all users of the ACS Server Cluster are expected to be responsible about the number and size of files kept on those machines.
You should always be aware of how much space you are using, and make an effort to keep it to a minimum. Fortunately, UNIX offers several tools to help you do this.
Showing disk usage and disk free: du -s, df
To see how much space your current directory is using (it, the files it contains, and all directories beneath it), use the 'du -s' (disk usage, summary) command:
     > du -s
     1039 .
This says your current directory and its contents take up 1039 kilobytes, or a little over one megabyte; if this were your home directory, you'd be in pretty good shape. You can use up to four or five megabytes of disk space before the system administrators take notice and ask you to clear out larger files and those you use infrequently.
To get a sense of how this fits into a larger picture, use the 'df' (disk free) command:
     > df
     Filesystem                   kbytes     used    avail capacity  Mounted on
     /dev/sd0a                     15487     9907     4032      71%  /
     /dev/sd0g                     93007    82787      920      99%  /usre
     /dev/sd2a                    181807   149137    23580      86%  /usr/local
     /dev/sd0h                     62359    53245     5997      90%  /scratch
     /dev/sd0d                     30991     1524    27918       5%  /tmp
     /dev/sd0e                     61999    14122    44778      24%  /var
     /dev/sd0f                     15487     2916    11797      20%  /var/log
     /dev/sd2b                    781838   724238    18509      98%  /nfs/ellis/e1
     /dev/sd7a                    963662   901026    14453      98%  /nfs/ellis/e2
     /dev/sd5a                    963662   881158    34321      96%  /nfs/ellis/e3
     /dev/sd1a                    961070   793193    71770      92%  /nfs/ellis/e4
     midway:/var/spool/mail       145234   107101    23610      82%  /var/spool/mail
     midway:/usr/local/share     1008502   838944    68708      92%  /usr/local/share
     quads:/nfs/quads/q1          781838   764818    17020      98%  /nfs/quads/q1
     quads:/nfs/quads/q2         1006959   967436    19384      98%  /nfs/quads/q2
     quads:/nfs/quads/q3          963662   845974    69505      92%  /nfs/quads/q3
     quads:/nfs/quads/q4          961070   840444    24519      97%  /nfs/quads/q4
     kimbark:/nfs/kimbark/k1      638750   373788   201087      65%  /nfs/kimbark/k1
     kimbark:/nfs/kimbark/k2      961070   835907    29056      97%  /nfs/kimbark/k2
     kimbark:/nfs/kimbark/k3      959532   920575    10172      99%  /nfs/kimbark/k3
     bluebird:/usr/local/share    300127   235306    49815      83%  /usenet/share
     bluebird:/usr/local/lib/news
                                  605886    78208   467090      14%  /usenet/news
The important figure, so far as you're concerned, is the disk where your own files reside. If your home directory were in /q2, and there were only nine megabytes free for everyone to use, it would be a bad idea to copy a large file to your directory until more space were free.
Showing the size of files: du (-a); ls -s; wc
The 'du' command alone, with no options, will also display size summaries for each directory beneath the current one:
     > du
     828 ./Saved_mail
     183 ./Game_info
     1039 .
To see how much space each individual file, in each of the directories beneath the current one, is using, type the 'du -a' command.
Another command which shows the sizes of files only in the current directory (unlike 'du -a', which acts recursively) is 'ls -s' (list files, displaying size in kilobytes):
     > ls -s
     total 75
     9 aufs.docs                  1 refusing.mail
     7 cmds.2                     2 script.to.try
     3 cmds.5                     1 spell_files
     2 cmds.6                    19 stephen.wright.quotes
     3 faces                      1 tex_practice
     1 humor                      5 ultimate.cshrc
     1 info                       1 UNIX.quote
     1 matt.null                  1 UNIX_tricks
     2 mpage                      2 unvalentine.c
     1 name_gen                   2 uunet.info
Because 'ls -s' (without the additional "-F" option) does not distinguish between files and directories in determining size, its output can be misleading. In the above example, all names with underscores (such as "tex_practice") are directories, containing files of their own; the 'ls -s' command merely lists them as one-kilobyte files. You may wish to always combine the options, and type the command as "ls -sF".
A third useful command is 'wc' (word count), which displays the number of lines, words, and characters in each file you specify:
     > wc cmds.*
     144 1221 7126 cmds.2
      44 358 2136 cmds.5
      29 217 1216 cmds.6
     229 1880 10967 total
With all these commands, you should find keeping track of your disk usage easy.
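The three size commands can be compared side by side on files whose contents are known. This sketch assumes a Bourne-style shell (sh) and two small invented files in a temporary directory. The counts reported by 'wc' are exact, but the block counts from 'du' and 'ls -s' depend on the system and its disk format, so no particular figures are shown for them; the "-k" option asks 'du' to report in kilobytes.

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'one two\nthree\n' > cmds.2        # 2 lines, 3 words, 14 characters
printf 'four five six\n'  > cmds.5        # 1 line, 3 words, 14 characters

wc cmds.*                                 # per-file counts plus a "total" line
lines=$(cat cmds.* | wc -l | tr -d ' ')   # total line count as a bare number
echo "$lines"

du -sk .                                  # space used by this directory, in kilobytes
ls -s                                     # space used by each file here, in blocks

cd / && rm -rf "$dir"
```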
Compressing files: compress/uncompress
Once you know how much space your files take up, you'll want to compress those that you don't use frequently, or that you haven't used in some time -- especially larger files.
The 'compress' command converts files into smaller binary files for storage, so they won't take up as much disk space. A compressed file's name ends in ".Z" (filename.Z). To reverse the compression, use the 'uncompress' command.
     > compress file
     > uncompress file.Z
You can glance at a compressed file without uncompressing it, by using the 'zcat' command.

Inode Cache

The inode cache is made up of buckets, each consisting of a header with an attached list of inodes. Each header has a spin lock that controls access to the inode list while it is being manipulated; the locks are held only for short times.

If the inode entry is found in the cache, it is returned. If it is not in the cache, the lookup releases the lock, allocates a new inode, acquires the lock again, adds the inode to the list, releases the lock again, and then initializes and returns the inode.

Between the time a new inode is allocated and the time the lock is reacquired, another thread may also be allocating an inode for the same file (since it did not find the inode entry in the list either). Both threads would then end up adding the same inode to the list, which is not acceptable.

To avoid this, you may rescan the cache after allocating a new inode to see whether the same inode has already been entered, but this would cause additional overhead.

To avoid rescanning each time there is an entry operation, OSF/1 uses timestamps to detect duplicates in the list.

When an inode lookup fails, the timestamp of the header is recorded. When it is time to enter the new inode, the current timestamp is compared to the saved value. If it has changed, the list is rescanned, since it has been modified in the meantime and most probably contains a duplicate.

Because the list is rescanned only when the timestamp has changed, overhead is reduced.
