.so tmacs .BC 3 Files .BS 2 "Input/Output .ix "I/O .LP It is important to know how to use files. In Plan 9, this is even more important. The abstractions provided by Plan 9 can be used through a file interface. If you know how to use the file interface, you also know how to use the interface for most of the abstractions that Plan 9 provides. .PP You already know a lot about files. In the past, we have been using .CW print to write messages. And, before this course, you used the library of your programming language to open, read, write, and close files. We are going to learn now how to do the same, but using the interface provided by the operating system. This is what your programming language library uses to do its job regarding input/output. .PP Consider .CW print . It is a convenience routine to print formatted messages. It writes to a file, by calling .ix "formatted output .CW write . .ix [write] Look at this program: .so progs/write.c.ms .ix [write.c] .LP This is what it does. It does the same thing that .CW print would do given the same string. .P1 ; 8.write hello .P2 .LP The function .CW write writes bytes into a file. Isn't it a surprise? To find out the declaration for this function, we can use .CW sig ⁱ. .ix [sig] .FS ⁱ Remember that this program looks at the source of the manual pages, in section 2, to find a function with the given name in any SYNOPSIS section of any manual page. It is a very convenient way to get a quick reminder of which arguments a system function receives and what it returns. .FE .P1 ; sig write long write(int fd, void *buf, long nbytes) .P2 .LP The bytes written to the file come from .CW buf , which was .CW msg in our example program. The number of bytes to write is specified by the third parameter, .CW nbytes , which was the length of the string in .CW msg . And the file to write to was specified by the first parameter, which was just .CW 1 for us. .PP Files have names, as we learned.
We can use a full path, absolute or relative, to name a file. Files being used by a particular process have “names” as well. The names are called \fBfile descriptors\fP .ix "file descriptor .ix "file descriptor table and are small integers. You know from your programming courses that to read/write a file you must open it. Once open, you may read and write it until the file is closed. To identify an open file you use a small integer, its file descriptor. This integer is used by the operating system as an index in a table of open files for your process, to know which file to use for reading or writing. See figure [[!standard file descriptors!]]. .LS .PS right reset boxht=.2 boxwid=1 circle rad .4 "Process" spline -> right 1 then down "File descriptor" "table" D: [ down [ right box invis "0" ; F: box ] D0: last [].F [ right box invis "1" ; F: box ] D1: last [].F [ right box invis "2" ; F: box ] D2: last [].F [ right box invis "3" ; box invis "..."] [ right box invis "n" ; F: box ] ] spline -> from D.D0 right 1 then up then right ; box "Standard" "input" ht boxht*2 arrow from D.D1 right 1 then right ; box "Standard" "output" ht boxht*2 spline -> from D.D2 right 1 then down then right ; box "Standard" "error" ht boxht*2 reset .PE .LE F File descriptors point to files used for standard input, standard output, and standard error. .PP All processes have three files open right from the start, by convention, even if they do not open a single file. These open files have the file descriptors 0, 1, and 2. As you could see, file descriptor 1 is used for data output and is called .B "standard output" , File descriptor 0 is used for data input and is called .B "standard input" , File descriptor 2 is used for diagnostic (messages) output and is called .B "standard error" . .PP To read an open file, you may call .CW read . 
.ix [read] Here is the function declaration: .P1 ; sig read long read(int fd, void *buf, long nbytes) .P2 .LP It reads at most .CW nbytes bytes from file descriptor .CW fd and places the bytes read at the address pointed to by .CW buf . The number of bytes read is the value returned. Read does not guarantee that we would get as many bytes as we want; it reads what it can and lets us know. This program reads some bytes from standard input and later writes them to standard output. .so progs/read.c.ms .ix [read.c] .LP And here is how it works: .P1 ; 8.read from stdin, to stdout! \fI If you type this \fP from stdin, to stdout! \fI the program writes this\fP .P2 .LP When you run the program it calls .CW read , which waits until there is something to read. When you type a line and press return, the window gives the characters you typed to the program. They are stored by .CW read at .CW buffer , and the number of bytes that it could read is returned and stored at .CW nr . Later, the program uses .CW write to write that many bytes into standard output, echoing what we wrote. .PP Many of the Plan 9 programs that accept file names as arguments work with their standard input when given no arguments. Try running .CW cat . .P1 ; cat .I "...it waits until you type something .P2 .LP It reads what you type and writes a copy to its standard output .ix [cat] .P1 ; cat from stdin, to stdout! \fI If you type this \fP from stdin, to stdout! \fI cat writes this\fP and again and again \fBcontrol-d\fP ; .P2 .LP until reaching the end of the file. The end of file for a keyboard? There is no such thing, but you can pretend there is. When you type a .I control-d by pressing the .CW d key while holding down .I Control , .ix "control-d" the program reading from the terminal gets an end of file. .PP Which file is standard input? And output? Most of the time, standard input, standard output, and standard error go to .CW /dev/cons .
.ix console .ix "standard input .ix "standard output .ix "standard error This file represents the .I console for your program. Like many other files in Plan 9, this is not a real (disk) file. It is the interface to use the device that is known as the console, which corresponds to your terminal. When you read this file, you obtain the text you type on the keyboard. When you write this file, the text is printed on the screen. .PP When used within the window system, .CW /dev/cons .ix [/dev/cons] .ix "window corresponds to a fake console invented just for your window. The window system takes the real console for itself, and provides each window with a virtual console, that can be accessed via the file .CW /dev/cons within each window. We can rewrite the previous program so that it opens this file itself. .so progs/read2.c.ms .ix [read.c] .LP This program behaves exactly like the previous one. You are invited to try. To open a file, you must call .CW open .ix [open] .ix "file name .ix path .ix "open mode specifying the file name (or its path) and what you want to do with the open file. The integer constant .CW ORDWR .ix "[ORDWR] open~mode means to open the file for both reading and writing. This function returns a new file descriptor to let you call .CW read .ix [read] or .CW write .ix [write] for the newly opened file. The descriptor is a small integer that we store into .CW fd , to use it later with .CW read and .CW write . Figure [[!descriptors opening!]] shows the file descriptors for the process running this program after the call to .CW open . It assumes that the file descriptor for the new open file was 3.
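.PP
The open, read, write, and close sequence just described can be sketched as follows. This is only an illustration, written in POSIX C for a UNIX-like system and not in Plan 9 C: the constant
.CW O_RDONLY
stands in for
.CW OREAD ,
and the function name
.CW echo_once
and its parameters are our own invention, not part of any library. It opens the named file, reads at most one buffer from it, writes what it got to an already open descriptor, and closes the file it opened.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * POSIX sketch (assumption: a UNIX-like system, not Plan 9).
 * Opens inpath for reading, reads at most one buffer from it,
 * and writes the bytes read to the open descriptor outfd.
 * Returns the number of bytes written, or -1 on error.
 * echo_once is a hypothetical helper used only for illustration.
 */
long
echo_once(const char *inpath, int outfd)
{
	char buffer[1024];
	int fd;
	long nr, nw;

	assert(inpath != 0);
	fd = open(inpath, O_RDONLY);	/* OREAD in Plan 9 */
	if(fd < 0)
		return -1;
	/* read returns what it could read, perhaps less than asked for */
	nr = read(fd, buffer, sizeof buffer);
	nw = -1;
	if(nr >= 0)
		nw = write(outfd, buffer, nr);
	close(fd);			/* releases the descriptor */
	return nw;
}
```

Calling it with the path of the console file and descriptor 1 would echo one line, much like the program above; passing any other path reads from that file instead.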
.LS .PS right boxwid=1 boxht=.2 circlerad=.5 circle "Process" spline -> right 1 then down "File descriptor" "table" D: [ down [ right box invis "0" ; F: box ] D0: last [].F [ right box invis "1" ; F: box ] D1: last [].F [ right box invis "2" ; F: box ] D2: last [].F [ right box invis "3" ; F: box ] DN: last [].F [ right box invis ; box invis "..."] [ right box invis "n" ; F: box ] ] move right 2 ; C: box "\f(CW/dev/cons\fP" CC: circle invis at C spline -> from D.D1 right 1 then to CC chop spline -> from D.D0 right 1 then to CC chop spline -> from D.D2 right 1 then to CC chop spline -> from D.DN right 1 then to CC chop reset .PE .LE F File descriptors for the program after opening \f(CW/dev/cons\fP. .PP When the file is no longer useful for the program, it can be closed. This is achieved by calling .CW close , .ix [close] which releases the file descriptor. In our program, we could have open .CW /dev/cons several times, one for reading and one for writing .P1 infd = open("/dev/cons", OREAD); outfd = open("/dev/cons", OWRITE); .P2 .LP using the integer constants .CW OREAD and .CW OWRITE , .ix "[OREAD] open~mode .ix "[OWRITE] open~mode that specify that the file is to be open only for reading or writing. But it seemed better to open the file just once. .PP The file interface provided for each process in Plan 9 has a file that provides the list of open file descriptors for the process. For example, to know which file descriptors are open in the shell we are using we can do this. 
.ix "process [fd] file .ix "file descriptor .ix [$pid] .P1 .ps -2 ; cat /proc/$pid/fd /usr/nemo 0 r M 94 (0000000000000001 0 00) 8192 18 /dev/cons 1 w M 94 (0000000000000001 0 00) 8192 2 /dev/cons 2 w M 94 (0000000000000001 0 00) 8192 2 /dev/cons 3 r c 0 (0000000000000002 0 00) 0 0 /dev/cons 4 w c 0 (0000000000000002 0 00) 0 0 /dev/cons 5 w c 0 (0000000000000002 0 00) 0 0 /dev/cons 6 rw | 0 (0000000000000241 0 00) 65536 38 #|/data 7 rw | 0 (0000000000000242 0 00) 65536 81320369 #|/data1 8 rw | 0 (0000000000000281 0 00) 65536 0 #|/data 9 rw | 0 (0000000000000282 0 00) 65536 0 #|/data1 10 r M 10 (00003b49000035b0 13745 00) 8168 512 /rc/lib/rcmain 11 r M 94 (0000000000000001 0 00) 8192 18 /dev/cons ; .ps +2 .P2 .LP The first line reports the current working directory for the process. .ix "current directory Each other line reports a file descriptor open by the process. Its number is listed on the left. As you could see, our shell has descriptors 0, 1, and 2 open (among others). All these descriptors refer to the file .CW /dev/cons , whose name is listed on the right for each descriptor. Another interesting information is that the descriptor 0 is open just for reading (\f(CWOREAD\fP), because there is an .ix "[OREAD] open~mode .CW r listed right after the descriptor number. And as you can see, both standard output and error are open just for writing (\f(CWOWRITE\fP), because there is a .CW w .ix "[OWRITE] open~mode printed after the descriptor number. The .CW /proc/$pid/fd file is a useful information to track bugs related to file descriptor problems. Which descriptors has the typical process open? If you are skeptic, this program might help. .so progs/sleep.c.ms .ix [sleep.c] .ix [sleep] .LP It prints its PID, and hangs around for one hour. After running this program .P1 ; 8.sleep process pid is 1413. have fun. .I "...and it hangs around for one hour." .P2 .LP we can use another window to inspect the file descriptors for the process. 
.P1 .ps -2 ; cat /proc/1413/fd /usr/nemo/9intro 0 r M 94 (0000000000000001 0 00) 8192 87 /dev/cons 1 w M 94 (0000000000000001 0 00) 8192 936 /dev/cons 2 w M 94 (0000000000000001 0 00) 8192 936 /dev/cons 3 r c 0 (0000000000000002 0 00) 0 0 /dev/cons 4 w c 0 (0000000000000002 0 00) 0 0 /dev/cons 5 w c 0 (0000000000000002 0 00) 0 0 /dev/cons 6 rw | 0 (0000000000000241 0 00) 65536 38 #|/data 7 rw | 0 (0000000000000242 0 00) 65536 85044698 #|/data1 8 rw | 0 (0000000000000281 0 00) 65536 0 #|/data 9 rw | 0 (0000000000000282 0 00) 65536 0 #|/data1 .ps +2 .P2 .LP Your process has descriptors 0, 1, and 2 open, as they should be. However, it seems that there are many other ones open as well. That is why you cannot assume that the first file you open in your program is going to obtain the file descriptor number 3. It might already be open. You better be aware. .PP There is one legitimate question still pending. After we open a file, how does .CW read know from where in the file it should read? The function knows how many bytes we would like to read at most. But its parameters tell nothing about the .I offset in the file where to start reading. And the same question applies to .CW write as well. .PP The answer comes from .CW open , Each time you open a file, the system keeps track of a .B "file offset" for that open file, to know the offset in the file where to start working at the next .CW read or .CW write . Initially, this file offset is zero. When you write, the offset is advanced the number of bytes you write. When you read, the offset is also advanced the number of bytes you read. Therefore, a series of writes would store bytes .I sequentially , .ix "sequential access one write at a time, each one right after the previous one. And the same happens while reading. .PP The offset for a file descriptor can be changed using the .CW seek .ix [seek] system call. 
Its second parameter can be 0, 1, or 2 to let you change the offset to an absolute position, to a relative one counting from the old value, and to a relative one counting from the size of the file. For example, this sets the offset in .CW fd to be 10: .P1 seek(fd, 10, 0); .P2 .LP This advances the offset 5 bytes ahead: .P1 seek(fd, 5, 1); .P2 .LP And this moves the offset to the end of the file: .P1 seek(fd, 0, 2); .P2 .LP We did not use the return value from .CW seek , but it is useful to know that it returns the new offset for the file descriptor. .ix offset .BS 2 "Write games .LP This program is a variant of the first one in this chapter, but writes the salutation to a regular file, and not to the console. .so progs/fhello.c.ms .ix [fhello.c] .LP We can create a file to play with by copying .CW /NOTICE .ix [/NOTICE] to .CW afile , and then run this program to see what happens. .P1 ; cp /NOTICE afile ; 8.fhello .P2 .LP This is what was at .CW /NOTICE : .P1 ; cat /NOTICE Copyright © 2002 Lucent Technologies Inc. All Rights Reserved ; .P2 .LP and this is what is in .CW afile : .P1 ; cat afile hello ght © 2002 Lucent Technologies Inc. All Rights Reserved .P2 .LP At first sight, it seems that something weird happened. The file has one .ix "new line extra line. However, part of the original text has been lost. These two things seem contradictory but they are not. Using .CW xd may reveal what happened: .P1 ; xd -c afile 0000000 h e l l o \en g h t c2 a9 2 0 0 0000010 2 L u c e n t T e c h n o l 0000020 o g i e s I n c . \en A l l R 0000030 i g h t s R e s e r v e d \en 000003f ; xd -c /NOTICE 0000000 C o p y r i g h t c2 a9 2 0 0 0000010 2 L u c e n t T e c h n o l 0000020 o g i e s I n c . \en A l l R 0000030 i g h t s R e s e r v e d \en 000003f .P2 .LP Our program opened .CW afile , which was a copy of .CW /NOTICE , and then it wrote “\f(CWhello\en\fP”. After the call to .CW open , .ix [open] the file offset for the new open file was set to zero.
This means that .CW write .ix [write] wrote 6 bytes into .CW afile starting at offset 0. The first six bytes in the file, which contained “\f(CWCopyri\fP”, have been overwritten by our program. But .CW write did just what it was expected to do. Write 6 bytes into the file starting at the file offset (0). Nothing more, nothing less. It does not truncate the file (it shouldn't!). It does not .I insert . It just writes. .PP If we change the program above, adding a second call to .CW write , so that it executes this code .P1 write(fd, "hello\en", 6); write(fd, "there\en", 6); .P2 .LP we can see what is inside .CW afile after running the program. .P1 ; cat afile hello there 2002 Lucent Technologies Inc. All Rights Reserved .P2 .P1 ; xd -c afile 0000000 h e l l o \en t h e r e \en 2 0 0 0000010 2 L u c e n t T e c h n o l 0000020 o g i e s I n c . \en A l l R 0000030 i g h t s R e s e r v e d \en 000003f .P2 .ix [xd] .LP After the first call to .CW write , the file offset was 6. Therefore, the second write happened .ix "file offset starting at offset 6 in the file. And it wrote six more bytes. Once more, it did just its job: write bytes. The file length is the same. The number of lines changed because the number of newline characters in the file changed. The console advances one line each time it encounters a newline, but it is just a single byte. .PP Figure [[!file offset!]] shows the elements involved in writing this file, after the first call to .CW write , and before the second call. The file descriptor, which we assume was 3, points to a data structure containing information about the open file. This data structure keeps the file offset, to be used for the following .CW read or .CW write operation, and records what the file was open for, e.g., .CW OWRITE . .ix "[OWRITE] open~mode Plan 9 calls this data structure a .CW Chan (Channel), .ix [Chan] and there is one per file in use in the system.
Besides the offset and the open mode, it contains all the information needed to let the kernel reach the file server and perform operations on the file. Indeed, a Chan is just something used by Plan 9 to speak to a server regarding a file. This may require doing remote .ix 9P .ix "file server .ix "channel procedure calls across the network, but that is up to your kernel, and you can forget it. .LS .PS .CW down boxwid=.2 boxht=.2 circle rad .4 "\fRProcess\fP" line -> down " \fRFile descriptor\fP" ljust " \fRtable\fP" ljust D: [ down [ right box invis "0" ; F: box wid 1 ] D0: last [].F [ right box invis "1" ; F: box wid 1 ] D1: last [].F [ right box invis "2" ; F: box wid 1 ] D2: last [].F [ right box invis "3" ; F: box wid 1 ] D3: last [].F [ right box invis ; box invis wid 1 "..."] [ right box invis "n" ; F: box wid 1 ] ] arrow -> from D.D3 right 1 C: box wid 1.5 ht 3*boxht down X: [ down O: box invis "offset: 6" ljust box invis "mode: OWRITE " ljust F: box invis "file: " ljust ] with .nw at C.nw line invis from X.F.e right .5 A: [ spline -> right then down then left then down ] with .nw at last line.e H: [ right .R box "h" box "e" box "l" box "l" box "o" box "\en" S: box wid .6 "..." .R ] with .nw at A.sw box invis wid .75 "afile" line invis from X.O.e right 1; spline -> right then right then to H.S.nw dotted box invis "\fRChan\fP" with .sw at C.nw .R reset .PE .LE F The file offset for next operations is kept separate from the file descriptor. .PP We can use .CW seek .ix [seek] to write at a particular offset in the file. For example, the following code writes starting at offset 10 into our original version of .CW afile . .P1 int fd; fd = open("afile", OWRITE); seek(fd, 10, 0); write(fd, "hello\en", 6); close(fd); .P2 .LP The contents of .CW afile have six bytes changed, as it could be expected. .ix [xd] .P1 ; xd -c afile 0000000 C o p y r i g h t h e l l o \en 0000010 2 L u c e n t T e c h n o l 0000020 o g i e s I n c . 
\en A l l R 0000030 i g h t s R e s e r v e d \en 000003f .P2 .LP How can we write new contents into .CW afile , getting rid of anything that could be in the file before we write? Simply by specifying to .CW open that we want to .B truncate the file besides opening it. To do so, we can do a bit-or of the desired open mode and .CW OTRUNC , .ix "[OTRUNC] open~mode a flag that requests file truncation. This program does so, and writes a new string into our file. .so progs/thello.c.ms .ix [thello.c] .LP After running this program, .CW afile contains just the 6 bytes we wrote: .P1 ; 8.thello ; cat afile hello ; .P2 .LP The call to .CW open caused the file .CW afile to be truncated. It was then empty and open for writing, and the offset for the next file operation was zero. Then, .CW write wrote 6 bytes, at offset zero. At last, we closed the file. .PP What would the following program do to our new version of .CW afile ? .so progs/seekhello.c.ms .ix [seek] .ix [seekhello.c] .LP All system calls are very obedient. They do just what they are asked to do. The call to .CW seek changes the file offset to 32. Therefore, .CW write must write six bytes at offset 32. This is the output for .CW ls and .CW xd on the new file after running this program: .P1 ; 8.seekhello ; ls -l afile --r--r--r-- M 19 nemo nemo 38 Jul 9 18:14 afile ; xd -c afile 0000000 h e l l o \en 00 00 00 00 00 00 00 00 00 00 0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0000020 t h e r e \en 0000026 .P2 .LP The size is 38 bytes. That is the offset before .CW write , .ix [write] 32, plus the six bytes we wrote. In the contents you see how all the bytes that we did not write were set to zero by Plan 9. And we know a new thing: The size of a file corresponds to the highest file offset ever written on it. .PP A variant of this program can be used to create files of a given size. To create a 1 Gigabyte file you do not need to write that many bytes. A single write suffices with just one byte.
Of course, that write must be performed at an offset of 1 Gigabyte (minus 1 byte). .PP Creating large files in this way is different from writing all the zeroes yourself. First, it takes less time to create the file, because you make just a couple of system calls. Second, it can be that your new file does .I not consume all its space in the disk until you really use it. Because Plan 9 knows .ix "disk space the new size of the file, and it knows you never wrote most of it, it can just record the new size and allocate disk space only for the things you really wrote. Reading other parts of the file yields just zeroes. There is no need to store all those zero bytes in the disk. .PP This kind of file (i.e., one created using .CW seek and .CW write ), is called a .B "file with holes". The name comes from considering that the file has “holes” in it, where you never wrote anything. Of course, the holes are not really stored in a disk. It is funny to be able to store files for a total amount of bytes that exceeds the disk capacity, but now you know that this can happen. .PP To append some data to a file, we can use .CW seek to set the offset at the end of the file before calling write, like in .P1 fd = open("afile", OWRITE); seek(fd, 0, 2); // move to the end write(fd, bytes, nbytes); .P2 .LP For some files, like log files used to append diagnostic messages, or mail folders, used to append mail messages, writing should always happen at the end of the file. In this case, it is more appropriate to use an .B "append only" permission bit supported by the Plan 9 file server: .ix [chmod] .ix "[chmod] flag~[+a] .P1 .ps -1 ; chmod +a /sys/log/diagnostics ; ls -l /sys/log/diagnostics a-rw-r--r-- M 19 nemo nemo 0 Jul 10 01:11 /sys/log/diagnostics .ps +1 .P2 .LP This guarantees that any write will happen at the end of existing data, no matter what the offset is. Doing a .CW seek in all programs using this file might not suffice.
If there are multiple machines writing to this file, each machine would keep its own offset for the file. Therefore, there is some risk of overwriting some data in the file. However, using the .CW +a permission bit fixes this problem once and for all. .BS 2 "Read games .LP To read a file it does not suffice to call .CW read once. This point may be missed when using this function for the first few times. The problem is that .CW read does not guarantee that all the bytes in the file could be read in the first call. For example, early in this chapter we read from the console. Before typing a line, there is no way for .CW read to obtain its characters. The result is that, when reading from the console, our program read one line at a time. If we change the program to read from a file on a disk, it will probably read as much as fits in the buffer we supply for reading. .PP Usually, we are supposed to call .CW read until there is nothing more to read. That happens when the number of bytes read is zero. For example, this program reads the whole file .CW /NOTICE , and prints what it can read each time. The program is unrealistic, because usually you should employ a much larger read buffer. Memory is cheap these days. .so progs/nread.c.ms .ix [nread.c] .LP Although we did not check for error conditions in most of the programs in this chapter, this program does so. When .CW open fails, it returns .CW -1 . The program issues a diagnostic and terminates if that is the case. Also, after calling .CW read , it does not just check for .CW "nr == 0" , which means that there is nothing more to read. Instead, it checks for .CW "nr <= 0" , because .CW read returns .CW -1 when it fails. The call to .CW write might fail as well. It returns the number of bytes that could be written, and it is considered an error when this number differs from the one you specified. .BS 2 "Creating and removing files .LP The .CW create .ix [create] .ix "file creation system call creates one file.
It is very similar to .CW open . .ix [open] After creating the file, it returns an open file descriptor for the new file, using the specified mode. It accepts the same parameters used for open, plus an extra one used to specify permissions for the new file encoded as a single integer. .PP This program creates its own version of .CW afile , without placing on us the burden of creating it. It does not check errors, because it is just an example. .so progs/create.c.ms .ix [create.c] .LP To test it, we remove our previous version for .CW afile , run this program, and ask .CW ls and .CW cat to print information about the file and its contents. .P1 ; rm afile ; ls afile ls: afile: 'afile' file does not exist ; 8.create ; ls -l afile --rw-r--r-- M 19 nemo nemo 11 Jul 9 18:39 afile ; cat afile a new file .P2 .LP In fact, there was no need to remove .CW afile before running the program. If the file being created exists, .CW create .ix "truncate truncates it. If it does not exist, the file is created. In either case, we obtain a new file descriptor for the file. .PP Directories can be created by doing a bit-or of the integer constant .ix "directory creation .ix [DMDIR] .CW DMDIR with the rest of the permissions given to .CW create . This sets a bit (called DMDIR) in the integer used to specify permissions, and the system creates a directory instead of a file. .P1 fd = create("adir", OREAD, DMDIR|0775); .P2 .LP You cannot write into directories. That would be dangerous. Instead, when you create and remove files within the directory, Plan 9 updates the contents of the directory file for you. If you modify the previous program to try to create a directory, you must remove the line calling .CW write . But you should still close the file descriptor. .PP Removing a file is simple. The system call .CW remove .ix [remove] .ix "file deletion removes the named file. This program is similar to .CW rm . 
.ix [rm] .so progs/rm.c.ms .ix [rm.c] .LP It can be used like the standard .I rm (1) tool, to get rid of multiple files. When .CW remove fails it alerts the user of the problem. .P1 ; 8.rm rm.8 x.c afile 8.rm: 'x.c' file does not exist .P2 .LP Like other calls, .CW remove returns .CW -1 .ix "system call error when it fails. In this case we print the program name (\f(CWargv[0]\fP) and the error string. That suffices to let the user know what happened and .ix "error string take any appropriate action. Note how the program iterates through command line arguments starting at 1. Otherwise, it would remove itself! .PP A directory that is not empty, and contains other files, cannot be removed using .ix "empty directory .CW remove . To remove it, you must remove its contents first. Plan 9 could remove the whole file tree rooted at the directory, but it would be utterly dangerous. Think about .CW "rm /" . The system command .CW rm accepts option .CW -r .ix "[rm] flag~[-r] to recursively descend the named file and remove it and all of its contents. It must be used with extreme caution. When a file is removed, it is gone. There is nothing you can do to bring it back to life. Plan 9 does not have a .I wastebasket . .ix wastebasket If you are not sure about removing a file, just don't do it. Or move it to .CW /tmp or to some other place where it does not get in your way. .PP Now that we can create and remove files, it is interesting to see if a file does exist. This could be done by opening the file just to see if we can. However, it is more appropriate to use a system call intended just to check if we can access a file. It is called, perhaps surprisingly, .CW access .
.ix [access] .ix "checking~for access For example, this code excerpt aborts the execution of its program when the file name in .CW fname does not exist: .P1 if (access(fname, AEXIST) < 0) sysfatal("%s does not exist", fname); .P2 .LP .ix "[AEXIST] access~mode The second parameter is an integer constant that indicates what you want .CW access to check the file for. For example, .CW AWRITE .ix "[AWRITE] access~mode checks that you could open the file for writing, .CW AREAD .ix "[AREAD] access~mode does the same for reading, and .CW AEXEC .ix "[AEXEC] access~mode does the same for executing it. .BS 2 "Directory entries .LP Files have data. There are many examples above using .CW cat and .CW xd to retrieve the data stored in a file. Besides, files have .B metadata , i.e., data about the data. File metadata is simply what the system needs to know about the file to be able to implement it. File metadata includes the file name, the file size, the time for the last modification to the file, the time for the last access to the file, and other attributes for the file. Thus, file metadata is also known as .B "file attributes" . .PP Plan 9 stores attributes for a file in the directory that contains the file. Thus, the data structure that contains file metadata is known as a .B "directory entry" . A directory contains just a sequence of entries, each one providing the attributes for a file contained in it. Let's see this in action: .P1 ; lc ; cat . ; .P2 .LP An empty directory is an empty file. .P1 ; touch onefile ; xd -c . 0000000 B 00 M 00 13 00 00 00 00 00 00 00 00 bf a1 01 0000010 00 00 00 00 00 a4 01 00 00 \er I b1 D \er I b1 0000020 D 00 00 00 00 00 00 00 00 07 00 o n e f i 0000030 l e 04 00 n e m o 04 00 n e m o 04 00 0000040 n e m o 0000044 .P2 .LP After creating .CW onefile in this empty directory, we see a whole bunch of bytes in the directory.
Nothing that we could understand by looking at them, although you can see how there are several strings, including .CW nemo and .CW onefile within the data kept in the directory. .PP For each file in the directory, there is an entry in the directory to describe the file. The format is independent of the architecture used, which means that the format .ix "architecture independent .ix "network format is the same no matter the machine that stored the file. Because the machine using the directory (e.g., your terminal) may differ from the machine keeping the file (e.g., your file server), this is important. Each machine could use a different format to encode integers, strings, and other data types. .PP We can double-check our belief by creating a second file in our directory. After doing so, the directory has twice the size: .P1 ; touch another ; xd -c . 0000000 B 00 M 00 13 00 00 00 00 00 00 00 00 c0 a1 01 0000010 00 00 00 00 00 a4 01 00 00 ! I b1 D ! I b1 0000020 D 00 00 00 00 00 00 00 00 07 00 a n o t h 0000030 e r 04 00 n e m o 04 00 n e m o 04 00 0000040 n e m o B 00 M 00 13 00 00 00 00 00 00 00 0000050 00 bf a1 01 00 00 00 00 00 a4 01 00 00 \er I b1 0000060 D \er I b1 D 00 00 00 00 00 00 00 00 07 00 o 0000070 n e f i l e 04 00 n e m o 04 00 n e 0000080 m o 04 00 n e m o 0000088 .P2 .LP When programming in C, there are convenience functions that convert this portable (but not amenable) data structure into a C structure. 
The C data type declared in .CW libc.h .ix [libc.h] .ix "C library that describes a directory entry is as follows: .P1 typedef struct Dir { /* system-modified data */ ushort type; /* server type */ uint dev; /* server subtype */ /* file data */ Qid qid; /* unique id from server */ ulong mode; /* permissions */ ulong atime; /* last read time */ ulong mtime; /* last write time */ vlong length; /* file length */ char *name; /* last element of path */ char *uid; /* owner name */ char *gid; /* group name */ char *muid; /* last modifier name */ } Dir; .P2 .ix [Dir] .LP From the shell, we can use .CW ls .ix [ls] to obtain most of this information. For example, .P1 ; ls -lm onefile [nemo] --rw-r--r-- M 19 nemo nemo 0 Jul 9 19:24 onefile .P2 .IP • The file name is .CW onefile . The field .CW name .ix "file name within the directory entry is a string with the name. Just with the name. An absolute path to refer to this file would include all the names from that of the root directory down to the file; each component separated by a slash. But the file name is just .CW onefile . .IP • The times for the last access and for the last modification of the file (this one printed by .CW ls ) are kept at .CW atime .ix "file access time and .CW mtime .ix [mtime] .ix "file modification time respectively. These dates are codified in seconds since the epoch, as we saw for .CW /dev/time . .ix [/dev/time] .IP • The length for the file is zero. This is stored at field .CW length .ix "file length in the directory entry. The file is owned by user .CW nemo .ix "file owner .ix "file group .ix "permissions and belongs to the group .CW nemo . These values are stored as string, using the fields .CW uid .ix [uid] .ix "user id (user id) and .CW gid .ix [gid] .ix "group id (group id) respectively. .IP • The field .CW mode .ix "file mode records the file permissions, also known as the mode (that is why .CW chmod .ix [chmod] has that name, for “change mode”). 
Permissions are encoded in a single integer, as we saw. For .ix "octal permissions this file, the mode would be .CW 0644 . .IP • The file was last modified by user .CW nemo , and this value is encoded as a string in the directory entry, using field .CW muid (modification user id). .ix "modification user id .IP • The fields .CW type , .CW dev , and .CW qid .ix QID identify the file. They deserve a separate explanation of their own that we defer for now. .LP To obtain the directory entry for a file, i.e., its attributes, we can use .CW dirstat . .ix [dirstat] This function uses the actual system call, .CW stat , .ix [stat] to read the data, and returns a .CW Dir .ix [Dir] structure that is more convenient to use in C programs. This structure is stored in dynamic memory allocated with .CW malloc .ix [malloc] by .CW dirstat , and the caller is responsible for calling .CW free .ix [free] on it. .PP The following program gives some information about .CW /NOTICE , nothing that .CW ls could not do, and produces this output when run: .P1 ; 8.stat file name: NOTICE file mode: 0444 file size: 63 bytes ; .P2 .so progs/stat.c.ms .ix [stat.c] .ix [stat] .LP Note that the program called .CW free only once, for the whole .CW Dir . .ix [Dir] The strings pointed to by fields in the structure are stored along with the structure itself in the same .CW malloc -allocated memory. Calling .CW free once suffices. .PP An alternative to using this function is using .CW dirfstat , .ix [dirfstat] which receives a file descriptor instead of a file name. This function calls .CW fstat , .ix [fstat] which is another system call similar to .CW stat .ix [stat] (but receiving a file descriptor instead of a file name). Which one to use depends on what you have at hand: a name or a file descriptor. .PP Because directories contain directory entries, reading from a directory is very similar to what we have just done. 
The function .CW read .ix "directory read can be used to read directories as well as files. The only difference is that the system will read only an integral number of directory entries. If one more entry does not fit in the buffer you supply to .CW read , it will have to wait until you read again. .PP The entries are stored in the directory in a portable, machine-independent, but not amenable, format. Therefore, instead of using .CW read , it is more convenient to use .CW dirread . .ix [dirread] This function calls .CW read to read the data stored in the directory. But before returning to the caller, it .I unpacks .ix "network format them into a more convenient array of .CW Dir structures. .PP As an example, the next program lists the current directory, using .CW dirread to obtain the entries in it. .PP Running the program yields the following output. As you can see, the directory was being used to keep a few C programs and compile them. .P1 ; 8.lsdot 8.lsdot create.8 create.c lsdot.8 lsdot.c ; .P2 .so progs/lsdot.c.ms .ix [lsdot.c] .LP The array of directory entries is returned from .CW dirread using a pointer parameter passed by reference (we know, C passes all parameters by value; the function receives a pointer to the pointer). This array is allocated by .CW dirread using .CW malloc , like before. Therefore, the caller must call .CW free (once) to release this memory. The number of entries in the array is the return value for the function. Like .CW read would do, when there are no more entries to be read, the function returns zero. .PP Sometimes it is useful to change file attributes. For example, changing the length to zero may truncate the file. A rename within the same directory can be achieved by changing the name in the directory entry. Permissions can be changed by updating the mode in the directory entry. Some of the attributes cannot be updated. 
For example, it is illegal to change the modification type, or any of the .CW type , .CW dev , and .CW qid fields. .PP The function .CW dirwstat .ix [dirwstat] is the counterpart of .CW dirstat . .ix [dirstat] It works in a similar way, but instead of reading the attributes, it updates them. New values for the update are taken from a .CW Dir structure given as a parameter. However, the function ignores any field set to a null value, to allow you to change just one attribute, or a few. Beware that zero is not a null value for some of the fields, because it would be a perfectly legal value for them. The function .CW nulldir is to be used to null all of the fields in a given .CW Dir . .PP Here is an example. The next program is similar to .CW chgrp (1), change group, .ix [chgrp] and can be used to change the group for a file. The .CW main function iterates through the file name(s) and calls a .CW chgrp function to do the actual work for each file. .so progs/chgrp.c.ms .ix [chgrp.c] .LP The interesting part is the implementation of the .CW chgrp function. It is quite simple. Internally, .CW dirwstat .I packs the structure into the portable format, and calls .CW wstat .ix [wstat] (the actual system call). As a remark, there is also a .CW dirfwstat .ix [dirfwstat] variant that receives a file descriptor instead of a file name. It is the counterpart of .CW dirfstat and uses the .CW fwstat .ix [fwstat] system call. Other attributes in the directory entry can be updated as done above for the group id. .LP The resulting program can be used like the real .I chgrp (1) .P1 ; 8.chgrp planb chgrp.c chgrp.8 ; ls -l chgrp.c chgrp.8 --rw-r--r-- M 19 nemo planb 1182 Jul 10 12:09 chgrp.8 --rw-r--r-- M 19 nemo planb 377 Jul 10 12:08 chgrp.c ; .P2 .BS 2 "Listing files in the shell .LP It may be a surprise to find out that there is now a section with this title. You know all about listing files. 
It is a matter of using .CW ls .ix [ls] .ix [lc] .ix "file list .ix "directory list and other related tools. Well, there is something else. The shell on its own knows how to list files, to help you type names. Look at this session: .P1 ; cd $home ; lc bin lib tmp ; echo * bin lib tmp .P2 .LP First, we used .CW lc to list our home. Later, we used just the shell. It is clear that .CW echo is simply echoing its arguments. It knows nothing about listing files. Therefore, the shell had to supply .CW bin , .CW lib , and .CW tmp , as the arguments for .CW echo (instead of supplying the “\f(CW*\fP”). Either the shell or echo had to be responsible for this behavior. There is no magic, and no other program was involved on this command line. .PP The shell gives special meaning to certain characters (we already saw two: “\f(CW$\fP”, and “\f(CW'\fP”). One of them is “\f(CW*\fP”. When a command line contains a word that is “\f(CW*\fP”, it is replaced with the names for all the files in the current directory. Indeed, “\f(CW*\fP” works for all directories: .P1 ; lc bin 386 rc ; echo bin/* bin/386 bin/rc ; .P2 .LP .ix [echo] .ix "shell variable .ix "environment variable .ix "variable expansion In this case, the shell replaced .CW bin/* with two names before running echo: .CW bin/386 and .CW bin/rc . This is called .B globbing , and it works as follows. When the shell reads a command line, it looks for .B "file name patterns" . A pattern is an expression that describes file names. It can be just a file name, but useful patterns can include special characters like “\f(CW*\fP”. The shell replaces the pattern with all file names .B matching the pattern. .PP For example, .CW * .ix "[*] pattern matches with any sequence of characters not containing “\f(CW/\fP”. 
Therefore, in this directory .P1 ; lc bin book lib tmp .P2 .LP the pattern .CW * matches with .CW bin , .CW book , .CW lib , and .CW tmp : .P1 ; echo * bin book lib tmp .P2 .LP The pattern .CW b* matches with any file name that has an initial “\f(CWb\fP” followed by “\f(CW*\fP”, i.e., followed by anything. This means .P1 ; echo b* bin book .P2 .LP The pattern .CW *i* matches with anything, then an .CW i , and then anything: .P1 ; echo *i* bin lib .P2 .LP Another example .P1 ; echo *b* bin book lib .P2 .LP showing that the part of the name matched by .CW * can also be an empty string! Patterns like this one mean .I "the file name has a .CW b .I "in it" . .PP Patterns may appear within path names, to match against .ix "file name different levels in the file tree. For example, we might want to search for the file containing .CW ls , and this would be a brute force approach: .P1 ; ls /ls ls: /ls: '/ls' file does not exist .P2 .LP Not there. Let's try one level down .P1 ; ls /*/ls /bin/ls .P2 .LP Found! But let's assume it was not there either. .ix "file searching .P1 ; ls /*/*/ls .P2 .LP It might be at .CW /usr/bin/ls . Not in a Plan 9 system, but we did not know. Each .CW * in the pattern .CW /*/*/ls matches with any file name. Therefore, this pattern means .I "any file named .CW ls , .I "inside any directory, which is inside any directory that .I "is found at .CW / . .PP This mechanism is very powerful. For example, this directory contains a lot of source and object files. We can use a pattern to remove just the object files. .P1 ; lc 8.out echo.c err.c open.c echo.8 err.8 open.8 sleep.c ; rm *.8 .P2 .LP The shell replaced the pattern .CW *.8 with any file name terminated with .CW .8 . .ix [rm] Therefore, .CW rm received as arguments all the names for object files. .P1 ; lc 8.out echo.c err.c open.c sleep.c .P2 .LP Patterns may contain a “\f(CW?\fP”, which matches a single character. .ix "[?] 
pattern For example, we know that the linkers generate output files named .CW 8.out , .CW 5.out , etc. This removes any temporary binary that we might have in the directory: .P1 ; rm ?.out .P2 .LP Any file name containing a single character, and then .CW .out , matches this pattern. The shell replaces the pattern with appropriate file names, and then executes the command line. If no file name matches the pattern, the pattern itself is untouched by the shell and used as the command argument. After the previous command, if we try again .P1 ; rm ?.out rm: ?.out: '?.out' file does not exist .P2 .LP Another expression that may be used in a pattern is a series of characters between square brackets. It matches any single character within the brackets. For example, .ix "character range pattern instead of using .CW ?.out we might have used .CW [58].out in the command line above. The only file names matching this expression are .CW 5.out and .CW 8.out , which were the names we meant. .PP Another example. This lists any C source file (any string followed by a single dot, and then either a .CW c or an .CW h ). .P1 ; lc *.[ch] .P2 .LP As a shorthand, consecutive letters or numbers within the brackets may be abbreviated by using a .CW - between just the first and the last ones. An example is .CW [0-9] , which matches any single digit. .PP The directory .ix "file dump .ix "file archive .ix [/n/dump] .CW /n/dump keeps a file tree that uses names reflecting dates, to keep a copy of files in the system for each date. For example, .CW /n/dump/2002/0217 is the path for the dump (copy) made on February 17th, 2002. The command below uses a pattern to list directories for dumps made the 17th of any month not after June, in a year beyond 2000, but ending in 2 (i.e., just 2002 as of today). 
.P1 ; ls /n/dump/2*2/0[1-6]17 /n/dump/2002/0117 /n/dump/2002/0217 /n/dump/2002/0317 /n/dump/2002/0417 /n/dump/2002/0517 /n/dump/2002/0617 .P2 .LP In general, you concoct patterns to match on file names that may be of interest for you. The shell knows nothing about the meaning of the file names. However, you can exploit patterns in file names using file name patterns. Confusing? .PP To ask the shell not to touch a single character in a word that might be otherwise considered a pattern, the word must be quoted. For example, .ix "quoting .P1 ; lc bin lib tmp ; touch '*' ; echo * * bin lib tmp .P2 .LP Because the .CW * for .CW touch was quoted, the shell took it verbatim. It was not interpreted as a pattern. However, in the next command line it was used unquoted and taken as a pattern. Removing the funny file we just created is left as an exercise. But be careful. Remember what .ix [rm] .P1 ; rm * .P2 would do! .BS 2 "Buffered Input/Output .ix "buffered I/O .LP The interface provided by .CW open , .CW close , .CW read , and .CW write .ix [open] .ix [close] .ix [read] .ix [write] suffices many times to do the task at hand. Also, in many cases, it is just the more convenient interface for doing I/O to files. For example, .CW cat .ix [cat] must just write what it reads. It is just fine to use .CW read and .CW write for implementing such a tool. But, what if our program had to read one byte at a time? or one line at a time? We can experiment using the program below. It is a simple .CW cp , .ix [cp] .ix "file copy that copies one file into another, but using the size for the buffer that we supply as a parameter. .so progs/bcp.c.ms .ix [bcp.c] .LP We are going to test our new program using a file created just for this test. To create the file, we use .CW dd . This is a tool that is useful to copy bytes in a controlled way from one place to another (its name stands for .I "device to device" ). 
Using this command .ix "device to~device .ix [dd] .P1 ; dd -if /dev/zero -of /tmp/sfile -bs 1024 -count 1024 1024+0 records in 1024+0 records out ; ls -l /tmp/sfile --rw-r--r-- M 19 nemo nemo 1048576 Jul 29 16:20 /tmp/sfile .P2 .LP we create a 1 Mbyte file, with all its bytes set to zero. The option .ix "file creation .CW -if lets you specify the input file for .CW dd , i.e., where to read bytes from. In this case, we used .CW /dev/zero , which is a (fake!) file that seems to be an unlimited sequence of zeroes. Reading it would just return as many zeroes as bytes you tried to read, and it would never give an end of file indication. The option .CW -of lets you specify which file to use as the output. In this case, we created the file .CW /tmp/sfile , which we are going to use for our experiment. .PP This tool, .CW dd , reads from the input .ix "file block file one block of bytes after another, and writes each block read to the output file. A block is also known as a .I record , as the output from the program shows. In our case, we used .CW -bs (block size) to ask .CW dd to read blocks of 1024 bytes. We asked .CW dd to copy just 1024 blocks, using its .CW -count option. The result is that .CW /tmp/sfile has 1024 blocks of 1024 bytes each (therefore 1 Mbyte) copied from .CW /dev/zero . .ix [/dev/zero] .PP We are using a relic that comes from ancient times! Times when tapes and even weirder .ix tape artifacts were very common. Many such devices required programs to read (or write) one record at a time. Using .CW dd was very convenient to duplicate one tape onto another and similar things. Because it was not common to read or write partial records, the diagnostics printed by .CW dd show how many entire records were read (\f(CW1024\fP here), and how many bytes were read from a last but partial record (\f(CW+0\fP in our case). And the same for writing. Today, it is very common to always see \f(CW+0\fP for both the data read in, and the data written out. 
By the way, for our little experiment we could have used just .CW dd , instead of writing our own dumb version of it, but it seemed more appropriate to let you read the code to review file I/O once more. .PP So, what would happen when we copy our file using our default buffer size of 8Kbytes? .ix buffer .ix [time] .ix "performance .P1 ; time 8.bcp /tmp/sfile /tmp/dfile 0.01u 0.01s 0.40r 8.bcp /tmp/sfile /tmp/dfile .P2 .LP Using the command .CW time , to measure the time it takes for a command to run, we see that using an 8Kbyte buffer it takes 0.4 seconds of real time (\f(CW0.40r\fP) to copy a 1Mbyte file. As an aside, .CW time also reports that .CW 8.bcp spent 0.01 seconds executing its own code (\f(CW0.01u\fP) and 0.01 seconds executing inside the operating system (\f(CW0.01s\fP), .ix "user time .ix "system time .ix "elapsed time e.g., doing system calls. For the remaining 0.38 seconds of the 0.4-second total, the system was doing something else (perhaps executing other programs or waiting for the disk to read or write). .PP What would happen reading one byte at a time? (and writing it, of course). .P1 ; time 8.bcp -b 1 /tmp/sfile /tmp/dfile 9.01u 56.48s 755.31r 8.bcp -b 1 /tmp/sfile /tmp/dfile .P2 .LP Our program is .I "amazingly slow" ! It took 755.31 seconds to complete. That is 12.6 minutes, which is an eon for a computer. But it is the same program, we did not change anything. Just this time, we read one byte at a time and then wrote that byte to the output file. Before, we did the same but for a more reasonable buffer size. .PP Let's continue the experiment. What would happen if our program reads one line at a time? The source file does not have lines, but we can pretend that all lines have 80 characters of one byte each. .P1 ; time 8.bcp -b 80 /tmp/sfile /tmp/dfile 0.11u 0.74s 10.38r 8.bcp -b 80 /tmp/sfile /tmp/dfile .P2 .LP Things improved, but nevertheless we still need 10.38 seconds just to copy 1 Mbyte. 
What happens is that making a system call is not cheap; at least, it is very expensive when compared to making a procedure call. For a few calls, it does not matter at all. However, in this experiment it does. Using a buffer of just one byte means making 2,097,152 system calls! (1,048,576 to read bytes and 1,048,576 to write them). Using an 8Kbyte buffer requires just 256 calls (i.e., 1,048,576 / 8,192 = 128 to read and 128 to write). You can compare for yourself. In the intermediate experiment, reading one line at a time, it meant about 26,216 system calls. Not as many as 2,097,152, but still a lot. .PP How can we overcome this difficulty when we really need to write an algorithm that reads/writes a few bytes at a time? The answer, as you probably know, is just to use buffering. It does not matter if your algorithm reads one byte at a time. It does matter if you are making a system call for each byte you read. .PP The .I bio (2) .ix [bio] .ix "buffered I/O library in Plan 9 provides buffered input/output. This is an abstraction that, although not provided by the underlying Plan 9, is so common that you really must know how it works. The idea is that your program creates a Bio buffer for reading or writing, called a .CW Biobuf . .ix [Biobuf] Your program reads from the .CW Biobuf , by calling a library function, and the library will call .CW read .ix [read] only to refill the buffer each time you exhaust its contents. This is our (in)famous program, but this time we use Bio. .so progs/biocp.c.ms .ix [biocp.c] .LP The first change you notice is that to use Bio the header .CW bio.h .ix [bio.h] must be included. The data structure representing the Bio buffer is a .CW Biobuf . The program obtains two of them, one for reading the input file and one for writing the output file. The function .CW Bopen .ix [Bopen] is similar to .CW open , but returns a pointer to a .CW Biobuf instead of returning a file descriptor. 
.P1 ; sig Bopen Biobuf* Bopen(char *file, int mode) .P2 .LP Of course, .CW Bopen .I must call .CW open to open a new file. But the descriptor returned by the underlying call to .CW open is kept inside the .CW Biobuf , because only routines from .I bio (2) should use that descriptor. You are supposed to read and write from the .CW Biobuf . .PP To read from .CW bin , our input buffer, the program calls .CW Bread . This function is exactly like .CW read , but reads bytes from the buffer when it can, without calling .CW read . Therefore, .CW Bread does not receive a file descriptor as its first parameter; it receives a pointer to the .CW Biobuf used for reading. .P1 ; sig Bread long Bread(Biobufhdr *bp, void *addr, long nbytes) .P2 .LP The actual system call, .CW read , is used by .CW Bread .ix [read] .ix [Bread] only when there are no more bytes to be read from the buffer, e.g., because you already read it all. .PP To write bytes to a .CW Biobuf , the program uses .CW Bwrite . .ix [Bwrite] This is to .CW write what .CW Bread is to .CW read . .P1 ; sig Bwrite long Bwrite(Biobufhdr *bp, void *addr, long nbytes) .P2 .LP The call to .CW Bterm .ix [Bterm] releases a .CW Biobuf , .ix "[Biobuf] termination including the memory for the data structure. This closes the file descriptor used to reach the file, after writing any pending byte still sitting in the buffer. .P1 ; sig Bterm int Bterm(Biobufhdr *bp) .P2 .LP As you can see, both .CW Bterm and .CW Bflush .ix [Bflush] .ix "[Biobuf] flushing return an integer. That is how they report errors. They can fail because the file may not really be writable (e.g., because the disk is full), but you will only know when you try to write the file, which does not necessarily happen in .CW Bwrite . .PP How will our new program behave, now that it uses buffered input/output? Let's try it. 
.P1 ; time 8.biocp /tmp/sfile /tmp/dfile 0.00u 0.03s 0.38r 8.biocp /tmp/sfile /tmp/dfile ; time 8.biocp -b 1 /tmp/sfile /tmp/dfile 0.00u 0.13s 0.31r 8.biocp -b 1 /tmp/sfile /tmp/dfile ; time 8.biocp -b 80 /tmp/sfile /tmp/dfile 0.00u 0.02s 0.20r 8.biocp -b 80 /tmp/sfile /tmp/dfile .P2 .LP Always the same! Well, not exactly the same because there is always some uncertainty in every measurement. In this case, give or take 2/10ths of a second. But in any case, reading one byte at a time is far from taking 12.6 minutes. Bio took care of using a reasonable buffer size, and calling .CW read only when necessary, as we did by ourselves when using 8Kbyte buffers. .PP One word of caution. After calling .CW write , it is very likely that our bytes are already in the file, because there is probably no buffering between your program and the actual file. However, after a call to .CW Bwrite it is almost certain that your bytes are .I not in the file. They will be sitting in the .CW Biobuf , waiting for more bytes to be written, until a moment when it seems reasonable for a Bio routine to do the actual call to .CW write . This can happen either when you fill the buffer, or when you call .CW Bterm , which terminates the buffering. If you really want to flush your buffer, i.e., to send all the bytes in it to the file, you may call .CW Bflush . .P1 ; sig Bflush int Bflush(Biobufhdr *bp) .P2 .LP To play with this, and see a couple of other tools provided by Bio, we are going to reimplement our little .CW cat program but using Bio this time. .so progs/biocat.c.ms .ix [cat] .ix [biocat.c] .LP This program uses two .CW Biobufs , like the previous one. However, we now want one for reading from standard input, and another to write to standard output. Because we already have file descriptors 0 and 1 open, it is not necessary to call .CW Bopen . The function .CW Binit .ix [Binit] .ix "[Biobuf] file descriptor initializes a .CW Biobuf for an already open file descriptor. 
.P1 ; sig Binit int Binit(Biobuf *bp, int fd, int mode) .P2 .LP You must declare your own .CW Biobuf . Note that this time .CW bin and .CW bout are .I not pointers, they are the actual .CW Biobufs used. Once we have our .CW bin and .CW bout buffers, we might use any other Bio function on them, like before. The call to .CW Bterm terminates the buffering, and flushes any pending data to the underlying file. However, because Bio did not open the file descriptor for the buffer, it will not close it either. .PP Unlike the previous program, this one reads one line at a time, because we plan to use it with the console. The function .CW Brdline .ix [Brdline] .ix "read line reads bytes from the buffer until the end-of-line delimiter specified by its second parameter. .P1 ; sig Brdline void* Brdline(Biobufhdr *bp, int delim) .P2 .LP We used .CW '\en' , which is the end of line character in Plan 9. The function returns a pointer to the bytes read, or zero if no more data could be read. Each time the program reads a line, it writes the line to its standard output through .CW bout . The .CW line returned by .CW Brdline is not a C string. There is not a final null byte after the line. We could have used .CW Brdstr , .ix [Brdstr] .ix "string read which returns the line read in dynamic memory (allocated with .CW malloc ), and terminates the line with a final null byte. But we did not. Thus, how many bytes must we write to standard output? The function .CW Blinelen .ix [Blinelen] .ix "line length returns the number of bytes in the last line read with .CW Brdline . .P1 ; sig Blinelen int Blinelen(Biobufhdr *bp) .P2 .LP And that explains the body of the .CW while in our program. Let's now play with our cat. .P1 ; 8.biocat !!one little !!cat was walking. \fBcontrol-d\fP one little cat was walking. ; .P2 .LP No line was written to standard output until we typed .I control-d . The program did call .CW Bwrite , but this function kept the bytes in the buffer. 
When .CW Brdline returned an EOF indication, the call to .CW Bterm terminated the output buffer and its contents were written to the underlying file. If we modify this program to add a call to .P1 Bflush(&bout); .P2 after the one to .CW Bwrite , this is what happens. .P1 ; 8.biocat !!Another little cat Another little cat !!did follow did follow \fBcontrol-d\fP ; .P2 .LP The call to .CW Bflush flushes the buffer. Of course, it is now a waste to use .ix "buffer flushing .CW bout at all. If we are flushing the buffer after each write, we could have used just .CW write , and forgotten about .CW bout . .SH Problems .IP 1 Use the debugger, .CW acid , to see that a program reading from standard input in a window is indeed waiting inside .CW read while the system is waiting for you to type a line in the window. .IP .I Hint : Use .CW ps to find out which process is running your program. .IP 2 Implement the .I cat (1) utility without looking at the source code for the one in your system. .IP 3 Compare your program from the previous problem with the one in the system. Locate the one in the system using a command. Discuss the differences between the two programs. .IP 4 Implement a version of .I chmod (1) that accepts an octal number representing a new set of permissions, and one or more files. The program is to be used as in .P1 ; 8.out 0775 file1 file2 file3 .P2 .IP 5 Implement your own program for doing a long listing like .P1 ; ls -l .P2 .IP would do. .IP 6 Write a program that prints all the files contained in a directory (hierarchy) along with the total number of bytes consumed by each file. If a file is a directory, its reported size must include that of the files found inside. Compare with .I du (1). .ds CH .bp \c