ref: 5a51e900dd113a1a5f2d2a0e8bac067fbc7da3ae
dir: /sys/doc/fs/p4/
.SH Block Devices .PP The block device I/O system is like a protocol stack of filters. There are a set of pseudo-devices that call recursively to other pseudo-devices and real devices. The protocol stack is compiled from a configuration string that specifies the order of pseudo-devices and devices. Each pseudo-device and device has a set of entry points that corresponds to the operations that the file system requires of a device. The most notable operations are .CW read , .CW write , and .CW size . .PP The device stack can best be described by describing the syntax of the configuration string that specifies the stack. Configuration strings are used during the setup of the file system. For a description see .I fsconfig (8). In the following recursive definition, .I D represents a string that specifies a block device. .IP "\fID\fP = (\fIDD\fP...)" .br This is a set of devices that are concatenated to form a single device. The size of the catenated device is the sum of the sizes of each sub-device. .IP "\fID\fP = [\fIDD\fP...]" .br This is the interleaving of the individual devices. If there are N devices in the list, then the pseudo-device is the N-way block interleaving of the sub-devices. The size of the interleaved device is N times the size of the smallest sub-device. .IP "\fID\fP = {\fIDD\fP...}" .br This is a set of devices that constitute a `mirror' of the first sub-device, and form a single device. A write to the device is performed, at the same block address, on the sub-devices, in right-to-left order. A read from the device is performed on each sub-device, in left-to-right order, until a read succeeds without error, or the set is exhausted. One can think of this as a poor man's RAID 1. The size of the device is the size of the smallest sub-device. .IP "\fID\fP = \f(CWp\fP\fIDN1.N2\fP" .br This is a partition of a sub-device. The sub-device is partitioned into 100 equal pieces. If the size of the sub-device is not divisible by 100, then there will be some slop thrown away at the top. The pseudo-device starts at the N1-th piece and continues for N2 pieces. Thus .CW p\fID\fP67.33 will be the last third of the device .I D . .IP "\fID\fP = \f(CWf\fP\fID\fP" .br This is a fake write-once-read-many device simulated by a second read-write device. This second device is partitioned into a set of block flags and a set of blocks. The flags are used to generate errors if a block is ever written twice or read without being written first. .IP "\fID\fP = \f(CWx\fP\fID\fP" .br This is a byte-swapped version of the file system on D. Since the file server currently writes integers in metadata to disk in native byte order, moving a file system to a machine of the other major byte order (e.g., MIPS to Pentium) requires the use of .CW x . It knows the sizes of the various integer fields in the file system metadata. Ideally, the file server would follow the Plan 9 religion and write a consistent byte order on disk, regardless of processor. In the mean time, it should be possible to automatically determine the need for byte-swapping by examining data in the super-block of each file system, though this has not been implemented yet. .IP "\fID\fP = \f(CWc\fP\fIDD\fP" .br This is the cache/WORM device made up of a cache (read-write) device and a WORM (write-once-read-many) device. More on this later. .IP "\fID\fP = \f(CWo\fP" .br This is the dump file system that is the two-level hierarchy of all dumps ever taken on a cache/WORM. The read-only root of the cache/WORM file system (on the dump taken Feb 18, 1995) can be referenced as .CW /1995/0218 in this pseudo device. The second dump taken that day will be .CW /1995/02181 . .IP "\fID\fP = \f(CWw\fP\fIN1.N2.N3\fP" .br This is a SCSI disk on controller N1, target N2 and logical unit number N3. .IP "\fID\fP = \f(CWh\fP\fIN1.N2.0\fP" .br This is an (E)IDE or *ATA disk on controller N1, target N2 (target 0 is the IDE master, 1 the slave device). These disks are currently run via programmed I/O, not DMA, so they tend to be slower to access than SCSI disks. .IP "\fID\fP = \f(CWr\fP\fIN1\fP" .br This is the same as .CW w , but refers to a side of a WORM disc. See the .I j device. .IP "\fID\fP = \f(CWl\fP\fIN1\fP" .br This is the same as .CW r , but one block from the SCSI disk is removed for labeling. .IP "\fID\fP = \f(CWj(\fP\fID\d\s-2\&1\s+2\u\fID\d\s-2\&2\s+2\u\f(CW*)\fID\d\s-2\&3\s+2\u\f1" .br .I D\d\s-2\&1\s+2\u is the juke box SCSI interface. The .I D\d\s-2\&2\s+2\u 's are the SCSI drives in the juke box and the .I D\d\s-2\&3\s+2\u 's are the demountable platters in the juke box. .I D\d\s-2\&1\s+2\u and .I D\d\s-2\&2\s+2\u must be .CW w . .I D\d\s-2\&3\s+2\u must be pseudo devices of .CW w , .CW r , or .CW l devices. .PP For .CW w , .CW h , .CW l , and .CW r devices any of the configuration numbers can be replaced by an iterator of the form .CW <\fIN1-N2\fP> . N1 can be greater than N2, indicating a descending sequence. Thus .Ex [w0.<2-6>] .Ee is the interleaved SCSI disks on SCSI targets 2 through 6 of SCSI controller 0. The main file system on Emelie is defined by the configuration string .Ex c[w1.<0-5>.0]j(w6w5w4w3w2)(l<0-236>l<238-474>) .Ee This is a cache/WORM driver. The cache is three interleaved disks on SCSI controller 1 targets 0, 1, 2, 3, 4, and 5. The WORM half of the cache/WORM is 474 jukebox disks. Another file server, .I choline , has a main file system defined by .Ex c[w<1-3>]j(w1.<6-0>.0)(l<0-124>l<128-252>) .Ee The order of .CW w1.<6-0>.0 matters here, since the optical jukebox's WORM drives's SCSI target ids, as delivered, run in descending order relative to the numbers of the drives in SCSI commands (e.g., the jukebox controller is SCSI target 6, drive #1 is SCSI target 5, and drive #6 is SCSI target 0).