Sunday, September 30, 2007

Design of DOS

I set to work writing an operating system (OS) for the 16-bit Intel 8086 microprocessor in April of 1980. At that point my employer, Seattle Computer Products (SCP), had been shipping their 8086 computer system (which I had designed) for about 6 months. The only software we had for the computer system was Microsoft Stand-Alone Disk BASIC. “Stand-Alone” means that it worked without an OS, managing the disk directly with its own file system. It was fast for BASIC, but it wouldn’t have been suitable for writing, say, a word processor.

We knew Digital Research was working on a 16-bit OS, CP/M-86. At one point we were expecting it to be available at the end of 1979. Had it made its debut at any time before DOS was working, the DOS project would have been dropped. SCP wanted to be a hardware company, not a software company.

I envisioned the power of the 8086 making it practical to have a multi-user OS, and I laid out a plan to the SCP board of directors to develop a single-user OS and a multi-user OS that would share the same Application Program Interface (API). This would be a big design job that would take time to get right – but we were already shipping our computer system and needed an OS now. So I proposed to start with a “quick and dirty” OS that would eventually be thrown away.

Baseline Experience

I had graduated from college (BS in Computer Science) less than two years earlier. I spent the next year tentatively in graduate school while also working at SCP. So I didn’t have much experience in the computer industry.

This is not to say I had no experience at all. In college I had an IMSAI 8080 with a North Star floppy disk system. I made my own peripherals for it, which included a Qume daisy-wheel printer with its own Z80-based controller that I designed and programmed myself. I also designed my own Z80-based road-rally computer that used a 9-inch CRT video display mounted in the glove box of my car. And my school work included writing a multitasking OS for the Z80 microprocessor as a term project. The thrust of that project had been to demonstrate preemptive multitasking with synchronization between tasks.

For SCP, my work had been ostensibly hardware-oriented. But the 8086 CPU had required me to develop software tools including an assembler (the most basic tool for programming a new processor) and a debugger. These tools shipped with the product.

My hands-on experience with operating systems was limited to those I had used on microcomputers. I had never used a “big computer” OS at all. All programming projects in high school and college were submitted on punched cards. On my own computer I used North Star DOS, and at SCP we had Cromemco CDOS, a CP/M look-alike.

File System Performance

An important design parameter for the OS was performance, which drove my choice for the file system. I had learned a handful of file management techniques, and I spent some time analyzing what I knew before making a choice. These were the candidates:

North Star DOS and the UCSD p-system used the simplest method: contiguous file allocation. That is, each file occupies consecutive sectors on the disk. The disk directory only needs to keep track of the first sector and the length for each file – very compact. Random access to file data is just as fast as sequential access, because it’s trivial to compute the sector you want. But the big drawback is that once a file is boxed in by other files on the disk, it can’t grow. The whole file would then have to be moved to a new spot with more contiguous free space, with the old location leaving a “hole”. After a while, all that’s left are the holes. Then you have to do a time-consuming “pack” to shuffle all the files together and close the holes. I decided the drawbacks of contiguous allocation were too severe.
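
To show why random access is so cheap here, a minimal sketch in C (the structure and names are mine, not North Star’s):

    /* With contiguous allocation, the directory needs only a starting
       sector and a length for each file. */
    struct dir_entry {
        unsigned first_sector;    /* first sector of the file */
        unsigned length;          /* file length in sectors */
    };

    /* Finding the sector that holds a given byte offset is pure
       arithmetic; no extra disk reads are ever needed. */
    unsigned sector_for_offset(const struct dir_entry *e,
                               unsigned offset, unsigned sector_size)
    {
        return e->first_sector + offset / sector_size;
    }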

UNIX uses a clever multi-tiered approach. For small files, the directory entry for a file has a short table of the sectors that make up the file. These sectors don’t have to be contiguous, so it’s easy to extend the file. If the file gets too large for the list to fit in the table, UNIX adds a tier. The sectors listed in the table no longer reference data in the file; instead, each entry identifies a sector which itself contains nothing but a list of sectors of file data. If the file gets huge, yet another tier is added – the table entries each reference a sector whose entries reference a sector whose entries identify the file data. Random access to file data is very fast for small files, but as the files get larger and the number of tiers grows, it will take one or two additional disk reads just to find the location of the data you really want.
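
Roughly how that tiered lookup goes, as a sketch (the sizes and names here are made up for illustration; the real UNIX layout differed in its details):

    #define NDIRECT    10    /* illustrative: direct entries in the table */
    #define PER_SECTOR 128   /* illustrative: entries per indirect sector */

    /* Stands in for one disk read: fetch entry 'index' of a sector that
       contains nothing but a list of sector numbers. */
    unsigned read_entry(unsigned sector, unsigned index);

    /* Map logical block n of a file to a physical sector. Every added
       tier costs one more disk read before the data itself is read. */
    unsigned map_block(const unsigned direct[NDIRECT],
                       unsigned single_ind, unsigned double_ind, unsigned n)
    {
        if (n < NDIRECT)
            return direct[n];                        /* small file: free */
        n -= NDIRECT;
        if (n < PER_SECTOR)
            return read_entry(single_ind, n);        /* one extra read */
        n -= PER_SECTOR;
        return read_entry(read_entry(double_ind,     /* two extra reads */
                                     n / PER_SECTOR),
                          n % PER_SECTOR);
    }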

CP/M didn’t track disk space directly in terms of sectors. Instead it grouped sectors together into a “cluster” or “allocation unit”. The original CP/M was designed specifically around 8” disks, which had 2002 sectors of 128 bytes each. By making each cluster 8 sectors (1K), there were fewer than 256 clusters on a disk. Thus clusters could be identified using only one byte. The directory entry for CP/M had a table of 16 entries of the clusters in the file, so for a file of 16K or less both random and sequential access were fast and efficient. But when a file exceeded 16K, it needed a whole new directory entry to store an additional 16K of cluster numbers. There was no link between these entries; they simply contained the same name and a number identifying which section of the file it represented (the “extent”). This led to a potential performance nightmare, especially for random access. When switching between extents, the system had to perform its standard linear search of the directory for a file of the correct name and extent. This search could take multiple disk reads before the requested data was located.
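
For reference, a CP/M directory entry looked roughly like this (a simplified sketch; the field names are mine):

    /* Simplified sketch of a 32-byte CP/M directory entry. */
    struct cpm_dirent {
        unsigned char user;          /* user number */
        char          name[8];       /* file name */
        char          type[3];       /* file type */
        unsigned char extent;        /* which 16K section of the file */
        unsigned char reserved[2];
        unsigned char rec_count;     /* 128-byte records in this extent */
        unsigned char clusters[16];  /* one-byte cluster numbers */
    };
    /* A file over 16K needs more entries with the same name and a higher
       extent number; reaching another extent means another linear search
       of the directory. */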

Microsoft Stand-Alone Disk BASIC used the File Allocation Table (FAT). Unlike all the other file systems, the FAT system separates the directory entry (which has the file name, file size, etc.) from the map of how the data is stored (the FAT). I will not give a detailed explanation of how that worked here as the system has been well documented, such as my 1983 article An Inside Look at MS-DOS at http://www.patersontech.com/dos/Byte/InsideDos.htm. Like CP/M, BASIC used a 1K cluster so that, once again, there were fewer than 256 clusters on the standard 8” floppy disk of the day. The FAT needs one entry per cluster, and for BASIC the entry needed to be just one byte, so the FAT fit within two 128-byte sectors. This small size also meant it was practical, even with the limited memory of the time, to keep the entire FAT in memory at all times.

To me, the big appeal of the FAT system was that you never had to read the disk just to find the location of the data you really wanted. FAT entries are in a chain – you can’t get to the end without visiting every entry in between – so it is possible the OS would have to pass through many entries finding the location of the data. But with the FAT entirely in memory, passing through a long chain would still be 100 times faster than a single sector read from a floppy disk.
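
A minimal sketch of that chain walk, assuming one-byte FAT entries as in Stand-Alone Disk BASIC (the end-of-chain value is illustrative):

    #define FAT_EOC 0xFF    /* illustrative end-of-chain mark */

    /* Entry c of the FAT holds the number of the cluster that follows
       cluster c in the file. With the whole FAT in memory, finding the
       Nth cluster of a file is nothing but table lookups. */
    unsigned char nth_cluster(const unsigned char fat[],
                              unsigned char first, unsigned n)
    {
        unsigned char c = first;
        while (n-- > 0 && fat[c] != FAT_EOC)
            c = fat[c];     /* follow the chain; no disk read */
        return c;
    }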

Another thing I liked about FAT was its space efficiency. There were no tables of sectors or clusters that might be half full because the file wasn’t big enough to need them all. The size of the FAT was set by the size of the disk.

When I designed DOS I knew that fitting the cluster number in a single byte, limiting the number of clusters to 256, wouldn’t get the job done as disks got bigger. I increased the FAT entry to 12 bits, allowing over 4000 clusters. With a cluster size of as much as 16K bytes, this would allow for disks as large as 64MB. You could even push it to a 32K cluster and 128MB disk size, although that large cluster could waste a lot of space. These disk sizes seemed enormous to me in 1980. Only recently had we seen the first 10MB hard disks come out for microcomputers, and that size seemed absurdly lavish (and expensive).
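
Packing 12-bit entries means two entries share three bytes, so a lookup takes a little bit-twiddling. A sketch of how such an entry is read:

    /* Read 12-bit FAT entry n, which starts at byte n*1.5. An even entry
       uses a whole byte plus the low nibble of the next byte; an odd
       entry uses the high nibble plus the whole next byte. */
    unsigned fat12_entry(const unsigned char fat[], unsigned n)
    {
        unsigned off = n + n / 2;
        if (n & 1)
            return (fat[off] >> 4) | ((unsigned)fat[off + 1] << 4);
        else
            return fat[off] | (((unsigned)fat[off + 1] & 0x0F) << 8);
    }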

Obviously I’m no visionary. Disk size has grown faster than any other computer attribute, up by a factor of 100,000. (Typical memory size is up by a factor of 30,000, while clock speed is only up 1000x.) Microsoft extended the FAT entry to 16 bits, then to 32 bits to keep up. But this made the FAT so large that it was no longer kept entirely in memory, taking away the performance advantage.

Hardware Performance

On the few computer systems I personally used, I had experienced a range of disk system performance. North Star DOS loaded files with lightning speed, while the CP/M look-alike Cromemco CDOS took much longer. This was not a file system issue – the difference still existed with files less than 16K (just one CP/M “extent”) and contiguously allocated.

North Star DOS did the best job possible with its hardware. Each track had 10 sectors of 256 bytes each, and it could read those 10 sectors consecutively into memory without interruption. To read in a file of 8KB would require reading 32 sectors; the nature of the file system ensured they were contiguous, so the data would be found on four consecutive tracks. When stepping from track to track, it would miss the start of the next track and have to wait for it to come around again. That would mean it would take a total of 6.2 revolutions (including three revolutions lost to track stepping) to read the 8KB. The 5-inch disk turned at 5 revolutions per second so the total time would be less than 1.3 seconds.

The standard disk with CP/M (CDOS) had a track with 26 sectors of 128 bytes each. CP/M could not read these sectors consecutively. It used an interleave factor of 6, meaning it would read every sixth sector. The five-sector gap between reads presumably allowed for processing time. An 8KB file would occupy about 2½ tracks, which, at 6 revolutions per track (because of the interleave), would take about 17 revolutions of the disk to be read (including two revolutions lost to track stepping). The 8-inch disk turned at 6 revolutions per second so the total time would be over 2.8 seconds. This is more than twice as long as the North Star system, which used fundamentally slower hardware.
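
Working the numbers from the last two paragraphs through in code (just a restatement of the arithmetic above):

    #include <stdio.h>

    int main(void)
    {
        /* North Star: 32 sectors at 10 per track is 3.2 revolutions of
           reading, plus 3 revolutions lost stepping; 5 rev/sec. */
        double ns_revs = 32.0 / 10.0 + 3.0;
        printf("North Star: %.1f revs, %.2f sec\n", ns_revs, ns_revs / 5.0);

        /* CP/M: an 8KB file spans about 2.5 tracks; interleave of 6 means
           6 revolutions per track, plus 2 lost stepping; 6 rev/sec. */
        double cpm_revs = 2.5 * 6.0 + 2.0;
        printf("CP/M:       %.1f revs, %.2f sec\n", cpm_revs, cpm_revs / 6.0);
        return 0;
    }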

At least part of the reason CP/M was so much slower was its poor interface to the low-level “device driver” software. CP/M called this the BIOS (for Basic Input/Output System). Reading a single disk sector required five separate requests, and only one sector could be requested at a time. (The five requests were Select Disk, Set Track, Set Sector, Set Memory Address, and finally Read Sector. I don’t know whether all five were needed for every Read if, say, the disk or memory address were the same.)
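
In outline, reading one sector went something like this (a C-flavored sketch; the real interface was a jump table reached from assembly language):

    /* The five CP/M BIOS requests, sketched as C functions. */
    void seldsk(int drive);    /* Select Disk */
    void settrk(int track);    /* Set Track */
    void setsec(int sector);   /* Set Sector */
    void setdma(void *addr);   /* Set Memory Address */
    int  rdsec(void);          /* Read Sector: 128 bytes to the address */

    int read_one_sector(int drive, int track, int sector, void *buf)
    {
        seldsk(drive);
        settrk(track);
        setsec(sector);
        setdma(buf);
        return rdsec();        /* five requests for one 128-byte sector */
    }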

I called the low-level driver software the I/O System, and I was determined its interface would not be a bottleneck. Only a single request was needed to read disk data, and that request could be for any number of sectors. This put more of the work on the I/O System, but it allowed it to maximize performance. The floppy disk format did not use interleave, and an 8KB file could be read from an 8-inch disk in 4½ revolutions, which is less than 0.8 seconds.
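
By contrast, the shape of an I/O System request could be pictured like this (purely my illustration here, not the actual calling convention):

    /* One request describes the whole transfer, leaving the driver free
       to read consecutive sectors back to back. */
    int io_read(int drive, unsigned first_sector,
                unsigned count, void *buffer);

    /* An 8KB file (64 sectors of 128 bytes) becomes a single request:
       io_read(0, start, 64, buf); */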

When hard disks were first introduced on the IBM PC/XT in 1983, the I/O System provided by IBM once again used an interleave factor of 6. Some aftermarket add-on hard disks were available with an interleave factor as low as 3. In 1983 I founded Falcon Technology, which made the first PC hard disk system that required no interleave. Once hard disks started having built-in memory, interleave was completely forgotten.

CP/M Translation Compatibility

For DOS to succeed, it would need useful applications (like word processing) to be written for it. I was concerned that SCP might have trouble persuading authors of application software to put in the effort to create a DOS version of their programs. Few people had bought SCP’s 16-bit computer, so the installed base was small. Without the applications, there wouldn’t be many users, and without the users, there wouldn’t be many applications.

My hope was that by making it as easy as possible to port existing 8-bit applications to our 16-bit computer, we would get more programmers to take the plunge. And it seemed to me that CP/M translation compatibility was what would make the job as easy as possible. Intel had defined rules for translating 8-bit programs into 16-bit programs; CP/M translation compatibility means that when a program’s request to CP/M went through the translation, it would become an equivalent request to DOS. My first blog entry explains this in more detail.

So I made CP/M translation compatibility a fundamental design goal. This required me to create a very specific Application Program Interface that implemented the translation compatibility. I did not consider this the primary API – there was, in fact, another API better suited to the 16-bit world and with more capabilities. Both APIs used CP/M-defined constructs (such as the “File Control Block”); the compatibility API had to, and I didn’t see a reason to define something different for the primary API.
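
For reference, the File Control Block both APIs shared looked roughly like this (a sketch from memory of the CP/M-defined layout; consult the documentation of the day for the authoritative version):

    /* Rough sketch of the CP/M-defined File Control Block (FCB). */
    struct fcb {
        unsigned char drive;        /* 0 = default, 1 = A:, 2 = B:, ... */
        char          name[8];      /* file name, blank-padded */
        char          type[3];      /* file type, blank-padded */
        unsigned char extent;       /* current extent number */
        unsigned char reserved[2];
        unsigned char rec_count;    /* records in the current extent */
        unsigned char alloc[16];    /* allocation map, used by the OS */
        unsigned char cur_rec;      /* current record, sequential I/O */
        unsigned char rand_rec[3];  /* record number for random I/O */
    };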

I myself took advantage of translation compatibility. The development tools I had written, such as the assembler, were originally 8-bit programs that ran under CP/M (CDOS). I put them through the translator and came up with 16-bit programs that ran under DOS. These translated tools were included with DOS when shipped by SCP. But I don’t think anyone else ever took advantage of this process.