FUTURE DATA 8" DISK FORMAT

Chuck Guzis
Sydex, Inc.
July, 2013

This is a summary of what I can remember from processing a number of Future Data disks about 10 years ago. Time does not always preserve accuracy, so this is what I've been able to deduce from the code I wrote back then.

1. Low-level format.

The disks that I was given were standard 32-sector 8" single-sided hard-sectored floppies. The "hard-sectored" aspect appears to be a red herring, as sector boundaries do not correspond to the physical sector holes. My guess is that the drive was set up to output sector holes on a different pin than the one used to signal the index hole. Shugart, Qume and Siemens drives can be jumpered to do this, which allows the use of either hard- or soft-sectored floppies.

The modulation (recording scheme) is neither FM, MFM nor MMFM, so a conventional disk controller can't be used to read these disks. I used a Catweasel Mk 1 (ISA) controller to provide a histogram of flux changes. When these are accumulated and charted, you'll notice three distinct peaks at t, 2t and 4t. Since this isn't standard MFM (where the groupings would be at t, 1.5t and 2t), working out what each time period represents requires a bit of guessing.

One of the first things I do when I get a disk with a strange format is try to determine where the sector headers start and where the sector data starts. This is usually quite visible just by looking at the histogram data returned, plus a lot of guessing. Eventually, a pattern emerges. Sector headers almost always have the cylinder (track) number embedded somewhere; this is how seek errors are usually detected. The great thing about cylinder numbers is that they stay the same for all the sectors on a track and increment by one for each following track.

Guessing what t, 2t and 4t stand for is a real head-scratcher. It's definitely not MFM, so a natural guess would be group code.
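Accumulating and charting the flux intervals can be sketched roughly like this. It assumes the controller's readings have already been reduced to a plain list of integer interval lengths in sample-clock units; the Catweasel's actual data format and any binning are not described here, and the function names are hypothetical.

```python
from collections import Counter


def interval_histogram(intervals):
    """Count how often each flux-change interval length (in sample clocks) occurs."""
    return Counter(intervals)


def find_peaks(hist, min_count):
    """Return interval lengths that are local maxima with at least min_count hits.

    Ties are resolved toward the longer bin, so a flat two-bin top counts once.
    """
    peaks = []
    for length in sorted(hist):
        count = hist[length]  # Counter returns 0 for lengths never seen
        if count >= min_count and count >= hist[length - 1] and count > hist[length + 1]:
            peaks.append(length)
    return peaks


def peak_ratios(peaks):
    """Express each peak as a multiple of the shortest one, the basic period t."""
    t = peaks[0]
    return [round(p / t, 2) for p in peaks]
```

On real data, `min_count` has to be chosen high enough to suppress the noise between peaks. Expressing the peaks as multiples of the shortest one is what separates an MFM-style t, 1.5t, 2t pattern from the one described above.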
Since group code relies on keeping the average flux change frequency low, a good guess would be t = '1', 2t = '01' and 3t = '001', where a '1' is a bit cell containing a flux change and a '0' is one without. (It's a little more complicated than this and involves a fair amount of scribbling and head-scratching, but hopefully you get the idea.) If this is group code, it's pretty safe to assume that a 5-to-4-bit mapping is being used, as we're probably dealing with an 8-bit byte system here. (Other schemes are certainly possible.)

So what does the map look like? One way to find out is to look at the stream of raw bits (decoded as above) on a track. Sectors, even on hard-sectored disks, usually carry some sort of ID information, and there's also some sort of synchronization pattern preceding the sector ID information. It's entirely reasonable to assume that sector IDs contain sector numbers that increment by 1 from sector to sector. Further, it's a safe bet that the track number is part of the ID information so that seek errors can be detected, and that all sector ID fields on the same track carry the same cylinder number. Given that, with a bit of staring and guessing, we can determine the 5-bit group code for each 4-bit data group. All of that will reveal itself within the first 16 cylinders. What I came up with is this table:

    11001 = 0    11010 = 8
    11011 = 1    01001 = 9
    10010 = 2    01010 = A
    10011 = 3    01011 = B
    11101 = 4    11110 = C
    10101 = 5    01101 = D
    10110 = 6    01110 = E
    10111 = 7    01111 = F

The sole exception to the table of legal values above is the synchronization pattern that precedes the ID field and the data field: a string of at least 10 '1' bits, which is illegal in group code.

After the synchronization burst, 7 bytes of ID information follow. In sector IDs, the second and fourth bytes are the cylinder number and the sector number, respectively. I didn't bother discovering the significance of the other bytes in the ID field, but two of them are probably a CRC or checksum.
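The decoding steps described above can be sketched as follows. This is an illustration under stated assumptions, not the original decoding program: the interval-to-bit rule is the t/2t/3t guess above, the 5-bit table is the one just given, the high-nibble group is assumed to come first in each byte, and `read_id_field` uses a hypothetical alignment heuristic, because exactly where the first code group begins after a sync burst isn't pinned down here. Several codes start with '1' bits, which merge into the burst.

```python
# 5-bit group code -> 4-bit value, as tabulated above.
GCR_TO_NIBBLE = {
    "11001": 0x0, "11011": 0x1, "10010": 0x2, "10011": 0x3,
    "11101": 0x4, "10101": 0x5, "10110": 0x6, "10111": 0x7,
    "11010": 0x8, "01001": 0x9, "01010": 0xA, "01011": 0xB,
    "11110": 0xC, "01101": 0xD, "01110": 0xE, "01111": 0xF,
}


def intervals_to_bits(intervals, t):
    """Turn flux-change intervals into raw bits: t -> '1', 2t -> '01', 3t -> '001'."""
    bits = []
    for iv in intervals:
        cells = max(1, round(iv / t))  # nearest whole number of bit cells
        bits.append("0" * (cells - 1) + "1")
    return "".join(bits)


def find_sync_ends(bits, min_ones=10):
    """Return the index of the first '0' after each run of >= min_ones '1' bits."""
    ends, run = [], 0
    for i, b in enumerate(bits):
        if b == "1":
            run += 1
        else:
            if run >= min_ones:
                ends.append(i)
            run = 0
    return ends


def decode_bytes(bits, start, count):
    """Decode count bytes starting at bits[start]; None if any group is illegal.

    Assumption: the high nibble's group comes first in each byte.
    """
    out = bytearray()
    for n in range(count):
        pos = start + 10 * n
        hi = GCR_TO_NIBBLE.get(bits[pos:pos + 5])
        lo = GCR_TO_NIBBLE.get(bits[pos + 5:pos + 10])
        if hi is None or lo is None:
            return None
        out.append((hi << 4) | lo)
    return bytes(out)


def read_id_field(bits, sync_end, expected_cylinder):
    """Hypothetical alignment heuristic.

    A legal code has at most four leading '1' bits (11110), so the first group
    may start up to four bits before the first '0'. Try starting at that '0'
    and back up one bit at a time, keeping the first alignment whose 7 ID bytes
    all decode and whose second byte equals the expected cylinder.
    """
    for back in range(5):
        start = sync_end - back
        if start < 0:
            break
        field = decode_bytes(bits, start, 7)
        if field is not None and field[1] == expected_cylinder:
            return {"cylinder": field[1], "sector": field[3], "raw": field}
    return None
```

The same `decode_bytes` call with `count=131` could read a data field after its own sync burst, subject to the same alignment caveat. Checking the cylinder byte is the same consistency test used above to pin down the table: every ID on a track should carry the same cylinder number.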
When this is decoded, it's apparent that each cylinder has 52 sectors and that the sector ID numbers run from 0 through 51. After the sector ID field, another synchronization burst of '1' bits precedes the data field, which is 131 bytes long; the first 128 bytes are data and the remainder are probably CRC and padding bytes.

At this point, it's possible to form a complete raw binary image of the disk. The next step is to determine how the data is organized into files.

Staring at the disk image, it's apparent that there's a directory on the first track, following the first sector on the disk (which appears to contain bytes of no significance, perhaps a boot sector), and that each entry is 16 bytes long, or 8 entries per sector. Looking at where non-directory data starts, it's obvious that there are 64 directory slots (8 sectors), followed by the first data sector. The first entry in the directory is called "DIR", so the directory has a slot for itself.

It appears that files are not fragmented: each file appears to occupy consecutive sectors. (This may not be correct, but that pattern was followed on the 6 disks that I was given.) After a bit more staring, the directory entry structure appears to be this:

    10 bytes  file name; if the entry is deleted, the name is hex FF
     1 byte   starting cylinder of the data
     1 byte   attribute: 03 = directory, 04 = text file (tentative)
     2 bytes  number of sectors in the file
     1 byte   unknown; always seems to be 1
     1 byte   unknown; seems to be 1, or 0 (for directories)

After that, you have your files. Text files have lines ended by hex FB, FC or FD. The end of a text file is marked by a byte of hex FE.
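Assembling the image and reading the directory and text files might look like the sketch below. Several details are assumptions not settled above and are marked in the comments: the image is laid out as cylinder * 52 + sector, a deleted entry is recognized by its first name byte alone, names and text are treated as ASCII, and the byte order of the 2-byte sector count is a guess (little-endian by default). All names are hypothetical.

```python
SECTORS_PER_TRACK = 52   # sector IDs 0..51
SECTOR_SIZE = 128        # user data per sector (of the 131-byte data field)
DIR_ENTRY_SIZE = 16
DIR_SLOTS = 64


def sector_offset(cylinder, sector):
    """Byte offset of (cylinder, sector) in a flat, single-sided image (assumed layout)."""
    return (cylinder * SECTORS_PER_TRACK + sector) * SECTOR_SIZE


def build_image(fields, cylinders):
    """fields maps (cylinder, sector) -> decoded data field; keep 128 bytes of each.

    Sectors that were never read stay zero-filled.
    """
    image = bytearray(cylinders * SECTORS_PER_TRACK * SECTOR_SIZE)
    for (cyl, sec), field in fields.items():
        off = sector_offset(cyl, sec)
        image[off:off + SECTOR_SIZE] = bytes(field[:SECTOR_SIZE]).ljust(SECTOR_SIZE, b"\x00")
    return bytes(image)


def parse_directory(image, byteorder="little"):
    """Read the 64 16-byte slots that start at cylinder 0, sector 1."""
    base = sector_offset(0, 1)
    entries = []
    for slot in range(DIR_SLOTS):
        e = image[base + slot * DIR_ENTRY_SIZE:base + (slot + 1) * DIR_ENTRY_SIZE]
        if e[0] == 0xFF:  # deleted; assumes checking the first name byte is enough
            continue
        entries.append({
            "name": e[0:10].decode("ascii", "replace").rstrip(" \x00"),  # ASCII assumed
            "start_cylinder": e[10],
            "attribute": e[11],  # 03 = directory, 04 = text file (tentative)
            "sectors": int.from_bytes(e[12:14], byteorder),  # byte order is a guess
            "unknown": (e[14], e[15]),
        })
    return entries


def text_lines(data):
    """Split text-file bytes into lines: FB, FC or FD ends a line, FE ends the file."""
    lines, cur = [], bytearray()
    for b in data:
        if b == 0xFE:
            break
        if b in (0xFB, 0xFC, 0xFD):
            lines.append(cur.decode("ascii", "replace"))
            cur = bytearray()
        else:
            cur.append(b)
    if cur:  # trailing text with no line terminator before FE
        lines.append(cur.decode("ascii", "replace"))
    return lines
```

Unused slots that are neither deleted nor filled in are returned as entries with empty names; filtering them out would need a rule the notes above don't establish. If the sector counts look implausible on a real disk, calling `parse_directory(image, "big")` tests the other byte order.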