As of June 2017, the archive holds over 92,000 files containing
more than 3.9 million text pages.

The PDF Document Format

Documents here are kept in a minimal subset of the PDF format, which is used only as a
container for lossless Group 4 fax compression (ITU-T recommendation T.6) images.
Contributions are normally post-processed by tools to put them in exactly this format,
so that all of the documents here are uniform and can be burst back into individual
page images at some point in the future, when OCR technology is mature enough to do
a good job of recognition.

Documents were scanned using a Ricoh IS520 30ppm duplex production scanner from the late 1990s through 2007.
Conversion to a higher-performance Kodak DS 2500D scanner occurred in July 2007.
The 2500D is an OEM version of the Panasonic KV-S2055 scanner.
In 2008, the Kodak was replaced by a Panasonic KV-S3065W, which
can scan in color at 600 dpi and can handle sheets several feet long.

Post-processing is done using Lemkesoft's GraphicConverter.
TIFF to PDF conversion is done using Eric Smith's tumble.

The preferred form for any contributed text scan is as a collection of lossless
Group 4 fax compression (ITU-T recommendation T.6) images saved as TIFF
files with a minimum scan resolution of 400 dpi.
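
As a rough illustration of the guideline above, the following Python sketch checks
whether the first page of a TIFF file is tagged as Group 4 compressed and scanned at
400 dpi or better. The function names (read_tiff_tags, meets_guideline) are hypothetical
and are not part of any tool used by the archive; the tag numbers are taken from the
TIFF 6.0 specification. It reads only the first image directory and handles only
single-value SHORT, LONG and RATIONAL fields, so treat it as a sketch rather than a
full validator.

```python
import struct

# Tag numbers from the TIFF 6.0 specification.
COMPRESSION = 259       # 4 means CCITT T.6 (Group 4)
X_RESOLUTION = 282
Y_RESOLUTION = 283
RESOLUTION_UNIT = 296   # 2 = inch (the default), 3 = centimeter


def read_tiff_tags(data):
    """Return {tag: value} for simple fields in the first IFD of a TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    tags = {}
    for i in range(count):
        start = ifd + 2 + 12 * i
        tag, typ, n = struct.unpack(order + "HHI", data[start:start + 8])
        value = data[start + 8:start + 12]
        if n != 1:
            continue  # multi-value fields are ignored in this sketch
        if typ == 3:    # SHORT, stored inline
            tags[tag] = struct.unpack(order + "H", value[:2])[0]
        elif typ == 4:  # LONG, stored inline
            tags[tag] = struct.unpack(order + "I", value)[0]
        elif typ == 5:  # RATIONAL, stored at an offset as two LONGs
            (off,) = struct.unpack(order + "I", value)
            num, den = struct.unpack(order + "II", data[off:off + 8])
            tags[tag] = num / den if den else 0.0
    return tags


def meets_guideline(data, min_dpi=400):
    """True if the first page is Group 4 compressed at min_dpi or better."""
    tags = read_tiff_tags(data)
    unit = tags.get(RESOLUTION_UNIT, 2)
    if unit == 2:
        scale = 1.0     # already pixels per inch
    elif unit == 3:
        scale = 2.54    # pixels per centimeter to pixels per inch
    else:
        return False    # no absolute unit, so dpi cannot be judged
    xres = tags.get(X_RESOLUTION, 0) * scale
    yres = tags.get(Y_RESOLUTION, 0) * scale
    return tags.get(COMPRESSION) == 4 and min(xres, yres) >= min_dpi
```

To try it, read a scan with open(path, "rb").read() and pass the resulting bytes to
meets_guideline.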

Lower scan resolutions produce noticeable artifacts if a page needs to be
straightened in post-processing.

Lossy compression formats, such as JPEG, should NEVER be used to save pages
of text, since lossy compression destroys edge resolution and contrast, which
would make the pages difficult to OCR in the future.

Tape processing over the years

