2. What is involved in scanning a text?

Some people choose to scan their texts in. The usual method of input is OCR (optical character recognition): a scanner captures an image of each page, and OCR software converts that image into editable text. However, much depends on the quality of the original, i.e., whether the scanning software can read the typeface, whether the copy is printed on good-quality paper, and so on. Certain typefaces and formats are not suitable for scanning, though some scanning software, e.g., OmniPage, can "learn" to read different character sets through user-defined tables. The proofreading and editing that follow often take much longer than the user anticipates.

Scanning a text also raises questions of copyright and fair use. These must be dealt with case by case, since copyright law is still being extended to cover the new technology.

Even if the characters and words have been recognized correctly, further work is needed to make a scanned text usable. The text must be encoded (marked up) both to preserve its referencing structure (e.g., line numbers and/or pagination, so that passages can be cited back to the original) and to mark textual content for retrieval and analysis (e.g., names of characters, themes, stage directions, or variant readings).
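As a rough illustration of the encoding step, the Python sketch below wraps one page of OCR output in simple markup loosely modelled on TEI-style elements: `<pb n="..."/>` records a page break, `<l n="...">` gives each line a number that can be cited back to the original, and `<stage>` tags stage directions so they can be retrieved separately. The helper names `encode_page` and `stage_directions`, the choice of tags, and the rule that a stage direction is any line wholly enclosed in square brackets are all assumptions made for this example; a real project would follow a published encoding scheme and use far more careful rules.

```python
import re
from xml.sax.saxutils import escape, unescape

# Hypothetical rule for this sketch: a stage direction is any line wholly
# enclosed in square brackets, e.g. "[Enter a Watchman.]".
STAGE_RE = re.compile(r"^\[(.*)\]$")
STAGE_TAG_RE = re.compile(r"<stage>(.*)</stage>")


def encode_page(lines, page_no, first_line_no=1):
    """Wrap one page of OCR'd lines in simple TEI-style markup (hypothetical helper).

    A <pb/> marker records pagination, each spoken line gets a number (n)
    so it can be cited back to the original, and stage directions are
    tagged separately so they can be retrieved on their own.
    """
    out = [f'<pb n="{page_no}"/>']
    n = first_line_no
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines left over from the scan
        match = STAGE_RE.match(text)
        if match:
            # Stage directions are tagged but not counted as numbered lines.
            out.append(f"<stage>{escape(match.group(1))}</stage>")
        else:
            out.append(f'<l n="{n}">{escape(text)}</l>')
            n += 1
    return out


def stage_directions(encoded):
    """Retrieve the text of every tagged stage direction (hypothetical helper)."""
    found = []
    for line in encoded:
        match = STAGE_TAG_RE.fullmatch(line)
        if match:
            found.append(unescape(match.group(1)))
    return found


# Invented sample lines standing in for one page of OCR output.
page = [
    "The night is cold and still.",
    "",
    "[Enter a Watchman.]",
    "Who goes there?",
]
encoded = encode_page(page, page_no=12)
for line in encoded:
    print(line)
print(stage_directions(encoded))
```

Numbering only the spoken lines, and passing `first_line_no` so that numbering continues from one page to the next, is just one possible convention; editions differ on whether stage directions carry line numbers of their own.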
