4. What is the TEI and what does it mean to me?

The Text Encoding Initiative (TEI) is a major international project, sponsored jointly by the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC). Its task was to develop and disseminate guidelines for the interchange of machine-readable texts among researchers, and to make recommendations for the encoding of new texts. These guidelines were published in June 1994 as Guidelines for Electronic Text Encoding and Interchange (TEI P3), a two-volume, 1,300-page document.

The TEI has created a modular SGML application to serve the needs of the humanities and language industries. It allows many different texts to be encoded in a compatible form, so that they can be analyzed by the same (SGML-conformant) software while still reflecting a diversity of scholarly opinion about the texts.
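As a rough illustration of what "analyzed by the same software" can mean, the sketch below runs one small routine over two differently structured texts. Everything in it is an assumption made for illustration: the fragments are hypothetical and only TEI-like (they are not validated against the TEI DTD), they are written in XML syntax rather than SGML so that Python's standard library parser can read them, and the function name list_divisions is invented here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, TEI-like fragments (illustrative only, not validated).
PROSE = """<text><body>
<div type="book" n="1">
  <div type="chapter" n="1"><p>Sample prose.</p></div>
  <div type="chapter" n="2"><p>More sample prose.</p></div>
</div>
</body></text>"""

VERSE = """<text><body>
<div type="poem" n="1">
  <lg><l>Sample line of verse.</l></lg>
</div>
</body></text>"""


def list_divisions(xml_text):
    """Return (type, n) for every <div> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(div.get("type"), div.get("n")) for div in root.iter("div")]


if __name__ == "__main__":
    # The same routine handles both texts because they share one
    # convention for marking divisions, whatever the divisions mean.
    for name, sample in (("prose", PROSE), ("verse", VERSE)):
        print(name, list_divisions(sample))
```

The routine knows nothing about books, chapters, or poems; it relies only on the shared convention that divisions are marked with a div element carrying type and n attributes.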

Textual data has been analyzed and otherwise manipulated by computer for over thirty years, but until now there has been no common encoding format for scholarly machine-readable texts. Scholars have developed many different encoding schemes: for representing non-standard characters, footnotes, marginalia, and text-critical apparatus; for marking the logical divisions of a text (e.g., book, chapter, verse) or the analytic or interpretive information relevant to it (e.g., syntactic, morphological, or semantic analysis); and for documenting the source of an encoded text and the nature of the recording.

Before the TEI, none of the existing encoding schemes was able to gain acceptance as a standard. Most reflected the research interests of their originators and were applicable in only one subject area. Some were created to serve the needs of large-scale projects such as the Thesaurus Linguae Graecae at Irvine or the Responsa Project at Bar-Ilan (Israel), or as specifications for input to text-analysis software such as OCP or WatCon. Others were devised by single individuals for their own projects. None was sufficiently flexible or generalizable to apply to the encoding of textual materials across the full spectrum of applications and research interests. The encoding practice followed in these schemes was often poorly documented, and much time was wasted writing software to translate from one format to another.

A common text encoding scheme developed for the needs of scholarly research would eliminate or minimize many of these problems and would also provide a text format to be used as a starting point by developers of new software.

For more information on the TEI, visit the TEI home page.



2002-05-24