CETH

A Brief Introduction to Humanities Computing and Electronic Text

1. The Beginnings of Humanities Computing


The earliest humanities projects in which computers were indispensable were in literary and textual studies, where the introduction of computer technology in the late 1940's made it possible to prepare exhaustive concordances. Most humanities computing specialists agree that the most notable of these pioneering concordancing projects is Father Roberto Busa's Index Thomisticus, a 56-volume word index to the roughly eleven million words in the works of St Thomas Aquinas. The Index Thomisticus was begun in 1949 and was not completed until 1980; it would not have been possible at all without the assistance of a computer, and even with one it took some three decades to finish.

2. Developments in Concordancing


By the mid-1960's many similar computer-assisted concordances were being prepared, usually with a multi-volume printed edition as the goal. Today's technology has advanced to the point where computers can generate concordances more or less automatically and retrieve passages interactively, which frees humanities researchers to carry out more complex projects in computer-assisted textual analysis. Such analysis is not new, however: even as early as the mid-1960's, computers were being used to provide a quantitative basis for stylistic analysis, as in Andrew Morton's work on the style of St Paul's Epistles, based on sentence length and the use of particles, and Frederick Mosteller and David Wallace's computer-based study of the authorship of the disputed Federalist Papers. Other possibilities for textual analysis now include sophisticated vocabulary studies of collocations (co-occurrences of words), the collation of manuscript variants for critical editions, and the writing of programs for the metrical analysis of verse.
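To make the two basic operations mentioned above concrete, the following is a minimal sketch in Python of a keyword-in-context concordance and a simple collocation count. It is an illustration only, not a reconstruction of any of the projects named here: the function names, the context widths and the sample sentence are all assumptions chosen for the example, and the tokenizer handles unaccented English text only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (English letters only)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2):
    """Count the words that occur within `window` positions of the keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Hypothetical sample sentence, used only to exercise the functions.
sample = "The word was with God, and the word was God."
tokens = tokenize(sample)

for line in concordance(tokens, "word"):
    print(line)
print(collocates(tokens, "word").most_common(3))
```

A full concordance simply repeats this lookup for every distinct word in the text and sorts the results, which is why the task was tedious by hand and is close to automatic by machine.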

3. Text Archives


The preparation of machine-readable texts for concordancing has led to the creation of major text archives such as the Thesaurus Linguae Graecae, the Oxford Text Archive, and the Trésor de la Langue Française. It has also led to the development of flexible hypertext systems, perhaps the application of humanities computing best known to humanists at large, whether through multimedia databases such as Perseus or via the World Wide Web.

4. Developments in Databases and Statistics


Technological advances have led to computing projects in other disciplines as well. In the 1970's, for example, historians began to use the computer to facilitate the use of statistical techniques in historical analysis. The complexity of historical data, however, soon presented more problems than could easily be solved within the frameworks of simple data-processing software. To produce more methodologically relevant databases, systems were devised which could account for the vast differences between kinds of historical source material and establish links between occurrences of the same names across a variety of documents and document types, such as ephemera, baptism records and images. The most flexible of these, KLEIO, remains a remarkable example of this kind of system. Other database systems have extended the reach of humanities computing into art history and archaeology, notably ICONCLASS and ORACLE, which have assisted the cataloguing and analysis of numerous collections of works and artefacts. Cutting-edge computing in these fields is now focusing on the digitization of images, computer-aided reconstruction of sites, and the possibility of automated pattern recognition.

5. Humanities Computing Professional Associations


The profusion of new technologies and developments in the field of humanities computing has led to the creation of a number of international associations. The Association for Computers and the Humanities (ACH) and the Association for Literary and Linguistic Computing (ALLC) both provide a forum for discussing and supporting humanities computing topics and projects, and both sponsor major international conferences in the field.

6. The Text Encoding Initiative


International humanities computing associations, along with other more specialized organizations, have collaborated to address what may be the single most important problem posed by the development of humanities computing over the past forty years: the need for a common encoding format for scholarly machine-readable texts. In the past, scholars have developed encoding schemes more or less at their own discretion, in accordance with their particular scholarly goals. The traditions of centuries of print culture incline us to take for granted phenomena such as non-standard characters, footnotes, marginalia, markers for logical divisions (e.g., chapter, verse, stanza) and illustrations. But the means for encoding such features electronically are often incompatible with one another; they remain restricted to the features of a given text or corpus, and risk being completely dependent on one kind of hardware or software. In response to these issues, members of the various humanities computing associations established the Text Encoding Initiative (TEI). The TEI guidelines specify a common interchange format for machine-readable texts, provide a set of recommendations for encoding a wide range of textual features in the preparation of new textual materials, and document major extant encoding schemes, developing a metalanguage which allows the encoding schemes themselves to be encoded and described in a machine-readable form.
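The value of a shared encoding can be suggested with a small sketch in Python. The fragment below marks a stanza, its verse lines and a marginal note with the TEI element names lg, l and note; it is deliberately simplified (an assumption for this example: no TEI header, no namespace and no DTD, so it is not a conformant TEI document), and the marginal note is hypothetical. Because the features are tagged explicitly rather than implied by layout, any standard parser can recover them without knowing which software produced the text.

```python
import xml.etree.ElementTree as ET

# Simplified TEI-style fragment (assumption: header, namespace and DTD omitted).
SAMPLE = """
<text>
  <body>
    <lg type="stanza" n="1">
      <l n="1">Tyger Tyger, burning bright,</l>
      <l n="2">In the forests of the night;<note place="margin">hypothetical marginal note</note></l>
    </lg>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE.strip())


def verse_lines(tree):
    """Yield (stanza number, line number, line text) for every encoded verse line."""
    for group in tree.iter("lg"):
        for line in group.findall("l"):
            # line.text holds only the text before any child element such as <note>.
            yield group.get("n"), line.get("n"), (line.text or "").strip()


# Marginalia are kept separate from the verse text because they are tagged as notes.
notes = [(note.get("place"), note.text) for note in root.iter("note")]

for stanza, number, text in verse_lines(root):
    print(f"stanza {stanza}, line {number}: {text}")
print(notes)
```

The same few lines of code would work on any text encoded with these elements, which is the practical point of an interchange format: tools are written once against the scheme rather than once per project.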

7. The Creation of CETH


As with any set of guidelines, the TEI depends on consensus among those who use it. Many electronic text centers and electronic text archives have agreed to support and implement the TEI. Among these is the Center for Electronic Texts in the Humanities (CETH), established by Rutgers and Princeton Universities in October 1991. CETH provides a national focus for those involved in the creation, dissemination and use of electronic texts in the humanities. Its mission is to advance scholarship in the humanities through the use of high-quality electronic texts. To this end, CETH is now cataloguing existing electronic texts on RLIN for the Rutgers Inventory of Machine-Readable Texts in the Humanities. CETH also evaluates SGML software for high-quality TEI-conformant electronic texts and builds test-beds for research on their use and users. In addition, CETH sponsors the CETH Summer Seminar, which focuses on practical and methodological issues in humanities computing and attracts teachers, scholars and librarians from around the world.

