A Brief Introduction to Humanities Computing and Electronic Text
The first humanities projects to make indispensable use of computers came from
literary and textual studies, where the introduction of computer technology in
the late 1940s made it possible to prepare exhaustive concordances. Most
humanities computing specialists agree that the most notable of these
pioneering concordance projects is Father Roberto Busa's word index to the
works of St Thomas Aquinas, which covers their eight million words in more than
60 volumes. The Index Thomisticus was begun in 1949 and not completed until
1980: it would have been impossible without a computer, yet it still took a
lifetime of manual data entry.
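To make the idea of a computer-generated concordance concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It assumes plain Unicode text and a naive tokenizer that treats any run of word characters as a word; the function names and the `width` parameter are illustrative inventions, not taken from any historical concordance program, and a real project would also need lemmatization and handling of abbreviations and variant spellings.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def kwic_concordance(text, width=3):
    """Build a keyword-in-context concordance (illustrative sketch).

    Returns a dict mapping each word form, sorted alphabetically as in a
    printed concordance, to a list of (position, left context, keyword,
    right context) tuples with up to `width` words of context per side.
    """
    tokens = tokenize(text)
    entries = defaultdict(list)
    for i, word in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        entries[word].append((i, left, word, right))
    return dict(sorted(entries.items()))
```

For example, in `kwic_concordance("In principio erat verbum, et verbum erat apud Deum", width=2)`, the first entry under `"verbum"` is `(3, "principio erat", "verbum", "et verbum")`: the word's position in the text followed by two words of context on either side.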
By the mid-1960s many similar computer-assisted concordances were being
prepared, usually with a multi-volume printed edition as the goal. Today
computers can generate concordances more or less automatically, and
researchers can retrieve text interactively rather than only in the fixed form
in which it is stored, which makes more complex projects in computer-assisted
textual analysis possible. Quantitative analysis was already under way in the
mid-1960s, when computers provided the basis for stylistic studies such as
Andrew Morton's work on the style of St Paul's Epistles, based on sentence
length and the use of particles, and Frederick Mosteller and David Wallace's
computer-based study of the disputed authorship of several of the Federalist
Papers. Other possibilities for textual analysis now include sophisticated
vocabulary studies of collocations (co-occurrences of words), the collation of
manuscript variants for critical editions, and programs for the metrical
analysis of verse.
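The kinds of counts that underlie such studies can be sketched briefly. The Python below computes mean sentence length, the rate of chosen marker words per 1,000 words, and word pairs that co-occur within a small window. It is an illustrative assumption, not a reconstruction of Morton's or of Mosteller and Wallace's methods (the latter rested on a much more elaborate statistical analysis); the sentence splitter is deliberately naive, and the marker words are whatever the researcher chooses to pass in, in lowercase.

```python
import re
from collections import Counter
from statistics import mean

def words(text):
    """Lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def sentences(text):
    """Naive split on ., ! or ? followed by whitespace or the end of text."""
    return [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]

def mean_sentence_length(text):
    """Average number of words per sentence."""
    return mean(len(words(s)) for s in sentences(text))

def rate_per_thousand(text, markers):
    """Occurrences of each (lowercase) marker word per 1,000 words of text."""
    tokens = words(text)
    counts = Counter(tokens)
    return {m: 1000 * counts[m] / len(tokens) for m in markers}

def collocations(text, window=2):
    """Count unordered word pairs occurring within `window` words of each other."""
    tokens = words(text)
    pairs = Counter()
    for i, w in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            # Sort each pair so that ("the", "cat") and ("cat", "the") match.
            pairs[tuple(sorted((w, other)))] += 1
    return pairs
```

Comparing such figures across texts of known and disputed authorship is the basic move; the hard scholarly work lies in choosing discriminating features and judging how much variation is significant.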
The preparation of machine-readable texts for concordancing has led to the
creation of major text archives such as the Thesaurus Linguae Graecae, the
Oxford Text Archive, and the Trésor de la Langue Française. It has even led to
flexible hypertext systems, delivered through multimedia databases such as
Perseus or via the World Wide Web, which are perhaps the best-known application
of humanities computing among humanists at large.
Technological advances have led to computing projects in other disciplines as
well. In the 1970s, for example, historians began to use computers to apply
statistical techniques to historical analysis. The complexity of historical
data, however, soon presented more problems than could easily be solved within
the frameworks of simple data-processing software. To build databases better
suited to historical method, systems were devised that could account for the
great differences among historical sources and link references to important
names across a variety of documents and document types, such as ephemera,
baptism records and images. The most flexible of these, KLEIO, remains a
remarkable example of this kind of system. Other database systems have further
extended the reach of humanities computing into art history and archaeology,
notably ICONCLASS and ORACLE, which have assisted the cataloguing and analysis
of numerous collections of works and artefacts. Cutting-edge computing in these
fields now focuses on the digitization of images, computer-aided
reconstruction of sites, and the possibility of automated pattern recognition.
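Linking the same person across documents is, at its simplest, a matter of reducing name variants to a common key. The sketch below is a toy illustration of that idea and not how KLEIO or any other historical database actually works: it strips diacritics, lowercases, and consults a small hypothetical table of spelling variants (`VARIANTS`), then groups mentions that share a key. The document identifiers are likewise invented. Real record linkage must weigh far more evidence, such as dates, places and family relationships, and must keep apart different people who happen to share a name.

```python
import unicodedata
from collections import defaultdict

# Hypothetical table of spelling variants; a real project would derive
# such mappings from the sources themselves.
VARIANTS = {"jon": "john", "johannes": "john", "smyth": "smith"}

def normalize(name):
    """Strip diacritics and punctuation, lowercase, and map known variants."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    parts = [p.strip(".,").lower() for p in plain.split()]
    return " ".join(VARIANTS.get(p, p) for p in parts if p)

def link_mentions(mentions):
    """Group (document_id, name) pairs whose normalized names match."""
    groups = defaultdict(list)
    for doc_id, name in mentions:
        groups[normalize(name)].append((doc_id, name))
    return dict(groups)

mentions = [
    ("baptism-1702", "Johannes Smyth"),  # hypothetical document IDs
    ("ledger-1720", "John Smith"),
    ("portrait-04", "Jon Smith"),
    ("ledger-1720", "Mary Bréton"),
]
groups = link_mentions(mentions)
```

Here the three spellings of John Smith fall under the single key `"john smith"`, while `"Mary Bréton"` becomes `"mary breton"` and stays in a group of her own.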
The profusion of new technologies and developments in humanities computing has
led to the creation of several international associations. Both the
Association for Computers and the Humanities (ACH) and the Association for
Literary and Linguistic Computing (ALLC) provide a forum for discussing and
supporting humanities computing topics and projects, and both sponsor major
international conferences in the field.
International humanities computing associations, along with other more
specialized organizations, have collaborated to address what may be the single
most important problem posed by the development of humanities computing over
the past forty years: the need for a common encoding format for scholarly
machine-readable texts. In the past, scholars developed encoding schemes more
or less at their own discretion, in accordance with their particular scholarly
goals. Centuries of print culture incline us to take phenomena such as
non-standard characters, footnotes, marginalia, markers for logical divisions
(e.g., chapter, verse, stanza) and illustrations for granted. But the schemes
devised for encoding such features electronically are often mutually
incompatible: each tends to be restricted to the features of a particular text
or corpus and risks being entirely dependent on one kind of hardware or
software. In response to these issues, members of the various humanities
computing associations developed the Text Encoding Initiative (TEI). The TEI
guidelines specify a common interchange format for machine-readable texts,
provide a comprehensive set of recommendations for encoding and representing
textual features when preparing new electronic materials, and document major
existing encoding schemes, developing a metalanguage that allows the encoding
schemes themselves to be described in machine-readable form.
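As an illustration of what a common encoding buys, the fragment below marks up a short invented poem with stanza divisions and a marginal note using elements from the current TEI (P5) XML vocabulary, then retrieves them with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the TEI guidelines of the period described here were expressed in SGML rather than XML, the sample text is made up, and the markup is a minimal example rather than a recommended encoding for any real text.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: a minimal TEI header plus a two-stanza "poem".
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample poem</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="poem" n="1">
        <lg type="stanza" n="1">
          <l n="1">First line of verse</l>
          <l n="2">Second line<note place="margin">A reader's gloss</note></l>
        </lg>
        <lg type="stanza" n="2">
          <l n="3">Third line of verse</l>
        </lg>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

def stanzas(root):
    """Yield (stanza number, [line text before any embedded note])."""
    for lg in root.iterfind(".//tei:lg[@type='stanza']", TEI_NS):
        lines = [line.text.strip() for line in lg.iterfind("tei:l", TEI_NS)]
        yield lg.get("n"), lines

def marginal_notes(root):
    """Text of every note explicitly placed in the margin."""
    return [n.text for n in root.iterfind(".//tei:note[@place='margin']", TEI_NS)]
```

Because the stanza divisions and the marginal note are explicit elements rather than typographic conventions, any program that understands the shared encoding can find them without knowing how the original page was laid out.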
Like any set of guidelines, the TEI depends on consensus. Many electronic text
centers and archives have agreed to support and implement the TEI. Among these
is the Center for Electronic Texts in the Humanities (CETH), established by
Rutgers and Princeton Universities in October 1991. CETH provides a national
focus for those involved in the creation, dissemination and use of electronic
texts in the humanities. Its mission is to advance scholarship in the
humanities through the use of high-quality electronic texts. To this end, CETH
is now cataloguing existing electronic texts on RLIN for the Rutgers Inventory
of Machine-Readable Texts in the Humanities. CETH also evaluates SGML software
for working with high-quality, TEI-conformant electronic texts, and builds
test-beds for research on how such texts are used and by whom. In addition,
CETH sponsors the CETH Summer Seminar, which focuses on practical and
methodological issues in humanities computing and attracts teachers, scholars
and librarians from around the world.