Text and Language Technology Group
Text-Tech

William A. Kretzschmar, Jr.

Automated Content Analysis - TLTG can provide autonomous and semi-autonomous algorithms for determining document and corpus content. An example of content analysis in which TLTG members have been involved is the Tobacco Document Corpus. This is a National Cancer Institute funded project that involves the analysis of approximately 3.5 million documents released by the tobacco industry due to litigation. The analysis began with a stratified random sample of the document set. This sample was then subjected to statistical linguistic analysis which allowed the development of historical profiles of the tobacco industry based on the content of their own internal correspondence. In the commercial setting, TLTG members have also recently developed methods for analyzing the risk levels of email messages. This allowed the analyst to alert management to potential liabilities based on recurring, significant email content.
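
As a rough illustration of the stratified random sampling step mentioned above, the following Python sketch draws the same fraction of documents from each stratum. It is only a minimal sketch under stated assumptions: the strata actually used for the Tobacco Document Corpus are not described here, so the decade field, the `stratified_sample` helper, and the 5% fraction are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(documents, stratum_of, fraction, seed=0):
    """Draw the same fraction of documents from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group the documents by stratum.
    strata = defaultdict(list)
    for doc in documents:
        strata[stratum_of(doc)].append(doc)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Keep at least one document per stratum so small strata are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records of the form (document id, decade); illustration only.
docs = [(i, 1950 + 10 * (i % 4)) for i in range(400)]
sample = stratified_sample(docs, stratum_of=lambda d: d[1], fraction=0.05)
```

Sampling within strata, rather than from the whole set at once, keeps each group represented in proportion to its size, which matters when some groups are small.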

Automated Document Classification - TLTG can provide autonomous and semi-autonomous algorithms for the linguistic analysis of large document sets, classifying documents according to your specification. An example of document classification in which TLTG members have been involved is the Tobacco Document Corpus. This is a National Cancer Institute funded project that involves the analysis of approximately 3.5 million documents released by the tobacco industry due to litigation. We have designed and implemented means to classify and analyze this large body of documents for the detection and assessment of "deceptive language." We are currently working on statistical methods for classifying email messages based on language type indexes.

Automated Risk Assessment - TLTG is able to provide autonomous and semi-autonomous algorithms for the linguistic analysis of large document sets to isolate individual documents which may present risk. We have recently designed and implemented means to identify the small proportion of documents within a corporate document set which were likely to present legal risks to management.

Document Comparison - TLTG is able to provide analysis and linguistic evidence indicating whether a given document is likely to have served as the source for another document. Our members have experience in cases of alleged plagiarism in which we documented the nature and extent of the relationship between two documents and determined which document was written before the other.

Authorship Attribution - Linguistic evidence can be used to determine the authorship of a particular document, with reference to other documents written by known authors. For instance, TLTG members have experience comparing 'poison-pen' letters to the known writings of a person suspected of writing them.

Linguistic Analysis of Court Evidence - Linguistic techniques can be used to evaluate the meaning of documents, or to assess the meaning or status of statements by parties to a matter. For example, members of our group have advised a defense team in a death penalty case about the judge's instructions to the jury, specifically with regard to the phrase "mitigating circumstances."

Trademarks - Linguistic evidence can be offered to document the status of trademarks with regard to use of the mark in common language. TLTG members have worked in the commercial sector to assess whether the name of a well-known company was in common use within its industry group and by the public, and consequently whether it should or should not be trademarked.

Identification/Profiling of Speakers and Writers - Comparisons of unidentified language samples to linguistic benchmarks, or comparisons of a disputed sample to the known speech or writing of an individual, can help investigators identify likely authors. Our members' experience in dialectology and sociolinguistics, more than 35 years of combined work, gives TLTG an advantage over other profilers. This is illustrated by the recent analysis of the Washington sniper letter by government "linguists," which overlooked several key clusters of linguistic features that created a distinct profile of the author(s).

Design of Mark-up and Archiving Protocols - TLTG is able to design mark-up and archiving protocols for any document set, tailored to any type of analysis. Our members have both the needed technical expertise in programming and experience on large text- and language-oriented projects. TLTG members are responsible for the design and implementation of mark-up protocols and/or archiving for the Tobacco Document Corpus, a National Cancer Institute funded project that involves the analysis of tobacco industry documents; for the Atlanta Survey Project, a National Science Foundation funded project collecting linguistics data and language samples from interviews in Atlanta, Georgia; and for several linguistic atlases including the Linguistic Atlas of the Middle and South Atlantic States and the Linguistic Atlas of the Gulf States.

Staff Training on Markup and Archiving - TLTG members have been and are currently involved in the design of university curricula and instruction in humanities computing. We have also provided instruction on linguistic computing in the commercial sector.

Arranging Subcontractors for Document Conversion and Markup - Although we generally do not provide keyboarding services for text conversion and markup, we can make arrangements with and supervise subcontractors.

Word Meaning and Usage Research - TLTG is able to provide the most up-to-date research on word meanings and usage--how words are currently used--based on a large collection of modern texts. We have, for example, provided this service to the commercial sector by researching the meaning of key words in order to clarify legal issues concerning institutional publications.

Word Pronunciations (American and British English) - TLTG maintains a 100,000-word database of both British and American pronunciations in a variety of pronunciation keys, such as the International Phonetic Alphabet (IPA), American dictionary pronunciations, and newspaper-style pronunciations. These pronunciations are based on our own research and knowledge of dialects rather than on traditional dictionary pronunciations. That is, our pronunciations reflect how words are pronounced in actual speech, not archaic notions of right and wrong pronunciation. TLTG members have provided the pronunciations for a major international publisher's dictionary line.

Pronunciation Key Conversions - TLTG has designed and implemented software for maintaining dictionary content and for converting data to and from commercial publication formats, and is prepared to provide it.
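
To give a sense of what a pronunciation key conversion can involve, the Python sketch below converts an IPA transcription to a newspaper-style respelling using a lookup table and longest-match scanning. It is a minimal sketch under stated assumptions, not TLTG's software: the `IPA_TO_RESPELLING` table is a small hypothetical fragment, and the `convert` function is an illustrative name.

```python
# Hypothetical fragment of an IPA-to-respelling table; illustration only.
IPA_TO_RESPELLING = {
    "tʃ": "ch",
    "ʃ": "sh",
    "θ": "th",
    "iː": "ee",
    "æ": "a",
    "ɪ": "i",
    "k": "k",
    "t": "t",
    "p": "p",
    "s": "s",
    "n": "n",
}


def convert(ipa, table=IPA_TO_RESPELLING):
    """Convert an IPA string to the target key, preferring the longest match."""
    longest = max(len(symbol) for symbol in table)
    out = []
    i = 0
    while i < len(ipa):
        # Try multi-character symbols (such as "tʃ") before single ones.
        for size in range(longest, 0, -1):
            chunk = ipa[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # Fail loudly instead of silently dropping an unmapped symbol.
            raise ValueError(f"no mapping for {ipa[i]!r} at position {i}")
    return "".join(out)


print(convert("tʃɪp"))  # chip
print(convert("ʃiːp"))  # sheep
```

Matching the longest symbol first keeps digraphs such as "tʃ" from being split into "t" and "ʃ", and raising an error on an unmapped symbol makes gaps in the table visible during conversion.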

Lexicography Software and Training - TLTG provides consulting services to design and implement computer systems for lexicography, and is able to provide training for those systems. Our members have designed and implemented software for maintaining dictionary content and have experience in university level humanities computing instruction.



700 Oglethorpe Ave. • Athens, Georgia 30606