Semi-Automated Methods for Refining a Domain-Specific Terminology Base

Report No. ARL-RP-0311
Authors: Gabriella Rose, Melissa Holland, Steve Larocca, and Robert Winkler
Date/Pages: February 2011; 22 pages
Abstract: A domain-specific term base may be useful not only as a resource for written and oral translation, but also for Natural Language Processing (NLP) applications, text retrieval, document indexing, and other knowledge management tasks. The objective of this investigation was to explore the use of alternative terminology extraction methods to refine and validate an existing military-specific bilingual dictionary. A series of semi-automatic methods was implemented to distill the existing term list by removing redundancies, resolving spelling variations, and separating individual expressions. Once the internal clean-up was completed, we compared two methods drawn from the terminology extraction literature in order to validate terms as military-specific and to propose a candidate list of non-specific terms for exclusion: term frequency calculations and terminology extraction lists. In this investigation, we wanted to find the best procedure to extract domain-specific terms for a low-resource domain; to demonstrate that terminology extraction methods can be used to validate and refine a domain-specific dictionary; and to provide the final, refined dictionary as a term base to support customization of machine translation systems for the military domain.
Distribution: Approved for public release

Last Update / Reviewed: February 1, 2011