A Method for Correcting Broken Hyphenations in Noisy English Text

Report No. ARL-TN-0481
Authors: Jeffrey C. Micher
Date/Pages: April 2012; 18 pages
Abstract: The problem of rejoining broken hyphenations in processed English text is addressed. A basic algorithm is developed, which makes use of a word validation step. Results of running the algorithm over an English military training text are presented and analyzed. Precision and recall scores show that the algorithm works well for correcting broken hyphenations, but fails when certain types of noise are encountered in the data.
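The abstract does not spell out the algorithm. The following is a minimal sketch under stated assumptions, not the report's method. It assumes the word validation step is a lookup in a plain word list; the `LEXICON` set and the `rejoin` function are hypothetical. A line-final hyphen followed by a fragment on the next line is joined without the hyphen if the joined form is a known word. It is kept as a hyphenated compound if that form is known. Otherwise it is left alone.

```python
import re

# Hypothetical word list standing in for whatever validation resource the
# report actually uses (the abstract does not say).
LEXICON = {"soldier", "procedure", "training", "military", "well-known"}

# Candidate broken hyphenation: letters, a hyphen at end of line, optional
# spaces/tabs around the newline, then letters at the start of the next line.
BROKEN = re.compile(r"([A-Za-z]+)-[ \t]*\n[ \t]*([A-Za-z]+)")


def rejoin(text: str, lexicon: set[str]) -> str:
    """Rejoin line-broken hyphenations whose result validates against lexicon."""

    def fix(m: re.Match) -> str:
        left, right = m.group(1), m.group(2)
        joined = left + right
        if joined.lower() in lexicon:
            return joined  # broken hyphenation: drop hyphen and line break
        hyphenated = f"{left}-{right}"
        if hyphenated.lower() in lexicon:
            return hyphenated  # genuine compound: keep hyphen, drop line break
        return m.group(0)  # neither form validates: leave text unchanged

    return BROKEN.sub(fix, text)


text = "The sol-\ndier followed the pro-\ncedure.\nA well-\nknown xyzzy-\nfoo case."
print(rejoin(text, LEXICON))
```

In this sketch, a candidate whose halves are corrupted by noise, such as a misrecognized character, fails validation and is left unchanged. That is one plausible way noisy input can defeat a validation-based approach.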
Distribution: Approved for public release
Last Update / Reviewed: April 1, 2012