Disjoint and Overlapping Tuning Sets

Report No. ARL-TN-0504
Authors: John Morgan, Luis Hernandez, and Stephen LaRocca
Date/Pages: September 2012; 14 pages
Abstract: It is conventional in machine learning to train and tune on disjoint sets in order to avoid overfitting. Under conditions of severe data paucity, this convention is difficult to follow. An experiment was run to test a statistical machine translation (SMT) system built from a small amount of training data, once with disjoint and once with overlapping training and tuning sets. Results confirm the need to partition the sample text data into training and tuning sets even in the case of severe data paucity. An improvement of 3% was observed when the tuning set was disjoint from the training set.
Distribution: Approved for public release
Download Report (0.226 MB)
If you are visually impaired or need a physical copy of this report, please contact the Defense Technical Information Center (DTIC).

Last Update / Reviewed: September 1, 2012