Accurate Arabic Script Language/Dialect Classification

Report No. ARL-TR-6761
Authors: Stephen C. Tratz
Date/Pages: January 2014; 30 pages
Abstract: Correctly identifying the language/dialect of a text is a critical first step for many natural language processing systems, including machine translation systems. To date, most language identification efforts have focused on distinguishing between European languages. Increasingly, historically unwritten Arabic dialects are appearing online in social media. This report describes state-of-the-art classifiers for automatically distinguishing between Arabic script languages and between Arabic dialects.
Distribution: Approved for public release

Last Update / Reviewed: January 1, 2014