Source-Code Stylometry Improvements in Python

Report No. ARL-TN-0860
Authors: Gregory Shearer, Frederica Nelson
Date/Pages: December 2017; 18 pages
Abstract: This technical note covers the work of rewriting existing source-code stylometry software in Python and describes the resulting improvements in performance and maintainability, along with validation of the results. Source-code stylometry is the process of attributing the authorship of source-code samples by applying machine-learning techniques, specifically random forest classifiers, to lexical, layout, and syntactic features extracted from the code. The original work was conducted as part of a collaboration between the US Army Research Laboratory and Drexel University.
Distribution: Approved for public release
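The abstract names three feature families (lexical, layout, and syntactic) that are extracted from each code sample and then fed to a random forest classifier. The sketch below shows what such features might look like for Python source. It is illustrative only: the feature names, the particular keywords and AST node types chosen, and the use of Python's standard `tokenize` and `ast` modules are assumptions made for this example, not the feature set or parser used in the report.

```python
from __future__ import annotations

import ast
import io
import keyword
import tokenize
from collections import Counter

# Hypothetical feature set for illustration; not the report's actual features.


def lexical_features(source: str) -> dict[str, float]:
    """Token-level features: keyword rates per line and mean identifier length."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    names = [tok.string for tok in tokens if tok.type == tokenize.NAME]
    n_lines = max(len(source.splitlines()), 1)
    kw_counts = Counter(n for n in names if keyword.iskeyword(n))
    idents = [n for n in names if not keyword.iskeyword(n)]
    feats = {f"kw_{kw}_per_line": kw_counts[kw] / n_lines
             for kw in ("if", "for", "while", "return")}
    feats["mean_ident_len"] = sum(map(len, idents)) / len(idents) if idents else 0.0
    return feats


def layout_features(source: str) -> dict[str, float]:
    """Whitespace habits: blank lines, tab vs. space indentation, line length."""
    lines = source.splitlines()
    n = max(len(lines), 1)
    # Only non-blank lines that begin with whitespace count as indented.
    indented = [ln for ln in lines if ln.strip() and ln[0] in " \t"]
    tab_indented = sum(1 for ln in indented if ln.startswith("\t"))
    return {
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / n,
        "tab_indent_ratio": tab_indented / len(indented) if indented else 0.0,
        "mean_line_len": sum(map(len, lines)) / n,
        "trailing_ws_ratio": sum(1 for ln in lines if ln != ln.rstrip()) / n,
    }


def _depth(node: ast.AST) -> int:
    """Depth of the AST rooted at node (a lone node has depth 1)."""
    return 1 + max((_depth(c) for c in ast.iter_child_nodes(node)), default=0)


def syntactic_features(source: str) -> dict[str, float]:
    """AST-based features: relative frequency of node types and maximum depth."""
    tree = ast.parse(source)
    node_types = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(node_types.values())
    feats = {f"ast_{name}_ratio": node_types[name] / total
             for name in ("If", "For", "While", "Call", "ListComp")}
    feats["ast_max_depth"] = float(_depth(tree))
    return feats


def extract_features(source: str) -> dict[str, float]:
    """Merge all three feature families into one dictionary."""
    feats: dict[str, float] = {}
    feats.update(lexical_features(source))
    feats.update(layout_features(source))
    feats.update(syntactic_features(source))
    return feats
```

For example, `extract_features("def f(xs):\n    return [x * 2 for x in xs if x]\n")` reports 0.5 `if` keywords per line but an `ast_If_ratio` of 0.0, because the `if` inside a comprehension is not an `ast.If` statement node. Each dictionary can be turned into a fixed-order vector (for example `[feats[k] for k in sorted(feats)]`) and, together with one author label per sample, passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` through its `fit` method.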

Last Update / Reviewed: December 1, 2017