Standardized Annotated Neurophysiological Data Repository (SANDR)

Background: During the course of the CaN CTA, a large number of human performance datasets were generated. These experiments measured the physical and cognitive performance of subjects engaged in real-world tasks, conducted either in real-world environments or in virtual environments modeled on real-world scenarios. To leverage the collective results for a “big data” analysis, a significant number of these datasets were standardized via ARL’s “Big Data for Human Sensing” pipeline. The standardized datasets comprise the ARL Standardized Annotated Neurophysiological Data Repository (SANDR).

What it does: The “Big Data for Human Sensing” data processing pipeline was created to standardize SANDR datasets by producing a version of each dataset that uses the same ontological event-tagging standard and the same data format. The pipeline begins with the translation of Raw data to Standardized Level 0 data (STDL0). This step includes the aggregation and synchronization of data streams (if necessary) and the derivation of events. The final stage of STDL0 generation is to tag events via the CTagger tool, which implements the Hierarchical Event Descriptor (HED) standard. Standardized Level 1 data (STDL1) is generated by organizing STDL0 data within a container based on the EEG Study Schema (ESS). In this stage of the pipeline, data are organized by session, defined as “EEG cap on to EEG cap off”, and associated with project-level metadata. Standardized Level 2 data (STDL2) is generated by executing the PREP (preprocessing) pipeline on the STDL1 data, producing data that has been minimally cleaned and verified. STDL2 data is ready for individual analysis efforts, or for use by other tools in generating further, customized versions of a dataset.

Why is this important?  Large-scale analyses across heterogeneous datasets are the only way to infer brain-behavior relationships that generalize across contexts (individuals, tasks, and states). Thus, datasets must be 1) uniformly annotated so that the manifold sources of single-trial variability can be quantified, and 2) brought into a common representation and format for aggregate analyses.
