Gaian Database technology seeks to optimize data identification and processing
December 17, 2013
Data volumes continue to grow, particularly in science, industry, and the military. As a result, authenticated federated sharing, the breadth and depth of available information, and the capacity to process it in a timely manner remain ongoing challenges for information management.
To satisfy the demand for enhanced information management and to meet the need of the NATO Intelligence Fusion Centre (NIFC) for improved information sharing, the current trend is to move away from large database architectures toward more distributed database technologies.
The type of database desired is one that empowers discovery, makes better use of network resources, and promotes security.
Scientists and analysts at the U.S. Army Research Laboratory (ARL) in the Computational and Information Sciences Directorate (CISD) and Sensors and Electron Devices Directorate, in collaboration with NIFC and IBM researchers in the U.K., recently completed a Coalition Warfare Program (CWP) project to field Gaian Federated Database technology, which leveraged Gaian Database capability, a research product of the ARL and U.K. Ministry of Defence International Technology Alliance.
The goal of the CWP, sponsored by the Office of the Secretary of Defense, was to develop and transition an extensible capability of performing distributed federated query and information dissemination across a coalition network of distributed disparate data and information sources.
The capability also includes the development of associated access control policies, with security measures that govern the sharing and dissemination of data and information in support of coalition intelligence missions.
Much of the continued research conducted on the Gaian Database occurs in the Network Science Research Laboratory at ARL's Adelphi Laboratory Center. The Network Science Research Laboratory (NSRL) was developed to provide a controlled, repeatable environment for network science experimentation.
"The NSRL is an ideal facility for understanding the real-world limits of this capability. Using our virtualized experimentation environment, we can subject the Gaian Database, or any other application, to the limitations of tactical networks in a repeatable way," said Andrew Toth, computer scientist and lead for the Secure Mobile Networking Team in ARL's Tactical Network Assurance Branch within CISD.
The project was led by ARL researchers Tien Pham and Andrew Toth, with software development and integration by IBM U.K. and support from the U.K. Defence Science and Technology Laboratory. It was primarily concerned with data discovery, the federation of various data sources, and the safeguarding of data access and dissemination.
The Gaian Database is a Dynamic Distributed Federated Database, meaning that its structure combines the capabilities of distributed databases, database federation, automated discovery and the semantics of data, allowing for controlled access to data and the flow of data through the network of distributed nodes within the database.
These nodes can connect to one another logically through a fully autonomous, biologically inspired process, and each can access multiple federated data sources.
"The basic concept of Gaian was to answer the question, "What would be required to connect every data source in the world?" To accomplish this, database nodes would need automated discovery of other nodes," said Toth.
"Discovery only gets you part of the way. Another facet of the challenge is data movement. Rather than repeatedly replicating all of the data to a centralized data store in anticipation of some future need, the data is kept at its source, whether sensors, smartphones, or servers, and retrieved through a distributed query only when needed. This 'store locally query anywhere' approach is a more efficient use of resources," added Toth.
For the Gaian Database and the organizations that it has the potential to support, much of the data to be queried is in the form of textual reports, email, and analyst notes.
In order for relevant, mission-critical information to be federated and retrieved from these reports, Natural Language Processing (NLP) rules were developed using the IBM LanguageWare tool.
Together, the NLP rules for parsing raw data, a triplestore, and content analysis make it possible to target people, places, organizations, dates, times, and measurements, as well as improvised explosive devices (IEDs), weapons, and vehicles, when information is queried, and to link these entities together by event.
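How a triplestore lets entities be linked by event can be shown with a small sketch. The `TripleStore` class, the predicate names, and the sample entities below are hypothetical and chosen only for illustration. They do not reflect the schema, the LanguageWare rules, or the store used in the project.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def linked_entities(store, entity):
    """Find every other entity that shares at least one event with `entity`."""
    events = {s for (s, _, _) in store.match(obj=entity)}
    others = {o for e in events for (_, _, o) in store.match(subject=e)}
    return sorted(others - {entity})


# Illustrative triples, as extraction rules might emit them from two reports.
store = TripleStore()
store.add("event1", "involves_person", "Person A")
store.add("event1", "occurred_at", "Place X")
store.add("event1", "involves_vehicle", "Truck 1")
store.add("event2", "involves_person", "Person B")
store.add("event2", "occurred_at", "Place X")

print(linked_entities(store, "Place X"))
```

Because every extracted entity hangs off an event node, asking what is connected to a place, person, or vehicle becomes a pattern match over triples rather than a hand-written join.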
"The solution transitioned to NIFC allows the analyst to focus on discovering links between people, places, organizations, and events rather than the complex Structured Query Language queries underlying those links," Toth stated. When it comes to security, Toth said that the Gaian Database uses a combination of Policy Based Access and Kerberos user authentication.
IBM developed an extension to the standard Kerberos protocol for use with the Gaian Database, and in order to use the database, a user authenticates when they first login, which then negotiates with the Ticket Granting Service (Domain Controller) to obtain the ticket for the web application server.
Once the user has authenticated, the application server creates the query key for that user, which is then passed in encrypted form along the connection of nodes.
The query key is generated by applying a one-way function to the user's Kerberos session key, and it is then used to create message integrity codes for the queries performed by that user.
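The derivation described above can be sketched as follows. The article does not say which one-way function or integrity-code algorithm the IBM extension uses, so HMAC-SHA256 is an assumption here, as are the `derive_query_key`, `make_mic`, and `verify_mic` names, the `b"query-key"` label, and the placeholder session key.

```python
import hashlib
import hmac


def derive_query_key(session_key: bytes) -> bytes:
    """Derive a query key from the session key with a one-way function.

    Assumption: HMAC-SHA256 over a fixed label. The real function is not stated.
    """
    return hmac.new(session_key, b"query-key", hashlib.sha256).digest()


def make_mic(query_key: bytes, query: str) -> bytes:
    """Compute a message integrity code for one query."""
    return hmac.new(query_key, query.encode("utf-8"), hashlib.sha256).digest()


def verify_mic(query_key: bytes, query: str, mic: bytes) -> bool:
    """Check a query against its integrity code using a constant-time comparison."""
    return hmac.compare_digest(make_mic(query_key, query), mic)


# Placeholder session key for illustration; a real one comes from Kerberos.
session_key = bytes(range(32))
query_key = derive_query_key(session_key)

query = "SELECT summary FROM reports WHERE place = 'Adelphi'"
mic = make_mic(query_key, query)

print(verify_mic(query_key, query, mic))
print(verify_mic(query_key, query + " OR 1=1", mic))
```

Because the function is one-way, a node that holds the query key can verify that a query came from the authenticated user and was not altered in transit, without ever learning the Kerberos session key itself.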
"Kerberos authentication was a key challenge of the NIFC transition. Ultimately, the product of the research needs to integrate seamlessly with deployed systems, which in this case were Kerberos-enabled systems on the U.S. Battlefield Information Collection and Exploitation System network," Toth said.
In terms of benefits to the Army, the Gaian Database assists Soldiers in performing missions by providing timely information collected from disparate data sources.
"By treating everything as a data source and enabling access to that data, the Soldier can have the latest intel to support the mission," said Toth.
"In addition, the policy enforcement mechanism that is an integral part of the Gaian Database means the analyst can easily share data sources with coalition partners based on mission needs," added Toth.
The final step of this CWP project was a demonstration of the Gaian Database technology to senior management at NIFC at the Royal Air Force Station Molesworth in the U.K.
The NIFC commander quickly recognized the benefits of this new capability and envisioned a use case with application to IED detection.
The ultimate intent is to work with NIFC to authorize the database for continued use on the NATO Battlefield Information Collection and Exploitation System network so that the technology can be extended to multiple agencies.
"What we have is an excellent example of ARL's involvement in the creation and transition of an exciting new capability that will benefit the Soldier," concluded Toth.