Project Summary

Systems biology integrates multiple data types from different data sources to give a more complete picture of the biological system under study. While many existing biological databases are built on traditional SQL (Structured Query Language) database technology, NoSQL database technologies have been explored as a more relationship-centric, flexible, and scalable approach to data integration. In this paper, we describe how to use the Neo4j graph database to integrate a variety of data sets in the context of systems vaccinology. Specifically, we have converted into a common graph model diverse types of vaccine response measurement data from the NIH/NIAID ImmPort data repository, pathway data from Reactome, influenza virus strains from WHO, and taxonomic data from NCBI Taxonomy. Although Neo4j provides a graph-based query language (Cypher) for data retrieval, we have developed a web-based dashboard that lets users browse and visualize the data without needing to learn Cypher. In addition, we have prototyped a natural language query interface for users to interact with our system. In conclusion, we demonstrate the feasibility of using a graph database to store and query immunological data with complex biological relationships. Querying a graph database by traversing such relationships has the potential to reveal novel connections among heterogeneous biological data.