Enabling systems immunology through ontological normalization and analysis of literature-based standards

Steven H. Kleinstein, Ph.D.
Interdepartmental Program in Computational Biology and Bioinformatics
Department of Pathology, Pathology Informatics
Yale University School of Medicine
 
Kei Cheung, Ph.D.
Interdepartmental Program in Computational Biology and Bioinformatics
Yale Center for Medical Informatics
Yale University School of Medicine
 
Systems-level measurements and analysis are playing an increasingly important role in understanding the human immune response to infection and vaccination. Many studies seek to identify predictive “signatures” that characterize the immune state of an individual and can be used as diagnostic or prognostic biomarkers. For example, gene expression signatures have been associated with transplant rejection and successful vaccination responses. These projects almost always require the integration of data across multiple independent studies, either to combine data for increased power or as validation sets to test a proposed model. This kind of data integration is also an important goal of several large NIH-funded consortia, including the Human Immunology Project Consortium (HIPC), the Modeling Immunity for Biodefense (MIB) centers and many others. Many of these data sets are now being deposited into the NIH/NIAID ImmPort data repository. Despite their availability from a centralized resource, comparing and integrating these data sets remains a serious challenge due to their large scale, heterogeneous formats, lack of detailed annotation, and use of non-standard terminologies.
 
Our driving biological project (DBP) leverages NCBO technologies to develop tools that facilitate the integration of systems immunology data and the reporting of analysis results using community-based standard terminology. One of these tools will perform ontological normalization of immunology data templates. We will also standardize data presentation by identifying commonly used terms in the immunology community. A specific use case focuses on identifying common “signatures” of human influenza vaccination responses from independent data sets and across a diverse set of data types, including gene expression (microarray), cytokine measurement (Luminex), and flow cytometry.