Infectious Disease Ontology 2008

From NCBO Wiki
Revision as of 08:41, 18 April 2008 by Phismith

Introduction

The goal for the 2007 Infectious Disease Ontology workshop (IDO 2007) was to lay the foundation for a distributed, community-based approach to developing the ontology coverage needed for the infectious disease domain. As described in the workshop report, the workshop was highly successful and had four primary outcomes that directly support attainment of this goal:

• Training of a core set of infectious disease researchers in ontology-development methods, facilitating their participation in ontology development;
• Development of a core Infectious Disease Ontology (IDO) which will serve as a consensus-based controlled vocabulary resource for annotation of data representing all entities relevant to infectious diseases generally;
• Establishment of a method for creating a set of ontologies that can be developed in a distributed fashion yet together cover the entire infectious disease domain (the set consists of the above-described core IDO ontology plus sub-domain-specific extensions of the core, such as IDO-tuberculosis, IDO-malaria);
• Formation of an Infectious Disease Ontology Consortium (IDOC) whose members have agreed to contribute towards continued development of the core IDO and to develop seven different sub-domain-specific ontologies.

These outcomes have positioned us to make rapid progress toward ontology coverage of the infectious disease domain by providing both a specific strategy for the proposed community-based approach to ontology development and a forum for engaging researchers with expertise in the relevant sub-domains and coordinating their efforts to ensure interoperability between the resulting ontologies.

To capitalize on the above-mentioned outcomes and sustain the momentum gained from the 2007 workshop, we propose a 2008 Infectious Disease Ontology workshop with the goals of: i) providing training for new consortium members, ii) critically evaluating the ontology development test cases initiated at IDO 2007 to improve both the ontologies themselves and the methodology used for their development, iii) identifying key application test cases, such as ontology-based natural language processing, to be developed over the coming year, iv) significantly expanding representation in the IDOC for sub-domains still lacking development effort, and v) involving representatives from key information sources and institutions that could be important contributors and users of the IDO set of ontologies.

The anticipated outcomes of the 2008 meeting are a fully tested development methodology that can be adopted on a broad scale by experts in a wide variety of infectious disease sub-domains, together with increased participation in the IDOC of individuals with the necessary training and expertise. The result would bring retrieval, analysis, and exploitation of a wide variety of infectious disease data to a new level, and also support communication of scientific results between highly dispersed groups of researchers.

Below we describe current efforts being carried out as a result of IDO 2007 and propose a schedule and budget for a two-day workshop that would allow us to work towards our new set of goals.

Progress since IDO 2007

In the four months since IDO 2007, much progress has been made in the development of IDO and seven sub-domain-specific extensions of IDO. The sub-domain-specific extensions include a Vaccine Ontology (Yongqun He, University of Michigan) and six disease-specific ontologies for the following diseases:

• Tuberculosis (Carol Dukes-Hamilton, Duke University Medical Center)
• Staphylococcus aureus bacteremia (Vance Fowler, Duke University Medical Center)
• Infective endocarditis (Sivaram Arabandi, Cleveland Clinic Foundation)
• Malaria and other vector-borne diseases (Christos Louis, Institute of Molecular Biology and Biochemistry – FORTH)
• Dengue fever (Saul Lozano-Fuentes, Colorado State)
• Influenza (Stuart Sealfon, Mount Sinai School of Medicine; Richard Scheuermann, University of Texas, Southwestern Medical Center)

Development of IDO has continued along two fronts: expansion of content, driven by development of the sub-domain-specific ontologies, and refinement of the approach to representing infectious disease-relevant entities ontologically.

The Vaccine Ontology is being developed in the context of a large database and text-mining effort led by Dr. Yongqun He at the University of Michigan. Drs. Cowell, He, and Smith have collaboratively developed a draft Vaccine Ontology consisting of 68 terms. Current work includes defining these terms and relating them to other terms in the IDO ontology. Dr. He has just submitted an R01 proposal to the NIH to fund continued development of the Vaccine Ontology and its application to the annotation of data resulting from vaccine research, to natural language processing of primary scientific literature describing the results of vaccine research, and to the management of the resulting data and information in an online public information resource called the Vaccine Investigation and Online Information Network (VIOLIN). Drs. Cowell and Smith are co-investigators on this proposal.

In collaboration with Dr. Carol Dukes-Hamilton at Duke University Medical Center, Drs. Cowell and Smith have begun developing a draft ontology of tuberculosis and a method for defining ISO 11179 data elements using logical constructs based on terms derived from ontologies. Dr. Dukes-Hamilton’s research group has defined eighty tuberculosis data elements and curated these into the National Cancer Institute’s metadata repository, caDSR. Definition of these data elements using ontology terms provides not only a formal method for data element definition, significantly improving the resulting definitions, but also interoperability between data elements (along with the data associated therewith) and the vast amount of biomedical data and information annotated with terms from the same or an interoperable set of ontologies. Drs. Cowell, Dukes-Hamilton and Smith have submitted an R01 proposal to the NIH to support continued development of the tuberculosis ontology, ontology definition of tuberculosis data elements, and development of an ontology-based tuberculosis clinical decision support system.
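The idea of defining data elements from ontology terms can be illustrated with a minimal sketch. It assumes the usual ISO 11179 decomposition of a data element into an object class, a property, and a value domain; the class names, the `EX:` term identifiers, and the example elements are hypothetical placeholders and are not taken from IDO, the tuberculosis ontology, or caDSR.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class OntologyTerm:
    """Reference to an ontology term (the IDs used below are hypothetical)."""
    term_id: str
    label: str


@dataclass(frozen=True)
class DataElement:
    """ISO 11179 style element: object class + property + permitted values."""
    name: str
    object_class: OntologyTerm
    prop: OntologyTerm
    permitted_values: Tuple[OntologyTerm, ...]

    def definition(self) -> str:
        # Compose a textual definition purely from the ontology term labels.
        values = ", ".join(v.label for v in self.permitted_values)
        return f"The {self.prop.label} of a {self.object_class.label}; one of: {values}."

    def term_ids(self) -> FrozenSet[str]:
        # Every ontology term this element is built from.
        terms = (self.object_class, self.prop) + self.permitted_values
        return frozenset(t.term_id for t in terms)

    def shared_terms(self, other: "DataElement") -> FrozenSet[str]:
        # Terms in common are the hooks for interoperability between elements.
        return self.term_ids() & other.term_ids()


# Hypothetical terms, for illustration only.
patient = OntologyTerm("EX:0000001", "patient")
smear = OntologyTerm("EX:0000002", "sputum smear result")
positive = OntologyTerm("EX:0000003", "positive")
negative = OntologyTerm("EX:0000004", "negative")
culture = OntologyTerm("EX:0000005", "sputum culture result")

smear_element = DataElement("TB sputum smear result", patient, smear, (positive, negative))
culture_element = DataElement("TB sputum culture result", patient, culture, (positive, negative))

print(smear_element.definition())
print(sorted(smear_element.shared_terms(culture_element)))
```

Because both elements are composed from the same term references, the overlap between them can be computed rather than inferred from free-text definitions, which is the kind of interoperability the paragraph above describes.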

In collaboration with Dr. Vance Fowler at Duke University Medical Center, Drs. Cowell and Smith have developed a draft ontology of Staphylococcus aureus bacteremia and submitted an R01 proposal to the NIH to support continued development of this ontology and its application in analyzing networks of functionally related gene products to identify host genes conferring susceptibility to Staphylococcus aureus bacteremia.

Dr. Sivaram Arabandi of the Cleveland Clinic Foundation is part of a large team developing SemanticDB technology, a semantic datastore with query functionality and a primary focus on Cardiology and Cardiothoracic Surgery. A portion of this work involves developing an IDO extension ontology for infective endocarditis.

Dr. Christos Louis’ research group at the Institute of Molecular Biology and Biochemistry (IMBB), one of the seven institutes of the Foundation for Research and Technology – Hellas (FORTH), based in Crete, is developing an IDO extension for malaria and other vector-borne diseases. The group is working in parallel to develop an ontology of the physiological processes of disease vectors that play a direct or indirect role in disease transmission. These ontology development efforts are being pursued within the context of VectorBase, an NIAID Bioinformatics Resource Center for invertebrate vectors of human pathogens, and include efforts to construct decision support systems for vector-borne diseases.

A group of researchers under the direction of Dr. Richard Scheuermann, University of Texas, Southwestern Medical Center, has been working to develop an influenza virus sequence and surveillance ontology to describe data generated from the Center for Influenza Research and Surveillance (CEIRS) projects. The CEIRS projects cover two research areas: influenza virus surveillance, and basic research on influenza virus sequence and genetic reassortment. A list of data fields that describe influenza virus isolates and surveillance data has been created by consolidating data fields from separate CEIRS participants. The list serves as a starting point for the integration of each data field into an existing OBO ontology, including IDO, or as the basis for creation of the influenza-specific ontology. The immediate goal is to apply the Influenza Virus Ontology to data collected as part of the CEIRS projects in an effort to enable influenza researchers to more easily elucidate the causes of influenza virulence and pathogenesis.

For more information about IDO and the sub-domain extensions, visit

IDO 2008

The methodology for a distributed, community-based approach to ontology development outlined at the 2007 workshop has been tested and refined through the efforts of the above founding members of the IDOC. In addition, we have seeded ontology development efforts in the broad sub-domains using test cases of bacterial, viral, and eukaryotic parasitic diseases. We are thus at a critical point where sufficient progress with the foundational work has been made, and the IDOC is now ready to support a broad community effort, offering a tested methodology, guidance in implementing this methodology through interaction with experienced IDOC members, and training for new members. In preparation for this next phase, we propose a 2008 workshop with the following specific goals: to provide training for new IDOC members, to critically evaluate the ontology development test cases, to identify key application test cases, and to expand the IDOC. Our strategy for attaining each goal is described below.

Goal 1: Providing Training for New IDOC Members – Pre-workshop Training Session

To provide training in ontology development best practices for new consortium members and select trainees (see below), we will offer a two-hour pre-workshop training session the evening before the first day of the workshop. This session will include an introduction to basic categories in ontology, definitions, ontology formalisms, and the proper formulation of relations. We will discuss the most common errors in existing ontologies and the consequences of these errors for ontology use. We will emphasize the advantages of a principled approach to ontology development, using IDO as an example of a sound ontology.

While attendance will be by invitation only, to keep the meeting focused on its working goals, we will actively recruit five early career participants, selected especially for the benefit they will receive from the training activities. We will recruit these individuals using the same methods as were adopted for our initial meeting in Cold Spring Harbor.

Goal 2: Critically Evaluating the Ontology Development Test Cases – Workshop Sessions 1 – 5 and 7 – 9

To prepare for broader use of the IDO ontologies and their development methodology, the primary focus of the meeting will be critically evaluating the core IDO and each test-case ontology and improving both the ontologies and the methodology for their development. We plan for eight sessions (see detailed schedule below) devoted to evaluating each ontology for:

• scope and content
• biological accuracy
• adherence to ontology development best practices and logical rigor
• interoperability with other ontologies in the set.

Goal 3: Identification of Key Application Test Cases – Workshop Session 10

For each of the IDO extension ontologies currently under development, a specific application is being tested (described above). These applications include data annotation and management, text-mining, network analysis, data element definition, and decision support. To ensure that the ontologies are not developed in a way that later constrains their utility, one session at the workshop will be devoted to analyzing the ontology needs of each application, examining how those needs impact the ontology development process, and identifying application areas not yet being explored. A focus for research in the coming year will thus be developing these unexplored application areas.

Goals 4 and 5: Expanding IDOC Membership – Target Invitees

The fourth and fifth goals of the meeting, to expand representation within the IDOC for specific subdomains and for information resource communities who are potential users of the IDO ontologies, will be realized through targeted invitations. We will target influential experts both within the subdomains for which ontology development is ongoing and for subdomains for which ontology development effort is still lacking. The purpose of the former is to gain additional expert input into the ontologies’ evaluation and to further engage the infectious disease research community in contributing to and using the ontologies. The purpose of the latter is to encourage development for subdomains thus far neglected. In addition, we will target invitations to representatives from specific information resources. As a small number of attendees is most effective for the type of working meeting proposed, we will hold the number of target invitees to six. Some of our invitees, however, represent both a specific subdomain and an important information resource. We propose to invite the following six individuals:

• Jessica Kissinger (University of Georgia) – Dr. Kissinger’s research group focuses on the study of parasite genomics and genome evolution. She is the Principal Investigator or Co-Principal Investigator on six different grants supporting the study of Apicomplexan genomics and the development of ApiDB, PlasmoDB, CryptoDB, ToxoDB, GiardiaDB, and TrichDB. Dr. Kissinger will be an important representative for these information resources and for the Apicomplexan research community.
• Stefan Schulz (Freiburg University) – Dr. Schulz is a medical informaticist with a research focus on medical terminologies, ontology, and natural language processing. Dr. Schulz is a current member of the content committee for the Systematized Nomenclature of Medicine (SNOMED CT), the largest and most widely used medical terminology. Dr. Schulz will thus serve as liaison between the IDO and SNOMED CT developer and user communities.
• Georgia Tomaras (Duke University Medical Center) – Dr. Tomaras’ research focus is on the host-pathogen interactions that result in HIV escape from neutralizing antibodies. She is a faculty member in Duke’s Center for AIDS Research (CFAR), Human Vaccine Institute (HVI), and the Center for HIV/AIDS Vaccine Immunology (CHAVI). In addition, she runs the Binding Antibody Core for CHAVI and for the HIV Vaccine Trials Network. Dr. Tomaras will serve as a representative for the HIV research community and as a liaison between IDO and each of these large-scale HIV vaccine research communities.
• Bette Korber (Los Alamos National Laboratory) – Over the last decade, Dr. Korber has been a lead scientist developing the four LANL HIV databases for HIV genetic sequences, immunological epitopes, drug resistance-associated mutations, and vaccine trials. Dr. Korber’s research has focused on the use of these data to study HIV transmission patterns, HIV evolution in the face of a host immune response, and vaccine development. Dr. Korber will serve as a representative for the HIV research community and as a liaison between IDO and the LANL HIV databases.
• David Lipman (National Center for Biotechnology Information) – Dr. Lipman is director of NCBI, which hosts some of the most widely used biomedical databases, including GenBank, PubMed, OMIM, and dbSNP. Including Dr. Lipman in the workshop will be an important first step towards exploring how IDO and the NCBI information resources could interact.
• Cheryl Kraft (National Institutes of Health) – Ms. Kraft is the Associate Director of the Office of Biomedical Informatics for the Division of Allergy, Immunology, and Transplantation at the National Institute of Allergy and Infectious Diseases. She is also the Lead Science Officer for the National Center for Biomedical Ontology. Ms. Kraft will greatly facilitate connecting members of the IDOC with NIAID-funded researchers whose areas of expertise or research interest suggest that they could contribute to or benefit from the infectious disease ontology effort.

Attainment of the five goals outlined for the meeting is critical to successfully achieving ontology coverage of the infectious disease domain and to achieving a distributed, community-based approach to the creation and adoption of a common vocabulary for the reporting and analysis of infectious disease-related research results. Our experience with analogous endeavors in other areas has made it clear that a group meeting allowing for dynamic discussion facilitated by visual media, including the spontaneous use of white boards, will be much more effective for attaining these five goals than email and wiki exchanges and teleconferences. The training of new IDOC members and the generation of lasting interest among the target invitees will be greatly facilitated by the group interaction and the opportunity to participate in the working sessions. Furthermore, the success of both the evaluation process and the analysis of application areas depends on the participation of experts from different fields, each with experience developing an IDO extension ontology for a different sub-domain. Such participants would in many cases be unlikely to invest the necessary effort in developing and disseminating a consensus-based vocabulary for use by their colleagues unless they had had the opportunity to gain trust in those other individuals who will form part of the consortium.

Schedule

For each session below, one person has been designated as the session moderator. All sessions will, however, emphasize group discussion over presentation. Moderators of the ontology evaluation sessions will be responsible for beginning the session with a brief presentation of the ontology and will be prepared to navigate and display the ontology throughout the discussion. Moderators for the remaining sessions will be responsible for jumpstarting discussion with a brief outline of discussion points.

Pre-workshop training session

7:00 pm to 9:00 pm: Introduction to Principles of Ontology Development (Barry Smith and Lindsay Cowell)

Day 1

9:00 am to 10:30 am: Presentation and Critical Evaluation of the Core Infectious Disease Ontology (Lindsay Cowell)
10:30 am to 11:00 am: Coffee break
11:00 am to 12:30 pm: Presentation and Critical Evaluation of the Vaccine Ontology (Yongqun He)
12:30 pm to 2:00 pm: Lunch
2:00 pm to 3:00 pm: Presentation and Critical Evaluation of the Tuberculosis Ontology (Carol Dukes-Hamilton)
3:00 pm to 4:00 pm: Presentation and Critical Evaluation of the Staphylococcus aureus Ontology (Vance Fowler)
4:00 pm to 4:30 pm: Coffee break
4:30 pm to 5:30 pm: Presentation and Critical Evaluation of the Infective Endocarditis Ontology (Sivaram Arabandi)
5:30 pm to 7:00 pm: Dinner
7:00 pm to 9:00 pm: Ontologies in the Future of Infectious Disease Research: A Discussion Introduced and Moderated by Christos Louis

Day 2

9:00 am to 10:00 am: Presentation and Critical Evaluation of the Vector-borne Disease Ontology (Christos Louis)
10:00 am to 11:00 am: Presentation and Critical Evaluation of the Dengue Fever Ontology (Saul Lozano-Fuentes)
11:00 am to 11:30 am: Coffee break
11:30 am to 12:30 pm: Presentation and Critical Evaluation of the Influenza Ontology (Richard Scheuermann)
12:30 pm to 2:00 pm: Working Lunch: Refine IDO to Better Support Interoperability of Extensions (Barry Smith)
2:00 pm to 3:00 pm: Current Application Areas and Areas for New Development (Alan Ruttenberg)
3:00 pm to 4:00 pm: Goals for the Coming Year (Lindsay Cowell)

Conclusion

Biomedical and clinical research are becoming ever more reliant on the machine processing of data and information, making progress in these fields ever more dependent on the availability of structured, logically consistent terminology resources. Until the IDO 2007 meeting, such resources were not available for the infectious disease domain. The IDO 2007 meeting was highly successful in laying the foundation for rapid progress toward the development of such resources by providing training for infectious disease researchers in ontology development and seeding formation of an Infectious Disease Ontology Consortium and development of a core Infectious Disease Ontology. To capitalize on the outcomes of this meeting and further advance the field of infectious disease ontology, we propose the IDO 2008 meeting with the specific goals of training more infectious disease researchers in ontology development, critically evaluating the ontology development work undertaken as a consequence of the IDO 2007 meeting, and expanding the Infectious Disease Ontology Consortium to connect the infectious disease ontology effort to more infectious disease research communities and information resources. We have outlined a set of meeting goals, a strategy for attaining these goals, and a specific meeting schedule that we believe will bring to maturity the ontology work begun at IDO 2007 and allow the resulting ontologies to meet the data and information processing needs of infectious disease research.