Accelerating Candidate Gene Discovery through Ontological Indexing of Large Scale Data Repositories

Dr. Simon Twigger
Human and Molecular Genetics Center, Medical College of Wisconsin




One focus of modern biomedical research is the use of genomic reagents and associated data to identify the genetic basis of complex diseases such as hypertension, cancer, diabetes, and obesity. As a model system, the laboratory rat, Rattus norvegicus, has long been a preeminent platform for studying the genetic basis of disease. The Rat Genome Database (RGD) is the premier repository of rat genomic and genetic data, and as such our role is to provide data and tools to support these disease-centric studies. Systematically capturing and organizing the growing volume and variety of data is a challenge. To meet it, RGD has made significant use of a variety of ontologies, both as a data curation methodology and as a means to organize and present data in a user-friendly fashion. However, genes with likely disease associations are still identified manually, through searches of existing databases and the literature.

The first aim of this project is to provide ontologies relevant to candidate gene evaluation to the BioPortal Annotator service and to enhance the speed and flexibility of this service so that it can extract indexed, tissue-specific gene expression data from GEO. The second aim is to develop data search, retrieval, and visualization tools that allow researchers to use the RGD-extracted ontology data for candidate gene identification by integrating the GEO dataset index into RGD. Lastly, access to these data and tools will be provided through the RGD website, which will house prototype tools and provide access to source code, documentation for developers, and tutorials.