NCBO User Profile: Andrew Su, The Scripps Research Institute
Submitted by hollsyl on September 8, 2011 - 18:22
Mining the Gene Wiki for Structured Gene Annotations Using the NCBO Annotator
“The NCBO Annotator made it exceptionally easy to mine biomedical concepts out of free text.”
Andrew Su, PhD
Department of Molecular and Experimental Medicine
The Scripps Research Institute
Research Interests: The Su Laboratory is interested in understanding all aspects of human gene function. In addition to mining large biological data sets for direct biomedical discovery, the group also builds computational tools for visualization and analysis of gene annotation data. Two of the group's projects in particular focus on harnessing the principle of community intelligence by directly “crowdsourcing” among biological scientists. BioGPS (http://biogps.org) targets the community of bioinformatics developers to create a customizable and extensible gene annotation portal. The Gene Wiki (http://en.wikipedia.org/wiki/Portal:Gene_Wiki) targets the entire biological community to write a continuously updated, community-reviewed, and collaboratively written review article for every human gene.
Use of NCBO Web Services:
The Gene Wiki provides a content-rich resource for information on gene function. In total, the Gene Wiki currently contains more than 78 megabytes of text spread across more than 10,000 gene-specific articles. These articles are collectively viewed over 50 million times per year.
While the Gene Wiki is already an excellent resource for human readers, the unstructured nature of these articles makes them difficult for software to process. Therefore, a major initiative moving forward is to translate the unstructured content in the Gene Wiki into structured annotations. Led by postdoc Ben Good, we have used the NCBO Annotator to process the Gene Wiki text using the Gene Ontology and Disease Ontology. This analysis resulted in the identification of over 14,000 biomedical concepts corresponding to gene functions and human diseases.
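As a rough illustration of what turning article text into structured concepts involves, the sketch below uses a naive whole-phrase string matcher as a stand-in. It is not the NCBO Annotator and does not call any NCBO service; the function names, the tiny lexicon, the term IDs, and the article text are all hypothetical placeholders chosen only to show the shape of the output, which is a set of (gene, concept) pairs.

```python
import re
from typing import Dict, Set, Tuple


def find_concepts(text: str, lexicon: Dict[str, str]) -> Set[str]:
    """Return the IDs of lexicon terms that occur in `text` as whole phrases.

    Matching is case-insensitive. This is a toy stand-in for a real
    ontology-based annotator, which would also handle synonyms and hierarchy.
    """
    found = set()
    for term, concept_id in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(concept_id)
    return found


def candidate_annotations(
    articles: Dict[str, str], lexicon: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Pair each gene (one article per gene) with every concept found in its text."""
    return {
        (gene, concept_id)
        for gene, text in articles.items()
        for concept_id in find_concepts(text, lexicon)
    }


# Placeholder data: not real ontology terms, genes, or article text.
lexicon = {"example process": "TERM:0001", "example disease": "TERM:0002"}
articles = {
    "GENE_A": "GENE_A is involved in example process and linked to Example Disease.",
    "GENE_B": "GENE_B has no matching terms.",
}

pairs = candidate_annotations(articles, lexicon)
print(sorted(pairs))
```

Because each article is about exactly one gene, every concept found in an article can be attached to that gene without any further entity resolution, which is what makes the pairing step so simple.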
Taking advantage of the gene-centric nature of the Gene Wiki, we have used the links between genes and biomedical concepts as candidate gene annotations. Not surprisingly, a significant number of those candidates agree with formal annotations from official curation organizations. In addition, however, we find many candidate annotations that are not currently reflected in gene annotation databases. Expert review of a sample of these candidates shows high precision. These results support our hypothesis that the Gene Wiki is a useful resource for both human readers and bioinformatics algorithms.
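The comparison described above can be sketched with plain set operations. This is a minimal illustration under stated assumptions, not the analysis the group actually ran: the function names are hypothetical, the (gene, concept) pairs are placeholders rather than real annotations, and the reviewer judgments are invented solely to make the arithmetic visible.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (gene, concept ID)


def split_candidates(
    candidates: Set[Pair], curated: Set[Pair]
) -> Tuple[Set[Pair], Set[Pair]]:
    """Split candidates into those already in curated databases and those that are new."""
    confirmed = candidates & curated
    novel = candidates - curated
    return confirmed, novel


def precision(review: Dict[Pair, bool]) -> float:
    """Fraction of reviewed candidates that an expert judged correct."""
    if not review:
        raise ValueError("no reviewed candidates")
    return sum(review.values()) / len(review)


# Placeholder data only: not real genes, terms, or review outcomes.
candidates = {
    ("GENE_A", "TERM:0001"),
    ("GENE_A", "TERM:0002"),
    ("GENE_B", "TERM:0003"),
}
curated = {("GENE_A", "TERM:0001"), ("GENE_C", "TERM:0004")}

confirmed, novel = split_candidates(candidates, curated)

# A hypothetical expert review of the novel candidates.
review = {("GENE_A", "TERM:0002"): True, ("GENE_B", "TERM:0003"): False}
sample_precision = precision(review)
print(len(confirmed), len(novel), sample_precision)
```

An exact-match comparison like this one is deliberately strict. A candidate that maps to a more general or more specific term than the curated annotation would count as novel here, so a real analysis would likely need to take the ontology hierarchy into account.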