Context-based Ontology Integration
The Harvard Medical School collaborative R01 is focused on context-based ontology integration.
The group has begun work on the lexical mapping algorithm. For the prototype, they first retrieved sample ontology concepts through the BioPortal REST services. To test the algorithm, subsets of the Gene Ontology and the NCI Thesaurus were accessed from BioPortal, and sample full-text papers were downloaded from PubMed. As a preprocessing step, a suffix tree of these papers was constructed to allow faster string matching. Concept names from the Gene Ontology and the NCI Thesaurus were then matched against the suffix tree to build a contingency table of concepts that appear together in a given publication. The Bayes factor (BF) for each of the concepts was computed, and the group is now working on determining term mappings based on the corresponding p-values.
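The co-occurrence and Bayes factor steps can be illustrated with a minimal Python sketch. It is not the group's implementation: plain case-insensitive substring search stands in for the suffix tree, the function names contingency_table and log_bayes_factor are hypothetical, and the Bayes factor shown (a saturated multinomial model with a uniform Dirichlet prior against an independence model with uniform Beta priors) is only one common choice, since the exact formulation used in the prototype is not stated here.

```python
from math import lgamma


def contingency_table(docs, concept_a, concept_b):
    """Count documents by which of the two concept names they contain.

    Returns (both, a_only, b_only, neither). Plain substring search is
    used here as a stand-in for suffix-tree matching (an assumption).
    """
    a, b = concept_a.lower(), concept_b.lower()
    both = a_only = b_only = neither = 0
    for text in docs:
        text = text.lower()
        has_a, has_b = a in text, b in text
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither


def log_bayes_factor(table):
    """Log Bayes factor for dependence versus independence in a 2x2 table.

    Assumed models: dependence uses a Dirichlet(1, 1, 1, 1) prior on the
    four cell probabilities; independence uses Beta(1, 1) priors on the
    two marginal probabilities. Positive values favour dependence.
    """
    both, a_only, b_only, neither = table
    n = sum(table)
    row = (both + a_only, b_only + neither)   # concept A present / absent
    col = (both + b_only, a_only + neither)   # concept B present / absent
    log_dep = lgamma(4) - lgamma(n + 4) + sum(lgamma(c + 1) for c in table)
    log_ind = sum(
        lgamma(2) - lgamma(n + 2) + lgamma(m[0] + 1) + lgamma(m[1] + 1)
        for m in (row, col)
    )
    return log_dep - log_ind


if __name__ == "__main__":
    # Toy stand-ins for full-text papers; not real data.
    papers = [
        "Apoptosis is frequently dysregulated in neoplasm samples.",
        "We measured apoptosis in cultured cells.",
        "The neoplasm was staged at diagnosis.",
        "Protein folding kinetics were analysed.",
    ]
    table = contingency_table(papers, "apoptosis", "neoplasm")
    print(table, log_bayes_factor(table))
```

Working on the log scale with lgamma avoids overflow when the number of papers is large. A pair of concepts whose log Bayes factor is clearly positive would be a candidate mapping under these assumptions.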
Reference Paper: Alterovitz G, Xiang M, Hill D, Lomax J, Liu J, Cherkassky J, Dreyfuss J, Mungall J, Harris M, Dolan M, Blake J, Ramoni MF. Ontology engineering. Nat Biotech. 2010 (28) 128-130.

