Making Sense of Unstructured Data in Medicine Using Ontologies
Changes in biomedical science, public policy, information technology, and electronic health record (EHR) adoption have converged recently to enable a transformation in the delivery, efficiency, and effectiveness of health care. While analyzing structured electronic records has proven useful in many different contexts, the true richness and complexity of health records (roughly 80 percent) lies within the clinical notes, which are free-text reports written by doctors and nurses in their daily practice. We have developed a scalable annotation and analysis workflow that uses public biomedical ontologies and is based on the term recognition tools developed by the National Center for Biomedical Ontology (NCBO). This talk will discuss the applications of this workflow to 9.5 million clinical documents, drawn from the electronic health records of approximately one million adult patients in the STRIDE Clinical Data Warehouse, to identify statistically significant patterns of drug use and to conduct drug safety surveillance. For the patterns of drug use, we validate the usage patterns learned from the data against FDA-approved indications as well as external sources of known off-label use such as Medi-Span. For drug safety surveillance, we show that drug–disease co-occurrences and the temporal ordering of drug and disease mentions in clinical notes can be examined for statistical enrichment and used to detect potential adverse events.
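As a rough illustration of what testing a drug–disease co-occurrence for statistical enrichment can look like, the sketch below builds a 2x2 contingency table of patient counts and computes an odds ratio with a one-sided Fisher exact p-value. The abstract does not say which test the workflow uses, so the choice of test, the function name `enrichment`, and the counts in the example are all assumptions made for illustration only.

```python
from math import comb


def enrichment(a, b, c, d):
    """Odds ratio and one-sided Fisher exact p-value for a 2x2 table.

    Hypothetical layout (patient counts, not taken from the talk):
        a: drug mentioned,     disease mentioned
        b: drug mentioned,     disease not mentioned
        c: drug not mentioned, disease mentioned
        d: drug not mentioned, disease not mentioned
    """
    # Odds ratio; treated as infinite when the off-diagonal product is zero.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    n = a + b + c + d          # all patients
    drug_total = a + b         # patients with a drug mention
    disease_total = a + c      # patients with a disease mention

    # P(X >= a) under the hypergeometric null of no association,
    # where X is the number of drug-exposed patients among the diseased.
    upper = min(drug_total, disease_total)
    numerator = sum(
        comb(drug_total, k) * comb(n - drug_total, disease_total - k)
        for k in range(a, upper + 1)
    )
    p_value = numerator / comb(n, disease_total)
    return odds_ratio, p_value


# Made-up counts, purely to show the call.
odds, p = enrichment(3, 1, 1, 3)
print(f"odds ratio = {odds:.2f}, one-sided p = {p:.4f}")
```

In a surveillance setting a test like this would be run over many drug–disease pairs, so the resulting p-values would need a multiple-testing correction before any pair is flagged as a potential adverse event.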