Making Sense of Unstructured Data in Medicine using Ontologies
Nigam Shah (Stanford)
Changes in biomedical science, public policy, information technology, and electronic health record (EHR) adoption have recently converged to enable a transformation in the delivery, efficiency, and effectiveness of health care. The true richness and complexity of health records lie within the clinical notes, which are free-text reports written by doctors and nurses in their daily practice. We have developed a scalable annotation and analysis workflow that uses public biomedical ontologies and is based on the term recognition tools developed by the National Center for Biomedical Ontology (NCBO).
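To make the idea of ontology-based term recognition concrete, the following is a minimal sketch of dictionary-based annotation: a longest-match lookup of note text against a lexicon of ontology terms. It is not the NCBO Annotator or its API. The lexicon entries, the concept identifiers, and the function name `annotate` are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy lexicon: lowercase term string -> made-up concept ID.
# A real lexicon would be built from public biomedical ontologies.
LEXICON = {
    "heart attack": "CONCEPT:0001",
    "myocardial infarction": "CONCEPT:0001",  # synonym, same concept
    "drugx": "CONCEPT:0002",
    "diabetes": "CONCEPT:0003",
}


def annotate(note, lexicon=LEXICON, max_len=3):
    """Return (surface text, start, end, concept ID) for each lexicon match.

    Scans left to right and prefers the longest match (up to max_len tokens)
    at each position. Matching is case-insensitive on alphanumeric tokens.
    """
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z0-9]+", note)]
    annotations = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if candidate in lexicon:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append((note[start:end], start, end, lexicon[candidate]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return annotations


if __name__ == "__main__":
    for hit in annotate("Patient on DrugX had a heart attack."):
        print(hit)
```

A sketch like this ignores negation ("no history of diabetes"), family history, and spelling variation, all of which a production workflow over clinical notes would need to handle.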
We will review the underlying components of this annotation workflow, specifically the Annotator and the Lexicon Builder. We will then discuss applications of this workflow to 9.5 million clinical documents, drawn from the electronic health records of approximately one million adult patients in the STRIDE Clinical Data Warehouse (part of Stanford's CTSA Informatics platform), to identify statistically significant patterns of drug use and to conduct drug safety surveillance. For the patterns of drug use, we validate the usage patterns learned from the data against FDA-approved indications as well as external sources of known off-label use such as Medi-Span. For drug safety surveillance, we show that drug–disease co-occurrences and the temporal ordering of drug and disease mentions in clinical notes can be examined for statistical enrichment and used to detect potential adverse events.
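As an illustration of what testing drug–disease co-occurrence for statistical enrichment can look like, the sketch below builds a 2x2 contingency table from per-patient mention timelines and computes an odds ratio and a one-sided Fisher exact p-value. This is an assumed, simplified formulation and not necessarily the method used in the work described. The data layout, the function names, and the rule of skipping patients whose first disease mention does not follow their first drug mention are all illustrative assumptions.

```python
from math import comb


def first_day(mentions, concept):
    """Earliest day on which `concept` is mentioned, or None if never."""
    days = [day for day, c in mentions if c == concept]
    return min(days) if days else None


def contingency(patients, drug, disease):
    """Build (a, b, c, d) from {patient_id: [(day, concept), ...]}.

    a: drug mentioned, disease first mentioned strictly after the drug
    b: drug mentioned, disease never mentioned
    c: drug never mentioned, disease mentioned
    d: neither mentioned
    Assumption: patients whose first disease mention is on or before their
    first drug mention are skipped, since the ordering does not suggest
    that the drug preceded the event.
    """
    a = b = c = d = 0
    for mentions in patients.values():
        t_drug = first_day(mentions, drug)
        t_dis = first_day(mentions, disease)
        if t_drug is None:
            if t_dis is None:
                d += 1
            else:
                c += 1
        elif t_dis is None:
            b += 1
        elif t_dis > t_drug:
            a += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with a 0.5 continuity correction to avoid division by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


if __name__ == "__main__":
    # Made-up timelines, for illustration only.
    patients = {
        "p1": [(1, "drugX"), (5, "diseaseY")],
        "p2": [(2, "drugX")],
        "p3": [(3, "diseaseY")],
        "p4": [(1, "other")],
        "p5": [(1, "diseaseY"), (4, "drugX")],  # skipped: disease came first
    }
    table = contingency(patients, "drugX", "diseaseY")
    print(table, odds_ratio(*table), fisher_one_sided(*table))
```

When many drug–disease pairs are screened this way, the resulting p-values would need a multiple-testing correction before any pair is treated as a potential adverse event signal.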