Difference between revisions of "Medline Analysis"

From NCBO Wiki

Revision as of 14:53, 22 February 2010

Introduction

The Unified Medical Language System (UMLS) Metathesaurus is the most widely used underlying source for biomedical natural language processing (NLP) systems, even though it was not designed as a terminology for NLP tasks. However, the performance of these systems is not satisfactory. In this study, we systematically analyzed UMLS terms by examining their occurrences in over 18 million MEDLINE abstracts written in natural language. Our goals are threefold: (1) analyze UMLS term frequency and syntactic distribution in MEDLINE; (2) build an automatically filtered UMLS Metathesaurus based on the MEDLINE analysis; (3) build an augmented UMLS Metathesaurus in which each term is associated with its MEDLINE frequency and syntactic distribution statistics. The automatically filtered and augmented UMLS Metathesaurus can be used to improve the efficiency and precision of UMLS-based information retrieval and NLP tasks. After automatic MEDLINE filtering, the augmented UMLS contains 518,835 terms, roughly 13% of the original terms. Each term in the augmented UMLS is associated with a vector of syntactic distribution statistics and its MEDLINE frequency.
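The frequency-based filtering idea described above can be illustrated with a minimal Python sketch. This is not the project's actual code: the function names `count_term_frequencies` and `filter_terms`, the `min_frequency` threshold, and the case-insensitive whole-word matching rule are all assumptions made for illustration, and the syntactic distribution statistics are not covered.

```python
import re
from collections import Counter


def count_term_frequencies(terms, abstracts):
    """Count occurrences of each term across all abstracts.

    Matching is case-insensitive and whole-word (an assumed rule,
    not necessarily the one used in the study).
    """
    # Start every term at zero so unseen terms remain visible in the result.
    counts = Counter({term: 0 for term in terms})
    # Precompile one pattern per term; lookarounds prevent partial-word hits.
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)")
        for term in terms
    }
    for abstract in abstracts:
        text = abstract.lower()
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(text))
    return counts


def filter_terms(counts, min_frequency=1):
    """Keep only terms seen at least min_frequency times (hypothetical threshold)."""
    return {term: n for term, n in counts.items() if n >= min_frequency}


if __name__ == "__main__":
    # Toy data standing in for UMLS terms and MEDLINE abstracts.
    terms = ["myocardial infarction", "aspirin", "heart attack of left ventricle"]
    abstracts = [
        "Aspirin reduces the risk of myocardial infarction.",
        "Myocardial infarction patients received aspirin; aspirin was well tolerated.",
    ]
    counts = count_term_frequencies(terms, abstracts)
    print(filter_terms(counts))
```

A per-term regular expression scan like this is only practical for toy inputs; at the scale of millions of terms and abstracts, a dictionary-matching structure such as a trie would be a more realistic choice.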

Code repository: <on our g-forge server> .. https://bmir-gforge.stanford.edu/gf/project/


How to Use
