Lexicon Builder



The LexiconBuilder project provides a Web service for building custom lexicons from BioPortal ontologies. The custom lexicons built by our system can be used for a variety of natural language processing and other ontology-related tasks.

While generating custom lexicons, users can draw on several features of our system that ease the creation, integration and maintenance of such lexicons. Users can specify ontology mappings to identify and retrieve relevant concepts in other ontologies that they might not otherwise know about, which helps enrich the custom lexicons. The concept mappings between particular datasets also facilitate the interpretation of those datasets. Users can also make use of the visualization services provided by NCBO to view and understand the retrieved ontology structure, which can otherwise be a daunting task. Additionally, users can retrieve identifiers (for concepts, etc.) using the lexicon builder, pass those IDs to the hierarchy services to identify and visualize (using the visualization service) the hierarchies in the lexicon, and then adjust their lexicon-building parameters until the lexicon matches their exact requirements. Our system also simplifies version management of the custom lexicons: when the underlying data changes (new data is added to the ontologies, or old data is changed or deleted), users can simply retrieve a newer version of their lexicon using our services.

The custom lexicons generated by our system can help users in a variety of tasks, some of which are exemplified by our current users. The lexicons can be used for domain-specific or requirement-specific text annotation tasks, e.g., annotation of protein mutations with disease terms. They can also be very useful for ontology-related tasks such as ontology learning (especially information extraction) and ontology enrichment. One of our users wants to use our service to compile a disease dictionary, which has served as our driving use case. The lexicons can also be employed for tagging elements to enhance the user's browsing experience for domain-specific information. For example, to assist a user browsing the Web for documents related to the biomedical domain, we could tag elements related to, say, tumors and add more context-specific information using a tumor lexicon generated by our system. Another example comes from users who are using our system to create workflow-specific lexicons for concept recognition tasks.

Code location


Home page: http://ncbolabs-dev1.stanford.edu:8080/LexiconBuilder/

Service endpoint: http://ncbolabs-dev1.stanford.edu:8080/LexiconBuilderService/

Note: The prototype is currently in development and is constantly undergoing changes.


Inclusion Criteria

Ontologies [default: all] (i.e., the list of ontologies you want to use for expansion, separated by commas without spaces, e.g., SNOMEDCT,NCI,MSH.)

Parent Concept [default: all] (i.e., the localConceptID of the concept whose child concepts you want to retrieve.)

Synonyms: true, false [default: true] (i.e., do you want the synonyms of the concept too?)

Semantic Types For Concepts [default: all] (i.e., the list of UMLS semantic types you want to use, separated by commas without spaces, e.g., T000,T047,T048.)

Mapping Types For Mapped Concepts [default: none] (i.e., the list of mapping types you want to use for the mapping expansion component, separated by commas without spaces, e.g., inter-cui,from-mrrel.)

MEDLINE Counts [default: all] (i.e., only concepts whose MEDLINE count is greater than the specified count are retrieved, e.g., 10000.)

Result Offset [default: None] (i.e., retrieve results starting from this result. If Result Offset is given and Result Count is not, the top 'Result Offset' results are returned.)

Result Count [default: None] (i.e., the number of results you want to retrieve, starting from the offset.)

e.g., 'Result Offset' = 0 and 'Result Count' = 100 retrieve the top 100 results.
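
The inclusion criteria above can be assembled into a request roughly as follows. This is only a sketch: the page does not document the service's actual query parameter names, so every key used below (ontologies, parentConcept, synonyms, semanticTypes, mappingTypes, medlineCount, resultOffset, resultCount) is a hypothetical placeholder, as is the assumption that the criteria are passed as a URL query string. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode

# Endpoint as listed in the "Code location" section.
SERVICE_URL = "http://ncbolabs-dev1.stanford.edu:8080/LexiconBuilderService/"


def join_without_spaces(values):
    """Join list values with commas and no spaces, as the criteria require."""
    return ",".join(str(value).strip() for value in values)


def build_inclusion_query(ontologies=None, parent_concept=None, synonyms=True,
                          semantic_types=None, mapping_types=None,
                          medline_count=None, result_offset=None,
                          result_count=None):
    """Build a request URL. All parameter names here are HYPOTHETICAL."""
    params = {"synonyms": "true" if synonyms else "false"}
    if ontologies:
        params["ontologies"] = join_without_spaces(ontologies)
    if parent_concept:
        params["parentConcept"] = parent_concept
    if semantic_types:
        params["semanticTypes"] = join_without_spaces(semantic_types)
    if mapping_types:
        params["mappingTypes"] = join_without_spaces(mapping_types)
    if medline_count is not None:
        params["medlineCount"] = medline_count
    # Compare against None so that an offset of 0 is still sent.
    if result_offset is not None:
        params["resultOffset"] = result_offset
    if result_count is not None:
        params["resultCount"] = result_count
    # safe="," keeps the comma separators readable instead of encoding them.
    return SERVICE_URL + "?" + urlencode(params, safe=",")


# Top 100 results from three ontologies, restricted to two semantic types.
url = build_inclusion_query(
    ontologies=["SNOMEDCT", "NCI", "MSH"],
    semantic_types=["T047", "T048"],
    result_offset=0,
    result_count=100,
)
print(url)
```

Criteria left at their defaults are simply omitted from the query, which mirrors the "default: all" and "default: None" behaviour described above.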

Exclusion Criteria

stopWords [default: empty] (i.e., the list of stop words, that is, words not to use for annotation, separated by commas without spaces, e.g., in,am,be,is.)

withDefaultStopWords: true, false [default: false] (i.e., do you want to use our default stop word list, available here: /obs/stopwords ? This overrides the stopWords parameter above.)
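
The interaction between the two exclusion parameters can be illustrated with a small client-side sketch. It is not the service's implementation: the function names are made up for illustration, the default list shown is a placeholder (the real list lives at /obs/stopwords and is not reproduced here), and case-insensitive matching is an assumption.

```python
def effective_stop_words(stop_words="", with_default=False,
                         default_stop_words=()):
    """Return the stop-word set that applies to a request."""
    if with_default:
        # As described above, the default list overrides the custom list.
        return set(default_stop_words)
    # The custom list is comma-separated without spaces, e.g. "in,am,be,is".
    return {word for word in stop_words.split(",") if word}


def drop_stop_words(terms, stop_set):
    """Remove terms that are stop words (case-insensitive: an assumption)."""
    lowered = {word.lower() for word in stop_set}
    return [term for term in terms if term.lower() not in lowered]


# Placeholder only; NOT the actual contents of /obs/stopwords.
PLACEHOLDER_DEFAULTS = ("the", "of", "and")

custom = effective_stop_words("in,am,be,is")
overridden = effective_stop_words("in,am,be,is", with_default=True,
                                  default_stop_words=PLACEHOLDER_DEFAULTS)
print(sorted(custom))
print(sorted(overridden))
print(drop_stop_words(["Tumor", "is", "In"], custom))
```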

Output Criteria

Output Columns (i.e., the columns you want in the results): LocalConceptID, LocalOntologyID, OntologyName, OntologyVersion, OntologyDescription, TermName, LocalSemanticTypeID, LocalSemanticTypeName, MappedLocalConceptID, MappingType

Unique Column (i.e., the unique column on which to group the results, e.g., using LocalConceptID gives one row for each unique concept): LocalConceptID, TermName. A 'unique column' value is available only if you also select that column in the Output Columns section.
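
The effect of the Unique Column option can be sketched on the client side as below. This is an illustration under assumptions, not the service's code: the function name is hypothetical, keeping the first row seen for each unique value is an assumption (the page does not say which row survives), and the sample rows are made-up data.

```python
def group_by_unique_column(rows, output_columns, unique_column):
    """Keep one row per distinct value of unique_column."""
    if unique_column not in output_columns:
        # Mirrors the rule that the unique column must also be an output column.
        raise ValueError(unique_column + " is not among the output columns")
    grouped = {}
    for row in rows:
        key = row[unique_column]
        # Assumption: the first row seen for each key is the one kept.
        if key not in grouped:
            grouped[key] = {column: row[column] for column in output_columns}
    return list(grouped.values())


# Made-up rows for illustration only.
sample_rows = [
    {"LocalConceptID": "C1", "TermName": "tumor"},
    {"LocalConceptID": "C1", "TermName": "neoplasm"},
    {"LocalConceptID": "C2", "TermName": "melanoma"},
]
columns = ["LocalConceptID", "TermName"]

per_concept = group_by_unique_column(sample_rows, columns, "LocalConceptID")
per_term = group_by_unique_column(sample_rows, columns, "TermName")
print(len(per_concept), len(per_term))
```

Grouping on LocalConceptID collapses the two rows for the same concept into one, while grouping on TermName keeps all three rows because every term is distinct.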

Output Format (i.e., the format of the results) [default: asXml]: asXML, asText, asTabDelimited

Compressed Output [default: No]: Yes (dictionary returned as a compressed ZIP file), No (dictionary displayed in the browser)
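
A client consuming the asTabDelimited output, with or without compression, might look like the sketch below. Several details are assumptions because the page does not specify them: that the tab-delimited output has no header row and lists columns in the order requested, that the ZIP archive holds a single file, and that the text is UTF-8. The function names and the sample data are illustrative only.

```python
import csv
import io
import zipfile


def parse_tab_delimited(text, columns):
    """Map each tab-separated row onto the requested output columns."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Assumption: no header row; blank lines are skipped.
    return [dict(zip(columns, row)) for row in reader if row]


def read_lexicon(payload, compressed, columns, encoding="utf-8"):
    """Decode a response body (bytes) into a list of row dictionaries."""
    if compressed:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            # Assumption: the archive contains exactly one dictionary file.
            first_member = archive.namelist()[0]
            payload = archive.read(first_member)
    return parse_tab_delimited(payload.decode(encoding), columns)


# Build a made-up compressed response in memory to exercise the reader.
columns = ["LocalConceptID", "TermName"]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("lexicon.txt", "C1\ttumor\nC2\tmelanoma\n")

rows = read_lexicon(buffer.getvalue(), compressed=True, columns=columns)
for row in rows:
    print(row["LocalConceptID"], row["TermName"])
```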