Annotator Dataset Workflow Howto

From NCBO Wiki
Revision as of 15:35, 22 October 2010 by Palexand (talk | contribs)

Synchronize Ontology Data with Ontology Services

The ontologies and related data used by the Annotator are gathered from the Ontology Services (part of BioPortal Core). Run this process whenever a new ontology (or a new version of an existing ontology) is added to the Ontology Services. In principle it could also be run from a cron script or scheduled job.

  1. Remove outdated ontologies from the Annotator database (e.g. older versions of ontologies that are no longer in BioPortal). Invoking this restlet removes all outdated ontology data and the associated entities, such as concepts, terms, relations, semantic types, and hierarchy information.

    See the list of ontologies/versions to be removed: http://ncbobioportal/obs/admin/ontologies/list/old
    Remove old ontologies: http://ncbobioportal/obs/admin/ontologies/remove

  2. Add new ontologies from BioPortal to the Annotator. Invoking this restlet adds all new ontology data and the associated entities, such as concepts, terms, relations, semantic types, and hierarchy information.

    See the list of ontologies/versions to be added: http://ncbobioportal/obs/admin/ontologies/list/new
    Add new ontologies: http://ncbobioportal/obs/admin/ontologies/add

  3. Populate Concepts (For details, please refer to Chapter 2.1)

    http://ncbobioportal/obs/loaderBigConcepts/all

  4. Populate Hierarchy (For details, please refer to Chapter 2.2)

    http://ncbobioportal/obs/loaderBigPaths/all

    To monitor progress and view any errors, check:

    1. The "status" field in the obs_ontology table in the Annotator DB (obs_hibernate database).
    2. The Tomcat log (/var/logs/tomcat6/catalina.out).
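
The four steps above can be sketched as a small script that calls the restlets in the listed order, for example from a scheduled job. This is only an illustration under stated assumptions: the page does not say which HTTP method the restlets expect or whether each call returns only after its work has finished, so the sketch assumes a plain GET and stops at the first non-200 response. The names sync_urls, run_sync, and the fetch parameter are hypothetical helpers, not part of the Annotator.

```python
from urllib.request import urlopen

BASE_URL = "http://ncbobioportal/obs"  # host as written on this page

# Paths taken from steps 1-4, in the order the page lists them.
SYNC_STEPS = [
    ("remove old ontologies", "/admin/ontologies/remove"),
    ("add new ontologies", "/admin/ontologies/add"),
    ("populate concepts", "/loaderBigConcepts/all"),
    ("populate hierarchy", "/loaderBigPaths/all"),
]


def sync_urls(base_url=BASE_URL):
    """Return (label, url) pairs for the four steps, in order."""
    base = base_url.rstrip("/")
    return [(label, base + path) for label, path in SYNC_STEPS]


def run_sync(base_url=BASE_URL, fetch=None):
    """Call each step in turn and stop after the first non-200 status.

    Assumption: each restlet is triggered by a plain HTTP GET.
    This page does not confirm that.
    """
    if fetch is None:
        def fetch(url):
            # urlopen raises HTTPError for 4xx/5xx responses.
            with urlopen(url) as response:
                return response.status

    results = []
    for label, url in sync_urls(base_url):
        status = fetch(url)
        results.append((label, url, status))
        if status != 200:
            break
    return results
```

Calling run_sync() with no arguments would hit the URLs listed above. Passing a custom fetch function makes it possible to dry-run the sequence without touching the server. The "status" field in obs_ontology and the Tomcat log remain the places to check for errors.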

Create Dictionary File

All terms are written to a dictionary file. The data comes from the obs_term table.

  • Location: the output directory is specified in build.properties:

    # Dictionary File path
    obs.dictionary.path=/usr/local/tomcat6/webapps/annotator/WEB-INF/resources/dictionary/
    
  • The dictionary file is generated in several chunks to avoid memory problems. There is a 1,000,000 term limit per file.
  • Generate the dictionary file by invoking:
    http://ncbobioportal/obs/createDictionary/0
    
  • After the files are generated, they need to be concatenated into a single file, and the MGREP server needs to be restarted.
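
The concatenation step could look roughly like the sketch below. The page does not state how the chunk files are named or what the combined file should be called, so the pattern "dictionary_*.txt", the output name "dictionary.txt", and the function name concatenate_chunks are all assumptions to adjust to whatever the generator actually produces.

```python
from pathlib import Path
import shutil


def concatenate_chunks(dictionary_dir,
                       output_name="dictionary.txt",      # assumed name
                       pattern="dictionary_*.txt"):       # assumed pattern
    """Concatenate chunk files, in name order, into one file.

    Returns the path of the combined file. Files are copied byte for
    byte, so each chunk should already end with a newline.
    """
    directory = Path(dictionary_dir)
    output = directory / output_name
    # Collect the chunk list before opening the output file, and never
    # treat the output itself as an input.
    chunks = sorted(p for p in directory.glob(pattern) if p != output)
    if not chunks:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {directory}")
    with output.open("wb") as out:
        for chunk in chunks:
            with chunk.open("rb") as src:
                shutil.copyfileobj(src, out)
    return output
```

Name order is plain lexical order, so a chunk numbered 10 would sort before a chunk numbered 2. If chunk order matters, zero-padded names or a numeric sort key would be needed. Restarting the MGREP server is a separate step that this sketch does not cover.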

Mapping Data Population

Mappings between ontologies can be used in the Annotator to find related terms. Loading the mapping information is currently a manual process, though this will be automated in the future. If you have mapping data you would like to include in the Annotator, please contact NCBO (support@bioontology.org).