Difference between revisions of "Annotator Dataset Workflow Howto"

From NCBO Wiki
(Replaced content with "As of Virtual Appliance v2.2, populating the Annotator Dataset is done automatically when an ontology is processed.")
 
(46 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== Synchronize Ontology Data with Ontology Services ==
+
As of Virtual Appliance v2.2, populating the Annotator Dataset is done automatically when an ontology is processed.
 
The ontologies and related data used with the Annotator are gathered from the Ontology Services (part of BioPortal Core). Run this process any time a new ontology (or a new version of an existing ontology) is added to the Ontology Services. It could also be run from a cron script or scheduled job.
 
 
 
<ol>
 
<li>Remove outdated ontologies from the Annotator database (e.g. older versions of ontologies that are no longer in BioPortal). Invoking this restlet removes all outdated ontology data and the associated entities, such as concepts, terms, relations, semantic types and hierarchy information.
 
<p><code>
 
See the list of ontologies/versions to be removed:
 
http://ncbobioportal/obs/admin/ontologies/list/old<br/>
 
Remove old ontologies:
 
http://ncbobioportal/obs/admin/ontologies/remove
 
</code></p></li>
 
 
 
<li>Add new ontologies from BioPortal to the Annotator. Invoking this restlet adds all new ontology data and the associated entities, such as concepts, terms, relations, semantic types and hierarchy information.
 
<p><code>
 
See the list of ontologies/versions to be added:
 
http://ncbobioportal/obs/admin/ontologies/list/new<br/>
 
Add new ontologies:
 
http://ncbobioportal/obs/admin/ontologies/add
 
</code></p></li>
 
 
 
<li>Populate Concepts (For details, please refer to Chapter 2.1)
 
<p><code>http://ncbobioportal/obs/loaderBigConcepts/all</code></p></li>
 
<li>Populate Hierarchy (For details, please refer to Chapter 2.2)
 
<p><code>http://ncbobioportal/obs/loaderBigPaths/all</code></p>
 
 
 
<p>To monitor the progress and view any errors, refer to:</p>
 
<ol>
 
<li>The "status" field in the obs_ontology table of the Annotator DB (obs_hibernate database).</li>
 
<li>The Tomcat log (/var/logs/tomcat6/catalina.out).</li>
 
</ol></li>
 
</ol>
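<p>For appliances older than v2.2, the four steps above could be scripted, for example from a cron job. The following is a minimal Python sketch, not part of the NCBO tooling. It assumes each restlet responds to a plain HTTP GET (this page does not state the method) and takes the host name <code>ncbobioportal</code> from the example URLs above. The function names are hypothetical. This page does not say whether each call blocks until its work is finished, so check the "status" field or the Tomcat log before relying on a later step.</p>

```python
from typing import Callable, List, Tuple
from urllib.request import urlopen

# Assumption: host and prefix copied from the example URLs on this page.
BASE_URL = "http://ncbobioportal/obs"

# Restlet paths in the order the steps are listed above.
WORKFLOW_PATHS = [
    "/admin/ontologies/list/old",  # preview ontologies/versions to be removed
    "/admin/ontologies/remove",    # step 1: remove outdated ontologies
    "/admin/ontologies/list/new",  # preview ontologies/versions to be added
    "/admin/ontologies/add",       # step 2: add new ontologies
    "/loaderBigConcepts/all",      # step 3: populate concepts
    "/loaderBigPaths/all",         # step 4: populate hierarchy
]


def workflow_urls(base_url: str = BASE_URL) -> List[str]:
    """Return the full restlet URLs in execution order."""
    return [base_url.rstrip("/") + path for path in WORKFLOW_PATHS]


def http_get(url: str) -> str:
    """Fetch a URL with a plain GET (assumed method) and return the body."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def run_workflow(
    fetch: Callable[[str], str] = http_get,
    base_url: str = BASE_URL,
) -> List[Tuple[str, str]]:
    """Call each restlet in order and collect (url, response body) pairs.

    An exception raised by `fetch` stops the run, so later steps are
    never invoked after a failed one.
    """
    results = []
    for url in workflow_urls(base_url):
        results.append((url, fetch(url)))
    return results
```

<p>Passing a custom <code>fetch</code> function makes it possible to dry-run the sequence, for example <code>run_workflow(fetch=lambda url: "skipped")</code>, before pointing it at a live server.</p>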
 
 
 
== Create Dictionary File ==
 
 
 
<p>All terms are written to a dictionary file. The data comes from the obs_term table.</p>
 
<ul>
 
<li>Location: the dictionary directory is specified in build.properties</li>
 
<pre>
 
# Dictionary File path
 
obs.dictionary.path=/usr/local/tomcat6/webapps/annotator/WEB-INF/resources/dictionary/
 
</pre>
 
<li>The dictionary file is generated in several chunks to avoid memory problems. There is a 1,000,000 term limit per file.</li>
 
<li>Generate the dictionary file:</li>
 
<pre>
 
http://ncbobioportal/obs/createDictionary/0
 
</pre>
 
<li>After the files are generated, they need to be concatenated into a single file, and the MGREP server needs to be restarted.</li>
 
</ul>
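<p>The concatenation step could look like the following Python sketch. This page does not name the generated chunk files, so the glob pattern <code>dictionary_*.txt</code> and the output name <code>dictionary.txt</code> are placeholders to replace with whatever appears in the configured dictionary directory. The function name is also hypothetical. Chunks are joined in sorted file-name order, and a newline is added after any chunk that lacks one so that the last term of one chunk does not run into the first term of the next.</p>

```python
from pathlib import Path
from typing import List


def concatenate_chunks(
    dictionary_dir: str,
    output_name: str = "dictionary.txt",   # placeholder output file name
    pattern: str = "dictionary_*.txt",     # placeholder chunk file pattern
) -> List[str]:
    """Join all chunk files in dictionary_dir into one output file.

    Returns the chunk file names in the order they were written.
    """
    directory = Path(dictionary_dir)
    output_path = directory / output_name
    # Never read the output file itself, even if the pattern matches it.
    chunks = sorted(p for p in directory.glob(pattern) if p != output_path)

    with open(output_path, "wb") as out:
        for chunk in chunks:
            data = chunk.read_bytes()
            out.write(data)
            # Keep chunk boundaries on separate lines.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return [chunk.name for chunk in chunks]
```

<p>With the path from build.properties, a call would be <code>concatenate_chunks("/usr/local/tomcat6/webapps/annotator/WEB-INF/resources/dictionary/")</code>. Restarting the MGREP server is still a separate, manual step.</p>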
 
 
 
== Mapping Data Population ==
 
<p>Mappings between ontologies can be used in the Annotator to find related terms. Loading the mapping information is currently a manual process, though this will be automated in the future. If you have mapping data you would like to include in the Annotator, please <a href="mailto:support@bioontology.org">contact NCBO</a>.</p>
 

Latest revision as of 17:06, 14 January 2015

As of Virtual Appliance v2.2, populating the Annotator Dataset is done automatically when an ontology is processed.