Difference between revisions of "Annotator Documentation"

From NCBO Wiki
(Created page with "Category:NCBO Virtual Appliance = General workflow informations = * Cache and dictionary file updated at each ontology submission parsing * The dictionary.txt file is ge...")
 
(pointed to new content)
 
[[Category:NCBO Virtual Appliance]]
 
  
This content has been moved! Please find the new content for the 3.0 version of the Virtual Appliance at our new [https://ontoportal.github.io/administration/ OntoPortal Virtual Appliance Administration pages].

In particular, this content is mostly at the [https://ontoportal.github.io/administration/management/annotator_management/ Annotator Management page].

= General workflow information =

* The cache and dictionary file are updated each time an ontology submission is parsed
* The dictionary.txt file is generated from Redis data (annotator cache at localhost:6379)
 
* Annotator Ruby library (https://github.com/ncbo/ncbo_annotator)
** The Annotator calls mgrep on port 55555; mgrep uses ''/srv/mgrep/dictionary/dictionary.txt'' to annotate the text.
** Mgrep uses ''/srv/mgrep/dictionary/dictionary.txt'' to retrieve the mgrep ID (first column of dictionary.txt)
** Redis uses the mgrep ID to retrieve the class URI and other data (PREF or SYN, ontology URI)
 
 
 
= Update cache and dictionary =
 
 
 
== Redis prefix ==
 
When caching data about a concept, Redis stores it under a key prefix. This allows BioPortal to update the cache under another prefix without interrupting the Annotator service.
 
 
 
To get the prefix used by the annotator in Redis:
 
 
<code>
 
redis-cli -h localhost -p 6379 <br/>
 
  6379> get current_instance
 
</code>
 
 
 
To change it:
 
<code>
 
6379> set current_instance "c1:"
 
</code>
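
The two commands above can be wrapped in small shell helpers for use in scripts. This is a minimal sketch, not part of the appliance: the function names and the <code>REDIS_CLI</code>, <code>REDIS_HOST</code> and <code>REDIS_PORT</code> variables are illustrative assumptions, and only the <code>get</code>/<code>set</code> calls on <code>current_instance</code> come from this page.

```shell
# Sketch only: helper names and the REDIS_* variables are hypothetical.
REDIS_CLI="${REDIS_CLI:-redis-cli}"
REDIS_HOST="${REDIS_HOST:-localhost}"
REDIS_PORT="${REDIS_PORT:-6379}"

# Print the prefix the Annotator currently reads from.
get_annotator_prefix() {
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" get current_instance
}

# Point the Annotator at another prefix, e.g. "c1:".
set_annotator_prefix() {
  local prefix="$1"
  "$REDIS_CLI" -h "$REDIS_HOST" -p "$REDIS_PORT" set current_instance "$prefix"
}
```

For example, <code>set_annotator_prefix "c1:"</code> followed by <code>get_annotator_prefix</code> should print the new prefix.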
 
 
 
 
 
== Totally clean the cache (optional) ==
 
This is not always necessary, but it can be useful if you have old annotations (for concepts from deleted submissions) that won't go away.<br/>

Be careful: it also deletes "current_instance", so you will have to set it again.
 
 
 
<code>
 
redis-cli -h localhost -p 6379 <br/>
 
  6379> flushall
 
  6379> set current_instance "c1:"
 
</code>
 
 
 
 
 
== Setting up the config files ==
 
 
 
Running the generate cache CRON job creates new entries in Redis under the prefix set by "annotator_redis_alt_prefix" in the config file.
 
 
 
Config files location in the VM:<br/>
 
* /srv/ncbo/ncbo_cron/config/appliance.rb<br/>
 
* /srv/ncbo/ontologies_api/current/config/environments/appliance.rb
 
 
 
To generate the cache on a new "alt" prefix while continuing to annotate with the prefix defined in the Redis current_instance:
 
<code>
 
Annotator.config do |config|
  config.annotator_redis_prefix  = ""
  config.annotator_redis_alt_prefix  = "c1:"
end
 
</code>
 
 
 
 
 
== Generate the Redis cache and dictionary for all ontologies ==
 
 
 
The following CRON job generates the annotator cache in Redis, then generates the dictionary used by mgrep (in ''/srv/mgrep/dictionary/dictionary.txt'').<br/>

'''Be careful!''' It uses ''annotator_redis_alt_prefix'' to generate the cache, but the Redis ''current_instance'' to generate the dictionary.<br/>

So if you are doing a total refresh of the cache and dictionary (after a flushall, for example), you have to define the same prefix in the Redis ''current_instance'' and in ''config.annotator_redis_alt_prefix''.
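
Before a total refresh, the prefix requirement described above can be checked with a short script. This is a sketch under assumptions: the function names are hypothetical, and the parsing assumes the config line is written as <code>config.annotator_redis_alt_prefix = "..."</code> with a double-quoted value, as in the example on this page.

```shell
# Sketch only: function names are hypothetical helpers, not appliance tools.

# Extract the double-quoted value of annotator_redis_alt_prefix from a config file.
# If the setting appears several times, the last occurrence wins.
config_alt_prefix() {
  sed -n 's/^[[:space:]]*config\.annotator_redis_alt_prefix[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | tail -n 1
}

# Compare the config value with the prefix currently stored in Redis.
# $1: config file, $2: value of current_instance
check_prefixes() {
  local config_file="$1" current="$2" alt
  alt="$(config_alt_prefix "$config_file")"
  if [ "$alt" = "$current" ]; then
    echo "OK: both use prefix '$alt'"
  else
    echo "MISMATCH: config alt prefix '$alt' vs Redis current_instance '$current'" >&2
    return 1
  fi
}
```

A possible call on the VM would be <code>check_prefixes /srv/ncbo/ncbo_cron/config/appliance.rb "$(redis-cli -h localhost -p 6379 get current_instance)"</code>.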
 
 
 
<br/>
 
Then run the following command from the ncbo_cron project (''/srv/ncbo/ncbo_cron''):
 
<code>
 
bin/ncbo_ontology_annotate_generate_cache -a -r -d -l logs/all_cache.log
 
</code>
 
 
 
It will generate entries in Redis like this: "c1:term:-1762481778059933889"
 
Check for it in Redis:
 
 
 
<code>
 
redis-cli
 
  6379> keys c1:term:*
 
</code>
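
On a large cache, <code>keys</code> returns every matching key at once and blocks Redis while it runs. As an alternative sketch, the <code>--scan</code> and <code>--pattern</code> options of redis-cli iterate incrementally, and counting the results is usually enough to confirm that the cache was generated. The function name and the <code>REDIS_*</code> variables are illustrative assumptions.

```shell
# Sketch only: count_term_keys and the REDIS_* variables are hypothetical.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Count the term entries stored under a prefix such as "c1:".
count_term_keys() {
  local prefix="$1"
  "$REDIS_CLI" -h "${REDIS_HOST:-localhost}" -p "${REDIS_PORT:-6379}" \
    --scan --pattern "${prefix}term:*" | wc -l | tr -d ' '
}
```

For example, <code>count_term_keys "c1:"</code> prints the number of keys matching <code>c1:term:*</code>; a result of 0 suggests the cache was not generated under that prefix.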
 
 
 
Restart mgrep with ''service mgrep restart'', or run ''bprestart''.
 
 
 
The annotator should be updated!
 
 
 
 
 
 
 
== More details on the generate cache options ==
 
 
 
The CRON job ''ncbo_ontology_annotate_generate_cache'' can be run with different options:
 
 
 
* -r or --remove-cache

Reset the cache from scratch.

DO NOT use the -r option when generating the cache for individual ontologies: it deletes the entire cache used by the Annotator and replaces it with a new one that contains only the given ontologies.
 
 
 
* -a or --all-ontologies
 
Generate the cache for all ontologies. The process runs on the ''annotator_redis_alt_prefix''.
 
 
* -o STY,SNMI
 
Generate the cache only for the specified ontologies. The process runs on the ''annotator_redis_prefix''.
 
 
* -d or --generate-dictionary
 
Also re-generate the dictionary for the terms prefixed by current_instance. Does the same job as the ncbo_ontology_annotate_generate_dictionary CRON job.
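
The warning about combining -r with -o can be enforced by a small wrapper. This is a sketch, not an existing tool: the function names and the <code>GENERATE_CACHE_BIN</code> variable are hypothetical, and it only recognises the flags when they are passed as separate arguments (a combined form such as <code>-ro</code> is not detected).

```shell
# Sketch only: wrapper names and GENERATE_CACHE_BIN are hypothetical.

# Succeed (exit 0) when the arguments contain both a remove-cache flag and -o.
is_unsafe_cache_invocation() {
  local has_remove=0 has_ontologies=0 arg
  for arg in "$@"; do
    case "$arg" in
      -r|--remove-cache) has_remove=1 ;;
      -o) has_ontologies=1 ;;
    esac
  done
  [ "$has_remove" -eq 1 ] && [ "$has_ontologies" -eq 1 ]
}

# Run the CRON job unless the flag combination would wipe the whole cache.
# Meant to be called from the ncbo_cron project directory.
run_generate_cache() {
  if is_unsafe_cache_invocation "$@"; then
    echo "Refusing to combine -r with -o: it would replace the entire cache" >&2
    return 2
  fi
  "${GENERATE_CACHE_BIN:-bin/ncbo_ontology_annotate_generate_cache}" "$@"
}
```

With this wrapper, <code>run_generate_cache -a -r -d -l logs/all_cache.log</code> is passed through unchanged, while <code>run_generate_cache -o STY,SNMI -r</code> is rejected.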
 
 
 
 
 
== Generate the dictionary only ==
 
 
 
To generate the dictionary from the Redis cache (terms prefixed with Redis ''current_instance''):
 
 
 
/srv/ncbo/ncbo_cron/bin/ncbo_ontology_annotate_generate_dictionary
 
 
 
 
 
 
 
= Troubleshooting =
 
 
 
== Check which dictionary mgrep is using ==
 
 
 
Restart mgrep, then run:
 
ps -ef | grep mgrep
 
 
 
It should give something like:
 
mgrep    27924    1  0 Jan27 ?        00:00:00 /usr/local/mgrep-4.0.2/mgrep --port=55555 -f /srv/mgrep/mgrep-55555/dict -w /usr/local/mgrep-4.0.2/word_divider.txt -c /usr/local/mgrep-4.0.2/CaseFolding.txt
 
 
 
 
 
== Run queries on mgrep ==
 
 
 
Run queries directly against mgrep, using a telnet connection to the mgrep server:
 
 
 
<code>
 
telnet my-bioportal-vm 55555
 
  ANY entity
 
</code>
 
 
 
It should return something like this, provided ''entity'' is cached for the Annotator (it should be, since it is in STY):
 
 
 
1882193402596741431 2 7 entity
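
To follow a match back into the cache, the first field of the mgrep response can be turned into a Redis key name. This sketch assumes that the first field is the mgrep ID and that it is the number appearing in term keys such as "c1:term:-1762481778059933889"; both function names are hypothetical, and the meaning of the remaining fields is not interpreted here.

```shell
# Sketch only: helper names are hypothetical; the key layout is an assumption
# based on the "c1:term:<id>" example shown earlier on this page.

# Print the first whitespace-separated field of an mgrep response line.
mgrep_id_from_line() {
  printf '%s\n' "$1" | awk '{ print $1 }'
}

# Build the Redis key name for a response line under a given prefix.
# $1: prefix (e.g. "c1:"), $2: mgrep response line
redis_term_key() {
  local prefix="$1" line="$2"
  printf '%sterm:%s\n' "$prefix" "$(mgrep_id_from_line "$line")"
}
```

For the response above and the prefix "c1:", <code>redis_term_key "c1:" "1882193402596741431 2 7 entity"</code> prints <code>c1:term:1882193402596741431</code>, which can then be looked up with redis-cli.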
 

Revision as of 18:07, 12 June 2020

