Annotator Documentation

From NCBO Wiki


General workflow information

  • The cache and the dictionary file are updated each time an ontology submission is parsed
  • The dictionary.txt file is generated from Redis data (annotator cache at localhost:6379)
  • Annotator ruby lib (
    • The Annotator calls mgrep on port 55555, which uses /srv/mgrep/dictionary/dictionary.txt to annotate the text.
    • Mgrep uses /srv/mgrep/dictionary/dictionary.txt to retrieve the mgrep ID (1st column of dictionary.txt)
    • Redis uses the mgrep ID to retrieve the class URI and other data (PREF or SYN, ontology URI)
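
The lookup chain described above can be sketched in a few lines of Ruby. This is only an illustration: a plain Hash stands in for the Redis cache, the key pattern is taken from the "c1:term:..." example shown later on this page, and the value layout, the class URI and the ontology label are hypothetical.

```ruby
# Hash standing in for the Redis annotator cache. Keys follow the
# "<prefix>term:<mgrep ID>" pattern; the value layout
# (class URI => "PREF|SYN,ontology") is an assumption made for this sketch.
CACHE = {
  "current_instance" => "c1:",
  "c1:term:1882193402596741431" => {
    "http://example.org/onto/STY#T071" => "PREF,STY" # hypothetical entry
  }
}.freeze

# The mgrep ID is the first tab-separated column of an mgrep result line.
def mgrep_id(result_line)
  result_line.split("\t").first
end

# Resolve an mgrep ID to the cached class entries under the active prefix.
def lookup(cache, id)
  prefix = cache.fetch("current_instance")
  cache.fetch("#{prefix}term:#{id}", {})
end

line = "1882193402596741431\t2\t7\tentity"
lookup(CACHE, mgrep_id(line)).each do |class_uri, info|
  kind, ontology = info.split(",")
  puts "#{class_uri} (#{kind}, ontology #{ontology})"
end
```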

Update cache and dictionary

Redis prefix

Cached concept data is stored in Redis under a prefix, which allows BioPortal to rebuild the cache under another prefix without interrupting the Annotator service.

To get the prefix used by the annotator in Redis:

redis-cli -h localhost -p 6379 
6379> get current_instance

To change it:

6379> set current_instance "c1:"
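
Why the prefix allows an update without interruption can be illustrated with a small Ruby sketch. A Hash stands in for Redis here, and the second prefix "c2:" and the stored values are hypothetical; only current_instance and the "<prefix>term:<id>" key pattern come from this page.

```ruby
# Hash standing in for Redis.
store = {
  "current_instance" => "c1:",
  "c1:term:42" => "old entry"
}

# Build the key that would be read for a term ID under the active prefix.
def active_key(store, id)
  "#{store.fetch('current_instance')}term:#{id}"
end

# 1. Rebuild the cache under another prefix; readers still use "c1:".
store["c2:term:42"] = "rebuilt entry"
puts store[active_key(store, 42)] # old entry

# 2. Switch readers over in one step by changing current_instance.
store["current_instance"] = "c2:"
puts store[active_key(store, 42)] # rebuilt entry
```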

Totally clean the cache (optional)

This is not always necessary, but it can be useful if you have old annotations (for concepts from deleted submissions) that won't go away.
Be careful: it also deletes "current_instance", so you will have to set it again.

redis-cli -h localhost -p 6379 
6379> flushall
6379> set current_instance "c1:"

Setting up the config files

Running the generate cache CRON job creates new entries in Redis under the prefix defined by "annotator_redis_alt_prefix" in the config file.

Config files location in the VM:

  • /srv/ncbo/ncbo_cron/config/appliance.rb
  • /srv/ncbo/ontologies_api/current/config/environments/appliance.rb

To generate the cache on a new "alt" prefix and continue to annotate with the one defined in Redis current_instance:

Annotator.config do |config|
  config.annotator_redis_prefix = ""
  config.annotator_redis_alt_prefix = "c1:"
end

Generate the Redis cache and dictionary for all ontologies

The following CRON job generates the annotator cache in Redis, then generates the dictionary used by mgrep (in /srv/mgrep/dictionary/dictionary.txt).
Be careful! It uses annotator_redis_alt_prefix to generate the cache, but the Redis current_instance to generate the dictionary.
So if you are doing a total refresh of the cache and dictionary (after a flushall, for example), you will have to define the same prefix in Redis current_instance and in config.annotator_redis_alt_prefix.
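
The consistency requirement for a total refresh can be expressed as a small pre-flight check. This is a hypothetical helper, not part of ncbo_cron; it assumes the two values have already been read from Redis and from the config file.

```ruby
# Returns the reasons why a total refresh would build the dictionary
# from a different prefix than the one the cache is generated under.
# An empty list means the two settings agree.
def full_refresh_problems(current_instance, alt_prefix)
  problems = []
  if current_instance.nil? || current_instance.empty?
    problems << "current_instance is not set in Redis"
  end
  if current_instance != alt_prefix
    problems << "cache goes to #{alt_prefix.inspect}, " \
                "dictionary is built from #{current_instance.inspect}"
  end
  problems
end

# After a flushall, current_instance is gone until it is set again.
puts full_refresh_problems(nil, "c1:")
```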

Then run the following command from the ncbo_cron project (/srv/ncbo/ncbo_cron):

bin/ncbo_ontology_annotate_generate_cache -a -r -d -l logs/all_cache.log

It generates entries in Redis such as "c1:term:-1762481778059933889". Check for them in Redis:

  6379> keys c1:term:*

Restart mgrep: run service mgrep restart, or do a bprestart.

The annotator should be updated!

More details on the generate cache options

The CRON job ncbo_ontology_annotate_generate_cache can be run with different options:

  • -r or --remove-cache

Resets the cache from scratch. DO NOT use the -r option when generating the cache for individual ontologies: it deletes the entire cache used by the Annotator and replaces it with a new one that contains only the given ontologies.

  • -a or --all-ontologies

Generates the cache for all ontologies. The process runs on the annotator_redis_alt_prefix.

  • -o STY,SNMI

Generates the cache only for the specified ontologies. The process runs on the annotator_redis_prefix.

  • -d or --generate-dictionary

Also re-generate the dictionary for the terms prefixed by current_instance. Does the same job as the ncbo_ontology_annotate_generate_dictionary CRON job.
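
The option descriptions above can be condensed into a short Ruby sketch that reports which prefix a given combination targets and rejects the dangerous one. The method and its keyword names are hypothetical and only restate this page; this is not the actual ncbo_cron option handling.

```ruby
# all:          -a / --all-ontologies
# ontologies:   -o STY,SNMI
# remove_cache: -r / --remove-cache
# dictionary:   -d / --generate-dictionary
def cache_run_plan(all: false, ontologies: [], remove_cache: false, dictionary: false)
  unless all || ontologies.any?
    raise ArgumentError, "nothing to do: pass -a or -o"
  end
  if remove_cache && !all
    raise ArgumentError,
          "-r with -o would leave only #{ontologies.join(',')} in the cache"
  end
  {
    # -a writes under the alt prefix, -o under the regular prefix.
    cache_prefix: all ? :annotator_redis_alt_prefix : :annotator_redis_prefix,
    # -d builds the dictionary from the terms under current_instance.
    dictionary_from: dictionary ? :current_instance : nil
  }
end

p cache_run_plan(all: true, remove_cache: true, dictionary: true)
p cache_run_plan(ontologies: %w[STY SNMI])
```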

Generate the dictionary only

To generate the dictionary from the Redis cache (terms prefixed with the Redis current_instance), run the ncbo_ontology_annotate_generate_dictionary CRON job.



Check which dictionary mgrep is using

Restart mgrep, then run:

ps -ef | grep mgrep

It should give something like:

mgrep    27924     1  0 Jan27 ?        00:00:00 /usr/local/mgrep-4.0.2/mgrep --port=55555 -f /srv/mgrep/mgrep-55555/dict -w /usr/local/mgrep-4.0.2/word_divider.txt -c /usr/local/mgrep-4.0.2/CaseFolding.txt
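
If this check needs to be scripted, the port and dictionary path can be pulled out of such a line with two regular expressions. A minimal sketch, assuming the command line uses the --port= and -f options exactly as in the sample output above:

```ruby
# Extract the port and the dictionary path from an mgrep command line,
# such as one printed by `ps -ef | grep mgrep`.
def mgrep_settings(ps_line)
  {
    port: ps_line[/--port=(\d+)/, 1]&.to_i,
    dictionary: ps_line[/\s-f\s+(\S+)/, 1]
  }
end

line = "mgrep    27924     1  0 Jan27 ?        00:00:00 " \
       "/usr/local/mgrep-4.0.2/mgrep --port=55555 " \
       "-f /srv/mgrep/mgrep-55555/dict " \
       "-w /usr/local/mgrep-4.0.2/word_divider.txt " \
       "-c /usr/local/mgrep-4.0.2/CaseFolding.txt"
settings = mgrep_settings(line)
puts "port: #{settings[:port]}, dictionary: #{settings[:dictionary]}"
```

Both values come back as nil when the line does not contain the corresponding option.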

Run queries on mgrep

Run queries against mgrep directly, using a telnet connection to the mgrep server:

telnet my-bioportal-vm 55555
  ANY entity

It should return something like this (if "entity" is cached for the annotator, which it should be, since it is in STY):

1882193402596741431	2	7	entity
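
A result line like the one above can be split into fields as follows. Only the first column (the mgrep ID) is described on this page; reading the two middle columns as inclusive start and end offsets of the match is an assumption, although it is consistent with the sample (7 - 2 + 1 equals the length of "entity").

```ruby
# One mgrep result line. The offset field names are guesses (see above).
MgrepMatch = Struct.new(:mgrep_id, :start_offset, :end_offset, :text)

# Split a tab-separated result line into its four columns.
def parse_mgrep_line(line)
  id, from, to, text = line.chomp.split("\t", 4)
  MgrepMatch.new(id, Integer(from), Integer(to), text)
end

match = parse_mgrep_line("1882193402596741431\t2\t7\tentity")
puts "#{match.text} -> mgrep ID #{match.mgrep_id}"
```

The mgrep ID is kept as a string because it is only used to build the Redis key.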