Annotation Summarizer

From NCBO Wiki

This utility (RANSUM) provides a mechanism for interpreting ontological annotation data. It can be used through either a web (browser) interface or a REST interface. Both expose the same two functions:

  • Processing annotations - in the form of (elementID, conceptID) pairs
  • Processing raw text - in the form of (elementID, "associated text") pairs

Note: processing files with more than a few hundred concepts takes some time; for files of more than a thousand lines, expect several hundred seconds. Processing elementID - text pairs is slower still, because of the additional overhead of annotating those strings.

Web (Browser) Interface

Process Annotations: Generate Summary for elementID - conceptID pairs

Parameters:

  • vid: the virtual ontology ID. A dropdown list of possible choices is available.
  • output: output format. Options are 'TagCloud' and 'XML'. XML output optionally includes background information from MEDLINE and/or Google.
  • file: '\n'-separated file containing '\t'-delimited (elementID, conceptID) pairs.
  • useMEDLINE: True or False. If True, the output for each concept will include information on concept frequency in MEDLINE (i.e., the number of times text was annotated with that concept in the MEDLINE corpus).
  • useGoogle: True or False. If True, the output for each concept will include the number of results for a naive search for the given concept's name, as well as a very rough estimate of the total size of Google's English-language index.

An example file is available that associates genes with concepts in the Mammalian Phenotype Ontology.
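The file format described above can be produced with a few lines of Python. This is a minimal sketch, not part of RANSUM itself: the function name format_pairs and the identifiers in the example are made up for illustration, and rejecting embedded tabs or newlines is an assumption about how malformed fields should be handled.

```python
def format_pairs(pairs):
    """Serialize (elementID, conceptID) pairs as '\\t'-delimited,
    '\\n'-separated lines."""
    lines = []
    for element_id, concept_id in pairs:
        for field in (element_id, concept_id):
            # A tab or newline inside a field would corrupt the format.
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        lines.append(f"{element_id}\t{concept_id}")
    return "\n".join(lines)


# Placeholder identifiers; real files use gene IDs and ontology concept IDs.
example = format_pairs([("gene1", "concept_A"), ("gene2", "concept_B")])
```

The resulting string can be written to a file for the web interface, or passed as the lines parameter of the REST interface.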

Process Text: Generate Summary for elementID - text pairs

Parameters:

  • vid: the virtual ontology ID to use in annotation.
  • output: output format. Options are 'TagCloud' and 'XML'.
  • types: ','-delimited list of semantic types to use in annotation. If not specified, all types are used. For a list of available types, see http://ransum.stanford.edu/types/ or the NIH's Semantic Network documentation.
  • file: '\n'-separated file containing '\t'-delimited (elementID, text) pairs.
  • useMEDLINE: True or False. If True, the output for each concept will include information on concept frequency in MEDLINE (i.e., the number of times text was annotated with that concept in the MEDLINE corpus).
  • useGoogle: True or False. If True, the output for each concept will include the number of results for a naive search for the given concept's name, as well as a very rough estimate of the total size of Google's English-language index.

An example file that can be annotated is also available (SNOMEDCT or MeSH is suggested as the ontology).
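Because each record occupies one line and uses a tab as the field delimiter, free text containing tabs or line breaks has to be flattened before it is written to the file. The sketch below shows one way to do that; the function name format_text_pairs is hypothetical, and collapsing whitespace runs to a single space is an assumption, not documented RANSUM behaviour.

```python
import re


def format_text_pairs(pairs):
    """Serialize (elementID, text) pairs, one per line, tab-delimited."""
    lines = []
    for element_id, text in pairs:
        # Collapse tabs, newlines and repeated spaces so the text
        # stays on one line and cannot be mistaken for a delimiter.
        clean = re.sub(r"\s+", " ", text).strip()
        lines.append(f"{element_id}\t{clean}")
    return "\n".join(lines)


# Placeholder element IDs and text, for illustration only.
sample = format_text_pairs([
    ("doc1", "melanoma\tof the\nskin"),
    ("doc2", "  type 2 diabetes  "),
])
```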

REST Interface

Note: all REST URLs are relative to http://ransum.stanford.edu/rest/

Process Annotations: Generate Summary for elementID - conceptID pairs

POST to /process/pairs/. Parameters:

  • vid: the virtual ontology ID.
  • output: output format. Options are 'TagCloud' and 'XML'.
  • lines: '\n'-separated list of '\t'-delimited (elementID, conceptID) pairs.
  • useMEDLINE: True or False. If True, the output for each concept will include information on concept frequency in MEDLINE (i.e., the number of times text was annotated with that concept in the MEDLINE corpus).
  • useGoogle: True or False. If True, the output for each concept will include the number of results for a naive search for the given concept's name, as well as a very rough estimate of the total size of Google's English-language index.
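As an illustration, the request described above could be assembled with the Python standard library roughly as follows. This sketch makes several assumptions that the page does not confirm: that parameters are sent as a form-encoded POST body, that the booleans are passed as the literal strings "True" and "False", and that "1234" stands in for a real virtual ontology ID. The function name build_pairs_request is hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "http://ransum.stanford.edu/rest/"


def build_pairs_request(vid, lines, output="XML",
                        use_medline=False, use_google=False):
    """Build (but do not send) a POST request for /process/pairs/."""
    params = {
        "vid": vid,
        "output": output,                 # 'TagCloud' or 'XML'
        "lines": lines,                   # tab-delimited pairs, one per line
        "useMEDLINE": str(use_medline),   # assumed to be "True"/"False"
        "useGoogle": str(use_google),
    }
    # Assumption: the service accepts a form-encoded body.
    body = urlencode(params).encode("utf-8")
    url = urljoin(BASE_URL, "process/pairs/")
    return Request(url, data=body, method="POST")


# "1234" and the pair below are placeholders, not real identifiers.
req = build_pairs_request("1234", "gene1\tconcept_A")
# To send it: urllib.request.urlopen(req, timeout=600).read()
```

Given the processing times noted earlier, a generous timeout is advisable when sending large inputs.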

Process Text: Generate Summary for elementID - text pairs

POST to /process/text/. Parameters:

  • vid: the virtual ontology ID to use in annotation.
  • types: ','-delimited list of semantic types to use in annotation. If not specified, all types are used. For a list of available types, see http://ransum.stanford.edu/types/ or the NIH's Semantic Network documentation.
  • output: output format. Options are 'TagCloud' and 'XML'.
  • lines: '\n'-separated list of '\t'-delimited (elementID, text) pairs.
  • useMEDLINE: True or False. If True, the output for each concept will include information on concept frequency in MEDLINE (i.e., the number of times text was annotated with that concept in the MEDLINE corpus).
  • useGoogle: True or False. If True, the output for each concept will include the number of results for a naive search for the given concept's name, as well as a very rough estimate of the total size of Google's English-language index.
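
A text-processing request differs from the pairs request mainly in the optional types parameter. The sketch below omits types entirely when none are given, so that the service falls back to using all of them. As before, the form-encoded body and the "True"/"False" strings are assumptions, build_text_request is a hypothetical name, and the vid and type values are placeholders; consult the types list above for the identifiers the service actually accepts.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "http://ransum.stanford.edu/rest/"


def build_text_request(vid, lines, types=None, output="TagCloud",
                       use_medline=False, use_google=False):
    """Build (but do not send) a POST request for /process/text/."""
    params = {
        "vid": vid,
        "output": output,                 # 'TagCloud' or 'XML'
        "lines": lines,                   # (elementID, text) pairs
        "useMEDLINE": str(use_medline),   # assumed to be "True"/"False"
        "useGoogle": str(use_google),
    }
    if types:
        # Restrict annotation to the given semantic types.
        params["types"] = ",".join(types)
    body = urlencode(params).encode("utf-8")
    return Request(urljoin(BASE_URL, "process/text/"), data=body,
                   method="POST")


# Placeholder vid and semantic types, for illustration only.
all_types = build_text_request("1234", "doc1\tmelanoma of the skin")
restricted = build_text_request("1234", "doc1\tmelanoma of the skin",
                                types=["T047", "T191"])
```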