CTS2 BioPortal wrapper
This page contains a summary of the current state of the CTS2 / BioPortal wrapper. It covers the goals of the project, the approach and methodology that was used and finishes with a summary of the current state of the project, a discussion of some of the issues that were encountered and a list of what remains to be resolved.
The NCBO BioPortal was created "to access and share ontologies that are actively used in biomedical communities." To meet this goal, BioPortal has developed the BioPortal REST API, which can be used to access BioPortal using http. One of the primary applications of this API are web browsers that can use Ajax widgets to browse and access ontology content for a variety of uses (see Using_NCBO_Technology_In_Your_Project).
The Common Terminology Services 2 (CTS2) specification was created in response to a set of requirements published by Health Level Seven (HL7) and an RFP that was issued by the Object Management Group. This Platform Independent Model (PIM) was designed to be fully compatible with Fielding's notion of the RESTful Architectural Style and one of the key Platform Specific Models (PSM's) is based on http/REST. The model, documentation, schema and WADL can be found on the home page of this wiki.
The NCBO community believes that it will be advantageous to be able to access the BioPortal content through both the existing BioPortal REST API and, where appropriate, the nascent CTS2 REST API. The BioPortal API was used as an one of the inputs to the CTS2 specification. The CTS2 specification was heavily influenced by the LexGrid terminology model and the LexEVS service specification and LexEVS is one of the back end components of the BioPortal implementation. There were, however, decisions made in the CTS2 specification that weren't fully compatible with the existing BioPortal model.
The purpose of this project was to create a mapping between the existing BioPortal API and the corresponding components of the CTS2 REST specification to determine where potential issues and incompatibilities may lie and to use the results of this evaluation to determine (a) the best approach would be to creating a complete, robust CTS2 REST wrapper (b) uncover errors and omissions in the CTS2 specification and (c) to come up with recommendations about how the BioPortal REST API might be enhanced or improved.
We began by gathering a list of the key BioPortal resources - Ontology, AbstractConcept, Class, Property and Instance along with various lists. Lacking a formal XML Schema for these resources, we used a combination of sample content from the REST service and the java bean classes for each of the resources to assemble lists of the properties for each of these resources, their types and, where it could be determined their cardinality. We went through these lists, gathering sample input from the REST API - both in form of lists of elements and individual elements.
There were a number of conceptual issues that were uncovered in this process, including:
- CTS2 has a notions of Code System and Code System Version. While BioPortal has similar concepts - "virtual ontology" and "ontology" in the ontology interface and "ontology" and "ontology version" in the search interface, the "virtual ontology" has no attributes besides its identifier.
- BioPortal treats both full ontologies and subsets derived from full ontologies as instances of "ontology". Lists and queries apply to both types of resource - a list of the latest version of ontologies returns both the ontologies themselves as well as all subsets. Similarly, term queries return both the ontology in which the term is defined and any subsets that include that term. CTS2 treats Code Systems (ontologies) and Value Sets (subsets) as separate resources. Lists and queries go against one or the other resource but not both.
- BioPortal assumes that all terms are instances of exactly one of "class", "property" or "instance". CTS2 allows entity (the equivalent of "term") to exist without making this distinction. In addition, while the CTS2 REST model does not clearly show how this could be done, the intent of the CTS2 model is to allow an entity to simultaneously be a Class and Instance, Class and Property, etc.
These documents were then used to construct a CTS2 REST Server that used the BioPortal REST services as the back end implementation. In addition, we took a number of the interesting BioPortal Ajax widgets and modified them to use the CTS2 REST api instead. A synopsis of the REST services that were implemented can be found here and a list of the translated Ajax Widgets can be found here .
As expected, a number of issues were encountered in this process including:
- This is the link to the subset of the CTS2 rest service was implemented is in the CTS2 BioPortal wrapper
- Resource/Resource Version mismatch - discussed earlier
- MetaOntology and MetaOntology mapping - BioPortal has several enumerations (Category, Group, Status) that aren't first class ontologies and, even if they were, might be better served were they drawn from OMV or a similar resource
- URI's - BioPortal has its own URI's but many of these ontologies have one or more "official" URI's drawn from outside sources.
- REST hyperlinks - one of the key aspects of the REST architectural style is to provide the ability to navigate the web of resources without having to know how to construct URI's. As an example, an entity reference in CTS2 (e.g. http://informatics.mayo.edu:19280/exist/cts2/codesystem/SCT/version/SCT_2010_01_31/entity/900000000000002006) carries in it a link to both the code system and the code system version in which it is described. Similarly, the descriptionType, language and any other attribute that references an ontology component has the potential for containing a hyperlink. Constructing some of these hyperlinks from the BioPortal REST service can be non-trivial.
The proposed next steps would be to review the mapping and determine whether to: (a) Complete the remaining tasks using the current wrapper paradigm (b) Re-implement the interfaces against the lower level BioPortal interfaces and databases (c) Produce a hybrid for the time being and focus on an RDF based implementation.