OboInOwl:URIs

From NCBO Wiki

URIs in the obo2owl mapping

OBO IDs have no defined syntax, but the recommended form is <IDSPACE> ":" <LOCALID>. Future versions of the OBO format spec may become stricter. Currently all IDs in extant OBO files conform to the recommended form, with the exception of IDs for relations defined in the same ontology and IDs for subsetdefs. These IDs are "flat": they contain no ":". Flat IDs are problematic, and their use will be clarified in future versions of the spec.

OBO IDs must be converted to URIs for use in OWL. The rules for converting OBO IDs to URIs in the current mapping are as follows:

IF the obo header declares an idspace mapping of the form:

 idspace: GO http://www.go.org/owl/

(note the trailing slash)

THEN all OBO IDs of form GO:0000001 will be mapped to URIs:

 http://www.go.org/owl/0000001

OTHERWISE all OBO IDs of form GO:0000001 will be mapped to URIs:

 <default-base-uri>GO/0000001

UNLESS the OBO ID is 'flat' (i.e. contains no ':'), in which case 'foo' will be mapped to:

 <default-base-uri>_global/foo

(Flat IDs are problematic. Currently they only occur in some relation declarations. We will work towards deprecating them and requiring that all IDs be qualified with an ID-space.)
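The three rules above can be sketched in a few lines of Python. This is only an illustration, not the XSLT itself: the function name obo_id_to_uri, the idspaces dictionary and the DEFAULT_BASE value are assumptions made for the example (the base URI is the proposed one, which may change).

```python
def obo_id_to_uri(obo_id, idspaces, default_base_uri):
    """Map an OBO ID to a URI following the rules described above.

    idspaces is a dict of declared idspace -> URI prefix (hypothetical
    representation of the 'idspace:' header tags).
    """
    if ":" not in obo_id:
        # Flat ID (e.g. a relation declared in the same ontology).
        return default_base_uri + "_global/" + obo_id
    idspace, local_id = obo_id.split(":", 1)
    if idspace in idspaces:
        # Declared idspace: the declared URI is used as a plain prefix.
        return idspaces[idspace] + local_id
    # Undeclared idspace: <default-base-uri>IDSPACE/LOCALID
    return default_base_uri + idspace + "/" + local_id


# Assumed value, taken from the proposal; not a fixed constant.
DEFAULT_BASE = "http://www.bioontology.org/oboContent/"

print(obo_id_to_uri("GO:0000001", {"GO": "http://www.go.org/owl/"}, DEFAULT_BASE))
# http://www.go.org/owl/0000001
print(obo_id_to_uri("GO:0000001", {}, DEFAULT_BASE))
# http://www.bioontology.org/oboContent/GO/0000001
print(obo_id_to_uri("foo", {}, DEFAULT_BASE))
# http://www.bioontology.org/oboContent/_global/foo
```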

The proposed <default-base-uri> will move from the geneontology.org namespace to the bioontology.org namespace, and will be something like "http://www.bioontology.org/oboContent/".

This will give default URIs of the form "http://www.bioontology.org/oboContent/GO/0000001". However, GO will explicitly declare an idspace and other OBO ontologies will be invited to do likewise.

The current XSLT uses a default base URI of "http://www.bioontology.org/2006/02/". I'm relatively neutral as to the form of the URI. I always feel the date-oriented method gives a false impression that a resource may be outdated. I'm hoping some other NCBO members will chip in here. I'm mostly neutral on whether the final character is a hash or a slash. See below.

By declaring an idspace, ontology maintainers can decide for themselves on hash-vs-slash, and whether they want URIs to be dereferenceable.

For example, with an idspace declaration

 idspace: GO http://www.go.org/owl#

The OBO ID "GO:0000001" will be converted to:

 http://www.go.org/owl#0000001
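As a rough sketch of how a converter might pick up the declaration, the following Python reads idspace tags from a list of header lines and applies the declared prefix. The function name parse_idspaces is hypothetical, and the parsing is deliberately naive: it takes the first two whitespace-separated tokens after the tag and ignores anything that follows.

```python
def parse_idspaces(header_lines):
    """Collect 'idspace: <IDSPACE> <URI>' declarations from OBO header lines."""
    idspaces = {}
    for line in header_lines:
        tag, sep, value = line.partition(":")
        if not sep or tag.strip() != "idspace":
            continue  # not an idspace tag
        parts = value.split()
        if len(parts) >= 2:
            idspaces[parts[0]] = parts[1]
    return idspaces


header = [
    "format-version: 1.2",
    "idspace: GO http://www.go.org/owl#",
]
idspaces = parse_idspaces(header)

# The declared URI is used as a plain prefix, so the maintainer's choice
# of final character (# or /) carries through unchanged.
idspace, local_id = "GO:0000001".split(":", 1)
print(idspaces[idspace] + local_id)
# http://www.go.org/owl#0000001
```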

There are two problems with the scheme above.

(a) In the RDF/XML serialisation, we cannot declare an XML namespace for the OBO idspace, because the local ID typically starts with a digit, and a name starting with a digit is not a valid XML NCName (the local part of a qualified name). This is not a showstopper, as there is no requirement (AFAIK) to declare a namespace for URIs. We can use XML entity declarations to achieve compactness in the resulting document. I don't know enough about non-RDF/XML serialisations (N3, Turtle, OWL abstract syntax) to comment on whether there will be problems there.
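To make the constraint in (a) concrete: XML requires the local part of a qualified name to be an NCName, which cannot begin with a digit. The check below is a simplified sketch that only covers ASCII characters (the real NCName production also admits many non-ASCII characters); the function name is_ascii_ncname is made up for this example.

```python
import re

# ASCII-only approximation of the XML NCName production: a letter or
# underscore, followed by letters, digits, '.', '-' or '_'.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ascii_ncname(local_id):
    """Return True if local_id could serve as the local part of a QName."""
    return NCNAME_ASCII.fullmatch(local_id) is not None


print(is_ascii_ncname("0000001"))   # False: starts with a digit
print(is_ascii_ncname("_0000001"))  # True: a leading underscore is allowed
```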

(b) Protege treats everything after the # or / as the local ID, and having a numeric ID causes problems with the Protege display. SWOOP doesn't suffer from this problem.

Because of this problem, I am considering an alternate rule. Here we would do some string hacking to make the URIs conform more closely to the style typically seen in semantic web documents.

With the "underscore" URI rule, the idspace URI would be followed by an underscore and then the local ID.

For example, with the following header

 idspace: GO http://www.go.org/owl#

We would have URIs of the form http://www.go.org/owl#_0000001

And with the following header

 idspace: GO http://www.go.org/owl/

We would have URIs of the form http://www.go.org/owl/_0000001

The current XSLT allows this as an option by passing in the localid_prefix parameter.

I have not yet decided whether to make this the default. On the one hand, I find any kind of string hacking on the ID distasteful, and it complicates round-tripping. On the other hand, it appears that if one deviates from common forms, even while remaining within the XML/RDF spec stack, one still pays the price in unusual software behaviour.
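A minimal sketch of the underscore rule and its effect on round-tripping, assuming a declared idspace. The Python parameter localid_prefix mirrors the name of the XSLT parameter, but the two functions here are hypothetical and are not a port of the stylesheet.

```python
def obo_id_to_uri(obo_id, idspaces, localid_prefix=""):
    """Forward mapping: idspace URI + optional prefix + local ID."""
    idspace, local_id = obo_id.split(":", 1)
    return idspaces[idspace] + localid_prefix + local_id


def uri_to_obo_id(uri, idspaces, localid_prefix=""):
    """Reverse mapping: only correct if the same prefix is supplied."""
    for idspace, base in idspaces.items():
        stem = base + localid_prefix
        if uri.startswith(stem):
            return idspace + ":" + uri[len(stem):]
    return None  # URI does not belong to any declared idspace


idspaces = {"GO": "http://www.go.org/owl#"}
uri = obo_id_to_uri("GO:0000001", idspaces, localid_prefix="_")
print(uri)
# http://www.go.org/owl#_0000001
print(uri_to_obo_id(uri, idspaces, localid_prefix="_"))
# GO:0000001
print(uri_to_obo_id(uri, idspaces))
# GO:_0000001  (prefix not stripped: the round trip is lossy)
```

The last call shows the cost: a consumer that does not know which prefix was used on export cannot recover the original OBO ID.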

Xrefs

OBO format also allows "xrefs" to external database resources. Unlike relationship tags, these have no fixed semantics. Sometimes an xref is used to indicate an analogous resource. The targets may not always be ontological resources: they can include a PubMed ID, for example, or an instance of a GO curator(!).

Xrefs follow similar rules to OBO IDs. However, a different base URI is used. By default this is in the bioontology space for now, but it would be good if GO, NCBO, BioPAX, BioMoby, LSRN and whoever else is interested in this kind of thing came up with a joint proposal. I don't think the service performing the obo2owl mapping should have to use some web service to get this information. Rather, all xrefs should by default map to a central base URI which can dynamically resolve them.

For example

 http://xref.somewhere.org/foo/EC:1.1.1.1

In the absence of a somewhere.org we'll use bioontology.org in the XSLT for now.

Hash-vs-slash

Using the idspace header tag, ontology maintainers can choose whether to make their URIs hashy or slashy. The choice depends partly on what the function of a URI is, and what the URI should dereference to, if anything, and the method of dereferencing.

If a URI is dereferenceable by a human, it may be inappropriate to use #, since dereferencing in a web browser will force loading of the whole ontology (which may be huge). If the URI is intended to be dereferenced by a machine, one that specifically expects OWL, then it makes sense to return the whole ontology and not just an OWL document with one class.
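The reason a hash URI pulls in the whole ontology is that the fragment is never sent to the server: the client requests everything before the # and resolves the fragment locally. A small sketch using Python's standard urllib.parse illustrates the difference (the helper name document_fetched_for is invented for this example):

```python
from urllib.parse import urldefrag


def document_fetched_for(uri):
    """Return the URL a client actually requests when dereferencing uri."""
    url, _fragment = urldefrag(uri)  # the fragment stays on the client side
    return url


# Hash URI: every term in the ontology resolves to the same document.
print(document_fetched_for("http://www.go.org/owl#0000001"))
# http://www.go.org/owl

# Slash URI: each term has its own URL, so the server can answer per term.
print(document_fetched_for("http://www.go.org/owl/0000001"))
# http://www.go.org/owl/0000001
```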

One nice thing about the LSID standard is that it provides a clear separation between these use cases. We can still do the same thing with http URIs, but we need consistent ways of implementing this.

Ontologies lacking an idspace header will get assigned a default IDspace, probably in the bioontology.org domain. The NCBO will make the decision as to whether this will be hash-or-slash.

Comments on best practices welcome.

I have added some examples of OWL exported using various policies:

using a hash-based URI:

  • llm-hash.owl
  • llm-hash-with-prefix.owl (with a _ in the local part of the ID)

using a slash-based URI:

  • llm-slash.owl
  • llm-slash-with-prefix.owl (with a _ in the local part of the ID)

These can all be found in go-dev CVS:

http://geneontology.cvs.sourceforge.net/geneontology/go-dev/xml/examples/