MMI Guidelines for URIs

From NCBO Wiki
Revision as of 12:03, 11 November 2008 by Graybeal (talk | contribs) (final for now)

This document is largely a copy of a page of the MMI recommendations for ontology providers, a document that we created to both document our own practices, and make our approaches public in case they prove useful elsewhere (or our own community has other opinions). It is offered on this site with the permission of the BioPortal team, to get additional reviewers, and possibly offer useful input to the BioPortal community. Your comments are welcome, either here or at the MMI site.

Last updated from the original 2008.11.11.

Introduction

As noted in the previous MMI guide, MMI recommends URLs to uniquely identify ontology resources. The use of URLs provides an intuitive and self-contained link to additional resource information, essentially allowing it to be self-documenting as fully as desired.

This document provides additional suggestions on how to create URIs that provide a good mix of utility, transparency, and maintainability. To some extent these traits work against each other; for example, the most maintainable URIs may be those that are semantically opaque (they appear meaningless), but these are relatively difficult for humans to work with (see below). We have tried to identify the best possible long-term benefit in our approaches to creating URIs; only time will tell if we have been successful.

URIs: Opaque vs Transparent, Label vs Concept

While many people recommend an 'opaque' format for the URI, in which the meaning of the URI is not apparent from its form (see for example [10]), we have chosen to recommend URIs constructed according to a semantically meaningful format, as described below. While there are definitely some portability and persistence costs to this approach, we believe that when representing semantic terms, usability and social benefits are more important at this stage of semantic web development. We also consider that the costs of this approach over the long term can be mitigated by other means (for example, ensuring the persistence of the URIs themselves).

Note that there is an important semantic model implicit, and explicitly embedded, in this approach. The concepts represented by resources on the web may change their name; for example, the company called "Apple Inc." may become "Apple", even though the company itself is unchanged. In that case, it is important that a unique persistent resource for the Apple company continue to represent the company itself, and that only the label corresponding to that resource should change. However, in the case of the vocabularies MMI represents, we have concluded that it is in fact the label itself that we are documenting and creating a web resource for: the string for a given term, say "sst", is a significant object in and of itself. Therefore, we associate the resource for a term in a given vocabulary to the string object "sst" that is that term. Over time, the meaning associated with this object may change; "sst" may come to mean "saline solution temperature" rather than "sea surface temperature". Since the resource identifies the label itself, not the meaning, the resource (and corresponding URI) will now relate to a new meaning associated with "sst".

If you want your vocabulary to be constructed around codes or opaque strings, rather than meaningful strings, you can still do that while using our repository service. Simply make the desired code the instance of the term, and provide a corresponding label in addition to the definition. Then the label can change over time, according to usage.

If disambiguation is required, it is possible with our approach, using one of several conceptual facets that are incorporated in the URI, and described below.

While we expect this process can be supportive of terms that embed http-unfriendly characters (e.g., accents, unicode, or '/'), we do not look forward to proving that. Therefore, we only support basic ASCII characters in our current implementation of elements used to make up ontology and term URIs.

URL Construction for Ontology Files

The URL representing an ontology file should follow the following basic scheme (some modifiers are described later):

http://{hostdomain}/{ontologiesRoot}/{authority}/{version}/{resourceType}.owl

For an RDF or SKOS file, the scheme may be modified to replace .owl with .rdf or .skos, depending on the goals of the provider. MMI intends and suggests using .owl for any ontology file that does not conflict with the OWL specification, as this distinguishes these files most clearly from more general-purpose RDF files.

An example of this URL scheme is:

http://mmisw.org/ont/mmi/20080706/platform.owl

The following section discusses these components.
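As a minimal sketch of how the scheme above could be assembled in code, the following Python function joins the components into an ontology file URL. The function name and parameter names are illustrative assumptions, not part of any MMI software.

```python
def ontology_url(hostdomain, ontologies_root, authority, version,
                 resource_type, extension="owl"):
    """Assemble an ontology file URL from the components of the scheme.

    Hypothetical helper: the names here are chosen for illustration only.
    """
    # Strip stray slashes so a root given as "/ont" or "ont" behaves the same.
    root = ontologies_root.strip("/")
    return f"http://{hostdomain}/{root}/{authority}/{version}/{resource_type}.{extension}"


print(ontology_url("mmisw.org", "ont", "mmi", "20080706", "platform"))
# prints http://mmisw.org/ont/mmi/20080706/platform.owl
```

Passing extension="rdf" would produce the .rdf variant mentioned above.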

URL Construction for Ontology Files: Additional Recommendations

These recommendations apply to the assignment of URLs to ontology files. (Although in principle URNs could also be assigned to ontology files, current practice rarely if ever does so.)

A) Include the file extension, since this will help search engines to access the ontology files. For example typing “buoy filetype:owl” in Google will query ontologies with the term ‘buoy’. So, if the ontology is expressed in OWL use “.owl” (even though this is not an approved MIME type). RDF files may be expressed as .rdf, but if it is a vocabulary file compatible with the OWL format, MMI suggests ending it with .owl. So form 1 below is preferred over forms 2 and 3:

1) http://mmisw.org/ont/mmi/200807/platform.owl <-- PREFERRED
2) http://mmisw.org/ont/mmi/200807/platform
3) http://mmisw.org/ont/mmi/200807/platform.xml

B) Include a version number, using for example the “YYYYMM” (year and month) pattern. Even though some suggest putting the most significant elements toward the left side, we recommend placing the “YYYYMM” just before the ontology name. This places the short name of the ontology at the end of the path, where it matches the file name of the ontology and can gracefully end in .owl, as recommended above. It also follows the W3C pattern. So the first case is preferred over the second case:

1) http://mmisw.org/ont/mmi/200807/platform.owl <-- PREFERRED
2) http://mmisw.org/ont/mmi/platforms/200605

If for any reason year and month are not sufficient to identify the file version, it is acceptable to extend the pattern through components of DD.hhmmss as needed.
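One way to generate such version strings is sketched below in Python. The precision names ("month", "day", and so on) and the function name are assumptions made for this illustration; only the resulting patterns come from the recommendation above.

```python
from datetime import datetime

# Each precision level extends YYYYMM with further components of DD.hhmmss.
# The keys are hypothetical labels, not MMI terminology.
_VERSION_FORMATS = {
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d.%H",
    "minute": "%Y%m%d.%H%M",
    "second": "%Y%m%d.%H%M%S",
}


def version_string(published, precision="month"):
    """Format a publication timestamp as a version string."""
    return published.strftime(_VERSION_FORMATS[precision])


stamp = datetime(2008, 7, 6, 14, 30, 5)
print(version_string(stamp))            # 200807
print(version_string(stamp, "second"))  # 20080706.143005
```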

C) The file name should encode information about the resource type (please see the Resource Types section below for more details) or other distinguishing characteristic, and can include the authority as a prefix for clarity. The name should not contain spaces, and if it contains an authority, should contain at least one underscore “_” that separates the authority from the object type. The use of “-” inside the name is not recommended, since it will sometimes confuse search engines. (The character “-” means exclusion, and even if the string is searched as a quoted string (“word1-word2”) you are not guaranteed to get pages with that exact string. For example, searching for “moored-buoy” in Google returns not only pages containing “moored-buoy”, but also pages containing “moored buoy” and “moored. Buoy”. This doesn’t occur if we use the underscore character “_”.)
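The naming rules in recommendation C can be checked mechanically. The following Python sketch is one possible checker; the function name and the problem messages are hypothetical, and the authority test is only a heuristic (a name that merely happens to begin with the same letters as the authority code would also be flagged).

```python
def check_file_name(name, authority=None):
    """Return a list of problems found in an ontology file name (extension excluded)."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    if "-" in name:
        problems.append("contains '-'")
    # Heuristic: a name that starts with the authority code should continue with '_'.
    if authority and name.startswith(authority) and not name.startswith(authority + "_"):
        problems.append("authority not separated by '_'")
    return problems


print(check_file_name("mmi_platforms", authority="mmi"))  # []
print(check_file_name("moored-buoy"))                     # ["contains '-'"]
```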

D) If an ontology is replaced by another one, use the Dublin Core [6] element isReplacedBy (http://purl.org/dc/terms/) to inform the user or the program about the new ontology. However, if an OWL document is to be created, an ontology that contains the Dublin Core term should be imported; otherwise the ontology will not be valid. The property should go inside the ontology element. An RDF vocabulary or a SKOS vocabulary can use the Dublin Core property without any problem. If SKOS [9] is used, the property tag should go inside the ConceptScheme element.

E) The 'auth' may be tailored for circumstances as required. For example, if three ontologies from an organization must have the same name, the organization may create a separate 'authority' with the ontology repository for each ontology. Even if all 3 ontologies are in fact run by the same organization, the different auth fields can map to the same entity. This technique also can support organizational hierarchies or separate managers within a single organization. However, note that some organizations may prefer to assign managers on a per-ontology basis, while keeping the overall 'authority' the same for all those ontologies, so the authority does not necessarily map directly to the person with privileges to change the ontology.

F) Techniques for resolving duplicate terms in vocabularies are beyond the scope of this proposal. In brief, each term in the ontology must be unique, and in many vocabularies this may be the case even for terms in many different hierarchical branches. However, if the vocabulary has the same term name appear in different hierarchies, it is necessary to disambiguate the different terms. Possible techniques for doing so include (a) specifying the complete path as part of the term name; (b) specifying path components in reverse order until disambiguation is achieved; (c) using a specific numbering or appending scheme that distinguishes specific terms, but can be easily disregarded by viewers. We prefer approach (b), as it is the most intuitive while being the least intrusive. We suggest '__' as a component separator. In any case, the definition of the term should include the original name of the term in a [TBD] attribute. We are not aware of any technique which consistently and unambiguously translates vocabulary term names into graceful unique names, and then reverses the process.
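Approach (b) can be read in more than one way. The Python sketch below shows one interpretation: each term is represented by its hierarchy path (root first, term last), and parent components are appended to the term name, nearest parent first, until no other term shares the same name. The function name, the path representation, and the sample vocabulary are all assumptions made for illustration.

```python
def disambiguate(paths, separator="__"):
    """Map each hierarchy path (tuple of components, root first) to a term name.

    Components are taken in reverse order (term, parent, grandparent, ...)
    and only as many are used as are needed to tell the term apart.
    """
    names = {}
    for path in paths:
        reversed_parts = path[::-1]
        candidate = reversed_parts
        for depth in range(1, len(path) + 1):
            candidate = reversed_parts[:depth]
            # Count the paths whose last `depth` components match this one.
            clashes = sum(1 for other in paths if other[::-1][:depth] == candidate)
            if clashes == 1:
                break
        names[path] = separator.join(candidate)
    return names


vocabulary = [
    ("ocean", "water", "temperature"),
    ("atmosphere", "air", "temperature"),
    ("ocean", "water", "salinity"),
]
for path, name in disambiguate(vocabulary).items():
    print("/".join(path), "->", name)
```

In this sample, the two 'temperature' terms become 'temperature__water' and 'temperature__air', while 'salinity' is left unchanged because it is already unique. If two terms have identical full paths, this sketch cannot separate them and both receive the full reversed path.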

G) Creating consistently formed term names from unique vocabulary strings is also not explicitly addressed by this document. Briefly, we recommend terms be formed as camel case, without underscores or dashes, or other characters outside [A-Za-z0-9], and that they start with a letter. (Spaces and slashes are particularly unfortunate characters to embed in term names, or any other component of the URI, as these must be escaped in URLs.) These forms are easily translated into URL components and other concepts on the web. If our recommended forms are not followed, substitution is usually necessary, and as noted above, the original name of the term should be preserved in a [TBD] attribute.
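A minimal Python sketch of such a conversion follows. It assumes lower camel case (first word in lower case), which this document does not specify, and it treats every run of characters outside [A-Za-z0-9] as a word boundary; the function name is hypothetical.

```python
import re


def to_term_name(label):
    """Convert a vocabulary string to a camel-case name using only [A-Za-z0-9]."""
    # Any run of characters outside [A-Za-z0-9] is treated as a word boundary.
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words or not words[0][0].isalpha():
        raise ValueError(f"cannot form a name starting with a letter from {label!r}")
    first, rest = words[0], words[1:]
    return first.lower() + "".join(w[0].upper() + w[1:].lower() for w in rest)


print(to_term_name("sea surface temperature"))  # seaSurfaceTemperature
print(to_term_name("moored-buoy"))              # mooredBuoy
```

Because the conversion discards separators and case, it is not reversible, which is one reason to keep the original string alongside the generated name.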

URL Construction for Ontology Files: Backus-Naur Specification

In Backus-Naur form, the ontology resource specification is represented as follows:

<MMI-URI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/" <authority> "/" <version> "/" <resourceType> ".owl"

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDD.hh> | <YYYYMMDD.hhmm> | <YYYYMMDD.hhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>
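The version production can be approximated with regular expressions, as in the Python sketch below. It assumes that MajorVersionNumber and RevisionNumber are plain decimal digits, which the grammar does not state, and it checks only the shape of the string, not whether the month, day, or time values are in range. A string such as 20080706.14 fits both alternatives; this sketch reports it as a date.

```python
import re

# YYYYMM, optionally extended with DD, then .hh, then mm, then ss.
SHORT_ISO8601 = re.compile(r"\d{6}(?:\d{2}(?:\.\d{2}(?:\d{2}(?:\d{2})?)?)?)?")
# Assumption: both parts of a number version are decimal digits.
NUMBER_VERSION = re.compile(r"\d+\.\d+")


def classify_version(version):
    """Return which alternative of <version> the string matches, or None."""
    if SHORT_ISO8601.fullmatch(version):
        return "ShortISO8601"
    if NUMBER_VERSION.fullmatch(version):
        return "NumberVersion"
    return None


for candidate in ["200807", "20080706.1430", "1.2", "v1"]:
    print(candidate, "->", classify_version(candidate))
```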

In these schemes (and the ones in the following section), the following explanations hold:

hostdomain is the URL owned by the administering authority of the repository, and typically resolves to the server hosting the ontology. For example: marinemetadata.org

ontologiesRoot is the node of the host that serves as the base of the ontologies; typically it is a directory where the ontologies live. For example: /ont.

authority is a code representing an administrative authority responsible for the particular set of terms. This may be the repository authority, or may be a different authority on whose behalf the repository is acting. For example: 'mmi' or 'cf' may be appropriate authority values. The authority is meant to provide a reference to the actual authority, not be a one-to-one mapping with the authority; there may be multiple authorities in a given organization: noaa, noaa1, noaa_dif, and so on.

version represents the publication timestamp (or part of it) of the resource or a version control number; the MMI default will be an abbreviated publication date.

resourceType is the type of objects that are being represented by the vocabulary (as categorized by the vocabulary authority). Some examples are parameter, ucum_units, agu_index_terms and mmi_platforms. The authority may or may not be reflected in this term, which often is effectively the name of the ontology file.

Additional disambiguation may be necessary for a resourceType, for example for different types of files dealing with the same resource type, or multiple organizations dealing with that resource:

Disambiguation term: If the file name conflicts with another, add a number or term to distinguish it, for example '_01' or '_marine':

http://mmisw.org/ont/mmi/200807/platform_marine.owl

Mapping files: If a file is a mapping file for a given resource type, append '_map' at the end of the name of the resourceType:

http://mmisw.org/ont/mmi/200807/platform_marine_map.owl


URL and URN Construction for Terms

URL for a Term

The URL representing a term should follow the following scheme:

http://{hostdomain}/{ontologiesRoot}/{authority}/{version}/{resourceType}/{shortName}

An example of this URL scheme is:

http://mmisw.org/ont/mmi/200807/platform/moored_buoy

More information is available in the next page about resolving this URL scheme.
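To illustrate how the term URL scheme decomposes, the following Python sketch splits a term URL into its named components. It assumes that ontologiesRoot is a single path segment (as in 'ont') and that the URL is in versioned form; the function name and the returned dictionary layout are illustrative only.

```python
from urllib.parse import urlsplit


def parse_term_url(url):
    """Split a versioned term URL into the components named in the scheme."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 5:
        raise ValueError(f"expected 5 path segments, found {len(segments)}")
    keys = ("ontologiesRoot", "authority", "version", "resourceType", "shortName")
    return {"hostdomain": parts.netloc, **dict(zip(keys, segments))}


info = parse_term_url("http://mmisw.org/ont/mmi/200807/platform/moored_buoy")
print(info["authority"], info["version"], info["resourceType"], info["shortName"])
# prints mmi 200807 platform moored_buoy
```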

Backus-Naur Form of Term URL

In Backus-Naur form, this is represented as follows:

<MMI-URI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/" <authority> "/" <version> "/" <resourceType> "/" <shortName>

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDD.hh> | <YYYYMMDD.hhmm> | <YYYYMMDD.hhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>

URN for a Term

If expressed as a URN, this scheme becomes:

urn:x-term:authority:version:resourceType:shortName
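For illustration only, the URN form can be assembled from the same components as the URL. In this Python sketch the function name is hypothetical and the default namespace is the provisional 'x-term' shown above.

```python
def term_urn(authority, version, resource_type, short_name, namespace="x-term"):
    """Join the term components into the colon-separated URN form."""
    return ":".join(["urn", namespace, authority, version, resource_type, short_name])


print(term_urn("mmi", "200807", "platform", "moored_buoy"))
# prints urn:x-term:mmi:200807:platform:moored_buoy
```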

This latter proposition is particularly notional, as it depends on the existence of a URN namespace called x-term (eventually 'term'), which has not been proposed to IETF. There are multiple existing namespaces which can potentially be used today for expressing URNs for terms. The Open Geospatial Consortium (OGC) in particular has obtained the 'ogc' urn namespace to enable this functionality. However, protocols for easily obtaining custom names in this namespace have not been fully established. For this reason, using URNs for terms remains primarily a future development.

URLs Involving Versions

URLs for the Unversioned Resource (Ontology or Term)

The URLs above point to a specific version of an ontology or term. In many cases, including most mapping exercises and many semantic inferencing exercises, the desired resource should be constant, even as specific aspects of that resource (definition, preferred label, or even its semantics) may change. A non-semantic example: "the latest New York Times content" is available on the web at http://nytimes.com, even as the content itself changes over time. (Note: We are not saying the URI for this resource is the URL of their web site; the URI of the resource for the concept described as "the latest New York Times content" may be entirely different.) Since the URI for the term changes whenever its version changes, how do we accommodate the need for a resource for the concept "today's meaning of the term 'sst'"?

We call this the unversioned form of the resource. To create an unversioned form of an ontology or term URI, simply delete the version string (and slash) from the original URL. So, for example, the unversioned form of http://mmisw.org/ont/mmi/200807/platform/moored_buoy is

http://mmisw.org/ont/mmi/platform/moored_buoy

Note: For this to be unambiguous, the version can never begin with an alphabetic character, and the resourceType can never begin with a digit. We are enforcing this by convention at this time.
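The deletion of the version string can be sketched in Python as follows. This assumes that ontologiesRoot is a single path segment, so the version is the third segment of the path, and it relies on the convention that a version begins with a digit while a resourceType does not. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def unversioned(url):
    """Remove the version segment from an ontology or term URL, if present."""
    parts = urlsplit(url)
    # For "/ont/mmi/200807/platform/..." this gives
    # ['', 'ont', 'mmi', '200807', 'platform', ...].
    segments = parts.path.split("/")
    if len(segments) > 3 and segments[3][:1].isdigit():
        del segments[3]
    return urlunsplit(parts._replace(path="/".join(segments)))


print(unversioned("http://mmisw.org/ont/mmi/200807/platform/moored_buoy"))
# prints http://mmisw.org/ont/mmi/platform/moored_buoy
```

A URL that is already unversioned is returned unchanged, because its third segment is a resourceType and begins with a letter.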

To be clear: The resource that is identified by this URI is the term moored_buoy in the MMI platform ontology. Even if the definition for this term changes, the URI represents the same resource, and the concept of that resource is the term represented by the string 'moored_buoy'. If the spelling of this term changes someday, there will be a new resource called 'moord_buoy', with a new URI. If this spelling is no longer needed and no longer has a meaning, it will be deprecated, but still available as a term in the ontology and a corresponding resource.

Special note about unversioned ontologies: If an unversioned ontology is requested, the user will receive the ontology containing all its terms (including deprecated terms), but in unversioned form. This allows convenient mappings to be created to all the unversioned terms of an ontology. If the desired result is the most recent ontology, see the next paragraph.

When mappings are made to these unversioned resources, it is understood that the mapping is intended to persist through all versions of the ontology. This has the effect of mapping the most current meaning of a term (=label) to the most current meaning of another term (label), even if those meanings change significantly over time.

URLs to Specify the Most Recent Version of an Ontology or Term

It is often the case that a user wants to obtain the most recent version of a given ontology or term. In this case, the version string is replaced by '$'. (The resourceType can never begin with a dollar sign, so confusion is unlikely.) Thus, these requests should obtain the most recent version of the corresponding ontology or term:

http://mmisw.org/ont/mmi/$/platform.owl

http://mmisw.org/ont/mmi/$/platform/moored_buoy

The ontology or term returned by this URL should contain metadata necessary to establish which version it represents.
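A client that holds a versioned URL could derive the most-recent-version request as in the Python sketch below. As before, the function name is hypothetical, ontologiesRoot is assumed to be a single path segment, and the version is recognized by its leading digit.

```python
from urllib.parse import urlsplit, urlunsplit


def latest_version_url(url):
    """Replace the version segment of an ontology or term URL with '$'."""
    parts = urlsplit(url)
    # For "/ont/mmi/200807/platform.owl" this gives
    # ['', 'ont', 'mmi', '200807', 'platform.owl'].
    segments = parts.path.split("/")
    if len(segments) > 3 and segments[3][:1].isdigit():
        segments[3] = "$"
    return urlunsplit(parts._replace(path="/".join(segments)))


print(latest_version_url("http://mmisw.org/ont/mmi/200807/platform.owl"))
# prints http://mmisw.org/ont/mmi/$/platform.owl
```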

Note that while this could also be considered a unique concept (resource) represented by a web URI, we are not declaring this is a resource. In particular, it should not be mapped in ontologies.