Ontology Metrics

From NCBO Wiki

Ontology Metrics in BioPortal

This page describes the metrics that BioPortal calculates for the ontologies in its repository. In general, BioPortal calculates the metrics when an ontology is uploaded and stores them as part of the Ontology Metadata. There are two groups of metrics:

  • Statistical metrics
  • Quality-control and quality-assurance metrics

Each ontology may have all, some, or no values filled in for its metrics. Some metrics are meaningful only for ontologies in a specific representation language (e.g., number of axioms matters only for OWL ontologies; there are no individuals to count in the ontologies in OBO format).

Statistical Metrics

We calculate the following statistics on each ontology in BioPortal:

  • number of classes: the number of classes in the ontology. For OWL ontologies, this count:
    • does not include imported classes
    • includes only named classes (no anonymous classes in this count)
  • number of properties: the number of properties or slots in the ontology.
  • number of individuals: the number of individuals in the ontology. Note:
    • Individuals exist only in OWL and Protege ontologies.
    • An individual in OWL is a resource that is neither a class nor a property.
  • number of axioms: the number of axioms in an OWL ontology.
  • maximum depth: the maximum depth of the hierarchy tree.
    • For OWL, RDFS, Protege, and RRF ontologies, we consider only the is-a relationship as a hierarchical relationship
    • For OBO format ontologies, we consider the following relationships as hierarchical relationships: is-a, has-part, inverse of develops-from
  • average number of siblings: the average number of siblings at one level in the tree.
  • maximum number of siblings: the maximum number of siblings in the ontology.
  • auditing information: a set of <author, classes> pairs that lists the classes that each author contributed to the ontology. We use the property that the ontology developers specify in their metadata to determine who is the author of a class.
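
The hierarchy-based statistics above (maximum depth, average and maximum number of siblings) can be illustrated with a minimal sketch. This is not BioPortal's implementation. The function name hierarchy_stats and the (child, parent) input format are hypothetical, and the sketch makes three assumptions that the text does not spell out: a root class counts as depth 1, the hierarchy contains no cycles, and the average number of siblings is taken over the groups of direct children of each parent.

```python
from collections import defaultdict


def hierarchy_stats(is_a_edges):
    """Compute depth and sibling statistics from (child, parent) pairs.

    Hypothetical helper for illustration only; assumes an acyclic hierarchy.
    """
    children = defaultdict(set)
    nodes, non_roots = set(), set()
    for child, parent in is_a_edges:
        children[parent].add(child)
        nodes.update((child, parent))
        non_roots.add(child)

    longest = {}  # memo: number of classes on the longest chain starting at a node

    def depth(node):
        if node not in longest:
            kids = children.get(node, ())
            longest[node] = 1 + max((depth(k) for k in kids), default=0)
        return longest[node]

    # Assumption: a root class is at depth 1.
    max_depth = max((depth(root) for root in nodes - non_roots), default=0)

    # Assumption: one sibling group per parent that has direct children.
    group_sizes = [len(kids) for kids in children.values()]
    return {
        "max_depth": max_depth,
        "max_siblings": max(group_sizes, default=0),
        "avg_siblings": sum(group_sizes) / len(group_sizes) if group_sizes else 0.0,
    }


# B and C are subclasses of A; D is a subclass of B.
stats = hierarchy_stats([("B", "A"), ("C", "A"), ("D", "B")])
# stats == {"max_depth": 3, "max_siblings": 2, "avg_siblings": 1.5}
```

For an OBO format ontology, the same sketch would apply if the edge list also included the has-part and inverse develops-from relationships mentioned above.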

Please contact us at support@bioontology.org if there are additional statistics that you would like to see.

Quality-Control and Quality-Assurance Metrics

We calculate the following metrics, which may give some indication of the quality of the ontology and help ontology authors improve it.

  • classes with only one subclass: a list of classes that have only one subclass in the is-a hierarchy. While technically there is no problem in having only one subclass, this situation often indicates that either the hierarchy is under-specified, or the distinction between the class and the subclass is not appropriate.
  • classes with too many subclasses: a list of classes that have more direct subclasses than a certain threshold. The specific threshold is an internal parameter in BioPortal (currently set to 25). This situation may be an indication that additional distinctions and categorization are needed.
  • classes with no definition: a list of classes that have no value for the definition property. For ontologies in OBO and RRF formats, the property for definition is part of the language. For OWL ontologies, the authors specify this property as part of the ontology metadata (the default is skos:definition).
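
The three checks above can be sketched as follows. This is an illustrative sketch rather than BioPortal's code: the function name quality_report, its parameters, and the input shapes (a list of (child, parent) is-a pairs, a dictionary of definitions, and a collection of class names) are hypothetical. The threshold of 25 is the value quoted above, and "more than the threshold" is read as a strict comparison.

```python
from collections import defaultdict

MAX_SUBCLASSES = 25  # threshold value quoted in the text; internal to BioPortal


def quality_report(is_a_edges, definitions, all_classes, threshold=MAX_SUBCLASSES):
    """Return the three quality-control lists for a small in-memory ontology.

    Hypothetical helper: is_a_edges holds (child, parent) pairs, definitions
    maps a class name to its definition text, all_classes lists every class.
    """
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)

    return {
        # Exactly one direct subclass in the is-a hierarchy.
        "one_subclass": sorted(
            c for c, kids in children.items() if len(kids) == 1
        ),
        # Strictly more direct subclasses than the threshold.
        "too_many_subclasses": sorted(
            c for c, kids in children.items() if len(kids) > threshold
        ),
        # Missing or empty value for the definition property.
        "no_definition": sorted(
            c for c in all_classes if not definitions.get(c)
        ),
    }
```

Each list is only an indicator. As noted above, a class with a single subclass is technically valid, so the output is best treated as a set of candidates for review by the ontology authors.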