PATO:Pre vs Post Coordinating

From NCBO Wiki



Background

See also

http://wiki.geneontology.org/index.php/Category:Cross_Products

There is also a draft of a paper in progress - email cjm to view

Reconciling pre and post coordinated phenotype descriptions

PRELIMINARY DOCUMENT! IN PROGRESS!


There are two paradigms for representing phenotypes and phenotype-like entities: (1) use ontologies of pre-coordinated phenotypes, such as MP or plant_trait (PT); or (2) post-coordinate the phenotype description using terms from an ontology of qualities (eg PATO) and terms from ontologies of quality-bearers (eg GO, AOs, CL, ...). The latter is the EQ-annotation methodology.

Both approaches have their advantages, and they are in fact entirely compatible, provided you follow the SOP laid out here. It is even possible to mix and match these approaches.

This methodology is applicable not just to "phenotypes" or "traits", however we choose to define these terms, but to any kind of Dependent Entity, such as diseases, syndromes, disorders and even roles.

The methodology hinges on the provision of Aristotelian Definitions of pre-coordinated phenotype terms in a computable format.

Below we provide some examples, taken from MP and PT

Defining Specific Phenotypes

Here are some examples of (somewhat trivial) Aristotelian (genus-differentia) definitions of mammalian phenotypes:

If we take the MP term "big ears" (MP:0000017), this can be defined as the genus "large"/"largeness" (PATO:0000586) inhering in an "ear" (MA:0000236).

In OBO-1.2 syntax this can be represented as:

 [Term]
 id: MP:0000017
 name: big ears
 namespace: MPheno.ontology
 def: "outer ears of a greater than normal size" []
 is_a: MP:0002177     ! abnormal outer ear morphology
 intersection_of: PATO:0000586             ! large size
 intersection_of: inheres_in MA:0000236    ! ear
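To make the genus/differentia reading of the stanza above concrete, here is a minimal Python sketch that pulls the two parts out of the intersection_of tags. It is not a full OBO parser: the function name parse_logical_definition is our own, and we assume the convention used on this page that an intersection_of tag with a single ID is the genus and one with a relation plus an ID is a differentia.

```python
STANZA = """\
[Term]
id: MP:0000017
name: big ears
is_a: MP:0002177     ! abnormal outer ear morphology
intersection_of: PATO:0000586             ! large size
intersection_of: inheres_in MA:0000236    ! ear
"""


def parse_logical_definition(stanza):
    """Return (genus, differentiae) read from the intersection_of tags.

    genus is a term ID (or None); differentiae is a list of
    (relation, term ID) pairs. Hypothetical helper, for illustration only.
    """
    genus = None
    differentiae = []
    for line in stanza.splitlines():
        # Drop the trailing "! comment" and any surrounding whitespace.
        line = line.split("!", 1)[0].strip()
        if not line.startswith("intersection_of:"):
            continue
        parts = line[len("intersection_of:"):].split()
        if len(parts) == 1:
            genus = parts[0]
        elif len(parts) == 2:
            differentiae.append((parts[0], parts[1]))
    return genus, differentiae


genus, differentiae = parse_logical_definition(STANZA)
print(genus, differentiae)
```

For the "big ears" stanza this gives PATO:0000586 as the genus and a single differentia, inheres_in MA:0000236.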

Of course, an ontology editor will not manipulate the syntax directly; in oboedit the definition is visible in the "cross products" box.

Note that our definition here is different from the text definition, which has the genus as "ear" and the differentia as "having a greater than normal size". We could write this in OBO syntax as:

 [Term]
 id: MP:0000017
 name: big ears
 namespace: MPheno.ontology
 def: "outer ears of a greater than normal size" []
 is_a: MP:0002177     ! abnormal outer ear morphology
 intersection_of: MA:0000236    ! ear
 intersection_of: has_quality PATO:0000586             ! large size

The first form is preferred, despite the fact that the English phrasing is more contorted. The reason is that a phenotype ontology should define phenotypes, and not the entities in which phenotypes occur. The genus term is the primary is_a parent, and it makes sense for the phenotype ontology to be arranged on the quality axis, and not the anatomical axis.

If we go the second route, consider the problems that will be encountered defining terms like "sensitivity to chlorine".

Another way to record the same information is pheno-syntax. This is a convenient shorthand for combinatorial descriptions of phenotypes:

E= MA:0000236 Q= PATO:0000586
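The correspondence between the shorthand and the OBO form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pheno_to_intersections is ours, and we handle only the E= and Q= tags shown above (we make no claim about other tags the full pheno-syntax may support).

```python
import re


def pheno_to_intersections(expr):
    """Turn 'E= <entity> Q= <quality>' into OBO intersection_of lines.

    Hypothetical helper: only the E and Q tags are understood.
    """
    # Each match is a (tag, ID) pair, e.g. ("E", "MA:0000236").
    fields = dict(re.findall(r"(\w+)=\s*(\S+)", expr))
    return [
        "intersection_of: " + fields["Q"],
        "intersection_of: inheres_in " + fields["E"],
    ]


lines = pheno_to_intersections("E= MA:0000236 Q= PATO:0000586")
for line in lines:
    print(line)
```

The quality becomes the genus and the entity becomes the inheres_in differentia, matching the preferred (first) form of the definition.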

Results: plant_trait

So far we've analyzed the plant_trait ontology. We refer to this ontology as PT ontology here.

Our focus is on phenotypes that are applicable to human health and disease; however, the PT ontology is easier to use to illustrate patterns that can also be applied to MP (Mammalian Phenotype)

Obol was used to generate prospective definitions for all PT terms - these were examined and fixed manually by a non-domain expert (CJM).

Downloading

Logical definitions file available at:

OBO-Edit

If you are using OBO-Edit, then you can paste the following URL directly into the obo-edit load box:

(you may want to use the 'advanced' option and allow dangling references)

The above .obo file contains just import statements - to pull in the relevant obo files, as well as the main xp file (this means you will need an internet connection, even if you download the file for use later)

OWL Editors

Not yet fully tested. You may need to allocate additional heap, as this imports a lot of ontologies.

Summary

Summary of Results as follows:

Standard EQ Terms

The standard PT term is a pre-coordinated EQ term; this can easily be defined as:

 A <Q> *which* inheres_in an <E>

ie a quality carried by a bearer entity

This can easily be represented as a logical definition; below is an example in oboformat

 [Term]
 id: TO:0000227
 name: root length
 namespace: plant_trait_ontology
 def: "Average maximum length of the root of a plant in a study." []
 synonym: "MRD"  []
 synonym: "MRL"  []
 synonym: "RTLG"  []
 synonym: "maximum root depth"  []
 synonym: "maximum root length"  []
 is_a: TO:0000043     ! root anatomy and morphology trait
 intersection_of: PATO:0000122     ! length
 intersection_of: inheres_in PO:0009005     ! root

Providing these definitions in oboformat has the following advantages:

- PATO definitions of qualities can be shared and reused

- The oboedit reasoner can keep PT in sync with PO and PATO, and perform automatic DAG placement of terms

- PATO terms can be used to query PT-annotated phenotypes

- PO terms can be used to query PT-annotated phenotypes (eg "halogen sensitivity" can be used as a query, even though this term is not in PT)

Sensitivity Terms

This is the other major class of terms. There are lots of sensitivity terms. These are represented using the definition pattern:

 sensitivity which is *towards* <chemical>

For example (again, in obo format):

 [Term]
 id: TO:0000029
 name: chlorine sensitivity
 namespace: plant_trait_ontology
 def: "Sensitivity to the chlorine content in the growth medium. Chloride helps regulate the correct pH (acid/alkaline) balance. This is a major electrolyte in the living cell besides sodium and potassium. It is available to the cell mainly in the form of NaCl and KCl salts." []
 synonym: "CHLORSN"  []
 is_a: TO:0000080     ! micronutrient sensitivity
 intersection_of: PATO:0000085     ! sensitivity
 intersection_of: towards CHEBI:23116     ! chlorine


Providing these definitions in oboformat has the following advantages:

- CHEBI definitions of chemical entities can be shared and reused

- The oboedit reasoner can keep PT in sync with CHEBI, and perform automatic DAG placement of terms when/if CHEBI changes

- CHEBI terms can be used to query PT-annotated phenotypes
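As a rough illustration of querying with CHEBI terms, the sketch below finds pre-coordinated PT terms whose logical definition is subsumed by a query such as "sensitivity towards halogen". Everything here is a toy: the is_a table is hand-written, "CHEBI:halogen" is a placeholder and not a real CHEBI identifier, and the function names are ours. A real setup would use the oboedit reasoner or an OWL reasoner instead.

```python
# Toy is_a links. "CHEBI:halogen" is a placeholder, NOT a real CHEBI ID.
IS_A = {"CHEBI:23116": ["CHEBI:halogen"]}

# Logical definitions taken from this page: term -> (genus, relation, filler).
LOGICAL_DEFS = {
    "TO:0000029": ("PATO:0000085", "towards", "CHEBI:23116"),  # chlorine sensitivity
    "TO:0000188": ("PATO:0000085", "towards", "EO:0007404"),   # drought sensitivity
}


def ancestors(term):
    """Return the term itself plus everything reachable over is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def query(genus, relation, filler):
    """Find defined terms whose genus and filler fall under the query's."""
    return sorted(
        term
        for term, (g, r, f) in LOGICAL_DEFS.items()
        if genus in ancestors(g) and r == relation and filler in ancestors(f)
    )


hits = query("PATO:0000085", "towards", "CHEBI:halogen")
print(hits)
```

With these toy tables the query returns chlorine sensitivity (TO:0000029) but not drought sensitivity, even though no "halogen sensitivity" term exists in PT.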

Nutrients

See this tracker item:

Environment ontology

There is also a plant-centric environment ontology (EO). This could be generalised for other uses.

Some of the sensitivity terms referred to environments or parts of the environment that are not at the chemical/molecular level. EO was used to define some of these, for example:

 [Term]
 id: TO:0000188
 name: drought sensitivity
 namespace: plant_trait_ontology
 def: "Drought sensitivity is highly interactive with crop phenology, plant growth prior to stress, and timing, duration, and intensity of drought stress. For many soils, it takes at least 2 rainless weeks to cause marked differences in drought sensitivity during the vegetative stage and at least 7 rainless days during the reproductive stage to cause severe drought injury. Leaf rolling precedes leaf drying during drought. Repeated ratings are recommended through progress of the drought." []
 synonym: "DRS"  []
 synonym: "DRSN"  []
 synonym: "drought susceptibility"  []
 is_a: TO:0000394     ! drought related trait
 intersection_of: PATO:0000085     ! sensitivity
 intersection_of: towards EO:0007404 ! drought environment

These will need closer examination. Some of the EO terms could themselves be decomposed, eg soil alkalinity.

Ratios, proportions and compositions

Many PT terms include ratios:

 [Term]
 id: TO:0000278
 name: root to shoot ratio
 namespace: plant_trait_ontology
 synonym: "RTSHRO"  []
 synonym: "r/s"  []
 synonym: "root/shoot ratio"  []
 is_a: TO:0000043     ! root anatomy and morphology trait

Course of action: we will add a term "proportionality" to PATO http://sourceforge.net/tracker/index.php?func=detail&aid=1606404&group_id=76834&atid=595654

 intersection_of: PATO:<<new-term-proportionality>>
 intersection_of: towards PO:0009005      ! root
 intersection_of: relative_to  PO:0009006 ! shoot

Some of the PT ratios can be expressed using the relational quality PATO:composition, which can take an additional 2 arguments:

 [Term]
 id: TO:0000372
 name: amylose to amylopectin ratio
 namespace: plant_trait_ontology
 def: "Ratio of amount of amylose to amylopectin content." []
 synonym: "AMYAMYPCTRO"  []
 is_a: TO:0000097     ! amylopectin content
 is_a: TO:0000196     ! amylose content
 intersection_of: PATO:0000025             ! composition
 intersection_of: towards CHEBI:28102      ! amylose
 intersection_of: relative_to CHEBI:28057  ! amylopectin

Note the is_a links to amylopectin content and amylose content. Is this correct? There is in fact a reversal of magnitude here: as the magnitude of the amylopectin content parent increases, the magnitude of the child (the ratio) decreases.
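The reversal of magnitude can be checked with simple arithmetic. The numbers below are made up purely for illustration and are not measurements.

```python
def ratio(amylose, amylopectin):
    """Amylose to amylopectin ratio for two content values in the same unit."""
    return amylose / amylopectin


# Hypothetical content values (arbitrary units).
baseline = ratio(20.0, 60.0)
more_amylopectin = ratio(20.0, 80.0)  # amylopectin content goes up
more_amylose = ratio(30.0, 60.0)      # amylose content goes up

print(baseline, more_amylopectin, more_amylose)
```

The ratio rises with amylose content but falls as amylopectin content rises, so the child term does not vary in the same direction as both of its is_a parents.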

Synonyms

PT uses lots of synonyms, but does not assign a scope (exact/broad/narrow). This makes it harder to glean the exact meaning of a term from the synonym.

Relative

PT uses terms like relative growth rate. It is not clear how this is different from growth rate. This seems to be a difference in how the term is used.

Disease resistance phenotypes

PT has many disease resistance terms; eg:

 [Term]
 id: TO:0000323
 name: stem rot disease resistance
 namespace: plant_trait_ontology
 def: "Causal agent: Magnaporthe salvinii (Nakataea sigmoidea, Sclerotium oryzae), and Helminthosporium sigmoideum var. irregulare. Symptoms: dark lesions develop on the stems near the water line. Small, dark bodies (sclerotia) develop, weaken the stem and cause lodging." []
 synonym: "CULMROTRS"  []
 synonym: "SR"  []
 is_a: TO:0000439     ! fungal disease resistance

These would be easy to define if we had an orthogonal ontology of plant diseases and infectious agents. The definition pattern would be:

 A <TO:resistance> which is *towards* <InfectiousAgent>

This will be easier to do for mammalian phenotypes since these orthogonal ontologies are being developed. It is recommended the POC develops such a separate ontology for plants.

Assay-specific terms

Example: root dry weight

It is unclear how to proceed with these. No action was taken for these.

Conjunctive terms

Example: lemma and palea related traits

Example: lemma and palea pubescence

I think these should be defined as:

 a pubescence which inheres_in the lemma and inheres_in the palea

ie the necessary and sufficient conditions are that both the lemma and palea are pubescent

Can the same quality instance inhere in 2 entities?

Inconsistency of Term name syntax

Compare: "shrunken endosperm" with "leaf length"

In one case the word naming the quality precedes the noun naming the bearer entity; in the other it follows it.

It is recommended that all preferred terms follow the same lexical pattern (unless community usage or plain English trumps this).

Missing anatomy terms

Example: canopy temperature

"canopy" should be the name of a type in the plant_anatomy ontology

Use of "other"

Example: other miscellaneous trait

Example: other nutrient sensitivity

OBO Foundry principles dictate avoidance of "other"

Use of word "quality"

PATO and PT do not use the word "quality" univocally. PATO uses the term to mean "property", whereas PT uses it to mean a more nebulous, subjective(?) quality. Perhaps within the plant trait community the meaning of this term is more fixed; if so, it should be defined somewhere. Does it simply mean healthiness? If so, it would be good to use the same PATO term, at least as a synonym.

 [Term]
 id: TO:0000587
 name: endosperm quality
 alt_id: TO:0000150
 related_synonym: "End" []
 related_synonym: "endosperm quality (sensu Poaceae)" []
 is_a: TO:0000162 ! seed quality

Redundancy

There is some redundancy with GO Molecular Function activity terms

For example:

 [Term]
 id: TO:0000284
 name: ADP glucose pyrophosphorylase activity
 namespace: plant_trait_ontology
 def: "Catalysis of the reaction: ATP + alpha-D-glucose 1-phosphate = diphosphate + ADP-glucose." []

A phenotype ontology should be orthogonal to an ontology of molecular function. It is not clear how the above term would be used in phenotype annotation.

There is a case for including malfunctions, loss of function and gain of function in a phenotype ontology, but this should be made explicit

Complex Phenotypes

It's too hard to even attempt to automatically generate definitions for these; logical definitions should be provided manually. For example:

 [Term]
 id: TO:0000169
 name: chinsurah boro CMS
 namespace: plant_trait_ontology
 def: "Abortion of microspore development at trinucleate stage" []

This could be defined in terms of "microspore development" (GO:0009555), "trinucleate stage" (term required in plant developmental stage ontology)

Other difficult cases

Example: photoperiod sensitivity

Errors and inconsistencies detected

The following were detected as a direct result of this approach

In PO, panicle is a synonym for inflorescence; in TO, panicle color is_a inflorescence color:

 [Term]
 id: TO:0000201
 name: panicle color
 namespace: plant_trait_ontology
 def: "Variation in color of the panicle inflorescence." []
 synonym: "PNCL"  []
 is_a: TO:0000581     ! inflorescence color

The same applies to panicle/inflorescence length, weight and shape.

Results: Mammalian Phenotype (MP)

The MP ontology proved more difficult to analyze automatically - changes to Obol and/or manual editing of logical definitions may be required.

Downloading

Logical definitions file available at:

OBO-Edit

If you are using OBO-Edit, then you can paste the following URL directly into the obo-edit load box:

(you may want to use the 'advanced' option and allow dangling references)

The above .obo file contains just import statements - to pull in the relevant obo files, as well as the main xp file (this means you will need an internet connection, even if you download the file for use later)

OWL Editors

Not yet fully tested. You may need to allocate additional heap, as this imports a lot of ontologies.

Visualising Results

As well as oboedit and owl editors, the cross-products can be viewed in the experimental Obol branch of Amigo. For example, to see what terms are defined using the PATO term hypoplastic, see:

Here we can see the following terms defined:

  1. nasal bone hypoplasia
  2. maxilla hypoplasia
  3. mandible hypoplasia
  4. liver hypoplasia
  5. adrenal gland hypoplasia
  6. bulbourethral gland hypoplasia
  7. spleen hypoplasia
  8. forebrain hypoplasia
  9. telencephalon hypoplasia
  10. cerebellum hypoplasia
  11. pulmonary hypoplasia
  12. skin hypoplasia

Following the link to pulmonary hypoplasia shows you the term in the original MP context (a DAG) and also a "decomposed view" showing separate trees for the genus and differentia:

Image:amigo-pulmonary-hypoplasia.jpg

This page is just meant for illustration - for display to a biologist end-user this would be compacted; the upper levels of PATO would not be shown.

Term Syntax

The syntax of the term names in MP differs from PT; a common lexical pattern in MP is:

  • VALUE - ENTITY - ATTRIBUTE

Example: increased brown fat cell amount

One example of an MP term following the above pattern is:

  • "<increased> <brown fat cell> <amount>"

Which has the text definition: "increased amount of thermogenic tissue in the body that is composed of cells containing multiple small fat droplets"

As a first attempt at a logical definition:

  • genus: PATO:0000420 ("increased number")
  • differentia: inheres_in MA:0000057 ("brown fat")

Represented in obo format as:

 intersection_of: PATO:0000420                ! increased number
 intersection_of: inheres_in MA:0000057       ! brown fat tissue

However, this is not quite right, as it is not an "increased number" of "brown fat" tissue, it is a larger "portion size".

At the time of writing this PATO term is not defined so we're not sure if this is applicable.

(TODO)
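The VALUE - ENTITY - ATTRIBUTE pattern described in this section can be split mechanically as a first pass. The sketch below is not how Obol works; it is a deliberately naive illustration, and the two small word lists and the function name split_name are hypothetical.

```python
# Hypothetical mini-lexicons; a real system would draw these from PATO.
VALUES = {"increased", "decreased", "abnormal", "absent"}
ATTRIBUTES = {"amount", "size", "number", "morphology"}


def split_name(name):
    """Split an MP-style term name into value, entity and attribute.

    Returns None when the name does not fit VALUE - ENTITY - ATTRIBUTE.
    """
    tokens = name.split()
    if len(tokens) < 3:
        return None
    if tokens[0] not in VALUES or tokens[-1] not in ATTRIBUTES:
        return None
    return {
        "value": tokens[0],
        "entity": " ".join(tokens[1:-1]),
        "attribute": tokens[-1],
    }


parsed = split_name("increased brown fat cell amount")
print(parsed)
```

A split like this only proposes candidates. As the brown fat example shows, picking the right PATO term ("increased number" versus a larger portion size) still needs manual review.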

There are many advantages to explicitly defining MP terms using PATO and MA in this fashion. Reasoners can be used to keep the ontologies in sync (either the oboedit reasoner or one of any number of owl reasoners). PATO definitions and MP definitions can be shared and reused.

Complex phenotypes in mammals

Many phenotypes such as "ovary hypoplasia" or "degenerate molars" are a single quality inhering in a single entity (or collection of entities - eg the collection of all molars in an individual organism).

For others such as "holoprosencephaly" or "osteoporosis", it may be an interconnected collection of qualities inhering in the same or different parts. We can still define these as conjunctions of EQs, but this is where we start crossing the grey line between the dependent continuants known as 'phenotypes' into the dependent continuants known as 'disorders'...

It is also important to distinguish the definition, which captures the necessary and sufficient conditions, from the gloss/additional notes; eg MP defines holoprosencephaly as:

presence of a single forebrain hemisphere or lobe; often accompanied by a deficit in median facial development

Thus the logical definition should only refer to the part before the semicolon (genus: PATO:having_a_single_part, differentia: towards MA:forebrain_hemisphere). The part after the semicolon ("often accompanied by...") is useful, but it is statistical and is neither necessary nor sufficient, so it should be captured by other means.


Issue: species-specificity

Whilst the MP represents mammalian phenotypes, we do not have a common mammalian anatomical ontology. It will take some time before CARO reaches the level of granularity required.

We recommend that, since the focus of MP is biased towards the mouse and mouse models of human disease [CHECK THIS STMT], we treat it as a mouse phenotype ontology, with a view to migrating the definitions towards some kind of FMA/MA synthesis when such an ontology becomes available.

Absence-oriented terms

The absence case is different from normal EQ phenotypes - the absence does not inhere in the entity that is absent, since this entity does not exist!

We treat these the same as "sensitivity to..." terms. The absence inheres in the organism as a whole (or the anatomical entity which is missing the part)

 [Term]
 id: MP:0000766
 name: absent tongue squamous epithelium
 namespace: MPheno.ontology
 def: "missing the scaly epithelial layer of the tongue" []
 synonym: "absence of tongue squamous epithelium"  []
 synonym: "loss of tongue squamous epithelium"  []
 is_a: MP:0000765     ! abnormal tongue squamous epithelium morphology

Could be defined as:

 A <lacking_of_a_part> which inheres_in <organism> and towards <tongue squamous epithelium>

The "inheres_in <organism>" part of the definition can be omitted

in obo format:

 intersection_of: PATO:lacks_part
 intersection_of: towards MA:tongue_squamous_epithelium

See PATO:Absent for more discussion on this

"Other" categories

MP contains some terms such as "other metabolic defect". The use of "other" is to be avoided.

Root term

The root of the MP is the term "phenotype ontology". There are relations such as "cellular phenotype" is_a "phenotype ontology". The root term should be changed to "phenotype".


Inconsistencies detected

A number of inconsistencies were detected.

Example of inconsistency between MP and OBO-Cell:

Image:oboedit-amacrine.jpg

Here the oboedit reasoner has found an unstated implied link (blue squiggly arrow) between 'abnormal amacrine cell morphology' and 'abnormal interneuron morphology'. In the split pane display we see the reason for this inference: OBO-Cell states that 'amacrine cell' is_a 'interneuron'.
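The kind of inference shown in the screenshot can be imitated on a toy scale. The sketch below is not the oboedit reasoner: it assumes each phenotype term has a one-quality, one-entity logical definition, it uses the term names as stand-ins for IDs, the quality label "abnormal morphology" is a placeholder of ours, and the is_a table is assumed to be acyclic.

```python
# From the page: OBO-Cell states that 'amacrine cell' is_a 'interneuron'.
CELL_IS_A = {"amacrine cell": {"interneuron"}}

# Toy logical definitions: phenotype term -> (quality, bearer entity).
# The quality label is a placeholder, not a real PATO term name.
LOGICAL_DEFS = {
    "abnormal amacrine cell morphology": ("abnormal morphology", "amacrine cell"),
    "abnormal interneuron morphology": ("abnormal morphology", "interneuron"),
}

# is_a links already asserted between the phenotype terms (none in this toy).
ASSERTED = set()


def entity_subsumes(parent, child):
    """True if child is parent or reaches it over is_a (acyclic input assumed)."""
    if parent == child:
        return True
    return any(entity_subsumes(parent, mid) for mid in CELL_IS_A.get(child, ()))


def implied_links():
    """List (child, parent) pairs implied by the definitions but not asserted."""
    links = []
    for child, (child_q, child_e) in LOGICAL_DEFS.items():
        for parent, (parent_q, parent_e) in LOGICAL_DEFS.items():
            if child == parent or child_q != parent_q:
                continue
            if entity_subsumes(parent_e, child_e) and (child, parent) not in ASSERTED:
                links.append((child, parent))
    return links


links = implied_links()
print(links)
```

Because the two definitions share a quality and the bearer of one is a subtype of the bearer of the other, the missing is_a link between the two phenotype terms is reported.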

The logical definitions will also be used for comparing phenotypes across organisms once OBD is populated.

Results: Worm Phenotype (WP)

Defining xps for the WP is still in preliminary stages

Downloading

Logical definitions file available at:

Results: Human Phenotype (HP)

Defining xps for the HP is still in preliminary stages

Example:

 [Term]
 id: HP:0000110 ! renal dysplasia
 intersection_of: PATO:0000640 ! dysplastic
 intersection_of: inheres_in FMA:7203 ! kidney

A few tracker items have been submitted to the HP tracker based on running the oboedit reasoner over HP-XP


Downloading

Logical definitions file available at:

Quick URLs:

Conclusions

Providing formal computable definitions of pre-coordinated phenotype terms in terms of basic qualities (PATO) and bearer entities (eg AOs) will make both styles of phenotype annotation commensurable, and will promote sharing of core ontologies and reusable ontology "building blocks".

More work is required to curate these logical definitions in phenotype ontologies. This preliminary analysis suggests some of this can be automated.
