Blue Collar Bioinformatics


Standard Ontologies in BioSQL


A recent thread by Peter on the BioSQL mailing list initiated some thinking about formalizing ontologies and terms in BioSQL. The current ad-hoc solution is that BioPerl, Biopython and BioJava attempt to use the same naming schemes. The worry is that this is not documented, no one is likely in a big hurry to document it, and we are essentially inventing another ontology.

The BioSQL methodology of storing key/value pair information on items can be mapped to RDF triples as:

BioSQL               | RDF
Bioentry or Feature  | Subject
Ontology             | Namespace of predicate
Term                 | Predicate term, relative to namespace
Value                | Object
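To make the correspondence concrete, here is a minimal Python sketch of turning one BioSQL-style key/value annotation into a triple. The `Triple` type, the `annotation_to_triple` helper, the `bioentry:1` identifier and the example value are all made up for illustration; none of them are part of BioSQL or Biopython. The namespace is the Dublin Core terms namespace.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A bare-bones RDF-style statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Dublin Core terms namespace, used here as the "ontology" column.
DC_TERMS = "http://purl.org/dc/terms/"


def annotation_to_triple(item_id, ontology_ns, term, value):
    """Map one key/value annotation onto subject, predicate and object.

    item_id     -> subject (a bioentry or feature)
    ontology_ns -> namespace of the predicate
    term        -> predicate term, relative to the namespace
    value       -> object
    """
    return Triple(subject=item_id, predicate=ontology_ns + term, obj=value)


# Illustrative values only; not taken from a real record.
triple = annotation_to_triple("bioentry:1", DC_TERMS, "relation", "GeneID:12345")
print(triple)
```

Keeping the namespace and term separate until the last step mirrors how BioSQL stores ontology and term as distinct pieces of information.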

Thus, a nice place to look for ontologies is in standards intended for RDF. Greg Tyrelle was thinking along the same lines a while ago and came up with an XSLT to transform GenBank XML to RDF, using primarily the Dublin Core vocabulary. On the biology side, the Sequence Ontology (SO) project provides an ontology meant for describing biological sequences. This includes a mapping to GenBank feature table names.

Using these as a starting point, I generated a mapping of GenBank names to names in the Dublin Core and Sequence Ontology (SO) ontologies. This is meant as a basis for standardizing and documenting naming in BioSQL. The mapping file thus far covers almost all of the header and feature keys, and more than half of the qualifier keys.

I would welcome suggestions for missing GenBank terms, as well as corrections on the terms mapped by hand.

Some notes on the mapping:

  • Cross references to other identifiers are mapped with the Dublin Core term ‘relation’. These can occur in many places in the GenBank format. Using a single term allows them to be flattened, with mapping values in the form ‘database:identifier’. This is consistent with the GenBank /db_xref qualifier.
  • Multiple names or descriptions of an item, also stored in multiple places in GenBank files, receive the Dublin Core term ‘alternative’.
  • Organism and taxonomy ontologies are a whole project unto themselves, so I didn’t try to tackle them here.
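As a rough sketch of the flattening described in the first two notes, the Python below collapses cross references and alternative names into (term, value) pairs. The `flatten_annotations` function and the sample inputs are hypothetical and only illustrate the ‘relation’ and ‘alternative’ conventions; this is not code from the mapping file or from any of the Bio* projects.

```python
def flatten_annotations(xrefs, names):
    """Flatten cross references and extra names into (term, value) pairs.

    xrefs: iterable of (database, identifier) tuples, wherever they were
           found in the record.
    names: iterable of additional names or descriptions for the item.
    """
    pairs = []
    for database, identifier in xrefs:
        # Every cross reference gets the single Dublin Core term 'relation',
        # with the value written as 'database:identifier' like /db_xref.
        pairs.append(("relation", f"{database}:{identifier}"))
    for name in names:
        # Extra names and descriptions get the Dublin Core term 'alternative'.
        pairs.append(("alternative", name))
    return pairs


# Illustrative inputs only.
pairs = flatten_annotations(
    xrefs=[("GeneID", "12345"), ("taxon", "9606")],
    names=["example gene synonym"],
)
for term, value in pairs:
    print(term, value)
```

Because every cross reference ends up under the same term, code reading the values back only needs to split on the first colon to recover the database and the identifier.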

Some other useful links for biological ontology mapping:

Written by Brad Chapman

December 14, 2008 at 9:40 pm
