Blue Collar Bioinformatics

Parsing GoPubMed RDF with Sparta

GoPubMed provides a semantic layer on top of PubMed articles, linking them to GO and other ontologies and providing this metadata as RDF.

Starting out with RDF can be a bit hairy when dealing with the namespaces that prefix many of the URIs. Sparta is a Python library that sits on top of rdflib and provides convenient shortcuts for accessing terms. For instance, if you have the following RDF namespace and term:

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example</dc:title>
</rdf:RDF>

Instead of having to access this attribute with the full URI http://purl.org/dc/elements/1.1/title in rdflib, Sparta lets you use the shortcut dc_title. Here is a short example with GoPubMed that prints a PubMed record and its title:

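import urllib
from rdflib import ConjunctiveGraph as Graph
import sparta

# Fetch the GoPubMed RDF export for the query 18463287 and load it into a graph.
url = 'http://www.gopubmed.org/GoMeshPubMed/gomeshpubmed/Search/' + \
      'RDF?q=18463287&type=RdfExportAll'
gopubmed_handle = urllib.urlopen(url)
graph = Graph()
graph.parse(gopubmed_handle)
gopubmed_handle.close()

# Wrap each unique subject with Sparta so that predicates can be read as
# prefix_name attributes such as dc_title.
graph_subjects = list(set(graph.subjects()))
sparta_factory = sparta.ThingFactory(graph)
for subject in graph_subjects:
    sparta_graph = sparta_factory(subject)
    # dc_title holds a collection of values; print the first one.
    print subject, [unicode(t) for t in sparta_graph.dc_title][0]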
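
To make the dc_title shortcut more concrete, here is a minimal sketch of the idea behind it: split the attribute name into a prefix and a local name, then look the prefix up in a namespace table. This is not Sparta's actual code. The NAMESPACES table and the expand_shortcut function are hypothetical names made up for illustration, and the sketch uses only the standard library.

```python
# Hypothetical prefix table, for illustration only.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand_shortcut(name, namespaces=NAMESPACES):
    """Turn a 'prefix_localname' shortcut into a full URI string.

    Hypothetical helper; it only illustrates the naming convention.
    """
    # Split on the first underscore only, so local names may contain underscores.
    prefix, sep, local_name = name.partition("_")
    if not sep or prefix not in namespaces:
        raise KeyError("no namespace registered for shortcut %r" % name)
    return namespaces[prefix] + local_name


print(expand_shortcut("dc_title"))
# http://purl.org/dc/elements/1.1/title
```

Splitting on the first underscore is a design choice in this sketch: it assumes prefixes never contain underscores, while local names may.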

The Sparta library had not yet been updated to work with rdflib version 2.4, and it also needed a few smaller fixes to how URIs are matched to attribute names. Both issues are fixed in an updated version available here.
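
For a rough sense of what matching URIs to attribute names involves, the sketch below goes in the reverse direction: given a full URI, it picks the longest registered namespace that the URI starts with and builds a prefix_localname attribute from it. This is an assumption about the general approach, not a copy of the fix in Sparta. The uri_to_attribute function and the namespace table are hypothetical.

```python
def uri_to_attribute(uri, namespaces):
    """Return 'prefix_localname' for a URI, or None if no namespace matches.

    Hypothetical helper; namespaces maps prefix -> base URI.
    """
    best = None
    for prefix, base in namespaces.items():
        # Prefer the longest matching base so nested namespaces resolve correctly.
        if uri.startswith(base) and (best is None or len(base) > len(best[1])):
            best = (prefix, base)
    if best is None:
        return None
    prefix, base = best
    return prefix + "_" + uri[len(base):]


# Hypothetical namespace table, for illustration only.
namespaces = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
print(uri_to_attribute("http://purl.org/dc/elements/1.1/title", namespaces))
# dc_title
```

Choosing the longest match matters when one namespace URI is a prefix of another; a first-match lookup could otherwise produce the wrong attribute name.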

Written by Brad Chapman

December 21, 2008 at 1:38 am

Posted in semanticweb
