Blue Collar Bioinformatics


Parsing GoPubMed RDF with Sparta


GoPubMed provides a semantic layer on top of PubMed articles, linking them to GO and other ontologies and providing this metadata as RDF.

Starting out with RDF can be a bit hairy when dealing with the namespaces that prefix many of the URIs. Sparta is a Python library that sits on top of rdflib and provides nice shortcuts for accessing terms. For instance, if you have the following RDF namespace and term:

<rdf:RDF xmlns:dc="">

Instead of having to access this attribute through its full namespace URI in rdflib, Sparta allows you to do so with the shortcut dc_title. Here is a short example with GoPubMed which prints out a PubMed record and its title:

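import urllib
from rdflib import ConjunctiveGraph as Graph
import sparta

# URL of the GoPubMed RDF service (left blank here)
url = ''
gopubmed_handle = urllib.urlopen(url)
# Load the returned RDF into an rdflib graph
graph = Graph()
graph.parse(gopubmed_handle)

# Wrap each unique subject with Sparta and print its title
graph_subjects = list(set(graph.subjects()))
sparta_factory = sparta.ThingFactory(graph)
for subject in graph_subjects:
    sparta_graph = sparta_factory(subject)
    print subject, [unicode(t) for t in sparta_graph.dc_title][0]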

The Sparta library had not yet been updated to work with rdflib version 2.4, and also needed a few smaller fixes in matching URIs to attribute names. These are fixed in an updated version available here.
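
To make the idea of matching URIs to attribute names concrete, here is a minimal sketch in plain Python. It is not Sparta's actual implementation. The namespace URI is a placeholder, and the names NAMESPACES, shortcut_to_uri and uri_to_shortcut are made up for illustration. It only assumes that a shortcut has the form prefix_localname, as in dc_title:

```python
# Hypothetical prefix-to-namespace map. The URI is a placeholder chosen for
# this illustration, not the namespace served by GoPubMed.
NAMESPACES = {"dc": "http://example.org/dc/"}


def shortcut_to_uri(shortcut, namespaces=NAMESPACES):
    """Expand a 'prefix_localname' shortcut into a full URI string."""
    # Split on the first underscore only, so local names may contain underscores
    prefix, sep, local_name = shortcut.partition("_")
    if not sep or prefix not in namespaces:
        raise KeyError("unknown prefix in shortcut: %r" % shortcut)
    return namespaces[prefix] + local_name


def uri_to_shortcut(uri, namespaces=NAMESPACES):
    """Collapse a full URI into 'prefix_localname', or None if no prefix matches."""
    for prefix, base in namespaces.items():
        if uri.startswith(base):
            return prefix + "_" + uri[len(base):]
    return None


print(shortcut_to_uri("dc_title"))
```

With a mapping like this, attribute access by shortcut comes down to a dictionary lookup on the prefix followed by a normal graph query on the expanded URI.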

Written by Brad Chapman

December 21, 2008 at 1:38 am

Posted in semanticweb
