Blue Collar Bioinformatics



Parsing GoPubMed RDF with Sparta


GoPubMed provides a semantic layer on top of PubMed articles, linking them to GO and other ontologies and providing this metadata as RDF.

Starting out with RDF can be a bit hairy when dealing with the namespaces that prefix many of the URIs. Sparta is a Python library that sits on top of rdflib and provides convenient shortcuts for accessing terms. For instance, if you have the following RDF namespace and term:

<rdf:RDF xmlns:dc="">

Instead of having to access this attribute by its full namespace URI in rdflib, Sparta lets you do so with the shortcut dc_title. Here is a short example with GoPubMed which prints out a PubMed record and its title:

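import urllib
from rdflib import ConjunctiveGraph as Graph
import sparta

# URL of the GoPubMed RDF service (missing from this copy of the post).
url = ''
gopubmed_handle = urllib.urlopen(url)
# Load the RDF/XML returned by GoPubMed into the graph.
graph = Graph()
graph.parse(gopubmed_handle)
gopubmed_handle.close()

# Wrap each unique subject so its predicates become prefix_localname attributes.
graph_subjects = list(set(graph.subjects()))
sparta_factory = sparta.ThingFactory(graph)
for subject in graph_subjects:
    sparta_graph = sparta_factory(subject)
    # dc_title yields every title value for the subject; print the first one.
    print subject, [unicode(t) for t in sparta_graph.dc_title][0]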
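
To make the attribute access in the loop less magical, here is a minimal sketch of the general idea in plain Python, with no rdflib or Sparta involved. It is not Sparta's implementation: the Thing class, the example subjects and the titles are all made up for illustration, and dc is assumed to be bound to the standard Dublin Core elements namespace. It does show why dc_title is treated as a collection above: a subject may carry several values for the same predicate.

```python
DC = "http://purl.org/dc/elements/1.1/"  # assumed Dublin Core namespace


class Thing(object):
    """Hypothetical stand-in for a Sparta 'thing': one subject and its triples."""

    def __init__(self, subject, triples, prefixes):
        self._subject = subject
        self._triples = triples
        self._prefixes = prefixes

    def __getattr__(self, name):
        # Only reached for names not found normally, such as 'dc_title'.
        if name.startswith("_"):
            raise AttributeError(name)
        prefix, sep, local_name = name.partition("_")
        if not sep or prefix not in self._prefixes:
            raise AttributeError(name)
        # Expand prefix_localname into a full predicate URI, then collect
        # every object stored for this subject under that predicate.
        predicate = self._prefixes[prefix] + local_name
        return [o for (s, p, o) in self._triples
                if s == self._subject and p == predicate]


# Made-up triples as (subject, predicate, object) tuples.
triples = [
    ("urn:example:article-1", DC + "title", "An example title"),
    ("urn:example:article-1", DC + "creator", "A. Author"),
    ("urn:example:article-2", DC + "title", "Another example title"),
]

thing = Thing("urn:example:article-1", triples, {"dc": DC})
print(thing.dc_title[0])  # prints: An example title
```

A predicate with no values comes back as an empty list here, so indexing with [0] as in the GoPubMed loop would raise an IndexError for a subject without a title.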

The Sparta library had not yet been updated to work with rdflib version 2.4, and also needed a few smaller fixes in matching URIs to attribute names. These are fixed in an updated version available here.
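
The fixes themselves are not reproduced in this post. As a rough illustration of what matching URIs to attribute names involves, the sketch below converts in both directions using a small prefix table. The function names and the table are hypothetical, and the two Dublin Core namespace URIs are assumptions chosen for the example rather than anything taken from Sparta.

```python
# Hypothetical prefix bindings, standing in for a graph's namespace bindings.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def uri_to_attribute(uri, prefixes=PREFIXES):
    """Map a full predicate URI to a 'prefix_localname' attribute name."""
    matches = [(len(namespace), prefix, namespace)
               for prefix, namespace in prefixes.items()
               if uri.startswith(namespace)]
    if not matches:
        raise KeyError("no prefix bound for URI %r" % uri)
    # Prefer the longest namespace in case one is a prefix of another.
    _, prefix, namespace = max(matches)
    return prefix + "_" + uri[len(namespace):]


def attribute_to_uri(name, prefixes=PREFIXES):
    """Map a 'prefix_localname' attribute name back to a full URI."""
    prefix, sep, local_name = name.partition("_")
    if not sep or prefix not in prefixes:
        raise KeyError("no namespace bound for attribute %r" % name)
    return prefixes[prefix] + local_name


print(uri_to_attribute("http://purl.org/dc/elements/1.1/title"))  # prints: dc_title
print(attribute_to_uri("dc_title"))
```

Splitting on the first underscore keeps local names that themselves contain underscores intact, at the cost of not supporting prefixes that contain one.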

Written by Brad Chapman

December 21, 2008 at 1:38 am

Posted in semanticweb
