Parsing GoPubMed RDF with Sparta
GoPubMed provides a semantic layer on top of PubMed articles, linking them to the Gene Ontology (GO) and other ontologies and exposing this metadata as RDF.
Starting out with RDF can be a bit hairy because of the namespaces that prefix many of the URIs. Sparta is a Python library that sits on top of rdflib and provides nice shortcuts for accessing terms. For instance, if you have the following RDF namespace and term:
    <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/">
    </rdf:RDF>
    <dc:title>Example</dc:title>
Instead of having to access this attribute with the full URI http://purl.org/dc/elements/1.1/title in rdflib, Sparta allows you to do so with the shortcut dc_title. Here is a short example with GoPubMed which prints out a PubMed record and its title:
    # Python 2 code, written against rdflib 2.4
    import urllib
    from rdflib import ConjunctiveGraph as Graph
    import sparta

    # Fetch the RDF export for a single PubMed ID from GoPubMed
    url = 'http://www.gopubmed.org/GoMeshPubMed/gomeshpubmed/Search/' + \
          'RDF?q=18463287&type=RdfExportAll'
    gopubmed_handle = urllib.urlopen(url)
    graph = Graph()
    graph.parse(gopubmed_handle)
    gopubmed_handle.close()

    # Wrap each unique subject in a Sparta object and print its title(s)
    graph_subjects = list(set(graph.subjects()))
    sparta_factory = sparta.ThingFactory(graph)
    for subject in graph_subjects:
        sparta_graph = sparta_factory(subject)
        print subject, [unicode(t) for t in sparta_graph.dc_title]
When this was written, the Sparta library had not yet been updated to work with rdflib version 2.4, and it also needed a few smaller fixes in how it matches URIs to attribute names. These are fixed in an updated version available here.
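To make the idea of matching URIs to attribute names concrete, here is a minimal sketch of the mapping in plain Python. It is not Sparta's actual implementation: the NAMESPACES table and the expand_shortcut and shorten_uri helpers are hypothetical names made up for illustration, and the sketch assumes that the prefix is everything before the first underscore.

```python
# Hypothetical prefix table for illustration; Sparta keeps its own
# bookkeeping, which may work differently.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand_shortcut(name, namespaces=NAMESPACES):
    """Turn a shortcut such as 'dc_title' into the full URI string."""
    # Assumption: the prefix is everything before the first underscore.
    prefix, sep, local_name = name.partition("_")
    if not sep or prefix not in namespaces:
        raise KeyError("no known namespace prefix in %r" % name)
    return namespaces[prefix] + local_name


def shorten_uri(uri, namespaces=NAMESPACES):
    """Turn a full URI into a 'prefix_localname' shortcut."""
    # Collect every namespace the URI starts with; the longest one wins.
    matches = [(len(ns), prefix, ns)
               for prefix, ns in namespaces.items()
               if uri.startswith(ns)]
    if not matches:
        raise KeyError("no known namespace matches %r" % uri)
    _, prefix, ns = max(matches)
    return prefix + "_" + uri[len(ns):]


if __name__ == "__main__":
    print(expand_shortcut("dc_title"))
    print(shorten_uri("http://purl.org/dc/elements/1.1/title"))
```

Going in both directions is what makes the shortcut convenient: attribute access expands dc_title into the full predicate URI, and predicates found in the graph can be shown under their short names.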