Blue Collar Bioinformatics

Note: new posts have moved to http://bcb.io/; please look there for the latest updates and comments.

Thoughts from BOSC 2009; Python in bioinformatics

with 6 comments

I’d like to share my thoughts on two major themes that emerged from my trip to Sweden for the 2009 Bioinformatics Open Source Conference (BOSC). I gave a short talk on publishing biological data on the web; the slides are available on Slideshare. This led to discussions with several folks from the open source and Python bioinformatics communities. The major themes of these conversations were the organization of the Python bioinformatics community and the growth of a platform for developing web-enabled applications.

Python in Bioinformatics

One unique feature of the Python bioinformatics community is that the work is distributed among several different packages. Unlike Perl, where many programmers regularly consolidate their code into the BioPerl project, the Python community has settled on several independent packages; Biopython, bx-python, pygr and PyCogent are a few popular ones. Instead of working with a monolithic code base, users can pick and choose functionality from among these projects.

This distinctive organization is not a bad thing. It avoids creating an unwieldy package that can be hard to maintain and refactor. It allows programmers to explore solutions in ways better suited to their particular problems. Finally, it provides individual recognition for the hard work researchers put into building and maintaining reusable code.

What this distribution does mean is that the Python bioinformatics community needs to work harder at communication and coordination. One great idea was to write a grant to bring Python biology programmers together for a conference and hacking session, ideally alongside a conference like BOSC or SciPy. This would give contributors an impetus to learn and discuss each other’s platforms. Beyond goodwill and community, the deliverables would be documentation and code that improve integration between projects. This would help scientists by lowering the learning curve for producing useful biology-related code in Python.

Galaxy

My talk focused on developing small, reusable presentation and backend components for building web-enabled applications. One really insightful question asked whether the community should focus on building a platform these components could be plugged into, or whether the components themselves would eventually evolve into a larger structure. The first approach has been taken very successfully by the Firefox browser, whose plugin architecture lets users build impressive web-interfacing applications within the browser. The second approach is the one I am more used to taking: build relatively small things that work, with an eye towards integration.

A discussion with James Taylor convinced me that it was worthwhile to take a longer look at the Galaxy project. Galaxy is an excellent web-based front end to many bioinformatics programs and scripts, allowing biologists to put together analysis pipelines. The project runs a powerful public site, so for many use cases Galaxy requires no installation at all.

The code is also publicly available, so you can run your own local Galaxy instance and plug in custom programs through its XML-based tool interface. The architecture of the system is remarkably similar to the one I converged on for my own work: a Pylons-like backend, SQLAlchemy for database interaction, and jQuery for the JavaScript interface. Galaxy can also be deployed to cloud infrastructure on Amazon EC2.
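To make the tool interface a little more concrete, here is a minimal sketch, assuming a hypothetical `fasta_lengths.py` script that reports the length of each sequence in a FASTA file. The `tool_xml()` helper builds the kind of XML wrapper Galaxy reads to expose a command-line program in its interface; the element names (`tool`, `command`, `inputs`/`param`, `outputs`/`data`, `help`) follow Galaxy's tool XML as we understand it, while the tool id, name and version are made up for illustration. Check the Galaxy tool documentation before relying on the exact schema.

```python
"""Hypothetical Galaxy tool: report the length of each FASTA record.

The script is a plain command-line program; tool_xml() builds a wrapper
whose element names follow Galaxy's tool XML as we understand it.
"""
import sys
import xml.etree.ElementTree as ET


def fasta_lengths(lines):
    """Yield (record_id, length) pairs from an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited word of the header as the id.
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def tool_xml():
    """Build a minimal (hypothetical) tool wrapper as an XML string."""
    tool = ET.Element("tool", id="fasta_lengths", name="FASTA lengths",
                      version="0.1.0")
    command = ET.SubElement(tool, "command")
    # Galaxy substitutes $input and $output with dataset paths at run time.
    command.text = "python fasta_lengths.py $input $output"
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data", format="fasta",
                  label="Input sequences")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="tabular")
    help_el = ET.SubElement(tool, "help")
    help_el.text = "Reports the length of each FASTA record."
    return ET.tostring(tool, encoding="unicode")


def main(in_path, out_path):
    """Read FASTA from in_path and write tab-separated id/length rows."""
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        for name, length in fasta_lengths(in_handle):
            out_handle.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Keeping the script free of any Galaxy imports is a deliberate choice: the same file works as an ordinary command-line program, and the XML wrapper is the only Galaxy-specific piece.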

We have installed Galaxy locally and are trying it out on our data presentation tasks. The tool plugin interface works as described, and we have had good luck integrating it with custom input types. Next I will try more complex integrations, with custom displays and more Python code on the backend, and hope to cover those in future posts. Generally, I hope Galaxy can serve as a platform on which custom presentation code can be built, distributed, and reused.
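Galaxy datatypes are Python classes that declare a file extension and can provide a `sniff` method for recognizing uploaded files. The sketch below mimics that shape as a standalone class for a hypothetical tab-delimited peak-call format; the class name, extension and column layout are assumptions for illustration only. It deliberately does not import or subclass Galaxy's real datatype base classes, which a working integration would need, along with registration in Galaxy's datatypes configuration.

```python
import os
import tempfile


class PeakCalls:
    """Hypothetical datatype for tab-delimited peak calls with a fixed header.

    Modeled loosely on the shape of a Galaxy datatype (a file extension plus
    a sniff method); it does not import or subclass Galaxy itself.
    """

    file_ext = "peaks"
    header = ["#chrom", "start", "end", "score"]

    def sniff(self, filename, max_lines=100):
        """Return True if the first lines of the file match the expected layout."""
        with open(filename) as handle:
            if handle.readline().rstrip("\n").split("\t") != self.header:
                return False
            for count, line in enumerate(handle):
                if count >= max_lines:
                    break
                if not line.strip():
                    continue  # tolerate blank lines
                fields = line.rstrip("\n").split("\t")
                if len(fields) != 4:
                    return False
                try:
                    start, end = int(fields[1]), int(fields[2])
                    float(fields[3])
                except ValueError:
                    return False
                if start > end:
                    return False
        return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".peaks", delete=False) as tmp:
        tmp.write("#chrom\tstart\tend\tscore\nchr1\t100\t250\t7.5\n")
    print(PeakCalls().sniff(tmp.name))  # True
    os.remove(tmp.name)
```

Checking only the first few lines keeps sniffing cheap on large uploads, at the cost of occasionally accepting a file that goes wrong further down.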

I’d be happy to hear your thoughts about either the biology Python community or Galaxy as a platform for web presentation work.

Written by Brad Chapman

July 19, 2009 at 8:13 pm

Posted in OpenBio


6 Responses


  1. Galaxy needs two things to be really compelling:
    * the ability to run workflows from the command line
    * the ability to compose workflows from other workflows

    It also needs a lot of work on making workflows really usable, such as named outputs, parameters, etc. Of course, you could say that I should contribute these myself, only the core team aren’t very responsive in dealing with simple patches (YMMV).

    James Casbon

    July 20, 2009 at 6:08 am

    • James;
      I have only played with the workflows a bit and am at the point of being happy they are thinking about the problem. From this naive viewpoint, I’d like to see integration with something like myExperiment (http://www.myexperiment.org/). Workflows are a hard problem and working towards a single solution makes a lot of sense.

      Responsiveness is a key component of Galaxy being a successful platform. It is the only way to engage the wider community and encourage outside programmers to contribute. I know the difficulty of keeping up from my work on Biopython, but it’s essential to doing it right. You can fork the code and submit patches as new issues on the Bitbucket site (http://bitbucket.org/galaxy/galaxy-central/), which is a good start. But ultimately you need someone from the project to integrate, or at least acknowledge and discuss, your ideas in a timely fashion. I submitted a patch there for something I needed, with the secondary goal of finding out how it would be handled.

      Thanks for the thoughts; these are really interesting discussion points.

      Brad Chapman

      July 20, 2009 at 5:44 pm

  2. Should there be a TurboGears of bioinformatics packages for Python?

    Looking at SciPy’s “most wanted” page, it seems like none of these distinct bio-oriented packages are getting enough mindshare to keep Python programmers from duplicating the same basic functions:

    http://bio.scipy.org/wiki/index.php/Most_wanted

    The requested cookbook items are all core features of Biopython and PyCogent — I think something must be wrong if SciPy’s bio community is that far removed from Biopython’s user base. Is it a technical issue, where Pythonistas are repelled by Biopython’s API, or find the features inadequate? Or is it just a matter of evangelism? Neither seems insurmountable.

    Eric Talevich

    July 25, 2009 at 8:23 am

    • Eric;
      That’s an interesting link. Looking at the page history, it appears the last update was in 2007, so hopefully this does not reflect current thinking about what is needed. My guess is that the proposal stems from a lack of familiarity with Biopython rather than an intense dislike of its API.

      For the larger question, my take on the right answer is that the various projects need to be proactive in learning the other code bases. In the short term, this can be documentation and cross-project contributions. Medium term, it would be great to meet and discuss common problems and solutions to flesh out more cross-project links.

      Longer term, a meta-project like the one you are proposing may evolve from this communication. My sense is that we should let these evolve organically, since trying to create something new would stretch the limited resources of Python bioinformatics programmers even further.

      Nice discussion,
      Brad

      Brad Chapman

      July 26, 2009 at 4:10 pm

  3. I also think that something must be wrong if SciPy’s bio community is that far removed from Biopython’s user base. Perhaps it is a technical issue, where Pythonistas are repelled by Biopython’s API.

    Venus Hair

    April 28, 2011 at 8:07 am

  4. Hey Brad,
    It seems like communication between groups, sharing information and working together, usually gets the best results. Good luck with that!

    Daiwa Rods

    May 1, 2011 at 6:47 pm

