Blue Collar Bioinformatics

Note: new posts have moved to http://bcb.io/ Please look there for the latest updates and comments

Talking at BOSC 2009 about publishing biological data on the web


The Bioinformatics Open Source Conference (BOSC) is taking place later this month in Stockholm, Sweden. I will be attending for the first time in a few years, and giving a short and sweet 10-minute talk about ideas for publishing biological data on the web. BOSC provides a chance to meet and talk with many of the great people involved in open source bioinformatics; the schedule this year looks fantastic. The talk will be held in conjunction with the Data and Analysis Management special interest group, which is also full of interesting talks.

The talk will promote development of reusable web-based interface libraries layered on top of existing open source projects. The PDF abstract provides the full description and motivation; below is a more detailed outline based on some brainstorming and organization:

  • Motivation: rapidly organize and display biological data in a web accessible format.
  • Current state: reusable bioinformatics libraries targeted at programmers — Biopython, bx-python, pygr, PyCogent
  • Current state: back end databases for storing biological data — BioSQL, GMOD
  • Current state: full featured web applications targeted at users — Galaxy, GBrowse
  • My situation: biologist and developer with organized data that needs analysis and presentation, internally with collaborators and externally with larger community.
  • Proposal: integrate bioinformatics libraries, database schemas, and open source web development frameworks to provide re-usable components that can serve as a base for custom data presentation.
  • Framework: utilize cloud infrastructure for reliable deployment — Google App Engine, Amazon EC2
  • Framework: make use of front end JavaScript frameworks — jQuery, ExtJS
  • Framework: make use of back end web frameworks — Pylons
  • Implementation: Demo server for displaying sequences plus annotations
  • Implementation: Utilizes BioSQL schema, ported to object-oriented data store; Google App Engine backend or MongoDB backend
  • Implementation: Data import/export with Biopython libraries — GenBank in and GFF out
  • Implementation: Additional screenshots from internal web displays.
  • Challenges: Generalizing and organizing display and retrieval code without having to buy into a large framework.
  • Challenges: Re-usable components for cross-language functionality; JavaScript front end displays for multi-language back ends.
  • Challenges: Build a community that thinks of reusing and sharing display code as much as parsing and pipeline development code.
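
To make the "GenBank in and GFF out" item a bit more concrete, here is a minimal sketch of the export half using only the Python standard library. It assumes the features have already been parsed (for example with Biopython's GenBank reader) into simple records with 0-based, end-exclusive coordinates. The `Feature` tuple, `to_gff3_line`, `to_gff3` and the `source` value are hypothetical names for illustration, not part of Biopython or the demo server, and the score and phase columns are left as "." for simplicity.

```python
from collections import namedtuple
from urllib.parse import quote

# Hypothetical minimal feature record; in practice these values would come
# from a parser such as Biopython's GenBank reader.
Feature = namedtuple("Feature", "seqid type start end strand attributes")


def to_gff3_line(feature, source="example"):
    """Format one feature as a tab-delimited GFF3 line.

    Expects 0-based, end-exclusive coordinates (Python slice style) and
    converts them to the 1-based, inclusive coordinates used by GFF3.
    """
    strand = {1: "+", -1: "-"}.get(feature.strand, ".")
    # Percent-encode characters that are reserved in the attributes column
    # (";", "=", "&", ",", tabs); spaces are left readable.
    attrs = ";".join(
        "{}={}".format(key, quote(str(value), safe=" :/"))
        for key, value in sorted(feature.attributes.items())
    )
    columns = [
        feature.seqid,
        source,
        feature.type,
        str(feature.start + 1),  # shift the start to 1-based
        str(feature.end),        # exclusive end equals the inclusive 1-based end
        ".",                     # score: not tracked in this sketch
        strand,
        ".",                     # phase: not tracked in this sketch
        attrs or ".",
    ]
    return "\t".join(columns)


def to_gff3(features, source="example"):
    """Return a GFF3 document (version header plus one line per feature)."""
    lines = ["##gff-version 3"]
    lines.extend(to_gff3_line(feature, source) for feature in features)
    return "\n".join(lines) + "\n"
```

A call such as `to_gff3([Feature("NC_000913", "gene", 189, 255, 1, {"ID": "thrL"})])` returns the header line followed by one nine-column record. A real exporter would also need to fill in the phase for CDS features and handle features with multiple locations.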

I would be happy to hear comments or suggestions about the talk. If you’re going to BOSC and want to meet up, definitely drop me a line.

Written by Brad Chapman

June 11, 2009 at 7:41 am

3 Responses


  1. Hi Brad,

    It looks like a great talk; however, I want to suggest that you at least touch on how you license the data you publish. I realise this is the *open source* conference but I think *open data* would be interesting to the members, since IMHO a lot less is known about this.

    If you want a quick guide, I suggest you head over to the Open Knowledge Foundation’s page on the subject here:
    http://wiki.okfn.org/OpenDataLicensing

    Disclaimer: I am on the board of OKFN.

    cheers,
    James

    James Casbon

    June 15, 2009 at 7:57 am

    • James;
      Really good idea. How do you feel about the Creative Commons options for open licensing of data:

      http://creativecommons.org/about/licenses

      I like their presentation of the licenses, as it makes things easy for the law-disinclined among us. Are there things not covered by their choices that are commonly used for open data? Starting this discussion is a great idea, and I’ll definitely include a slide touching on it.

      Brad

      Brad Chapman

      June 17, 2009 at 7:49 am

      • The Creative Commons licenses are certainly a good set and, as you say, the page is very clear.

        The problem with CC from the data point of view is that CC is designed for content and not data. This is a problem since databases have special terms under (at least) EU law.

        It is only ten minutes, so I would stick to the facts:
        * you should license your data
        * good licenses are available from CC open data commons (http://www.opendatacommons.org/licenses/)

        Best of luck with the talk, I am going to see if I can persuade my company to stump up the registration fee!

        James Casbon

        June 17, 2009 at 8:50 am

