LifeFormulae Blog » Page 'Effective Bioinformatics Programming - Part 2'

Effective Bioinformatics Programming - Part 2

Effective Bioinformatics Programming – Part 2

Instrumentation Programming

Instrumentation Programming usually concerns computer control over the actions of an instrument and/or the streaming or download of data from the device. Instrumentation in the Life Sciences covers data loggers, waveform data acquisition systems, pulse generators, image capture, and others used extensively in LIMS (Laboratory Information Management Systems), Spectroscopy, and other scientific arenas.

Most instruments are controlled by codes called “control codes”. These codes are usually sent or received by a C/C++ program. Some instrumentation manufacturers, however, have a proprietary programming language that must be used to “talk” to the instrument.

Some companies are nice enough to provide information on the structure of the data that comes from their instrument. When they don’t you may have to use good old “reverse engineering”. That’s where the Unix/Linux od utility comes in handy, because lots of time will be spent poring over hex dumps.

As you can tell, programming instruments requires a lot of patience. This is especially true if everything hangs or gets into a confused state. There is nothing you can do but recycle the power to everything and start over. This is usually accompanied by a banging of keyboards and the muttering of a few choice words.

Development Platforms or IDEs (Integrated Development Environment)

I have to mention development platforms as they can be useful, but also problematic. My favorite is Eclipse (http://www.eclipse.org). Originating at IBM, Eclipse was supported by a consortium of software vendors. Eclipse has now become the Eclipse open source community, supported by the Eclipse Foundation.

Eclipse is a development platform for programmers comprised of extensible frameworks, tools and runtimes for building, deploying and managing software across the lifecycle. You can find plug-ins that will enable you to accomplish just about anything you want to do. A plug-in is an addition to the Eclipse platform that is not included in the base package, like an Eclipse memory manager or a debugging a Tomcat servlet.

Sun offers NetBeans (“The only IDE you need.”). I used NetBeans (http://netbeans.org) at lot on the Mac. Previously, Sun offered StudioOne and Creator. I used StudioOne (on Unix) and Creator (on Linux). I haven’t worked with NetBeans lately because they’re currently mostly Swing-centric (GUI) development and are not fully JSF (java Server Faces) aware. NetBeans will make a template for JSF but doesn’t (as yet) provide an easy way to create a JSF interface.

There are two main problems with development platforms. For one, the learning curve is fairly steep. There area lot of tutorials and examples available, but you still have take the time to do it.

The best way to use a development platform is to divide the work. One group does web content, one group does database, one group does middleware (the glue that holds everything together), etc. Each group or person can then become knowledgeable in their area and move on or absorb other areas as needed.

The second problem with these tools in that you are stuck with their developmental approach.

You have to do things a certain way and adhere to a certain structure. Flexibility can be a problem.

This is especially true of interface building. You are stuck with the code the tool generates and the files and file structures created. With most tools, you have to use that tool to access files that the tool created.

IDEs can be useful in that they will perform mundane coding tasks for you. For instance, given a database record, the IDE can use those table elements to generate web forms and the SQL queries driving those forms. You can then expand the simple framework or leave as is.

Open Source/Free Software and Bioinformatics Libraries

There a lot of good an not-so-good Open Source code out there for the Life Sciences.

There are several “gotchas” to look out for, including –

Is the code reliable? Are others using it? Are they having problems?

Will the code run on your architecture? What will it take to install

What kind of user support is available? What’s the response time?

Is there a mailing list available for the library, package, or project of interest?

The are several bioinformatics software libraries available for various languages. All of these libraries are OpenSource/Free Software. Installing these libraries takes a little more that just downloading and uncompressing a package. There are “dependencies” (other libraries, modules, programs, and access to external sites) that must be resident or accessible before a complete build of these libraries is possible.

The following is a list of the most popular libraries and their respective dependencies.

BioPerl 1.6.1: Modules section of http://www.cpan.org/

Required modules:
perl               => 5.6.1
IO::String         => 0
DB_File            => 0
Data::Stag         => 0.11
Scalar::Util       => 0
ExtUtils::Manifest => 1.52

Required modules for source build:
Test::More    => 0
Module::Build => 0.2805
Test::Harness => 2.62
CPAN          => 1.81

Recommended modules:  some of these have circular dependencies
Ace                       => 0
Algorithm::Munkres        => 0
Array::Compare            => 0
Bio::ASN1::EntrezGene     => 0
Clone                     => 0
Convert::Binary::C        => 0
Graph                     => 0
GraphViz                  => 0
HTML::Entities            => 0
HTML::HeadParser          => 3
HTTP::Request::Common     => 0
List::MoreUtils           => 0
LWP::UserAgent            => 0
Math::Random              => 0
PostScript::TextBlock     => 0
Set::Scalar               => 0
SOAP::Lite                => 0
Spreadsheet::ParseExcel   => 0
Spreadsheet::WriteExcel   => 0
Storable                  => 2.05
SVG                       => 2.26
SVG::Graph                => 0.01
Text::ParseWords          => 0
URI::Escape               => 0
XML::Parser               => 0
XML::Parser::PerlSAX      => 0
XML::SAX                  => 0.15
XML::SAX::Writer          => 0
XML::Simple               => 0
XML::Twig                 => 0
XML::Writer               => 0.4

Some of these modules such as SOAP::Lite depend upon many other
modules.

BioPython 1.53:  http://biopython.org/

Additional packages:
NumPy     (recommended) http://numpy.scipy.org/
ReportLab (optional)    http://www.reportlab.com/software/opensource/
MySQLdb   (optional)    May be in core Python distribution.

BioRuby 1.4.0:  http://www.bioruby.org/

The base distribution is self-contained and uses the RubyGems installer.
Optional packages.

RAA:xmlparser
RAA:bdb
RubyForge:ActiveRecord and at least one driver (or adapter) from
   RubyForge:MySQL/Ruby, RubyForge:postgres-pr, or RubyForge:ActiveRecord
   Oracle enhanced adapter.
RubyForge:libxml-ruby (Ruby language bindings for the GNOME Libxml2 XML toolkit)

BioJava 1.7.1:  http://www.biojava.org/

biojava-1.7.1-all.jar:  self-contained binary distribution with
  all dependencies included.

biojava-1.7.1.jar:  bare distribution that requires the following additional
  jar files.  These are required for building from source code.
  Most are from http://www.apache.org/

bytecode.jar:                  required to run BioJava
commons-cli.jar:               used by some demos.
commons-collections-2.1.jar:   demos, BioSQL Access
commons-dbcp-1.1.jar:          legacy BioSQL access
commons-pool-1.1.jar:          legacy BioSQL access
jgraph-jdk1.5.jar:          NEXUS file parsing

Don’t forget to sign up for the mailing list for that library or libraries of interest to get the lastest news, problems, solutions, etc. for that library or just life science topics in general.

Software Hosting and Indexing Sites

There are several Software Hosting and Indexing Sites that serve as software distribution points for bioinformatics software.

SourceForge.net – Search on bioinformatics for a list of software available. Projects include:MIAMExpress - http://sourceforge.net/projects/miamexpress/

freshmeat– The Web’s largest index of Unix and cross-platform software

Bioinformatics Organization – The Open Access Institute

Open Bioinformatics Foundation (O|B|F) - Hosts Many Open Bioinformatics Projects

Public Domain Manifesto

In this time of curtailment of civil rights, the Public Domain Manifesto seems appropriate (http://www.publicdomainmanifesto.org/node/8). Sign the petition while you’re there.

This is the end of Part 2. Part 3 will explore more software skills, project management, and other computational topics.

Like this post? Spread the word!
delicious digg google
stumbleupon technorati Yahoo!

Leave a comment

You need to log in to comment.

Top of page / Subscribe to new Entries (RSS)