LifeFormulae Blog » Archive of 'Jan, 2009'

Common Sense to Live By

As promised, below is a list of bioinformatics “horror stories”: a few of the situations and people I have encountered through the years. Names have been changed to protect the innocent.

 

People Issues

 

The following are “people” problems.  (Just about every discipline has these denizens.)

 

1) Make sure the person selected for the task has the skills required for the task, or at least the desire to learn those skills.

 

We were sent a programmer, Joseph, whose position was underwritten by another lab but who would be using our equipment. We were to assist him in supporting a research lab’s website. His duties consisted of capturing genomics data from various sites, along with the results of analyses of the research lab’s own data, and displaying that data on the lab’s website.

He was given a workstation and access to our support literature, and was introduced to another programmer, Jay, whom he could turn to for assistance.

I realized he was in trouble a couple of days later, when Jay came to me and said that Joseph was having problems. Jay had referred him to the books and manuals in the lab, which gave exact examples of what he was trying to do, but Joseph said he couldn’t understand them and that they were basically a waste of time. Jay even worked out the code that Joseph needed, but Joseph said it didn’t work. The code consisted of about 10 lines of pretty straightforward Java.

It didn’t work because Joseph had several “typos” when he typed in the code and tried to compile it.

I tried to help him with batching URL retrievals, but he didn’t understand, even after he said he had read the Perl books we had. Because the job had to get done, I ended up writing the five lines of Perl code needed for the simple HTTP retrieval myself.
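Batching URL retrievals just means looping over a list of URLs and keeping each response. A minimal sketch in Python rather than the Perl used at the time (the function name `batch_retrieve` is hypothetical), using only the standard library’s `urllib.request`:

```python
import urllib.request


def batch_retrieve(urls, timeout=30):
    """Fetch each URL in turn.

    Returns (results, failures): results maps URL -> response bytes,
    failures maps URL -> error message, so one bad URL doesn't stop the batch.
    """
    results, failures = {}, {}
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results[url] = response.read()
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are OSError subclasses;
            # ValueError covers malformed URLs.
            failures[url] = str(exc)
    return results, failures
```

Given a list of record URLs, `results, failures = batch_retrieve(urls)` returns the retrieved data plus a list of what needs to be retried.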

I talked to the P.I. of Joseph’s home lab. It seems that Joseph had a little experience programming on the Windows platform and wanted to learn more about bioinformatics. The P.I. did acknowledge that Joseph had a tendency to get other people to “help” him do his tasks.

Joseph was sent back to his home lab.

 

2) Find out everything you can about the person you are considering

 

My P.I. hired a programmer, Jane, who had sterling references from previous labs. It didn’t take long to find out that her last lab had given her glowing references because they wanted to be rid of her; among other things, they said she couldn’t program. It seems the same was true in every other lab she had been with.

I asked my P.I. why, why?? He said he thought we would be the ones who could finally develop her skills.

I said that I didn’t think so. She didn’t know the basics, and she tried to cover by saying the other programmers in the lab were out to undermine her. Consequently, this caused a lot of unrest in the lab.

She was given a support role and eventually went to another lab.

 

Project Management

 

1)  Set goals

 

There was this five-year grant that was in its last year. The first four years had been spent working out the program design; very little coding had been accomplished other than a demo or two. I got involved in the final year because they needed to produce something that could be referred to as a product.

The group was situated in a small area of some four offices and three cubicles. Every wall had been covered with whiteboard for my arrival, so that we could design the final model!

Design is good, but know when to say enough is enough.

The project was never really finished. The grant was not renewed.

 

2) Let those who can help you, help you

 

I got involved in a project whose purpose was to accumulate data from various sources for storage in an Oracle database.

After determining the data required and gathering that data, I modeled the six tables in UML (Unified Modeling Language) and then generated the SQL to hold the data. One table was a sequence identification table: a table that held the IDs associated with each sequence in various databases such as GenBank and ENSEMBL.

One project member, a P.I. of another lab involved in the project, stated that she had read a book on SQL and knew what to do.

Needless to say, she didn’t understand relational databases at all.

Instead of six tables, the database evolved under her oversight into over 200 tables. Most of these tables had just two fields: an index and a sequence identification tag.
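The trouble with one two-field table per source is that every new database means a new table, and finding all the IDs for one sequence means querying all of them. A minimal sketch of the alternative, a single cross-reference table keyed by sequence and source database, using Python’s built-in `sqlite3` in place of Oracle (the table and column names are hypothetical, not the project’s actual schema):

```python
import sqlite3

# Hypothetical schema: one cross-reference table instead of one table per source.
SCHEMA = """
CREATE TABLE sequence (
    seq_id      INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE sequence_xref (
    seq_id    INTEGER NOT NULL REFERENCES sequence(seq_id),
    source_db TEXT    NOT NULL,  -- e.g. 'GenBank', 'ENSEMBL'
    accession TEXT    NOT NULL,
    PRIMARY KEY (seq_id, source_db, accession)
);
"""


def build_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_xref(conn, seq_id, source_db, accession):
    # A new source database is just a new value in source_db, not a new table.
    conn.execute(
        "INSERT INTO sequence_xref (seq_id, source_db, accession) VALUES (?, ?, ?)",
        (seq_id, source_db, accession),
    )


def ids_for(conn, seq_id):
    """All external IDs for one sequence, in a single query."""
    return conn.execute(
        "SELECT source_db, accession FROM sequence_xref WHERE seq_id = ? ORDER BY source_db",
        (seq_id,),
    ).fetchall()
```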

 

3) Ask around, someone might have a better way

 

I was asked to help a lab that was having trouble with some code developed by a programmer who had moved on. The lab technician who used the software said that it took 19 hours to assemble the data required to define the wells on a microarray plate.

I took a look at the code. By using the NCBI toolkit, several Perl scripts, and a database, I was able to reduce 19 hours to about 20 minutes.

The previous programmer had used an elaborate system of indexed GenBank reports. With the toolkit, I was able to process the NCBI ASN.1 files directly.

 

Software Issues

 

1) Software has its limits

 

One lab was using FileMaker Pro for data storage. This was okay at first, but with 500 files, some of them growing beyond FileMaker’s 2 GB file-size limit, FileMaker was struggling.

Once the data was ported to an Oracle database, data access became much faster.

 

 2) Read the Manual

 

A sequence is a string of letters, and there is only so much you can do when searching strings. For one thing, the word size of the search (the length of the short exact matches the search starts from) is limited.
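Word-based search tools in the BLAST family begin by finding short exact matches (“words”) between the query and the target, then try to extend each one. A minimal Python sketch of that seeding step only (the function names are hypothetical; real tools add scoring, neighborhood words, and extension):

```python
from collections import defaultdict


def build_word_index(target, word_size):
    """Map every word (substring of length word_size) in target to its start positions."""
    index = defaultdict(list)
    for i in range(len(target) - word_size + 1):
        index[target[i:i + word_size]].append(i)
    return index


def seed_hits(query, index, word_size):
    """Return (query_pos, target_pos) pairs where a query word matches the target exactly."""
    hits = []
    for q in range(len(query) - word_size + 1):
        for t in index.get(query[q:q + word_size], ()):
            hits.append((q, t))
    return hits


index = build_word_index("ACGTACGT", 4)
print(seed_hits("ACGT", index, 4))  # [(0, 0), (0, 4)]
```

The index needs an entry for every position in the target, and a smaller word size yields more chance matches that must each be extended; both costs grow with the size of the target, which is one reason a tool tuned for short targets can bog down on a whole genome.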

One lab was analyzing a sequence against the entire genome of a selected organism using open-source software. This software wasn’t intended to search an entire genome, just short pieces of it.

After the process had taken some five days to partially analyze just one sequence, the lab technician decided that this widely used open-source program had to be rewritten.

The request was declined.

 

3) Document your code

 

We were called in to save a pharmacology database developed in Access. The original developer used Access because he “sort of” knew how to develop input screens in it. The lab ran into trouble when the developer left. No one in the lab was able to take over the application, and everyone else they asked to look at the project left shaking their heads. There was no documentation.

The data was ported to an Oracle database with web-enabled user input and reporting functions.

 

4) Verify that the process completed

 

One research group created a process that was supposed to archive the day’s research data to backup automatically. They assumed everything was okay until they lost a hard drive and found out that the automatic nightly backup had never happened: the file name, which spelled out the full physical path of the data file, was too long for the archiving software. The backup failed with an error message, but no one ever checked.
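The lesson generalizes: a backup job should fail loudly, and its result should be checked. A minimal Python sketch using the standard `tarfile` module (not whatever archiving software that group used; `backup_and_verify` is a hypothetical name) that archives a directory, reopens the finished archive, and raises if any file is missing:

```python
import os
import tarfile


def backup_and_verify(src_dir, archive_path):
    """Archive every file under src_dir, then confirm each one is in the archive.

    Raises RuntimeError if anything is missing; any error while writing
    propagates instead of being silently ignored. Returns the file count.
    """
    expected = set()
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), src_dir)
            expected.add(rel.replace(os.sep, "/"))  # tar member names use "/"

    with tarfile.open(archive_path, "w:gz") as tar:
        for rel in sorted(expected):
            tar.add(os.path.join(src_dir, rel), arcname=rel)

    # Reopen the archive and compare its contents with what we meant to save.
    with tarfile.open(archive_path, "r:gz") as tar:
        archived = {m.name for m in tar.getmembers() if m.isfile()}

    missing = expected - archived
    if missing:
        raise RuntimeError(
            f"backup incomplete: {len(missing)} file(s) missing, e.g. {min(missing)}"
        )
    return len(archived)
```

Scheduling a job like this is only half the work; the raised error still has to reach a person, for example by e-mail from whatever scheduler runs it.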

 

Some things you just can’t help

 

One morning I arrived at the lab and found everyone on the floor waiting for me.  They couldn’t access the server to read mail, etc.   

I opened the lab, looked toward the server, and found its electrical plug pulled out of the socket.

It seems that the night housekeeping crew needed an electrical outlet for the vacuum cleaner, and the one the server used was the handiest.

 

One More…

 

Our lab paid the institution’s IT department for a monthly backup of our computers.

One morning, I came in and everything was dead. I called our lab sys admin and told him to investigate.

Well, it turns out IT hadn’t really done a backup of our system in over 3 years. Apparently, they tried one over the weekend. (Our lab sys admin wasn’t involved in the process, as he was subsidized by our department and not by IT.)

At the start of the process, the date command produced the proper output. At the end of the process, running date produced only an error: no date command found, anywhere.

I forget exactly what got deleted or screwed up, but everything had to be rebuilt.

Luckily, I had been using one of our seldom-used machines to mirror our data, etc. on a daily basis. So, once the machines were back (after 2 days), we were okay and didn’t lose much.
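A daily mirror like that can be as simple as copying any file that is new or has changed since the last run. A minimal Python sketch (the function name `mirror` is hypothetical; a tool such as rsync does this job more thoroughly, including propagating deletions, which this sketch does not do):

```python
import os
import shutil


def mirror(src, dst):
    """One-way mirror: copy new or changed files from src to dst. Returns the count copied."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_root, name)
            st = os.stat(s)
            if os.path.exists(d):
                dt = os.stat(d)
                # Same size and same whole-second mtime: treat as unchanged.
                if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                    continue
            shutil.copy2(s, d)  # copy2 also copies the modification time
            copied += 1
    return copied
```

Run once a day from a scheduler, a second run over unchanged data copies nothing, so the nightly cost stays proportional to what actually changed.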

At the time, the average life span of a sys admin in IT was around 6 months.

These are just a few of my encounters in the field of the life sciences. I won’t go into the ones from engineering, but I’ve got some beauts, especially as a woman in engineering.

Computer Science Wild

(I’m delaying the “horror stories” until next week, because I want to fully document them all.)

 

I ran across the phrase “computer science wild” at a recent conference. I’ve got my own thoughts on it, especially since the list of the top 25 coding errors was released yesterday. The link to the article is http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9125678&source=NLT_SIT&nlid=91

 

I think any programmer should have the opportunity to write software that might kill someone, blow up an extremely expensive piece of equipment, or cause a waste of thousands of dollars because the system is down.  Maybe then they would think, write better code, and debug the software thoroughly before they released it into the wild.

 

The Ariane 5 rocket blew up shortly after lift-off because the software didn’t handle an overflow exception! (An overflow happens when a value is too large for the space set aside to hold it. Here, the flight software converted a 64-bit floating-point value to a 16-bit signed integer; the value didn’t fit, the conversion raised an exception, and nothing caught it.)

 

The excuse for the disaster was that the specifications didn’t spell out the need for that programming mechanism.   An exception handler is a very basic mechanism for catching and correcting errors.  There is no excuse for this oversight.
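An overflow exception of this kind arises when a value is stored into a field too small to hold it; in the Ariane 5 case, a 64-bit floating-point value was converted to a 16-bit signed integer. A minimal Python sketch of handling that conversion explicitly, using `struct` to stand in for a 16-bit field (the function names and the saturate-on-overflow policy are illustrative assumptions; logging the fault or switching to a backup value are other possible policies):

```python
import math
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def pack_int16(value: float) -> bytes:
    """Pack a finite float into a big-endian 16-bit signed field."""
    if not math.isfinite(value):
        raise ValueError(f"cannot pack non-finite value {value!r}")
    try:
        return struct.pack(">h", int(value))
    except struct.error:
        # The value doesn't fit in 16 bits. Handle it explicitly (here by
        # saturating to the nearest representable value) instead of letting
        # the error propagate unhandled.
        clamped = max(INT16_MIN, min(INT16_MAX, int(value)))
        return struct.pack(">h", clamped)


def unpack_int16(field: bytes) -> int:
    return struct.unpack(">h", field)[0]


print(unpack_int16(pack_int16(1234.7)))   # 1234
print(unpack_int16(pack_int16(40000.0)))  # 32767 (saturated)
```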

 

One major project I worked on was acoustical (noise) testing of aircraft engines. Our crew would go to some really great places, like Roswell, NM; Moses Lake, WA; or Uvalde, TX. We would record and analyze the noise of the engines as the aircraft flew over at different altitudes, with variable loads, in various approach patterns.

 

There were several pieces of software that had to work in tandem: the airborne system, the ground-based weather station, the meteorological (met) plane, the acoustic data analyzer, and the analysis station all had to work together to get the required results.

 

There was no room for error. Measurements had to be exact, even out to 16 places after the decimal point.

 

Modeling techniques, programming languages, and IDEs (Integrated Development Environments) have become very sophisticated and complex. A programmer today can “gee whiz” just about anything.

“Because we can” has become the norm.

 

This is great, but I’ve run into lab techs, etc. who were just this side of computer illiterate.  Like my dad, they adhere to a limited number of computer applications, accessed by a few key strokes or mouse clicks they have memorized.

 

And don’t think that engineers are immune. They had to be dragged “kicking and screaming” away from their slide rules.

 

I’m for simple to start.  You can always add more “bells and whistles” as the system (and its users) matures.

 

 

 

“You can’t do Bioinformatics if you haven’t worked in a wet lab”

The line “you can’t do bioinformatics if you haven’t worked in a wet lab” has been used as the basis for the “you need to know where the data comes from” argument time and time again. I actually saw it in print, on a slide in a presentation at the Next-Generation Sequencing Data Analysis conference in Providence, RI, in September 2008.

I can sympathize with this viewpoint, but I don’t agree with it. For instance, I designed the data system, compiled the data, and did the field testing that certified a re-engined aircraft, but I can’t pilot a plane. I did do a lot of field laboratory work, and it was “wet”, if snow, sleet, and rain count, along with desert dust and volcanic ash.

Knowing where the data comes from is very important, but what matters more is whether the data actually measures what it is supposed to measure (data validity: are your instruments correctly calibrated, and is the sampling rate sufficient?), what format the data is in, how large it is, and what sort of analysis it will be subjected to.
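Some of those validity questions can be turned into automatic checks that run before any analysis. A minimal Python sketch (the function name, parameters, and example values are hypothetical) that flags two problems: a sampling rate too low for the highest frequency of interest, using the Nyquist criterion that the sampling rate must exceed twice that frequency, and readings outside the instrument’s calibrated range:

```python
def validate_channel(samples, sample_rate_hz, max_signal_hz, cal_min, cal_max):
    """Return a list of problems found in one data channel (empty list = passed)."""
    problems = []
    # Nyquist: to capture frequencies up to max_signal_hz,
    # the sampling rate must exceed 2 * max_signal_hz.
    if sample_rate_hz <= 2 * max_signal_hz:
        problems.append(
            f"sampling rate {sample_rate_hz} Hz is too low for {max_signal_hz} Hz signals"
        )
    # Readings outside the calibrated range can't be trusted.
    out_of_range = sum(1 for s in samples if not cal_min <= s <= cal_max)
    if out_of_range:
        problems.append(
            f"{out_of_range} sample(s) outside calibrated range [{cal_min}, {cal_max}]"
        )
    return problems


for problem in validate_channel([0.2, 0.7, 1.4], 30_000, 20_000, 0.0, 1.0):
    print(problem)
```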

If the lab experience is so very important, a simple systems analysis is a very good tool to use. As I’ve done it, the observer/programmer/engineer would “live” in the lab for a period of time (usually two to four weeks, or until they have a good grasp of the processes involved), taking copious notes and asking lots of questions. That person may actually perform some of the work involved, if desired.
This person should have some understanding of molecular biology, etc. to fully appreciate the lab experience. 

This activity has the potential to illuminate possible bottlenecks or methods that may need modification or fine-tuning. If more than one site is involved, so much the better, as discrepancies in processes will be made obvious.

My biological wet lab experience got me a “you have excellent lab technique” and a job offer, which I declined.

Bioinformatics training also comes into question. Many courses just teach students which Internet site to go to for information, how to construct a FASTA-formatted sequence, or how to parse BLAST output or a GenBank report. They can’t offer much beyond a survey of things “bioinformatic”. Not much time is spent on information management or engineering approaches.
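For reference, a FASTA-formatted sequence is just a header line starting with “>” followed by the sequence, conventionally wrapped at a fixed width such as 60 or 70 characters. A minimal Python sketch (the function name and the 60-character default are choices made here, not a requirement of the format):

```python
def to_fasta(header, sequence, width=60):
    """Format one sequence as a FASTA record, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


print(to_fasta("example_seq hypothetical record", "ACGT" * 20), end="")
```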

I jumped from engineering to bioinformatics in the early ’90s. The object-oriented data model I presented apparently found an audience. I did some reading up on genetics, etc. before the interview, but most of the knowledge I used to answer interview questions such as “what are the four basic building blocks of life?” came from watching The X-Files. Things have gotten a lot more complicated (the textbooks have gotten heavier), and keeping up with new discoveries can become quite a task.

Next week I will offer a series of “horror stories”, or some of my experiences in the bioinformatics arena.
