LifeFormulae Blog » Posts for tag 'tar'

A Valuable Lesson

There once was a web portal called the Search Launcher to which I dedicated 4 years of my life.

It sort of fell into my lap. The only orientation I got was the basic directory structure. The supporting csh scripts, programs, and databases (and the programs that created those databases) I had to discover on my own.

It took me about a year to get it all under control. I think I actually made things better. I was used to dealing with tons of data, being on call at all hours, and minor things like distributed and parallel processing. I had the welcoming server off-loading compute-intensive processes to another, faster server, as NFS was proving too flaky. I had huge databases distributed across many disks according to their size. I had website client programs on the Mac and PC that a researcher could run from a local machine.

I would work late, and on weekends, during the off-hours when the load was lighter. If I had something really crucial to do, I telecommuted because I didn’t want any distractions.

Things ran smoothly, but there was the occasional hiccup. Mostly I monitored things and planned for improvements.

People would come by my cubicle and remark on how it looked as if I had nothing to do. If only you knew, I thought to myself.

One day, out of the blue, I was asked to interview someone. I was told he would be joining our group.

Okay, so I interviewed him. His main exposure to computing was Windows PC based. He had little to no Unix/Linux experience, let alone experience with programming, websites, huge amounts of data, or distributed anything. He did, however, have “wet lab” experience.

After he was hired, he told me that he was going to “rework things from top to bottom and make it easy for me.”

As he tried to “rework things”, he decided that everything had to be on one machine on one disk. He said this was to make things easier, but I really knew he didn’t understand the current configuration.

I decided it was time to bow out. I didn’t want to be held responsible when his “improvements” came crashing down.

So I left. Before I did, however, he had me move everything to the one disk and then asked me to create a tarball. Well, tar wouldn’t work: the directory structure had become so convoluted that the path names were too long for tar to handle. Not to mention the size of what had to be tar’d!
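Which limit tar actually hit isn't recorded here. One well-known candidate is the POSIX ustar header format, which stores a path as a prefix of at most 155 bytes plus a name of at most 100 bytes, split at a slash. The following is a minimal Python sketch, under that assumption, of how a tree could be screened for offending paths before archiving. The function names `fits_ustar` and `paths_too_long` are made up for illustration.

```python
import os


def fits_ustar(path: str) -> bool:
    """Return True if `path` can be stored in a ustar header.

    Assumes the classic ustar limits: a 100-byte name field and a
    155-byte prefix field, joined by an implied slash.
    """
    raw = path.encode("utf-8")
    if len(raw) <= 100:
        return True
    # Try every slash as the prefix/name split point.
    for i, byte in enumerate(raw):
        if byte == ord("/"):
            prefix, name = raw[:i], raw[i + 1:]
            if len(prefix) <= 155 and 0 < len(name) <= 100:
                return True
    return False


def paths_too_long(root: str) -> list[str]:
    """List files under `root` whose relative path would not fit."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            rel = rel.replace(os.sep, "/")
            if not fits_ustar(rel):
                offenders.append(rel)
    return sorted(offenders)
```

Running something like `paths_too_long("/data")` before the archive job (the path is only an example) would at least have surfaced the problem up front rather than in an unread error file.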

This was a warning of things to come, as the department decided to invest in a Linux farm at his suggestion. Needless to say, the system was improperly configured and crashed about two to three times a week.

Remember those long directory names that tar couldn’t comprehend? Well, the Linux farm was configured to continue this wonderful tradition with the result that everybody thought that tar was archiving the data when it really wasn’t. Nobody was reading the error files generated by the tar process, so nobody paid attention until sometime later. Then they hired a sysadmin whose sole duty was to watch the tar archive and make sure it took.
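The failure mode here, an archive job that reports errors nobody reads, can be guarded against by checking the archive itself rather than trusting that the job ran. The following is a rough Python sketch, not what the department actually used. It assumes archive members are stored with paths relative to the source directory, and the function name `missing_from_archive` is hypothetical.

```python
import os
import tarfile


def missing_from_archive(source_dir: str, archive_path: str) -> list[str]:
    """Return files under `source_dir` that are absent from the archive.

    Assumes member names in the archive are relative to `source_dir`.
    """
    with tarfile.open(archive_path) as archive:
        archived = {os.path.normpath(name) for name in archive.getnames()}

    expected = set()
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            expected.add(os.path.normpath(os.path.relpath(full, source_dir)))

    # Anything expected but not archived was silently dropped.
    return sorted(expected - archived)
```

A nightly job could fail loudly whenever this list is non-empty, which is cheaper than dedicating a person to watching the archive.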

I was off to other venues, one of which was Enron. Yes, I was there when it all came down, but the 200 people in my department were not affected. It was a circus getting to work for a while. Reporters were hiding under the stairways in the parking garage hoping for news tidbits.

I heard that my old bioinformatics department had hired an “efficiency expert” to advise them on the problems they were encountering and how to fix them.

They got tired of hearing how everything was broken, so they let the expert go. Business continued as usual.

Next they hired a really good sysadmin to take care of things, but he spent most of his time keeping the farm going by scrounging parts from the backup system. He left after a while.

The moral of the story is this: know who you are hiring. Don’t give them the power to rework something they don’t know about or understand. This will upset the people who know what they are doing, and that valuable experience will move on.

A lot of good people with a much better plan than a misconfigured Linux farm went on to better pastures, putting the department in the position of trying to recover what was lost.

With all the data now forthcoming and the latest news that most sequences are wrongly annotated, it’s time for experts.

As an aside: a fascinating talk on deep transcriptome analysis, given on Thursday, Jan. 6th by Chris Mason, Assistant Professor at the Institute for Computational Biomedicine at Cornell University, listed the following observations that next-gen sequencing is bringing to light.

Some of the most interesting points from Mason’s talk were:

  • A large fraction of the existing genome annotation is wrong.
  • We have far more than 30,000 genes, perhaps as many as 88,000.
  • About ten thousand genes use over 6 different sites for polyadenylation.
  • 98% of all genes are alternatively spliced.
  • Several thousand genes are transcribed from the “anti-sense” strand.
  • Lots of genes don’t code for proteins. In fact, most genes don’t code for proteins.

Mason also described the discovery of 26,187 new genes that were present in at least two different tissue types.

For more, see –

Genetics and biology are concrete sciences. Computer science and engineering entail a lot of abstract thinking, which is desperately needed to build the underlying structure that supports the analysis of the masses of sequence data now accumulating.

Get the right people for the job and you won’t find trouble.
