<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>LifeFormulae Blog</title>
	<atom:link href="http://blog.lifeformulae.com/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.lifeformulae.com</link>
	<description></description>
	<pubDate>Tue, 31 Jan 2012 20:25:31 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>Life Changes</title>
		<link>http://blog.lifeformulae.com/2012/01/life-changes/</link>
		<comments>http://blog.lifeformulae.com/2012/01/life-changes/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 20:25:31 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=210</guid>
		<description><![CDATA[I recently made a life change that came with a silver lining. It showed me that the simple way is usually the best way. It also revived a few latent talents, and I discovered one I didn’t know I had.
First, let me explain what happened. My parents [...]]]></description>
			<content:encoded><![CDATA[
<p style="margin-bottom: 0in;">I recently made a life change that came with a silver lining. It showed me that the simple way is usually the best way. It also revived a few latent talents, and I discovered one I didn’t know I had.</p>
<p style="margin-bottom: 0in;">First, let me explain what happened. My parents are elderly, living alone, and needed some looking after. I talked to my business partners, and we agreed that I could do what needed to be done with our product over the web, provided we got together on a bi-weekly or monthly basis.</p>
<p style="margin-bottom: 0in;">So my husband and I moved. We squeezed into a smaller living situation, going from a big city to a very small town. I grew up in a small town, but after 30+ years away, the return was a little traumatic.</p>
<p style="margin-bottom: 0in;">There were several drawbacks to living in this small town that we were aware of, but since we were usually “just visiting,” we didn’t pay them much mind.</p>
<ul>
<li>
<p style="margin-bottom: 0in;">There is only one grocery store 	that carries a fairly complete line of items.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There are 4 gasoline stations, but 	3 are owned by the same person.</p>
</li>
<li>
<p style="margin-bottom: 0in;">Other than DSL, there is no high-speed internet access.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There is only one doctor, one 	clinic, and one drug store.</p>
</li>
<li>
<p style="margin-bottom: 0in;">The only soft goods, such as clothing, cosmetics, and grooming essentials, are carried by a cut-rate supplier, and the selection is very limited.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There are fewer than 10 food establishments, including the local drive-ins, and they all serve the same type of Texas fried or Tex-Mex food.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There is no cable TV. A dish is the only alternative unless you don’t mind a very limited selection of channels: about 6, provided reception is good.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There is no dry cleaner.</p>
</li>
<li>
<p style="margin-bottom: 0in;">The atmosphere is decidedly rural, with a good dose of subsistence living, a healthy gun culture, and a fair share of intolerance. Everybody knows everybody.</p>
</li>
<li>
<p style="margin-bottom: 0in;">Boredom can be a problem.</p>
</li>
</ul>
<p style="margin-bottom: 0in;">Unless you want to drive 14 miles in one direction, or 25 in the other, you are stuck with what you can find. Even then, the selection is still pretty limited.</p>
<p style="margin-bottom: 0in;">On the other hand –</p>
<ul>
<li>
<p style="margin-bottom: 0in;">There is only one stoplight.</p>
</li>
<li>
<p style="margin-bottom: 0in;">Dietary decisions become simpler, 	because there isn’t that much to choose from.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There is less traffic.  You can 	get to where you want to be in about 5 minutes.</p>
</li>
<li>
<p style="margin-bottom: 0in;">There are people who have retired 	from a stressful, urban job to a small town.  I call them “jewels”. 	Find them.</p>
</li>
<li>
<p style="margin-bottom: 0in;">The people look out for each 	other.</p>
</li>
<li>
<p style="margin-bottom: 0in;">You learn to entertain yourself 	instead of looking elsewhere.</p>
</li>
<li>
<p style="margin-bottom: 0in;">You learn to make do.</p>
</li>
</ul>
<p style="margin-bottom: 0in;">My sister asked if I would come help her out at the local newspaper on a part-time basis.</p>
<p style="margin-bottom: 0in;">I said okay.  Although I grew up in a small-town newspaper publishing family, it has been a long, long time since I did that.</p>
<p style="margin-bottom: 0in;">They did not print the newspaper at the local site, but articles still had to be written, notices and pictures taken, and advertising sold.</p>
<p style="margin-bottom: 0in;">Computers and digital cameras have made the whole process a lot easier, but articles, notices, etc., still have to be typeset, whether by a scanner equipped with optical character recognition or by hand. Advertising still has to be sold over the phone or in person.</p>
<p style="margin-bottom: 0in;">I discovered that I had a knack for taking photographs, or so I was told. I also found I can still write a pretty fair article.</p>
<p style="margin-bottom: 0in;">I never cared much for taking photographs, but I think I may have stumbled upon a new hobby.</p>
<p style="margin-bottom: 0in;">As for living in a small town, it is different. But the slow pace can be wonderful after the go-go survival mode of the urban landscape.</p>
<p style="margin-bottom: 0in;">I have learned to take things a little slower and think more simply.</p>
<ul>
<li>
<ul>
<li>
<p style="margin-bottom: 0in;">Break down a task into a few 		simple steps.</p>
<ul>
<li>
<p style="margin-bottom: 0in;">If you can&#8217;t get your head 			around the entire project, start on one or two pieces.  			Eventually, they will integrate into a whole.</p>
</li>
</ul>
</li>
<li>
<p style="margin-bottom: 0in;">Keep an eye on what you are 		trying to accomplish.</p>
<ul>
<li>
<p style="margin-bottom: 0in;">Try to visualize  that big 			picture.</p>
</li>
</ul>
</li>
<li>
<p style="margin-bottom: 0in;">Factor in the parameters that 		will help to achieve that effect.</p>
<ul>
<li>
<p style="margin-bottom: 0in;">Define those parameters and 			represent them in a manner meaningful to the project.</p>
</li>
</ul>
</li>
<li>
<p style="margin-bottom: 0in;">Trust yourself enough to make the best decision about what you are trying to achieve.</p>
</li>
<li>
<p style="margin-bottom: 0in;">Search for the “jewels” who 		will enable that decision. A conversation with someone about 		anything just might shake loose the information you need.</p>
</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2012/01/life-changes/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Data Project Development Pointers</title>
		<link>http://blog.lifeformulae.com/2011/05/data-project-development-pointers/</link>
		<comments>http://blog.lifeformulae.com/2011/05/data-project-development-pointers/#comments</comments>
		<pubDate>Mon, 02 May 2011 20:01:32 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=208</guid>
		<description><![CDATA[A recent Bioinform (www.bioinform.com) poll asked “What are the biggest informatics challenges for the next generation sequencing data?” The poll results are listed as follows: 57% Functional Interpretation; 24% Data Management; 9% Assembly and Alignment; 4% Variant Calling; and 4% Storage.
As a former Data Engineer entrusted throughout my career with obscene amounts of [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">A recent Bioinform (<a title="Bioinform" href="http://www.bioinform.com" target="_blank">www.bioinform.com</a>) poll asked “What are the biggest informatics challenges for the next generation sequencing data?” The poll results are listed as follows: 57% Functional Interpretation; 24% Data Management; 9% Assembly and Alignment; 4% Variant Calling; and 4% Storage.</p>
<p style="margin-bottom: 0in;">As a former Data Engineer entrusted throughout my career with obscene amounts of various kinds of data, I am appalled that data management and storage ranked so low.  Where&#8217;s your Functional Interpretation without the data?</p>
<p style="margin-bottom: 0in;">I&#8217;ve worked with all sorts of data.  Data, that in some instances, was obtained under adverse conditions and could not be duplicated had to be protected, more or less, by my very skin  (or so I was threatened).</p>
<p style="margin-bottom: 0in;">Next-gen sequencing is producing files of short run data that are amplifying the errors inherent in first-gen sequence data.  These next-gen files are being produced at a phenomenal rate, sometimes surpassing the petabyte count.</p>
<p style="margin-bottom: 0in;">Data managers can be thankful that data storage has been developed that provides a lot of bang for the buck, as 4T drives as just about standard and the 2G file limit has been eliminated.</p>
<p style="margin-bottom: 0in;">Having worked in the field with the first Compaq lap tops and the later Zenith with a 40M hard drive, this is very heartening news.</p>
<p style="margin-bottom: 0in;">I&#8217;ve put together a list of data pointers that anyone attempting to work with data of any kind needs to read.</p>
<p style="margin-bottom: 0in;">The most fundamental question is: what are you trying to measure or analyze?</p>
<p style="margin-bottom: 0in;">Close on the heels of this one is how to acquire the data. Is there a system in place that can produce the necessary data stream? If not, is there a system that can be modified to produce the data you need? If not, what will it take, in hardware and software, to produce what you want?</p>
<p style="margin-bottom: 0in;">This data acquisition phase can be extremely costly if you don&#8217;t have an overall idea of the complete system: acquisition, storage, and analysis.</p>
<p style="margin-bottom: 0in;">Next question – How much data are we talking about?  Is it limited to a file, a system, or a cluster of devices?</p>
<p style="margin-bottom: 0in;">Where are we going to store this data?  Do we have the storage equipment at hand?  If we do have the equipment, can we add on what we need without reinventing the wheel?</p>
<p style="margin-bottom: 0in;">The reader will probably think instantly of the “cloud.” However, of late (e.g., Amazon&#8217;s EC2 cloud outage), tech blogs have been stating that a cloud hack is just a thought away (<a title="Cloud Hacks" href="http://tech.blorge.com/Structure:%20/2011/04/28/data-security-in-the-cloud-sucks-as-witness-sony-psn-hack/" target="_blank">http://tech.blorge.com/Structure:%20/2011/04/28/data-security-in-the-cloud-sucks-as-witness-sony-psn-hack/</a>).</p>
<p style="margin-bottom: 0in;">Another question: will the data be stored in its raw format, or will it need massaging? Compared with raw data, manipulated or converted data (e.g., binary converted to engineering units or text) can easily quadruple your storage needs and costs.</p>
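<p style="margin-bottom: 0in;">To put a rough number on that kind of growth, here is a minimal Python sketch. The 1,000 samples, the 16-bit sample size, and the 0.01 scale factor are all made-up assumptions for illustration, not figures from any real system.</p>
<pre>
```python
import struct

# Hypothetical example: 1,000 raw 16-bit sensor counts (0 through 999).
counts = list(range(1000))

# Raw form: 2 bytes per sample, packed as unsigned 16-bit integers.
raw = struct.pack("1000H", *counts)

# Converted form: each count scaled to an assumed engineering unit
# (the 0.01 scale factor is invented) and written as one line of text.
text = "".join("{:.4f}\n".format(c * 0.01) for c in counts).encode("ascii")

print(len(raw), "bytes raw")
print(len(text), "bytes as text")
print("growth factor: {:.1f}x".format(len(text) / len(raw)))
```
</pre>
<p style="margin-bottom: 0in;">Under these assumptions the text form comes out 3.5 times larger than the raw binary, and that is before any database or index overhead is added.</p>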
<p style="margin-bottom: 0in;">Will data from various hardware sources need integration into the data stream? How will this integration occur? Will additional software be necessary? Is a data model required? In some instances, more than one data model may be necessary. Is a database reflecting these models needed? Who will develop the data model and administer the database?</p>
<p style="margin-bottom: 0in;">And, while we&#8217;re talking about it, how easy or difficult would it be to take archived data and have it available for processing: a few minutes, a day, a week?</p>
<p style="margin-bottom: 0in;">If it&#8217;s stored as binary or another basic (raw) form, how long will it take to pull that data from the archive, convert it, and have it available for analysis?</p>
<p style="margin-bottom: 0in;">How are you going to certify that the raw data is correct and the conversion utility created a true conversion of that raw data?</p>
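<p style="margin-bottom: 0in;">One common way to approach both halves of that question is to record a checksum when the raw data is acquired, and then to confirm that the converted values map back to the same raw bytes. The Python sketch below assumes SHA-256, little-endian 16-bit counts, and a hypothetical scale factor; the function names are illustrative only and do not come from any particular system.</p>
<pre>
```python
import hashlib

SCALE = 0.01  # hypothetical counts-to-engineering-units scale factor


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def encode_counts(counts):
    """Pack integer counts as little-endian unsigned 16-bit values."""
    return b"".join(c.to_bytes(2, "little") for c in counts)


def decode_counts(raw):
    """Unpack little-endian unsigned 16-bit values from raw bytes."""
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def convert(raw):
    """Convert raw counts to scaled engineering units."""
    return [c * SCALE for c in decode_counts(raw)]


def verify(raw, recorded_digest, converted):
    """True only if the raw bytes are intact and the conversion round-trips."""
    if sha256_of(raw) != recorded_digest:
        return False
    recovered = [round(v / SCALE) for v in converted]
    return encode_counts(recovered) == raw


raw = encode_counts([0, 1, 512, 65535])
digest = sha256_of(raw)   # recorded once, at acquisition time
converted = convert(raw)

print(verify(raw, digest, converted))                   # intact data
print(verify(raw[:-1] + b"\x00", digest, converted))    # one corrupted byte
```
</pre>
<p style="margin-bottom: 0in;">The first check passes and the second fails, because changing a single byte changes the digest. A real conversion utility would need a round-trip test suited to its own units and precision.</p>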
<p style="margin-bottom: 0in;">Just the term “archived data” has its own implications. What do you mean by “archived” vs. “active” data? What raises the flag that says this active data can now be archived? Are there several phases in archiving that data? How long will it take?</p>
<p style="margin-bottom: 0in;">Some of the tests I&#8217;ve worked on involved acquiring live data in the field and performing spot analysis of the data as it was acquired. This live data was subsequently saved to digital tape or hard drive for further detailed analysis.</p>
<p style="margin-bottom: 0in;">A three and a half week field test turned into 3 to 4 months of analysis at home base.  The archived data had to perfectly mirror the live data and data analysis obtained in the field.</p>
<p style="margin-bottom: 0in;">Could you do this with your data?  Rerunning a field test is an expensive proposition – many thousands of dollars could be involved.</p>
<p style="margin-bottom: 0in;">Speaking of analysis, who will be analyzing the data? What hardware and software do they have or need? Will further software development be in the picture, along with hardware upgrades?</p>
<p style="margin-bottom: 0in;">Are different platforms involved?  Is the data representation on each platform consistent?</p>
<p style="margin-bottom: 0in;">Little-endian versus big-endian byte order was a major problem at one time, followed by the differences between 32-bit and 64-bit system representations. Ask the end users questions and don&#8217;t be blind-sided by system differences.</p>
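<p style="margin-bottom: 0in;">For anyone who has not run into the byte-order problem, this small Python sketch shows what it looks like. The test value is arbitrary, and the example uses only the built-in integer methods <code>to_bytes</code> and <code>from_bytes</code>.</p>
<pre>
```python
value = 0x12345678  # an arbitrary 32-bit test value

# The same number laid out in the two byte orders.
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading bytes with the wrong byte order silently yields a different number.
misread = int.from_bytes(little, "big")
print(hex(misread))  # 0x78563412
```
</pre>
<p style="margin-bottom: 0in;">Nothing crashes and no error is raised. The number is simply wrong, which is why the byte order of every binary file needs to be written down somewhere.</p>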
<p style="margin-bottom: 0in;">Another analysis question concerns subsets of data. Can you subset your data store? (I hope you&#8217;ve developed data models to support this effort.)</p>
<p style="margin-bottom: 0in;">A final question concerns manpower and experience. Do you have staff with the experience to support the endeavor? Saying you know SQL because you read a text defining SQL isn&#8217;t going to cut it.</p>
<p style="margin-bottom: 0in;">I can&#8217;t stress enough how important the proper, experienced staff can be. The hardest staff position to fill is that of project manager. A really good project manager should come equipped with a CV that shows steadily increasing project management experience. You will probably have to pay through the nose for a good one, but it will be worth it in the end.</p>
<p style="margin-bottom: 0in;">If I had to choose between a person with a biology background and little to no programming versus one with a background in computer science, mathematics, or engineering, I&#8217;d choose the latter. They can pick up the biology. Of course, this depends on the person under consideration.</p>
<p style="margin-bottom: 0in;" align="LEFT">The first question I ask myself is: could this person help me get a plane off the ground? Can they handle stress? Do they think on their feet? How organized are they? How do they do in ill-defined environments? Do they fit in? Will their personality get in the way?</p>
<p style="margin-bottom: 0in;" align="LEFT">In any case, look beyond that paper resume and the list of provided references.  You don&#8217;t want someone whose only experience consists of “Perl Scripts Done in a Panic”.</p>
<p style="margin-bottom: 0in;">There is a lot to consider in the development of a system that turns on a piece of data.  Ask questions.  No matter how naive they may sound, I guarantee you will save time, and time means money.</p>
<p style="margin-bottom: 0in;">For a little humor regarding software development check out – <a title="Software Development Humor" href="http://davidlongstreet.wordpress.com/category/software-development/humor/" target="_blank">http://davidlongstreet.wordpress.com/category/software-development/humor/</a>.</p>
<p style="margin-bottom: 0in;">You may need it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2011/05/data-project-development-pointers/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Cue the Hallelujah Chorus!</title>
		<link>http://blog.lifeformulae.com/2011/02/cue-the-hallelujah-chorus/</link>
		<comments>http://blog.lifeformulae.com/2011/02/cue-the-hallelujah-chorus/#comments</comments>
		<pubDate>Mon, 21 Feb 2011 18:54:17 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[computational biology]]></category>

		<category><![CDATA[computer analysis]]></category>

		<category><![CDATA[computer science]]></category>

		<category><![CDATA[cross functional individuals]]></category>

		<category><![CDATA[engineering]]></category>

		<category><![CDATA[nature]]></category>

		<category><![CDATA[Nature Biotechnology]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=206</guid>
		<description><![CDATA[The Volume 29 Number 1 January 2011 issue of nature biotechnology (www.nature.com/naturebiotechnology) finally puts in print what I&#8217;ve been recommending all along. The Feature article on computational BIOLOGY, “Trends in computational biology – 2010” on page 45 states, “Interviews with leading scientists highlight several notable breakthroughs in computational biology from the past year [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The Volume 29 Number 1 January 2011 issue of <strong>nature biotechnology</strong> (<a title="nature biotechnology" href="http://www.nature.com/naturebiotechnology" target="_blank">www.nature.com/naturebiotechnology</a>) finally puts in print what I&#8217;ve been recommending all along. The Feature article on computational BIOLOGY, “Trends in computational biology – 2010” on page 45 states, “Interviews with leading scientists highlight several notable breakthroughs in computational biology from the past year and suggest areas where computation may drive biological discovery.”</p>
<p style="margin-bottom: 0in;">The researchers were asked to nominate papers of particular interest published in the previous year that have influenced the direction of their research.</p>
<p style="margin-bottom: 0in;">The article is good, but what was really interesting was <strong>Box 2 – Cross-functional individuals</strong> on page 49.  To quote, “Our analysis&#8230;suggests that researchers of a particular type are driving much of cutting-edge computational biology.  Read on to find out what characterizes them.”</p>
<p style="margin-bottom: 0in;">I&#8217;m going to re-print <strong>Box 2 Cross-functional individuals</strong> in it&#8217;s entirety since it&#8217;s short and the message is so very important.</p>
<p style="margin-bottom: 0in;"><strong>Box 2  Cross-functional individuals</strong></p>
<p style="margin-bottom: 0in;">In the course of compiling this survey, several investigators remarked that it tends to be easier for computer scientists to learn biology than for biologists to learn computer science. Even so, it is hard to believe that learning the central dogma and the Krebs cycle will enable your typical programmer-turned-computational biologist to stumble upon a project that yields important novel biological insights. So what characterizes successful computational biologists?</p>
<p style="margin-bottom: 0in;">George Church, whose laboratory at Harvard Medical School (Cambridge, MA USA) has a history of producing bleeding-edge research in many cross-disciplinary domains, including computational biology, says, “Individuals in my lab tend to be curious and somewhat dissatisfied with the way things are. They are comfortable in two domains simultaneously. This has allowed us to go after problems in the space between traditional research projects.”</p>
<p style="margin-bottom: 0in;">A former Church lab member, Greg Porreca, articulates this idea further, “I&#8217;ve found that many advances in computational biology start with simple solutions written by cross-functional individuals to accomplish simple tasks.  Bigger problems are hard to address with those rudimentary algorithms, so folks with classical training in computer science step in and devise highly optimized solutions that are faster and more flexible.”</p>
<p style="margin-bottom: 0in;">An overarching theme that also emerges from this survey suggests that tools for computational analysis permeate biological research according to three stages: first, a cross-functional individual sees a problem and devises a solution good enough to demonstrate the feasibility of a type of analysis; second, robust tools are created, often utilizing the specialized knowledge of formally trained computer scientists; and third, the tools reach biologists focused on understanding specific phenomena, who incorporate the tools into everyday use. These stages echo existing broader literature on disruptive innovations<sup>1</sup> and technology-adoption life cycles<sup>2,3</sup>, which may suggest how breakthroughs in computational biology can be nurtured.</p>
<ol>
<li>
<p style="margin-bottom: 0in;">Christensen, C.M. &amp; Bower, J.L. Disruptive technologies: catching the wave. Harvard Business Review (1995).</p>
</li>
<li>
<p style="margin-bottom: 0in;">Moore, G.A. Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers (HarperBusiness, 1999).</p>
</li>
<li>
<p style="margin-bottom: 0in;">Rogers, E.M. Diffusion of 	Innovations (Free Press, 2003).</p>
</li>
</ol>
<p style="margin-bottom: 0in;">Biologists must become aware of what the disciplines of computer science and engineering can offer computational biology.  Until this happens, forward progress in computational biological innovations and discovery will be unnecessarily hampered by a number of superfluous factors not the least of which is complacence.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2011/02/cue-the-hallelujah-chorus/feed/</wfw:commentRss>
		</item>
		<item>
		<title>A Valuable Lesson</title>
		<link>http://blog.lifeformulae.com/2011/01/a-valuable-lesson/</link>
		<comments>http://blog.lifeformulae.com/2011/01/a-valuable-lesson/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 20:47:51 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[genes]]></category>

		<category><![CDATA[genome]]></category>

		<category><![CDATA[Linux]]></category>

		<category><![CDATA[next-gen]]></category>

		<category><![CDATA[polyadenylation]]></category>

		<category><![CDATA[proteins]]></category>

		<category><![CDATA[sequencing]]></category>

		<category><![CDATA[tar]]></category>

		<category><![CDATA[transcription]]></category>

		<category><![CDATA[transcriptome]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=203</guid>
		<description><![CDATA[There once was a web portal called the Search Launcher to which I dedicated 4 years of my life.
It sort of fell into my lap.  The only orientation I got was the basic directory structure.  The supporting csh scripts, programs, and databases (and the programs that created those databases) I had to discover [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">There once was a web portal called the Search Launcher to which I dedicated 4 years of my life.</p>
<p style="margin-bottom: 0in;">It sort of fell into my lap.  The only orientation I got was the basic directory structure.  The supporting csh scripts, programs, and databases (and the programs that created those databases) I had to discover on my own.</p>
<p style="margin-bottom: 0in;">It took me about a year to get it all under control. I think I actually made things better. I was used to dealing with tons of data, being on call at all hours, and minor things like distributed and parallel processing. I had the welcoming server off-loading compute-intensive processes to another server on a faster machine, as NFS was proving too flaky. I had huge databases distributed across many disks according to their size. I had website client programs on the Mac and PC that a researcher could run from a local machine.</p>
<p style="margin-bottom: 0in;">I would work late, and on weekends, during the off-hours when the load was lighter. If I had something really crucial to do, I telecommuted because I didn&#8217;t want any distractions.</p>
<p style="margin-bottom: 0in;">Things ran smoothly, but there was the occasional hiccup.  Mostly I monitored things and planned for improvements.</p>
<p style="margin-bottom: 0in;">People would come by my cubicle and remark on how it looked as if I had nothing to do.  If only you knew, I thought to myself.</p>
<p style="margin-bottom: 0in;">One day, out of the blue, I was asked to interview someone.  I was told he would be joining our group.</p>
<p style="margin-bottom: 0in;">Okay, so I interviewed him. His main exposure to computing was Windows PC based. He had little to no Unix/Linux experience, much less experience with programming, websites, huge amounts of data, or distributed anything. He did, however, have “wet lab” experience.</p>
<p style="margin-bottom: 0in;">After he was hired, he told me that he was going to “rework things from top to bottom and make it easy for me.”</p>
<p style="margin-bottom: 0in;">As he tried to “rework things”, he decided that everything had to be on one machine on one disk. He said this was to make things easier, but I knew he really didn&#8217;t understand the current configuration.</p>
<p style="margin-bottom: 0in;">I decided it was time to bow out. I didn&#8217;t want to be held responsible when his “improvements” came crashing down.</p>
<p style="margin-bottom: 0in;">So I left. Before I did, however, he had me move everything to the one disk and then asked me to create a tarball. Well, tar wouldn&#8217;t work: the directory structure had become so convoluted, and the directory names so long, that tar couldn&#8217;t handle them. Not to mention the size of what had to be tar&#8217;d!</p>
<p style="margin-bottom: 0in;">This was a warning of future activities as the department decided to invest in a Linux farm at his suggestion.  Needless to say, the system was improperly configured and crashed about two to three times a week.</p>
<p style="margin-bottom: 0in;">Remember those long directory names that tar couldn&#8217;t comprehend?  Well, the Linux farm was configured to continue this wonderful tradition with the result that everybody thought that tar was archiving the data when it really wasn&#8217;t.  Nobody was reading the error files generated by the tar process, so nobody paid attention until sometime later.  Then they hired a sysadmin whose sole duty was to watch the tar archive and make sure it took.</p>
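<p style="margin-bottom: 0in;">The lesson is to check an archive rather than assume it took. Here is a minimal sketch of that idea using Python&#8217;s standard <code>tarfile</code> module. The function name, directory names, and file contents are hypothetical, and this is not the tooling that department used.</p>
<pre>
```python
import os
import tarfile
import tempfile


def archive_and_verify(src_dir, tar_path):
    """Archive src_dir, then reopen the archive and report missing files."""
    # Collect every file under src_dir as a forward-slash relative path.
    expected = set()
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), src_dir)
            expected.add(rel.replace(os.sep, "/"))

    # Write the archive.
    with tarfile.open(tar_path, "w") as tar:
        for rel in sorted(expected):
            tar.add(os.path.join(src_dir, rel), arcname=rel)

    # Read it back and compare what is there with what should be there.
    with tarfile.open(tar_path, "r") as tar:
        archived = {m.name for m in tar.getmembers() if m.isfile()}

    return sorted(expected - archived)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data")
    # A deliberately deep directory tree with long names.
    deep = os.path.join(src, *["a_rather_long_directory_name"] * 5)
    os.makedirs(deep)
    with open(os.path.join(deep, "reads.txt"), "w") as fh:
        fh.write("ACGT\n")

    missing = archive_and_verify(src, os.path.join(tmp, "backup.tar"))
    print("missing from archive:", missing)
```
</pre>
<p style="margin-bottom: 0in;">An empty list means every file made it into the archive. The same habit applies to command-line tar: check its exit status and read its error output instead of assuming the job succeeded.</p>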
<p style="margin-bottom: 0in;">I was off to other venues, one of which was Enron. Yes, I was there when it all came down, but the 200 people in my department were not affected. It was a circus getting to work for a while. Reporters were hiding under the stairways in the parking garage hoping for news tidbits.</p>
<p style="margin-bottom: 0in;">I heard that my old bioinformatics department had hired an “efficiency expert” to advise them on the problems they were encountering and on how to fix them.</p>
<p style="margin-bottom: 0in;">They got tired of hearing how everything was broken, so they let the expert go.  Business continued as usual.</p>
<p style="margin-bottom: 0in;">Next they hired a really good sysadmin to take care of things, but he spent most of his time keeping the farm going by scrounging parts from the backup system. He left after a while.</p>
<p style="margin-bottom: 0in;">The moral of the story is this: know what you are hiring. Don&#8217;t give them the power to rework something they don&#8217;t know about or understand. This will upset the people who know what they are doing, and that valuable experience will move on.</p>
<p style="margin-bottom: 0in;">A lot of good people with a much better plan than a misconfigured Linux farm went on to better pastures, putting the department in the position of trying to recover what was lost.</p>
<p style="margin-bottom: 0in;">With all the data now forthcoming and the latest news that most sequences are wrongly annotated, it&#8217;s time for experts.</p>
<p style="margin-bottom: 0in;">As an aside: on Thursday, Jan. 6th, a fascinating talk on deep transcriptome analysis by Chris Mason, Assistant Professor at the Institute for Computational Biomedicine at Cornell University, listed several observations that next-gen sequencing is bringing to light.</p>
<p style="margin-bottom: 0in;">Some of the most interesting points from Mason&#8217;s talk were:</p>
<ul>
<li>A large fraction of the existing 	genome annotation is wrong.</li>
<li>We have far more than 30,000 	genes, perhaps as many as 88,000.</li>
<li>About ten thousand genes use over 	6 different sites for polyadenylation.</li>
<li>98% of all genes are alternatively 	spliced.</li>
<li>Several thousand genes are transcribed from the &#8220;anti-sense&#8221; strand.</li>
<li>Lots of genes don&#8217;t code for proteins.  In fact, most 	genes don&#8217;t code for proteins.</li>
</ul>
<p>Mason also described the discovery of 26,187 new genes that were present in at least two different tissue types.</p>
<p style="margin-bottom: 0in;">For more, see <a title="Next-gen Sequencing" href="http://scienceblogs.com/digitalbio/2011/01/next_gene_sequencing_results_a.php" target="_blank">http://scienceblogs.com/digitalbio/2011/01/next_gene_sequencing_results_a.php</a>.</p>
<p style="margin-bottom: 0in;">Genetics and biology are concrete sciences.  Computer science and engineering entail a lot of abstract thinking which is desperately needed for the underlying structure to support the analysis of the masses of sequence data currently amassing.</p>
<p style="margin-bottom: 0in;">Get the right people for the job and you won&#8217;t find trouble.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2011/01/a-valuable-lesson/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Another Study on What Women Really Want</title>
		<link>http://blog.lifeformulae.com/2010/08/another-study-on-what-women-really-want/</link>
		<comments>http://blog.lifeformulae.com/2010/08/another-study-on-what-women-really-want/#comments</comments>
		<pubDate>Tue, 24 Aug 2010 20:53:59 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[aerospace]]></category>

		<category><![CDATA[biotechnology]]></category>

		<category><![CDATA[engineering]]></category>

		<category><![CDATA[engineers]]></category>

		<category><![CDATA[IT]]></category>

		<category><![CDATA[science]]></category>

		<category><![CDATA[scientists]]></category>

		<category><![CDATA[women]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=200</guid>
		<description><![CDATA[First, we had the guy from Harvard try to explain that women aren&#8217;t interested in science because there is an intrinsic aptitude for things scientific based on gender. Guess which gender is deemed as more scientific?
Now, we have a new observation brought to us by Wray Herbert (http://www.huffingtonpost.com/wray-herbert/women-science_b_652858.html).
According to Miami University psychological scientist Amanda Diekman, [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">First, we had the guy from Harvard try to explain that women aren&#8217;t interested in science because there is an intrinsic aptitude for things scientific based on gender. Guess which gender is deemed as more scientific?</p>
<p style="margin-bottom: 0in;">Now, we have a new observation brought to us by Wray Herbert (<a title="Why Women Are Shunning Science Careers" href="http://www.huffingtonpost.com/wray-herbert/women-science_b_652858.html" target="_blank">http://www.huffingtonpost.com/wray-herbert/women-science_b_652858.html</a>).</p>
<p style="margin-bottom: 0in;">According to Miami University psychological scientist Amanda Diekman, there is a new explanation citing a difference in worthiness or values rather than ability.   It seems, according to the new theory, that women reject science, engineering, and math because they view these fields as too ego and power driven for their tastes.</p>
<p style="margin-bottom: 0in;">The study&#8217;s unambiguous results showed that young women did see science and engineering careers as isolated and individualistic&#8211;and what&#8217;s more, as obstacles to finding meaning in their lives.</p>
<p style="margin-bottom: 0in;">The article goes on to state that it seems to be a perception thing.  I would agree that it could very well be a perception thing, but I think there is a little more to it than that.</p>
<p style="margin-bottom: 0in;"><strong>A Little Background</strong></p>
<p style="margin-bottom: 0in;">My higher education endeavors began with a trip down the road that would merit approval from the study group quoted above. I got an undergraduate degree in Social and Behavioral Sciences and was just a few hours away from a graduate degree when I discovered I was bored to death.   Something was missing.  There was no challenge.</p>
<p style="margin-bottom: 0in;">I tried the MBA path.  Nothing doing.</p>
<p style="margin-bottom: 0in;">I had taken an intro to computers course as part of my undergraduate course work and a ton of statistics courses, but neither appealed. It wasn&#8217;t until I ran into my first “micro-computer” (as they were then known) that I realized this little machine was really going to change things. I even got a Heathkit catalog, ordered the H-89 kit, and put it together.</p>
<p style="margin-bottom: 0in;">The closest thing to a computer science degree my university offered was a degree in mathematical sciences.  I signed up for that.</p>
<p style="margin-bottom: 0in;">Believe me, it wasn&#8217;t easy.  I had already gotten the required courses out of the way, so for three semesters every class I had was either math or computer science.  But it was interesting and definitely challenging.</p>
<p style="margin-bottom: 0in;"><strong>Group Members</strong></p>
<p style="margin-bottom: 0in;">The isolated and individualistic scientist, engineer, or computer scientist cited by the study does not exist in the real world.</p>
<p style="margin-bottom: 0in;">My first post-graduation gig was at the Health Services Division of a major aerospace company as a compiler developer.  I was part of the Systems Enhancements and Extensions <strong>Group</strong>.  From there, I transferred to the aircraft company in that same corporation.  I was part of the Flight Test Research and Development <strong>Group</strong>.  I went to another aircraft company and the Instrumentation <strong>Group</strong>.  And so on.  You were always a member of a group. A group that together designed, developed, and produced things – computer software, digital data acquisition systems, aircraft manufacturing scheduling systems, etc.</p>
<p style="margin-bottom: 0in;">When I moved over to biotechnology, it was the same – you were a member of a <strong>group</strong>.  A lab group, a bioinformatics group developing LIMS, sequence analysis and image recognition software, and so on.</p>
<p style="margin-bottom: 0in;">However, I did find that scientists, more than engineers, were power/ego driven. I think this is because of funding issues.  Although both areas receive the majority of their funds from the government, the basis of the awards is different.</p>
<p style="margin-bottom: 0in;">The individual scientist, as P.I., applies for the grant, writes the proposal and receives the funding – almost a personal assessment of that scientist&#8217;s capabilities.  Furthermore, I feel that the letters “PhD” carry a lot of baggage.</p>
<p style="margin-bottom: 0in;">For most engineers, the company applies for the grant, writes the proposal (after the engineers have okayed the design), and receives the funding. The engineer is associated with the program for which that proposal was submitted.   The engineer isn&#8217;t as personally involved.</p>
<p style="margin-bottom: 0in;"><strong>What I&#8217;ve Encountered</strong></p>
<p style="margin-bottom: 0in;">In the military industrial complex I encountered bored ex-military who used weekly status reports to declare war on some other part of the division.  These attacks were mostly diversions and never amounted to much.  These could be construed as power plays, but I list them as “play” period.</p>
<p style="margin-bottom: 0in;">Believe me, there were some good ones – stopping just short of an exchange of blows.  It&#8217;s also amazing how far echoes carry in an aircraft hangar.</p>
<p style="margin-bottom: 0in;">The following examples are situations I encountered along the way.  They are mostly examples of misdirected intentions, but a few border on outright criminality.</p>
<p style="margin-bottom: 0in;">There were approximately 8 databases that all held the same information but for 8 different divisions.  The electronics parts (transducers, potentiometers, strain gauges, resistors, etc.) in each of the databases were exactly the same.  However, the nomenclature varied by division.  We tried to standardize on one database system with one naming standard, but ran straight into a brick wall.  Not one division was willing to cede to another.  It was only after word came down from on high that additional funding would not be forthcoming that everybody finally sat down to talk.</p>
<p style="margin-bottom: 0in;"><strong>Insane Budgeting Exercises </strong></p>
<p style="margin-bottom: 0in;">One division needed to get a new system but was offered an old, barely breathing system with exorbitant maintenance costs.  The division was instructed to budget for and use the old system for the current fiscal year.  For the next budget cycle, the department was to state that a new system (the one originally requested) would save X amount of dollars over last year&#8217;s budget.  The new system would then be given the green light.</p>
<p style="margin-bottom: 0in;">A director was undercutting his yearly budget to emphasize cost savings.  Consequently, his budget was always cut to that amount for the next year.  It was pointed out that he should overrun this year&#8217;s budget by the amount he wanted for next year.  Then he would (and did) get the additional funding.</p>
<p style="margin-bottom: 0in;"><strong>A Simple Name Change can Work Wonders</strong></p>
<p style="margin-bottom: 0in;">It was ascertained that for less than the amount the department was paying IT for storage of design data, a new system, software, and personnel could be purchased and hired.  Department was notified that requesting a “computer system” would not meet with budgeting approval.  Only after the system was termed a “data multiplexer” to be administered by “data design personnel” was department able to proceed with system purchase.</p>
<p style="margin-bottom: 0in;"><strong>One Size Does Not Fit All</strong></p>
<p style="margin-bottom: 0in;">IT sends down list of “acceptable” software.  So-called software was specifically IT oriented and would not work in an engineering environment.  Division engineers take up collection and purchase needed software themselves.</p>
<p style="margin-bottom: 0in;"><strong>Almost Criminal</strong></p>
<p style="margin-bottom: 0in;">Vast amounts of money, time, and manpower were spent developing a manufacturing scheduling system for aircraft manufacture.  System rated manufacturing personnel in terms of ability.  System was deemed a major success – avoiding bottlenecks, shortening completion times, etc.  System was never deployed due to union demands that manufacturing personnel could not be rated in terms of ability.</p>
<p style="margin-bottom: 0in;">Decode system for data acquisition decode and analysis ($150K) was purchased without installed hard drive for data storage ($15K).  It was determined system could use in-house data farm to store data.  Decode system required confirmation that contiguous data storage space was available before it would go ahead and store data.</p>
<p style="margin-bottom: 0in;">Transfer mechanism did not provide this info, so decode system would not store data on data farm.    Contractor told department officials that the system software on the decode system and in-house data farm were incompatible.  Contractor sold department customized software for $750K to replace decode system.</p>
<p style="margin-bottom: 0in;"><strong>A Meaningful Life</strong></p>
<p style="margin-bottom: 0in;">I&#8217;ve never considered my career in engineering and biotechnology as isolated and individualistic.  Sure, you have individual work, but it is as part of a team.</p>
<p style="margin-bottom: 0in;">As far as letting the ego and power driven become obstacles, I have to admit that my behavioral sciences background provided one of the most important career tools I have yet encountered.  My “Advanced Abnormal Psychology” course taught me how to observe and analyze people.</p>
<p style="margin-bottom: 0in;">To find meaning in one&#8217;s life entails one heck of a lot more than a career.  Perhaps observing and analyzing our misconceptions about one area will enhance our conceptions of life in general.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/08/another-study-on-what-women-really-want/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What is Old Is New Again - The Cloud</title>
		<link>http://blog.lifeformulae.com/2010/05/what-is-old-is-new-again-the-cloud/</link>
		<comments>http://blog.lifeformulae.com/2010/05/what-is-old-is-new-again-the-cloud/#comments</comments>
		<pubDate>Fri, 21 May 2010 20:44:05 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[Amazon Elastic Cloud]]></category>

		<category><![CDATA[Apache]]></category>

		<category><![CDATA[BCM Bioinformatics]]></category>

		<category><![CDATA[Bill Ballmer]]></category>

		<category><![CDATA[Bloom]]></category>

		<category><![CDATA[Cloud Computing]]></category>

		<category><![CDATA[comparative genomics]]></category>

		<category><![CDATA[data interoperability]]></category>

		<category><![CDATA[data ownership]]></category>

		<category><![CDATA[data separation]]></category>

		<category><![CDATA[DMTF]]></category>

		<category><![CDATA[elasticity]]></category>

		<category><![CDATA[Google App Engine]]></category>

		<category><![CDATA[Hadoop]]></category>

		<category><![CDATA[HPC]]></category>

		<category><![CDATA[hybervisor]]></category>

		<category><![CDATA[Internet]]></category>

		<category><![CDATA[IT]]></category>

		<category><![CDATA[libcloud]]></category>

		<category><![CDATA[Micosoft Azure]]></category>

		<category><![CDATA[MIT]]></category>

		<category><![CDATA[Open Cloud Standards Incubator]]></category>

		<category><![CDATA[Richard Stallman]]></category>

		<category><![CDATA[security]]></category>

		<category><![CDATA[StarCluster]]></category>

		<category><![CDATA[Use Cases]]></category>

		<category><![CDATA[W3c]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=195</guid>
		<description><![CDATA[ Cloud computing is the current IT rage, said to cure all information management ills.
Cloud computing is just a new name for timeshare, a system in which various entities shared a centralized computing facility.  A giant piece or two of big iron and floors of tape decks provided information processing and storage capabilities for [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;" align="LEFT">Cloud computing is the current IT rage, said to cure all information management ills.</p>
<p style="margin-bottom: 0in;" align="LEFT">Cloud computing is just a new name for timeshare, a system in which various entities shared a centralized computing facility.  A giant piece or two of big iron and floors of tape decks provided information processing and storage capabilities for a price.</p>
<p style="margin-bottom: 0in;" align="LEFT">The user was connected to the mainframe by a dumb terminal and later on by PCs.  The advantage (said the sales jargon) was that the user didn&#8217;t need to buy any additional hardware or worry about software upgrades or data backup and recovery.  They would only pay for the time and space their processes required.  Resources would be pooled and connected by a high speed network and could be accessed as demanded.  The user wouldn&#8217;t really know what computing resources were in use; they just got results. Everything depended on the network communications between the user and the centralized computing source.</p>
<p style="margin-bottom: 0in;" align="LEFT"><strong>What is New</strong></p>
<p style="margin-bottom: 0in;" align="LEFT">Cloud computing is more powerful today because the communications network is the Internet.  Some Cloud platforms also offer Web access to the tools (programming language, database, web utilities) needed to create the cloud application.</p>
<p style="margin-bottom: 0in;" align="LEFT">The most important aspect I believe the Cloud offers is instant elasticity.  A process can be upgraded almost instantaneously to use more nodes and obtain more computing power.</p>
<p style="margin-bottom: 0in;" align="LEFT">There are quite a few blog entries out there concerning the “elastic” cloud.  For thoughts on “spin up” and “spin down” elasticity see <a title="Elasticity in the Cloud" href="http://timothyfitz.wordpress.com/2009/02/14/cloud-elasticity/" target="_blank">http://timothyfitz.wordpress.com/2009/02/14/cloud-elasticity/</a>.  For thoughts on “how elasticity could make you go broke, or On-demand IT overspending” see <a title="Elasticity Can Make You Go Broke" href="http://blogs.gartner.com/daryl_plummer/2009/03/11/cloud-elasticity-could-make-you-go-broke/" target="_blank">http://blogs.gartner.com/daryl_plummer/2009/03/11/cloud-elasticity-could-make-you-go-broke/</a>.</p>
<p style="margin-bottom: 0in;" align="LEFT">And finally, for the article that spawned the “elasticity is a myth” discussion, or “over-subscription and over-capacity are two different things,” see <a title="Elasticity Is A Myth" href="http://www.rationalsurvivability.com/blog/?p=1672&amp;cpage=1#comment-35881" target="_blank">http://www.rationalsurvivability.com/blog/?p=1672&amp;cpage=1#comment-35881</a>.</p>
<p style="margin-bottom: 0in;" align="LEFT">A good article that covers elasticity, hypervisors, and cloud security in general is located at <a title="Cloud Security" href="http://queue.acm.org/detail.cfm?id=1794516" target="_blank">http://queue.acm.org/detail.cfm?id=1794516</a>.  The queue.acm.org site is maintained by the Association for Computing Machinery. There are lots of articles on all sorts of computing topics including, “Why Cloud Computing Will Never Be Free” (<a title="Cloud Computing Will Never Be Free" href="http://queue.acm.org/detail.cfm?id=1772130" target="_blank">http://queue.acm.org/detail.cfm?id=1772130</a>).</p>
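<p style="margin-bottom: 0in;" align="LEFT">To make the elasticity trade-off concrete, here is a minimal Python sketch of a “spin up / spin down” rule with a spending cap.  It is not any provider&#8217;s API: the function names, the jobs-per-node figure, the node limits, and the price are all made-up assumptions for illustration.</p>

```python
import math


def nodes_needed(pending_jobs, jobs_per_node, min_nodes=1, max_nodes=20):
    """Return how many nodes to run for the current backlog.

    Every name and default here is hypothetical. The max_nodes cap is
    what keeps "instant elasticity" from becoming instant overspending.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    wanted = math.ceil(pending_jobs / jobs_per_node)
    # Clamp the request between the floor and the spending cap.
    return max(min_nodes, min(max_nodes, wanted))


def hourly_cost(nodes, price_per_node_hour):
    """Pay only for what is running: nodes times an assumed hourly price."""
    return nodes * price_per_node_hour


if __name__ == "__main__":
    for backlog in (0, 35, 400):
        count = nodes_needed(backlog, jobs_per_node=10)
        print(backlog, "jobs:", count, "nodes, cost per hour", hourly_cost(count, 0.5))
```

<p style="margin-bottom: 0in;" align="LEFT">With these assumed numbers an empty queue drops back to one node, 35 jobs spin up four, and 400 jobs stop at the cap of 20 rather than the 40 the backlog alone would ask for.</p>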
<p style="margin-bottom: 0in;" align="LEFT"><strong>The Clouds</strong></p>
<p style="margin-bottom: 0in;" align="LEFT">The most notable Clouds are Amazon&#8217;s Elastic Compute Cloud (EC2), Google&#8217;s App Engine, and Microsoft&#8217;s Azure.</p>
<p style="margin-bottom: 0in;" align="LEFT">The three Cloud delivery models include:</p>
<ul>
<li>
<ul>
<li>
<p style="margin-bottom: 0in;" align="LEFT">Software as a service (SaaS): applications running on a cloud are accessed via a web browser</p>
</li>
<li>
<p style="margin-bottom: 0in;" align="LEFT">Platform as a service (PaaS): cloud-developed user applications such as databases</p>
</li>
<li>
<p style="margin-bottom: 0in;" align="LEFT">Infrastructure as a service (IaaS): computing resources are provided to users on an as-needed basis</p>
</li>
</ul>
</li>
</ul>
<p style="margin-bottom: 0in;" align="LEFT"><strong>Pros and Cons </strong></p>
<p style="margin-bottom: 0in;" align="LEFT">There are pros and cons for Cloud Computing.   <span style="font-style: normal;">Microsoft&#8217;s Steve Ballmer is a proponent of Cloud computing.</span></p>
<p style="margin-bottom: 0in; font-style: normal;" align="LEFT">In a recent email (<a title="Ballmer Email" href="http://blog.seattlepi.com/microsoft/archives/196793.asp" target="_blank">http://blog.seattlepi.com/microsoft/archives/196793.asp</a>) to Microsoft&#8217;s employees, Ballmer makes his case for Cloud Computing.  He advises his employees to watch a video (<a title="Ballmer Cloud Video" href="http://www.microsoft.com/presspass/presskits/cloud/videogallery.aspx" target="_blank">http://www.microsoft.com/presspass/presskits/cloud/videogallery.aspx</a>) and lists the following points.</p>
<p style="font-style: normal;" align="LEFT">In my speech, I outlined the five dimensions that define the way people use and realize value in the cloud:</p>
<ul>
<li>
<p style="margin-bottom: 0in;">The cloud creates opportunities 	and responsibilities</p>
</li>
<li>
<p style="margin-bottom: 0in;">The cloud learns and helps you 	learn, decide and take action</p>
</li>
<li>
<p style="margin-bottom: 0in;">The cloud enhances your social and 	professional interactions</p>
</li>
<li>
<p style="margin-bottom: 0in;">The cloud wants smarter devices</p>
</li>
<li>The cloud drives server advances that drive the cloud</li>
</ul>
<p style="margin-bottom: 0in;" align="LEFT">Some very notable people are anti-cloud.</p>
<p style="margin-bottom: 0in;" align="LEFT">Richard Stallman, founder of the GNU Project, said in a recent interview with The Guardian (<a title="Richard Stallman Interview" href="http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman" target="_blank">http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman</a>) that Cloud computing is a trap.</p>
<p style="margin-bottom: 0in;" align="LEFT">Web-based programs like Google&#8217;s Gmail will force people to buy into locked, proprietary systems that will cost more and more over time, according to the free software campaigner.</p>
<p style="margin-bottom: 0in;" align="LEFT"><em>&#8216;It&#8217;s stupidity. It&#8217;s worse than stupidity: it&#8217;s a marketing hype campaign,&#8217; he told The Guardian. &#8216;Somebody is saying this is inevitable — and whenever you hear somebody saying that, it&#8217;s very likely to be a set of businesses campaigning to make it true.&#8217;</em></p>
<p style="margin-bottom: 0in; font-style: normal;" align="LEFT">Aside from all that, what should a potential user be wary of in the Cloud?   I&#8217;ll try to answer that below.</p>
<p align="LEFT"><strong>Security in the Cloud</strong></p>
<p style="margin-bottom: 0in;" align="LEFT">Security in the cloud is a major concern.  Hackers are salivating because everything – applications and data – is in the same place.</p>
<p style="margin-bottom: 0in;" align="LEFT">How do you know whether the node your process is accessing is real or virtual?  The Hypervisor (in Linux, a special version of the kernel) owns the hardware and spawns virtual nodes.  If the Hypervisor is hacked, the hacker owns all the nodes created by it.  http://www.linux-kvm.org has further explanations and discussions of virtual node creators/creations.</p>
<p style="margin-bottom: 0in;" align="LEFT">Data separation is a big concern.  Could your data become contaminated by data in other environments in the cloud?  What access restrictions are in place to protect sensitive data?</p>
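<p style="margin-bottom: 0in;" align="LEFT">On the “real or virtual” question, one hint available from inside a Linux node is the CPU flag list.  The sketch below assumes an x86 Linux guest, where /proc/cpuinfo normally includes a “hypervisor” flag when the kernel is running under one.  The function names are mine, and the check is only a hint: a hypervisor can hide the flag, so a missing flag proves nothing.</p>

```python
def has_hypervisor_flag(cpuinfo_text):
    """Return True if any 'flags' line in cpuinfo text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def running_under_hypervisor(path="/proc/cpuinfo"):
    """Check the local machine; return None when the file cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return has_hypervisor_flag(handle.read())
    except OSError:
        # Not Linux, or /proc is not mounted: no answer either way.
        return None


if __name__ == "__main__":
    print("hypervisor flag present:", running_under_hypervisor())
```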
<p style="margin-bottom: 0in;" align="LEFT">Can a user in another cloud environment inadvertently or intentionally get access to your data?</p>
<p style="margin-bottom: 0in;" align="LEFT">Data interoperability is another question mark.  A company cannot easily transfer data from a public cloud provider, such as Amazon, Microsoft, or Google, put it in a private IaaS platform that a private cloud provider develops for the company, and then copy that data from its private cloud to another cloud provider, public or private.  This is difficult because there are no standards for operating in this hybrid environment.</p>
<p style="margin-bottom: 0in;" align="LEFT"><strong>Data Ownership</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;" align="LEFT">Who is the custodian and who controls data if your company uses cloud providers, public and private?</p>
<p style="margin-bottom: 0in; font-weight: normal;" align="LEFT">Ownership concerns have not been resolved by the cloud computing industry.  At the same time, the industry has no idea when a standard will emerge to handle information exchanges.</p>
<p style="margin-bottom: 0in;" align="LEFT">W3C – <a title="W3c.org" href="http://www.w3.org/" target="_blank">http://www.w3.org/</a>, is sponsoring workshops and publishing proposals concerning standards for the Cloud.  You can subscribe to their weekly newsletter  and stay up on all sorts of web-based technologies.</p>
<p style="margin-bottom: 0in;" align="LEFT">Also, the Distributed Management Task Force, Inc. (<a title="Distributed Management Task Force" href="http://www.dmtf.org/home" target="_blank">http://www.dmtf.org/home</a>) is a consortium of IT companies focusing on “Developing management standards &amp; promoting interoperability for enterprise &amp; Internet environments”.</p>
<p style="margin-bottom: 0in;" align="LEFT">The DMTF Open Cloud Standards Incubator was launched to address management interoperability for Cloud Systems (<a title="DMTF Cloud Incubator" href="http://www.dmtf.org/about/cloud-incubator" target="_blank">http://www.dmtf.org/about/cloud-incubator</a>).  The DMTF leadership board currently includes AMD, CA Technologies, Cisco, Citrix Systems, EMC, Fujitsu, HP, Hitachi, IBM, Intel, Microsoft, Novell, Rackspace, Red Hat, Savvis, SunGard, Sun Microsystems, and VMware.</p>
<p style="margin-bottom: 0in;" align="LEFT"><strong>Working with the Cloud</strong></p>
<p style="margin-bottom: 0in;" align="LEFT">Working with the Cloud can be intimidating.  One suggestion is to build a private cloud in-house before moving on to the public cloud.</p>
<p style="margin-bottom: 0in;" align="LEFT">However, even that has its difficulties. Not to worry, there are several tools available to ease the transition.</p>
<p style="margin-bottom: 0in;" align="LEFT">There is a Cloud programming language – Bloom, developed at UC Berkeley by Dr. Joseph Hellerstein.  HPC In The Cloud has published an interview with Dr. Hellerstein at <a title="Bloom Programming Language" href="http://www.hpcinthecloud.com/features/Clouds-New-Language-Set-to-Bloom-92130384.html?viewAll=y" target="_blank">http://www.hpcinthecloud.com/features/Clouds-New-Language-Set-to-Bloom-92130384.html?viewAll=y</a>.</p>
<p style="margin-bottom: 0in;" align="LEFT">Bloom is based on Hadoop (<a title="Hadoop" href="http://hadoop.apache.org/" target="_blank">http://hadoop.apache.org</a>), which is open source software for High Performance Computing (HPC) from Apache.</p>
<p style="margin-bottom: 0in;" align="LEFT">For ease of interconnectivity, Apache has released Apache libcloud, a standard client library written in Python for many popular cloud providers – <a title="Apache libcloud" href="http://incubator.apache.org/libcloud/index.html" target="_blank">http://incubator.apache.org/libcloud/index.html</a>. But libcloud doesn&#8217;t cover data standards, just connectivity.</p>
<p style="margin-bottom: 0in;" align="LEFT">MIT StarCluster – <span style="font-weight: normal;"><a title="MIT Star Cluster" href="http://web.mit.edu/stardev/cluster" target="_blank">http://web.mit.edu/stardev/cluster</a>, </span><span style="color: #333333;"><span style="font-weight: normal;">is an open source utility for creating and managing general purpose computing clusters hosted on Amazon&#8217;s Elastic Compute Cloud (EC2). StarCluster minimizes the administrative overhead associated with obtaining, configuring, and managing a traditional computing cluster used in research labs or for general distributed computing applications.</span></span></p>
<p>All that&#8217;s needed to get started with your own personal computing cluster on EC2 is an Amazon AWS account and StarCluster.</p>
<p style="margin-bottom: 0in;" align="LEFT"><span style="font-weight: normal;">HPC In The Cloud presents use cases as a means of understanding cloud computing: <a title="HPC Use Cases" href="http://www.hpcinthecloud.com/features/25-Sources-for-In-Depth-HPC-Cloud-Use-Cases-93886489.html" target="_blank">http://www.hpcinthecloud.com/features/25-Sources-for-In-Depth-HPC-Cloud-Use-Cases-93886489.html</a>.</span></p>
<p style="margin-bottom: 0in;" align="LEFT"><span style="font-weight: normal;">BMC Bioinformatics has a new methodology article, “Cloud Computing for Comparative Genomics”, that includes a cost analysis of using the cloud.  Download the .pdf at <a title="BMC Bioinformatics" href="http://www.biomedcentral.com/1471-2105/11/259/abstract" target="_blank">http://www.biomedcentral.com/1471-2105/11/259/abstract</a>.</span></p>
<p style="margin-bottom: 0in;" align="LEFT">I hope this will get you started.  Once again, a big thanks to Bill for his assistance.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/05/what-is-old-is-new-again-the-cloud/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Effective Bioinformatics Programming - Part 5</title>
		<link>http://blog.lifeformulae.com/2010/04/effective-bioinformatics-programming-part-5/</link>
		<comments>http://blog.lifeformulae.com/2010/04/effective-bioinformatics-programming-part-5/#comments</comments>
		<pubDate>Tue, 06 Apr 2010 17:55:37 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[BMC]]></category>

		<category><![CDATA[EBI]]></category>

		<category><![CDATA[eWEEk.com]]></category>

		<category><![CDATA[GenBank]]></category>

		<category><![CDATA[Gene Ontology]]></category>

		<category><![CDATA[MIAME]]></category>

		<category><![CDATA[MIGS]]></category>

		<category><![CDATA[multi-processor programming]]></category>

		<category><![CDATA[scientific programming]]></category>

		<category><![CDATA[Sequence Ontology]]></category>

		<category><![CDATA[supercomputing]]></category>

		<category><![CDATA[WSDL]]></category>

		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=187</guid>
		<description><![CDATA[First, a little irony.  In the late &#8217;90&#8217;s I interviewed with BMC software in Houston.  At that time, BMC was a supporter of big iron, providing report facilities, etc.
When asked what software I currently used, I replied with “GNU software”.  The interviewer asked, “What is GNU? I&#8217;ve never heard of it.”
I explained [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">First, a little irony.  In the late &#8217;90&#8217;s I interviewed with BMC software in Houston.  At that time, BMC was a supporter of big iron, providing report facilities, etc.</p>
<p style="margin-bottom: 0in;">When asked what software I currently used, I replied with “GNU software”.  The interviewer asked, “What is GNU? I&#8217;ve never heard of it.”</p>
<p style="margin-bottom: 0in;">I explained that it was free software that you could download from the web, etc. But they weren&#8217;t really interested.</p>
<p style="margin-bottom: 0in;">Anyway, eWEEK.com had a feature this week, &#8216;MindTouch Names 20 Most Powerful Open-Source Voices of 2010&#8217;.  The first name mentioned was William Hurley, the chief architect of Open Source strategy at BMC (<a title="William Hurley" href="http://www.eweek.com/c/a/IT-Management/OSBC-Names-20-Most-Powerful-Open-Source-Voices-of-2010-758420/?kc=EWKNLEDP03232010A" target="_blank">http://www.eweek.com/c/a/IT-Management/OSBC-Names-20-Most-Powerful-Open-Source-Voices-of-2010-758420/?kc=EWKNLEDP03232010A</a>).</p>
<p style="margin-bottom: 0in;">I guess they&#8217;re interested now.</p>
<p style="margin-bottom: 0in;"><strong>Data Standards</strong></p>
<p style="margin-bottom: 0in;">There are any number of sequence data formats. This link at EBI – <a title="EBI Tutorials" href="http://www.ebi.ac.uk/2can/tutorials/formats.html" target="_blank">http://www.ebi.ac.uk/2can/tutorials/formats.html</a> describes several.</p>
<p style="margin-bottom: 0in;">What is really astounding is that most of these formats have remained the same over the years. The tab-delimited and CSV (comma separated values) formats are as prevalent as ever, as is the GenBank report.</p>
<p style="margin-bottom: 0in;">And equally astonishing is the fact that manipulating the data (e.g. parsing GenBank reports) is still the same.</p>
<p style="margin-bottom: 0in;">True, the Bio libraries such as BioPerl, BioJava, and BioRuby now provide modules that make this easier (if you can install them), but it is still the same old download and parse.</p>
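<p style="margin-bottom: 0in;">For anyone who has not done the “download and parse” routine by hand, here is a rough Python sketch of what it looks like for a GenBank flat-file report.  It uses only the standard library and pulls out just the locus name, accession, definition, and sequence.  The record in SAMPLE is made up purely to exercise the parser; it is not real sequence data, and a Bio library would handle far more of the format than this does.</p>

```python
SAMPLE = """\
LOCUS       TOY0001                 24 bp    DNA     linear   SYN 01-JAN-2010
DEFINITION  Made-up record used only to exercise the parser,
            not real sequence data.
ACCESSION   TOY0001
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""


def parse_genbank_record(text):
    """Pull a few fields and the sequence out of one flat-file record."""
    record = {"locus": None, "accession": None, "definition": "", "sequence": ""}
    section = None
    bases = []
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if not line.strip():
            continue
        if not line[0].isspace():
            # A top-level keyword starts in column 1.
            section, _, value = line.partition(" ")
            value = value.strip()
            if section == "LOCUS" and value:
                record["locus"] = value.split()[0]
            elif section == "ACCESSION" and value:
                record["accession"] = value.split()[0]
            elif section == "DEFINITION":
                record["definition"] = value
        elif section == "DEFINITION":
            # Continuation lines are indented under the keyword.
            record["definition"] += " " + line.strip()
        elif section == "ORIGIN":
            # Sequence lines: a position number, then blocks of bases.
            bases.extend(line.split()[1:])
    record["sequence"] = "".join(bases).upper()
    return record


if __name__ == "__main__":
    parsed = parse_genbank_record(SAMPLE)
    print(parsed["locus"], parsed["accession"], len(parsed["sequence"]))
```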
<p style="margin-bottom: 0in;">There are also several groups trying to standardize sequence data.   The SO (Sequence Ontology) group (<a title="Sequence Ontology" href="http://www.sequenceontology.org" target="_blank">http://www.sequenceontology.org</a>) is trying to do for sequence annotations what GO (Gene Ontology - <a title="Gene Ontology" href="http://www.geneontology.org" target="_blank">http://www.geneontology.org</a>) did for genes and gene product attributes.</p>
<p style="margin-bottom: 0in;">MIGS (Minimum Information About A Genome Sequence spec at <a title="MIGS" href="http://nora.nerc.ac.uk/5548/" target="_blank">http://nora.nerc.ac.uk/5548/</a>) is following the course of the MAGE MIAME Standard  (Minimum Information About a Microarray Experiment at <a title="MIAME" href="http://www.mged.org/Workgroups/MIAME/miame.html" target="_blank">http://www.mged.org/Workgroups/MIAME/miame.html</a>).  Good luck with that, as many scientists have openly voiced objections to that standard.</p>
<p style="margin-bottom: 0in;"><strong>XML and the Web </strong></p>
<p style="margin-bottom: 0in;">XML (eXtensible Markup Language) and WSDL (Web Services Description Language) are one method of easing the interchange of data.  Links at – <a title="XML" href="http://en.wikipedia.org/wiki/XML" target="_blank">http://en.wikipedia.org/wiki/XML</a> and <a title="WSDL" href="http://en.wikipedia.org/wiki/Web_Services_Description_Language" target="_blank">http://en.wikipedia.org/wiki/Web_Services_Description_Language</a>.</p>
<p style="margin-bottom: 0in;">There are a number of drawbacks to this setup.</p>
<p style="margin-bottom: 0in;">Not all of the sequence data is available in XML or well-formed XML.</p>
<p style="margin-bottom: 0in;">Some XML, such as NCBI XML, needs further interpretation.  For example, the sequence feature (annotation) locations must be “translated” for further use.</p>
<p style="margin-bottom: 0in;">XSLT has performance issues and is size-limited. We tried processing LARTS-converted NCBI ASN.1 GenBank XML data with XSLT and found there were definite size limitations.</p>
<p style="margin-bottom: 0in;">Using WSDL means exposing yourself to the world via the web.</p>
<p style="margin-bottom: 0in;">JavaScript raises too many security questions to be considered seriously.</p>
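<p style="margin-bottom: 0in;">To give an idea of what “translating” feature locations involves, here is a minimal Python sketch. It assumes the location arrives as a GenBank-style string such as complement(join(12..78,134..202)); the function name parse_location and the zero-based, end-exclusive output convention are my own choices for illustration, not anything NCBI defines. It ignores remote references (locations that name another accession) and treats a leading complement as applying to every span.</p>

```python
import re

# Matches "12..78" or a single position such as "467". The optional \D
# after ".." skips a partial-end marker in front of the second number.
_SPAN = re.compile(r"(\d+)(?:\.\.\D?(\d+))?")


def parse_location(location):
    """Turn a GenBank-style location string into (start, end, strand) tuples.

    start is zero-based and end is exclusive, so a span can be used
    directly as a Python slice. strand is +1 or -1.
    Simplification (an assumption of this sketch): a leading
    "complement(" is applied to every span in the string.
    """
    strand = -1 if location.startswith("complement(") else 1
    spans = []
    for match in _SPAN.finditer(location):
        first = int(match.group(1))
        # A single position has no second number; reuse the first.
        last = int(match.group(2) or match.group(1))
        spans.append((first - 1, last, strand))
    return spans
```

<p style="margin-bottom: 0in;">Calling parse_location on the string above yields two spans on the reverse strand, each ready to be used as sequence[start:end].</p>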
<p style="margin-bottom: 0in;"><strong>Software Development</strong></p>
<p style="margin-bottom: 0in;">Software development takes time and the right people.  True, there is a lot of open source software out there, but I&#8217;ve mentioned the perils of that method in a previous blog.</p>
<p style="margin-bottom: 0in;">A scientist with a grant to produce results that depend on computer analysis is only going to write code (or find someone, read post-doc, who can write that code very cheaply) that is just good enough to back up those findings.</p>
<p style="margin-bottom: 0in;">Has the code been extensively tested? Are the results produced by the code valid?  Can the code be used by future projects?  Is the software portable?  Is it robust?  Can it be ported to different hardware environments?</p>
<p style="margin-bottom: 0in;">There is a great article – “Are we taking supercomputing code seriously?” at (<a title="Supercomputing Code" href="http://www.zdnet.co.uk/news/it-strategy/2010/01/28/are-we-taking-supercomputing-code-seriously-40004192/" target="_blank">http://www.zdnet.co.uk/news/it-strategy/2010/01/28/are-we-taking-supercomputing-code-seriously-40004192/</a>).  This article, in turn, has links to other articles on methods and algorithms, and error behavior, for example. This one on scientific software considers how multi-processing has influenced algorithm development and the problem of different multi-processors co-existing on the same machine (<a title="Scientific Programming" href="http://www.scientific-computing.com/features/feature.php?feature_id=262" target="_blank">http://www.scientific-computing.com/features/feature.php?feature_id=262</a>).</p>
<p style="margin-bottom: 0in;">The author states that, in the rush to do science, scientists fail to see software for what it is: the analogue of the experimental instrument.  Therefore the software must be treated with the same respect as a physical experiment.</p>
<p style="margin-bottom: 0in;">When I started my career, I worked on a system that was a totally integrated database system for hospitals. It was one of those systems that was so very ahead of its time (mid-80&#8217;s), that a corporation bought the product and squashed it.</p>
<p style="margin-bottom: 0in;">Anyway, our Systems and Extensions group supported the 6 compilers that comprised the system software that made the system function.  The tailoring group wrote the code that created the screens that drove the system.</p>
<p style="margin-bottom: 0in;">At the inception of the system, a decision had to be made about the makeup of the tailoring group: should they be programmers who would be taught medical jargon, terms, etc., or should they be medical personnel (doctors, nurses, techs) who would be taught programming?</p>
<p style="margin-bottom: 0in;">The decision was to go with medical personnel, as it was surmised they would understand hospitals better.</p>
<p style="margin-bottom: 0in;">At the same time, a decision to limit the number of screens a hospital could request (called tailoring) to 500 was discussed.  The decision was to let the hospital have however many screens it wanted.</p>
<p style="margin-bottom: 0in;">The tailoring group got their training and set in to programming.  After a period of time, it was realized that the group had, in essence, created one bad program and copied it thousands of times.</p>
<p style="margin-bottom: 0in;">It was so bad, we did two things.  One, we created a program profiler that produced a performance summary of the programming aspects of a given program. (The tailoring group immediately asked us to remove it, as it was too confusing.)  Two, we created an automated programming module that would generate the code from the display widgets on the screen designed by the tailoring group.</p>
<p style="margin-bottom: 0in;">This approach was helping, but people were abandoning ship as talk of an acquisition was surfacing.  Our junior programmer went from new-hire to senior team member in 30 days.</p>
<p style="margin-bottom: 0in;">I think we would have done a lot better with programmers learning medical terms.</p>
<p style="margin-bottom: 0in;">As for the hospital screen limit, we had hospitals with 10,000 individual screens. We should have stuck with 500.</p>
<p style="margin-bottom: 0in;">One last thing.  When looking at any piece of scientific programming, please realize that the author list usually starts with the PI.  The people who did the actual work are generally listed at the end of the line. The PI may have had the idea, but likely as not could not code it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/04/effective-bioinformatics-programming-part-5/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Effective Bioinformatics Programming Part 4</title>
		<link>http://blog.lifeformulae.com/2010/03/effective-bioinformatics-programming-part-4/</link>
		<comments>http://blog.lifeformulae.com/2010/03/effective-bioinformatics-programming-part-4/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 19:46:20 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[Cilk]]></category>

		<category><![CDATA[Folding@home]]></category>

		<category><![CDATA[FPGA]]></category>

		<category><![CDATA[GPU]]></category>

		<category><![CDATA[grid computing]]></category>

		<category><![CDATA[InifiniBand]]></category>

		<category><![CDATA[LabView FPGA]]></category>

		<category><![CDATA[multicore processing]]></category>

		<category><![CDATA[Myrinet]]></category>

		<category><![CDATA[NoSQL]]></category>

		<category><![CDATA[ORM]]></category>

		<category><![CDATA[parallel processing]]></category>

		<category><![CDATA[Postgres]]></category>

		<category><![CDATA[QsNet]]></category>

		<category><![CDATA[SETI@home]]></category>

		<category><![CDATA[UML]]></category>

		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=169</guid>
		<description><![CDATA[All Things HPC 
Traditionally, High Performance Computing (HPC) means using high-end hardware like supercomputers to perform complex computational tasks.
A new definition of HPC (“High Productivity Computing”) means the entire processing and data handling infrastructure.  This includes software tools, platforms (computer hardware and operating systems), and data management software.
Parallel or Multicore Processing
I think [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><span style="font-style: normal;"><strong>All Things HPC </strong></span></p>
<p style="margin-bottom: 0in;">Traditionally, High Performance Computing (HPC) means using high-end hardware like supercomputers to perform complex computational tasks.</p>
<p style="margin-bottom: 0in;">A new definition of HPC (“High Productivity Computing”) means the entire processing and data handling infrastructure.  This includes software tools, platforms (computer hardware and operating systems), and data management software.</p>
<p style="margin-bottom: 0in;"><strong>Parallel or Multicore Processing</strong></p>
<p style="margin-bottom: 0in;">I think just about everybody has performed some sort of parallel programming.  Starting two processes at once on the same machine is parallelism.  If a program runs by itself and doesn&#8217;t need input from another program or produce output for another program to use, it&#8217;s loosely coupled.  It&#8217;s tightly coupled if one program feeds another.</p>
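<p style="margin-bottom: 0in;">As a rough illustration of loosely coupled work, the Python sketch below computes the GC fraction of several sequences, where no task needs anything from any other. The function names and the sample task are made up for this example. It uses a thread pool from the standard library to stay portable; for CPU-heavy work on a multicore machine, ProcessPoolExecutor from the same module offers the same map interface.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence):
    """Fraction of G and C bases in one sequence; needs no other task's data."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def gc_for_all(sequences, workers=4):
    # Each call is independent (loosely coupled), so the order in which
    # the tasks actually run does not matter. map() still hands the
    # results back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_fraction, sequences))
```

<p style="margin-bottom: 0in;">A tightly coupled version would look different: one stage would have to wait on the output of another, so the stages could not simply be handed to a pool and forgotten.</p>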
<p style="margin-bottom: 0in;">PC architecture today supports multicore processors.  A two-core CPU is, in essence, two CPUs on the same chip.  These cores may share memory cache (tightly coupled) or not (loosely coupled).  They may implement a method of message passing – intercore communication.</p>
<p style="margin-bottom: 0in;">Cilk is a language for multi-threaded parallel processing based on ANSI C.  MIT was the initial developer of the Cilk technology.  The link to their page is at – <a title="Cilk Project at MIT" href="http://supertech.csail.mit.edu/cilk/" target="_blank">http://supertech.csail.mit.edu/cilk/</a>.</p>
<p style="margin-bottom: 0in;">MIT had licensed Cilk to Cilk Arts, Inc.  Cilk Arts added support for C++, parallel loops, and interoperability with serial interfaces. The product has since been acquired by Intel and will be incorporated into the Intel C++ compiler. The Intel page is at  <a title="Intel Software Cilk++ Page" href="http://software.intel.com/en-us/articles/intel-cilk/" target="_blank">http://software.intel.com/en-us/articles/intel-cilk/</a>.</p>
<p style="margin-bottom: 0in;">Cilk++ makes multicore processing easy.  Cilk++ uses keywords to adapt existing C++ code to multicore processing. (You will need a multicore processor.)</p>
<p style="margin-bottom: 0in;">Cilk++ is currently in a technical preview state.  This means they want you to use it and give them feedback. Download the Intel Cilk++ SDK at <a title="Cilk++ Download Info" href="http://software.intel.com/en-us/articles/download-intel-cilk-sdk/" target="_blank">http://software.intel.com/en-us/articles/download-intel-cilk-sdk/</a>. You will need to sign a license agreement.</p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">The page also presents download links for 32-bit and 64-bit Linux Cilk++. (You will need an Intel processor for the Linux apps.)</span></span></p>
<p style="margin-bottom: 0in;">There is an e-book on Multi-Processor programming available from Intel. The link is - <a title="Multi-Processor Programming Book" href="http://software.intel.com/en-us/articles/e-book-on-multicore-programming/" target="_blank">http://software.intel.com/en-us/articles/e-book-on-multicore-programming/</a>.</p>
<p style="margin-bottom: 0in;"><span style="text-decoration: none;">The book covers multicore programming, parallelism, scheduling theory, shared memory hardware, concurrency platforms, race conditions, and divide-and-conquer recurrences, among other topics.</span></p>
<p style="margin-bottom: 0in;">Grid computing is distributed, large scale, cluster computing. Two of the most famous grid projects are SETI@home and Folding@home (<a title="Folding@home" href="http://folding.stanford.edu" target="_blank">http://folding.stanford.edu</a>).</p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">SETI@home (the Search for Extra-Terrestrial Intelligence at home) uses internet-connected computers and is hosted by the Space Sciences Laboratory at UC Berkeley.  Folding@home focuses on how proteins (biology&#8217;s workhorses) fold or assemble themselves to carry out important functions. </span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">Other lesser-known grids are Einstein@home (<a title="Einstein@home" href="http://www.einsteinathome.org" target="_blank">http://www.einsteinathome.org</a> - &#8220;Grab a wave from Space&#8221;), which processes data from gravitational wave detectors, and MilkyWay@home (<a title="MilkyWay@home" href="http://milkyway.cs.rpi.edu/milkyway" target="_blank">http://milkyway.cs.rpi.edu/milkyway</a>), which is creating a highly accurate 3-D model of the Milky Way Galaxy.</span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;"><strong>Communication</strong></span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">The clusters mentioned above use the internet to exchange messages.  If fast messaging is not required, plain old Ethernet should be sufficient for your messaging needs.  The problem with Ethernet is latency. It takes a long time to set up and get that first message out there. After that, it&#8217;s solid.</span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">But if you&#8217;re looking for constant speed, try InfiniBand (<a title="InfiniBand" href="http://en.wikipedia.org/wiki/Infiniband" target="_blank">http://en.wikipedia.org/wiki/Infiniband</a>), Myrinet (<a title="Myrinet" href="http://www.myri.com" target="_blank">www.myri.com</a>), or QsNet (<a title="QsNet" href="http://en.wikipedia.org/wiki/QsNet" target="_blank">http://en.wikipedia.org/wiki/QsNet</a>).</span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;"><strong>Gamers</strong></span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">Oh, those gamers.  Without their demand for faster, bigger, better, where would we be?</span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">For example, do not overlook the gaming console. NCSA (National Center for Supercomputing Applications) has a cluster of Sony PlayStations.  The PlayStation 3 runs Yellow Dog Linux. The average PS3 retails for around $600. The Folding@home grid runs on PS3s and PCs.</span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">Then we come to the <strong>GPU</strong> (Graphics Processing Unit).  GPU computing means using the GPU to do general purpose scientific and engineering computing. The model for GPU computing couples a CPU with a GPU, with the GPU performing the heavy processing.    (<a title="GPU Programming" href="http://www.nvidia.com/object/GPU_Computing.html" target="_blank">http://www.nvidia.com/object/GPU_Computing.html</a>) </span></span></p>
<p style="margin-bottom: 0in;"><span style="color: #000000;"><span style="text-decoration: none;">One of the hottest GPUs is the NVIDIA Tesla, which is based on the CUDA GPU architecture code-named &#8220;Fermi&#8221;. </span></span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;"><strong>FPGAs (Field Programmable Gate Arrays)</strong></span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Technological devices keep getting smaller and smaller, and the machinery gets buried under tons of software burdened with the menu systems connected to the development environment from hell. </span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">FPGAs take you back to the schematic level. (I was known as a &#8220;bit-twiddler&#8221; at IBM.)</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">My old friends at National Instruments (<a title="National Instruments FPGA" href="http://www.ni.com/fpga/" target="_blank">http://www.ni.com/fpga/</a>) have <strong>NI LabView FPGA</strong>.  LabView FPGA provides graphical programming of FPGAs. </span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Their video on FPGA Technology is a good intro to FPGAs (<a title="FPGA Technology" href="http://www.ni.com/fpga_technology/" target="_blank">http://www.ni.com/fpga_technology/</a>). Several other videos available at the same site go into further detail. For more info on the FPGA hardware see <a title="FPGA Hardware" href="http://en.wikipedia.org/wiki/FPGA" target="_blank">http://en.wikipedia.org/wiki/FPGA</a>.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">(I still haven&#8217;t  forgiven NI for nuking my data acquisition PC with their demo. I lost a lot of stuff. All was backed up, but re-installing was not fun.)</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">FYI - The industry is desperately seeking parallel and FPGA programmers. </span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;"><strong>Data Representation in Database Design</strong></span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">The most recent programming languages are object-oriented.  However, the most efficient databases are relational.  There are object-oriented database systems, but for the most part they are very expensive and very, very slow. Postgres is an RDBMS (Relational Database Management System) that does implement a form of inheritance where one table may extend (inherit from) another table.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Then you have XML.  XML Schemas add another dimension to this complexity.  XML is popular for communication (SOAP) and representation (XSLT). Data comes from an RDBMS, gets stuffed into objects, and is translated to XML on one end. It is sent as XML, translated back to objects, and stored in an RDBMS at the other end.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">The difficulty of mapping objects to an RDBMS is known as the object-relational (O/R) impedance mismatch. See this link for a discussion (<a title="AgileData" href="http://www.agiledata.org./" target="_blank">http://www.agiledata.org./</a>) of software development processes and a link to a recent book on database techniques for mapping objects to relational databases – <a title="Database Techniques" href="http://www.agiledata.org/essays/mappingObjects.html" target="_blank">http://www.agiledata.org/essays/mappingObjects.html</a>.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">But beware: many ORM (Object-Relational Mapping) tools produce a schema that is not completely relational, and performance suffers as a result.  Also, the SQL produced by ORMs may not be optimal.</span></p>
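<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">One common way generated SQL ends up less than optimal is the pattern often called “N+1 queries”: one query for the parent rows, then one more per parent to fetch its children. The Python sketch below uses the standard sqlite3 module with an invented two-table schema (gene and feature) and hand-written queries that imitate the pattern. It is not the output of any particular ORM, only an illustration of the difference in query count.</span></p>

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (id INTEGER PRIMARY KEY, gene_id INTEGER, kind TEXT);
        INSERT INTO gene VALUES (1, 'geneA');
        INSERT INTO gene VALUES (2, 'geneB');
        INSERT INTO feature VALUES (1, 1, 'exon');
        INSERT INTO feature VALUES (2, 1, 'intron');
        INSERT INTO feature VALUES (3, 2, 'exon');
    """)
    return conn


def features_n_plus_one(conn):
    """One query for the genes, then one more query per gene."""
    result, queries = {}, 0
    genes = conn.execute("SELECT id, name FROM gene ORDER BY id").fetchall()
    queries += 1
    for gene_id, name in genes:
        rows = conn.execute(
            "SELECT kind FROM feature WHERE gene_id = ? ORDER BY id", (gene_id,)
        ).fetchall()
        queries += 1
        result[name] = [kind for (kind,) in rows]
    return result, queries


def features_joined(conn):
    """The same data fetched with a single hand-written JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT g.name, f.kind FROM gene g "
        "JOIN feature f ON f.gene_id = g.id ORDER BY g.id, f.id"
    )
    for name, kind in rows:
        result.setdefault(name, []).append(kind)
    return result, 1
```

<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Both functions return the same dictionary, but the first issues one query per gene on top of the initial one. With two genes that is harmless; with a genome&#8217;s worth it is not.</span></p>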
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">To effectively design and develop an RDBMS schema, learn <strong>UML</strong> (Unified Modeling Language).  The Objects By Design web site (<a title="Objects By Design" href="http://www.objectsbydesign.com" target="_blank">http://www.objectsbydesign.com</a>) covers UML and a lot of other object-oriented topics and is worth a look.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Rational Rose is the UML design tool that I use.  Rational has since been purchased by IBM.  Rational uses what is known as the Rational Unified Process.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">Speaking of XML, some of the UML design tools can now output XML directly from the data record definitions.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">See this link for a list of current UML products – <a title="Current UML Products" href="http://www.objectsbydesign.com/tools/umltools_byCompany.html" target="_blank">http://www.objectsbydesign.com/tools/umltools_byCompany.html</a>.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;"><strong>The End of SQL</strong></span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">The ComputerWorld blog site has an interesting 3-part series entitled – <strong>The End of SQL and relational databases? </strong></span></p>
<p style="margin-bottom: 0in; font-weight: normal; text-decoration: none;"><span style="color: #000000;">Part 1 covers Relational Methodology and SQL. The link to part 1 is here – <a title="NoSQL Part 1" href="http://blogs.computerworld.com/15510/the_end_of_sql_and_relational_databases_part_1_of_3" target="_blank">http://blogs.computerworld.com/15510/the_end_of_sql_and_relational_databases_part_1_of_3</a>.</span></p>
<p style="margin-bottom: 0in; font-weight: normal; text-decoration: none;"><span style="color: #000000;">Part 2 is a list of current NoSQL databases. The link to part 2 is here – <a title="NoSQL Part 2" href="http://blogs.computerworld.com/15556/the_end_of_sql_and_relational_databases_part_2_of_3" target="_blank">http://blogs.computerworld.com/15556/the_end_of_sql_and_relational_databases_part_2_of_3</a></span></p>
<p style="margin-bottom: 0in; font-weight: normal; text-decoration: none;"><span style="color: #000000;">Part 3 is a list of links to NoSQL sites, articles, and blog posts. The link to part 3 is here - <a title="NoSQL Part 3" href="http://blogs.computerworld.com/15641/the_end_of_sql_and_relational_databases_part_3_of_3" target="_blank">http://blogs.computerworld.com/15641/the_end_of_sql_and_relational_databases_part_3_of_3</a> </span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;"><span style="font-weight: normal;">In short, the “NoSQL” (<a title="NoSQL " href="http://en.wikipedia.org/wiki/NoSQL" target="_blank">http://en.wikipedia.org/wiki/NoSQL</a>) movement and cloud-based data stores are striving to completely remove developers from </span>a reliance on SQL and relational databases.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;">In a post-relational world, they argue that a distributed, context-free key-value store is probably the way to go.  This makes sense when there can be thousands of sequence searchers, but only one updater. A transactional database would be overkill.</span></p>
<p style="margin-bottom: 0in; text-decoration: none;"><span style="color: #000000;"><strong>Part 5 </strong>of <strong>Effective Bioinformatics Programming</strong> coming soon.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/03/effective-bioinformatics-programming-part-4/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Effective Bioinformatics Programming - Part 3</title>
		<link>http://blog.lifeformulae.com/2010/02/effective-bioinformatics-programming-part-3/</link>
		<comments>http://blog.lifeformulae.com/2010/02/effective-bioinformatics-programming-part-3/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 20:46:30 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[bash]]></category>

		<category><![CDATA[cygwin]]></category>

		<category><![CDATA[Eclipse]]></category>

		<category><![CDATA[Git]]></category>

		<category><![CDATA[Hungarian Notation]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[NetBeans]]></category>

		<category><![CDATA[OS X]]></category>

		<category><![CDATA[rsync]]></category>

		<category><![CDATA[Snapshot backup]]></category>

		<category><![CDATA[Unix]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=162</guid>
		<description><![CDATA[ All Things Unix
Bioinformatics started with Unix.  At the Human Genome Center, for a long time, I had the one and only PC.  (We got a request from our users for a PC-based client for the Search Launcher). Everything else was Solaris (Unix) and Mac, which was followed by Linux.
Unix supports a number [...]]]></description>
			<content:encoded><![CDATA[<p><strong>All Things Unix</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">Bioinformatics started with Unix.  At the Human Genome Center, for a long time, I had the one and only PC.  (We got a request from our users for a PC-based client for the Search Launcher). Everything else was Solaris (Unix) and Mac, which was followed by Linux.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Unix supports a number of nifty commands like <strong>grep</strong>, <strong>strings</strong>, <strong>df</strong>, <strong>du</strong>, <strong>ls, </strong>etc.  These commands are run inside the shell, or command line interpreter, for the operating system (Unix).   There have been a number of these shells in the history of Unix development.</p>
<p>The <strong>bash </strong>shell (<a title="bash" href="http://en.wikipedia.org/wiki/Bash" target="_blank">http://en.wikipedia.org/wiki/Bash</a>) is the default shell for most <strong>Linux</strong> environments.  This shell provides several handy capabilities.  For instance, <strong>bash</strong> supports a history buffer of system commands.  With the history buffer, the “up” arrow will return the previous command.  The <strong>history</strong> command lets you view a history of past commands.   The <strong>bang </strong>operator (!) lets you rerun a previous command from the history buffer.     (Which saves a lot of typing!)</p>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>bash </strong>enables a user to redirect program output. The pipeline feature allows the user to connect a series of commands.  With the pipeline (“|”) operator, a chain of commands can be linked together so that the output of one command is the input to the next command, and so forth.</p>
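<p style="margin-bottom: 0in; font-weight: normal;">The pipeline idea, where each stage consumes the output of the one before it, is not limited to the shell. Below is a small Python sketch that uses generators to mimic what a shell pipeline along the lines of grep, cut and sort would do to pull the header lines out of a FASTA file. It is only an analogy to show the chaining, not how bash implements pipes, and the function names are invented for the example.</p>

```python
def grep_prefix(lines, prefix):
    """Like grep with an anchored pattern: keep lines that start with prefix."""
    return (line for line in lines if line.startswith(prefix))


def strip_prefix(lines, count):
    """Like cut by character position: drop the first count characters."""
    return (line[count:] for line in lines)


def fasta_ids(lines):
    # Each stage reads from the previous one, just as with "|" in the shell.
    # The final sorted() plays the part of the sort command.
    return sorted(strip_prefix(grep_prefix(lines, ">"), 1))
```

<p style="margin-bottom: 0in; font-weight: normal;">As in the shell, no stage needs to know where its input came from or where its output is going, which is what makes the pieces reusable.</p>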
<p style="margin-bottom: 0in; font-weight: normal;">A <strong>shell script</strong> (<a title="Shell Script" href="http://en.wikipedia.org/wiki/Shell_script" target="_blank">http://en.wikipedia.org/wiki/Shell_script</a>) is a script written for the shell or command line interpreter.  Shell scripts enable batch processing. Together with <strong>cron</strong>, these scripts can be set to run automatically at times when system usage is minimal.</p>
<p style="margin-bottom: 0in; font-weight: normal;">For general information about bash, go to the Bash Reference Manual at <a title="Bash Reference Manual" href="http://www.gnu.org/software/bash/manual/bashref.html" target="_blank">http://www.gnu.org/software/bash/manual/bashref.html</a>.</p>
<p style="margin-bottom: 0in; font-weight: normal;">A whole wealth of bash shell script examples is available at - <a title="Bash examples" href="http://tldp.org/LDP/abs/html/" target="_blank">http://tldp.org/LDP/abs/html/</a>.</p>
<p style="margin-bottom: 0in;"><strong>Unix on Other Platforms</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>Cygwin </strong>(<a title="Cygwin" href="http://www.cygwin.com/" target="_blank">http://www.cygwin.com/</a>) is a Linux-like environment for Windows.  The basic download installs a minimal environment, but you can add additional packages at any time.  Go to <a title="Cygwin Packages" href="http://cygwin.com/packages/" target="_blank">http://cygwin.com/packages/</a> for a list of Cygwin packages available for download.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Apple&#8217;s <strong>OS X </strong>is based on Unix.  Other than the Mach kernel, the OS is BSD-derived.  Its Java package is usually not the latest, as Apple has to port Java because of differences in areas such as graphics.</p>
<p style="margin-bottom: 0in;"><strong>All Things Software – Documenting and Archiving</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">I&#8217;ve run into all sorts of approaches to program code documentation in my career.    A lead engineer  demanded that every line of assembler code be documented.  A senior programmer insisted that code should be self-documenting.</p>
<p style="margin-bottom: 0in; font-weight: normal;">By that, she meant using variable names such as save_the_file_to_the_home_directory, and so on.  Debugging these programs was a real pain.  The first thing you had to do was set up aliases for all the unwieldy names.</p>
<p style="margin-bottom: 0in; font-weight: normal;">The <strong>FORTRAN </strong>programmers cried when variable names longer than 6 characters were allowed in version 77 of VAX FORTRAN.  Personally, I thought it was great.  The same with IMPLICIT NONE.</p>
<p style="margin-bottom: 0in; font-weight: normal;">In ancient times, FORTRAN integer variables had to start with the letters I through N.  Real variables could use the other letters.  The IMPLICIT NONE directive told the compiler to shut that off.</p>
<p style="margin-bottom: 0in; font-weight: normal;">All FORTRAN variables had to be in capital letters.  But you could stuff strings into integer variables, which I found extremely useful.  FORTRAN statements could carry a numeric label.  These labels usually started at 10 and went up in increments of 10.</p>
<p style="margin-bottom: 0in; font-weight: normal;">At one time Microsoft used Hungarian notation (<a title="Hungarian Notation" href="http://en.wikipedia.org/wiki/Hungarian_notation" target="_blank">http://en.wikipedia.org/wiki/Hungarian_notation</a>) for variables in most of their documentation.  In this method, the name of the variable indicates its type or use. For example, lAccountNumber was a long integer.</p>
<p style="margin-bottom: 0in; font-weight: normal;">The IDEs (<strong>Eclipse</strong>, <strong>NetBeans</strong>, and others) will automatically create the header comment with a list of variables.  The user just adds the proper definitions. (If you&#8217;re using Java, the auto comment is JavaDoc compatible, etc.)</p>
<p style="margin-bottom: 0in; font-weight: normal;">Otherwise, Java supports the JavaDoc tool, Python has PyDoc, and Ruby has RDoc.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Personally, I feel that software programs should be read like a book, with documentation providing the footnotes, such as an overview of what the code in question does and  a definition of the main variables for both input and output.  Module/Object documentation should also note who uses the function and why.  Keep variable names short but descriptive and make comments meaningful.</p>
<p style="margin-bottom: 0in;">Keep code clean, but don&#8217;t go overboard.  I worked with one programmer who stated, “My code is so clean you could eat off it.”   I found that a little too obnoxious, not to mention overly optimistic, as a number of bugs popped out as time went by.</p>
<p style="margin-bottom: 0in;"><strong>Archiving Code</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">Version Control Systems (VCS) have evolved as source code projects became larger and more complex.</p>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>RCS</strong> (Revision Control System) meant that the days of keeping Emacs numbered backup files (e.g. foo.~1~) were over.  RCS used the diff concept: it just kept a list of the changes made to a file as a backup strategy.</p>
<p style="margin-bottom: 0in; font-weight: normal;">I found this unsuited for what I had to do – revert to an old version in a matter of seconds.</p>
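<p style="margin-bottom: 0in; font-weight: normal;">The diff concept can be sketched with Python&#8217;s standard difflib module.  This only illustrates rebuilding versions from a record of changes; it is not how RCS itself is implemented, and the file contents are made up.  Note that ndiff keeps the unchanged lines in its delta too, so the sketch shows the idea rather than any space savings.</p>

```python
import difflib

# Two hypothetical versions of the same small file.
old_version = ["x = 1\n", "y = 2\n", "print(x + y)\n"]
new_version = ["x = 1\n", "y = 3\n", "print(x + y)\n", "print('done')\n"]

# The delta marks each line as kept, removed ("- ") or added ("+ ").
delta = list(difflib.ndiff(old_version, new_version))

# Either version can be rebuilt from the delta alone.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))

# Show only the lines that changed between the two versions.
changes = [line for line in delta if line.startswith(("- ", "+ "))]
print("".join(changes), end="")
```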
<p style="margin-bottom: 0in; font-weight: normal;"><strong>CVS </strong>was much, much better.  CVS was replaced by Subversion.  But their centralized repository structure can create problems.  You basically check out what you want to work on from a library and check it back in when you&#8217;re done.  This can be a slow process, depending on network usage or central server availability.</p>
<p style="margin-bottom: 0in; font-weight: normal;">The current favorite is <strong>Git</strong>. Git was created by Linus Torvalds (of Linux fame).  Git is a free, open source distributed version control system (<a title="Git" href="http://git-scm.com/" target="_blank">http://git-scm.com/</a>).</p>
<p style="margin-bottom: 0in; font-weight: normal;">Everyone on the project has a copy of all project files complete with revision histories and tracking capabilities. Permissions allow exchanges between users and merging to a central location is fast.</p>
<p style="margin-bottom: 0in; font-weight: normal;">The IDEs (<strong>Eclipse</strong> and <strong>NetBeans</strong>) have CVS and Subversion plug-ins already configured for accessing those repositories.  NetBeans also supports Mercurial.  Plug-ins for the other version control systems are available on the web.  The Eclipse plug-in for Git is available at <a title="Git Eclipse Plug In" href="http://git.wiki.kernel.org/index.php/EclipsePlugin" target="_blank">http://git.wiki.kernel.org/index.php/EclipsePlugin</a>.</p>
<p style="margin-bottom: 0in;"><strong>System Backup </strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">Always have a plan B.  My plan A had IT back up my systems on a weekly to monthly basis, depending on usage.   A natural disaster completely decimated my systems.  No problem, I thought, I have system backups.  Imagine how I felt when I heard that IT had not archived a single one of my systems in over three years!  Well, I had a plan B.  I had a mirror of the most important stuff on an old machine and other media.  We were back up almost immediately.</p>
<p style="margin-bottom: 0in; font-weight: normal;">The early <strong>Tandem NonStop systems</strong> (now known as HP Integrity NonStop)  automatically mirrored your system in real-time, so down time was not a problem.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Real-time backup is expensive and, unless you&#8217;re a bank or airline, it&#8217;s not necessary.</p>
<p style="margin-bottom: 0in;"><strong>Snapshot Backup on Linux with rsync</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">If you&#8217;re running Linux, Mac, Solaris, or any Unix-based system,  you can use <strong>rsync</strong> for generating automatic rotating “snapshot” style backups.  These systems generally have rsync already installed.  If not, the source is available at http://rsync.samba.org/.</p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">This website, <a title="rsync Snapshots" href="http://www.mikerubel.org/computers/rsync_snapshots/" target="_blank">http://www.mikerubel.org/computers/rsync_snapshots/</a>, will tell you everything you need to know to implement rsync-based backups, complete with sample scripts.</span></p>
<p style="margin-bottom: 0in;">Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically.</p>
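<p style="margin-bottom: 0in;">To give a feel for what “rotating snapshots” means, here is a rough Python sketch.  It is not one of the scripts from that website.  The directory names (snapshot.0, snapshot.1, ...), the function names, and the source path are assumptions made for illustration.  The rsync options used (-a, --delete, --link-dest) are standard ones, but the command is only built here, not run.</p>

```python
import os
import shutil
import tempfile


def rotate_snapshots(root, keep):
    """Shift snapshot.0 to snapshot.1 and so on, dropping the oldest."""
    oldest = os.path.join(root, "snapshot.%d" % (keep - 1))
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    for n in range(keep - 2, -1, -1):
        src = os.path.join(root, "snapshot.%d" % n)
        if os.path.isdir(src):
            os.rename(src, os.path.join(root, "snapshot.%d" % (n + 1)))


def rsync_command(source, root):
    """Build (but do not run) the rsync call that fills snapshot.0.

    root should be an absolute path.  With --link-dest, files unchanged
    since snapshot.1 are hard-linked instead of copied again.
    """
    return [
        "rsync", "-a", "--delete",
        "--link-dest=" + os.path.join(root, "snapshot.1"),
        source.rstrip("/") + "/",
        os.path.join(root, "snapshot.0"),
    ]


# Demonstration in a throwaway directory (no rsync is run here).
demo_root = tempfile.mkdtemp()
for n in range(3):
    os.mkdir(os.path.join(demo_root, "snapshot.%d" % n))
rotate_snapshots(demo_root, keep=3)
remaining = sorted(os.listdir(demo_root))
print(remaining)
# "/home/user/projects" is a hypothetical source directory.
print(" ".join(rsync_command("/home/user/projects", demo_root)))
shutil.rmtree(demo_root)
```

<p style="margin-bottom: 0in;">In a real setup the rotation and the rsync call would run from cron, and the snapshot directory would ideally live on a different disk or machine than the data it protects.</p>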
<p style="margin-bottom: 0in;"><strong>Acknowledgment – Thanks, Bill!</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">I want to thank Bill Eaton for his assistance with these blog entries on Effective Bioinformatics Programming.    He filled in a lot of the technical details, performed product analysis, and gave me direction in writing these blog entries.</p>
<p style="margin-bottom: 0in; font-weight: normal;"><strong>To Be Continued - Part 4</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">Part 4 will cover relational database management systems (RDBMS), HPC (high performance computing) - parallel processing, FPGA, clusters, grids, and other topics.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/02/effective-bioinformatics-programming-part-3/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Effective Bioinformatics Programming - Part 2</title>
		<link>http://blog.lifeformulae.com/2010/01/effective-bioinformatics-programming-part-2/</link>
		<comments>http://blog.lifeformulae.com/2010/01/effective-bioinformatics-programming-part-2/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 20:29:35 +0000</pubDate>
		<dc:creator>Pam</dc:creator>
		
		<category><![CDATA[Bioinformatics]]></category>

		<category><![CDATA[Bioinformatics Organization]]></category>

		<category><![CDATA[BioJava]]></category>

		<category><![CDATA[BioPerl]]></category>

		<category><![CDATA[BioPython]]></category>

		<category><![CDATA[BioRuby]]></category>

		<category><![CDATA[Eclipse]]></category>

		<category><![CDATA[freshmeat]]></category>

		<category><![CDATA[IDE]]></category>

		<category><![CDATA[instrumentation]]></category>

		<category><![CDATA[NetBeans]]></category>

		<category><![CDATA[Open Bioinformatics Foundation]]></category>

		<category><![CDATA[OpenSource]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[Public Domain Manifesto]]></category>

		<category><![CDATA[SourceForge]]></category>

		<guid isPermaLink="false">http://blog.lifeformulae.com/?p=154</guid>
		<description><![CDATA[Effective Bioinformatics Programming – Part 2
Instrumentation Programming
Instrumentation Programming usually concerns computer control over the actions of an instrument and/or the streaming or download of data from the device.  Instrumentation in the Life Sciences covers data loggers, waveform data acquisition systems, pulse generators, image capture, and others used extensively in LIMS (Laboratory Information Management Systems), [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;"><strong>Effective Bioinformatics Programming – Part 2</strong></p>
<p style="margin-bottom: 0in;"><strong>Instrumentation Programming</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">Instrumentation Programming usually concerns computer control over the actions of an instrument and/or the streaming or download of data from the device.  Instrumentation in the Life Sciences covers data loggers, waveform data acquisition systems, pulse generators, image capture, and others used extensively in LIMS (Laboratory Information Management Systems), Spectroscopy, and other scientific arenas.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Most instruments are controlled by codes called “control codes”.     These codes are usually sent or received by a C/C++ program.  Some instrumentation manufacturers, however,  have a proprietary programming language that must be used to “talk” to the instrument.</p>
<p style="margin-bottom: 0in; font-weight: normal;">Some companies are nice enough to provide information on the structure of the data that comes from their instrument.   When they don&#8217;t, you may have to use good old “reverse engineering”.  That&#8217;s where the Unix/Linux <strong>od</strong> utility comes in handy, because lots of time will be spent poring over hex dumps.</p>
<p style="margin-bottom: 0in; font-weight: normal;">As you can tell, programming instruments requires a lot of patience.  This is especially true if everything hangs or gets into a confused state.  There is nothing you can do but cycle the power to everything and start over.  This is usually accompanied by a banging of keyboards and the muttering of a few choice words.</p>
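<p style="margin-bottom: 0in; font-weight: normal;">For a feel of what that looks like, here is a small Python sketch that prints a hex dump in the spirit of od and then decodes the same bytes once the layout is known.  The 10-byte record layout (a 4-byte tag, a 2-byte channel number, and a 4-byte reading) is invented for illustration and does not describe any real instrument.</p>

```python
import struct


def hex_dump(data, width=16):
    """Return an od-style listing: offset, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % byte for byte in chunk)
        text_part = "".join(
            chr(byte) if byte in range(32, 127) else "." for byte in chunk
        )
        lines.append(
            "%08x  %s  |%s|" % (offset, hex_part.ljust(width * 3 - 1), text_part)
        )
    return "\n".join(lines)


# Hypothetical record: 4-byte tag, 2-byte channel, 4-byte reading (big-endian).
record = struct.pack(">4sHf", b"SPEC", 3, 1.5)
print(hex_dump(record))

# Once the layout is worked out, the same format string decodes the record.
tag, channel, reading = struct.unpack(">4sHf", record)
print(tag, channel, reading)
```

<p style="margin-bottom: 0in; font-weight: normal;">The hard part in practice is guessing the format string, which is exactly what all the staring at hex dumps is for.</p>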
<p style="margin-bottom: 0in;"><strong>Development Platforms or IDEs (Integrated Development Environment)</strong></p>
<p style="margin-bottom: 0in; font-weight: normal;">I have to mention development platforms as they can be useful, but also problematic.   My favorite is Eclipse (<a title="Eclipse IDE" href="http://eclipse.org" target="_blank">http://www.eclipse.org</a>).  Originating at IBM, Eclipse was supported by a consortium of software vendors.  Eclipse has now become the Eclipse open source community, supported by the Eclipse Foundation.</p>
<p style="margin-bottom: 0in;"><span style="font-weight: normal;">Eclipse is a development platform for programmers </span>comprised of extensible frameworks, tools and runtimes for building, deploying and managing software across the lifecycle.   You can find plug-ins that will enable you to accomplish just about anything you want to do.  A plug-in is an addition to the Eclipse platform that is not included in the base package, like an Eclipse memory manager or a tool for debugging a Tomcat servlet.</p>
<p style="margin-bottom: 0in;">Sun offers NetBeans (“The only IDE you need.”).  I used NetBeans (<a title="NetBeans IDE" href="http://netbeans.org" target="_blank">http://netbeans.org</a>) a lot on the Mac.  Previously, Sun offered StudioOne and Creator. I used StudioOne (on Unix) and Creator (on Linux).  I haven&#8217;t worked with NetBeans lately because it is currently geared mostly toward Swing (GUI) development and is not fully JSF (JavaServer Faces) aware.  NetBeans will make a template for JSF but doesn&#8217;t (as yet) provide an easy way to create a JSF interface.</p>
<p style="margin-bottom: 0in;">There are two main problems with development platforms.  For one, the learning curve is fairly steep.  There are a lot of tutorials and examples available, but you still have to take the time to work through them.</p>
<p style="margin-bottom: 0in;">The best way to use a development platform is to divide the work. One group does web content, one group does database, one group does middleware (the glue that holds everything together), etc.  Each group or person can then become knowledgeable in their area and move on or absorb other areas as needed.</p>
<p style="margin-bottom: 0in;">The second problem with these tools is that you are stuck with their developmental approach.</p>
<p style="margin-bottom: 0in;">You have to do things a certain way and adhere to a certain structure.  Flexibility can be a problem.</p>
<p style="margin-bottom: 0in;">This is especially true of interface building. You are stuck with the code the tool generates and the files and file structures created.     With most tools, you have to use that tool to access files that the tool created.</p>
<p style="margin-bottom: 0in;">IDEs can be useful in that they will perform mundane coding tasks for you.  For instance, given a database record, the IDE can use those table elements to generate web forms and the SQL queries driving those forms.  You can then expand the simple framework or leave as is.</p>
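<p style="margin-bottom: 0in;">The kind of code an IDE generates for you can be approximated in a few lines.  The following Python sketch builds parameterized SQL from a list of column names and tries it against an in-memory SQLite database.  It is not what Eclipse or NetBeans actually emit; the table name, column names, and helper functions are hypothetical.</p>

```python
import sqlite3


def build_insert(table, columns):
    """Build a parameterized INSERT statement for the given columns."""
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, names, marks)


def build_select(table, columns, key):
    """Build a parameterized SELECT that looks a row up by one key column."""
    return "SELECT %s FROM %s WHERE %s = ?" % (", ".join(columns), table, key)


# Hypothetical table; the names are for illustration only.
columns = ["sample_id", "organism", "sequence"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT, organism TEXT, sequence TEXT)")
conn.execute(build_insert("samples", columns), ("S1", "E. coli", "ATGC"))
row = conn.execute(
    build_select("samples", columns, "sample_id"), ("S1",)
).fetchone()
conn.close()
print(row)
```

<p style="margin-bottom: 0in;">Table and column names are pasted straight into the SQL text here, so they should come from your own schema and never from user input.  The values themselves go through the ? placeholders.</p>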
<p style="margin-bottom: 0in;"><strong>Open Source/Free Software and Bioinformatics Libraries</strong></p>
<p style="margin-bottom: 0in;">There is a lot of good and not-so-good Open Source code out there for the Life Sciences.</p>
<p style="margin-bottom: 0in;">There are several “gotchas” to look out for, including &#8211;</p>
<p style="margin-bottom: 0in; text-align: center;">Is the code reliable?  Are others using it?  Are they having problems?</p>
<p style="margin-bottom: 0in; text-align: center;">Will the code run on your architecture? What will it take to install?</p>
<p style="margin-bottom: 0in; text-align: center;">What kind of user support is available?  What&#8217;s the response time?</p>
<p style="margin-bottom: 0in; text-align: center;">Is there a mailing list available for the library, package, or project of interest?</p>
<p style="margin-bottom: 0in;">There are several bioinformatics software libraries available for various languages.  All of these libraries are OpenSource/Free Software.  Installing these libraries takes a little more than just downloading and uncompressing a package.  There are “dependencies” (other libraries, modules, programs, and access to external sites) that must be resident or accessible before a complete build of these libraries is possible.</p>
<p style="margin-bottom: 0in;">The following is a list of the most popular libraries and  their respective dependencies.</p>
<p style="margin-bottom: 0in;"><strong>BioPerl 1.6.1</strong>:  Modules section of <a title="CPAN" href="http://www.cpan.org" target="_blank">http://www.cpan.org/</a></p>
<pre><span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Required modules:</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">perl               =&gt; 5.6.1</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">IO::String         =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">DB_File            =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Data::Stag         =&gt; 0.11</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Scalar::Util       =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">ExtUtils::Manifest =&gt; 1.52</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Required modules for source build:</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Test::More    =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Module::Build =&gt; 0.2805</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Test::Harness =&gt; 2.62</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">CPAN          =&gt; 1.81</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Recommended modules:  some of these have circular dependencies</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Ace                       =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Algorithm::Munkres        =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Array::Compare            =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Bio::ASN1::EntrezGene     =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Clone                     =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Convert::Binary::C        =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Graph                     =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">GraphViz                  =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">HTML::Entities            =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">HTML::HeadParser          =&gt; 3</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">HTTP::Request::Common     =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">List::MoreUtils           =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">LWP::UserAgent            =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Math::Random              =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">PostScript::TextBlock     =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Set::Scalar               =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">SOAP::Lite                =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Spreadsheet::ParseExcel   =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Spreadsheet::WriteExcel   =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Storable                  =&gt; 2.05</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">SVG                       =&gt; 2.26</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">SVG::Graph                =&gt; 0.01</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Text::ParseWords          =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">URI::Escape               =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::Parser               =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::Parser::PerlSAX      =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::SAX                  =&gt; 0.15</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::SAX::Writer          =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::Simple               =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::Twig                 =&gt; 0</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">XML::Writer               =&gt; 0.4</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Some of these modules such as SOAP::Lite depend upon many other</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">modules.</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;"><strong>BioPython 1.53</strong>:  <a title="BioPython" href="http://biopython.org" target="_blank">http://biopython.org/</a></span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Additional packages:</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">NumPy     (recommended) http://numpy.scipy.org/</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">ReportLab (optional)    http://www.reportlab.com/software/opensource/</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">MySQLdb   (optional)    May be in core Python distribution.</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;"><strong>BioRuby 1.4.0</strong>:  <a title="BioRuby" href="http://www.bioruby.org" target="_blank">http://www.bioruby.org/</a></span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">The base distribution is self-contained and uses the RubyGems installer.</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Optional packages.</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">RAA:xmlparser</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">RAA:bdb</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">RubyForge:ActiveRecord and at least one driver (or adapter) from</span></span>
   <span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">RubyForge:MySQL/Ruby, RubyForge:postgres-pr, or RubyForge:ActiveRecord</span></span>
   <span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Oracle enhanced adapter.</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">RubyForge:libxml-ruby (Ruby language bindings for the GNOME Libxml2 XML toolkit)</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;"><strong>BioJava 1.7.1</strong>:  <a title="BioJava" href="http://www.biojava.org" target="_blank">http://www.biojava.org/</a></span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">biojava-1.7.1-all.jar:  self-contained binary distribution with</span></span>
  <span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">all dependencies included.</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">biojava-1.7.1.jar:  bare distribution that requires the following additional</span></span>
  <span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">jar files.  These are required for building from source code.</span></span>
  <span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">Most are from http://www.apache.org/</span></span>

<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">bytecode.jar:                  required to run BioJava</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">commons-cli.jar:               used by some demos.</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">commons-collections-2.1.jar:   demos, BioSQL Access</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">commons-dbcp-1.1.jar:          legacy BioSQL access</span></span>
<span style="font-family: Liberation Serif,serif;"><span style="font-size: small;">commons-pool-1.1.jar:          legacy BioSQL access
jgraph-jdk1.5.jar:          NEXUS file parsing</span></span></pre>
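<p style="margin-bottom: 0in;">Before starting a build, it can save time to check which dependencies are already present.  As a minimal sketch for the Python case, the following uses the standard importlib module to report which of the optional BioPython packages listed above cannot be imported on the current machine.  The helper name missing_modules is made up for this example.</p>

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the optional packages from the BioPython list above.
optional = ["numpy", "reportlab", "MySQLdb"]
print("Missing optional packages:", missing_modules(optional))
```

<p style="margin-bottom: 0in;">This only checks that a module can be located, not that it is a recent enough version, so the version numbers in the lists above still need to be checked by hand.</p>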
<p style="margin-bottom: 0in;">Don&#8217;t forget to sign up for the mailing list of each library you are interested in to get the latest news, problems, and solutions for that library, or for life science topics in general.</p>
<p style="margin-bottom: 0in;"><strong>Software Hosting and Indexing Sites</strong></p>
<p style="margin-bottom: 0in;">There are several Software Hosting and Indexing Sites that serve as software distribution points for bioinformatics software.</p>
<p style="margin-bottom: 0in;"><a title="SourceForge" href="http://sourceforge.net" target="_blank"><strong>SourceForge.net</strong></a> – Search on bioinformatics for a list of software available. Projects include: MIAMExpress - <a title="SourceForge" href="http://sourceforge.net/projects/miamexpress/" target="_blank">http://sourceforge.net/projects/miamexpress/</a></p>
<p style="margin-bottom: 0in;"><a title="freshmeat" href="http://freshmeat.net" target="_blank"><strong>freshmeat</strong></a> – The Web&#8217;s largest index of Unix and cross-platform software</p>
<p style="margin-bottom: 0in;"><a title="Bioinformatics Organization" href="http://www.bioinformatics.org" target="_blank"><strong>Bioinformatics Organization</strong></a> – The Open Access Institute</p>
<p style="margin-bottom: 0in;"><strong><a title="OpenBio" href="http://www.open-bio.org/wiki/Main_Page" target="_blank">Open Bioinformatics Foundation (O|B|F)</a> </strong>- Hosts Many Open Bioinformatics Projects</p>
<p style="margin-bottom: 0in;"><strong>Public Domain Manifesto</strong></p>
<p style="margin-bottom: 0in;">In this time of curtailment of civil rights, the Public Domain Manifesto seems appropriate (<a title="Public Domain Manifesto" href="http://www.publicdomainmanifesto.org/node/8" target="_blank">http://www.publicdomainmanifesto.org/node/8</a>). Sign the petition while you&#8217;re there.</p>
<p style="margin-bottom: 0in;">This is the end of Part 2.  Part 3 will explore more software skills, project management, and other computational topics.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.lifeformulae.com/2010/01/effective-bioinformatics-programming-part-2/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

