LifeFormulae Blog » Posts in 'Science Communication' category

Effective Bioinformatics Programming - Part 1

The PLOS Computational Biology website recently published “A Quick Guide for Developing Effective Bioinformatics Programming Skills” by Joel T. Dudley and Atul J. Butte (http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000589).

This article is a good survey that covers all the latest topics and mentions all the currently popular buzzwords circulating above, around, and through the computing ionosphere. It’s a good article, but I can envision readers’ eyes glazing over about page 3. It’s a lot of computer-speak in a little space.

I’ll add in a few things they skipped or merely skimmed over to give a better overview of what’s out there and how it pertains to bioinformatics.

They state that a biologist should put together a Technology Toolbox. They continue, “The most fundamental and versatile tools in your technology toolbox are programming languages.”

Programming Concepts

Programming languages are important, but I think that Programming Concepts are way, way more important. A good grasp of programming concepts will enable you to understand any programming language.

To get a good handle on programming concepts, I recommend a book. This book, Structure and Interpretation of Computer Programs from MIT Press (http://mitpress.mit.edu/sicp/), is the basis for an intro to computer science at MIT. It’s called the Wizard Book or the Purple Book.

I got the 1984 version of the book which used the LISP language. The current 1996 version is based on LISP/Scheme. Scheme is basically a cleaned-up LISP, in case you’re interested.

Best of all, the course (and the downloadable book) are freely available from MIT through the MIT OpenCourseWare website – http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-001Spring-2005/CourseHome/index.htm.

There’s a blog entry - http://onlamp.com/pub/wlg/8397 - that goes into further explanation about the course and the book.

And just because you can program, it doesn’t mean you know (or even need to know) all the concepts. For instance, my partner for an engineering education extension course was an electrical engineer who was programming microprocessors. When the instructor mentioned the term “scope” in reference to some topic, he turned to me and asked, “What’s scope?”

According to MIT’s purple book, “In a procedure definition, the bound variables declared as the formal parameters of the procedure have the body of the procedure as their scope.”
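To make that definition concrete, here is a minimal Python sketch (Python rather than the book's Scheme, and the function name gc_fraction is purely illustrative). The formal parameter sequence exists only inside the function body, so asking for it outside that body fails.

```python
def gc_fraction(sequence):
    # `sequence` is a formal parameter: its scope is this function body only.
    hits = sum(1 for base in sequence if base in "GC")
    return hits / len(sequence)

result = gc_fraction("GATTACA")  # one G and one C out of seven bases

# Outside the function body the parameter name is not bound to anything.
try:
    sequence
    visible_outside = True
except NameError:
    visible_outside = False

print(result, visible_outside)
```

The same idea carries over to most languages: a name is only meaningful in the region of code where it was introduced.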

You don’t need to know about scope to program in assembler, because everything you need is right there. (In case you’re wondering, I consider assembler programmers to be among the programming elites.)

Programming Languages

The article mentions Perl, Python, and Ruby as the “preferred and most prudent choices” in which to seek mastery for bioinformatics.

These languages are selected because “they simplify the programming process by obviating the need to manage many lower level details of program execution (e.g. memory management), affording the programmer the ability to focus foremost on application logic…”

Let me add the following. There are differences in programming languages. By that, I mean compiled vs scripted. Languages such as C, C++, and Fortran are compiled. Program instructions written in these languages are parsed and translated into object code, a language specific to the computer architecture the code is to run on. Compiled code has a definite speed advantage, but if the main module or any supporting module is changed, the project must be rebuilt. Since the program is compiled into the machine code of a specific computer architecture, portability of the code is limited.

Perl, Python, and Ruby are examples of scripted or interpreted languages. These languages are translated into byte code which is optimized and compressed, but is not machine code. This byte code is then interpreted by a virtual machine (or byte code interpreter) usually written in C.
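You can look at this byte code yourself. The following is a small sketch using Python's standard dis module; the one-line source string and the variable names in it are made up for illustration, and the exact instructions printed vary between Python versions.

```python
import dis

# A one-line "program", compiled to a code object without running it.
source = "gc_count = sequence.count('G') + sequence.count('C')"
code = compile(source, "<example>", "exec")

# code.co_code holds the raw byte code; dis renders it in readable form.
listing = dis.Bytecode(code).dis()
print(listing)

# The virtual machine then interprets that byte code.
namespace = {"sequence": "GATTACA"}
exec(code, namespace)
print(namespace["gc_count"])
```

Note that the listing is a set of instructions for the interpreter's virtual machine, not for your CPU, which is why the same script runs wherever the interpreter does.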

An interpreted program runs more slowly than a compiled program. Every line of an interpreted program must be analyzed as it is read. But the code isn’t particularly tied to one machine architecture, making portability easier (provided the byte code interpreter is present). Since code is only interpreted at run time, extensions and modifications to the code base are easier, making these languages great for beginning programmers or rapid prototyping.

But let’s get back to memory management. This, and processing speed, will be a huge deal in next gen data analysis and management.

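Perl’s automatic memory management has a problem with circularity, because Perl counts references. (The standard Python interpreter also counts references, but adds a garbage collector that detects cycles. Ruby uses a tracing garbage collector rather than reference counting.)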

If object 1 points to object 2 and object 2 points back to object 1, but nothing else in the program points to either object (this is a circular reference), reference counting alone never destroys these objects. They remain in memory. If these objects get created again and again, it’s called a memory leak. The usual fix is to make one of the two links a weak reference, which does not count toward keeping the object alive.
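Here is a rough sketch of that situation in Python. It assumes the standard CPython interpreter, and the Node class is hypothetical. Turning off Python's cycle collector leaves only reference counting, which is roughly the situation described for Perl.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

def make_cycle():
    a = Node("object 1")
    b = Node("object 2")
    a.other = b  # object 1 points to object 2
    b.other = a  # object 2 points back to object 1
    # A weak reference lets us watch `a` without keeping it alive.
    return weakref.ref(a)

gc.disable()  # leave only reference counting (CPython-specific behaviour)
probe = make_cycle()
# Nothing else points at the pair, yet the counts never reach zero.
alive_after_refcount = probe() is not None

gc.collect()  # an explicit collection finds and frees the cycle
alive_after_gc = probe() is not None
gc.enable()

print(alive_after_refcount, alive_after_gc)
```

On CPython the first flag should come out True and the second False. Storing weakref.ref(a) in b.other instead of a itself would have avoided the cycle in the first place.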

I also have to ask – What about C/C++, Fortran, and even Turbo Pascal? The NCBI Toolkit is written in C/C++. If you work with foreign scientists, you will probably see a lot of Fortran.

Debugging

You can’t mention programming without mentioning debugging. I consider the act of debugging code an art form any serious programmer should doggedly pursue.

Here’s a link to an ebook, The Art of Debugging (http://www.circlemud.org/cdp/hacker/). It’s mainly Unix-based, C-centric and a little dated. But good stuff never goes out of style.

Chapter 4, Debugging: Theory explains various debugging techniques. Chapter 5 – Profiling talks about profiling your code, or determining where your program is spending most of its time.
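The ebook is C-centric, but the same idea is easy to try in Python with the standard cProfile and pstats modules. This is only a sketch; slow_sum, fast_sum and run are made-up functions that compute the same sum of squares in a slow way and a fast way.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Adds up the squares one at a time.
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum(n):
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6

def run():
    return slow_sum(200_000), fast_sum(200_000)

profiler = cProfile.Profile()
profiler.enable()
result = run()
profiler.disable()

# Sort by cumulative time and show the top few entries.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report should show nearly all of the time being spent in slow_sum, which is exactly the kind of hint a profiler is for.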

He also mentions core dumps. A core file is the memory image that can be written out when your C/C++/Fortran program crashes in Unix/Linux. You can examine this core to determine where your program went wrong. (It gives you a place to start.)

The Linux Foundation Developer Network has an on-line tutorial – Zen and the Art of Debugging C/C++ in Linux with GDB – http://ldn.linuxfoundation.org/article/zen-and-art-debugging-cc-linux-with-gdb. They write a C program (incorporating a bug), create a make file, compile, and then use gdb to find the problem. You are also introduced to several Unix/Linux commands in the process.

You can debug Perl by invoking it with the -d switch. When a Perl program dies, it usually reports the line number that caused the problem along with some explanation of what went wrong.

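Python also has a -d option, but it only turns on parser debugging output, which is rarely what you want. To step through a Python script, run it under the standard pdb debugger instead, for example python -m pdb yourscript.py.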

Octal Dumps

One of the most useful utilities in Unix/Linux is od (octal dump). You can examine files in octal (default), hex, or ASCII characters.

od is very handy for examining data structures, finding hidden characters, and reverse engineering.

If you think your code is right, the problem may be in what you are trying to read. Use od to get a good look at the input data.
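If od isn't handy (on Windows, say), a few lines of Python give a similar view. This is only a rough imitation of what od shows, not a replacement for it; hex_dump and the sample bytes are made up for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an od-style listing: offset, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Show non-printable bytes (tabs, carriage returns, ...) as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)

# A sequence line with a stray tab and a Windows-style line ending.
sample = b"ACGT\tACGT\r\n"
dump = hex_dump(sample)
print(dump)
```

The 09 (tab) and 0d 0a (carriage return plus line feed) bytes stand out in the hex column even though they are invisible in a text editor. For a real file you would pass in open(path, "rb").read().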

That’s it for Part 1. Part 2 will cover Open Source, project management, archiving source code and other topics.

Science Communication

A Commentary piece, “Science Communication Revisited”, in the June, 2009, issue of Nature Biotechnology discusses increasing public involvement in science issues and decision-making.

Concerns are raised about the state of science education and scientific literacy more generally.

If only the public were more knowledgeable about things scientific, the article states, they would see things through the eyes of the expert.

Education

I was fortunate enough to have attended private schools from elementary through high school. Very few children are so lucky.

My biology teacher was a Catholic nun. She introduced us to Teilhard de Chardin (http://en.wikipedia.org/wiki/Pierre_Teilhard_de_Chardin).

Teilhard was a Jesuit priest who was trained as a paleontologist and geologist and took part in the discovery of Peking Man. He also studied botany and zoology. His book, The Phenomenon of Man, talks about the unfolding of the material cosmos towards the Omega Point, a maximum level of consciousness, that is pulling all creation towards it. Evolution, according to Teilhard, was the process of matter becoming aware of itself.

Therefore, I was able to receive a fairly sound exposure to evolution. On the other hand, the chapters on male and female biology and the reproductive process were ripped out of my textbook.

(I know, because we found an unaltered book and read that forbidden text.)

At any rate, I grew up in an agricultural environment and knew what it was all about.

If you’re interested in the state of scientific education or education in public schools in Texas, I recommend the Texas Freedom Network (http://txfree.convio.net/site/PageServer ).

Experts

Concerning experts, I remember my section chief telling me, “You have to forgive Bryan, he still believes in experts.” Bryan was our lead engineer.

As far as experts go, you have to be able to separate the good from the bad.

I recommend this article, Crap Detection 101 (http://www.sfgate.com/cgi-bin/blogs/rheingold/detail?entry_id=42805) and the CRAP Test (http://www.workliteracy.com/the-crap-test).

The CRAP test is a way to evaluate an internet source based on the following criteria: Currency, Reliability, Authority and Purpose/Point of View.

The article and test’s main focus is the internet — how to tell real from bogus. It’s not too hard to extrapolate the points they make to everyday life.

Scientific Literacy

Science and technology are changing so rapidly that many people have simply given up on trying to keep up. Their scientific literacy consists of newspaper articles or blurbs on the TV news.

A lot of what is presented as science on network television is implausible (not to mention the technology used on these shows).

I think to really succeed, real scientists must pay attention to what is presented to the general public and critique it through publications, such as letters to the editor, blogs, appearances, etc. as much as possible.

Scientists should also keep an open mind as to the intelligence of their audience.

We have way too many people with 200 point IQ levels digging ditches in this country. We spend an inordinate amount of funds and interest on educating special children. We should be spending just as much time and funds (if not more) identifying and encouraging the geniuses among us who find education boring and quickly lose interest.

The interest in science is out there, but scientists must take an interest in how what passes for science is disseminated, validate or invalidate that science, identify the appropriate target audience, and address that audience level to really open up the forum on true scientific communication.

Salon, an e-zine, has a really good article, Why America is flunking science (http://www.salon.com/env/feature/2009/07/13/science_illiteracy/?source=newsletter), that is worth the read.

Here’s another link where the author lists current “myths” surrounding scientists engaging with the media.

http://scienceblogs.com/christinaslisrant/2009/07/when_discussing_scientists_eng.php
