All Things Unix
Bioinformatics started with Unix. At the Human Genome Center, for a long time, I had the one and only PC. (We got a request from our users for a PC-based client for the Search Launcher). Everything else was Solaris (Unix) and Mac, which was followed by Linux.
Unix supports a number of nifty commands like grep, strings, df, du, ls, etc. These commands are run inside the shell, or command line interpreter, for the operating system (Unix). There have been a number of these shells in the history of Unix development.
The bash shell (http://en.wikipedia.org/wiki/Bash) is the default shell for the Linux environment. It provides several handy capabilities that older shells lack. For instance, bash keeps a history buffer of commands: the “up” arrow recalls the previous command, the history command lets you view a list of past commands, and the bang operator (!) lets you rerun a previous command from the history buffer. (Which saves a lot of typing!)
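A quick sketch of the history buffer and the bang operator in action (the command numbers are whatever your own history shows):

history | tail -3    # show the last three commands with their numbers
!102                 # rerun command number 102 from the history buffer
!!                   # rerun the most recent command
!grep                # rerun the most recent command that started with "grep"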
bash enables a user to redirect program output. The pipeline feature lets the user connect a series of commands: with the pipe (“|”) operator, a chain of commands can be linked together so that the output of one command becomes the input to the next, and so forth.
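For example, here is a small pipeline (the FASTA file name is just a placeholder):

grep ">" sequences.fasta | wc -l
grep ">" sequences.fasta | awk '{ print length, $0 }' | sort -rn | head > longest_headers.txt

The first command counts the sequences in the file; the second writes the ten longest header lines to a new file, with each stage feeding the next through a pipe.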
A shell script (http://en.wikipedia.org/wiki/Shell_script) is a script written for the shell, or command line interpreter. Shell scripts enable batch processing. Together with the cron scheduler, these scripts can be set to run automatically at times when system usage is at a minimum.
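As a minimal sketch (the paths and file names are made up), here is a nightly housekeeping script and the crontab entry that runs it at 2 a.m.:

#!/bin/bash
# nightly_cleanup.sh - archive yesterday's results and purge stale temp files
tar -czf /backup/results_$(date +%Y%m%d).tar.gz /data/results
find /scratch -name "*.tmp" -mtime +7 -delete

# crontab entry (added with "crontab -e"): minute hour day-of-month month day-of-week command
0 2 * * * /home/user/nightly_cleanup.sh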
For general information about bash, go to the Bash Reference Manual at http://www.gnu.org/software/bash/manual/bashref.html.
A whole wealth of bash shell script examples is available at - http://tldp.org/LDP/abs/html/.
Unix on Other Platforms
Cygwin (http://www.cygwin.com/) is a Linux-like environment for Windows. The basic download installs a minimal environment, but you can add additional packages at any time. Go to http://cygwin.com/packages/ for a list of Cygwin packages available for download.
Apple’s OS X is based on Unix: aside from the Mach kernel, the OS is BSD-derived. Apple’s Java package usually lags behind the latest release because Apple has to port Java itself to handle differences such as the graphics layer.
All Things Software – Documenting and Archiving
I’ve run into all sorts of approaches to program code documentation in my career. A lead engineer demanded that every line of assembler code be documented. A senior programmer insisted that code should be self-documenting.
By that, she meant using variable names such as save_the_file_to_the_home_directory, and so on. Debugging these programs was a real pain; the first thing you had to do was set up aliases for all the unwieldy names.
The FORTRAN programmers cried when variable names longer than 6 characters were allowed in VAX FORTRAN 77. Personally, I thought it was great. The same with IMPLICIT NONE.
In ancient times, FORTRAN integer variables had to start with I through N; real variables could use the other letters. The IMPLICIT NONE directive told the compiler to shut that off.
All FORTRAN variables had to be in capital letters, but you could stuff strings into integer variables, which I found extremely useful. FORTRAN statement labels were numeric, usually starting at 10 and going up in increments of 10.
At one time Microsoft used Hungarian notation (http://en.wikipedia.org/wiki/Hungarian_notation) for variables in most of their documentation. In this method, the name of the variable indicated its use. For example, lAccountNumber was a long integer.
The IDEs (Eclipse, NetBeans, and others) will automatically create the header comment with a list of variables. The user just adds the proper definitions. (If you’re using Java, the auto comment is JavaDoc compatible, etc.)
Otherwise, Java supports the JavaDoc tool, Python has PyDoc, and Ruby has RDoc.
Personally, I feel that software programs should be read like a book, with documentation providing the footnotes, such as an overview of what the code in question does and a definition of the main variables for both input and output. Module/Object documentation should also note who uses the function and why. Keep variable names short but descriptive and make comments meaningful.
Keep code clean, but don’t go overboard. I worked with one programmer who stated, “My code is so clean you could eat off it.” I found that a little too obnoxious, not to mention overly optimistic as a number of bugs popped out as time went by.
Archiving Code
Version Control Systems (VCS) have evolved as source code projects became larger and more complex.
RCS (Revision Control System) meant that the days of keeping Emacs numbered backup files (e.g. foo.~1~) were over. RCS used the diff concept: it just kept a list of the changes made to a file as its backup strategy.
I found this unsuited for what I had to do – revert to an old version in a matter of seconds.
CVS was much, much better. CVS was in turn replaced by Subversion. But their centralized repository structure can create problems: you check out what you want to work on from a central library and check it back in when you’re done, which can be a slow process depending on network usage and central server availability.
The current favorite is Git. Git was created by Linus Torvalds (of Linux fame). Git is a free, open source distributed version control system. (http://git-scm.com/).
Everyone on the project has a copy of all project files complete with revision histories and tracking capabilities. Permissions allow exchanges between users and merging to a central location is fast.
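A minimal sketch of the day-to-day Git workflow (the repository URL and file names here are made up):

git clone ssh://server/projects/pipeline.git   # everyone gets a full copy, history and all
cd pipeline
git checkout -b fix-parser                     # do the work on a local branch
git add parse_genbank.pl
git commit -m "Handle empty GenBank records"
git push origin fix-parser                     # share the branch when you are ready to merge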
The IDEs (Eclipse and NetBeans) come with CVS and Subversion plug-ins already configured for accessing those repositories. NetBeans also supports Mercurial. Plug-ins for other versioning systems are available on the web; the Eclipse plug-in for Git is available at http://git.wiki.kernel.org/index.php/EclipsePlugin.
System Backup
Always have a plan B. My plan A had IT back up my systems on a weekly to monthly basis, depending on usage. A natural disaster completely decimated my systems. No problem, I thought, I have system backups. Imagine how I felt when I heard that IT had not archived a single one of my systems in over three years! Well, I had a plan B: a mirror of the most important stuff on an old machine and other media. We were back up almost immediately.
The early Tandem NonStop systems (now known as HP Integrity NonStop) automatically mirrored your system in real-time, so down time was not a problem.
Real-time backup is expensive and unless you’re a bank or airline, it’s not necessary.
Snapshot Backup on Linux with rsync
If you’re running Linux, Mac, Solaris, or any Unix-based system, you can use rsync for generating automatic rotating “snapshot” style back-ups. These systems generally have rsync already installed. If not, the source is available at – http://rsync.samba.org/.
This website - http://www.mikerubel.org/computers/rsync_snapshots/ will tell you everything you need to know to implement rsync based backups, complete with sample scripts.
Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically.
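The core trick described on that page is to hard-link unchanged files against the previous snapshot, so every snapshot looks complete but only changed files consume new space. A minimal sketch of the idea using rsync's --link-dest option (the paths are made up):

# rotate the old snapshots, then take a new one against yesterday's copy
rm -rf /backup/daily.2
mv /backup/daily.1 /backup/daily.2
mv /backup/daily.0 /backup/daily.1
rsync -a --delete --link-dest=/backup/daily.1 /home/ /backup/daily.0/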
Acknowledgment – Thanks, Bill!
I want to thank Bill Eaton for his assistance with these blog entries on Effective Bioinformatics Programming. He filled in a lot of the technical details, performed product analysis, and gave me direction in writing these blog entries.
To Be Continued - Part 4
Part 4 will cover relational database management systems (RDBMS), HPC (high performance computing) - parallel processing, FPGAs, clusters, grids, and other topics.
Effective Bioinformatics Programming – Part 2
Instrumentation Programming
Instrumentation Programming usually concerns computer control over the actions of an instrument and/or the streaming or download of data from the device. Instrumentation in the Life Sciences covers data loggers, waveform data acquisition systems, pulse generators, image capture, and others used extensively in LIMS (Laboratory Information Management Systems), Spectroscopy, and other scientific arenas.
Most instruments are controlled by codes called “control codes”. These codes are usually sent or received by a C/C++ program. Some instrumentation manufacturers, however, have a proprietary programming language that must be used to “talk” to the instrument.
Some companies are nice enough to provide information on the structure of the data that comes from their instrument. When they don’t, you may have to resort to good old “reverse engineering”. That’s where the Unix/Linux od utility comes in handy, because a lot of time will be spent poring over hex dumps.
As you can tell, programming instruments requires a lot of patience. This is especially true when everything hangs or gets into a confused state and there is nothing you can do but cycle the power on everything and start over. This is usually accompanied by the banging of keyboards and the muttering of a few choice words.
Development Platforms or IDEs (Integrated Development Environment)
I have to mention development platforms as they can be useful, but also problematic. My favorite is Eclipse (http://www.eclipse.org). Originating at IBM, Eclipse was supported by a consortium of software vendors. Eclipse has now become the Eclipse open source community, supported by the Eclipse Foundation.
Eclipse is a development platform comprising extensible frameworks, tools, and runtimes for building, deploying, and managing software across the lifecycle. You can find plug-ins that will enable you to accomplish just about anything you want to do. A plug-in is an addition to the Eclipse platform that is not included in the base package, such as an Eclipse memory manager or support for debugging a Tomcat servlet.
Sun offers NetBeans (“The only IDE you need.”). I used NetBeans (http://netbeans.org) a lot on the Mac. Previously, Sun offered StudioOne and Creator; I used StudioOne (on Unix) and Creator (on Linux). I haven’t worked with NetBeans lately because it is currently geared mostly toward Swing (GUI) development and is not fully JSF (JavaServer Faces) aware. NetBeans will make a template for JSF but doesn’t (as yet) provide an easy way to create a JSF interface.
There are two main problems with development platforms. For one, the learning curve is fairly steep. There are a lot of tutorials and examples available, but you still have to take the time to work through them.
The best way to use a development platform is to divide the work. One group does web content, one group does database, one group does middleware (the glue that holds everything together), etc. Each group or person can then become knowledgeable in their area and move on or absorb other areas as needed.
The second problem with these tools is that you are stuck with their development approach.
You have to do things a certain way and adhere to a certain structure. Flexibility can be a problem.
This is especially true of interface building. You are stuck with the code the tool generates and the files and file structures created. With most tools, you have to use that tool to access files that the tool created.
IDEs can be useful in that they will perform mundane coding tasks for you. For instance, given a database record, the IDE can use those table elements to generate web forms and the SQL queries driving those forms. You can then expand the simple framework or leave it as is.
Open Source/Free Software and Bioinformatics Libraries
There is a lot of good and not-so-good Open Source code out there for the Life Sciences.
There are several “gotchas” to look out for, including –
Is the code reliable? Are others using it? Are they having problems?
Will the code run on your architecture? What will it take to install it?
What kind of user support is available? What’s the response time?
Is there a mailing list available for the library, package, or project of interest?
There are several bioinformatics software libraries available for various languages. All of these libraries are Open Source/Free Software. Installing them takes a little more than just downloading and uncompressing a package: there are “dependencies” (other libraries, modules, programs, and access to external sites) that must be resident or accessible before a complete build of these libraries is possible.
The following is a list of the most popular libraries and their respective dependencies; a rough sketch of typical install commands follows the list.
BioPerl 1.6.1: Modules section of http://www.cpan.org/
Required modules:
perl => 5.6.1
IO::String => 0
DB_File => 0
Data::Stag => 0.11
Scalar::Util => 0
ExtUtils::Manifest => 1.52
Required modules for source build:
Test::More => 0
Module::Build => 0.2805
Test::Harness => 2.62
CPAN => 1.81
Recommended modules: some of these have circular dependencies
Ace => 0
Algorithm::Munkres => 0
Array::Compare => 0
Bio::ASN1::EntrezGene => 0
Clone => 0
Convert::Binary::C => 0
Graph => 0
GraphViz => 0
HTML::Entities => 0
HTML::HeadParser => 3
HTTP::Request::Common => 0
List::MoreUtils => 0
LWP::UserAgent => 0
Math::Random => 0
PostScript::TextBlock => 0
Set::Scalar => 0
SOAP::Lite => 0
Spreadsheet::ParseExcel => 0
Spreadsheet::WriteExcel => 0
Storable => 2.05
SVG => 2.26
SVG::Graph => 0.01
Text::ParseWords => 0
URI::Escape => 0
XML::Parser => 0
XML::Parser::PerlSAX => 0
XML::SAX => 0.15
XML::SAX::Writer => 0
XML::Simple => 0
XML::Twig => 0
XML::Writer => 0.4
Some of these modules, such as SOAP::Lite, depend upon many other modules.
BioPython 1.53: http://biopython.org/
Additional packages:
NumPy (recommended) http://numpy.scipy.org/
ReportLab (optional) http://www.reportlab.com/software/opensource/
MySQLdb (optional) May be in core Python distribution.
BioRuby 1.4.0: http://www.bioruby.org/
The base distribution is self-contained and uses the RubyGems installer.
Optional packages.
RAA:xmlparser
RAA:bdb
RubyForge:ActiveRecord and at least one driver (or adapter) from RubyForge:MySQL/Ruby, RubyForge:postgres-pr, or RubyForge:ActiveRecord Oracle enhanced adapter
RubyForge:libxml-ruby (Ruby language bindings for the GNOME Libxml2 XML toolkit)
BioJava 1.7.1: http://www.biojava.org/
biojava-1.7.1-all.jar: self-contained binary distribution with all dependencies included.
biojava-1.7.1.jar: bare distribution that requires the following additional jar files. These are required for building from source code. Most are from http://www.apache.org/
bytecode.jar: required to run BioJava
commons-cli.jar: used by some demos.
commons-collections-2.1.jar: demos, BioSQL Access
commons-dbcp-1.1.jar: legacy BioSQL access
commons-pool-1.1.jar: legacy BioSQL access
jgraph-jdk1.5.jar: NEXUS file parsing
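To give a rough idea of what installing these libraries looks like from the command line (exact commands and package names change between releases, so treat these as illustrations rather than gospel):

# BioPerl via CPAN - the CPAN shell pulls in the required modules listed above
perl -MCPAN -e 'install Bio::Perl'

# BioPython from an unpacked source tarball
python setup.py build
python setup.py install

# BioRuby via RubyGems
gem install bio

# BioJava needs no build - just put the all-in-one jar on the classpath
java -cp biojava-1.7.1-all.jar:. MyAnalysis    # MyAnalysis is a hypothetical class of your own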
Don’t forget to sign up for the mailing list for the library or libraries of interest, to get the latest news, problems, and solutions for that library, or just life science topics in general.
Software Hosting and Indexing Sites
There are several Software Hosting and Indexing Sites that serve as software distribution points for bioinformatics software.
SourceForge.net – Search on bioinformatics for a list of software available. Projects include MIAMExpress – http://sourceforge.net/projects/miamexpress/
freshmeat – The Web’s largest index of Unix and cross-platform software
Bioinformatics Organization – The Open Access Institute
Open Bioinformatics Foundation (O|B|F) - Hosts Many Open Bioinformatics Projects
Public Domain Manifesto
In this time of curtailment of civil rights, the Public Domain Manifesto seems appropriate (http://www.publicdomainmanifesto.org/node/8). Sign the petition while you’re there.
This is the end of Part 2. Part 3 will explore more software skills, project management, and other computational topics.
The PLOS Computational Biology website recently published “A Quick Guide for Developing Effective Bioinformatics Programming Skills” by Joel T. Dudley and Atul J. Butte (http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000589).
This article is a good survey that covers all the latest topics and mentions all the currently popular buzzwords circulating above, around, and through the computing ionosphere. It’s a good article, but I can envision readers’ eyes glazing over by about page 3. It’s a lot of computer-speak in a little space.
I’ll add in a few things they skipped or merely skimmed over to give a better overview of what’s out there and how it pertains to bioinformatics.
They state that a biologist should put together a Technology Toolbox. They continue, “The most fundamental and versatile tools in your technology toolbox are programming languages.”
Programming Concepts
Programming languages are important, but I think that Programming Concepts are way, way more important. A good grasp of programming concepts will enable you to understand any programming language.
To get a good handle on programming concepts, I recommend a book. Structure and Interpretation of Computer Programs from MIT Press (http://mitpress.mit.edu/sicp/) is the basis for the introductory computer science course at MIT. It’s called the Wizard Book or the Purple Book.
I got the 1984 version of the book which used the LISP language. The current 1996 version is based on LISP/Scheme. Scheme is basically a cleaned-up LISP, in case you’re interested.
Best of all, the course (and the downloadable book) is freely available from MIT through the MIT OpenCourseWare website – http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-001Spring-2005/CourseHome/index.htm.
There’s a blog entry (http://onlamp.com/pub/wlg/8397) that goes into further explanation about the course and the book.
And just because you can program, it doesn’t mean you know (or even need to know) all the concepts. For instance, my partner for an engineering education extension course was an electrical engineer who was programming microprocessors. When the instructor mentioned the term “scope” in reference to some topic, he turned to me and asked, “What’s scope?”
According to MIT’s purple book: “In a procedure definition, the bound variables declared as the formal parameters of the procedure have the body of the procedure as their scope.”
You don’t need to know about scope to program in assembler, because everything you need is right there. (In case you’re wondering, I consider assembler programmers to be among the programming elites.)
Programming Languages
The article mentions Perl, Python, and Ruby as the “preferred and most prudent choices” in which to seek mastery for bioinformatics.
These languages are selected because “they simplify the programming process by obviating the need to manage many lower level details of program execution (e.g. memory management), affording the programmer the ability to focus foremost on application logic…”
Let me add the following. There are differences in programming languages; by that, I mean compiled vs. scripted. Languages such as C, C++, and Fortran are compiled. Program instructions written in these languages are parsed and translated into object code, a language specific to the computer architecture the code is to run on. Compiled code has a definite speed advantage, but if the main module or any supporting module changes, the affected code must be recompiled and the project relinked. And since the program is compiled into the machine code of a specific computer architecture, portability of the code is limited.
Perl, Python, and Ruby are examples of scripted or interpreted languages. These languages are translated into byte code which is optimized and compressed, but is not machine code. This byte code is then interpreted by a virtual machine (or byte code interpreter) usually written in C.
An interpreted program runs more slowly than a compiled program, since every line must be analyzed as it is read. But the code isn’t particularly tied to one machine architecture, making portability easier (provided the byte code interpreter is present). And because code is only interpreted at run time, extensions and modifications to the code base are easier, making these languages great for beginning programmers or rapid prototyping.
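The difference is easy to see from the shell (the file names here are hypothetical):

# compiled: translate the source to machine code once, then run the binary
gcc -O2 -o align align.c
./align reads.fasta

# interpreted: the runtime re-reads and executes the script every time it is invoked
perl align.pl reads.fasta
python align.py reads.fasta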
But let’s get back to memory management. This, along with processing speed, will be a huge deal in next-gen data analysis and management.
Perl’s automatic memory management has a problem with circularity, because Perl relies on reference counting (CPython also counts references, though it adds a cycle detector; Ruby uses a tracing garbage collector).
If object 1 points to object 2 and object 2 points back to object 1, but nothing else in the program points to either object, their reference counts never reach zero, so the objects never get destroyed. They remain in memory. If these objects get created again and again, you have a memory leak. (Weak references are one way to break such cycles.)
I also have to ask – what about C/C++, Fortran, and even Turbo Pascal? The NCBI Toolkit is written in C/C++, and if you work with foreign scientists, you will probably see a lot of Fortran.
Debugging
You can’t mention programming without mentioning debugging. I consider the act of debugging code an art form any serious programmer should doggedly pursue.
Here’s a link to an ebook, The Art of Debugging – http://www.circlemud.org/cdp/hacker/. It’s mainly Unix-based, C-centric, and a little dated, but good stuff never goes out of style.
Chapter 4, Debugging: Theory explains various debugging techniques. Chapter 5 – Profiling talks about profiling your code, or determining where your program is spending most of its time.
He also mentions core dumps. A core file is what Unix/Linux writes out when your C/C++/Fortran program crashes. You can examine this core to determine where your program went wrong. (It gives you a place to start.)
The Linux Foundation Developer Network has an on-line tutorial – Zen and the Art of Debugging C/C++ in Linux with GDB – http://ldn.linuxfoundation.org/article/zen-and-art-debugging-cc-linux-with-gdb. They write a C program (incorporating a bug), create a make file, compile, and then use gdb to find the problem. You are also introduced to several Unix/Linux commands in the process.
You can debug Perl by invoking it with the -d switch, which starts Perl’s interactive debugger; even without it, Perl usually reports the line number that caused the problem and some explanation of what went wrong.
The -d option turns on parser debugging output for Python; for interactive debugging, Python provides the pdb module.
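A quick sketch of the command-line side of all this (program, file, and variable names are hypothetical):

# compile with debug symbols, then run the program under gdb
gcc -g -o parser parser.c
gdb ./parser
(gdb) run input.dat
(gdb) backtrace            # show where the crash happened
(gdb) print record_count   # inspect a variable

# the script-language equivalents
perl -d parse_genbank.pl input.gb
python -m pdb parse_genbank.py input.gb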
Octal Dumps
One of the most useful utilities in Unix/Linux is od (octal dump). You can examine files as octal (the default), hex, or ASCII characters.
od is very handy for examining data structures, finding hidden characters, and reverse engineering.
If you think your code is right, the problem may be in what you are trying to read. Use od to get a good look at the input data.
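For example (the data file name is a placeholder):

od -c mystery.dat | head            # ASCII view, good for spotting hidden control characters
od -A x -t x1z mystery.dat | less   # hex offsets and hex bytes, with printable text alongside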
That’s it for Part 1. Part 2 will cover Open Source, project management, archiving source code and other topics.

I was going through a box of textbooks last week and stumbled upon a copy of the Enron Code of Ethics. I have another one stored away with a form, signed by Ken Lay, that states I have read and will comply with the Enron Code of Ethics.
I was employed at Enron from 2000 through 2002 and was there when the wheels came off. Our department was left intact. Otherwise, whole floors of the Enron building were vacated. It really was a shame, because Enron was a great place to work. Several friends and acquaintances lost most of what they had because of the malfeasance of a greedy few.
This had to be the most blatant example of unethical conduct in the workplace I ever encountered. There were others that appeared minor but ended up costing companies money and talent. Most of these losses were the result of mismanagement rather than outright unethical behavior. But, then again, is mismanagement itself unethical?
A book I read recently, “A Small Treatise on the Great Virtues: The Uses of Philosophy in Everyday Life” by Andre Comte-Sponville (Metropolitan Books), talks about truth as “Good Faith”.
He states on page 196 that good faith requires “at the very least that one speaks the truth about what one believes, and this truth, even if what one believes is false, is no less true for all that. Good faith, in this sense, is what we call sincerity (or truthfulness or candor) and is the opposite of mendacity, hypocrisy, and duplicity, in short, the opposite of bad faith in all its private and public forms.”
In my position at a major hardware/software developer I was told that I “didn’t need to know about a product to sell it.”
At another position, I found that a few fraudulent claims by a contractor caused a company to fork over three quarters of a million dollars for custom software when a fifteen-thousand-dollar piece of hardware would have enabled an already existing piece of commercial software to do the job. With more accurate accounting of the data, I might add.
In fact, the whole program was completely mismanaged, to the detriment of the company, not the contractor. He was already set for the next program, having gotten one of his engineers hired in to head up that project, an engineer who didn’t have the slightest idea about our system, much less its theory. Thankfully, we got him transferred out of there and back to design where he belonged. The contractor was kicked out of the company.
These are straightforward examples of bad faith. The following are a little harder to classify.
Beware the ulterior motive, especially if the new system you are proposing will impose on someone’s fiefdom.
Data analysis for the existing program consisted of placing a request with a data analysis group and waiting up to 3 days for results. The system proposed (and later deployed) gave each and every engineer access to an analysis application that they could use to inspect the data one and a half hours after a particular test cycle was completed. A little training and they were ready to go.
Countless hours were spent in useless meetings defending the system. Everybody shut up when the system came up on day one and stayed up through months of testing.
This test/record/analysis cycle fits perfectly into the Laboratory Information Management Systems (LIMS) cycle of genomics research. A successful LIMS implementation in one lab aroused the ire of yet another lab attempting to develop their own solution. Let’s just say a lot of bad faith erupted.
The real loser in the above examples is the company. Money is wasted and talented people go elsewhere.
Biotechnology is a hot commodity right now. Stimulus funding is bringing fresh capital to many projects. Companies are leveraging existing corporate products by repackaging them as biotech-ready.
National Instruments LabView is one of these. I used it a lot in engineering. Now it’s a big player in the lab, incorporating interfaces for research lab instrumentation.
What is a LIMS (Laboratory Information Management System)? Is it an inventory management system? Is it a data pipeline? Can one size fit all?
Some companies have taken existing Inventory Management Systems and relabeled them as a Laboratory Information Management Systems. (At least the acronym fits.) Most of these systems don’t distinguish between research and manufacturing environments. They also don’t support basic validation of the LIMS application for its intended purpose. No wonder some 80% of LIMS users are dissatisfied.
At a recent conference I talked with researchers from various pharmaceutical companies and they were thoroughly dissatisfied with their LIMS systems. One scientist stated that they had a problem with their LIMS. When they went to report the problem, they found the company was no longer in business.
The latest IT (Information Technology) trends – SaaS, Cloud computing – may work in a business environment, but they won’t translate well to a pharmaceutical research area where they want everything safe behind the firewall.
There are many, many factors that go into developing biotechnology applications. Getting the right people, controlling the political environment, finding or developing the right software – it’s a jungle out there.
Keep to Good Faith and please be careful.
Today, one in ten engineers is a woman (http://www.dol.gov/wb/factsheets/hitech02.htm). In avionics, the proportion is even lower.
This is really a shame, because I find that women are extremely well suited for jobs in high tech careers.
Here’s a short list of reasons why I think this is true, along with explanations.
- Women are more patient and determined
- Women can juggle a lot of tasks simultaneously
- Women can attend to small details and see the big picture at the same time
- Women don’t get derailed by the small stuff
- Women have a better support system.
- Women are more sympathetic and understanding
I’ll stop at this group of six, although I could add a few more. They are not true of all women, but that’s probably because they haven’t had the experience.
Just take a look at what current society expects of women and I think you’ll see why I think women are more patient and determined! Case in point, I just got an email on “How to Create Perfect Eyes” through makeup application. Can you imagine a heterosexual male having the patience to take the time to apply all the goop we women have to put on our faces to be seen in public? Also, remember how determined we were to walk in high heels so we could pretend we were grown-ups?
Programming, system design, and integration require patience and determination. It’s a step-by-step process. All the pieces have to work together to produce the correct outcome. It’s no different than making a dish from a recipe, although in most cases you’ll have only your experience to formulate the list of ingredients and the right steps to finish the job.
Think about getting the family ready for school/work in the morning. How many things are you trying to do at once? Multi-tasking is standard operating procedure for most women, who can adapt to chaos in the blink of an eye.
I know chaos. Besides being the oldest of nine children (5 girls, 4 boys), I drove a school bus for about 4 years while I was attending college. I was given a long, country route that paid well and gave me enough hours to qualify for health insurance. After I had driven the route for about six weeks, my supervisor asked me how I was doing and what I thought of the kids. I said I thought I was doing okay and the kids were a little rowdy, but we got that under control. Otherwise, I said, the kids were a bright bunch and generally inquisitive about everything. (“Miss Pam, what’s a hickey? Our teacher says it’s something you get in dominoes.”)
I found out later that these kids had been through 4 bus drivers in 4 weeks. The last day of that school year the kids on the route gave me a plaque that said “World’s Best School Bus Driver”. I was impressed, even though they misspelled my name.
I’ve discovered that women, as a whole, perform better on mission-critical tasks that require a lot of concentration and coordination of several activities that have to occur simultaneously.
I couldn’t make a practice session for a particular field test, so the guys were going to fill in for me. I heard that it took them an extra long time to get started, because they couldn’t figure out how to calibrate the instrumentation. (They took the same training class that I did!) Let’s just say that they were more than happy to let me take over the operation after they were introduced to all the steps involved in the pre and post fly-over operations.
Lots of tasks mean lots of details to keep track of with almost no time to double-check anything. Women do this sort of thing all the time. Think about putting together a meal, folding clothes fresh from the dryer, putting on makeup. You don’t really think about it, you just do it. Juggling home, family, and career by itself is one big accomplishment.
We took two years to perfect all the pieces that made up the testing for the 727QF certification. We worked out the weather station in Roswell, NM. We took the acoustic analyzer to Moses Lake, WA, to work out the routine we needed for testing. (Desert dust at 35 knots is no fun, but it can’t hold a candle to the volcanic ash from Mount St. Helens that we ran into in Moses Lake. They got about a foot of ash from that explosion, and the ash was dumped at the airport. Right where we were working!)
The only missing piece was the data download from the data logger on the meteorological (met) plane.
I sat under the wing of the small Cessna in the hot Texas August heat with a laptop atop my crossed legs, dodging fire ants, as I worked out the best method for our technician to save the data acquired after each run of the met plane. I got it down to a few steps, ran through it with him, and we had the met data canned.
All those pieces, met plane, weather station, acoustic analyzer, and DAT (digital audio tape) data, were part of the big picture that was noise testing. The other parts were the ground support systems: data download, availability, and analysis. There was so much data flowing through the pipeline that we held a meeting every morning to discuss who needed what, how much, how they wanted it, and what data could be taken to archive.
The next-gen sequencing efforts are producing an astronomical amount of raw data. Data that has to be stored, analyzed, and archived, creating one complex system. It’s a massive task and one I can sympathize with.
Women don’t get derailed by the small stuff.
Maybe this wasn’t so small, and sometimes it hit close to home, but a lot of the things I did got satirized via a cartoon or paste-up on bulletin boards all over the plant on the 727QF program.
For instance, I developed this relational database model that would store measurement information for the two aircraft we were testing.
One of the technicians had started his own local database, but he had no understanding of relational data concepts. So he had thermocoupleA and thermocoupleB, where A represented one aircraft and B represented the other. The thermocouple in question was the same on both aircraft, causing duplicate records for the same part info.
At an informal meeting we were having in the instrumentation lab, I said that his database design was stupid because we didn’t need more than one copy of the part’s basic attributes. The next day there was a flyer on the bulletin boards with a picture of the tech with a bubble over his head that said, “I stupid.”
There was some other verbiage, “Coming soon son of stupid. When relational is not enough.”

Stupid Database Flyer
Since the technician was a friend, this was funny. There were others that weren’t so entertaining.
I think women are more sympathetic and understanding of other people. The problem is to not be so understanding that you are taken for a ride.
As a support system, we have probably the best weapon in the arsenal: we can cry. Not in public, not on the job, but we can go somewhere private and cry. Sometimes this is the only way to get it out of your system.
I put a lot of dents in a lot of old hardware and ran miles and miles, but, sometimes, even that did not cover it.
I will end by saying that I was pleasantly surprised at the number of women involved in the life sciences. By this, I mean as directors, P.I.’s, or other positions of power. However, men in the field still earn one-third more than the women.
Maybe one day, women will wield as much power in all branches of technology, and their paychecks will actually reflect this status.
BioCamp 2009 at Rice University
Bill and I attended BioCamp 2009 at Rice University on Saturday, Sept. 12. There were several presentations followed by lively question and answer sessions.
The attendance consisted of entrepreneurs, those seeking guidance on turning their ideas and research into viable products, consultants searching for marketable products, and members of the legal profession offering advice on intellectual property, patents, trademarks, and the like.
A Commentary piece, “Science Communication Revisited”, in the June, 2009, issue of Nature Biotechnology discusses increasing public involvement in science issues and decision-making.
Concerns are raised about the state of science education and scientific literacy more generally.
If only the public were more knowledgeable about things scientific, the article states, they would see things through the eyes of the expert.
Education
I was fortunate enough to have attended private schools from elementary through high school. Very few children are so lucky.
My biology teacher was a Catholic nun. She introduced us to Teilhard de Chardin (http://en.wikipedia.org/wiki/Pierre_Teilhard_de_Chardin).
Teilhard was a Jesuit priest who was trained as a paleontologist and geologist and took part in the discovery of Peking Man. He also studied botany and zoology. His book, The Phenomenon of Man, talks about the unfolding of the material cosmos towards the Omega Point, a maximum level of consciousness, that is pulling all creation towards it. Evolution, according to Teilhard, was the process of matter becoming aware of itself.
Therefore, I was able to receive a fairly sound exposure to evolution. On the other hand, the chapters on male and female biology and the reproductive process were ripped out of my textbook.
(I know, because we found an unaltered book and read that forbidden text.)
At any rate, I grew up in an agricultural environment and knew what it was all about.
If you’re interested in the state of scientific education or education in public schools in Texas, I recommend the Texas Freedom Network (http://txfree.convio.net/site/PageServer ).
Experts
Concerning experts, I remember my section chief telling me, “You have to forgive Bryan, he still believes in experts.” Bryan was our lead engineer.
As far as experts go, you have to be able to separate the good from the bad.
I recommend this article, Crap Detection 101 (http://www.sfgate.com/cgi-bin/blogs/rheingold/detail?entry_id=42805) and the CRAP Test (http://www.workliteracy.com/the-crap-test).
The CRAP test is a way to evaluate an internet source based on the following criteria: Currency, Reliability, Authority and Purpose/Point of View.
The article and test’s main focus is the internet — how to tell real from bogus. It’s not too hard to extrapolate the points they make to everyday life.
Scientific Literacy
Science and technology are changing so rapidly that many people have simply given up on trying to keep up. Their scientific literacy consists of newspaper articles or blurbs on the TV news.
A lot of what is presented as science on network television is implausible (not to mention the technology used on these shows).
I think that to really succeed, real scientists must pay attention to what is presented to the general public and critique it as much as possible through letters to the editor, blogs, appearances, and so on.
Scientists should also keep an open mind about the intelligence of their audience.
We have way too many people with 200-point IQs digging ditches in this country. We spend an inordinate amount of funds and interest on educating special children. We should be spending just as much time and money (if not more) identifying and encouraging the geniuses among us who find education boring and quickly lose interest.
The interest in science is out there, but scientists must take an interest in how what passes for science is disseminated, validate or invalidate that science, identify the appropriate target audience, and address that audience at its level to really open up the forum on true scientific communication.
Salon, an e-zine, has a really good article, “Why America is flunking science” (http://www.salon.com/env/feature/2009/07/13/science_illiteracy/?source=newsletter), that is worth the read.
Here’s another link where the author lists current “myths” surrounding scientists engaging with the media.
http://scienceblogs.com/christinaslisrant/2009/07/when_discussing_scientists_eng.php
The End of Bioinformatics?!
I read with some interest the announcement of Wolfram Alpha. Wolfram Alpha intends to be the be-all and end-all of data mining systems and, some say, makes bioinformatics obsolete.
Wolfram Alpha’s basis is a formal Mathematica representation. Its inference engine is a large number of hand-written scripts that access data that has been accumulated and curated. The developers stress that the system is not Artificial Intelligence and is not aiming to be. For instance, a sample query,
“List all human genes with significant evidence of positive selection since the human-chimpanzee common ancestor, where either the GO category or OMIM entry includes ‘muscle’”
could currently be executed with SQL, provided the underlying data is there.
Wolfram won’t replace bioinformatics. What it will do is make it easier for a neophyte to get answers to his or her questions because they can be asked in a simpler format.
I would guess Wolfram Alpha uses one or more of these scripts to address a specific data set, in conjunction with a natural language parser. These scripts would move the data into a common model that could then be presented on a web page.
But why not AI? Why not replace all those “hand-written” scripts with a real inference engine?
I rode the first AI wave. I was a member of the first group of 25 engineers selected to be part of the McAir AI Initiative at McDonnell Aircraft Company (“There is AI in McAir”). In all, 100 engineers were chosen from the engineering departments to attend courses leading to a Certificate in Artificial Intelligence from Washington University in St. Louis.
One of the neat things about the course was the purchase of at least 30 workstations (maybe as many as 60) from a young company called Sun, which were loaned to Washington University for the duration of the course. Afterwards, we got a few Symbolics machines for our CADD project.
Other than Lisp and Prolog, the software we used was called KEE (Knowledge Engineering Environment). Also, there was a DEC (Digital Equipment Corporation) language called OPS5.
The course was quite fast-paced but very extensive. We had the best AI consultants available at the time lecture and give assignments in epistemology, interviewing techniques, and so on. I had a whole stack of books.
The only problem was that no money was budgeted (or so I was told) for AI development in the departments the engineers returned to, eager to AI everything. A lot of people left.
Anyway, my group of three developed a “Battle Damage Repair” system that basically “patched up” the composite wing skins of combat aircraft. Given the size and location of the damage, the system would certify whether the aircraft would be able to return to combat, and would output the size and substance of the patch if the damage wasn’t that bad.
One interesting tidbit: We wanted to present our system at a conference in San Antonio and had a picture of a battle-damaged F-15 we wanted to use. Well, we were told that the picture was classified and, as such, we couldn’t use it. About that same time, a glossy McAir brochure featuring our system and that photo was distributed at the AAAI (American Association for Artificial Intelligence) conference to thousands of people.
Another system I developed dealt with engineering schematics. These schematics were layered. Some layers and circuits were classified. Still another system scheduled aircraft for painting and yet another charted a path for aircraft through hostile territory, activating electronic counter measures as necessary.
I guess the most sophisticated system I worked on was with the B-2 program. The B-2 skin is a composite material. This material has to be removed from a freezer, molded into a final shape, and cooked in a huge autoclave before it completely thaws.
We had to schedule materials, and the behavior of that material under various circumstances, as well as people and equipment. The purpose was to avoid “bottlenecks” in people and equipment. I was exposed to the Texas Instruments Explorer and Smalltalk-80 on an Apple. I’ve been in love with Smalltalk ever since.
The system was developed, but it was never used. The problem was that we had to rank workers by expertise; these were union workers, and that wasn’t allowed.
It was a nice system that integrated a lot of subsystems and worked well. Our RFP (Request for Proposals) went out to people like Carnegie Mellon. We had certain performance and date requirements that we wanted to see in the final system. We were told that the benchmarks would be difficult, if not impossible, to attain. Well, we did it, on our own, without their help.
We also had a neural net solution that inspected completed composite parts. The parts were submerged in water and bombarded with sound waves. The echoes were used by the system to determine part quality.
AI promised the world, and then it couldn’t really deliver. So it kind of went to the back burner.
One problem with being the be-all and end-all: it will only be as good as your model, and only as good as the developers’ understanding of how the parts behave and interact with the whole. Currently, this is a moving target, changing day to day. Good luck.
Links -
Will Wolfram Make Bioinformatics Obsolete? - http://johnhawks.net/weblog/reviews/genomics/bioinformatics/wolfram-alpha-bioinformatics-2009.html
The most complex systems I’ve configured were the airborne data acquisition and ground support systems. However, not many people have to, or want to, do anything that large or complex. Some labs will need info from thermocouples, strain gauges, or other instrumentation, but most of you will be satisfied with a well-configured system that can handle today’s data without a large cash outlay and can be expanded at minimum cost to handle the data of tomorrow.
This week’s guest blogger, Bill Eaton, provides some guidelines for the configuration of a Database Server, a Web Server, and a Compute Node, the three most requested configurations.
(Bill Eaton)
General Considerations
Choice of 32-bit or 64-bit Operating System on standard PC hardware (a quick way to check what you are running is shown after this list)
- A 32-bit operating system limits the maximum memory usage of a program to 4 GB or less, and may limit maximum physical memory.
- Linux: for most kernels, programs are limited to 3 GB. Physical memory can usually exceed 4 GB.
- Windows: The stock settings limit a program to 2 GB, and physical memory to 4 GB.
The server versions have a /3GB boot flag to allow 3 GB programs and a /PAE flag to enable more than 4 GB of physical memory.
Other operating systems usually have a 2 or 3 GB program memory limit.
- A 64-bit operating system removes these limits. It also enables some additional CPU registers and instructions that may improve performance. Most will allow running older 32-bit program files.
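A quick way to check what you are running from a Linux shell:

getconf LONG_BIT   # prints 32 or 64 for the running operating system
uname -m           # hardware architecture, e.g. x86_64
free -g            # physical memory and swap, in gigabytes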
Database Server:
Biological databases are often large, 100 GB or more, often too large to fit on a single physical disk drive. A database system needs fast disk storage and a large memory to cache frequently-used data. These systems tend to be I/O bound.
Disk storage:
- Direct-attached storage: disk array that appears as one or more physical disk drives, usually connected using a standard disk interface such as SCSI.
- Network-attached storage: disk array connected to one or more hosts by a standard network. These may appear as network file systems using NFS, CIFS, or similar, or physical disks using iSCSI.
- SAN: includes above cases, multiple disk units sharing a network dedicated to disk I/O. Fibre Channel is usually used for this.
- Disk arrays for large databases need high I/O bandwidth, and must properly handle flush-to-disk requests.
Databases:
- Storage overhead: data repositories may require several times the amount of disk space required by the raw data. Adding an index to a table can double its size. A test using a simple, mostly numeric table with one index gave these overheads (on-disk size as a multiple of the raw data size) for some common databases:
- MySQL using MyISAM: 2.81
- MySQL using InnoDB: 3.28
- Apache Derby: 5.88
- PostgreSQL: 7.02
- Data Integrity support: The server and disk system should handle failures and power loss as cleanly as possible. A UPS with clean shutdown support is recommended.
Web Server and middleware hosts:
A web server needs high network bandwidth, and should have a large memory to cache frequently-used content.
Web Service Software Considerations:
- PHP: Thread support still has problems. PHP applications running on a Windows system under either Apache httpd or IIS may encounter these. We have seen a case where WordPress running under Windows IIS and Apache httpd gave error messages but worked without problems under Apache httpd on Linux; IIS FastCGI made the problem worse. PHP acceleration systems may be needed to support large user bases.
- Perl: similar thread support and scaling issues may be present. For large user bases, use of mod_perl or FastCGI can help.
- Java-based containers: (Apache Tomcat, JBoss, GlassFish, etc) These run on almost anything without problems, and usually scale quite well.
Compute nodes:
Requirements depend upon the expected usage. Common biological applications tend to be memory-intensive. A high-bandwidth network between the nodes is recommended, especially for large clusters. Network attached storage is often used to provide a shared file system visible to all the nodes.
- Classical “Beowulf” cluster: used for parallel tasks that require frequent communication between nodes. These usually use the MPI communication model (a minimal job launch is sketched after this list), and often have a communication network tuned for this use, such as Myrinet. One master “head” node controls all the others, and is usually the only one connected to the outside world. The cluster may have a private internal Ethernet network as well.
- Farm: used where little inter-node communication is needed. Nodes usually just attach to a conventional Ethernet network, and may be visible to the outside world.
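As a rough illustration of the MPI model on a Beowulf-style cluster (assuming an Open MPI installation; the host and program names are made up):

# list the compute nodes and the slots (cores) available on each
cat > nodes.txt <<EOF
node01 slots=8
node02 slots=8
node03 slots=8
node04 slots=8
EOF

# from the head node, launch 32 copies of the program across those nodes
mpirun -np 32 --hostfile nodes.txt ./blast_wrapper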
The most important thing is to have a plan from the beginning that addresses all the system’s needs for storage today and is scalable for tomorrow’s unknowns.
Data Stewardship -
The Conducting, Supervising, and Management of Data
Next-gen sequencing promises to unload reams and reams of data on the world. Pieces of that data will prove relevant to one or another of the specific research projects in your enterprise. At the same time, your lab may produce more data through annotation or its own research. How do you handle it all?
First, you should appoint a data steward. This person must understand where the data comes from, how it is modeled, who uses what parts of it, and any results this data may produce, such as forms, etc. Most importantly, they must be able to verify the integrity of that data.
Data, Data, Data
I’ve handled lots of engineering and bioinformatics data in my time…
In engineering, I had to be sure all instrumentation was calibrated correctly and production data was representative or correct. Every morning at 7 a.m., I held a meeting with data analysts, system administrators, database representatives, etc. focused on who was doing what to which data, what data could be archived, what data should be recovered from archive, and so on. This data inventory session proved to be extremely useful as there were terabytes of data swept through the system on a weekly basis.
For bioinformatics, I had to locate and merge data from disparate sources into one whole and run that result against several analysis programs to isolate the relevant data. That data was then uploaded to a local database for access by various applications. As the amount of available sequence data grew, culling the data, storage of this data, and archiving of the initial and final data became something of a headache.
My biggest bioinformatics problem was NCBI data, as that was how we got most of our data.
I spent weeks/months/years plowing through the NCBI Toolkit, mostly in debug. Grep became my friend.
I tried downloading complete GenBank reports from the NCBI FTP site, but that took too much space. I used keywords with the Entrez eutils, but the granularity wasn’t fine enough and I ended up with way too much data. Finally, I resorted to running the NCBI Toolkit directly on NCBI ASN.1 binary files.
LARTS would have made this part so much easier.
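For reference, a single Entrez eutils fetch from the command line looks something like this (the accession is just an example; assumes curl is installed):

curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NM_000546&rettype=gb&retmode=text" > NM_000546.gb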
The Data Steward should also be familiar with data maintenance and storage strategies.
Our guest blogger, Bill Eaton, explains the difference between backup and archiving of data, and lists the pros and cons of various storage technologies.
Bill Eaton: Data Backup and Archival Storage
Backups are usually kept for a year or so, then the storage media is reused.
Archives are kept forever. Retrievals are usually infrequent for both.
Storage Technologies
Tape: suitable for backup, not as good for archiving.
- Pro: Current tape cartridge capacities are around 800 GB uncompressed. Cost per bit is roughly the same as for hard disks.
- Con: Tape hardware compression is ineffective on already-compressed data. Tapes and tape drives wear out with use. Software is usually required to retrieve tape contents (tar, cpio, etc). Tape technology changes frequently, so formats have a short life.
Optical: better for archiving than backup.
- Pro: Capacities are 8.5 GB for DVD and 50 GB for Blu-ray. DVD contents can be a mountable file system, so that no special software is needed for retrieval. Unlimited reading, no media wear. Old formats are readable in new drives.
- Con: Limited number of write cycles.
Hard Disks: could replace tape.
- Pro: Simple: use removable hard disks as backup/archive devices. Disk interfaces are usually supported for several years.
- Con: Drives may need to be spun up every few months and contents rewritten every few years.
MAID (Massive Array of Idle Disks): a disk array in which most disks are powered down when not in active use.
- Pro: The array controller manages disk health, spinning up and copying disks as needed. The array usually appears as a file system. Some can emulate a tape drive.
- Con: Expensive.
Classical: the longest-life archival formats are those known to archaeologists.
- Pro: Symbols carved into a granite slab are often still readable after thousands of years.
- Con: Backing up large amounts of data this way could take hundreds of years.
Women in Flight Test
I spent some 10 plus years in engineering. As a woman in engineering it was daunting. As a woman in Avionics Flight Test, it was even more so.
I was working as a systems programmer at McDonnell Douglas (now Boeing) in St. Louis, Missouri. Our project consisted of 6 compilers that supported a completely integrated database system for hospitals. A woman on our team was married to a section chief who worked at McDonnell Aircraft. McAir (as it was commonly called) manufactured the F-4, F-18, and F-15 aircraft. Since our project was in trouble, she mentioned that her husband said Flight Test was looking for someone who could develop database systems and do other programming for the Ground Support Systems (GSS) unit.
At my interview, I was told, “We manufacture high-tech war machines that might kill innocent women and children. So, I don’t want any wimps or pacifists working for me.”
I wondered at the time if my interviewer was wearing camo underwear.
I was accepted. At the time, I was one of the first (if not the first) women professionals working at McAir Flight Test.
My desk was on the fourth floor on the West end of a hangar. Flight Test had offices and labs on the East and West ends of the hangar. Since our unit was the first set of desks one encountered when entering our area, I was assumed to be a secretary and asked all sorts of questions. The fix was to turn my desk around so everybody was looking toward the interior and my desk was facing the window. I got an up-close look at the planes taking off and landing, as they were just clearing the building. Takeoffs were bad because the fumes from the jet fuel were overwhelming. I would have to ask the person I was talking to on the phone to hold because I couldn’t hear them over the noise.
The hush houses were less than 100 yards away. You couldn’t hear the noise (that’s what hush houses do), but the thud-thud-thud vibration of the engines became a bit much at times. Oh yeah, our hangar sat right on top of the Flight Test fuel dump.
The first project was to develop a database system to track the equipment used by Flight Test. The original system consisted of a 6-ft. by 20-ft. rack of index cards in pull-down trays. Each piece of equipment had a card which stated what it was, where it was, etc.
Every time a Flight Test program was scheduled, all the parts for that program had to be tagged and their index cards updated. This would take anywhere from 3 days to a week under the old card system.
The new system did it in a few hours and produced timely reports on all parts and their locations.
The next system was a database for Flight Test electronics, such as Vishay resistors.
Another project was providing the documentation and training aids for the Digital Data Acquisition System for the F-18. This system directly connected the plane’s computer system and uploaded mission information and, subsequently, downloaded mission results.
The old system was called the “taco wagon”. It was a large roll-about cart. These carts cost about $300K and used a card reader to upload mission info.
Our system replaced the “taco wagon” with an early Compaq laptop that cost about $3K.
Then Flight Test submitted my name as one of two people for a special program. The other person from Flight Test was an ex-Air Force major who flew F-15s.
I went through the program and upon returning to Flight Test was asked to make a presentation to our executives.
The major and I made our presentation and opened it up to questions. Our VP asked the major a question that started with the phrase, “As Flight Test’s designated expert in this area…”.
Later, I told my section chief what happened. He said and I quote, “Flight Test is not ready for a woman to be expert at anything.”
These are two of the most glaring examples. There were lots of others.
However, when I left, the VP said, “We will tell your story around the campfires.”
I took that as the highest compliment.
After Flight Test, I worked on AI projects before returning to Flight Test.
The Dee Howard Company in San Antonio ran an ad in Aviation Week for Flight Test Engineers. I answered the ad. Alenia (Italy's largest aerospace company) had a stake in Dee Howard. They were taking on a new project, the UPS 727QF. The FAA had mandated that all cargo aircraft had to cut their engine noise levels. UPS decided to re-engine its 727 aircraft with new, quieter Rolls-Royce Tay 650 engines. Dee Howard was to do the work and conduct the testing.
At the time of the contract, the count of planes to be re-engined was given as 60-plus. The number actually re-engined has been variously reported as 44 or 48.
Previously, Dee Howard was known for customizing aircraft interiors. The interior of the 747 that NASA uses to ferry the Space Shuttle was done by the company. They also fitted an emir's 747 with a movie studio, solid-gold toilet fixtures, and a complete operating room. The tale was that the emir, who had a bad heart, had a living heart donor traveling with him at all times. Anyway, it makes a nice story.
I was hired in and went to work on the data acquisition system for the new program.
We were the first to replace all three engines on the 727. Previously, only the two external engines had been replaced; the tail engine was left as is.
We were to have two planes. The critical plane was to have a new data acquisition system. The other plane was to use a system from Boeing - ADAS. Originally designed in 1965, ADAS had 64K of memory, filled half a good-sized room, used 8-inch diskettes, and had its measurements programmed by way of EEPROM.
The new acquisition system was better. We bought a ruggedized cabinet and started adding boards. PC-on-a-chip wasn't quite there yet, but we did have PC-on-a-board, and we could set things up via a PC interface. To analyze the PCM data stream, I used BBN/Probe instead of the custom software that had been used on the previous system.
First flight came. The system came up and stayed up. Except for the one time the flight engineer turned the system on before power was switched from the APU to the aircraft (the 8-mm tape recorder died), it worked every time.
On the fighters, flight test equipment was mounted on pallets in the bomb bay. It was neat to ride on the plane during a test flight. An airplane, with all the seats and padding removed, is your basic tin can.
I always got along with the technicians. They are the ones who do the real work. They make the engineer’s design come to life or markedly point out the error of his ways.
It was really nice to ask for such and such a cable or gadget and have it brought to my desk. The best (and worst) part of the program was field testing. I got to go to lively places like Roswell, NM, Moses Lake, WA, and Uvalde, TX with 35 guys. The length of the test depended on flying conditions. We were usually stuck there for 3-4 weeks.
We also did some testing at home. For one ground test, we taped tiny microphones to different places on the engines. The microphones were connected to the acoustic analyzer and DAT recorders. The engines were then run at various levels. I ran the acoustic analyzer for a few seconds for one set of mikes and flipped to record another mike set for a few seconds more. I had to wear headphones for the noise. We had to yell anyway. It was really hot, because this was San Antonio in July. We were on an unused runway at the airport, next to a well-traveled road. The test took several hours. The guys took the door off the head (which was right across from the open front access door) so I could watch the traffic when I used it. (Did I tell you I had to clean the head when the test was over?)
As the revs got higher, the airplane moaned and groaned. One engine finally belched. We were lucky it didn’t catch fire!
Other testing conditions were just as much fun. Roswell has the desert. Desert dust at 35 knots is awful. Tumbleweeds have nasty stickers. Moses Lake had volcanic ash. Mount St. Helens dumped about a foot of ash at the Moses Lake airport. Airport officials dumped the collected ash on a spot at the airport that they thought was unused. One of our trucks got stuck in it.
At Uvalde, we had heat and gnats. You inhaled them and they flew down your throat.
A local asked where the women went to the bathroom because there weren’t any trees.
Other than the conditions, there was the schedule. The equipment had to be set up, calibrated, and ready to go at sun-up. If conditions were good, we worked all day with a break for lunch and put everything away after dark.
Wake-up was 3 or 4 a.m. We usually got back to the motel at dark, after we prepped the plane. It got to the point of choosing between going out to get dinner and getting an extra hour's sleep.
The testing was fun, too. The plan was to fly over at different altitudes carrying varying weights. (We had to unload 14,000 pounds of ballast at one point, consisting of 50-lb. round lead weights with handles on each side. I took my place in line with the guys. Same thing with the car batteries for the transponders and loading and unloading the generator from the truck.)
The locals thought we might be flying in drugs, so they called the law, and the local sheriff came to call.
The testing, when in progress, was intense. After set-up, the microphones were calibrated. We had mikes at the centerline and other mikes on the periphery. I ran the acoustic analyzer.
I set the analyzer to trigger on a signal from the aircraft and turn on the tape recorders. After the fly-over I had to download the data, pass it off to an engineer who analyzed it via a curve-fit program, and reset everything for the next fly-over.
The fly-overs came one after the other about 5-7 minutes apart. We had to re-calibrate the mikes after a few, so we got an extra 5-10 minutes. We got a break for lunch (with the gnats).
It was hard, dirty work. But it was fun - and dangerous. One test consisted of engine stalls on a 30-year-old aircraft at 19,000 ft. (it was too turbulent down below). Another test had the aircraft stall during takeoff with different loads. I was on board and loving it.
My supervisor said that “field testing separates the men and women from the boys and girls.” He was right.
One day we had a visitor in the lab. One of the techs was working on something and let loose a string of expletives. The visitor said the tech should be quiet because there was a lady present. The tech looked at the visitor and said, “That’s no lady, that’s Pam!” (You had to know the guys. I took it as a compliment.)
If you haven’t guessed, as a woman working in this environment, you have to have a thick skin. You have to work really hard because you have to be really, really good. But I think it’s all worth it.
It’s too easy to become a “he-she” or a “shim” (dress and act like the guys), but I didn’t. I wore my hair longish and always wore make-up. Even in the field. I laid out my clothes the night before. I could be up and ready to go in 5 minutes, complete with mascara, eye liner, sunscreen and blush. I always had my knitting nearby.
It was hard work, but it all paid off. The UPS plane was certified the latter part of 1992.
I’ve got some memories, some good, some not, but I know I made the grade.
Addendum -
The group working on this project was international in scope, and I worked closely with most of them. We had several people from the British Isles representing Rolls-Royce, including J., who spoke with a fine Scottish accent.
I worked with two engineers from Alenia on the acoustics aspect of the program. They hailed from Naples, Italy. M. spoke excellent English. E. didn’t, but engineering is universal, so we were able to make it work. Another acoustic team member was a Russian, L. His English was also excellent.
I can honestly say that I know Fortran in Russian and Italian. I had to grab whatever Fortran text I could find in a pinch, and the Russian or Italian text was usually the closest.
We communicated with facilities in Italy, England, and France on an almost daily basis. The time difference was the only snag.
It was interesting to see how our American ways were interpreted by other cultures.