
Computer System Configurations

The most complex systems I’ve configured were airborne data acquisition and ground support systems. However, not many people have to, or want to, do anything that large or complex. Some labs will need data from thermocouples, strain gauges, or other instrumentation, but most of you will be satisfied with a well-configured system that can handle today’s data without a large cash outlay and that can be expanded at minimum cost to handle the data of tomorrow.

This week’s guest blogger, Bill Eaton, provides some guidelines for configuring the three most requested setups: a Database Server, a Web Server, and a Compute Node.

(Bill Eaton)

General Considerations
Choice of 32-bit or 64-bit Operating System on standard PC hardware

  • A 32-bit operating system limits the maximum memory usage of a program to 4 GB or less, and may limit maximum physical memory.
    • Linux:  for most kernels, programs are limited to 3 GB.  Physical memory can usually exceed 4 GB.
    • Windows: the stock settings limit a program to 2 GB and physical memory to 4 GB.
      The server versions have a /3GB boot flag to allow 3 GB programs and a /PAE flag to enable more than 4 GB of physical memory.
    • Other operating systems usually have a 2 or 3 GB program memory limit.
  • A 64-bit operating system removes these limits.  It also enables some additional CPU registers and instructions that may improve performance.  Most 64-bit operating systems can still run older 32-bit programs.
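
As a quick sanity check, a short script can report whether a machine is actually running in 64-bit mode.  The following is a minimal Python sketch (any language that exposes the pointer size would do); it only inspects the running process and OS, nothing more.

    import platform
    import struct

    # Pointer size tells us whether this process has a 32-bit or 64-bit
    # address space (32-bit caps a single program at roughly 2-4 GB).
    pointer_bits = struct.calcsize("P") * 8
    print("Process address space : %d-bit" % pointer_bits)
    print("Machine architecture  : %s" % platform.machine())
    print("Operating system      : %s %s" % (platform.system(), platform.release()))

Note that a 32-bit interpreter on a 64-bit OS will still report 32-bit here, which is exactly the per-program limit that matters.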

Database Server:
Biological databases are often large (100 GB or more) and may be too large to fit on a single physical disk drive. A database system needs fast disk storage and a large memory to cache frequently-used data. These systems tend to be I/O bound.

Disk storage:

  • Direct-attached storage:  disk array that appears as one or more physical disk drives, usually connected using a standard disk interface such as SCSI.
  • Network-attached storage:  disk array connected to one or more hosts by a standard network.  These may appear as network file systems using NFS, CIFS, or similar, or physical disks using iSCSI.
  • SAN (storage area network):  covers both of the above cases, with multiple disk units sharing a network dedicated to disk I/O, usually Fibre Channel.
  • Disk arrays for large databases need high I/O bandwidth, and must properly handle flush-to-disk requests (a quick way to sanity-check this is sketched below).
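
Flush-to-disk behaviour is worth testing before trusting an array with a database.  The sketch below is a rough Python example (the file path and block count are placeholders, not a benchmark): it times a series of small writes, each followed by os.fsync(), which is roughly what a database does when it commits a transaction.  Suspiciously low times usually mean the flush is not actually reaching stable storage.

    import os
    import time

    PATH = "/data/fsync_test.tmp"   # place this on the array under test (placeholder path)
    BLOCK = b"x" * 4096             # one 4 KB block per write, similar to a small commit
    COUNT = 200

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT, 0o600)
    start = time.time()
    for _ in range(COUNT):
        os.write(fd, BLOCK)
        os.fsync(fd)                # ask the OS to push the data to stable storage
    elapsed = time.time() - start
    os.close(fd)
    os.remove(PATH)

    print("%d flushed writes in %.2f s (%.1f ms per flush)"
          % (COUNT, elapsed, 1000.0 * elapsed / COUNT))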

Databases:

  • Storage overhead:  data repositories may require several times the amount of disk space required by the raw data.  Adding an index to a table can double its size.  A test using a simple, mostly numeric table with one index gave these overheads (on-disk size divided by raw data size) for some common databases; a sizing sketch follows this list.
    • MySQL using MyISAM 2.81
    • MySQL using InnoDB 3.28
    • Apache Derby       5.88
    • PostgreSQL         7.02
  • Data Integrity support:  The server and disk system should handle failures and power loss as cleanly as possible.  A UPS with clean shutdown support is recommended.
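
When sizing storage, multiplying the raw data volume by a measured overhead factor gives a first-order estimate of what the database will actually consume on disk.  Here is a minimal Python sketch using the factors from the test above (your own schema, indexes, and engine settings will give different numbers):

    # Overhead factors (on-disk size / raw data size) from the simple test above.
    OVERHEAD = {
        "MySQL using MyISAM": 2.81,
        "MySQL using InnoDB": 3.28,
        "Apache Derby":       5.88,
        "PostgreSQL":         7.02,
    }

    raw_gb = 100.0   # raw data volume in GB (example value only)

    for engine, factor in OVERHEAD.items():
        print("%-20s ~%6.0f GB on disk for %.0f GB of raw data"
              % (engine, raw_gb * factor, raw_gb))

Remember to leave headroom for growth; a database that starts at the capacity of its array has nowhere to go.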

Web Server and middleware hosts:
A web server needs high network bandwidth, and should have a large memory to cache frequently-used content.

Web Service Software Considerations:

  • PHP:  thread support still has problems.  PHP applications running on a Windows system under either Apache httpd or IIS may encounter these.  We have seen a case where WordPress gave error messages under both IIS and Apache httpd on Windows, but worked without problems under Apache httpd on Linux.  IIS FastCGI made the problem worse.  PHP acceleration systems may be needed to support large user bases.
  • Perl:  similar thread-support and scaling issues may be present.  For large user bases, mod_perl or FastCGI can help.
  • Java-based containers (Apache Tomcat, JBoss, GlassFish, etc.):  these run on almost anything without problems, and usually scale quite well.

Compute nodes:
Requirements depend upon the expected usage.  Common biological applications tend to be memory-intensive.  A high-bandwidth network between the nodes is recommended, especially for large clusters.  Network attached storage is often used to provide a shared file system visible to all the nodes.

  • Classical “Beowulf” cluster:  used for parallel tasks that require frequent communication between nodes.  These usually use the MPI communication model, and often have a communication network tuned for this use, such as Myrinet.  One master “head” node controls all the others and is usually the only one connected to the outside world; the cluster may have a private internal Ethernet network as well.  (A minimal MPI sketch follows this list.)
  • Farm:  used where little inter-node communication is needed.  Nodes usually just attach to a conventional Ethernet network, and may be visible to the outside world.
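
To make the MPI model concrete, here is a minimal sketch using the mpi4py bindings (assuming MPI and mpi4py are installed on every node; the computation is just a placeholder).  Each process learns its rank, does some local work, and rank 0 on the head node gathers the results.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # this process's id (0 .. size-1)
    size = comm.Get_size()      # total number of processes in the job

    # Each process computes a partial result; here just a placeholder value.
    partial = rank * rank

    # Rank 0 collects every partial result over the cluster's interconnect.
    results = comm.gather(partial, root=0)

    if rank == 0:
        print("Gathered from %d processes: %s" % (size, results))

A job like this is typically launched from the head node with something like "mpirun -np 16 python script.py", with the MPI implementation handling the node-to-node communication.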

The most important thing is to have a plan from the beginning that addresses all of the system’s storage needs today and is scalable for tomorrow’s unknowns.
