Prokaryotic Genomics


Haemophilus influenzae strain Rd became the first free-living organism to have its genome sequenced (Fleischmann et al., 1995). The floodgates have opened with over 100 prokaryotic genomes completely or partially sequenced. However, the acquisition and analysis of sequence data is not an end in itself; instead it is a starting point for generating hypotheses that can be tested in the laboratory. It is clear that knowledge of the complete genome sequence of an organism does not tell us a great deal about the composition or functional capabilities of the organism. Homology, or sequence similarity, provides clues, but it does not prove gene function. Furthermore, a large percentage of genes have no matches to known genes. For example, at the time of sequence release, up to 62% of predicted protein-coding genes in the Methanococcus jannaschii genome had no matches with genes from other organisms (Bult et al., 1996). Elucidating the function of these "ORFan" or "FUN" (function unknown) genes is one of the biggest challenges of the post-genomic era.

The avalanche of genome sequence data has coincided with important technological advances in four research areas: bioinformatics, gene mutagenesis, nucleic acid hybridization technology and protein chemistry. These advances will liberate scientific understanding from the piecemeal study of individual genes or operons towards a comprehensive analysis of the entire gene and protein complement of the prokaryotic cell. This new technology will allow a holistic approach to the functional characterization of prokaryotes at the mutational, transcriptional, and protein expression levels (see Fig. 1). The application of functional genomic approaches in the smaller genomes of prokaryotes is the forerunner for the study of functional genomics in higher organisms, including humans. An important exception is the efforts of the Saccharomyces cerevisiae research community, which is a shining example of what can be achieved through functional genomics studies (Lashkari et al., 1997; Winzeler et al., 1998; Winzeler et al., 1999a; Winzeler et al., 1999b; Uetz et al., 2000).

Fig. 1. Schematic showing available complementary functional genomics approaches.

Prokaryotic Genome Projects and the Birth of Comparative Genomics

The availability of genome sequences has spawned the new scientific discipline of comparative genomics, which allows the comparison of genome sequence data between strains, species, genera and even kingdoms. Such studies will provide important taxonomic insights and will have far-reaching implications for the study of evolution. The virtual genome center is a useful web-based site for evolutionary comparisons of proteins, protein families and genome sequences. In the future, comparative analysis of genome sequence data will be facilitated by high-density array DNA hybridization analysis (see section on Applications of High-Density DNA Arrays and Genomotyping in this Chapter).

The salient features of prokaryotic genome sequences are summarized in chronological order of publication. First, Haemophilus influenzae strain KW20 (Rd; 1.83 Mb; Fleischmann et al., 1995) was sequenced using the now widely adopted whole-genome random shotgun sequencing approach. The full complement of genes enabled the deciphering of the polysaccharide structure of the organism. Several iterative sequence repeats (dinucleotide and tetranucleotide) were identified, suggesting that H. influenzae probably uses recombination and slipped-strand mispairing within repeats as a mechanism for antigenic/phase variation and adaptive evolution (Hood et al., 1996).

Mycoplasma genitalium (0.58 Mb; Fraser et al., 1995) has the smallest known genome content of any free-living organism. As a consequence, its physiology and metabolic capacity differ from those of most living organisms. Mycoplasma genitalium represents an important system for determination of the minimal number of genes required for host-independent existence (Hutchison et al., 1999).

Methanococcus jannaschii (1.66 Mb; Bult et al., 1996) was the first archaeon to be sequenced. In M. jannaschii the majority of genes related to cell division, energy production and metabolism are more similar to those found in bacteria; by contrast, most of the genes involved in transcription, translation and replication are more similar to those found in eukaryotes.

Synechocystis sp. strain PCC6803 (3.57 Mb; Kaneko et al., 1996) is a photosynthetic bacterium where 5% of identified open reading frames (ORFs) were dedicated to photosynthetic reactions and 99 ORFs showed similarity to transposase genes, suggesting frequent rearrangement of the genome.

The sequencing of Mycoplasma pneumoniae M129 (0.81 Mb; Himmelreich et al., 1996) was the first occasion that two organisms within the same genus were sequenced. A subset of essential genes was identified in both Mycoplasma species. Anabolic and metabolic pathways were absent, which is consistent with its obligate parasitic lifestyle.

Sequence analysis of Helicobacter pylori, strain 26695 (1.67 Mb; Tomb et al., 1997), demonstrated that a surprisingly large proportion of the genome content was dedicated to DNA restriction modification, motility and sequestration of iron. Similarly, several adhesins and outer-membrane proteins were identified suggesting a complex host-pathogen life style. Surprisingly, few regulatory sequences and σ factors were identified, which is consistent with the restricted ecological niche of the human stomach in which the pathogen resides.

The Escherichia coli K-12 (4.64 Mb; Blattner et al., 1997) genome appeared highly organized and contains insertion sequence (IS) elements and phage remnants, indicating genome plasticity through horizontal transfer.

The Methanobacterium thermoautotrophicum strain delta H (1.75 Mb; Smith et al., 1997) sequence predicted that most of the proteins involved in DNA metabolism, transcription and translation were of eukaryotic origin, whereas gene structure and organization have features that are typical of bacteria. Comparisons with the M. jannaschii genome underline the extensive divergence that has occurred between these two methanogens.

Bacillus subtilis (4.21 Mb; Kunst, 1997) was the first Gram-positive organism to be sequenced. A quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, and a significant proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources. The genome contains at least ten prophages suggesting that bacteriophage infection has played an important evolutionary role in horizontal gene transfer.

Archaeoglobus fulgidus VC-16 (2.18 Mb; Klenk et al., 1997) was the first sulfur-metabolizing organism to have its genome sequence determined. A quarter of the genome encodes novel proteins indicating substantial archaeal gene diversity.

Borrelia burgdorferi B31 (1.44 Mb; Fraser et al., 1997) appears unique among prokaryotes as its genome contains a linear chromosome (0.91 Mb) and at least 17 linear and circular plasmids (totaling 0.53 Mb). The biological significance of the multiple plasmid-encoded genes is not clear, although it is postulated that they may be involved in antigenic variation or immune evasion.

The complex metabolic machinery needed for Aquifex aeolicus (1.55 Mb; Deckert et al., 1998) to function as a chemolithoautotroph is encoded within a genome a third the size of the E. coli genome. Metabolic flexibility seems to be reduced as a result of the limited genome size. Although A. aeolicus grows at 95°C, the extreme thermal limit of the bacteria, only a few specific indications of thermophily are apparent from the genome.

Pyrococcus horikoshii OT3 (1.74 Mb; Kawarabayasi et al., 1998) is a hyperthermophilic archaebacterium whose genome sequence provided evidence that a considerable number of ORFs were generated by sequence duplication. Eleven ORFs were assumed to contain inteins.

Mycobacterium tuberculosis strain H37Rv, (4.41 Mb; Cole et al., 1998) has a very high G+C content that is reflected in the biased amino acid content of the proteins. Mycobacterium tuberculosis differs from other bacteria in that much of its coding capacity is devoted to lipogenesis and lipolysis. Two novel families of glycine-rich proteins are present that have a repetitive structure, which may represent a source of antigenic variation.

In Treponema pallidum (1.13 Mb; Fraser et al., 1998), the systems for DNA replication, transcription, translation and repair are intact, but catabolic and biosynthetic activities are minimized. Comparison of the T. pallidum genome sequence with that of another pathogenic spirochete, B. burgdorferi, identified both unique and common genes and confirms the considerable diversity observed among pathogenic spirochetes.

Although the obligate intracellular pathogen Chlamydia trachomatis (1.04 Mb; Stephens et al., 1998) lacks many biosynthetic capabilities, it retains functions for the interconversion of metabolites obtained from their mammalian host cells. The apparent wide origin of chlamydial genes, including a large number of genes of eukaryote origin, implies a complex evolution for adaptation to obligate intracellular parasitic status.

The sequence of Rickettsia prowazekii (1.11 Mb; Andersson et al., 1998) revealed surprising similarity to mitochondrial genes. For example, ATP production in Rickettsia is the same as that in mitochondria. The R. prowazekii genome contains the highest proportion of noncoding DNA (24%) for a prokaryote, and R. prowazekii is more closely related to mitochondria than any other microbe studied to date.

The sequence of Helicobacter pylori strain J99 (Alm et al., 1999) allowed the first genome comparison of two strains from the same species. Prior to the availability of the second sequence, H. pylori was thought to exhibit a large degree of genomic and allelic diversity, but the overall genomic organization and gene order appeared quite similar. Between 6 and 7% of the genes are specific to each strain. Almost half of these genes are clustered in a single hypervariable region, termed "a plasticity zone."

Analysis of the Chlamydia pneumoniae (1.23 Mb; Kalman et al., 1999) genome revealed 214 protein-coding sequences not found in C. trachomatis, many without similarity to other known sequences. Significant comparative findings included conservation of a type-III secretion virulence system, expansion of a novel family of outer-membrane proteins, and three serine/threonine protein kinases.

Aeropyrum pernix K1 (1.70 Mb; Kawarabayasi et al., 1999) grows optimally at 95°C. All genes in the tricarboxylic acid (TCA) cycle were present except for that of α-ketoglutarate dehydrogenase. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication.

Of the eubacteria sequenced to date, Thermotoga maritima MSB8 (1.86 Mb; Nelson et al., 1999) has the highest percentage (24%) of genes that are most similar to archaeal genes. Conservation of gene order between T. maritima and archaea in several clustered regions of the genome suggests that lateral gene transfer may have occurred between thermophilic eubacteria and archaea.

The genome of the radiation-resistant bacterium Deinococcus radiodurans R1 (3.28 Mb; White et al., 1999) is, unusually, composed of two chromosomes (2.65 and 0.41 Mb), a large plasmid (0.178 Mb) and a small plasmid (0.045 Mb). Several regions of the genome were identified that allow D. radiodurans to survive under conditions of oxidative stress, desiccation, starvation and high amounts of DNA damage.

Campylobacter jejuni (1.64 Mb; Parkhill et al., 2000) was the first foodborne pathogen to be sequenced. The genome is unusual in that there are virtually no IS or phage-associated sequences and very few repeat sequences. A striking feature was the presence of hypervariable sequences commonly found in genes encoding surface structures. The apparently high rate of variation of these homopolymeric tracts may play an important role in the survival strategy of C. jejuni. Despite its close phylogenetic relationship to H. pylori, strong similarities between these organisms are mainly confined to housekeeping genes. In most functions related to survival, transmission, and pathogenesis, the organisms have remarkably little in common. This indicates that selective pressures have driven profound evolutionary changes to create two very different pathogens from a close common ancestor.

Neisseria meningitidis serotype B strain MC58 (2.23 Mb; Tettelin et al., 2000) revealed three major regions of horizontal DNA transfer. The sequence revealed insight into the commensal and virulence nature of the organism; in particular, the organism appears to undergo more phase variation than any prokaryote studied to date. In an accompanying paper, over 350 candidate antigens were expressed in E. coli and tested for their vaccine efficacy (Pizza et al., 2000). The sequence of Neisseria meningitidis serotype A strain Z2491 (2.18 Mb) revealed hundreds of repeat elements ranging from short homopolymeric tracts to gene duplications, again suggesting extreme genome fluidity, which probably plays a significant role in antigenic variation of this human-specific pathogen (Parkhill et al., 2000). Comparison between N. meningitidis serotype B and N. meningitidis serotype A awaits further analysis.


Bioinformatics and the range of new supercomputers are poised to change forever the way in which we tackle prokaryotic research. The unprecedented deluge of sequence data requires processing and rapid access to functional genomics information and is central to the revolution that is taking place in prokaryotic molecular genetics. Bioinformatics is essentially the evolution of computer-based technology dedicated to the analysis of genome sequences. It is a cross-disciplinary activity, including aspects of computer science, software engineering, molecular biology, and mathematics. The past few years have seen vast improvements in the algorithms used to analyze sequence data, and an increasing range of bioinformatics software has been developed and released into the public domain by way of the Internet. Careful and intelligent use of this software can afford important new insights into protein structure and function and allow the generation of testable hypotheses.

Coincident with the availability of genome sequence data, several other factors have made it easier than ever before for scientists to exploit such data. For example, the rise of the Internet and the World Wide Web, coupled with well-supported free software, has simplified the use of remote computing facilities (see examples in Table 1).

Table 1. Bioinformatics-based web sites.



BLAST searches

Microbial Genomes at NCBI



Genome Annotation Consortium

A compendium of electronic resources for molecular biology research

Genome browsers for bacterial pathogens

General functional genomics software (focus on E. coli)

STD Sequence Databases (STD pathogens)

Abbreviations: BLAST, basic local alignment search tool; NCBI, National Center for Biotechnology Information; PEDANT, protein extraction, description, and analysis tool; ARTEMIS, a DNA sequence viewer and annotation tool; STD, sexually transmitted disease.

The most common use of bioinformatics is the search for sequence similarity with homologous genes/gene products deposited in the numerous nucleotide and protein databases worldwide. The basic local alignment search tool (BLAST; Altschul et al., 1990) is the most widely used program for such analysis. A simple example of the application of bioinformatics is the rapid identification of iterative nucleotide sequences in a genome sequence. These can act as markers for polymorphic regions important for antigenic/phase variation and in host-pathogen interactions (Hood et al., 1996). Comparative analysis of gene pathways from several complete genome sequences allows more definitive information on components of the pathway. Comparison of the citric acid cycle from numerous genome sequences makes it possible to reason confidently about the absence or presence of the different parts and branches of the cycle and even the overall metabolic scheme of the organisms (Huynen et al., 1999). The Kyoto Encyclopedia of Genes and Genomes web site shows many metabolic and regulatory pathways based on orthologous sequences from databases. Observations on chromosomal organization and gene order have proven useful in identifying recently acquired DNA sequences. This approach has been exploited in the T. maritima genome project to suggest that lateral gene transfer has occurred between thermophilic eubacteria and archaea (Nelson et al., 1999). The operon structure of prokaryotes makes it possible to identify coregulations, coexpressions, and more generally, gene clusters that might imply a common function (Overbeek et al., 1999).
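The identification of iterative nucleotide sequences mentioned above can be illustrated with a minimal sketch. The function name, the thresholds and the toy sequence are assumptions made for illustration only; published repeat-finding analyses are more sophisticated and may treat imperfect repeats differently.

```python
import re

def find_tandem_repeats(seq, unit_len, min_copies):
    """Return (start, unit, copies) for perfect tandem repeats.

    Hypothetical helper: unit_len is the repeat unit size (2 for
    dinucleotide, 4 for tetranucleotide) and min_copies is an
    arbitrary reporting threshold chosen for this sketch.
    """
    # A captured unit followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip homopolymeric tracts reported as multi-base units (e.g. AAAA).
        if unit_len > 1 and len(set(unit)) == 1:
            continue
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), unit, copies))
    return hits

# Toy sequence (not real genome data) with one tetranucleotide and one
# dinucleotide repeat tract.
toy = "GGG" + "CAAT" * 5 + "TTGC" + "AT" * 6 + "GG"
tetra_hits = find_tandem_repeats(toy, unit_len=4, min_copies=4)
di_hits = find_tandem_repeats(toy, unit_len=2, min_copies=5)
```

Positions are zero-based offsets into the sequence. Scanning both strands, or tolerating mismatches within a tract, would need additional logic that this sketch omits.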

The published annotated forms of the genome sequences fall short of being definitive. Generally, narrow sets of analysis programs have been employed, and no doubt, several reanalyses of the sequence data will be undertaken. In addition to the benefits of such static analysis, an ongoing dynamic analysis is needed, constantly reevaluating the sequence data in the light of newly published sequences. One example of such re-analysis software is the protein extraction, description, and analysis tool PEDANT, which is a software system that utilizes modern bioinformatics methods to provide complete functional and structural characterization of protein sequence sets from individual sequences to complete genomes. For other examples, see Table 1.

Sustained improvements in computing speed and in hard-drive capacity mean that even very computationally intensive analyses can be performed on readily available hardware. The availability of supercomputers will trigger a revolution in the complexity of problems we tackle to understand the basic biology and evolution of the prokaryotes (Butler, 2000).

Functional Genomics

In the past, geneticists have assigned gene function by specifically disabling or "knocking out" a single gene, usually by transposon mutagenesis (insertion mutagenesis) or allelic replacement (usually deletion mutagenesis), and then comparing the phenotypes of the mutant strain with the parent strain. This approach is still valid, but with the availability of information at the genome-wide level, a global approach to the study of gene function at the mutational, transcriptional and translational levels is now possible.

Table 2. General functional genomics web sites.




Virtual genome center

Evolutionary comparison of genomes

General functional genomics

Pharmaceutical research emphasized

Functional genomic analysis

E. cell project

An environment for modeling and simulating biochemical and genetic processes

Encyclopedia of genes and genomes

Emphasis on metabolic and regulatory pathways

Microbial biodegradation

Mass spectrometry

E. coli proteome

DNA microarrays

Construction and application

Bacterial pathogen DNA microarrays

Affymetrix biochip

Bacterial pathogenesis

Encyclopaedia of E. coli genes and metabolism

MICADO (B. subtilis and E. coli)

A network-oriented database for microbial genomes

Mutational Analysis

The construction of defined mutants by transposon mutagenesis or allelic replacement has proven to be a powerful method for determining gene function in numerous prokaryotes. Information about the biological functions can be inferred by monitoring the fitness of the null mutant under a variety of selected growth conditions. However, in conjunction with the construction of mutants is the potential to label each mutant with a unique DNA-signature tag permitting simultaneous analysis of several hundred mutants for phenotypic features (Fig. 2; Hensel et al., 1995).

Fig. 2. Schematic illustrating the principles of a signature-tagged mutagenesis screen, which can distinguish virulent from nonvirulent mutants.

Signature-Tagged Mutagenesis

The use of DNA-signature tags was validated using Salmonella and a murine model of typhoid fever (Hensel et al., 1995). In the original design, the tags consisted of a central 40-bp variable region that allows differentiation between tags, flanked by constant 20-bp arms to which primers can bind for DNA amplification. By negative selection, mutants that fail to be recovered from the host following inoculation of a mixed pool of mutants can be identified (Fig. 2). Thus, when the hybridization signals from the tagged input pool of 96 mutants were compared with the respective tagged output pools of mutants, several mutants essential for the in vivo survival of Salmonella typhimurium could be identified (see Fig. 2; Hensel et al., 1995). This included the identification, and subsequent characterization, of a novel type III secretion system (SPI 2) in S. typhimurium (Shea et al., 1996). Signature-tagged mutagenesis (STM) has since been successfully used in the identification of virulence-associated factors in Staphylococcus aureus, Vibrio cholerae, Neisseria meningitidis, Streptococcus pneumoniae, Legionella pneumophila, Yersinia enterocolitica, Yersinia pseudotuberculosis, Proteus mirabilis, Mycobacterium tuberculosis and Brucella suis (Mei et al., 1997; Chiang and Mekalanos, 1998; Claus et al., 1998; Polissi, 1998; Camacho et al., 1999; Darwin and Miller, 1999; Edelstein et al., 1999; Foulongne et al., 2000).
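The negative-selection logic of comparing input and output pools can be sketched as follows. This is a simplified illustration, not the published scoring procedure: the function name, the signal values and both thresholds are assumptions.

```python
def attenuated_mutants(input_signal, output_signal,
                       min_input=100.0, max_ratio=0.1):
    """Return tags present in the input pool but lost from the output pool.

    input_signal / output_signal map tag name -> hybridization signal.
    min_input and max_ratio are arbitrary cutoffs assumed for this sketch.
    """
    lost = []
    for tag, inp in input_signal.items():
        if inp < min_input:
            # The tag hybridized poorly even in the inoculum, so its
            # absence from the output pool would be uninformative.
            continue
        out = output_signal.get(tag, 0.0)
        if out / inp <= max_ratio:
            lost.append(tag)
    return sorted(lost)

# Hypothetical signals for four tagged mutants.
input_pool = {"tag01": 950.0, "tag02": 870.0, "tag03": 40.0, "tag04": 910.0}
output_pool = {"tag01": 900.0, "tag02": 35.0, "tag03": 5.0, "tag04": 0.0}
candidates = attenuated_mutants(input_pool, output_pool)
```

Mutants flagged in this way are only candidates; as in the original screens, each would need to be confirmed individually.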

The use of wholesale tagging of bacterial mutants will have its greatest potential impact on in vivo studies, insofar as the number of animal experiments required for the assessment of bacterial virulence can be drastically reduced. A review on how STM can help to identify virulence genes and other applications has been published recently (Shea and Holden, 2000). However, the tagging and analysis of prokaryotes also should be useful to measure the survivability of mutants in other complex environments ranging from biofilms to deep-sea ocean beds.

Exploitation of a DNA array of a given organism (see Macroarrays and Microarrays in this Chapter) may obviate the need to tag transposons and could be used to identify essential genes. Assuming that a transposon can insert randomly into a prokaryotic genome, a single primer reading out from the transposon (Karlyshev et al., 2000) could be used to detect all interrupted genes in a single hybridization to an organism-specific DNA array. Potentially by screening populations of input pools and output pools from a selective environment (e.g., stress), some transposon mutants will drop out of the pool, and those genes required for stress survival will be identified. Additionally, because cDNA will only be synthesized from genes with an integrated transposon, by deduction it should be possible to detect genes essential for the survival of the organism throughout the genome.

Signature-Tagged Allele Replacement

Because transposons often fail to integrate, or fail to integrate randomly, into the chromosome of many prokaryotes, transposon mutagenesis is not universally applicable. Allelic replacement is often a useful alternative for the construction of defined deletion mutants. Furthermore, the availability of entire genomic sequences means that the large-scale, systematic construction of defined mutants is now possible. Thus, all genes can be tested methodically under a particular condition. The coupling of the incorporation of DNA tags with allelic replacement has been referred to as signature-tagged allele replacement (STAR). The STAR method does not require the use of transposons, but enables a systematic unbiased genetic analysis of the genome. As the gene target is predetermined, the need to sequence mutation sites is obviated, and by using a systematic approach, the number of mutants required for screening is minimized. Such an attempt has been made in S. cerevisiae (Shoemaker et al., 1996; Winzeler et al., 1999), in which mutants were quantified with a specifically designed Affymetrix "barcoding" biochip (see Affymetrix Oligonucleotide Arrays in this Chapter) containing all the complementary DNA sequences of the tags used in mutant construction. The unique tags are based on an algorithm to select a set of over 9,000 maximally distinguishable 20mer sequences with similar Tm values (61 ± 5°C; Shoemaker et al., 1996). The optimized DNA tags increase the sensitivity of probes in complex hybridization reactions and enable the semiquantitative determination of viable bacteria. Thus, the barcoding biochip has the capacity to measure the relative abundance of defined mutants and can measure the growth rate of all tagged mutants simultaneously.
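The idea of choosing distinguishable tags with similar Tm values can be sketched as a simple greedy filter. This is not the published tag-selection algorithm: the Wallace rule used to estimate Tm, the Hamming-distance criterion and all names and thresholds are assumptions chosen to keep the example short.

```python
def wallace_tm(seq):
    """Rough Tm estimate (deg C) by the Wallace rule: 2(A+T) + 4(G+C).

    Assumed here for simplicity; it is a crude approximation for 20mers.
    """
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def hamming(a, b):
    """Number of mismatched positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))

def select_tags(candidates, tm_target=61, tm_window=5, min_distance=5):
    """Greedily keep tags inside the Tm window that differ from every
    previously chosen tag at min_distance or more positions."""
    chosen = []
    for tag in candidates:
        if abs(wallace_tm(tag) - tm_target) > tm_window:
            continue  # Tm too far from the target
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
    return chosen

# Hypothetical candidate 20mers.
pool = [
    "ACGTACGTACGTACGTACGT",  # estimated Tm 60, accepted
    "ACGTACGTACGTACGTACGA",  # one mismatch from the first tag, rejected
    "AAAAAAAAAAAAAAAAAAAA",  # estimated Tm 40, outside the window
    "TGCATGCATGCATGCATGCA",  # estimated Tm 60 and distinct, accepted
]
tags = select_tags(pool)
```

A realistic design would also screen for self-complementarity and cross-hybridization, and would use a nearest-neighbor Tm model rather than the Wallace rule.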

Parallel quantitative assessment of multiple strains significantly decreases labor and material required for screening, and it increases the reliability of data obtained. This approach also removes the need for onerous, hazardous, and repetitive filter-based radioactive hybridizations. Finally, the inclusion of DNA tags is an invaluable way of tracking and validating strains distributed to laboratories worldwide.

Transcriptome Analysis

Cellular processes are governed by the repertoire of expressed genes, in particular by the levels and the timing of their expression. The mRNA complement of a cell reflects the state of a cell, uniquely defining growth, division, stress adaptation and apoptosis. Transcriptome analysis offers the potential for the simultaneous measurement of expression levels for all transcripts (mRNAs) from a genome, giving a "snapshot" of the transcriptional activity of all genes in that genome. Analyses can be performed simultaneously at a given time point in growth or in any environment. This versatility is possible because mRNA expressed under a range of environmental conditions can be extracted and hybridized to a high-density gridded array of an organism's DNA content. The availability of ever-cheaper oligonucleotides, 384-well polymerase chain reaction (PCR) technology, robotics, and complete genome sequence data makes possible the highly attractive option of using gridded libraries of PCR products, constituting a defined and complete set of ORFs and intergenic regions. Such high-throughput analysis allows massive parallel gene expression and gene discovery studies to be undertaken.

High-Density DNA Arrays

There are two main high-density DNA array formats. These are generally referred to as microarrays (or macroarrays), which consist of 100- to 1,000-bp stretches of DNA, and Affymetrix biochips that consist of in situ synthesized oligonucleotides (~20 bases).

DNA Macroarrays and Microarrays

High-speed robots assemble DNA arrays on nylon membrane (often referred to as DNA macroarrays) or glass-slide solid supports (DNA microarrays). Pat Brown and colleagues, at Stanford University, have pioneered the construction and application of DNA microarrays (Brown and Botstein, 1999; DeRisi and Iyer, 1999), including the building of a robotic microarrayer from component parts. The principles and an example of the application of a DNA microarray for transcriptome analysis (also referred to as differential gene expression) are shown in Fig. 3.

Fig. 3. Schematic showing how differential gene expression of a bacterium at a site of infection or in culture is determined using a DNA microarray.

To measure relative differences in gene expression, sample DNA or RNA is labeled (normally by fluorescence) and hybridized to the array (Fig. 3). For example, mRNA from cells grown under standard culture conditions is labeled with the red fluorescent dye Cy5, and sample mRNA from a site of infection is labeled with the green fluorescent dye Cy3. After cohybridization to the microarray, the fluorescent intensities of each fixed DNA sample are read to determine the relative abundance of mRNA from the two test conditions. A red signal indicates gene expression only in cells grown in culture, a green signal indicates gene expression only during infection, and a yellow signal indicates genes expressed in both conditions (Fig. 3). To date, most applications of DNA microarrays have been to eukaryotic systems (Ross et al., 2000; Scherf et al., 2000). A useful web site for the construction and application of bacterial pathogen DNA microarrays is the BUGAS website.
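The red, green and yellow readout described above amounts to comparing the two channel intensities at each spot. The following sketch classifies a spot from its log2 intensity ratio; the two-fold cutoff, the background floor and the function name are assumptions, and real analyses add background subtraction and normalization steps that are omitted here.

```python
import math

def classify_spot(red, green, fold=2.0, floor=1.0):
    """Classify one array spot from its red and green channel intensities.

    fold is an assumed cutoff for calling a difference; floor guards
    against division by zero or log of zero for empty spots.
    """
    ratio = math.log2(max(red, floor) / max(green, floor))
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "red"      # predominantly the red-labeled sample
    if ratio <= -cutoff:
        return "green"    # predominantly the green-labeled sample
    return "yellow"       # comparable signal in both samples

# Hypothetical (red, green) intensities for three genes.
spots = {"geneA": (800.0, 100.0), "geneB": (100.0, 800.0), "geneC": (500.0, 450.0)}
calls = {gene: classify_spot(r, g) for gene, (r, g) in spots.items()}
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why the same cutoff can be used in both directions.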

Affymetrix Oligonucleotide Arrays

Oligonucleotide arrays are constructed by in situ light-directed combinatorial nucleotide synthesis (Chee et al., 1996), a process termed "photolithography" (Fig. 4). Because Affymetrix DNA arrays (biochips) consist of in situ synthesized oligonucleotide sequences, they offer two major advantages over DNA microarrays: 1) greater sample capacity (>50,000 samples per cm² compared with 4,000 samples per cm² for a typical DNA microarray) and 2) an ability to detect single nucleotide polymorphisms. However, due to cost and selective availability of biochips, Affymetrix technology is beyond the reach of most academic institutes. The large capacity of Affymetrix biochips has meant that they have been applied mainly to eukaryotic systems (Chiang and Mekalanos, 1998; Cho et al., 1998; Cho et al., 1999). However, synthesis of prokaryotic biochips containing several genome sequences (e.g., E. coli, B. subtilis, H. pylori, Staphylococcus aureus, and Streptococcus pneumoniae) is planned.

Fig. 4. Schematic showing stages in the in situ synthesis of an Affymetrix oligonucleotide array.

Applications of High-Density DNA Arrays and Genomotyping

Perhaps the most straightforward application of DNA microarrays is hybridization of test strain DNA (e.g., from clinical or environmental isolates) with the genome of a sequenced strain arrayed on the solid support. Thus, a comprehensive assessment of genome diversity of a large number of strains can be rapidly attained. The use of DNA microarrays to compare the genome complements of several strains has been termed "genomotyping." Among bacterial pathogens, such an approach will be an invaluable molecular epidemiological tool, allowing the sources and routes of transmission of economically and clinically important pathovars to be determined. By assessing correlates of pathogenicity, a basic understanding of the evolution of virulence can be gained. To date, microarrays have been reported in the identification of gene sequences absent in the attenuated Mycobacterium bovis strain BCG compared to virulent M. tuberculosis strain H37Rv (Behr et al., 1999) and have been applied to S. cerevisiae (Winzeler et al., 1999).
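Once each arrayed gene has been scored as present or absent in each test strain, genomotyping reduces to set comparisons. The sketch below assumes that presence calls have already been made from the hybridization signals; the gene and strain names are hypothetical.

```python
def genomotype(array_genes, detected_by_strain):
    """Split arrayed genes into those detected in every test strain
    (core) and those missing from at least one strain (variable).

    detected_by_strain maps strain name -> set of genes scored present.
    """
    core = set(array_genes)
    for detected in detected_by_strain.values():
        core &= detected
    variable = set(array_genes) - core
    return sorted(core), sorted(variable)

# Hypothetical array of five genes from a sequenced reference strain.
arrayed = ["g1", "g2", "g3", "g4", "g5"]
calls = {
    "isolate_A": {"g1", "g2", "g3", "g5"},
    "isolate_B": {"g1", "g2", "g4", "g5"},
}
core_genes, variable_genes = genomotype(arrayed, calls)
```

Note that an array built from one sequenced strain can only reveal genes that the test strains lack relative to that reference; genes unique to a test strain are not represented on the array.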

Other potential applications of microarrays to prokaryote analysis include: 1) differential gene expression (DGE) by hybridizing mRNA extracted under varying environmental conditions; 2) DGE by comparing mutant to the wild-type strains, particularly to decipher regulatory networks; 3) testing genome plasticity of an individual strain by DNA hybridization; and 4) identifying single nucleotide polymorphisms (SNPs) by use of the Affymetrix DNA array.

Proteome Analysis

The case for global monitoring of mRNA applies equally to proteins, with the added advantage that post-translational modifications, which frequently play key roles in prokaryotic interactions, also can be studied. Proteomics, the study of the complete set of proteins that is expressed and modified by the entire genome in the lifetime of a cell, is an important rapidly evolving discipline, readily applied to prokaryotes.

2-D Gel Electrophoresis Protein Identification

Recent improvements in high-sensitivity biological mass spectrometry have provided a powerful adjunct to traditional 2D-gel electrophoresis (Pappin, 1997; Fernandez et al., 1998). Proteins cut out of a 2D-gel can now be peptide-mass-fingerprinted, and constituent peptides can be sequenced by mass spectrometry. New software takes data from mass spectrometry and uses it to find the best match in a sequence database, allowing one to go from a spot on a polyacrylamide gel electrophoresis (PAGE) gel to protein identification in a matter of hours (Fig. 1). Thus, the entire complement of soluble proteins expressed by a cell (the proteome) can be defined. This kind of approach already has been used to provide insights into the function of an anatomical subset of the proteome (such as the cell envelope; Qi, 1996). Proteome studies are made even more powerful when applied to an organism whose genome has been sequenced. Synergistic interactions between the two approaches maximize information return. For example, ORFan or FUN gene products can be identified as functional proteins. Proteome or partial proteome analysis has been reported for M. genitalium, S. typhimurium, and M. tuberculosis (Wasinger et al., 2000; O'Connor et al., 1997; Sonnenberg and Belisle, 1997; Jungblut et al., 1999; Tekaia et al., 1999). An ongoing E. coli proteome project is available on the web, and a comprehensive web resource for mass spectrometrists is found on Base Peak.
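The database-matching step of peptide mass fingerprinting can be illustrated with a minimal sketch. It assumes trypsin digestion (cleavage after K or R, except before P), unmodified peptides with no missed cleavages, neutral monoisotopic masses and a simple count of matching peaks as the score. The protein names, sequences and tolerance are hypothetical, and real search software uses more elaborate scoring.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER

def best_match(observed, database, tol=0.5):
    """Score each protein by how many observed masses lie within tol Da
    of one of its predicted tryptic peptide masses."""
    scores = {}
    for name, seq in database.items():
        predicted = [peptide_mass(p) for p in tryptic_peptides(seq)]
        scores[name] = sum(
            any(abs(m - obs) <= tol for m in predicted) for obs in observed
        )
    return max(scores, key=scores.get), scores

# Hypothetical two-protein database; observed masses simulated from protA.
database = {"protA": "MKTAYIAKQR", "protB": "GGSLKPEWR"}
observed = [peptide_mass(p) for p in tryptic_peptides(database["protA"])]
top_hit, all_scores = best_match(observed, database)
```

Restricting the database to the predicted proteins of a sequenced genome is what makes this matching step effective for the organisms discussed here.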

Potential applications of 2D sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE)/mass spectrometry analysis for prokaryotes include: 1) characterization of regulons and stimulons; 2) study of post-translational modifications; 3) study of protein complexes; 4) identification of immunogenic proteins by immunoblotting; and 5) determination of mechanisms of drug action and identification of drug targets.

Protein-Protein Interaction Maps

Interactions between proteins mediate the majority of biological processes, and various biochemical assays have been developed to measure such interactions. The most widely applied is the yeast two-hybrid system. The two-hybrid system exploits the ability of two interacting proteins to bring a transcriptional activation domain into the vicinity of a DNA-binding site that regulates the expression of an adjacent reporter gene (Fields and Song, 1989; Chien et al., 1991; Fields and Sternglanz, 1994). The system can be used to identify proteins that bind to a protein of interest or to define domains or residues critical for an interaction (Fields and Song, 1989; Chien et al., 1991; Fields and Sternglanz, 1994).

Protein-protein interaction maps of selected components of prokaryotes should provide invaluable data on structural components of the cell. Recent examples where this approach has been applied include outer-membrane proteins, secretion systems, and mismatch repair (Mut) proteins (Williams et al., 1998; Day and Plano, 1998; Hartland et al., 1999; Hall and Matson, 1999).

Once all DNA, RNA, and proteins are known, it should be possible to compile complete interaction maps for the products of a genome. This already has been accomplished for a bacteriophage (55 proteins), and it should be possible to tackle prokaryotes. Indeed, a comprehensive analysis of protein-protein interactions in S. cerevisiae has been performed recently using exhaustive yeast two-hybrid screens (Uetz et al., 2000).
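One simple way to hold an interaction map in software is as an undirected graph in which each protein is a node and each two-hybrid hit is an edge. The sketch below is only an illustration under that assumption: the protein names are placeholders, and the bait-prey pairs do not come from any of the screens cited above.

```python
from collections import defaultdict

def build_interaction_map(pairs):
    """Build an undirected adjacency map from (bait, prey) pairs."""
    graph = defaultdict(set)
    for bait, prey in pairs:
        graph[bait].add(prey)
        graph[prey].add(bait)
    return dict(graph)

def connected_component(graph, start):
    """All proteins reachable from `start` through chains of interactions."""
    seen, stack = {start}, [start]
    while stack:
        protein = stack.pop()
        for partner in graph.get(protein, ()):
            if partner not in seen:
                seen.add(partner)
                stack.append(partner)
    return seen

# Hypothetical two-hybrid results: each tuple is one (bait, prey) hit.
hits = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
interaction_map = build_interaction_map(hits)
complex_of_p1 = connected_component(interaction_map, "P1")
```

Grouping proteins into connected components in this way gives a first, crude view of candidate complexes or pathways; two-hybrid data contain false positives and negatives, so such groupings would need independent confirmation.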

Studies and Practical Implications

A multidimensional analysis, looking at genome sequences, mutants, transcripts, and proteins, will result in a quantum leap in our understanding of the biology of prokaryotes. In determining the activity of large sets of genes, proteins, and the interactions between them, an important step towards constructing a functional model of the entire organism will be taken. This basic information will provide the framework for future research; for example, in bacterial pathogens, opportunities for vaccine design are unprecedented because the complete inventory of genes encoding every virulence factor and potential immunogen is available for selection. The combination of 2D-gel electrophoresis and immunoblot analysis of the whole organism, or a subset such as the cell envelope, should identify all immunodominant proteins. Indeed, in an attempt to identify a N. meningitidis vaccine target, over 350 candidate antigens based on the recently completed N. meningitidis type B genome sequence were expressed in E. coli and tested for their vaccine efficacy (Pizza et al., 2000). By deduction, approaches for the systematic mutagenesis of all genes in a genome will identify genes essential for the viability of the organism. Such genes/gene products are potential targets for drug design. The availability of the genome content of multiple organisms on a DNA microarray or an Affymetrix biochip should allow more accurate identification of organisms and, in a clinical setting, more rapid diagnosis. Exploitation of a database of nucleotide differences among strains should allow the design of a biochip to differentiate subtle differences between strains. Such a universal prokaryote biochip would have profound implications in studying the epidemiology, population genetics, molecular phylogeny, and evolution of prokaryotes.

Genomics Glossary

Affymetrix DNA chip: High-density array of evenly spaced in-situ synthesized oligonucleotides on a silicon support. The company Affymetrix produces these DNA arrays using in-situ, light-directed, combinatorial nucleotide synthesis. An Affymetrix DNA chip the size of a thumbnail can contain up to 50,000 oligonucleotide probes.

allelic replacement: The exchange of a gene via homologous recombination. This method is often used to specifically exchange a wild-type gene with a mutated gene to construct a rationally defined isogenic mutant.

apoptosis: The process by which cells are programmed to self-destruct at an appropriate moment in an organism's life cycle. If the apoptotic process malfunctions in a cell, uncontrolled cell growth may result and can contribute to the development of cancer.

bioinformatics: The science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques. Bioinformatics is particularly important as an adjunct to genomics research because of the large amount of complex data this research generates.

cloning vector: A DNA molecule originating from a virus, a plasmid, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector's capacity for self-replication. Vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities; examples are plasmids, cosmids, and yeast artificial chromosomes. Vectors are often recombinant molecules containing DNA sequences from several sources.

combinatorial chemistry: A technique for rapidly and systematically assembling a variety of molecular entities, or building blocks, in many different combinations, to create tens of thousands of diverse compounds that can be tested in drug discovery screening assays.

comparative genomics: The study of the degree of relatedness of complete genome sequences from different strains or organisms. Such studies provide important taxonomic and evolutionary insights.

contigs: Groups of clones representing overlapping regions of a genome.

differential gene expression (DGE): Comparative analysis of mRNA levels of individual genes from cells functioning in different environments. This method is frequently performed at the genome level as transcriptome analysis.

directed mutagenesis: A specific alteration of a cloned gene in vitro before the gene is placed back into the organism.

DNA microarray: High-density array of evenly spaced DNA spots of generally 100 to 1,000 bp gridded onto glass slides. Typically, about 4,000 individual gene fragments can be spotted onto a microscope slide.

DNA macroarray: High-density array of evenly spaced DNA spots of generally 100 to 1,000 bp gridded onto nylon membranes. Macroarrays generally have a lower density than Affymetrix chips or microarrays and are not amenable to differential fluorescence hybridization analysis.

expressed sequence tag (EST): A short strand of DNA (ca. 200 bp) that is part of a cDNA. Because cDNAs correspond to a particular gene in the genome, and ESTs correspond to particular cDNAs, ESTs can be used to help identify unknown genes and to map their position in the genome.

functional genomics: The process of determining the function of individual genes at a genome-wide scale.

gene library: A collection of cloned DNA fragments, which, taken together, represent the entire genome of a specific organism. Such libraries or "gene banks" are assembled so as to allow the isolation and study of individual genes. Gene libraries are produced by first breaking up or "fractionating" an entire genome. This fractionation can be accomplished by sonication or by other physical methods.

gene expression: The process by which the information in a gene is used to create proteins.

genetic polymorphism: A difference in DNA sequence among individuals, groups or populations. Genetic polymorphisms may be the result of chance processes or may have been induced by external agents (such as viruses or radiation). If a difference in DNA sequence among individuals has been shown to be associated with disease, it usually will be called a genetic mutation. Changes in DNA sequence, which have been confirmed to be caused by external agents, are also generally called "mutations" rather than "polymorphisms."

genetic map: A map of a genome that shows the relative positions of the genes and/or markers on the chromosomes.

genome: All the genetic material in a particular organism; its size is generally given as its total number of base pairs.

genomic library: A collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism.

genomics: The study of genes and their function. Recent advances in genomics are bringing about a revolution in our understanding of the molecular mechanisms of disease, including the complex interplay of genetic and environmental factors.

homologies: Similarities in DNA or protein sequences between individuals of the same species or between species.

hybridization: The process of joining two complementary strands of DNA, or one strand each of DNA and RNA, to form a double-stranded molecule.

messenger RNA (mRNA): The DNA of a gene is transcribed into mRNA molecules, which then serve as templates for protein synthesis.

mutation: A change, deletion, or rearrangement in the DNA sequence that may lead to the synthesis of an altered or inactive protein or to the loss of the ability to produce the protein.

oligonucleotide: A molecule made up of a small number of nucleotides, typically fewer than 25. Oligonucleotides are frequently used as DNA synthesis primers in sequencing, in PCR, or in Affymetrix DNA arrays.

orthologues: Homologous genes between species.

paralogues: Homologous genes within the same strain/species.

pharmacogenomics: The science of understanding the correlation between an individual patient's genetic makeup (genotype) and their response to drug treatment. Some drugs work well in some patient populations and not as well in others. Studying the genetic basis of patient response to therapeutics allows drug developers to more effectively design therapeutic treatments.

physical map: A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes), regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns on the 24 different chromosomes; the highest-resolution map would be the complete nucleotide sequence of the chromosomes.

pleiotropy: One gene leading to many different phenotypic expressions. An example of a pleiotropic gene is phoP in Salmonella, which controls over 40 genes associated with virulence and survival inside and outside the host.

proteome: The complete set of proteins that is expressed and modified by the entire genome in the lifetime of a cell.

proteomics: The study of the proteome using technologies of large-scale protein separation and identification.

rational drug design: A process for designing drugs based upon the structure of the protein target of the drug. This approach has been enhanced recently through use of combinatorial chemistry and high-throughput screening.

restriction fragment length polymorphism (RFLP): Variation between individuals in DNA fragment sizes produced after cutting DNA with specific restriction enzymes. Polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. Almost all RFLPs are caused by mutations in the restriction enzyme recognition site.

sequencing: Determining the order of nucleotides in a DNA or RNA molecule, or determining the order of amino acids in a protein.

shotgun method: Cloning of DNA fragments randomly generated from a genome.

signature-tagged allele replacement (STAR): A modification of STM where mutants are DNA-tagged systematically prior to allele replacement during their construction.

signature-tagged mutagenesis (STM): Mutants generated by transposon mutagenesis are tagged with 20-base sequences, which act as unique identifiers. The pool of tagged mutants allows one to study en masse the relative abundance and survivability of all mutants in any given environment.

single nucleotide polymorphism (SNP): A single nucleotide alteration in a gene, frequently characteristic of a genetic trait. A SNP can be assayed en masse using an Affymetrix DNA array.

telomere: A series of repeated DNA sequences located at the end of a chromosome. Telomeres serve to assure that a chromosome is replicated properly each time a cell divides. Each time a cell divides, some of the telomere is lost in the process. Eventually little or no telomere remains and the cell dies.

transcriptome analysis: The simultaneous analysis of all transcripts within a cell by hybridizing mRNA to a DNA micro-, macro- or Affymetrix DNA array.

transposable elements: A class of DNA sequences that can move from one chromosome or plasmid site to another. Transposable elements are frequently referred to as jumping genes.

transposon mutagenesis: Insertion of a transposable element into a genome (usually bacterial) to generate a pool of random mutants.

toxicogenomics: A new scientific subdiscipline that combines the emerging technologies of genomics and bioinformatics to identify and characterize mechanisms of action of known and suspected toxicants. Currently, the premier toxicogenomic tools are the DNA microarray and the DNA chip.

wild type: The form of an organism that occurs most frequently in nature.

Literature Cited

  • Alm, R. A., L. S. Ling, D. T. Moir, B. L. King, E. D. Brown, P. C. Doig, D. R. Smith, B. Noonan, B. C. Guild, B. L. deJonge, G. Carmel, P. J. Tummino, A. Caruso, M. Uria-Nickelsen, D. M. Mills, C. Ives, R. Gibson, D. Merberg, S. D. Mills, Q. Jiang, D. E. Taylor, G. F. Vovis, and T. J. Trust. 1999 Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori Nature 397 176-180
  • Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990 Basic local alignment search tool J. Molec. Biol. 215 403-410
  • Andersson, S. G., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Ponten, U. C. Alsmark, R. M. Podowski, A. K. Naslund, A. S. Eriksson, H. H. Winkler, and C. G. Kurland. 1998 The genome sequence of Rickettsia prowazekii and the origin of mitochondria Nature 396 133-140
  • Behr, M. A., M. A. Wilson, W. P. Gill, H. Salamon, G. K. Schoolnik, S. Rane, and P. M. Small. 1999 Comparative genomics of BCG vaccines by whole-genome DNA microarray Science 284 1520-1523
  • Blattner, F. R., G. Plunkett 3rd, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997 The complete genome sequence of Escherichia coli K-12 Science 277 1453-1474
  • Brown, P. O., and D. Botstein. 1999 Exploring the new world of the genome with DNA microarrays Nat. Genet. 21 33-37
  • Bult, C. J., O. White, G. J. Olsen, L. Zhou, R. D. Fleischmann, G. G. Sutton, J. A. Blake, L. M. FitzGerald, R. A. Clayton, J. D. Gocayne, A. R. Kerlavage, B. A. Dougherty, J. F. Tomb, M. D. Adams, C. I. Reich, R. Overbeek, E. F. Kirkness, K. G. Weinstock, J. M. Merrick, A. Glodek, J. L. Scott, N. S. M. Geoghagen, and J. C. Venter. 1996 Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii Science 273 1058-1073
  • Butler, D. 2000 Computing 2010: From black holes to biology Nature 402 67-70
  • Camacho, L. R., D. Ensergueix, E. Perez, B. Gicquel, and C. Guilhot. 1999 Identification of a virulence gene cluster of Mycobacterium tuberculosis by signature-tagged transposon mutagenesis Molec. Microbiol. 34 257-267
  • Chee, M., R. Yang, E. Hubbell, A. Berno, X. C. Huang, D. Stern, J. Winkler, D. J. Lockhart, M. S. Morris, and S. P. Fodor. 1996 Accessing genetic information with high-density DNA arrays Science 274 610-614
  • Chiang, S. L., and J. J. Mekalanos. 1998 Use of signature-tagged transposon mutagenesis to identify Vibrio cholerae genes critical for colonization Molec. Microbiol. 27 797-805
  • Chien, C. T., P. L. Bartel, R. Sternglanz, and S. Fields. 1991 The two-hybrid system: A method to identify and clone genes for proteins that interact with a protein of interest Proc. Natl. Acad. Sci. USA 88 9578-9582
  • Cho, R. J., M. J. Campbell, E. A. Winzeler, L. Steinmetz, A. Conway, L. Wodicka, T. G. Wolfsberg, A. E. Gabrielian, D. Landsman, D. J. Lockhart, and R. W. Davis. 1998 A genome-wide transcriptional analysis of the mitotic cell cycle Mol Cell 2 65-73
  • Cho, R. J., M. Mindrinos, D. R. Richards, R. J. Sapolsky, M. Anderson, E. Drenkard, J. Dewdney, T. L. Reuber, M. Stammers, N. Federspiel, A. Theologis, W. H. Yang, E. Hubbell, M. Au, E. Y. Chung, D. Lashkari, B. Lemieux, C. Dean, R. J. Lipshutz, F. M. Ausubel, R. W. Davis, and P. J. Oefner. 1999 Genome-wide mapping with biallelic markers in Arabidopsis thaliana Nat. Genet. 23 203-207
  • Claus, H., M. Frosch, and U. Vogel. 1998 Identification of a hotspot for transformation of Neisseria meningitidis by shuttle mutagenesis using signature-tagged transposons Mol. Gen. Genet. 259 363-371
  • Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon, K. Eiglmeier, S. Gas, C. E. Barry, 3rd, F. Tekaia, K. Badcock, D. Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, B. G. Barrell, et al. 1998 Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence Nature 393 537-544
  • Darwin, A. J., and V. L. Miller. 1999 Identification of Yersinia enterocolitica genes affecting survival in an animal host using signature-tagged transposon mutagenesis Molec. Microbiol. 32 51-62
  • Day, J. B., and G. V. Plano. 1998 A complex composed of SycN and YscB functions as a specific chaperone for YopN in Yersinia pestis Molec. Microbiol. 30 777-788
  • Deckert, G., P. V. Warren, T. Gaasterland, W. G. Young, A. L. Lenox, D. E. Graham, R. Overbeek, M. A. Snead, M. Keller, M. Aujay, R. Huber, R. A. Feldman, J. M. Short, G. J. Olsen, and R. V. Swanson. 1998 The complete genome of the hyperthermophilic bacterium Aquifex aeolicus Nature 392 353-358
  • DeRisi, J. L., and V. R. Iyer. 1999 Genomics and array technology Curr. Opin. Oncol. 11 76-79
  • Edelstein, P. H., M. A. Edelstein, F. Higa, and S. Falkow. 1999 Discovery of virulence genes of Legionella pneumophila by using signature tagged mutagenesis in a guinea pig pneumonia model Proc. Natl. Acad. Sci. USA 96 8190-8195
  • Fernandez, J., F. Gharahdaghi, and S. M. Mische. 1998 Routine identification of proteins from sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) gels or polyvinyl difluoride membranes using matrix assisted laser desorption/ionization-time of flight-mass spectrometry (MALDI-TOF-MS) Electrophoresis 19 1036-1045
  • Fields, S., and O. Song. 1989 A novel genetic system to detect protein-protein interactions Nature 340 245-246
  • Fields, S., and R. Sternglanz. 1994 The two-hybrid system: an assay for protein-protein interactions Trends Genet. 10 286-292
  • Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty, J. M. Merrick, et al. 1995 Whole-genome random sequencing and assembly of Haemophilus influenzae Rd Science 269 496-512
  • Foulongne, V., G. Bourg, C. Cazevieille, S. Michaux-Charachon, and D. O'Callaghan. 2000 Identification of Brucella suis genes affecting intracellular survival in an in vitro human macrophage infection model by signature-tagged transposon mutagenesis Infect. Immun. 68 1297-1303
  • Fraser, C. M., J. D. Gocayne, O. White, M. D. Adams, R. A. Clayton, R. D. Fleischmann, C. J. Bult, A. R. Kerlavage, G. Sutton, J. M. Kelley, et al. 1995 The minimal gene complement of Mycoplasma genitalium Science 270 397-403
  • Fraser, C. M., S. Casjens, W. M. Huang, G. G. Sutton, R. Clayton, R. Lathigra, O. White, K. A. Ketchum, R. Dodson, E. K. Hickey, M. Gwinn, B. Dougherty, J. F. Tomb, R. D. Fleischmann, D. Richardson, J. Peterson, A. R. Kerlavage, J. Quackenbush, S. Salzberg, M. Hanson, R. van Vugt, N. Palmer, M. D. Adams, J. Gocayne, J. C. Venter, et al. 1997 Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi Nature 390 580-586
  • Fraser, C. M., S. J. Norris, G. M. Weinstock, O. White, G. G. Sutton, R. Dodson, M. Gwinn, E. K. Hickey, R. Clayton, K. A. Ketchum, E. Sodergren, J. M. Hardham, M. P. McLeod, S. Salzberg, J. Peterson, H. Khalak, D. Richardson, J. K. Howell, M. Chidambaram, T. Utterback, L. McDonald, P. Artiach, C. Bowman, M. D. Cotton, J. C. Venter, et al. 1998 Complete genome sequence of Treponema pallidum, the syphilis spirochete Science 281 375-388
  • Hall, M. C., and S. W. Matson. 1999 The Escherichia coli MutL protein physically interacts with MutH and stimulates the MutH-associated endonuclease activity J. Biol. Chem. 274 1306-1312
  • Hartland, E. L., M. Batchelor, R. M. Delahay, C. Hale, S. Matthews, G. Dougan, S. Knutton, I. Connerton, and G. Frankel. 1999 Binding of intimin from enteropathogenic Escherichia coli to Tir and to host cells Molec. Microbiol. 32 151-158
  • Hensel, M., J. E. Shea, C. Gleeson, M. D. Jones, E. Dalton, and D. W. Holden. 1995 Simultaneous identification of bacterial virulence genes by negative selection Science 269 400-403
  • Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B. C. Li, and R. Herrmann. 1996 Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae Nucleic Acids Res. 24 4420-4449
  • Hood, D. W., M. E. Deadman, M. P. Jennings, M. Bisercic, R. D. Fleischmann, J. C. Venter, and E. R. Moxon. 1996 DNA repeats identify novel virulence genes in Haemophilus influenzae Proc. Natl. Acad. Sci. USA 93 11121-11125
  • Hutchison, C. A., S. N. Peterson, S. R. Gill, R. T. Cline, O. White, C. M. Fraser, H. O. Smith, and J. C. Venter. 1999 Global transposon mutagenesis and a minimal Mycoplasma genome Science 286 2165-2169
  • Huynen, M. A., T. Dandekar, and P. Bork. 1999 Variation and evolution of the citric-acid cycle: A genomic perspective Trends Microbiol. 7 281-291
  • Jungblut, P. R., U. E. Schaible, H. J. Mollenkopf, U. Zimny-Arndt, B. Raupach, J. Mattow, P. Halada, S. Lamer, K. Hagens, and S. H. Kaufmann. 1999 Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: Towards functional genomics of microbial pathogens Molec. Microbiol. 33 1103-1117
  • Kalman, S., W. Mitchell, R. Marathe, C. Lammel, J. Fan, R. W. Hyman, L. Olinger, J. Grimwood, R. W. Davis, and R. S. Stephens. 1999 Comparative genomes of Chlamydia pneumoniae and C. trachomatis Nat. Genet. 21 385-389
  • Kaneko, T., S. Sato, H. Kotani, A. Tanaka, E. Asamizu, Y. Nakamura, N. Miyajima, M. Hirosawa, M. Sugiura, S. Sasamoto, T. Kimura, T. Hosouchi, A. Matsuno, A. Muraki, N. Nakazaki, K. Naruo, S. Okumura, S. Shimpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, M. Yasuda, and S. Tabata. 1996 Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II: Sequence determination of the entire genome and assignment of potential protein-coding regions DNA Res. 3 109-136
  • Karlyshev, A. V., M. Pallen, and B. W. Wren. in press A single primer PCR (SP-PCR) procedure for rapid identification of transposon insertion sites In: Biotechniques
  • Kawarabayasi, Y., M. Sawada, H. Horikawa, Y. Haikawa, Y. Hino, S. Yamamoto, M. Sekine, S. Baba, H. Kosugi, A. Hosoyama, Y. Nagai, M. Sakai, K. Ogura, R. Otsuka, H. Nakazawa, M. Takamiya, Y. Ohfuku, T. Funahashi, T. Tanaka, Y. Kudoh, J. Yamazaki, N. Kushida, A. Oguchi, K. Aoki, and H. Kikuchi. 1998 Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3 DNA Res. 5 55-76
  • Kawarabayasi, Y., Y. Hino, H. Horikawa, S. Yamazaki, Y. Haikawa, K. Jin-no, M. Takahashi, M. Sekine, S. Baba, A. Ankai, H. Kosugi, A. Hosoyama, S. Fukui, Y. Nagai, K. Nishijima, H. Nakazawa, M. Takamiya, S. Masuda, T. Funahashi, T. Tanaka, Y. Kudoh, J. Yamazaki, N. Kushida, A. Oguchi, H. Kikuchi, et al. 1999 Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1 DNA Res. 6 83-101, 145-152
  • Klenk, H. P., R. A. Clayton, J. F. Tomb, O. White, K. E. Nelson, K. A. Ketchum, R. J. Dodson, M. Gwinn, E. K. Hickey, J. D. Peterson, D. L. Richardson, A. R. Kerlavage, D. E. Graham, N. C. Kyrpides, R. D. Fleischmann, J. Quackenbush, N. H. Lee, G. G. Sutton, S. Gill, E. F. Kirkness, B. A. Dougherty, K. McKenney, M. D. Adams, B. Loftus, J. C. Venter, et al. 1997 The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus Nature 390 364-370
  • Kunst, F., N. Ogasawara, I. Moszer, A. M. Albertini, G. Alloni, V. Azevedo, M. G. Bertero, P. Bessieres, A. Bolotin, S. Borchert, R. Borriss, L. Boursier, A. Brans, M. Braun, S. C. Brignell, S. Bron, S. Brouillet, C. V. Bruschi, B. Caldwell, V. Capuano, N. M. Carter, S. K. Choi, J. J. Codani, I. F. Connerton, A. Danchin, et al. 1997 The complete genome sequence of the Gram-positive bacterium Bacillus subtilis Nature 390 249-256
  • Lashkari, D. A., J. L. DeRisi, J. H. McCusker, A. F. Namath, C. Gentile, S. Y. Hwang, P. O. Brown, and R. W. Davis. 1997 Yeast microarrays for genome wide parallel genetic and gene expression analysis Proc. Natl. Acad. Sci. USA 94 13057-13062
  • Mei, J. M., F. Nourbakhsh, C. W. Ford, and D. W. Holden. 1997 Identification of Staphylococcus aureus virulence genes in a murine model of bacteraemia using signature-tagged mutagenesis Molec. Microbiol. 26 399-407
  • Nelson, K. E., R. A. Clayton, S. R. Gill, M. L. Gwinn, R. J. Dodson, D. H. Haft, E. K. Hickey, J. D. Peterson, W. C. Nelson, K. A. Ketchum, L. McDonald, T. R. Utterback, J. A. Malek, K. D. Linher, M. M. Garrett, A. M. Stewart, M. D. Cotton, M. S. Pratt, C. A. Phillips, D. Richardson, J. Heidelberg, G. G. Sutton, R. D. Fleischmann, J. A. Eisen, C. M. Fraser, et al. 1999 Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima Nature 399 323-329
  • O'Connor, C. D., M. Farris, R. Fowler, and S. Y. Qi. 1997 The proteome of Salmonella enterica serovar typhimurium: Current progress on its determination and some applications Electrophoresis 18 1483-1490
  • Overbeek, R., M. Fonstein, M. D'Souza, G. D. Pusch, and N. Maltsev. 1999 The use of gene clusters to infer functional coupling Proc. Natl. Acad. Sci. USA 96 2896-2901
  • Pappin, D. J. 1997 Peptide mass fingerprinting using MALDI-TOF mass spectrometry Methods Molec Biol. 64 165-173
  • Parkhill, J., M. Achtman, K. D. James, S. D. Bentley, C. Churcher, S. R. Klee, G. Morell, D. Basham, D. Brown, T. Chillingworth, R. M. Davies, P. Davis, K. Devlin, T. Feltwell, N. Hamlin, S. Holroyd, K. Jagels, S. Leather, S. Moule, K. Mungall, M. A. Quail, M.-A. Rajandream, K. M. Rutherford, M. Simmonds, J. Skelton, S. Whitehead, B. G. Spratt, and B. G. Barrell. 2000a Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491 Nature 404 502-506
  • Parkhill, J., B. W. Wren, K. Mungall, J. M. Ketley, C. Churcher, D. Basham, T. Chillingworth, R. M. Davies, T. Feltwell, S. Holroyd, K. Jagels, A. V. Karlyshev, S. Moule, M. J. Pallen, C. W. Penn, M. A. Quail, M. A. Rajandream, K. M. Rutherford, A. H. van Vliet, S. Whitehead, and B. G. Barrell. 2000b The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences Nature 403 665-668
  • Pizza, M., S. V. Masignani, M. M. Giuliani, B. Arico, M. Comanducci, G. T. Jennings, L. Baldi, E. Bartolini, B. Capecchi, C. L. Galeotti, E. Luzzi, R. Manetti, E. Marchetti, M. Mora, S. Nuti, G. Ratti, L. Santini, S. Savino, M. Scarselli, E. Storni, P. Zuo, M. Broeker, E. Hundt, B. Knapp, and E. Blair. 2000 Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing Science 287 1816-1820
  • Polissi, A. P. A., G. Feger, M. Altieri, H. Mottl, L. Ferrari, and D. Simon. 1998 Large-scale identification of virulence genes from Streptococcus pneumoniae Infect. Immun. 66 5620-5629
  • Ross, D. T., U. Scherf, M. B. Eisen, C. M. Perou, C. Rees, P. Spellman, V. Iyer, S. S. Jeffrey, M. Van De Rijn, M. Waltham, A. Pergamenschikov, J. C. Lee, D. Lashkari, D. Shalon, T. G. Myers, J. N. Weinstein, D. Botstein, and P. O. Brown. 2000 Systematic variation in gene expression patterns in human cancer cell lines Nat. Genet. 24 227-235
  • Scherf, U., D. T. Ross, M. Waltham, L. H. Smith, J. K. Lee, L. Tanabe, K. W. Kohn, W. C. Reinhold, T. G. Myers, D. T. Andrews, D. A. Scudiero, M. B. Eisen, E. A. Sausville, Y. Pommier, D. Botstein, P. O. Brown, and J. N. Weinstein. 2000 A gene expression database for the molecular pharmacology of cancer Nat. Genet. 24 236-244
  • Shea, J. E., M. Hensel, C. Gleeson, and D. W. Holden. 1996 Identification of a virulence locus encoding a second type III secretion system in Salmonella typhimurium Proc. Natl. Acad. Sci. USA 93 2593-2597
  • Shea, S., and D. W. Holden. 2000 Signature-tagged mutagenesis helps identify virulence genes American Society for Microbiology News 66 15-20
  • Shoemaker, D. D., D. A. Lashkari, D. Morris, M. Mittmann, and R. W. Davis. 1996 Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy Nat. Genet. 14 450-456
  • Smith, D. R., L. A. Doucette-Stamm, C. Deloughery, H. Lee, J. Dubois, T. Aldredge, R. Bashirzadeh, D. Blakely, R. Cook, K. Gilbert, D. Harrison, L. Hoang, P. Keagle, W. Lumm, B. Pothier, D. Qiu, R. Spadafora, R. Vicaire, Y. Wang, J. Wierzbowski, R. Gibson, N. Jiwani, A. Caruso, D. Bush, J. N. Reeve, et al. 1997 Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: Functional analysis and comparative genomics J. Bacteriol. 179 7135-7155
  • Sonnenberg, M. G., and J. T. Belisle. 1997 Definition of Mycobacterium tuberculosis culture filtrate proteins by two-dimensional polyacrylamide gel electrophoresis, N-terminal amino acid sequencing, and electrospray mass spectrometry Infect. Immun. 65 4515-4524
  • Stephens, R. S., S. Kalman, C. Lammel, J. Fan, R. Marathe, L. Aravind, W. Mitchell, L. Olinger, R. L. Tatusov, Q. Zhao, E. V. Koonin, and R. W. Davis. 1998 Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis Science 282 754-759
  • Tekaia, F., S. V. Gordon, T. Garnier, R. Brosch, B. G. Barrell, and S. T. Cole. 1999 Analysis of the proteome of Mycobacterium tuberculosis in silico Tuber. Lung Dis. 79 329-342
  • Tettelin, H. S. N., J. Heidelberg, A. C. Jeffries, K. E. Nelson, J. A. Eisen, K. A. Ketchum, D. W. Hood, J. F. Peden, R. J. Dodson, W. C. Nelson, M. L. Gwinn, R. DeBoy, J. D. Peterson, E. K. Hickey, D. H. Haft, S. L. Salzberg, O. White, R. D. Fleischmann, B. A. Dougherty, T. Mason, A. Ciecko, D. S. Parksey, E. Blair, and H. Cittone. 2000 Complete genome sequence of Neisseria meningitidis serogroup B strain MC58 Science 287 1809-1815
  • Tomb, J. F., O. White, A. R. Kerlavage, R. A. Clayton, G. G. Sutton, R. D. Fleischmann, K. A. Ketchum, H. P. Klenk, S. Gill, B. A. Dougherty, K. Nelson, J. Quackenbush, L. Zhou, E. F. Kirkness, S. Peterson, B. Loftus, D. Richardson, R. Dodson, H. G. Khalak, A. Glodek, K. McKenney, L. M. FitzGerald, N. Lee, M. D. Adams, J. C. Venter, et al. 1997 The complete genome sequence of the gastric pathogen Helicobacter pylori Nature 388 539-547
  • Uetz, P., L. Giot, G. Cagney, T. A. Mansfield, R. S. Judson, J. R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J. M. Rothberg. 2000 A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae Nature 403 623-627
  • Wasinger, V. C., J. D. Pollack, and I. Humphery-Smith. 2000 The proteome of Mycoplasma genitalium Chaps-soluble component Eur. J. Biochem. 267 1571-1582
  • White, O., J. A. Eisen, J. F. Heidelberg, E. K. Hickey, J. D. Peterson, R. J. Dodson, D. H. Haft, M. L. Gwinn, W. C. Nelson, D. L. Richardson, K. S. Moffat, H. Qin, L. Jiang, W. Pamphile, M. Crosby, M. Shen, J. J. Vamathevan, P. Lam, L. McDonald, T. Utterback, C. Zalewski, K. S. Makarova, L. Aravind, M. J. Daly, C. M. Fraser, et al. 1999 Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1 Science 286 1571-1577
  • Williams, J. M., G. C. Chen, L. Zhu, and R. F. Rest. 1998 Using the yeast two-hybrid system to identify human epithelial cell proteins that bind gonococcal Opa proteins: intracellular gonococci bind pyruvate kinase via their Opa proteins and require host pyruvate for growth Molec. Microbiol. 27 171-186
  • Winzeler, E. A., D. R. Richards, A. R. Conway, A. L. Goldstein, S. Kalman, M. J. McCullough, J. H. McCusker, D. A. Stevens, L. Wodicka, D. J. Lockhart, and R. W. Davis. 1998 Direct allelic variation scanning of the yeast genome Science 281 1194-1197
  • Winzeler, E. A., B. Lee, J. H. McCusker, and R. W. Davis. 1999a Whole genome genetic-typing in yeast using high-density oligonucleotide arrays Parasitology 118 (Suppl.) S73-80
  • Winzeler, E. A., D. D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, R. Bangham, R. Benito, J. D. Boeke, H. Bussey, A. M. Chu, C. Connelly, K. Davis, F. Dietrich, S. W. Dow, M. El Bakkoury, F. Foury, S. H. Friend, E. Gentalen, G. Giaever, J. H. Hegemann, T. Jones, M. Laub, H. Liao, R. W. Davis, et al. 1999b Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis Science 285 901-906