▶Comparative genomics

As DNA sequencing makes genomic data more easily available, it is becoming increasingly relevant to consider the whole genome of a microorganism rather than its individual genes. In this way subtle differences can be examined (for example, what makes Bacillus anthracis the causative agent of anthrax when the genetically very similar Bacillus cereus is not). The arrangement of genes relative to one another, their presence or absence, the sequences of the genes, and the intergenic regions are all used not only to compare species of Bacteria, but also to examine more distant relationships, for example, between higher animals and microorganisms. The size of higher organisms’ genomes (the human genome is 3200 million bp (Mbp)) has restricted their whole genome sequencing to a few consortia of public and private laboratories worldwide. However, the relatively small size of Bacterial and Archaeal genomes (3–8 Mbp) has led to the release of new complete genome sequences on an almost weekly basis. The way in which DNA sequences are obtained and assembled into genome-sized pieces is beyond the scope of this text, but a primary consideration in approaching genomics is how the microorganism’s genome is arranged.

▶Generalized structure of the Bacterial genome

The Bacterial genome is often portrayed as a stable, single, circular molecule. However, the genomes of most Bacteria are fluid (constantly changing in response to external stimuli) and composed of several molecules, including extra chromosomes, megaplasmids and plasmids.

Escherichia coli, the model organism for molecular biology, is often considered to be the paradigm for all Bacterial and Archaeal genomes. However, although it is by far the best studied, its single haploid circular chromosome of around 4.6 million bp is rather unusual compared with other genera. Other Bacterial genomes comprise several chromosomes, some of which are circular and some of which are linear.

The size of a Bacterial genome is related to the ecological niche in which the organisms live. Obligate pathogens, such as the causative agent of epidemic typhus (Rickettsia prowazekii), seem to have minimized their genomes to such an extent that they rely on host proteins and metabolites in order to replicate. This is taken to the extreme in the smallest known genome, that of Carsonella ruddii, which is composed of only 159 663 base pairs of DNA. In comparison, free-living organisms, such as the metabolically versatile Pseudomonas aeruginosa and Streptomyces coelicolor, have to cope with changes in temperature over tens of degrees, varying carbon and energy sources in the space of minutes, and other environmental challenges. As a consequence they have a larger complement of genes regulated by a more complex sensing apparatus, and thus a larger genome.
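To put these sizes on a common scale, the following minimal sketch compares the genome sizes quoted in this section (Carsonella ruddii, Escherichia coli and the human genome) relative to E. coli. The function name `relative_sizes` is a hypothetical helper introduced only for illustration, and the E. coli and human values are the rounded figures given in the text.

```python
# Genome sizes in base pairs, taken from the figures quoted in this section.
# The E. coli and human values are rounded approximations from the text.
GENOME_SIZES_BP = {
    "Carsonella ruddii": 159_663,
    "Escherichia coli": 4_600_000,
    "Homo sapiens": 3_200_000_000,
}


def relative_sizes(sizes_bp, reference_name):
    """Return each genome size divided by the size of the reference genome."""
    reference = sizes_bp[reference_name]
    return {name: size / reference for name, size in sizes_bp.items()}


if __name__ == "__main__":
    ratios = relative_sizes(GENOME_SIZES_BP, "Escherichia coli")
    # Print the genomes from smallest to largest.
    for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {size / 1e6:10.3f} Mbp  {ratios[name]:9.3f} x E. coli")
```

On these figures the C. ruddii genome is only a few percent of the size of the E. coli chromosome, while the human genome is several hundred times larger.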

Another strategy used by microorganisms to cope with transient environmental change is the acquisition of plasmids. Plasmids are small circular extrachromosomal pieces of DNA, which replicate independently of the genome. In contrast to the singular genome, there may be between 10 and 100 000 complete copies of a plasmid in a Bacterial cell. Plasmids may carry genes that allow the microorganism to become pathogenic (one of the main differences between species of Salmonella is the presence of plasmid(s) carrying pathogenicity factors), resist antibiotics (resistance to kanamycin, streptomycin, and many other antibiotics may be carried on plasmids) or metabolize a particular set of compounds (for example, the proteins making up the xyl pathway used by Pseudomonas putida for the degradation of toluene). Occasionally these plasmids are integrated into the genome and only exist as extrachromosomal DNA in the presence of certain physiological stimuli. While the plasmids that are used in molecular biology are in the range of 2.5–10 thousand bp (Kbp), naturally occurring plasmids can be many hundreds of thousands of base pairs in size, bringing into question the philosophical difference between these megaplasmids and the chromosomes themselves.

The characteristics that distinguish Bacterial genomes from those of the eukaryotes lie mainly in how the genetic information is arranged. Relatively speaking, the Bacterial genome is information-rich, containing many regions coding for proteins and RNA but comparatively few regions involved with the regulation of expression. Genes of similar function tend to be clustered together, and genes that act in a single metabolic pathway, or that are all involved in the synthesis of a complex multi-subunit protein, are often found in operons. Genes in an operon are sometimes so tightly packed together that they overlap.
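The overlap between tightly packed genes can be illustrated with a minimal sketch that scans gene coordinates for neighbouring genes sharing base pairs. The gene names and coordinates below are hypothetical, invented purely for illustration, and the `Gene` type and `overlapping_neighbours` function are assumptions of this sketch rather than part of any annotation library.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based position of the first base (inclusive)
    end: int    # 1-based position of the last base (inclusive)


def overlapping_neighbours(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (upstream, downstream, overlap_bp) for neighbouring genes that overlap."""
    ordered = sorted(genes, key=lambda g: g.start)
    overlaps = []
    for upstream, downstream in zip(ordered, ordered[1:]):
        # Number of bases shared by the two genes; zero or negative means no overlap.
        shared = min(upstream.end, downstream.end) - downstream.start + 1
        if shared > 0:
            overlaps.append((upstream.name, downstream.name, shared))
    return overlaps


# Hypothetical operon: geneA and geneB share 4 bp, geneC is separate.
toy_operon = [
    Gene("geneA", 100, 400),
    Gene("geneB", 397, 900),
    Gene("geneC", 915, 1500),
]

if __name__ == "__main__":
    print(overlapping_neighbours(toy_operon))
```

The sketch compares only genes that are adjacent after sorting by start position and ignores strand, which is enough to show the idea but would need extending for real annotation data.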

The fluidity of the Bacterial genome is reflected in the gene order found in different Bacterial genera: there is no similarity in the arrangement of genes among the major phyla, and often gene order is very different in species of the same genus. Bacterial genomes also vary in their nucleotide composition. The G+C content of the Bacteria ranges from 25 to 75%, and this is often reflected in the more frequent use of certain codons for certain amino acids (termed codon usage). While Bacterial genomes do contain repeating elements, they are often long repeats of >10 bp and may be associated with pathogenicity islands, insertion sequences or the remnants of excised lysogenic bacteriophage.
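G+C content and codon usage are both simple counts over a sequence, as the following minimal sketch shows. The sequence `toy_cds` is a made-up coding sequence used only for illustration, and the function names are hypothetical; real analyses would normally read annotated coding sequences from a genome file.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of bases in a DNA sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons in a coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy coding sequence: Met, four alanine codons, stop.
toy_cds = "ATGGCGGCCGCGGCATAA"
ALANINE_CODONS = {"GCA", "GCC", "GCG", "GCT"}

if __name__ == "__main__":
    print(f"G+C content: {gc_content(toy_cds):.1%}")
    counts = codon_counts(toy_cds)
    # Show how often each of the four alanine codons is used.
    for codon in sorted(ALANINE_CODONS):
        print(codon, counts[codon])
```

In this toy sequence alanine is encoded mostly by the G+C-rich codons GCG and GCC and never by GCT, which is the kind of bias the term codon usage describes.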

▶Generalized structure of the Archaeal genome

A typical Archaeal genome is very similar to a Bacterial one. Generally the chromosome is single and circular, of a similar size to those of the Bacteria, and may be complemented by the presence of plasmids. The main differences are in the fine structure of the arrangement of genes and the proteins that associate with the genomic DNA. While the Archaea have operons and tend to exhibit clustering of genes according to function, the arrangement of the genes has elements in common with both the eukaryotes and the Bacteria. An Archaeal operon may contain genes that have close relatives in both the other kingdoms, and rarely the genes themselves may be made up of domains that have origins in different kingdoms. However, about a third of the genes in any archaeon are unique to this kingdom.

The replication origin of the Archaeal genome has many features in common with the eukaryotes and this similarity in the gross chromosomal features is apparent through the use of histone-like proteins to stabilize the chromosomal tertiary structure.

▶Eukaryotic genomes

The smallest eukaryotic genome is that of the parasite Encephalitozoon cuniculi (2.5 million bp), and many eukaryotic microorganisms have smaller genomes than the larger, more differentiated organisms. Eukaryote genomes are characterized by having a large number of chromosomes (between 4 and 105 in the haploid state) but do not generally have stable extrachromosomal DNA in the form of plasmids. As well as the nuclear chromosomes, some of the cell organelles (mitochondrion, chloroplast) have their own chromosomes, which code for proteins specific to the function of the organelle.

Fungal genomes are characterized by their relative lack of introns (only 43% of Schizosaccharomyces pombe genes contain introns, with a total of 4730 introns). These introns are small, being only 50–200 bp in size compared with the introns of >10 kb in mammals. Although the genes are not as tightly packed as in Bacteria or Archaea, fungal genomes are information-rich and contain little repetitive DNA (50–60% of the S. cerevisiae nuclear genome is transcribed, compared with 33% in Schizophyllum commune and only 1% in Homo sapiens).