The first stage of converting the primary genetic information from the stable DNA code into protein is to transcribe the DNA into messenger RNA (mRNA). This process is carried out by RNA polymerase, which shares many features with the DNA polymerases. The structure of RNA polymerase resembles that of the DNA polymerases in the active site region, and the enzyme likewise requires Mg2+ to function and synthesizes nucleic acid in a 5’ to 3’ direction. However, RNA polymerases do not require a primer to initiate synthesis. Instead, their signal for the initiation of transcription is a specific sequence on the DNA, known as the promoter region.

A gene’s promoter is said to be ‘upstream’, in that the promoter region is situated to the 5’ end of the coding region. The promoter region allows the RNA polymerase to bind and begin transcription so that the resulting mRNA contains not only the coding region itself but also all the signals to start and stop the synthesis of the polypeptide. How RNA polymerase works is intrinsic to the concept that one gene makes one polypeptide. In eukaryotes, genes are arranged so that the promoter region is in such a position that when transcription occurs a single mRNA molecule is produced that can be used to code for a single polypeptide (monocistronic). Genes that code for polypeptides that have a common purpose (such as the manufacture of a multi-polypeptide protein) are placed in many different parts of the genome, frequently on different chromosomes. In prokaryotes (both the Bacteria and the Archaea) genes are more likely to be arranged so that the coding regions for enzymes involved in a single pathway are clustered together. Furthermore, several genes may be arranged so close to one another that they are transcribed from a single promoter. This polycistronic arrangement is called an operon.


RNA polymerase is composed of four different polypeptides, called beta, beta prime, alpha, and the sigma factor (α2ββ’σ; alpha is present in two copies). The core polymerase (α2ββ’) has a high processivity (RNA polymerizing activity) but little specificity for particular DNA sequences. The addition of the sigma factor to make holo-polymerase confers DNA sequence specificity, forcing the RNA polymerase to bind at the promoter region. Once the RNA polymerase has bound, the sigma factor dissociates. The core polymerase then synthesizes RNA complementary to the lower (template) DNA strand. The promoter region tends to have a slightly higher A and T content than the surrounding DNA, and the most abundant form of promoter in Escherichia coli has two fairly well conserved sequences, TATAAT (the Pribnow box or –10 sequence) and TTGACA (the –35 sequence). The higher A/T content means that there are fewer hydrogen bonds to hold the double-stranded DNA together and thus the double helix is easier to force apart, a process termed melting. Melting of the promoter region allows access of the RNA polymerase, which specifically targets the –10 and –35 regions by the use of the sigma factor. The numbers –10 and –35 give the position of each sequence as the number of bases upstream of the base where transcription starts (known as the transcription start site).
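The template relationship described above can be sketched in a few lines of Python. This is only an illustration of base pairing: the function names and the nine-base example sequence are assumptions made for the sketch, and the code ignores promoters, start sites, and termination.

```python
# Base laid down in the RNA opposite each base of the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def template_of(coding_5to3: str) -> str:
    """Template strand, written 3'->5', that pairs with a coding strand written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in coding_5to3.upper())


def transcribe(template_3to5: str) -> str:
    """mRNA, written 5'->3', synthesized from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


coding = "ATGGCTTGA"              # hypothetical coding (non-template) strand
template = template_of(coding)    # TACCGAACT
mrna = transcribe(template)
print(mrna)                       # AUGGCUUGA
```

The mRNA has the same sequence as the coding strand with U in place of T, which is why gene sequences are conventionally written as the coding strand.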

The –10 (TATAAT)/–35 (TTGACA) consensus promoter is not the only promoter sequence in the genome. There are many promoters with a sequence very similar to –10/–35, and the more the base sequence differs from this consensus, the weaker the promoter. A strong promoter forms a very tight bond with the sigma factor, and transcription is very likely to be initiated from such a promoter. A weak promoter binds the sigma factor by only a few bases, and is correspondingly less likely to initiate RNA synthesis. Other promoters have completely different sequences that bind alternative sigma factors. A good example of this is the alternative sigma factor produced in response to low oxygen in E. coli and some other bacteria. This sigma factor still allows RNA polymerase to recognize a –10 site, but will only allow the holo-polymerase to bind where there is an additional specific sequence (FNR site) at around 42 bases upstream of the transcription start site. The use of alternative sigma factors allows a whole group of genes and operons, known as a regulon, to be switched on and off according to external changes in the cell’s environment. Induction of the FNR regulon allows the cell to induce all the genes that are useful to cope with low oxygen, principally those for the use of alternative electron acceptors.
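The idea that promoter strength falls as the sequence departs from the consensus can be illustrated by counting matching bases. The sketch below is a deliberately crude proxy: the scoring function is an assumption made for illustration, not a measured relationship, and the second pair of hexamers is an example chosen for this sketch rather than one given in the text.

```python
CONSENSUS_35 = "TTGACA"   # -35 consensus sequence
CONSENSUS_10 = "TATAAT"   # -10 consensus sequence (Pribnow box)


def matches(site: str, consensus: str) -> int:
    """Count the positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(site.upper(), consensus))


def promoter_score(minus35: str, minus10: str) -> int:
    """Bases (0 to 12) matching the consensus; a higher score suggests a stronger promoter."""
    if len(minus35) != 6 or len(minus10) != 6:
        raise ValueError("both sites must be hexamers")
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


print(promoter_score("TTGACA", "TATAAT"))   # 12, a perfect consensus promoter
print(promoter_score("TTTACA", "TATGTT"))   # 9, three mismatches
```

Real promoter strength also depends on features this count ignores, such as the spacing between the two hexamers and the surrounding sequence.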

▶Termination of transcription

Confusingly, the signal for termination of transcription is provided by a structure on the mRNA itself. After the last gene in the operon has been completely transcribed, the RNA polymerase continues transcription past the last gene’s termination codon. This part of the mRNA folds up into a stem-loop structure that causes the RNA polymerase to pause and cease transcription. The RNA polymerase then dissociates from the DNA and the mRNA it has generated, leaving the complete transcript ready for the ribosomes to translate it into protein.

▶Regulation of transcription

The strength of a promoter and alternative sigma factors represent two ways in which the cell can adjust the rate of mRNA transcription from a particular promoter, and thus the amount of individual proteins in the cell. There are some genes that seem to be switched on most of the time, particularly those genes involved in ‘housekeeping’ functions of the cell, such as central metabolic pathways, and those that account for gene products, such as the ribosomal components. However, we have come to recognize that all genes are regulated in some way at some stage during the cell cycle, and the bacterial cell has a variety of means of altering the flow of information from the genome to the proteome.

The two most common forms of regulation cited in the study of bacteria are derepression (where a protein that is bound at a promoter and blocks transcription is removed, so that the gene is switched on) and attenuation (where the presence or absence of a substrate necessary for the function of the gene product governs the transcription of the gene itself). The former is exemplified by the E. coli lac operon, the latter by the trp operon from the same organism. It should be noted that activation (where the presence of a protein is used to switch a gene on) is used much less frequently in Bacteria than derepression. The contrary is true in the Archaea and in eukaryotes, where transcription factors are the main regulators in activating gene expression.

▶The lac operon

The most commonly used model for transcription and the regulation of transcription is the lac operon of E. coli. The operon comprises a promoter, an operator (a site on the DNA where regulatory proteins bind), and three genes. These three genes allow the bacterium to use lactose instead of glucose as a carbon source: lacZ codes for the enzyme β-galactosidase (the gene product LacZ; note the difference in italicization and capitalization between the gene and the protein it makes), lacY codes for a permease (gene product LacY) that allows lactose through the membrane, and lacA, a gene of poorly understood function, codes for a transacetylase. Upstream of the lacZYA promoter is another gene called lacI, which has its own promoter. The product of this gene, the protein LacI, is the primary regulator of the lac operon and is sometimes called the Lac repressor. When LacI is bound to the lacZYA operator, transcription is blocked, the cell is unable to produce β-galactosidase, and thus is unable to use lactose as a carbon source.

The functions of all the genes were found through the analysis of mutations. Changes in the DNA sequence of the genes themselves can knock out individual genes. A significant finding was a mutation that produced an inactive LacI and so allowed constitutive expression of lacZ (i.e. in both the presence and absence of lactose); it was one of the results that allowed Jacob and Monod to begin to elucidate the mechanism of LacI repression. They proposed that LacI was always produced from its own promoter, but was structurally altered by the presence of lactose itself, which stopped it binding to the lacZYA operator. It was found that the molecule that caused this change and induced the lac operon was allolactose. Allolactose is the primary inducer, but there are other molecules, such as IPTG (isopropyl β-D-thiogalactopyranoside), that can also induce the operon.

The model was backed up with studies of the nonstructural parts of the operon: the promoters of both lacZYA and lacI, and the operator of lacZYA. The lac promoter itself is relatively weak, even when induced by allolactose. High levels of transcription are achieved only when the promoter is both derepressed and activated by the cAMP receptor protein (CRP), sometimes called catabolite activator protein (CAP). If the cell is rich in energy from other sources, it may not need to utilize lactose at all. The signal for this is cyclic AMP (cAMP), which the enzyme adenylate cyclase makes from ATP. When glucose is low, adenylate cyclase activity rises and cAMP accumulates. The cAMP binds to CRP, which can then activate the lac operon by as much as 40-fold.
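The two layers of control over the lac operon, derepression by the inducer and activation by CRP, can be summarized as a small truth table. The sketch below is a qualitative simplification under stated assumptions: the function name and the three output labels are invented for illustration, "off" ignores the low basal transcription of a repressed operon, and nutrient levels are reduced to present or absent.

```python
def lac_state(lactose: bool, glucose: bool) -> str:
    """Qualitative lacZYA transcription level for one nutrient condition."""
    laci_on_operator = not lactose   # allolactose (derived from lactose) releases LacI
    crp_camp_bound = not glucose     # low glucose -> cAMP rises -> CRP-cAMP can activate
    if laci_on_operator:
        return "off"                 # repressed, whatever the state of CRP
    return "high" if crp_camp_bound else "low"


# Print the outcome for all four combinations of the two sugars.
for lactose in (False, True):
    for glucose in (False, True):
        print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {lac_state(lactose, glucose)}")
```

Only one of the four conditions, lactose present and glucose absent, gives high expression, which matches the requirement that the promoter be both derepressed and activated.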

▶The trp operon

The tryptophan operon comprises five genes (trpEDCBA), which, when transcribed and translated, enable the cell to synthesize tryptophan from glutamine and chorismate. Tryptophan itself inhibits the first step of the pathway at the protein level, but the cell also imposes regulation during both transcription and translation. It is important to remember that the cell does not need the tryptophan synthesis enzymes when there is an abundance of tryptophan itself, in contrast to the lac operon, where an abundance of lactose requires the cell to induce a metabolic system. The transcription of the trp operon is thus stopped by the binding of the trp repressor complexed to tryptophan. As the levels of tryptophan fall, the operon is derepressed. The difference between repression and derepression is about 70-fold, much less than that of the lac operon in its ‘on’ and ‘off’ states, and this gave a clue that there was more than one mechanism regulating trpEDCBA expression.

The stretch of bases between the trp promoter/operator and triplet 1 of trpE is unusually long. These 162 nt (nucleotides) include a Rho-independent terminator site. When transcribed, the site consists of a short GC-rich palindrome followed by eight uracil bases. If this palindrome can hybridize to itself, it forms a stem-loop structure (or hairpin), which is a highly efficient transcriptional terminator, allowing the RNA polymerase to synthesize only the first 140 nt of the operon transcript. The mechanism by which these short transcripts are made is termed attenuation.
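The signature of a Rho-independent terminator, a GC-rich palindrome followed by a run of uracils, is simple enough to search for in an RNA sequence. The following is a minimal sketch under assumptions: the stem length, loop range, and GC threshold are arbitrary illustrative defaults, only perfect stems are accepted, and the test sequence is invented rather than taken from the trp leader.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would pair with `rna` to form a perfect stem."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def find_terminators(rna, stem=6, loop_min=3, loop_max=8, u_run=8, min_gc=0.6):
    """Return (start, end) spans of a GC-rich hairpin immediately followed by a U tract."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem - loop_min - u_run + 1):
        left = rna[i:i + stem]
        if (left.count("G") + left.count("C")) / stem < min_gc:
            continue                      # stem is not GC-rich enough
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop           # start of the right arm of the stem
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if len(tail) < u_run:
                break                     # ran off the end of the sequence
            if right == reverse_complement(left) and tail == "U" * u_run:
                hits.append((i, j + stem + u_run))
    return hits


# Invented example: 6-base GC stem, 4-base loop, then eight uracils.
example = "ACC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUU" + "CA"
print(find_terminators(example))          # [(3, 27)]
```

Real terminator hairpins tolerate mismatches and vary in stem length, so a practical search would score candidate structures rather than demand an exact palindrome.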

In bacteria, transcription by RNA polymerase and translation by ribosomes often occur in close proximity, with the ribosome binding to its mRNA binding site as soon as the RNA polymerase has transcribed it. To ensure that the RNA polymerase and ribosome are as close to one another as possible, the initial transcription of the first hundred bases results in the formation of a stem-loop between regions 1 and 2 of the mRNA. This causes the RNA polymerase to pause, and while this takes place, a ribosome loads itself onto the mRNA at the 5’ end.

When tryptophan is absent or at very low levels, the ribosome begins to translate the 5’ end of the trp mRNA, until it encounters two codons coding for tryptophan itself (UGGUGG, region 1). As tryptophan is at such low levels, there are few tryptophan-charged tRNA (tRNATrp) molecules around for the ribosome to incorporate into the growing peptide and, while the ribosome is waiting, a hairpin forms between regions 2 and 3, which reinforces this pause and allows the RNA polymerase to transcribe the entire operon. Contrast this with the situation when tryptophan is abundant: again the RNA polymerase begins transcription, pauses, and the ribosome loads, but now there is no barrier to the ribosome incorporating tryptophan, so it can move forward, unfolding the hairpin between regions 2 and 3. This leaves region 3 available to form another secondary structure, this time with region 4. However, the combination of regions 3 and 4, coupled to the presence of poly(U) a little further downstream, is a Rho-independent transcription termination signal. The RNA polymerase falls off after having manufactured only the first 140 nt of the trp mRNA. In this way, the transcription of the trp operon is attenuated in the presence of tryptophan, but permitted in its absence, allowing the cell to conserve energy by avoiding the wasteful transcription of unneeded mRNA.
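The attenuation decision described above can be reduced to a small lookup that also records the state of the trp repressor. This is a hedged sketch, not a quantitative model: the class and function names are invented for illustration, tryptophan is treated as simply abundant or scarce, and the fact that some transcripts escape repression (and are then subject to attenuation) is noted only in a comment.

```python
from dataclasses import dataclass


@dataclass
class TrpOutcome:
    repressor_bound: bool    # trp repressor complexed to tryptophan sits on the operator
    leader_hairpin: str      # which regions of the leader mRNA pair
    full_length_mrna: bool   # whether trpEDCBA is transcribed in full


def trp_operon(tryptophan_abundant: bool) -> TrpOutcome:
    """Qualitative state of the trp operon at high or low tryptophan."""
    if tryptophan_abundant:
        # The ribosome reads straight through the two Trp codons in region 1,
        # so regions 3 and 4 pair and terminate any transcript that escaped
        # repression after about 140 nt.
        return TrpOutcome(repressor_bound=True, leader_hairpin="3:4",
                          full_length_mrna=False)
    # The ribosome stalls at the Trp codons, regions 2 and 3 pair, and the
    # 3:4 terminator cannot form, so the polymerase continues into trpE.
    return TrpOutcome(repressor_bound=False, leader_hairpin="2:3",
                      full_length_mrna=True)


for abundant in (True, False):
    print(abundant, trp_operon(abundant))
```

Keeping the repressor and the hairpin as separate fields reflects the point made in the text that two independent mechanisms respond to the same signal.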