The chain elongation of Okazaki fragments in E. coli is catalyzed by the DNA polymerase III holoenzyme (10). This enzyme synthesizes DNA with very high processivity, sufficient to complete an Okazaki fragment of about 2 kb without dissociating. After finishing a fragment, the Pol III holoenzyme dissociates from it and restarts the next round of Okazaki fragment synthesis from an RNA primer newly laid down near the replication fork (11). Enzymes that remove the primer RNA and fill the resulting gap, such as ribonuclease H and DNA polymerase I of E. coli, are essential before DNA ligase can seal the Okazaki fragments together (12). Mutants defective in either DNA polymerase I or DNA ligase accumulate large amounts of short Okazaki fragments under restrictive conditions.
Although the basic biochemical processes at eukaryotic and prokaryotic replication forks are similar, they differ in many details (13). For example, primer synthesis in eukaryotic cells is catalyzed by DNA polymerase α, which synthesizes 2 to 12 nucleotides of RNA (initiator RNA) and then adds about 20 nucleotides of DNA to it. Eukaryotic Okazaki fragments (40 to 300 nucleotides) are also significantly shorter than those of prokaryotes.
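The cycle described above (priming near the fork, extension until the polymerase meets the previous fragment, primer removal and gap filling, then ligation) can be sketched as a toy model. This is a minimal illustration only, assuming a fixed fragment length and a fixed primer length; the names `Fragment`, `synthesize_lagging_strand`, and `mature` are hypothetical, and real fragment lengths vary within the ranges given in the text.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: int       # first template position covered (inclusive)
    end: int         # last template position covered (exclusive)
    rna_primer: int  # length of RNA primer still present at the fork-proximal 5' end


def synthesize_lagging_strand(template_len: int, fragment_len: int,
                              primer_len: int) -> list[Fragment]:
    """Toy model: each fragment is primed near the advancing fork and extended
    until it abuts the previous fragment, leaving a nick between them."""
    fragments = []
    for start in range(0, template_len, fragment_len):
        end = min(start + fragment_len, template_len)
        fragments.append(Fragment(start, end, min(primer_len, end - start)))
    return fragments


def mature(fragments: list[Fragment]) -> list[Fragment]:
    """Replace RNA primers with DNA (the RNase H / Pol I step), then seal
    nicks between abutting fragments (the DNA ligase step)."""
    dna_only = [Fragment(f.start, f.end, 0) for f in fragments]
    joined: list[Fragment] = []
    for frag in sorted(dna_only, key=lambda f: f.start):
        if joined and joined[-1].end == frag.start:
            # Adjacent fragments meet exactly: ligate them into one piece.
            joined[-1] = Fragment(joined[-1].start, frag.end, 0)
        else:
            joined.append(frag)
    return joined


# Hypothetical 10 kb stretch of lagging strand with a 10-nt primer.
for label, size in [("prokaryote-like", 2000), ("eukaryote-like", 200)]:
    frags = synthesize_lagging_strand(10_000, size, primer_len=10)
    print(label, len(frags), "fragments ->", mature(frags))
```

Skipping `mature`, loosely analogous to the Pol I or ligase mutants mentioned above, leaves the strand as many short, unjoined pieces; the shorter eukaryotic fragment length simply means ten times as many priming and joining events over the same stretch.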
The human genome consists of approximately three billion base pairs, arranged linearly along a double-stranded DNA polymer, that encode about 40,000 genes. The information in protein-coding genes is converted into functional elements by copying base sequences into RNA. RNA may itself be active (heterogeneous nuclear ribonucleoproteins, for example, bind newly synthesized RNA and influence its structure and conformation), or it may be translated into protein. Whereas DNA is a relatively stable molecule, RNA is much less so, and RNA molecules in the cell are rapidly degraded and replaced. Models of how the genome is packaged within the nucleus, copied, and read by the cell have been refined down to the molecular level. Research has also clarified how the genome is partitioned during cell division and during the formation of germ cells. At every level, mechanisms that safeguard genome stability protect its information content. Information is transferred out of the genome along two paths: within the cell, for defined functions specific to the cell type (horizontal information transfer), and from one generation of cells to the next through cell division, whether for cell multiplication or for reproduction (vertical transfer).
The advances of the past century, from the verification of Mendel’s observations1 to having in hand the essentially complete sequence of the human genome, came in bursts of understanding or of technology. During the first 50 years of the 20th century, the principle that inheritance works through stable packets of genetic information, each passed from generation to generation independently of other units of inheritance, was verified in animals and plants; the fruit fly Drosophila melanogaster was a notable organism of study. In the 1940s, DNA was unequivocally shown to be the chemical basis of the gene.2 Within another dozen years, Watson and Crick proposed the chemical structure of DNA.3 The model had immediate implications for how genes are copied and how information is transferred. During the next two decades, the fundamental rules and mechanisms of these processes were determined, largely through the study of bacteria and their viruses, the phages. In the 1970s, three different technological advances catapulted genetics toward the point at which the human DNA sequence was determined by century’s end: the ability to determine DNA sequence in a relatively simple and reproducible manner; the ability to isolate, move, and duplicate segments of DNA; and computers powerful enough to store and compare large amounts of sequence information. Refinements of these basic advances, coupled with automation, led to the sequencing of the human genome in the last 15 years before the turn of the millennium.
Understanding the structure of the DNA duplex led at once to the recognition that a polymer strand built from only four bases could encode information in a linear format. DNA is composed of two polymer strands, each with a sugar-phosphate backbone and a base attached to each deoxyribose moiety [see Figure 1]. The two strands are stabilized by hydrogen bonding between the bases: the purine adenine (A) pairs with the pyrimidine thymine (T), and the purine guanine (G) pairs with the pyrimidine cytosine (C). The discovery of this complementary pairing was pivotal in modeling the structure of DNA. Alternative pairings can be envisaged, but those based on the most common structures of the bases and the strongest hydrogen bonds predominate. The sugar-phosphate backbone of each strand has a chemical polarity (5′ to 3′), and the two strands are antiparallel: they run in opposite directions with respect to the linkages between deoxyribose units. Because each strand specifies its partner, the information can be duplicated by using each separated strand as a template. The linear code of the four-base alphabet allows an enormous information content; each human diploid cell packages roughly 2 m of DNA (about 1 m per haploid set of chromosomes).
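Complementary pairing and antiparallel strands can be made concrete in a few lines. The sketch below assumes sequences are plain uppercase strings written 5′ to 3′; `reverse_complement` and `bits_per_base` are illustrative names, not from any particular library. Reversing the complement reflects the antiparallel orientation, and applying it twice recovers the original strand, which is the logic of template-directed duplication. At 2 bits per base, the three billion base pairs quoted for the human genome correspond to at most about 750 megabytes of sequence information (counting one strand, since the other is fully determined by it).

```python
import math

# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5'->3'.

    Each base is replaced by its complement, and the result is reversed
    because the two strands run antiparallel."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def bits_per_base(alphabet_size: int = 4) -> float:
    """Maximum information per position for a sequence over the alphabet."""
    return math.log2(alphabet_size)


genome_bp = 3_000_000_000  # approximate human genome size quoted in the text
max_bytes = genome_bp * int(bits_per_base()) // 8

print(reverse_complement("ATGC"))  # GCAT
print(max_bytes)                   # 750000000
```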
In Escherichia coli, more than 20 different proteins participate in DNA replication (2). They were identified by screening for mutants defective in DNA replication and by purifying the enzymes required for DNA synthesis in vitro (see dna genes). Their biochemical roles at different stages of chromosomal DNA replication indicate that the following proteins are involved in discontinuous replication in E. coli: (1) primosome proteins, including DNA helicases and primase: PriA, PriB, PriC, DnaT, DnaB, DnaC, and DnaG (primase); (2) proteins required for chain elongation: the DNA polymerase III (Pol III) holoenzyme and single-stranded DNA binding protein (SSB); and (3) proteins required for joining Okazaki fragments: DNA polymerase I (Pol I), RNase H, and DNA ligase. Of these, the DnaB helicase, primase (DnaG), and the Pol III holoenzyme are the core components of discontinuous DNA synthesis at the replication fork, where they probably form a multiprotein complex called the replisome.
The DNA double helix is packaged into the chromosome, a structure recognizable by light microscopy. The packaging is very efficient. The DNA is supercoiled, like a rubber band twisted until it coils upon itself, and is folded into chromatin by the binding of basic histone proteins. The resulting structure resembles beads on a string: the DNA is wound tightly around a core of eight histone molecules (two each of H2A, H2B, H3, and H4) to form the nucleosome. Adjacent nucleosomes are separated by short stretches of linker DNA, giving a repeat of roughly 200 base pairs. The DNA is condensed further by the binding of other proteins. DNA in chromatin may also be modified by the addition of methyl groups at certain positions, and histones may be modified by phosphorylation, acetylation, or the addition of ubiquitin. These chromatin modifications are related to the regulation of gene expression.
Figure 2 (a) Synthesis initiates with priming of a lagging strand. (b) Elongation proceeds from 5′ to 3′ on each strand. (c) The discontinuous fragment reaches the 5′ terminus of the lagging strand. (d) The replicative complex releases the lagging strand to form a new initiation complex.
Information is copied from the long-lived DNA molecule to the less stable messenger RNA (mRNA) molecule, whose information is then translated into protein. mRNA represents only 1% or so of the RNA in the cell; other classes of RNA, such as ribosomal and transfer RNA, carry out protein synthesis or act as catalytic units in the processing of genetic information. For protein synthesis, the mRNA is read in a three-base code, each triplet being a codon. Four bases read three at a time give 4³ = 64 possible codons, more than the 20 amino acids used to build proteins require. In the standard code, three codons signal termination of protein synthesis, and most amino acids are encoded by more than one codon.
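The triplet code and its redundancy can be checked directly. This sketch builds the standard genetic code (NCBI translation table 1) from a compact 64-letter string in which codons are ordered first base, then second, then third over U, C, A, G, with `*` marking stop codons; `CODON_TABLE` and `translate` are illustrative names, and the reader simply starts at the first base rather than searching for a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid form; codons are enumerated
# with the first base varying slowest and the third base fastest.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read codons 5'->3' from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = sum(aa == "*" for aa in CODON_TABLE.values())
amino_acids = set(CODON_TABLE.values()) - {"*"}
print(len(CODON_TABLE), stops, len(amino_acids))  # 64 3 20
print(translate("AUGUUUGGCUAA"))                  # MFG
```

The counts make the redundancy explicit: 61 sense codons are shared among 20 amino acids.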
Chromosome identification and characterization were much improved by the development of staining, or banding, techniques, in which the DNA and proteins are partially denatured and then stained. The resulting preparations allow up to 800 bands to be identified, which can then be examined for structural changes. These techniques have revealed many chromosomal rearrangements and thereby helped to localize genes.
DNA polymerase uses all four bases with equal affinity; which base is incorporated is determined by the template. DNA polymerases are part of a protein assembly termed the replisome. The synthesis of new DNA is faithful: only about one base in 100,000 is misincorporated. The replisome also actively proofreads its product: an exonuclease activity can remove a mispaired base from the growing end of the DNA strand (all DNA is made only in the 5′ to 3′ direction, following the convention for the sugar-phosphate backbone). This proofreading removes 99% of misincorporations. The last line of defense for the integrity of information during replication is the mismatch repair system, a complex of proteins that tracks DNA synthesis, recognizes mismatches resulting from misincorporation, and corrects them. About 99% of the remaining mismatches are removed, so the overall error rate in replication is about one in one billion to 10 billion.
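The overall fidelity follows from multiplying the figures quoted above, under the simplifying assumption that each safeguard acts independently on the errors that escape the previous one. Exact fractions avoid floating-point rounding; the genome size used for the last line is the approximate three billion base pairs given earlier, and the variable names are illustrative.

```python
from fractions import Fraction

# Figures quoted in the text, treated as independent sequential filters.
misincorporation = Fraction(1, 100_000)     # base selection by the polymerase
escapes_proofreading = Fraction(1, 100)     # exonuclease removes 99%
escapes_mismatch_repair = Fraction(1, 100)  # mismatch repair removes 99% of the rest

overall = misincorporation * escapes_proofreading * escapes_mismatch_repair
print(overall)  # 1/1000000000

genome_bp = 3_000_000_000  # approximate human genome size quoted earlier
print(genome_bp * overall)  # 3
```

The product lands at the one-in-a-billion end of the quoted range, which corresponds to only a few uncorrected errors per copy of a three-billion-base-pair genome.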