Central dogma of molecular biology

Last update : August 9, 2013

The central dogma of molecular biology is not really a dogma, but a framework for understanding the transfer of sequence information between biopolymers in living organisms. There are three major classes of such biopolymers :

  • DNA
  • RNA
  • Protein

In its simplest form, the central dogma of molecular biology states that DNA makes RNA and RNA makes protein. The dogma was first stated by Francis Crick in 1958 and restated in a paper published in Nature (vol. 227) in August 1970.

There are 3×3 = 9 conceivable direct transfers of information between these biopolymers, classed into three groups :

  • general transfers
  • special transfers
  • unknown transfers

The general transfers describe the normal flow of biological information :

  • DNA Replication : process by which one double-stranded DNA molecule produces two identical copies of the molecule
  • Transcription : process by which the information contained in a section of DNA is transferred to a newly assembled piece of messenger RNA (mRNA)
  • Translation : process by which the messenger RNA (mRNA) produced by transcription is decoded by the sites of protein synthesis, the ribosomes, to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein
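Transcription and translation can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the input is taken to be the coding strand of the DNA, so transcription reduces to replacing T with U, and the codon table holds only the five codons used in the example rather than all 64. The function names `transcribe` and `translate` are illustrative and do not come from any particular library.

```python
# Partial codon table: only the codons needed for this example.
# Assumption: a complete table would list all 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Decode an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATGGTAA")
print(mrna)             # AUGGCCAAAUGGUAA
print(translate(mrna))  # MAKW
```

The resulting string of one-letter amino acid codes stands for the polypeptide chain; folding into an active protein is not modelled here.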

Special transfers occur only under specific conditions, in the case of some viruses or in a laboratory. These transfers are RNA replication, reverse transcription and direct translation from DNA to protein.

Francis Crick believed that protein could not encode DNA, RNA or other proteins, and classed these processes as unknown transfers. Prions, discovered in 1982 by Stanley B. Prusiner, are proteins that propagate themselves by inducing conformational changes in other molecules of the same type of protein. While this represents a transfer of information from protein to protein, prion interactions leave the sequence of the protein unchanged, and so are not technically considered an exception to Crick's central dogma of molecular biology.
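The nine conceivable transfers and their three groups can be enumerated in a few lines of Python. This is only a bookkeeping sketch of the classification described above; the names `POLYMERS`, `GENERAL`, `SPECIAL` and `classify` are chosen for illustration.

```python
from itertools import product

POLYMERS = ("DNA", "RNA", "protein")

# (source, target) pairs, as grouped in the text above.
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify(source: str, target: str) -> str:
    """Return the group of the transfer from source to target."""
    pair = (source, target)
    if pair in GENERAL:
        return "general"
    if pair in SPECIAL:
        return "special"
    return "unknown"


# All 3 x 3 = 9 ordered pairs of biopolymer classes.
transfers = {pair: classify(*pair) for pair in product(POLYMERS, repeat=2)}

for (source, target), group in transfers.items():
    print(f"{source} -> {target}: {group}")
```

Each group ends up with three transfers, and every transfer that starts from protein falls into the unknown group.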

The Human Genome Project

The Human Genome Project (HGP) was a 13-year international scientific research project coordinated by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH). The primary goal was to determine the sequence of chemical base pairs that make up human DNA, and to identify and map the approximately 20,000-25,000 genes of the human genome from both a physical and a functional standpoint.

The project began in October 1990; the completed genome sequence was announced in April 2003, two years earlier than planned. The U.S. National Center for Biotechnology Information (NCBI) houses the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins.

Specialised computer programs are needed to analyze the data, which is difficult to interpret without them. Among the organizations creating powerful tools for storing, visualizing and searching genome data are the Genome Bioinformatics Group at the University of California, Santa Cruz (UCSC), the European Bioinformatics Institute (EBI, part of the European Molecular Biology Laboratory, EMBL) and the Wellcome Trust Sanger Institute (WTSI).

In 1999 the EBI and WTSI launched a joint scientific project called Ensembl, whose aim is to provide a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of humans, other vertebrates and model organisms.

Ensembl Genomes release 13 was launched on March 8, 2012, bringing the total genomes supported to 341.

The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation. It consists of two steps:
1. identifying elements on the genome, a process called gene prediction
2. attaching biological information to these elements
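The two steps can be sketched with a deliberately naive Python example. It is a toy under strong assumptions, not the method used by Ensembl, HAVANA or any real gene predictor: a "gene" is taken to be an open reading frame (ATG followed by in-frame codons up to the first stop codon), only one strand is scanned, and overlapping candidates are ignored. The names `predict_orfs` and `annotate`, and the example notes, are hypothetical.

```python
import re

# ATG, then any number of non-stop codons, then a stop codon (TAA, TAG, TGA).
ORF_PATTERN = re.compile(r"ATG(?:(?!TAA|TAG|TGA)[ACGT]{3})*(?:TAA|TAG|TGA)")


def predict_orfs(sequence: str) -> list[tuple[int, int]]:
    """Step 1: locate candidate elements as (start, end) on one strand."""
    return [(m.start(), m.end()) for m in ORF_PATTERN.finditer(sequence.upper())]


def annotate(sequence: str, notes: dict[tuple[int, int], str]) -> list[dict]:
    """Step 2: attach information to each predicted element."""
    features = []
    for start, end in predict_orfs(sequence):
        features.append({
            "start": start,
            "end": end,
            "length": end - start,
            # Fall back to a generic label when nothing is known.
            "note": notes.get((start, end), "hypothetical ORF"),
        })
    return features


raw = "CCATGAAATGGTAACCATGTAG"
print(predict_orfs(raw))  # [(2, 14), (16, 22)]
for feature in annotate(raw, {(2, 14): "example note from a curator"}):
    print(feature)
```

Coordinates are 0-based with an exclusive end, following Python slicing, so `raw[2:14]` is the first predicted element.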

A genome is only as valuable as its annotation. To create a gold-standard reference annotation, the Human and Vertebrate Analysis and Annotation (HAVANA) team of the WTSI uses tools developed in-house to manually annotate the human, mouse and zebrafish genomes. Based on these data, a central repository for high-quality manual annotation of finished vertebrate genome sequence has been created, called the Vertebrate Genome Annotation (VEGA) database.

The EBI hosts the Protein and Nucleotide Database Group (PANDA), which provides all its sequence resources, and the HUGO Gene Nomenclature Committee (HGNC), the only worldwide authority that assigns standardised nomenclature to human genes. HGNC has assigned unique gene symbols and names to over 33,000 human loci, of which around 19,000 are protein coding. The HGNC website genenames.org is a curated online repository of approved gene nomenclature and associated resources.

In September 2003, the National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, to carry out a project to identify all functional elements in the human genome sequence. Both UCSC and WTSI are participating in the ENCODE project.

The WTSI set up a sub-project of the ENCODE project, called GENCODE (Encyclopædia of genes and gene variants), to annotate all evidence-based gene features in the entire human genome with high accuracy. The GENCODE gene sets are used by the entire ENCODE consortium and by many other projects as reference gene sets.


Genetics is the science of genes, heredity, and variation in living organisms. It is a discipline of biology and can be applied to the study of all living systems, from viruses and bacteria, through plants and domestic animals, to humans. The modern science of genetics, which seeks to understand the process of inheritance, began with the work of Gregor Mendel in the mid-19th century.

The molecular basis for genes is deoxyribonucleic acid (DNA). Genes correspond to regions within DNA, a molecule composed of a chain of four different types of nucleotides :

  • adenine (A)
  • cytosine (C)
  • guanine (G)
  • thymine (T)

Genetic information exists in the sequence of these nucleotides. DNA exists as a double-stranded molecule, coiled into the shape of a double helix. Each nucleotide in DNA pairs with its partner nucleotide on the opposite strand: A pairs with T, and C pairs with G. Thus each strand of the double-stranded molecule carries all the necessary information, which makes it redundant with its partner strand.
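The pairing rule means that one strand can be rebuilt from the other, as the following minimal Python sketch shows. It assumes sequences written with the uppercase or lowercase letters A, C, G and T only. The function names are illustrative. Reversing the complement reflects the fact that the two strands of a double helix run in opposite directions, a detail not covered above.

```python
# Translation table implementing the pairing rule: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the partner base for every position of the strand."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT

# Applying the operation twice gives back the original strand,
# which is the redundancy described above.
print(reverse_complement(reverse_complement("ATGC")))  # ATGC
```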

Genes are arranged linearly along long chains of DNA base-pair sequences. Eukaryotic organisms, which include plants and animals, have their DNA arranged in multiple linear chromosomes. These DNA strands are often extremely long; the largest human chromosome is about 247 million base pairs in length.

The full set of hereditary material in an organism (the combined DNA sequences of all chromosomes) is called the genome.