Synthetic Biology

Posted on August 3, 2013 by Marco Barnig

Synthetic biology is the design and construction of new biological parts, devices, and systems, and the re-design of existing, natural biological systems for useful purposes. It combines biology and engineering with a focus on Biotechnology.

Synthetic biologists focus either on finding out how life works (the origin of life) or on how to use it to benefit society. Two approaches coexist : the approach of biology, which inserts man-made DNA into a living cell, and the approach of chemistry, which treats gene synthesis as an extension of synthetic chemistry.

The website syntheticbiology.org, originally started by a group of students, faculty and staff from MIT and Harvard, now brings together individuals, groups and labs from various institutions who are committed to engineering biology in an open and ethical manner. The site is hosted on OpenWetWare and can be edited by all members of the Synthetic Biology community.

An exciting synthetic biology project was recently funded successfully on Kickstarter : Glowing Plants: Natural Lighting with no Electricity. A few days ago, without explanation, Kickstarter quietly altered its guidelines for project creators, introducing a new term that bans creators from giving away genetically-modified organisms (GMOs) as rewards to their online backers (see the post Kickstarter bans project creators from giving away genetically-modified organisms by Duncan Geere at The Verge website).

More information about synthetic biology is available at the following links :

Synthetic Biology Project
The Synthetic Biology Institute at UC Berkeley (SBI)
The Synthetic Biology Center at MIT
International Association Synthetic Biology
Leukippos : Synthetic Biology Lab in the Cloud

Central dogma of molecular biology

Posted on March 22, 2012 by Marco Barnig

Last update : August 9, 2013

The central dogma of molecular biology is not really a dogma, but a framework for understanding the transfer of sequential information between biopolymers in living organisms. There are 3 major classes of such biopolymers :

DNA
RNA
Protein

In its simplest form, the dogma of molecular biology states that DNA makes RNA and RNA makes protein. The dogma was first stated by Francis Crick in 1958 and re-stated in a Nature paper (Vol 227) published in August 1970.

There are 3×3 = 9 conceivable direct transfers of information that can occur between these biopolymers, classed into 3 groups :

general transfers
special transfers
unknown transfers

The general transfers describe the normal flow of biological information :

DNA Replication : process by which one double-stranded DNA molecule produces two identical copies of the molecule
Transcription : process by which the information contained in a section of DNA is transferred to a newly assembled piece of messenger RNA (mRNA)
Translation : process by which the messenger RNA (mRNA) produced by transcription is decoded by the sites of protein synthesis, the ribosomes, to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein
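
Transcription and translation, as described above, can be sketched in a few lines of Python. This is only a toy illustration : the function names are made up for this post, the input is assumed to be the coding (sense) strand of the DNA, and the codon table is deliberately limited to the few codons used in the example.

```python
# Toy sketch of the general transfers DNA -> RNA -> protein.
# Assumption: `dna` is the coding (sense) strand, written 5' to 3'.

# Deliberately partial codon table (a complete one has 64 entries).
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UAA": "*",  # stop codon
}


def transcribe(dna: str) -> str:
    """Return the mRNA that matches the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Decode codons from the first base until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Raises KeyError for codons missing from the toy table.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCAAATAA")
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # MFGK
```

Folding of the resulting amino acid chain into an active protein is not modelled here.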

Special transfers occur only under specific conditions, in some viruses or in a laboratory. These transfers are RNA replication, reverse transcription and direct translation from DNA to protein.

Francis Crick believed that protein could not encode DNA, RNA or other proteins and classed these processes as unknown transfers. Prions, discovered in 1982 by Stanley B. Prusiner, are proteins that propagate themselves by making conformational changes in other molecules of the same type of protein. While this represents a transfer of information from protein to protein, prion interactions leave the sequence of the protein unchanged, and so are not technically considered an exception to Francis Crick's central dogma of molecular biology.
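
The classification of the 9 conceivable transfers into the 3 groups described above can be written down as a small Python sketch. The names used (`BIOPOLYMERS`, `classify`, etc.) are illustrative only.

```python
from itertools import product

BIOPOLYMERS = ("DNA", "RNA", "protein")

# General transfers: DNA replication, transcription, translation.
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
# Special transfers: RNA replication, reverse transcription,
# direct translation from DNA to protein.
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify(source: str, target: str) -> str:
    """Return the group of the transfer source -> target."""
    pair = (source, target)
    if pair in GENERAL:
        return "general"
    if pair in SPECIAL:
        return "special"
    # Everything that starts from protein falls in this group.
    return "unknown"


# Enumerate all 3 x 3 = 9 conceivable transfers.
transfers = {pair: classify(*pair) for pair in product(BIOPOLYMERS, repeat=2)}
for (source, target), group in transfers.items():
    print(f"{source} -> {target} : {group}")
```

The three transfers that remain in the "unknown" group are exactly those with protein as the source.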

GeneChip Expression Analysis Technology

Posted on March 22, 2012 by Marco Barnig

DNA oligonucleotides (oligos) are short stretches of DNA sequence. Due to the double-stranded nature of DNA, one can design an oligo which has the complementary sequence to any gene of interest. If these oligos, or probes, are attached to a solid surface in a defined grid (x rows and y columns), a genechip (also called microarray or DNA chip) has been created.
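
As a minimal sketch of what "complementary sequence" means, the following Python function computes the reverse complement of a target sequence. The function name `design_probe` is hypothetical, and real probe design also has to consider length, melting temperature and specificity, which are ignored here.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_probe(target: str) -> str:
    """Return the reverse complement of `target` (both written 5' to 3')."""
    return target.upper().translate(COMPLEMENT)[::-1]


print(design_probe("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check for this kind of helper.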

If the genechip is put in contact with a solution containing sequences of gene products, these products (targets) will bind to their complementary probes. This process is called hybridization. The more products of a gene are in the solution, the more will hybridize with the probe on the surface of the microarray.

To identify the hybridized targets, it is necessary to label them. There are several techniques to do this marking. Besides labeling with radioactive isotopes, the most common non-radioactive technique is labeling with fluorescent dyes (as used, for example, in FISH : fluorescence in situ hybridization). Most popular are Cyanine dyes, especially Cy3, fluorescent in the green region, and Cy5, fluorescent in the red region.

One dye color is sufficient to measure the abundance of particular gene products in particular regions by scanning the microarray. The most common approach however is a two-color design where one of the samples of the gene products is a universal reference sample.

A gene product is the biochemical material, either functional RNA or protein, resulting from the activity (expression) of a gene. The amount of gene products depends on how active a gene is. In gene expression experiments the gene product measured is usually messenger RNA (mRNA), typically converted into labeled cDNA or cRNA before hybridization. Ribosomal RNA (rRNA), one of only a few gene products present in all cells, provides a mechanism for decoding mRNA into amino acids.

After the hybridization, the unbound material is washed away and the microarray is scanned. Once the data is collected, it can be analyzed by sophisticated bioinformatics tools. The results are usually published and shared with the scientific community in specialized databases.

The following list of links provides further information and some interactive animations about genechip expression analysis technologies :

Fold change in analysis of gene expression

Posted on March 21, 2012 by Marco Barnig

Fold change is often used in analysis of gene expression data in microarray and RNA-Seq experiments, for measuring change in the expression level of a gene.

Fold change is a number describing how much a quantity changes going from an initial to a final value. For example, an initial value of 30 and a final value of 60 corresponds to a fold change of 2 (in common terms, a two-fold increase). A change from 80 to 20 would be a fold change of 0.25. Some practitioners replace a fold-change value that is less than 1 by the negative of its inverse, e.g. 0.25 would be a fold change of -4 (in common terms, a four-fold decrease).
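
Both conventions can be expressed as a short Python sketch. The function names are chosen for this post only, and the values are assumed to be strictly positive expression levels.

```python
def fold_change(initial: float, final: float) -> float:
    """Plain fold change: the ratio final / initial."""
    if initial <= 0 or final <= 0:
        raise ValueError("expression values must be positive")
    return final / initial


def signed_fold_change(initial: float, final: float) -> float:
    """Replace ratios below 1 by the negative of their inverse."""
    fc = fold_change(initial, final)
    return fc if fc >= 1 else -1 / fc


print(fold_change(30, 60))         # 2.0
print(fold_change(80, 20))         # 0.25
print(signed_fold_change(80, 20))  # -4.0
```

Note that with the signed convention no value ever falls between -1 and 1, which matters when such values are averaged or plotted.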

Agilent Sureprint G3 Human Gene Expression 8x60K Microarray

Posted on March 19, 2012 by Marco Barnig

Agilent’s SurePrint G3 Human GE 8x60K Microarray is based on updated transcriptome databases for mRNA targets and also includes probes for lincRNAs (long intergenic non-coding RNAs). With the combination of mRNA and lincRNAs, it is now possible to perform two experiments on a single microarray, confidently predicting lincRNA function.

Each kit contains 3 standard glass slides, each containing eight 60K 60-mer oligonucleotide microarrays printed using Agilent’s SurePrint technology. The product number is G4851A, the design ID is 028004.

Each array contains 62,976 features arranged in 384 rows and 164 columns. A GeneList of the spots contained in the 8X60K microarrays is available at the Agilent Technologies website. There are 40,509 different genes; several are duplicated.

The following list shows the typical tools to analyze the Agilent microarrays :

Agilent Genomic Workbench Feature Extraction 10.10 : Quick Start Guide
Workbench Feature Extraction 10.10 : User Guide

Gene Expression Omnibus (GEO) and MIAME

Posted on March 18, 2012 by Marco Barnig

GEO is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the studies and gene expression patterns stored in GEO. Submitters are encouraged to supply MIAME compliant data.

The MIAME (Minimum Information About a Microarray Experiment) guidelines outline the minimum information that should be included when describing a microarray experiment. The six most critical elements contributing towards MIAME are :

raw data for each hybridization
final processed (normalized) data for the set of hybridizations in the experiment
essential sample annotation including experimental factors and their values
experimental design including sample data relationships
sufficient annotation of the array
essential laboratory and data processing protocols
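
The six elements above lend themselves to a simple checklist. The following Python sketch shows the idea; the dictionary keys and the function name are hypothetical and do not correspond to any GEO submission format.

```python
# Hypothetical keys, one per MIAME element listed above.
MIAME_ELEMENTS = (
    "raw_data",             # raw data for each hybridization
    "processed_data",       # final processed (normalized) data
    "sample_annotation",    # experimental factors and their values
    "experimental_design",  # sample data relationships
    "array_annotation",     # sufficient annotation of the array
    "protocols",            # laboratory and data processing protocols
)


def missing_miame_elements(submission: dict) -> list:
    """Return the MIAME elements that are absent or empty."""
    return [key for key in MIAME_ELEMENTS if not submission.get(key)]


submission = {
    "raw_data": ["sample1.txt", "sample2.txt"],
    "processed_data": "normalized_matrix.txt",
    "protocols": "hybridization and scanning protocol",
}
print(missing_miame_elements(submission))
# ['sample_annotation', 'experimental_design', 'array_annotation']
```

An empty list means that all six elements are present; whether their content is sufficient still has to be judged by a human.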

Many journals require accession numbers for microarray or sequence data before acceptance of a paper for publication. GEO processing time is approximately 5 business days after completion of submission.

GEO is hosted by the NCBI (National Center for Biotechnology Information). Detailed documentation is available at the NCBI website. Specific recommendations for Agilent submissions are also available.

Genome Assembly Terminology

Posted on March 13, 2012 by Marco Barnig

Below is a list of commonly used terms and definitions in the field of genomics (source : Genome Reference Consortium).

Assembly : a set of chromosomes, unlocalized and unplaced sequences and alternate loci used to represent an organism’s genome
Chromosome Assembly : a relatively complete pseudo-molecule assembled from smaller sequences that represent a biological chromosome
Diploid Assembly : a genome assembly for which a Chromosome Assembly is available for both sets of an individual’s chromosomes
Haploid Assembly : the collection of Chromosome assemblies, unlocalized and unplaced sequences and alternate loci that represent an organism’s genome
Primary Assembly : the collection of assembled chromosomes, unlocalized and unplaced sequences that, when combined, should represent a non-redundant haploid genome
Assembly Units : collections of sequences used to define discrete parts of an assembly
Genome Patch : a contig sequence that is released outside of the full assembly release cycle
FIX patch : FIX patches are released to correct an error in the assembly and will be removed when the new full assembly is released
NOVEL patch : NOVEL patches are sequences that were not in the last full assembly release and will be retained with the next full assembly release
Alternate Locus :
Unlocalized Sequence : a sequence found in an assembly that is associated with a specific chromosome but cannot be ordered or oriented on that chromosome
Unplaced Sequence : a sequence found in an assembly that is not associated with any chromosome
PAR (Pseudo-autosomal region) : a region found on the X and Y chromosomes of mammals that allows recombination between the sex chromosomes
AGP File : a file used to describe the instructions for building a contig, scaffold or chromosome sequence
Contig : a contiguous sequence generated from determining the non-redundant path along an ordered set of component sequences
Component : a low genomic level sequence used to construct the genome; typically these are either clone sequences, WGS sequences or PCR fragments
Join : the sequence overlap between two adjacent components in a contig
Scaffold : an ordered and oriented set of contigs with gaps
Switch Point : the base at which the contig sequence stops being generated from one component sequence and switches to using the next component sequence
TPF (Tiling Path file) : provides the order of the component sequences used to build a contig, scaffold or chromosome
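
To make the relation between contigs and scaffolds more concrete, here is a minimal Python sketch of a scaffold as an ordered and oriented set of contigs with gaps. The class names, fields and lengths are invented for illustration and do not follow the AGP or TPF file formats.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Contig:
    name: str
    length: int             # length in base pairs
    orientation: str = "+"  # "+" or "-" strand


@dataclass
class Gap:
    length: int             # estimated gap size in base pairs


@dataclass
class Scaffold:
    name: str
    parts: List[Union[Contig, Gap]]  # ordered along the scaffold

    def total_length(self) -> int:
        """Length of the scaffold, gaps included."""
        return sum(part.length for part in self.parts)

    def contig_names(self) -> List[str]:
        """Names of the contigs in scaffold order."""
        return [p.name for p in self.parts if isinstance(p, Contig)]


scaffold = Scaffold(
    "scaffold_1",
    [Contig("contig_A", 1200), Gap(100), Contig("contig_B", 800, "-")],
)
print(scaffold.total_length())  # 2100
print(scaffold.contig_names())  # ['contig_A', 'contig_B']
```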

The Human Genome Project

Posted on March 12, 2012 by Marco Barnig

The Human Genome Project (HGP) was a 13-year international scientific research project coordinated by the U.S. Department of Energy (DOE) and the National Institutes of Health (NIH). The primary goal was determining the sequence of chemical base pairs which make up DNA, and identifying and mapping the approximately 20,000-25,000 genes of the human genome from both a physical and functional standpoint.

The project began in October 1990; completion of the genome sequence was announced in April 2003, two years earlier than planned. The U.S. National Center for Biotechnology Information (NCBI) houses the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins.

Specialised computer programs are necessary to analyze the data, because the data itself is difficult to interpret without such programs. Among the organizations creating powerful tools for storing, visualizing and searching genome data are the Genome Bioinformatics Group at the University of California, Santa Cruz (UCSC), the European Bioinformatics Institute (EBI = part of the European Molecular Biology Laboratory EMBL) and the Wellcome Trust Sanger Institute (WTSI).

In 1999 the EBI and WTSI launched a joint scientific project called Ensembl, whose aim is to provide a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of humans, other vertebrates and model organisms.

Ensembl Genomes release 13 was launched on March 8, 2012, bringing the total genomes supported to 341.

The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation. It consists of two steps:
1. identifying elements on the genome, a process called gene prediction
2. attaching biological information to these elements

The value of a genome is only as good as its annotation. To create a gold standard reference annotation, the Human and Vertebrate Analysis and Annotation (HAVANA) team of the WTSI uses tools developed in-house to manually annotate human, mouse and zebrafish genomes. Based on these data a central repository for high quality manual annotation of vertebrate finished genome sequence, called The Vertebrate Genome Annotation (VEGA) database, has been created.

The EBI hosts the Protein and Nucleotide Database Group (PANDA), which provides all its sequence resources, and the HUGO Gene Nomenclature Committee (HGNC), the only worldwide authority that assigns standardised nomenclature to human genes. HGNC has assigned unique gene symbols and names to over 33,000 human loci, of which around 19,000 are protein coding. The HGNC website genenames.org is a curated online repository of approved gene nomenclature and associated resources.

In September 2003, the National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, to carry out a project to identify all functional elements in the human genome sequence. Both UCSC and WTSI are participating in the ENCODE project.

The WTSI set up a sub-project of the ENCODE project, called GENCODE (Encyclopædia of genes and gene variants), to annotate all evidence-based gene features in the entire human genome at a high accuracy. The GENCODE gene sets are used by the entire ENCODE consortium and by many other projects as reference gene sets :

1000 Genomes
Genome at Home (Stanford University)

Genome Browsers and BioGPS

Posted on March 12, 2012 by Marco Barnig

A genome browser is a graphical interface for display of information from a biological database for genomic data. Genome browsers enable researchers to visualize and browse entire genomes with annotated data including gene prediction and structure, proteins, expression, regulation, etc.

A detailed list of existing genome browsers is available at Wikipedia. The best-known genome browsers are the following :

Ensembl
UCSC Genome Browser
Map Viewer
Integrated Microbial Genomes (IMG)
Integrative Genomics Viewer (IGV)
GBrowse software system (framework for many additional genome browsers)

GBrowse is part of GMOD, the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases. Another open source bioinformatics project is Galaxy, a web-based platform for data intensive biomedical research.

BioGPS is a gene portal built with two guiding principles in mind : customizability and extensibility.

Chromosomes

Posted on March 11, 2012 by Marco Barnig

A chromosome is an organized structure of DNA and protein found in cells. It is a single piece of coiled DNA containing many genes, regulatory elements and other nucleotide sequences.

Chromosomes can be divided into two types : autosomes and sex chromosomes. Human cells have 23 pairs of large linear nuclear chromosomes (22 pairs of autosomes and one pair of sex chromosomes), giving a total of 46 chromosomes.

The specific location of a gene or DNA sequence on a chromosome is called a locus (plural : loci). A variant of the DNA sequence at a given locus is called an allele. The ordered list of loci known for a particular genome is called a genetic map. Gene mapping is the process of determining the locus for a particular biological trait.

Diploid and polyploid cells whose chromosomes have the same allele of a given gene at some locus are called homozygous with respect to that gene, while those that have different alleles of a given gene at a locus are called heterozygous with respect to that gene.
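
The distinction between homozygous and heterozygous can be sketched in Python as follows. The function name and the way alleles are represented (one string per chromosome copy) are assumptions made for this illustration.

```python
def zygosity(alleles) -> str:
    """Classify the alleles found at one locus, one per chromosome copy.

    Works for diploid (2 copies) and polyploid (more than 2 copies) cells.
    """
    if len(alleles) < 2:
        raise ValueError("at least two chromosome copies are required")
    return "homozygous" if len(set(alleles)) == 1 else "heterozygous"


print(zygosity(["A", "A"]))       # homozygous
print(zygosity(["A", "a"]))       # heterozygous
print(zygosity(["A", "A", "a"]))  # heterozygous (triploid example)
```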

The number of genes and base pairs per chromosome varies among the different sources available on the net. The following list shows statistics from the Major Assembly GRCh37, patch 7, released by the Genome Reference Consortium on February 11, 2012.

Chromosome	Sequenced	# Genes	% DNA	# base pairs (millions)
1	May 2006	3,511	8	250
2	April 2005	2,368	8	243
3	April 2006	1,926	6.5	198
4	April 2005	1,444	6	191
5	September 2004	1,633	6	181
6	October 2003	2,057	5.5	171
7	July 2003	1,882	5	159
8	January 2006	1,315	4.5	146
9	May 2004	1,534	4.5	141
10	May 2004	1,391	4.5	136
11	March 2006	2,168	4.5	135
12	March 2006	1,714	4.5	134
13	March 2004	720	3.5	115
14	January 2003	1,532	3.5	107
15	March 2006	1,249	3.5	103
16	December 2004	1,326	3	90
17	April 2006	1,773	2.5	81
18	March 2004	557	2.5	78
19	March 2004	2,066	2	59
20	December 2001	891	2	63
21	May 2000	450	1.5	48
22	December 1999	855	1.5	51
X	March 2005	1,672	5	155
Y	June 2003	429	2	59
Total :		36,463	100	3,094

The following list gives links to different chromosome repositories :

Internet with a Brain

Your browser becomes your personal assistant and Internet gets a synthetic consciousness

Category Archives: Biotechnology