Team:Alberta/Project/Bioinformatics

From 2009.igem.org

Revision as of 23:26, 12 September 2009 by JuliaPon (Talk | contribs)

University of Alberta - BioBytes

Why build a minimal genome?

Genomes are complex! Determining how far a genome can be simplified enriches our understanding of the function and interactions of cellular components. Simplified cells can be used as well-characterized chassis for synthetic biology. Moreover, a simplified cell can be used to study cellular processes in a controlled, characterized genetic background. Finally, building a minimal genome requires us to develop and optimize molecular methods of genome assembly. These methods can then be applied to other high-throughput biology.

Why We Need Bioinformatics

The size and complexity of the genome make bioinformatics analysis essential. We used bioinformatics to accomplish the following:

- review lists of essential genes in the literature and existing databases and compile a preliminary essential gene list

- model the metabolic reactions and net growth rate of E. coli with given gene sets. This identified additional metabolic genes essential to a minimal genome.

- identify knockout combinations that could be tested in the wet lab to verify the accuracy of our metabolic model.

- select standardized promoters and terminators that would replace the natural promoters and terminators of essential genes.

- determine which promoter should be used with which gene, by analyzing expression level data.

- design primers to amplify all essential genes from genomic DNA.

These steps have all been completed, and are described on the following pages.

Literature Review

Four essential gene lists from the literature were analyzed to construct a preliminary essential gene list.

Essential gene list | Method | # of genes considered essential | % of E. coli genes considered essential
Baba et al. 2006 | Single-gene knockout | 303 | 6.4%
Gerdes et al. 2003 | Transposon insertions to inactivate single genes | 617 | 13.0%
Gil et al. 2004 | Gene conservation and literature review | 203 | 4.3%
Profiling of E. coli Chromosome (PEC) database | Literature review | 302 | 6.3%
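
The percentage column above can be reproduced from the gene counts. The sketch below assumes the denominator is the 4762 total E. coli genes reported in the statistics at the end of this page; the function and variable names are illustrative only.

```python
# Assumed denominator: total E. coli genes reported in the statistics on this page.
TOTAL_GENES = 4762

# Gene counts copied from the table of literature lists.
essential_counts = {
    "Baba et al. 2006": 303,
    "Gerdes et al. 2003": 617,
    "Gil et al. 2004": 203,
    "PEC database": 302,
}

def percent_of_genome(count, total=TOTAL_GENES):
    """Return the share of the genome, in percent, rounded to one decimal."""
    return round(100 * count / total, 1)

percentages = {name: percent_of_genome(n) for name, n in essential_counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```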

These literature lists vary greatly in size and overlap only partially. All analyses referred to genes by Blattner numbers in order to standardize gene names.

insert another table here

The largest overlap between any two literature lists is 205 genes, between Baba et al. 2006 and Gerdes et al. 2003.

The varying levels of overlap between the four essential gene lists from the literature can be shown in a Venn diagram, in which the numbers indicate the number of genes in common between lists. Only 48 genes were present in all four lists.
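
As a minimal sketch of how such overlaps could be computed, each list can be treated as a set of Blattner numbers and intersected. The Blattner numbers below are toy placeholders, not the real lists, and the function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: placeholder Blattner numbers, not the published lists.
gene_lists = {
    "Baba2006": {"b0001", "b0002", "b0003", "b0004"},
    "Gerdes2003": {"b0002", "b0003", "b0004", "b0005"},
    "Gil2004": {"b0003", "b0004", "b0006"},
    "PEC": {"b0001", "b0003", "b0007"},
}

def pairwise_overlaps(lists):
    """Return {(name_a, name_b): number of shared genes} for every pair of lists."""
    return {
        (a, b): len(lists[a] & lists[b])
        for a, b in combinations(sorted(lists), 2)
    }

def common_to_all(lists):
    """Return the set of genes present in every list."""
    return set.intersection(*lists.values())

overlaps = pairwise_overlaps(gene_lists)
largest_pair = max(overlaps, key=overlaps.get)  # pair with the most shared genes
core = common_to_all(gene_lists)                # genes found in all lists

print(largest_pair, overlaps[largest_pair])
print(sorted(core))
```

Using standardized identifiers as set members is what makes the comparison reliable, since the same gene can carry different names in different publications.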

Construction of a Preliminary Essential Gene List from the Literature

Criteria for Gene Selection:

o Genes must be present in more than one literature list unless there is a particular reason to suspect they are essential.

o The REcoli metabolism is modeled after the minimal metabolism proposed by Gil et al. 2004, with the addition of cell wall, fatty acid, heme and ubiquinone synthesis, as Gil assumed these would not be necessary in a Mycoplasma-like minimal cell.

o Additional genes required for metabolism were selected based on pathway information in the EcoCyc database. Pathway redundancy is the likely reason these genes do not appear essential in the Baba, Gerdes and PEC lists.

o Antitoxin genes are not essential, as the corresponding toxin genes would not be present.
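
The selection criteria above can be sketched as a simple filter: keep genes found in more than one list, add genes chosen by hand (for example from pathway information), and drop explicit exclusions such as antitoxins. This is an illustrative sketch only; the gene identifiers, the `manual_includes` and `excluded` parameters, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: placeholder Blattner numbers, not the published lists.
gene_lists = {
    "Baba2006": {"b0001", "b0002", "b0003", "b0004"},
    "Gerdes2003": {"b0002", "b0003", "b0004", "b0005"},
    "Gil2004": {"b0003", "b0004", "b0006"},
    "PEC": {"b0001", "b0003", "b0007"},
}

def preliminary_essentials(lists, manual_includes=frozenset(), excluded=frozenset()):
    """Genes in more than one list, plus manual additions, minus exclusions."""
    # Count in how many lists each gene appears.
    counts = Counter(gene for genes in lists.values() for gene in set(genes))
    multi_list = {gene for gene, n in counts.items() if n > 1}
    return (multi_list | set(manual_includes)) - set(excluded)

selected = preliminary_essentials(
    gene_lists,
    manual_includes={"b0007"},  # hypothetical gene added from pathway information
    excluded={"b0002"},         # hypothetical antitoxin gene
)
print(sorted(selected))
```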

Genes for the following processes were included:

o DNA replication and cell division, but no DNA repair

o Chaperones, but no heat shock or membrane stress response system

o Transcription

o Translation

o Glycolysis

o Lactate production from pyruvate to regenerate NAD+

o Proton motive force (PMF) generation via ATP synthase running in reverse, consuming ATP to export protons

o Synthesis of acetyl-CoA from pyruvate

o Fatty acid synthesis

o Methylerythritol phosphate pathway (for undecaprenyl phosphate and the ubiquinone side chain)

o Synthesis of phosphatidylethanolamine, but no other phospholipids

o Pentose phosphate pathway (converts 6 or 3 carbon sugars to 5C sugars, such as ones needed in nucleotide biosynthesis)

o Lipoprotein synthesis (Int and lolB are lipoproteins and essential)

o Synthesis of nucleotides (deoxyribonucleotides and ribonucleotides) from nucleosides

o Attaching lipid and biotin groups to proteins

o Transport:

PTS (phosphotransferase system) transport system (imports and phosphorylates glucose)

Inorganic phosphate transport

Nucleoside transport

Sec system (exports proteins to the periplasm), including SRP for cotranslational membrane insertion. The secB chaperone does not appear essential. There is no Tat system, which would be used to export cofactor-containing folded proteins.

Lipoprotein transport to the outer membrane

Glutathione transport

o Cofactor synthesis:

Riboflavin from GTP and ribulose-5-phosphate

FAD from riboflavin

NAD from nicotinamide

NADPH from NAD

CoA from pantothenic acid

Methylenetetrahydrofolate (mTHF) from folic acid

S-adenosylmethionine (SAM) from methionine

Thiamine diphosphate (TPP) from thiamine

Pyridoxal 5'-phosphate (PLP) from pyridoxal

Heme from glutamate

Ubiquinone

RNAs:

The rrnC operon supplies the rRNAs and three of the tRNAs. This operon was selected because it includes the greatest number of tRNAs. To select the other tRNAs, all tRNAs listed as essential in PEC were first included. One tRNA was then selected for each anticodon that differed at one of the last two bases. At least one tRNA was included for each amino acid.

The complete list of essential RNAs can be found here.
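
One possible reading of this selection rule is sketched below. It assumes that tRNAs for the same amino acid whose anticodons share the last two bases are treated as redundant, so that only one of them is kept, and that tRNAs from the chosen operon and all PEC-essential tRNAs are kept unconditionally. The tRNA names and the `in_pec` flags are hypothetical placeholders, not the actual BioBytes data.

```python
from collections import namedtuple

# name, amino acid carried, anticodon (5'->3'), and whether PEC lists it as essential
TRNA = namedtuple("TRNA", "name amino_acid anticodon in_pec")

def select_trnas(candidates, operon_trnas=()):
    """Keep operon and PEC-essential tRNAs, then one tRNA per uncovered group.

    A group is (amino acid, last two anticodon bases); this grouping is an
    assumption about the rule described in the text.
    """
    selected = list(operon_trnas) + [t for t in candidates if t.in_pec]
    covered = {(t.amino_acid, t.anticodon[1:]) for t in selected}
    for t in candidates:
        key = (t.amino_acid, t.anticodon[1:])
        if key not in covered:
            selected.append(t)
            covered.add(key)
    return selected

# Hypothetical toy candidates.
candidates = [
    TRNA("t1", "Leu", "CAG", True),
    TRNA("t2", "Leu", "GAG", False),  # same last two bases as t1, so skipped
    TRNA("t3", "Leu", "UAA", False),
    TRNA("t4", "His", "GUG", False),
    TRNA("t5", "Gln", "CUG", False),  # same last two bases as t6, so skipped
    TRNA("t6", "Gln", "UUG", True),
]

chosen = select_trnas(candidates)
print([t.name for t in chosen])
```

Because the amino acid is part of the grouping key, every amino acid represented among the candidates ends up with at least one tRNA.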

Statistics on BioBytes Gene List Based on Literature Review

Total genes in E. coli: 4762

Total protein coding genes in BioBytes preliminary essentials list: 332

Total number of RNA genes in BioBytes preliminary essentials list: 29