Team:Alberta/Project/Gene Selection


University of Alberta - BioBytes

Literature Review

Four primary literature sources were used to determine the essential genome. The genes identified in these sources were analyzed to construct a preliminary essential gene list.

Literature Gene List Data

| Essential gene list (source) | Method | Genes considered essential | % of E. coli genes considered essential | Genes unique to that list |
|---|---|---|---|---|
| Baba et al. 2006 | Single-gene knockout | 303 | 6.4% | 36 |
| Gerdes et al. 2003 | Transposon insertions to inactivate single genes | 617 | 13.0% | 379 |
| Gil et al. 2004 | Gene conservation and literature review | 203 | 4.3% | 53 |
| Profiling of E. coli Chromosome (PEC) database | Literature review | 302 | 6.3% | 126 |
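The percentages in the table are consistent with dividing each gene count by the 4,762 total E. coli genes quoted in the statistics section of this page. The short check below reproduces that column under that assumption; `percent_of_genome` is an illustrative helper, not part of any BioBytes tool.

```python
# Reproduce the "% of E. coli genes considered essential" column, assuming the
# denominator is the 4,762 total E. coli genes quoted later on this page.
TOTAL_ECOLI_GENES = 4762

essential_counts = {
    "Baba et al. 2006": 303,
    "Gerdes et al. 2003": 617,
    "Gil et al. 2004": 203,
    "PEC database": 302,
}

def percent_of_genome(n_genes: int, total: int = TOTAL_ECOLI_GENES) -> float:
    """Return n_genes as a percentage of all genes, rounded to one decimal place."""
    return round(100 * n_genes / total, 1)

percentages = {src: percent_of_genome(n) for src, n in essential_counts.items()}

for source, pct in percentages.items():
    print(f"{source}: {pct}%")
```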

Each gene list was determined by a different method, and the results show very little consistency. The number of genes from each source varies greatly, and when the lists are compared with one another, there is very little overlap. Please see below:

Venn Diagram of the Number of Essential Genes Shared Between Lists in the Literature

The maximum number of genes in common between any two literature lists is 205, shared by Baba et al. 2006 and Gerdes et al. 2003. Only 48 genes were present in all four lists. This lack of consensus makes it unreliable to build an essential genome from these lists alone. Still, these lists provide an important foundation for the basic components required in an essential genome.
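The overlap figures above come from ordinary set operations on the gene lists. As a sketch only, the Python below computes pairwise overlaps, the genes shared by all lists, and the genes unique to each list. The four gene sets are toy placeholders (real gene names, invented memberships), not the published Baba, Gerdes, Gil or PEC lists.

```python
from itertools import combinations

# Toy gene sets: hypothetical memberships for illustration only.
toy_lists = {
    "Baba": {"dnaA", "ftsZ", "rpoB", "secY", "accA"},
    "Gerdes": {"dnaA", "ftsZ", "rpoB", "secY", "lpxC", "murA"},
    "Gil": {"dnaA", "rpoB", "pgk"},
    "PEC": {"dnaA", "rpoB", "ftsZ", "gyrA"},
}

def pairwise_overlaps(gene_lists):
    """Number of genes shared by each pair of lists."""
    return {
        (a, b): len(gene_lists[a] & gene_lists[b])
        for a, b in combinations(sorted(gene_lists), 2)
    }

def shared_by_all(gene_lists):
    """Genes present in every list (the centre of the Venn diagram)."""
    return set.intersection(*gene_lists.values())

def unique_to_each(gene_lists):
    """Genes that appear in one list and in no other."""
    result = {}
    for name, genes in gene_lists.items():
        others = set().union(*(g for n, g in gene_lists.items() if n != name))
        result[name] = genes - others
    return result

overlaps = pairwise_overlaps(toy_lists)
best_pair = max(overlaps, key=overlaps.get)
print(best_pair, overlaps[best_pair])   # ('Baba', 'Gerdes') 4
print(sorted(shared_by_all(toy_lists)))  # ['dnaA', 'rpoB']
```

Assuming gene identifiers were first harmonized across the four sources, the same functions applied to the full lists would yield the kind of counts summarized in the Venn diagram.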

Constructing the BioBytes Preliminary Essential Gene List

The preliminary essential gene list is based on literature sources. As described in the modeling section of this wiki, the metabolic genes from this preliminary list were used as a starting point for the computer model and were greatly altered based on the model's suggestions. The following criteria were used for selecting genes from a literature source:

  • Genes must be present in more than one literature list unless there is a particular reason to suspect they are essential.
  • The BioBytes metabolism is modeled after the minimal metabolism proposed by Gil et al. 2004, with the addition of cell wall, fatty acid, heme and ubiquinone synthesis, which Gil et al. assumed would not be necessary in a Mycoplasma-like minimal cell.
  • Additional genes required for metabolism were selected based on pathway information in the EcoCyc database. Pathway redundancy is likely why these genes do not appear essential in the Baba, Gerdes and PEC lists.
  • Antitoxin genes are not essential, because the corresponding toxin genes would not be present.
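The first and last criteria amount to a simple filtering rule. The sketch below is one hypothetical way to express it: keep genes found in more than one list, add genes chosen for a specific reason (for example, from EcoCyc pathway information), and drop antitoxins. The function name, its parameters and the toy data are illustrative assumptions; the actual additions came from the manual curation described above.

```python
from collections import Counter

def select_preliminary_genes(gene_lists, manual_additions=(), antitoxins=()):
    """Hypothetical helper: genes in >1 list, plus justified additions, minus antitoxins."""
    # Count how many lists each gene appears in (set() ignores duplicates within a list).
    counts = Counter(gene for genes in gene_lists.values() for gene in set(genes))
    selected = {gene for gene, n in counts.items() if n > 1}
    # Genes kept for a specific reason even if found in only one list (or none).
    selected |= set(manual_additions)
    # Antitoxins are unnecessary because their toxin genes are absent.
    selected -= set(antitoxins)
    return selected

# Toy data: mazE is the antitoxin of the MazEF system; dxs is the first MEP pathway enzyme.
toy_lists = {
    "Baba": {"dnaA", "ftsZ", "mazE", "accA"},
    "Gerdes": {"dnaA", "ftsZ", "mazE", "lpxC"},
    "PEC": {"dnaA", "pgk"},
}
prelim = select_preliminary_genes(toy_lists, manual_additions={"dxs"}, antitoxins={"mazE"})
print(sorted(prelim))  # ['dnaA', 'dxs', 'ftsZ']
```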

The basic functional groups of the selected genes are shown in the next chart, followed by a detailed list of the processes that were included.

Essential Gene Functions

Genes for the following processes were included:

  • DNA replication and cell division, but no DNA repair
  • Chaperones, but no heat shock or membrane stress response system
  • Transcription
  • Translation
  • Glycolysis
  • PMF generation via an ATP synthase consuming ATP to export protons.
  • Synthesis of acetyl-CoA from pyruvate
  • Fatty acid synthesis
  • Methylerythritol pathway (for undecaprenyl phosphate and a ubiquinone side chain)
  • Synthesis of phosphatidylethanolamine, but no other phospholipids
  • Pentose phosphate pathway (converts 6- or 3-carbon sugars into 5-carbon sugars, such as those needed for nucleotide biosynthesis)
  • Lipoprotein synthesis (Int and lolB are essential lipoproteins)
  • Synthesis of nucleotides (deoxyribonucleotides and ribonucleotides) from nucleosides
  • Attaching lipid and biotin groups to proteins
  • Transport:
    • PTS (phosphotransferase system; imports and phosphorylates glucose)
    • Inorganic phosphate transport
    • Nucleoside transport
    • Sec system (exports proteins to the periplasm), including SRP for cotranslational membrane insertion. The SecB chaperone does not appear essential. There is NO Tat system, which would be used to export folded, cofactor-containing proteins.
    • Lipoprotein transport to the outer membrane
    • Glutathione transport
  • Cofactor synthesis:
    • Riboflavin from GTP and ribulose-5-phosphate
    • FAD from riboflavin
    • NAD from nicotinamide
    • NADPH from NAD
    • CoA from pantothenic acid
    • Methylenetetrahydrofolate (mTHF) from folic acid
    • S-adenosylmethionine (SAM) from methionine
    • Thiamine diphosphate (TPP) from thiamine
    • Pyridoxal 5'-phosphate (PLP) from pyridoxal
    • Heme from glutamate
    • Ubiquinone


The rrnC operon supplies all of the rRNAs and three of the tRNAs. This operon was selected because it includes the greatest number of tRNAs. To select the other tRNAs, all tRNAs listed as essential in PEC were included first. One tRNA was then selected for each anticodon that differed in one of its last two bases; differences in the first base can be accommodated by anticodon wobble. At least one tRNA was included for each amino acid.
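The tRNA rule above can be sketched as follows: keep every tRNA that PEC lists as essential, then add one tRNA for each combination of amino acid and last two anticodon bases (read 5' to 3') that is not yet covered, since the first anticodon base can wobble. Grouping by amino acid as well as by the last two bases is an assumption of this sketch; it also enforces the at-least-one-per-amino-acid rule, because anticodons such as GAA (Phe) and UAA (Leu) share their last two bases. The tRNA identifiers and candidate list are hypothetical.

```python
from typing import NamedTuple

class TRNA(NamedTuple):
    gene: str        # hypothetical identifier
    amino_acid: str
    anticodon: str   # written 5' to 3'; the first base is the wobble position

def select_trnas(candidates, pec_essential):
    """Keep PEC-essential tRNAs, then one tRNA per (amino acid, last two anticodon bases)."""
    selected = {t.gene: t for t in candidates if t.gene in pec_essential}
    covered = {(t.amino_acid, t.anticodon[1:]) for t in selected.values()}
    for t in candidates:
        key = (t.amino_acid, t.anticodon[1:])  # ignore the wobble base
        if key not in covered:
            selected[t.gene] = t
            covered.add(key)
    return selected

candidates = [
    TRNA("t01", "Phe", "GAA"),
    TRNA("t02", "Leu", "UAA"),
    TRNA("t03", "Leu", "CAA"),  # same last two bases as t02: skipped
    TRNA("t04", "Leu", "GAG"),
    TRNA("t05", "Gly", "GCC"),
    TRNA("t06", "Gly", "UCC"),  # same last two bases as t05: skipped
]
chosen = select_trnas(candidates, pec_essential={"t05"})
print(sorted(chosen))  # ['t01', 't02', 't04', 't05']
```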

The complete list of essential RNAs can be found here.

Statistics on BioBytes Preliminary Essential Gene List

Total genes in E. coli: 4762

Total protein coding genes in BioBytes preliminary essentials list: 332

Total number of RNA genes in BioBytes preliminary essentials list: 29

Modeling Genes

Selection of the individual modeling genes can be seen under the Modeling tab of the Bioinformatics section. From the lists determined by the model, 116 genes were identified as essential. Because the model contains only the metabolic genes of the E. coli MG1655 genome, all other types of genes were selected solely from literature sources. Many of the genes the model identified are essential because of the complex nature of metabolic pathways: it is not sufficient to delete a single gene and check whether the organism remains viable. Genes often act in complexes, or become essential only when other genes are deleted (for example, in redundant processes where two genes fulfill the same essential function). Modeling allows these gaps to be filled with the many genes required for life. Many of the added genes transport small metabolic compounds. Although some new pathways were added, most of the new genes belong to pathways already identified as essential in the literature search. This shows that our list contained many of the correct pathways, but further research was required to identify all of the essential genes.
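The point about redundancy can be illustrated with a toy Boolean model in which each essential function can be carried out by alternative gene sets (isozymes), and each alternative needs all of its genes (a complex). This is not the BioBytes metabolic model described in the Modeling section; it is a hypothetical example showing that a single-gene deletion screen misses redundant pairs that are lethal only when deleted together. All gene and function names are placeholders.

```python
from itertools import combinations

# Hypothetical toy model: function -> list of alternative gene sets that can perform it.
ESSENTIAL_FUNCTIONS = {
    "step_1": [{"geneA"}, {"geneB"}],   # two redundant isozymes
    "step_2": [{"geneC", "geneD"}],     # one two-subunit complex
}
GENES = {"geneA", "geneB", "geneC", "geneD"}

def is_viable(present):
    """Viable if every essential function has at least one complete alternative."""
    return all(
        any(complex_ <= present for complex_ in alternatives)
        for alternatives in ESSENTIAL_FUNCTIONS.values()
    )

def single_essential(genes):
    """Genes whose individual deletion is lethal."""
    return {g for g in genes if not is_viable(genes - {g})}

def synthetic_lethal_pairs(genes):
    """Pairs of individually non-essential genes whose joint deletion is lethal."""
    singles = single_essential(genes)
    return {
        frozenset(pair)
        for pair in combinations(sorted(genes - singles), 2)
        if not is_viable(genes - set(pair))
    }

print(sorted(single_essential(GENES)))   # ['geneC', 'geneD']
print(synthetic_lethal_pairs(GENES))     # {frozenset({'geneA', 'geneB'})}
```

A genome-scale model would normally test viability with flux balance analysis rather than Boolean rules; the Boolean version is used here only to keep the idea visible.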

BioBytes Final Essential Gene List

A final list of essential genes was produced from the literature review and the computer model.

Number of genes in list created from literature: 332

Number of additional genes suggested by model: 116

Number of genes in final essential genes list: 448

Number of genes in our essential gene list not classified as essential in the literature: 117

To view the complete list of Literature Genes, click here. To view the complete list of Metabolic Model genes, click here.

Additionally, the University of Lethbridge team has constructed a series of diagrams visualizing some of the essential metabolic genes in our list. Please click here to see these figures. When our lists are compared with the literature lists of essential genes, they show a very limited amount of overlap. In fact, our BioBytes Essential Gene List differs from the literature lists by 40%.

Correlation of BioBytes Essential Gene List to Literature Lists
Number of Genes Found in Common in Literature and BioBytes Essential Gene Lists

This list gives a much greater chance of success in producing a minimal genome than many of the sources presently available. With this list complete, the BioBytes approach can be used to assemble these genes into constructs and eventually produce the genome. Together, our modeling work and the BioBytes assembly system serve as a genome construction toolkit that anyone can use.

Standardization of Gene Regulation Components

To produce a well-characterized, standardized minimal genome, numerous components have been standardized, including promoters, terminators and ribosome binding sites (RBSs). These have been incorporated into the BioBytes system either as individual parts (promoters and terminators) or as components of our unique plasmids pAB and pBA (the RBS). Microarray data were also used to estimate the relative amount of transcript produced by each essential gene, and therefore which promoter to pair with each gene.

Click here for more...
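One way to turn microarray data into a promoter choice is to bin each gene's relative transcript level into strength tiers. The sketch below assumes hypothetical cut-offs, expression values and tier names ("strong", "medium", "weak"); the actual BioBytes promoter parts and thresholds are not specified on this page.

```python
def assign_promoters(expression, tiers=((1000.0, "strong"), (100.0, "medium")), default="weak"):
    """Map each gene's relative transcript level to a promoter tier (hypothetical cut-offs)."""
    ordered = sorted(tiers, reverse=True)  # test the highest cut-off first
    choice = {}
    for gene, level in expression.items():
        # First tier whose cut-off the level reaches; otherwise fall back to the default.
        choice[gene] = next((name for cutoff, name in ordered if level >= cutoff), default)
    return choice

# Hypothetical normalized microarray signals.
toy_expression = {"geneX": 2500.0, "geneY": 300.0, "geneZ": 20.0}
print(assign_promoters(toy_expression))
# {'geneX': 'strong', 'geneY': 'medium', 'geneZ': 'weak'}
```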

Incorporation Into pAB/pBA

To produce the minimal genome, each individual gene must be amplified. PCR was used to generate each gene with distinct ends that allow insertion into the pAB or pBA plasmids used in genome construction. A total of 188 of these primers have been tested and added to the parts registry (please see the Achievements section for the parts list).

Click here for more...
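PCR with distinct ends usually relies on primers that carry a 5' tail in addition to a gene-specific annealing region. The sketch below builds such a primer pair from a gene sequence. The tail sequences are placeholders, because the actual pAB/pBA end sequences are not given on this page, and the default 20-nt annealing length is an assumption; real primer design would also balance melting temperatures, which this sketch does not do.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def design_primers(gene_seq: str, fwd_tail: str, rev_tail: str, anneal_len: int = 20):
    """Return (forward, reverse) primers: placeholder tail + gene-specific annealing region."""
    if len(gene_seq) < anneal_len:
        raise ValueError("gene sequence is shorter than the annealing region")
    forward = fwd_tail + gene_seq[:anneal_len]
    # The reverse primer anneals to the 3' end of the gene on the opposite strand.
    reverse = rev_tail + reverse_complement(gene_seq[-anneal_len:])
    return forward, reverse

gene = "ATGGCTAGCAAAGGTTAA"  # hypothetical 18-nt "gene"
fwd, rev = design_primers(gene, fwd_tail="AAAA", rev_tail="CCCC", anneal_len=6)
print(fwd)  # AAAAATGGCT
print(rev)  # CCCCTTAACC
```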