Team:Alberta/Project/Bioinformatics

From 2009.igem.org

(Difference between revisions)
 
(23 intermediate revisions not shown)
Line 4: Line 4:
<style type="text/css">
<style type="text/css">
.b1f, .b2f, .b3f, .b4f{font-size:1px; overflow:hidden; display:block;}
.b1f, .b2f, .b3f, .b4f{font-size:1px; overflow:hidden; display:block;}
-
.b1f {height:1px; background:#ADED7C; margin:0 5px;}
+
.b1f {height:1px; background:#e1e1e1; margin:0 5px;}
-
.b2f {height:1px; background:#ADED7C; margin:0 3px;}
+
.b2f {height:1px; background:#e1e1e1; margin:0 3px;}
-
.b3f {height:1px; background:#ADED7C; margin:0 2px;}
+
.b3f {height:1px; background:#e1e1e1; margin:0 2px;}
-
.b4f {height:2px; background:#ADED7C; margin:0 1px;}
+
.b4f {height:2px; background:#e1e1e1; margin:0 1px;}
-
.content {background: #ADED7C;}
+
.content {background: #e1e1e1;}
.content div {margin-left: 5px;}
.content div {margin-left: 5px;}
</style>
</style>
Line 26: Line 26:
     <div class="Outreach">
     <div class="Outreach">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
-
     <h1>Why build a minimal genome?</h1>
+
     <h1>Why a Minimal Genome?</h1>
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
<div align="justify">
<div align="justify">
-
<font size="2">
 
-
<P>Genomes are complex! Determining how simplified a genome can become enriches our understanding of the function and interactions of cellular components. Simplified cells can be used as a well characterized chasses for synthetic biology. Moreover, a simplified cell can be used to study cellular processes in a controlled, characterized genetic background. Finally, developing a minimal genome requires us to develop and optimize molecular methods of genome assembly. These methods can be then applied to other high-throughput biology. </P>
 
-
</font></div>
+
<P>One of the most useful applications of the BioBytes assembly method is the production of entire genomes.  This brings the synthetic biology research community closer to one of its holy grails, the production of a viable synthetic minimal organism.  For this reason the BioBytes team has attempted to create the tools and design principles needed to produce a minimal genome.</P>
 +
 
 +
<P>
 +
A minimal genome provides many benefits to the scientific community.</P>
 +
 
 +
<ul>
 +
 +
<li>Genomes are extremely complex. Producing a minimal genome allows for a better understanding of the function and interaction of key cellular components needed for life.</li>
 +
<li>A minimal cell provides a chassis for future research with minimal intracellular interferents.  This makes it the optimum research vector.</li>
 +
 
 +
</ul> 
 +
 
 +
<h2>Genome Design</h2>
 +
<P>
 +
Due to the complexity of producing a minimal genome, its development has been divided into three stages:
 +
<ul>
 +
<li>The selection of essential genes to be used in the genome
 +
<li>Building the genome via the BioBytes Assembly Method
 +
<li>Using recombination to eliminate the original host chromosome and replace it with the minimal chromosome
 +
</ul></p>
 +
<p>
 +
 
 +
<h2>Why <i>E. coli</i>?</h2>
 +
The <i>E. coli</i> bacterium (strain MG1655) was chosen as the model organism for the production of our essential genome.  Although other organisms have smaller genomes (<i>E. coli</i> contains over 4500 genes), <i>Escherichia coli</i> is the most commonly used laboratory organism.  This means that it is one of the most widely studied and understood organisms.  This gives us the greatest chance of success in producing a minimal genome, while simultaneously producing the most useful research vector for the scientific community.</P>
 +
 
 +
</div>
       </div></div>
       </div></div>
Line 46: Line 69:
     <div class="Why We Need Bioinformatics">
     <div class="Why We Need Bioinformatics">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
-
     <h1>Why We Need Bioinformatics</h1>
+
     <h1>Determining Essential Genes</h1>
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
Line 53: Line 76:
<font size="2">
<font size="2">
-
<b> The size and complexity of the genome make bioinformatics analysis essential. We used bioinformatics to accomplish the following: </b>
+
<p> <i>E. coli</i> has over 4,500 genes.  The size and complexity of this genome make it almost impossible to process manually. An <i>in silico</i> approach allows this complex data to be more easily collected, manipulated, and interpreted.  Bioinformatics has helped us accomplish the following:</p>
<ul>
<ul>
-
<li>review lists of essential genes in the literature and existing databases and compile a preliminary essential gene list </li>
+
<li>Review lists of essential genes in the literature and existing databases and compile a preliminary essential gene list </li>
-
<li>model the metabolic reactions and net growth rate of E.coli with given gene sets. This identified additional metabolic genes essential to a minimal genome. </li>
+
<li>Model the metabolic reactions and net growth rate of <i>E. coli</i> with given gene sets. This identified additional metabolic genes essential to a minimal genome. </li>
-
<li>identify knock out combinations that could be tested in the wet lab, to verify the accuracy of our metabolic model. </li>
+
<li>Identify knock out combinations that could be tested in the wet lab, to verify the accuracy of our metabolic model. </li>
-
<li>select standardized promoters and terminators that would replace the natural promoters and terminators of essential genes. </li>
+
<li>Select standardized promoters and terminators that would replace the natural promoters and terminators of essential genes. </li>
-
<li>determine which promoter should be used with which gene, by analyzing expression level data. </li>
+
<li>Determine which promoter should be used with which gene, by analyzing expression level data. </li>
-
<li>design primers to amplify all essential genes from genomic DNA. </li>
+
<li>Design primers to amplify all essential genes from genomic DNA. </li>
</ul>
</ul>
-
<b> These steps have all been completed, and are described on the following pages. </b>
+
<b> These steps have all been completed, and are described on the following pages.   </b>
<P>
<P>
Line 84: Line 107:
<style type="text/css">
<style type="text/css">
.b1f, .b2f, .b3f, .b4f{font-size:1px; overflow:hidden; display:block;}
.b1f, .b2f, .b3f, .b4f{font-size:1px; overflow:hidden; display:block;}
-
.b1f {height:1px; background:#ADED7C; margin:0 5px;}
+
.b1f {height:1px; background:#e1e1e1; margin:0 5px;}
-
.b2f {height:1px; background:#ADED7C; margin:0 3px;}
+
.b2f {height:1px; background:#e1e1e1; margin:0 3px;}
-
.b3f {height:1px; background:#ADED7C; margin:0 2px;}
+
.b3f {height:1px; background:#e1e1e1; margin:0 2px;}
-
.b4f {height:2px; background:#ADED7C; margin:0 1px;}
+
.b4f {height:2px; background:#e1e1e1; margin:0 1px;}
-
.content {background: #ADED7C;}
+
.content {background: #e1e1e1;}
.content div {margin-left: 5px;}
.content div {margin-left: 5px;}
</style>
</style>
Line 106: Line 129:
     <div class="Outreach">
     <div class="Outreach">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
     <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
-
     <h1>Literature Review</h1>
+
     <h1>Gene Selection</h1>
<font size="2">
<font size="2">
-
<P>
+
<P> To produce a preliminary gene list, various databases and papers were used. The essential genes they report were determined through a variety of different experimental methods and have very limited overlap. Each gene was carefully considered, and a list of 332 genes was produced. Additionally, 29 RNA genes were found to be essential.</P>
-
Four essential gene lists from the literature were analyzed to construct a preliminary essential gene list </P>
+
<p align=right><a href="https://2009.igem.org/Team:Alberta/Project/Gene_Selection"> Click here for more...</a>. </P>
-
 
+
-
<TABLE BORDER>
+
-
<TR>
+
-
<TH>Essential Gene List from the literature</TH>
+
-
<TH>Method</TH>
+
-
<TH># of Genes considered essential</TH>
+
-
<TH>% of E.coli genes considered essential</TH>
+
-
<TH># of genes unique to that list</TH>
+
-
</TR>
+
-
<TR>
+
-
<TD>Baba et al. 2006</TD>
+
-
<TD>Single gene knockout </TD>
+
-
<TD>303</TD>
+
-
<TD> 6.4%</TD>
+
-
<TD> 36</TD>
+
-
</TR>
+
-
<TR>
+
-
<TD>Gerdes et al 2003</TD>
+
-
<TD>Transposon insertions to inactivate single genes</TD>
+
-
<TD>617</TD>
+
-
<TD>13.0%</TD>
+
-
<TD> 379 </TD>
+
-
</TR>
+
-
<TR>
+
-
<TD>Gil et al 2004</TD>
+
-
<TD>Gene conservation and literature review</TD>
+
-
<TD>203</TD>
+
-
<TD>4.3%</TD>
+
-
<TD>53</TD>
+
-
</TR>
+
-
<TR>
+
-
<TD>Profiling of E.coli Chromosome (PEC) database</TD>
+
-
<TD>Literature review</TD>
+
-
<TD>302</TD>
+
-
<TD>6.3%</TD>
+
-
<TD>126</TD>
+
-
</TR>
+
-
</TABLE BORDER>
+
-
 
+
-
 
+
-
<P>These literature lists vary greatly in size and have minimal overlap. </P>
+
-
 
+
-
<b>Venn Diagram of the Number of Essential Genes Shared Between Lists in the Literature</b>
+
-
 
+
-
<img src="https://static.igem.org/mediawiki/2009/a/a1/Uofa_Venn_of_literature.png" width="450" height="450">
+
-
 
+
-
<P>The maximum number of genes in common between any two literature lists is 205, which is between Baba et al 2006 and Gerdes et al 2003. Only 48 genes were present in all four lists.</P>
+
-
 
+
-
 
+
-
</font></div>
+
-
 
+
-
      </div></div>
+
-
<b class="b4f"></b><b class="b3f"></b><b class="b2f"></b><b class="b1f"></b>
+
-
    </td>
+
-
  </tr>
+
-
 
+
-
<tr>
+
-
<td style="height: 400; padding-left: 10px; padding-right: 10px; padding-top: 11px;">
+
-
    <b class="b1f"></b><b class="b2f"></b><b class="b3f"></b><b class="b4f"></b>
+
-
    <div class="Presentations">
+
-
    <div style="height: 400; background:#FFFFFF; colorou line-height:100% padding: 3px 0px;">
+
-
    <h1>Constructing the Biobytes Preliminary Essential Gene List</h1>
+
-
 
+
-
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
+
-
<div align="justify">
+
-
 
+
-
<font size="2">
+
-
 
+
-
<p> The preliminary essential gene list is based on literature sources. As described in the modeling section of this wiki, the metabolic genes from this preliminary list were used as a starting point for the computer model and were greatly altered based on the model's suggestions. Non-metabolic genes in the this preliminary list were retained in the final list, described in the "Gene Selection" tab.
+
-
 
+
-
<h3> Criteria for Gene Selection: </h3>
+
-
<ul>
+
-
<li>Genes must be present in more than one literature list unless there is particular reason to suspect they are essential.</li>
+
-
<li>The BioBytes metabolism is modeled after the minimal metabolism proposed by Gil et al 04, with the addition of cell wall, fatty acid, heme and ubiquitin synthesis, as Gil assumed these would not be necessary in a mycoplasma like minimal cell.</li>
+
-
<li>Additional genes required for metabolism were selected based on pathway information in the Ecocyc database. Redundancy of pathways is likely why these genes don’t appear essential in Baba, Gerdes and PEC.</li>
+
-
<li>Antitoxin genes are not essential as toxin genes would not be present.</li>
+
-
</ul>
+
-
 
+
-
<h3>Genes for the following processes were included:</h3>
+
-
<ul>
+
-
<li>DNA replication and cell division, but no DNA repair</li>
+
-
<li>Chaperones, but no heat shock or membrane stress response system</li>
+
-
<li>Transcription</li>
+
-
<li>Translation</li>
+
-
<li>Glycolysis</li>
+
-
<li>PMF generation via an ATP synthase consuming ATP to export protons.</li>
+
-
<li>Synthesis of acetyl-CoA from pyruvate</li>
+
-
<li>Fatty acid synthesis</li>
+
-
<li>Methylerithritol pathway (for undecaprenyl phosphate and a ubiquinone side chain)</li>
+
-
<li>Synthesis of phosphatidylethanolamine, but no other phospholipids</li>
+
-
<li>Pentose phosphate pathway (converts 6 or 3 carbon sugars to 5C sugars, such as ones needed in nucleotide biosynthesis)</li>
+
-
<li>Lipoprotein synthesis (Int and lolB are lipoproteins and essential)</li>
+
-
<li>Synthesis of nucleotides (deoxy and oxy) from nucleosides</li>
+
-
<li>Attaching lipid and biotin groups to protein</li>
+
-
<li>Transport:</li>
+
-
<ul>
+
-
<li>PTC transport system (imports and phosphorylates glucose)</li>
+
-
<li>Inorganic phosphate transport</li>
+
-
<li>Nucleoside transport</li>
+
-
<li>Sec system (exports proteins to periplasm), including SRP for cotranslational membrane insertion. secB chaperone does not appear essential. There is NO tat system, which would be used to export cofactor containing folded proteins.</li>
+
-
<li>Lipoprotein transport to outermembrane</li>
+
-
<li>Glutathione transport </li>
+
-
</ul>
+
-
<li>Cofactor synthesis: </li>
+
-
<ul>
+
-
<li>Riboflavin from GTP and ribulose-5-phosphate </li>
+
-
<li>FAD from riboflavin</li>
+
-
<li>NAD from nicotinamide</li>
+
-
<li>NADPH from NAD</li>
+
-
<li>CoA from pantothenic acid</li>
+
-
<li>Methylene tetrahydroxyfolate (mTHF) from folic acid</li>
+
-
<li>S-adenosylmethionine (SAM) from methionine</li>
+
-
<li>Thiamine diphosphate (TPP) from thiamine</li>
+
-
<li>Pyridoxal-5-phosphate (PP) from pyridoxal </li>
+
-
<li>Heme from glutamate </li>
+
-
<li>Ubiquinone </li>
+
-
</ul>
+
-
</ul>
+
-
 
+
-
<h3>RNAs:</h3>
+
-
<P>The rrnC operon supplies all the rRNA’s and three of the tRNAs. This operon was selected because it includes the great number of tRNA’s.  To select the other tRNA’s, all tRNA’s listed as essential in PEC were first included. One tRNA was then selected for each anticodon that differed on one of the last two bases. Differences in the first base can be accommodated by anticodon 'wobble'. At least one tRNA was included for each amino acid.  </P>
+
-
 
+
-
 
+
-
<P>The complete list of essential RNA’s can be found <a href="https://2009.igem.org/Image:Uofa_RNAs_essential.xls"> here </a>. </P>
+
-
 
+
-
 
+
-
</font></div>
+
-
 
+
-
      </div></div>
+
-
<b class="b4f"></b><b class="b3f"></b><b class="b2f"></b><b class="b1f"></b>
+
     </td>
     </td>
   </tr>
   </tr>
Line 250: Line 143:
     <div class="Survey">
     <div class="Survey">
     <div style="height: 400; background:#FFFFFF; line-height:100% padding: 3px 0px;">
     <div style="height: 400; background:#FFFFFF; line-height:100% padding: 3px 0px;">
-
     <h1>Statistics on BioBytes Preliminary Essential Gene List</h1>
+
     <h1>Metabolic Modeling</h1>
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
Line 257: Line 150:
<font size="2">
<font size="2">
-
<P> Total genes in Ecoli: 4762 </P>
+
<p> To verify that all genes necessary for metabolism are included in our essential gene list, a computer model was used.  The model was produced by the Palsson group at the University of California, San Diego and was used in conjunction with the COBRA Toolbox developed by the Systems Biology Research Group.  It provides a new "in silico" approach to identifying essential genes.  The results of the computational analysis suggest that many more genes are required to produce a viable minimal genome.  The model added 118 essential genes.  Together with the 332 genes from the literature research, this gives 450 genes in our essential gene list.  To accomplish this, a series of programs was developed for use with the COBRA Toolbox.  These programs allow for <b>the determination of any organism's minimal metabolic network.</b>  The results of the metabolic modeling are currently being tested in the wet lab to demonstrate their accuracy.</p>
-
<P> Total protein coding genes in BioBytes preliminary essentials list: 332 </P>
+
<p align=right><a href="https://2009.igem.org/Team:Alberta/Project/Modeling"> Click here for more...</a> </P>
-
<P> Total number of RNA genes in BioBytes preliminary essentials list: 29 </P>
+
-
 
+
-
 
+
</font></div>
</font></div>
Line 268: Line 158:
     </td>
     </td>
   </tr>
   </tr>
-
<tr>
 
-
<td style="height: 400; padding-left: 10px; padding-right: 10px; padding-top: 11px;">
 
-
    <b class="b1f"></b><b class="b2f"></b><b class="b3f"></b><b class="b4f"></b>
 
-
    <div class="Survey">
 
-
    <div style="height: 400; background:#FFFFFF; line-height:100% padding: 3px 0px;">
 
-
    <h1>Metabolic Modeling</h1>
 
-
 
-
<!-- <div align="justify" style="padding-left:20px; padding-right:20px"> -->
 
-
<div align="justify">
 
-
 
-
<font size="2">
 
-
 
-
<p> Unfortunately, due to the lack of consensus seen from the literature genes, it was necessary to find another way of producing our essential gene list.  Metabolic modeling was used to aid in the identification of important essential genes.  The E.coli MG1655 genome was modeled by the Palsson group at the University of San Diego and was used for our modeling experiments.  Additionally, the Cobra Toolbox developed by the System`s Biology Research Group was used to interface the model with the Matlab program.</p>
 
-
<p>A series of multiple gene deletions were performed using the model in order to determine the essential metabolic genes.  These were compared to the literature genes selected and based on the individual gene`s function and involvement in various metabolic pathways, the gene was either added to our master essential gene list or removed.  Additionally, media conditions were altered for the cells environment allowing for predictions for the conditions which should be applied to the minimal cell once it is developed.  Once completed our MatLab model contributed 116 genes to our master gene list.</p>
 
-
<p>Metabolic modeling allows for computational analysis of entire genomes which would be impossible to accomplish any other way.  The various sources and methods used to collect data has allowed for an unique gene list which has the best possible chance of producing a minimal genome.  This has been produced through a series of multiple gene deletions and media change in silico experiments.  <b>The MatLab protocols demonstrated in the modeling section can be used to identify any organism’s essential genes provided a model is available</b>.</p>
 
</table>
</table>
</div>
</div>
</HTML>
</HTML>

Latest revision as of 07:09, 21 October 2009

University of Alberta - BioBytes

Why a Minimal Genome?

One of the most useful applications of the BioBytes assembly method is the production of entire genomes. This brings the synthetic biology research community closer to one of its holy grails, the production of a viable synthetic minimal organism. For this reason the BioBytes team has attempted to create the tools and design principles needed to produce a minimal genome.

A minimal genome provides many benefits to the scientific community.

  • Genomes are extremely complex. Producing a minimal genome allows for a better understanding of the function and interaction of key cellular components needed for life.
  • A minimal cell provides a chassis for future research with minimal intracellular interferents. This makes it the optimum research vector.

Genome Design

Due to the complexity of producing a minimal genome, its development has been divided into three stages:

  • The selection of essential genes to be used in the genome
  • Building the genome via the BioBytes Assembly Method
  • Using recombination to eliminate the original host chromosome and replace it with the minimal chromosome

Why E. coli?

The E. coli bacterium (strain MG1655) was chosen as the model organism for the production of our essential genome. Although other organisms have smaller genomes (E. coli contains over 4500 genes), Escherichia coli is the most commonly used laboratory organism. This means that it is one of the most widely studied and understood organisms. This gives us the greatest chance of success in producing a minimal genome, while simultaneously producing the most useful research vector for the scientific community.

Determining Essential Genes

E. coli has over 4,500 genes. The size and complexity of this genome make it almost impossible to process manually. An in silico approach allows this complex data to be more easily collected, manipulated, and interpreted. Bioinformatics has helped us accomplish the following:

  • Review lists of essential genes in the literature and existing databases and compile a preliminary essential gene list
  • Model the metabolic reactions and net growth rate of E. coli with given gene sets. This identified additional metabolic genes essential to a minimal genome.
  • Identify knockout combinations that could be tested in the wet lab to verify the accuracy of our metabolic model.
  • Select standardized promoters and terminators that would replace the natural promoters and terminators of essential genes.
  • Determine which promoter should be used with which gene, by analyzing expression level data.
  • Design primers to amplify all essential genes from genomic DNA.
These steps have all been completed, and are described on the following pages.

Gene Selection

To produce a preliminary gene list, various databases and papers were used. The essential genes they report were determined through a variety of different experimental methods and have very limited overlap. Each gene was carefully considered, and a list of 332 genes was produced. Additionally, 29 RNA genes were found to be essential.

Click here for more....
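
To make the idea of combining literature lists concrete, here is a minimal Python sketch. It assumes one plausible selection rule, keeping any gene that appears in at least two lists, and it also reports the genes shared by every list. The list names and the gene sets are hypothetical placeholders, not the team's actual data.

```python
from collections import Counter

# Hypothetical literature lists; names and gene sets are illustrative only.
literature_lists = {
    "knockout_screen": {"dnaA", "ftsZ", "rpoB", "geneX"},
    "transposon_screen": {"dnaA", "ftsZ", "geneY", "geneZ"},
    "conservation_review": {"dnaA", "rpoB", "geneW"},
    "database_review": {"dnaA", "ftsZ", "rpoB"},
}


def shortlist(lists, min_lists=2):
    """Return genes that appear in at least `min_lists` of the given lists."""
    counts = Counter(gene for genes in lists.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_lists}


def shared_by_all(lists):
    """Return genes that are present in every list."""
    return set.intersection(*lists.values())


candidates = shortlist(literature_lists)
core = shared_by_all(literature_lists)
print(sorted(candidates))
print(sorted(core))
```

Genes that appear in only one list are not discarded outright in practice; each would still need to be reviewed by hand, as described above.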

Metabolic Modeling

To verify that all genes necessary for metabolism are included in our essential gene list, a computer model was used. The model was produced by the Palsson group at the University of California, San Diego and was used in conjunction with the COBRA Toolbox developed by the Systems Biology Research Group. It provides a new "in silico" approach to identifying essential genes. The results of the computational analysis suggest that many more genes are required to produce a viable minimal genome. The model added 118 essential genes. Together with the 332 genes from the literature research, this gives 450 genes in our essential gene list. To accomplish this, a series of programs was developed for use with the COBRA Toolbox. These programs allow for the determination of any organism's minimal metabolic network. The results of the metabolic modeling are currently being tested in the wet lab to demonstrate their accuracy.

Click here for more...
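
The idea behind an in silico knockout experiment can be illustrated with a toy Python sketch. This is not the COBRA Toolbox and not flux balance analysis: it only checks whether a "biomass" metabolite is still reachable after genes are deleted. The network, metabolite names, and gene names are all hypothetical.

```python
# Toy network: (substrates, products, genes that can each catalyse the step).
# All metabolite and gene names are hypothetical.
REACTIONS = [
    ({"glucose"}, {"pyruvate"}, {"g1"}),
    ({"pyruvate"}, {"acetyl_coa"}, {"g2", "g3"}),  # two isozymes
    ({"acetyl_coa"}, {"lipid"}, {"g4"}),
    ({"pyruvate", "lipid"}, {"biomass"}, {"g5"}),
    ({"glucose"}, {"byproduct"}, {"g6"}),  # not needed for biomass
]
MEDIUM = {"glucose"}
ALL_GENES = set().union(*(genes for _, _, genes in REACTIONS))


def can_grow(present_genes, medium=MEDIUM):
    """Return True if 'biomass' is reachable from the medium with these genes."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products, genes in REACTIONS:
            # A step runs if any one of its genes is present, all of its
            # substrates are available, and it would add something new.
            if genes & present_genes and substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def essential_genes(genes=ALL_GENES):
    """Return genes whose single deletion prevents growth."""
    return {g for g in genes if not can_grow(genes - {g})}


single = essential_genes()
print(sorted(single))
# Neither isozyme gene is essential alone, but deleting both blocks growth.
print(can_grow(ALL_GENES - {"g2", "g3"}))
```

In this toy network, redundant genes look dispensable in single-deletion screens, which is one reason multiple-deletion experiments can reveal additional genes that a minimal genome must retain.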