Team:Bologna/Software

From 2009.igem.org

(Difference between revisions)
 
(37 intermediate revisions not shown)
Line 10: Line 10:
<br>
<br>
----
----
 +
<font size="4"><b>The Softwares</b></font><br>
 +
* [[Image:BASER_ICON.png|30px|]] <font size="4">BASER</font><br>
 +
<font color=#FF0000><b>N.B. we had problems uploading the .zip file, to get the software just email: </b></font><br>
 +
jflegias AT gmail DOT com <br>
 +
or <br>
 +
andrea.samore AT unibo DOT it <br><br>
 +
 +
<font size="2">Note: <b>Matlab with Bioinformatics toolbox required;</b> after decompacting the file, read README.txt</font><br>
 +
<font size="3">More details about BASER [https://2009.igem.org/Team:Bologna/Software#1 here]</font><br><br><br>
 +
* [[Image:microscopio.gif|40px|]] <font size="4">[[Media:VIFluoR.zip|Download]] VIFluoR</font>
 +
<font size="2">Note: <b>Matlab required;</b> after decompacting the file, read ReadMe.txt</font><br>
 +
<font size="3">More details about VIFluoR [https://2009.igem.org/Team:Bologna/Software#2 here]</font>
 +
<br>
 +
----
 +
=1=
<br>
<br>
<font size="5"><center>
<font size="5"><center>
-
<b>B<font color=#00FF00>A</font>S<font color=#00FF00>E</font>R: Best Sequence Research by <font color=#00FF00>A</font>ndrea and <font color=#00FF00>E</font>lisa</b>
+
<b>B<font color=#00FF00>A</font>S<font color=#00FF00>E</font>R<br>Best Sequence Research by <font color=#00FF00>A</font>ndrea and <font color=#00FF00>E</font>lisa</b>
</center>
</center>
<br>
<br>
Line 19: Line 34:
<font size="3">
<font size="3">
<div style="text-align:justify">
<div style="text-align:justify">
-
BASER is a computer program developed to design synthetic DNA sequences whose transcribed RNAs: a) feature maximal free energy in the secondary structure (i.e. reducing the probability of its intra-molecular annealing); b) have minimal unwanted interactions with genomic mRNA; c) present a minimal probability of partial/shifted hybridization with complementary strands.  These specifications are required for the proper engineering of the TRANS and CIS complementary sequences, whose functions are described in the T-ReX device.
+
BASER is a computer program developed to design synthetic DNA sequences whose transcribed RNAs: a) feature maximal free energy in the secondary structure (i.e. reducing the probability of its intra-molecular annealing); b) have minimal unwanted interactions with genomic mRNA; c) present a minimal probability of partial/shifted hybridization with complementary strands.  These specifications are required for the proper engineering of the TRANS and CIS complementary sequences, whose functions are described in the T-REX device.
<br><br>
<br><br>
Line 30: Line 45:
a) the self score: proportional to the minimum free energy of the corresponding RNA secondary structure <font color=#FF0000>[1,2]</font>; b) the genomic score: the number of times that the sequence appears in the coding DNA with at least '''m''' adjacent nucleotides out of a total of '''n''' corresponding nucleotides; c) the shifted score: proportional to the best suboptimal pairing of the “current” sequence and its Watson and Crick complementary strand.
a) the self score: proportional to the minimum free energy of the corresponding RNA secondary structure <font color=#FF0000>[1,2]</font>; b) the genomic score: the number of times that the sequence appears in the coding DNA with at least '''m''' adjacent nucleotides out of a total of '''n''' corresponding nucleotides; c) the shifted score: proportional to the best suboptimal pairing of the “current” sequence and its Watson and Crick complementary strand.
<br>
<br>
-
After score computation, five adjacent nucleotides in the “current” sequence are substituted with a randomly-picked block from the BF, originating a new sequence. The score of this new sequence is calculated and, if lower than the previous one, the new sequence will be considered as the “current” one in the next iteration (otherwise the previous one is maintained as the “current”). The algorithm tries to modify the “current” sequence until the number of total iterations N (N chosen by the user) has been reached. However, if the same sequence persists for more than K iterations (K<=N, K chosen by the user) without any improvement of its best score, this sequence is considered as candidate, so that BASER stops to go on, up to the number of total scheduled iterations. This same sequence, in its opposite 5’ to 3’ orientation will be the start sequence for a subsequent research by BASER. Candidate sequences are usually reached in less than 500 iterations (Fig. 1). All of them are reported at the end of the elaboration.
+
After score computation, five adjacent nucleotides in the “current” sequence are substituted with a randomly-picked block from the BF, originating a new sequence. The score of this new sequence is calculated and, if lower than the previous one, the new sequence will be considered as the “current” one in the next iteration (otherwise the previous one is maintained as the “current”). The algorithm tries to modify the “current” sequence until the number of total iterations N (N chosen by the user) has been reached. However, if the same sequence persists for more than K iterations (K<N, K chosen by the user) without any improvement of its best score, this sequence is considered as candidate. This same sequence, in its opposite 5’ to 3’ orientation will be the start sequence for a subsequent research by BASER. Candidate sequences are usually reached in less than 500 iterations (Fig. 1). All of them are reported at the end of the elaboration.
[[Image:Fig1.png|center|800px|thumb|<font size="2">Figure 1. Sequence Score vs Number of Iterations. In this example, when a sequence is found to have the same score for more than 150 iterations, it is considered a candidate sequence. Then, to restart the searching procedure, it is read in the opposite 5’ to 3’ orientation.</font>]]
[[Image:Fig1.png|center|800px|thumb|<font size="2">Figure 1. Sequence Score vs Number of Iterations. In this example, when a sequence is found to have the same score for more than 150 iterations, it is considered a candidate sequence. Then, to restart the searching procedure, it is read in the opposite 5’ to 3’ orientation.</font>]]
Line 40: Line 55:
The CIS-repressing (RNA secondary structure in Fig. 2) sequence chosen by BASER to be assembled upstream of the target protein-coding sequence in T-REX resulted:
The CIS-repressing (RNA secondary structure in Fig. 2) sequence chosen by BASER to be assembled upstream of the target protein-coding sequence in T-REX resulted:
<br><br>
<br><br>
-
<center><b>AACACAAACTATCACTTTAACAACACATTACATATACATTAAAATATTAC</b><font color=#7cfc00><i><u>AAAG</u>AGG</i></font><b>AGAAA</b><br>
+
<center><b>AACACAAACTATCACTTTAACAACACATTACATATACATTAAAATATTAC</b><font color=#00ff00><i><u>AAAG</u>AGG</i></font><i>AGAAA</i><br>
-
(RBS in <i>italic</i>)
+
(RBS in <i>italic</i>)</center><br>
 +
 
 +
[[Image:Immagine2.png|center|800px|thumb|<font size="2">Figure 2.</font>]]
 +
<br>
 +
Its complementary TRANS-repressor sequence, with a RBS cover in two versions of different length, was:<br><br>
 +
either<br><br>
 +
<center>
 +
<font color=#00ff00>CCTCTTT</font><b>GTAATATTTTAATGTATATGTAATGTGTTGTTAAAGTGATAGTTTGTGTT</b><br>
 +
with a 7b-long RBS cover in <font color=#00ff00>green</font> (RNA secondary structure in Fig. 3a)
 +
</center>
 +
or
 +
<center>
 +
<font color=#00ff00><u>CTTT</u></font><b>GTAATATTTTAATGTATATGTAATGTGTTGTTAAAGTGATAGTTTGTGTT</b><br>
 +
with a 4b-long RBS cover in <font color=#00ff00><u>green underlined</u></font> (RNA secondary structure in Fig. 3b)
 +
<br>
 +
{|align="center"
 +
|[[Image:figura3a.png|center|450 px|thumb|Figure 3a]]
 +
|[[Image:figura3b.png|center|450 px|thumb|Figure 3b]]
 +
|}
 +
<br><br><br></center>
 +
<font color=#FF0000>1.</font> Wuchty, S., Fontana, W., Hofacker, I., and Schuster, P. (1999). Complete suboptimal
 +
folding of RNA and the stability of secondary structures. Biopolymers 49, 145–165<br>
 +
<font color=#FF0000>2.</font> Matthews, D., Sabina, J., Zuker, M., and Turner, D. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940.
 +
<br><br>
 +
[https://2009.igem.org/Team:Bologna/Software ''Up'']
 +
----
 +
<br>
 +
 
 +
=2=
 +
<br><br>
 +
<font size="5">
 +
<center>
 +
<b>VIFluoR <br><br> (Very Inexpensive Fluorescence Reader) </b>
 +
</center>
 +
<br>
 +
<font size="3">
 +
In Synthetic Biology, the quantification of protein synthesis using fluorescent reporters is an established practice. Fluorescence intensity can be assessed by two different methods: i) the measure of total fluorescence produced by populations of cells; ii) the acquisition of images by an optical microscope to obtain information on single-cell fluorescence. We have developed a Matlab code (VIFluoR) to estimate, by imaging analysis,  the fluorescence emitted by a single bacterium. To validate the program, we compared the fluorescence determined using micro-images with the fluorescence measured in the same bacterial culture using a fluorimeter.
 +
 
 +
<br>
 +
<font size="4"><b>Images Selection</b></font>
 +
<br>
 +
 
 +
Single cell fluorescence in a bacterial population exhibits large variability, also depending on the cell cycle phase. To obtain a significant representation of bacterium fluorescence, it is necessary to acquire several images, each one reporting a sufficient number of bacterial cells (Figure 1). Our program can process more images at once. In addition, the images to be processed are selected when the program is running (Figure 2).
 +
 
 +
[[Image:partenza.jpg|center|thumb|900px|<font size="2">Figure 1 - Image of <i>E. coli</i> (Magnification 400x)</font>]]
 +
<br>
 +
[[Image:pick.jpg|center|500px|]]
 +
 
 +
<br><br>
 +
<font size="4"><b>Bacterial cell recognition</b></font>
 +
<br>
 +
VIFluoR firstly operates the image segmentation and then recognises the bacterial cells (Figure 3) evaluating two properties of segmented objects: the morphology, assuming an elliptical shape for the bacterium, and the focus. The user can select the bacterium morphology by setting the eccentricity (from 0 to 1), and the values for minimum and maximum area (in pixel). The program aids the user to set these parameters by showing the eccentricity and the area histograms of the candidate bacteria (Figure 4).
 +
<br><br>
 +
[[Image:seg.jpg|center|thumb|900px|<font size="2">Figure 3. Image segmentation. All the candidate bacteria are marked with red boxes.</font>]]
 +
<br>
 +
{|align="center"
 +
|[[Image:eccbact.png|center|thumb|450 px|<font size="2">Figure 4a. Bacterial eccentricity distribuction.</font>]]
 +
|[[Image:area.jpg|center|450 px|thumb|<font size="2">Figure 4b. Bacterial area distribuction.</font>]]
 +
|}
 +
<br>
 +
Focus clustering is an automated routine that divides the candidate bacteria in clusters and selects only the individuals featuring high fluorescence and high cell numbers. The program marks with a yellow box the bacteria chosen after the morphology and focus tests and with a red box the others (Figure 5).
 +
<br><br>
 +
 
 +
[[Image:cluster.jpg|center|1000 px|thumb|<font size="2">Figure 5. Bacterium recognition: bacteria passing the morphology and focus tests are marked with a yellow box.</font>]]
 +
 
 +
<br><br>
 +
<font size="4"><b>Program output</b></font>
 +
<br>
 +
The program seeks the pixel fluorescence associated to each bacterium and then shows the fluorescence histogram per pixels (Figure 6.a)
 +
<br>
 +
{|align="center"
 +
|[[Image:pixels.jpg|center|thumb|420 px|<font size="2">Figure 6a. Fluorescence histograms per pixel.</font>]]
 +
|[[Image:meand.jpg|center|470 px|thumb|<font size="2">Figure 6b. Fluorescence histograms  per bacterium.</font>]]
 +
|}
 +
 
 +
For each recognized bacterium the program computes: the bacterium area in pixel and the bacterium fluorescence (mean over the bacterium pixel number). Data are stored on the 4 vectors (Px<i><b>zzz</b></i> , F<i><b>zzz</b></i>, A<i><b>zzz</b></i> and Results<i><b>zzz</b></i> where <i><b>zzz</b></i> is the name given to the dataset when the program starts). The vector Results<i><b>zzz</b></i> contains the most important data of the analysis to help the consultation. If you have excel it's possible to save, as image.xls, a worksheet with all the data. Vectors Px<i><b>zzz</b></i> , F<i><b>zzz</b></i>, A<i><b>zzz</b></i> are used to plot the mean fluorescence histogram, boxplot of bacteria fluorescence (Figure 7) and the area vs the fluorescence graph (Figure 8).
 +
<br>
 +
{|align="center"
 +
|[[Image:boxpl.jpg|center|thumb|440 px|center|<font size="2">Figure 7. Boxplot of bacteria fluorescence.</font>]]
 +
|[[Image:areavsfluo.jpg|center|470 px|center|thumb|<font size="2">Figure 8. Area vs Fluorescence.</font>]]
 +
|}
 +
<br>
 +
[https://2009.igem.org/Team:Bologna/Software ''Up'']

Latest revision as of 23:00, 21 October 2009

ProvaBol2.png
HOME TEAM PROJECT SOFTWARE MODELING WET LAB PARTS HUMAN PRACTICE JUDGING CRITERIA


"Part of the inhumanity of the computer is that, once it is competently programmed and working smoothly, it is completely honest."

I. Asimov


The Softwares

  • BASER ICON.png BASER

N.B. we had problems uploading the .zip file, to get the software just email:
jflegias AT gmail DOT com
or
andrea.samore AT unibo DOT it

Note: Matlab with Bioinformatics toolbox required; after decompacting the file, read README.txt
More details about BASER here


Note: Matlab required; after decompacting the file, read ReadMe.txt
More details about VIFluoR here


1


BASER
Best Sequence Research by Andrea and Elisa


Aims

BASER is a computer program developed to design synthetic DNA sequences whose transcribed RNAs: a) feature maximal free energy in the secondary structure (i.e. reducing the probability of its intra-molecular annealing); b) have minimal unwanted interactions with genomic mRNA; c) present a minimal probability of partial/shifted hybridization with complementary strands. These specifications are required for the proper engineering of the TRANS and CIS complementary sequences, whose functions are described in the T-REX device.

Method
The BASER algorithm builds a 50 nucleotide-long sequence (start sequence), assembled by linking 10 blocks of 5 nucleotides each, randomly extracted from a basket file (BF), stored in a basket directory. The BF file can be either uniform (containing all the possible combinations of 2, 3 or 4 distinct nucleotides in the 5 available places) or non-uniform (containing each block a number of times that is inversely proportional to the appearance of that same block in the genomic DNA of E. coli).

After having generated a 50b-long sequence, BASER performs the Conformity test to check that the sequence does not contain: a) more than 5 adjacent repeats of the same nucleotide (to avoid transcription errors); b) restriction sites; c) RBS sequences. If one among these conditions occurs, a new sequence is generated until the Conformity test is passed. Thereafter the RBS sequence, chosen by the user, is linked downstream of its 3’ end to obtain what is called the “current” sequence. BASER thus calculates a score for the “current” sequence, derived from a combination of:
a) the self score: proportional to the minimum free energy of the corresponding RNA secondary structure [1,2]; b) the genomic score: the number of times that the sequence appears in the coding DNA with at least m adjacent nucleotides out of a total of n corresponding nucleotides; c) the shifted score: proportional to the best suboptimal pairing of the “current” sequence and its Watson and Crick complementary strand.
After score computation, five adjacent nucleotides in the “current” sequence are substituted with a randomly-picked block from the BF, originating a new sequence. The score of this new sequence is calculated and, if lower than the previous one, the new sequence will be considered as the “current” one in the next iteration (otherwise the previous one is maintained as the “current”). The algorithm tries to modify the “current” sequence until the number of total iterations N (N chosen by the user) has been reached. However, if the same sequence persists for more than K iterations (K<N, K chosen by the user) without any improvement of its best score, this sequence is considered as candidate. This same sequence, in its opposite 5’ to 3’ orientation will be the start sequence for a subsequent research by BASER. Candidate sequences are usually reached in less than 500 iterations (Fig. 1). All of them are reported at the end of the elaboration.

Figure 1. Sequence Score vs Number of Iterations. In this example, when a sequence is found to have the same score for more than 150 iterations, it is considered a candidate sequence. Then, to restart the searching procedure, it is read in the opposite 5’ to 3’ orientation.


CIS and TRANS sequences

The CIS-repressing (RNA secondary structure in Fig. 2) sequence chosen by BASER to be assembled upstream of the target protein-coding sequence in T-REX resulted:

AACACAAACTATCACTTTAACAACACATTACATATACATTAAAATATTACAAAGAGGAGAAA
(RBS in italic)

Figure 2.


Its complementary TRANS-repressor sequence, with a RBS cover in two versions of different length, was:

either

CCTCTTTGTAATATTTTAATGTATATGTAATGTGTTGTTAAAGTGATAGTTTGTGTT
with a 7b-long RBS cover in green (RNA secondary structure in Fig. 3a)

or

CTTTGTAATATTTTAATGTATATGTAATGTGTTGTTAAAGTGATAGTTTGTGTT
with a 4b-long RBS cover in green underlined (RNA secondary structure in Fig. 3b)

Figure 3a
Figure 3b



1. Wuchty, S., Fontana, W., Hofacker, I., and Schuster, P. (1999). Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49, 145–165
2. Matthews, D., Sabina, J., Zuker, M., and Turner, D. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940.

Up



2



VIFluoR

(Very Inexpensive Fluorescence Reader)


In Synthetic Biology, the quantification of protein synthesis using fluorescent reporters is an established practice. Fluorescence intensity can be assessed by two different methods: i) the measure of total fluorescence produced by populations of cells; ii) the acquisition of images by an optical microscope to obtain information on single-cell fluorescence. We have developed a Matlab code (VIFluoR) to estimate, by imaging analysis, the fluorescence emitted by a single bacterium. To validate the program, we compared the fluorescence determined using micro-images with the fluorescence measured in the same bacterial culture using a fluorimeter.


Images Selection

Single cell fluorescence in a bacterial population exhibits large variability, also depending on the cell cycle phase. To obtain a significant representation of bacterium fluorescence, it is necessary to acquire several images, each one reporting a sufficient number of bacterial cells (Figure 1). Our program can process more images at once. In addition, the images to be processed are selected when the program is running (Figure 2).

Figure 1 - Image of E. coli (Magnification 400x)


Pick.jpg



Bacterial cell recognition
VIFluoR firstly operates the image segmentation and then recognises the bacterial cells (Figure 3) evaluating two properties of segmented objects: the morphology, assuming an elliptical shape for the bacterium, and the focus. The user can select the bacterium morphology by setting the eccentricity (from 0 to 1), and the values for minimum and maximum area (in pixel). The program aids the user to set these parameters by showing the eccentricity and the area histograms of the candidate bacteria (Figure 4).

Figure 3. Image segmentation. All the candidate bacteria are marked with red boxes.


Figure 4a. Bacterial eccentricity distribuction.
Figure 4b. Bacterial area distribuction.


Focus clustering is an automated routine that divides the candidate bacteria in clusters and selects only the individuals featuring high fluorescence and high cell numbers. The program marks with a yellow box the bacteria chosen after the morphology and focus tests and with a red box the others (Figure 5).

Figure 5. Bacterium recognition: bacteria passing the morphology and focus tests are marked with a yellow box.



Program output
The program seeks the pixel fluorescence associated to each bacterium and then shows the fluorescence histogram per pixels (Figure 6.a)

Figure 6a. Fluorescence histograms per pixel.
Figure 6b. Fluorescence histograms per bacterium.

For each recognized bacterium the program computes: the bacterium area in pixel and the bacterium fluorescence (mean over the bacterium pixel number). Data are stored on the 4 vectors (Pxzzz , Fzzz, Azzz and Resultszzz where zzz is the name given to the dataset when the program starts). The vector Resultszzz contains the most important data of the analysis to help the consultation. If you have excel it's possible to save, as image.xls, a worksheet with all the data. Vectors Pxzzz , Fzzz, Azzz are used to plot the mean fluorescence histogram, boxplot of bacteria fluorescence (Figure 7) and the area vs the fluorescence graph (Figure 8).

Figure 7. Boxplot of bacteria fluorescence.
Figure 8. Area vs Fluorescence.


Up