Team:Bologna/Software

From 2009.igem.org

(Difference between revisions)

Revision as of 09:39, 21 October 2009

"Part of the inhumanity of the computer is that, once it is competently programmed and working smoothly, it is completely honest."

I. Asimov

BASER: Best Sequence Research by Andrea and Elisa

Aims

BASER is a computer program that designs synthetic DNA sequences whose transcribed RNAs: a) have maximal free energy in their secondary structure (reducing the probability of intramolecular annealing); b) have minimal unwanted interactions with genomic mRNA; c) have a minimal probability of partial or shifted hybridization with their complementary strands. These specifications are required to properly engineer the TRANS and CIS complementary sequences, whose functions are described in the T-ReX device.

Method
The BASER algorithm builds a 50-nucleotide sequence (the start sequence) by linking 10 blocks of 5 nucleotides each, drawn at random from a basket file (BF) stored in a basket directory. The BF can be either uniform (containing all possible combinations of 2, 3 or 4 distinct nucleotides in the 5 available positions) or non-uniform (containing each block a number of times inversely proportional to the occurrence of that block in the genomic DNA of E. coli).
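The block assembly described above can be sketched in Python as follows. This is a minimal illustration, not the BASER source: the function names are hypothetical, the uniform basket is read here as "every 5-nucleotide block over a chosen alphabet of 2, 3 or 4 nucleotides", and the non-uniform basket is emulated with sampling weights of 1/(count + 1) rather than with repeated copies of each block in a file.

```python
import itertools
import random
from collections import Counter

BLOCK_LEN = 5   # nucleotides per block (from the text)
N_BLOCKS = 10   # blocks per start sequence (from the text)

def uniform_basket(alphabet="ACGT"):
    """Every block of BLOCK_LEN nucleotides over the alphabet, each listed once."""
    return ["".join(p) for p in itertools.product(alphabet, repeat=BLOCK_LEN)]

def inverse_frequency_weights(genome, blocks):
    """Weight each block inversely to its (overlapping) count in the genome.

    Assumption: the +1 keeps blocks that never occur from dividing by zero.
    """
    counts = Counter(genome[i:i + BLOCK_LEN]
                     for i in range(len(genome) - BLOCK_LEN + 1))
    return [1.0 / (counts[b] + 1) for b in blocks]

def start_sequence(blocks, weights=None, rng=None):
    """Link N_BLOCKS randomly drawn blocks into one start sequence."""
    rng = rng or random.Random()
    return "".join(rng.choices(blocks, weights=weights, k=N_BLOCKS))
```

Calling `start_sequence(uniform_basket())` gives a uniform draw; passing `weights=inverse_frequency_weights(genome, blocks)` biases the draw away from blocks that are common in the supplied genome string.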

After generating the 50-nucleotide sequence, BASER performs the Conformity test to check that it does not contain: a) more than 5 adjacent repeats of the same nucleotide (to avoid transcription errors); b) restriction sites; c) RBS sequences. If any of these conditions occurs, a new sequence is generated, and this is repeated until the Conformity test is passed. The RBS sequence chosen by the user is then linked downstream of the 3’ end to obtain what is called the “current” sequence. BASER then calculates a score for the “current” sequence, derived from a combination of:
a) the self score: proportional to the minimum free energy of the corresponding RNA secondary structure [1,2]; b) the genomic score: the number of times the sequence occurs in the coding DNA with at least m adjacent nucleotides out of a total of n corresponding nucleotides; c) the shifted score: proportional to the best suboptimal pairing of the “current” sequence with its Watson-Crick complementary strand.
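The Conformity test described above can be sketched as a simple filter. This is an illustration under stated assumptions: the function names are hypothetical, "more than 5 adjacent repeats" is read as a run of six or more identical nucleotides, and the list of forbidden motifs is supplied by the caller, since the page does not list the restriction sites or RBS sequences that BASER actually checks.

```python
def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best

def passes_conformity(seq, forbidden_sites, max_run=5):
    """True if seq has no run longer than max_run and no forbidden motif.

    forbidden_sites holds both restriction sites and RBS motifs; which
    motifs to forbid is an assumption left to the caller.
    """
    seq = seq.upper()
    if longest_run(seq) > max_run:
        return False
    return not any(site.upper() in seq for site in forbidden_sites)

# Illustrative motifs only: EcoRI, XbaI, SpeI and PstI recognition sites,
# plus a Shine-Dalgarno-like motif standing in for "RBS sequences".
EXAMPLE_FORBIDDEN = ["GAATTC", "TCTAGA", "ACTAGT", "CTGCAG", "AGGAGG"]
```

A generator loop would keep drawing start sequences until `passes_conformity` returns `True`, and only then append the user-chosen RBS.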

After the score has been computed, five adjacent nucleotides in the “current” sequence are replaced with a block picked at random from the BF, producing a new sequence. The score of this new sequence is calculated and, if it is lower than the previous one, the new sequence becomes the “current” one in the next iteration (otherwise the previous one is kept as the “current”). The algorithm keeps trying to modify the “current” sequence until the total number of iterations N (chosen by the user) has been reached. However, if the same sequence persists for more than K iterations (K ≤ N, K chosen by the user) without any improvement in its score, it is considered a candidate, and BASER stops instead of running through all the scheduled iterations. This same sequence, in the opposite 5’-to-3’ orientation, becomes the start sequence for a subsequent search by BASER. Candidate sequences are usually reached in fewer than 500 iterations (Fig. 1). All of them are reported at the end of the run.
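The iteration described above amounts to a greedy block-substitution search with an early stop. The sketch below is a hedged illustration, not the BASER implementation: `optimise` and its parameters are hypothetical names, `score` is a caller-supplied placeholder for the combined self, genomic and shifted scores, the substitution window may start at any offset, and only the 50-nucleotide body is modified while the appended RBS stays fixed (the page does not state either detail).

```python
import random

def optimise(start, rbs, basket, score, n_iter, k_stall, rng=None, block_len=5):
    """Greedy search: accept a block substitution only if the score decreases.

    start   : conformity-checked start sequence (without the RBS)
    rbs     : user-chosen RBS appended at the 3' end before scoring
    basket  : list of blocks of length block_len
    score   : callable taking a sequence, lower is better (placeholder)
    n_iter  : total number of scheduled iterations (N in the text)
    k_stall : stop once the same sequence has persisted for more than
              this many iterations without improvement (K in the text)
    """
    rng = rng or random.Random()
    body = start
    best = score(body + rbs)
    stalled = 0
    for _ in range(n_iter):
        pos = rng.randrange(len(body) - block_len + 1)
        candidate = body[:pos] + rng.choice(basket) + body[pos + block_len:]
        s = score(candidate + rbs)
        if s < best:
            body, best, stalled = candidate, s, 0
        else:
            stalled += 1
            if stalled > k_stall:
                break  # candidate reached before using all n_iter iterations
    return body + rbs, best
```

With a toy score such as `lambda s: s.count("G")`, the function returns the candidate sequence together with its score; a real run would plug in the three BASER scores and could restart from the candidate in the opposite orientation, as the text describes.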

@@ Line 8: / Line 8: @@
 <font face="Times New Roman" font size="4"><i> I. Asimov </i></font>
 </center></html>
+<br>
+----
+<br>
+<font size="5"><center>
+<b>B<font color=#00FF00>A</font>S<font color=#00FF00>E</font>R: Best Sequence Research by <font color=#00FF00>A</font>ndrea and <font color=#00FF00>E</font>lisa</b>
+</center>
+<br>
+<font size="4"><b>Aims</b></font>
+<br>
+<font size="3">
+<div style="text-align:justify">
+BASER is a computer program that designs synthetic DNA sequences whose transcribed RNAs: a) have maximal free energy in their secondary structure (reducing the probability of intramolecular annealing); b) have minimal unwanted interactions with genomic mRNA; c) have a minimal probability of partial or shifted hybridization with their complementary strands. These specifications are required to properly engineer the TRANS and CIS complementary sequences, whose functions are described in the T-ReX device.
+<br><br>
+<font size="4"><b>Method</b></font>
+<br>
+The BASER algorithm builds a 50-nucleotide sequence (the start sequence) by linking 10 blocks of 5 nucleotides each, drawn at random from a basket file (BF) stored in a basket directory. The BF can be either uniform (containing all possible combinations of 2, 3 or 4 distinct nucleotides in the 5 available positions) or non-uniform (containing each block a number of times inversely proportional to the occurrence of that block in the genomic DNA of <i>E. coli</i>).
+<br><br>
+After generating the 50-nucleotide sequence, BASER performs the Conformity test to check that it does not contain: a) more than 5 adjacent repeats of the same nucleotide (to avoid transcription errors); b) restriction sites; c) RBS sequences. If any of these conditions occurs, a new sequence is generated, and this is repeated until the Conformity test is passed. The RBS sequence chosen by the user is then linked downstream of the 3’ end to obtain what is called the “current” sequence. BASER then calculates a score for the “current” sequence, derived from a combination of:
+<br>
+a) the self score: proportional to the minimum free energy of the corresponding RNA secondary structure <font color=#FF0000>[1,2]</font>; b) the genomic score: the number of times the sequence occurs in the coding DNA with at least '''m''' adjacent nucleotides out of a total of '''n''' corresponding nucleotides; c) the shifted score: proportional to the best suboptimal pairing of the “current” sequence with its Watson-Crick complementary strand.
 <br>
+After the score has been computed, five adjacent nucleotides in the “current” sequence are replaced with a block picked at random from the BF, producing a new sequence. The score of this new sequence is calculated and, if it is lower than the previous one, the new sequence becomes the “current” one in the next iteration (otherwise the previous one is kept as the “current”). The algorithm keeps trying to modify the “current” sequence until the total number of iterations N (chosen by the user) has been reached. However, if the same sequence persists for more than K iterations (K ≤ N, K chosen by the user) without any improvement in its score, it is considered a candidate, and BASER stops instead of running through all the scheduled iterations. This same sequence, in the opposite 5’-to-3’ orientation, becomes the start sequence for a subsequent search by BASER. Candidate sequences are usually reached in fewer than 500 iterations (Fig. 1). All of them are reported at the end of the run.