Team:Berkeley Software/LesiaNotebook/Notes

From 2009.igem.org

Week of June 8, 2009

Monday was the first day of the SUPERB program. I was introduced to the iGEM team. Doug gave a tutorial on Clotho and on plug-ins. I got the Clotho source code from the repository.

Tuesday we had a meeting. I will be working with Adam on the language project. BOL will be a language based on an abstract model of parts. Parts form the lowest level of the hierarchy and will be the basic data types of the language. Parts can be composed of other parts, but cannot themselves be decomposed any further into lower-level data types. Devices will be at the second level of the hierarchy and can consist of other devices as well as parts. BOL will differ from other languages like Antimony, [http://sbml.org/Main_Page SBML], or [http://bionetgen.org/index.php/Main_Page BNGL] in that it is a structural, human-readable language not geared towards modeling. BOL has a direct relationship to [http://openwetware.org/wiki/Endy:Notebook/BioBrick_Open_Graphical_Language BOGL] symbols, its graphical representation, and can be used to convert textual designs to graphical representations and vice versa.

Wednesday brainstormed with Adam on the syntax of BOL. It will have predefined data structures based on the [http://openwetware.org/wiki/Endy:Notebook/BioBrick_Open_Graphical_Language BOGL] symbols, like promoter and RBS. Supporting data types will be string, int and composite. An important feature that we would like BOL to have is the ability to create user-defined data types. I created a preliminary table of properties, data types and operators. Also started familiarizing myself with [http://www.gnu.org/software/bison/ GNU Bison], which we plan to use to generate the parser for the language.

Thursday continued to work on the syntax and on writing the code for an existing diagram.

Diagram that uses BOGL symbols

Did more research on other languages. It seems that BOL will indeed be different from previous languages. Rule-based description has been implemented in [http://bionetgen.org/index.php/Main_Page BNGL], which is a modeling language and concentrates on species and molecules rather than standard parts. We still have to go over the syntax for rules and how we will implement them in BOL. The logical "AND", "OR" and complement operators, as well as < and >, seem appropriate for now.

Friday created the Plugin help file in Clotho. Created a plugin XML file generator, which automatically creates an XML file to connect the Plug-In once the information is filled in. It also displays the XML file in the window.

Creates XML file for plugins

Still need to work on the five interfaces and their structure.

Week of June 15, 2009

Monday had the presentation on Plug-Ins. Got feedback from the group and Professor Anderson on the Plug-In XML file generator:

Have mouse-overs on each field that needs to be filled in
Add an option to save the XML file under a user-specified name
Restrict the choices for the interface and package name, since they have to correspond to each other

Started familiarizing myself with [http://www.antlr.org/ ANTLR], a tool for constructing compilers and interpreters from grammar rules. Each grammar rule checks part of the syntax of the program. One can attach actions to each rule, which are responsible for the semantics of the program. A grammar consists of parser rules and lexer rules, from which ANTLR generates the Parser and Lexer files, plus a Test file and a Test input file after debugging. So far ANTLR seems more user-friendly than GNU Bison, but this may simply be because there are more tutorials available.

Tuesday had a meeting with Doug and Adam on the language development:

Need to specify probabilities with each rule somehow. Maybe each rule can have a probability property, which can be taken into consideration when enforcing rules
Need to provide functions, for example could have:
- isDownstream()
- isUpstream()
- Translate()
- ReverseComplement()
Need to include input/output capabilities
Control and iteration statements
Scope of the variables for right now will be global
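
Two of the functions proposed in the list above have standard meanings on DNA sequences and part order, so they can be sketched independently of the language. The Python below is only an illustration: the names reverse_complement, is_upstream and is_downstream, and the representation of a device as an ordered list of part names, are assumptions made for this sketch, not BOL definitions.

```python
# Complement of each DNA base (upper-case A, C, G, T only; an assumption of this sketch).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def is_upstream(device, first, second):
    """True if part `first` occurs before part `second` in the device.

    `device` is a hypothetical ordered list of part names.
    """
    return device.index(first) < device.index(second)


def is_downstream(device, first, second):
    """True if part `first` occurs after part `second` in the device."""
    return device.index(first) > device.index(second)
```

For example, reverse_complement("GCTA") gives "TAGC". Both ordering functions raise a ValueError if a part is not in the device at all; the sketch does not decide what the right answer would be in that case.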

Started writing the context-free grammar and thinking about the structure of the intermediate language, that is, how we are going to implement the semantics. Right now we have two classes (they are more like structs):

Property, which has a type, a variable name and the different fields that will store the values of the property. Type just means whether the value of the property that we are going to store is text, integer or list. Variable name refers to which property we are talking about, e.g. Sequence, ID, etc.; it should help later when accessing each property.
Part, which stores an ArrayList of properties, the part type (e.g. Promoter, RBS, etc.) and a HashMap of the objects that each Part will have. For example:

Part Promoter(Sequence, BioBrickID);

Promoter p1, p2, p3;

The Part Promoter will store the part type (Promoter), the properties that a Promoter can have (Sequence and BioBrickID), and objects, which stores the instances p1, p2, p3, where each instance has its own list of property values.
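
A minimal sketch of these two structures, to make the layout concrete. The notebook describes Java collections (ArrayList, HashMap); the sketch uses Python lists and dicts in their place, and the method name instantiate and the attribute names are assumptions, not the actual compiler code.

```python
class Property:
    """A property definition: a name plus the kind of value it holds."""

    def __init__(self, name, value_type):
        self.name = name              # e.g. "Sequence"
        self.value_type = value_type  # text, number or list


class Part:
    """A part type such as Promoter, with its properties and its instances."""

    def __init__(self, part_type, properties):
        self.part_type = part_type
        self.properties = list(properties)  # plays the role of the ArrayList
        self.objects = {}                   # plays the role of the HashMap

    def instantiate(self, name, *values):
        """Create an instance that keeps its own property values."""
        if len(values) > len(self.properties):
            raise ValueError("too many property values for " + self.part_type)
        self.objects[name] = {
            prop.name: value for prop, value in zip(self.properties, values)
        }
        return self.objects[name]


# Part Promoter(Sequence, BioBrickID);
promoter = Part("Promoter", [Property("Sequence", "txt"),
                             Property("BioBrickID", "txt")])

# Promoter p1, p2, p3;  (declared without values)
for instance_name in ("p1", "p2", "p3"):
    promoter.instantiate(instance_name)

# An instance declared with values for both properties.
promoter.instantiate("p", "GCTA", "BBa_435")
```

Keeping the instances in a map keyed by name makes it cheap to check whether a name has already been declared for that part type.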

Wednesday continued working on the grammar, tried to generate the parser and lexer code and ran into lots of errors. The error messages that ANTLR gives do not seem specific enough, so I decided to debug my code by creating a new grammar and slowly copying it over part by part. The problem turned out to be that a grammar rule has to be declared in a specific way so that recursion problems do not occur, for example:

expr: declareObj expr | instantObje expr | instantObje | declareObj;

would cause errors. I attributed these to left recursion, although strictly this rule is right-recursive (a left-recursive rule would begin with expr itself); the more likely culprit is that two alternatives start with declareObj and two with instantObje, so the parser cannot choose between them. One way to solve the problem is the following:

program: expr*;

expr: declareObj | instantObje;

The sign * means that any number of statements declaring objects and instantiating them will be accepted by the compiler.
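
To see why the program: expr*; form sidesteps recursion in expr, here is how the two rules might look in a hand-written parser: the * becomes a plain loop and expr only has to choose between two alternatives. This is an illustrative Python sketch, not what ANTLR generates; treating each statement as one string and recognising a declaration by the leading keyword Part are assumptions.

```python
def parse_program(statements):
    """program: expr* ;  The '*' becomes a loop over the input."""
    parsed = []
    position = 0
    while position < len(statements):
        parsed.append(parse_expr(statements[position]))
        position += 1
    return parsed


def parse_expr(statement):
    """expr: declareObj | instantObje ;  Choose by the first token."""
    first_token = statement.split()[0]
    if first_token == "Part":  # hypothetical way to recognise a declaration
        return ("declareObj", statement)
    return ("instantObje", statement)
```

Because parse_expr never calls itself, the choice between alternatives needs only the first token of each statement.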

Thursday met with Professor Hilfinger to get some feedback on the language design, possible useful considerations:

Creation of modules -> certain areas of code that do the same thing and only the parameters differ can be included in modules
Ability to import modules into larger structures
Rule scope -> need to know to which part rules apply
Conceptual language behind the syntax, what structures, data types should we use?
Decide on the interpreter/parser, right now it seems we are going to stick with ANTLR

Friday discussed the Plug-In model with Doug:

new Clotho/Plug-In Model

Right now we have five interfaces. In order to create a new Plug-In one needs to extend one of those interfaces. It would be easier to have one main interface which contains the methods, and an abstract class that implements the interface. Then, to create a new Plug-In, the class just needs to extend the abstract class and override the methods it needs.
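
The proposed structure, sketched in Python for brevity (Clotho itself is Java, where this would be an interface plus an abstract class). The class names and the methods name and run are hypothetical and are not taken from the Clotho API.

```python
from abc import ABC, abstractmethod


class PlugIn(ABC):
    """The single main interface: it only lists the methods."""

    @abstractmethod
    def name(self):
        ...

    @abstractmethod
    def run(self, data):
        ...


class AbstractPlugIn(PlugIn):
    """Base class that implements every method with a default behaviour."""

    def name(self):
        return type(self).__name__

    def run(self, data):
        return data  # default: pass the data through unchanged


class ReversePlugIn(AbstractPlugIn):
    """A new plug-in overrides only the method it needs."""

    def run(self, data):
        return data[::-1]
```

ReversePlugIn inherits name() untouched and replaces only run(), which is the saving the single-interface design is after.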

Continued to work on the grammar and the semantic actions. Added print functionality. The following statements work so far: PROGRAM

Property someprop(txt);

Property someprop2(txt);

Property RelativeStrength(num);

Part customP(Sequence, BioBrickID);

customP.addProperties(RelativeStrength);

Promoter p(GCTA, BBa_435);

Week of June 22, 2009

Decided with Adam and Doug on the data structures to store and retrieve information for the compiler, see Adam's Notebook [1]. Continued to work on the Eugene compiler. Here is some sample syntax and output that works: Eugene Test Different Inputs

Week of June 29, 2009

New link for documentation, source code and jar file for [http://eugene.wiki.sourceforge.net/About+Eugene Eugene]

Wednesday & Thursday:

added entries to the wiki at SourceForge
added more functionality to print statements
added more functionality to declarations of primitives
found and fixed some bugs, e.g. we didn't check all hash tables of instances to see whether an object had already been defined for components or devices
reorganized some of the code and grammar rules
enabled access to individual elements in arrays and multidimensional arrays
enabled declarations of rules

most important event on Thursday -> release of Eugene 0.01

To Do:

still need to implement Assert

Friday worked on the Assert statement:

created grammar rules to recognize all kinds of Assert statements
Assert statements are stored in an assertList, which consists of the names of the rules to be asserted
- the assertList stores the expressions in postfix notation so as to observe order of precedence
- every time somebody creates a new Assert statement it overwrites the previous one, so that there is no confusion about which assert is in effect
- every time a device is created and an Assert statement exists, the method AssertRule is called:

AssertRule algorithm:

if assertList is not empty:

    for every member in assertList:
    start for
         if (member is an operand)
         start if
            push member to stack
         else
            pop the operator's operands from stack
            evaluate result (another algorithm)
            push result back to stack
         end if
    end for
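
The stack walk above is ordinary postfix evaluation. A small Python sketch under stated assumptions: the operator spellings AND, OR and NOT, the function name evaluate_assert, and the rule_results dictionary (rule name mapped to whether that rule holds for the new device) are all hypothetical stand-ins for the "another algorithm" step.

```python
def evaluate_assert(assert_list, rule_results):
    """Evaluate a postfix list of rule names and logical operators.

    assert_list  -- e.g. ["r1", "r2", "AND", "r3", "NOT", "OR"]
    rule_results -- hypothetical dict: rule name -> bool for the new device
    """
    if not assert_list:
        return True  # nothing asserted, so nothing is violated
    stack = []
    for member in assert_list:
        if member == "NOT":            # unary operator: pops one operand
            stack.append(not stack.pop())
        elif member in ("AND", "OR"):  # binary operators: pop two operands
            right = stack.pop()
            left = stack.pop()
            if member == "AND":
                stack.append(left and right)
            else:
                stack.append(left or right)
        else:                          # operand: look up the rule's result
            stack.append(rule_results[member])
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

With r1 true and r2, r3 false, the list ["r1", "r2", "AND", "r3", "NOT", "OR"] evaluates to True: r1 AND r2 is false, NOT r3 is true, and their OR is true.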

current restrictions
- Rule declarations currently cannot handle comparison of properties
- only one assert can be in effect at a time, need to know its scope
- it is not yet clear what to do with statements like p1 BEFORE p2 when one or both of them are not in the device under consideration. If neither is in the device, should the result be true? If only one is contained, should it be false or true? Again, it depends on the biological context.
- after the end of evaluation, if the overall result is false, the compiler will issue the warning: "Warning, the current Assert statement has been violated"

To Do:

actually run the Assert code and debug it (am sure there will be some kind of bug!!)
add checks to the grammar: whether rules have been declared, and whether the components or device instances inside the rule declarations are valid and have been declared
think of better messages for the case where some rule has been violated but the overall result of the Assert statement is true
add to Rule declarations the ability to compare properties using the operators ==, !=, <=, >=, <, >
add this functionality to the AssertRule method, or maybe write a similar method for comparing properties

Week of July 6, 2009

In short, what I did this week

restructured and cleaned up some code to make it more modular, and corrected some minor bugs
moved action code into a method to make it easier to use
improved rules, so that they can compare primitives and not just components and devices
included an if statement which evaluates a rule expression
created a boolean primitive
changed the declaration of devices following Cesar's suggestion:

Device BBa_K106019
(
  PrimerSite BBa_X1();
  Promoter BBa_X2();
  ORF BBa_X3();
  ORF BBa_X5();
  Terminator BBa_X6();
  Terminator BBa_X7();
  PrimerSite BBa_X8();
);

Still To Do

to support include <filenames> in the .eug file:
- need to write a method which checks whether include statements exist
- extract the filenames and concatenate those files with the user-defined .eug file to make one full file to be passed to the compiler
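
One way the include step could look, as a rough Python sketch. The exact include syntax is not fixed in these notes, so the statement form assumed here (one "include some_file.eug;" per line), the function names, and the choice to splice each file in at the position of its include line are all assumptions.

```python
import re

# Assumed statement form: one "include some_file.eug;" per line.
INCLUDE_PATTERN = re.compile(r"^\s*include\s+([^\s;]+)\s*;?\s*$")


def read_from_disk(filename):
    """Default loader: read the included file from the file system."""
    with open(filename) as handle:
        return handle.read()


def expand_includes(source, read_file=read_from_disk):
    """Return `source` with every include line replaced by that file's text."""
    output = []
    for line in source.splitlines():
        match = INCLUDE_PATTERN.match(line)
        if match:
            output.append(read_file(match.group(1)))  # splice in the file
        else:
            output.append(line)
    return "\n".join(output)
```

The read_file parameter only exists so the function can be tried without real files. Nested includes inside an included file are not expanded in this sketch.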

Week of July 13, 2009

continued to debug code and refactor
expanded capability of if statement to evaluate regular expressions
enabled include files as headers

Diagram from Trip to Stanford