Team:Berkeley Software/Eugene Implementation

From 2009.igem.org

Implementation

Figure 1: Eugene Flow Diagram

Header File Creation
Header files give the language access to the many Parts already defined in the databases. For convenient data exchange over the Internet, XML can be used to read information from a database. The data is then converted into Eugene syntax to form the header files. As a result, the language definitions are not just abstract statements but are tied to existing designs. There are three main header files, shown in Figure 1: PropertyDefintion.h, PartDefinition.h and PartDeclaration.h.


Eugene Main File
This file specifies the design, expressed in Eugene syntax. The main .eug file can include the header files, which must be listed at the top:

include PropertyDefintion.h, PartDefinition.h, PartDeclaration.h;

The main file will generally consist of custom Part definitions/declarations, device constructs, rule implementations, control statements, and function calls.
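As a rough illustration of the include line shown above, the following Python sketch extracts the header names from a main file. It is not the Eugene preprocessor, which is generated from the grammar and not shown on this page. The function name `included_headers` and the regular expression are assumptions made for this example.

```python
import re
from typing import List

# Hypothetical pattern for a line of the form "include a.h, b.h, c.h;".
# The real include syntax is defined by eugene.g, which is not reproduced here.
INCLUDE_RE = re.compile(r"^\s*include\s+(.+?);\s*$", re.MULTILINE)


def included_headers(source: str) -> List[str]:
    """Return the header names listed in include statements, in order."""
    headers = []
    for match in INCLUDE_RE.finditer(source):
        for name in match.group(1).split(","):
            name = name.strip()
            if name:
                headers.append(name)
    return headers


main_file = "include PropertyDefintion.h, PartDefinition.h, PartDeclaration.h;\n"
print(included_headers(main_file))
```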


Eugene Compiler
The Eugene Compiler was created using ANTLR, an LL(*) recursive-descent parser generator that accepts lexer, parser, and tree grammars. ANTLR is used as the parser generator for Eugene code because it allows a grammar to be reused with different semantic actions and can generate parsers in other languages, which can be useful when integrating Eugene with other tools. ANTLR takes the Eugene grammar, eugene.g, as input and creates a Eugene parser and lexer file. The Eugene parser internally establishes a Backend Data Model that is later used by the Eugene Exporter. At the same time, when a design is input in the Eugene language, a data structure is created after some preprocessing of the header files. This data structure can be applied either directly to Spectacles or, after conversion from our internal Data Structure to XML, to other visual tools.


Data Structure
The data structure consists of four main classes, which directly relate to the Eugene syntax. Each instance of these classes is referenced by the user-defined name from the Eugene files and stored in global hash maps according to the class type.

Classes

The Primitive class acts as a container for numbers, text, lists and boolean values.
The Part class stores the instance of a Part and its Property values as a hash map of Property labels referencing Primitive data types. Each Part instance points to the Part definition it came from through its type field. An image path can be bound to a Part, and individual Part instances can have different images if the user specifies them.
The Device class stores the instance of a Device and the names of the ordered list of components. An image path can also be associated with a specific instance.
The Rule class stores the instance of a Rule definition, where the rule statement is broken into three components: the left operand, the operator, and the right operand.

Figure 2: Class Organization for Eugene
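The four classes described above can be sketched in Python as follows. This is only an illustration of the described fields, not the actual Eugene source. All field names (`properties`, `image_path`, `components`, `left`, `operator`, `right`) are assumptions, and the operator name `BEFORE` and the names `partA` and `partB` are placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Primitive:
    """Container for a number, text, list or boolean value."""
    type: str                                  # e.g. "txt" (type label assumed)
    value: Union[float, str, bool, list]


@dataclass
class Part:
    name: str
    type: str                                  # name of the Part definition it came from
    properties: Dict[str, Primitive] = field(default_factory=dict)
    image_path: Optional[str] = None           # optional per-instance image


@dataclass
class Device:
    name: str
    components: List[str] = field(default_factory=list)  # ordered component names
    image_path: Optional[str] = None


@dataclass
class Rule:
    name: str
    left: str                                  # left operand
    operator: str                              # operator (placeholder values only)
    right: str                                 # right operand


promoter = Part(
    name="BBa_I0500",
    type="Promoter",
    properties={"Orientation": Primitive("txt", "Forward")},  # value is illustrative
)
device = Device(name="d1", components=["partA", "partB"])
rule = Rule(name="r1", left="partA", operator="BEFORE", right="partB")
print(promoter.type, device.components, rule.operator)
```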

Global Data Structure
The data structure keeps Part and Property definitions separate from the actual instances of each class, for efficient and immediate access to the data. Every instance is referenced by name and stored in a hash map according to the class it belongs to. Parts, Devices, Rules and Primitives are kept in separate hash maps.

The hash map propertyDefinitions stores the defined Property labels and their types. For instance, the Sequence property is of type txt, as shown in Figure 3. The hash map partDefinitions stores the defined Parts with their Property labels, as well as any image associated with the specific Part. For example, the Part Promoter has the properties ID, Sequence and Orientation.

Figure 3: Populating Global Data Structures with Property and Part Definitions
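A minimal Python sketch of how the two definition maps could be populated is given below. The helper names `define_property` and `define_part` are hypothetical. The type txt for ID and Orientation is an assumption, since only the type of Sequence is stated above. The check that every label was defined beforehand is also an assumption about the behaviour.

```python
# Property label -> type name
property_definitions = {}
# Part name -> {"properties": ordered labels, "image": optional image path}
part_definitions = {}


def define_property(label, type_name):
    """Record a Property label and its type."""
    property_definitions[label] = type_name


def define_part(name, property_labels, image=None):
    """Record a Part definition; every label must already be defined (assumed check)."""
    unknown = [label for label in property_labels if label not in property_definitions]
    if unknown:
        raise KeyError("undefined properties: " + ", ".join(unknown))
    part_definitions[name] = {"properties": list(property_labels), "image": image}


define_property("Sequence", "txt")      # stated in the text
define_property("ID", "txt")            # type assumed for illustration
define_property("Orientation", "txt")   # type assumed for illustration
define_part("Promoter", ["ID", "Sequence", "Orientation"])
print(part_definitions["Promoter"]["properties"])
```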

The hash map partDeclarations contains the declared Part instances. Each instance contains a list of Property values. For example, the Part instance BBa_I0500 is of type Promoter and has the properties ID, Sequence and Orientation, where Sequence is of type “txt” and has “GATCTtta…” as its value, as shown in Figure 4. Similarly, deviceDeclarations, ruleDeclarations and primitiveDeclarations store the instance names referring to class instances of Device, Rule and Primitive respectively.

Figure 4: Populating Global Data Structures with Primitive, Part, and Device Declarations
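The declaration step for the BBa_I0500 example can be sketched in Python as below. The helper `declare_part` and the dictionary layout are hypothetical. The values given for ID and Orientation are placeholders, and the Sequence value is kept truncated exactly as in the text above.

```python
# Definitions assumed to be loaded already (types of ID and Orientation are assumed).
property_definitions = {"ID": "txt", "Sequence": "txt", "Orientation": "txt"}
part_definitions = {"Promoter": ["ID", "Sequence", "Orientation"]}

# Instance name -> {"type": Part definition name, "properties": label -> typed value}
part_declarations = {}


def declare_part(name, part_type, values):
    """Store a Part instance, pairing each Property value with its defined type."""
    labels = part_definitions[part_type]
    missing = [label for label in labels if label not in values]
    if missing:
        raise ValueError("missing property values: " + ", ".join(missing))
    part_declarations[name] = {
        "type": part_type,
        "properties": {
            label: {"type": property_definitions[label], "value": values[label]}
            for label in labels
        },
    }


declare_part(
    "BBa_I0500",
    "Promoter",
    {
        "ID": "BBa_I0500",          # placeholder value
        "Sequence": "GATCTtta…",    # truncated in the source text, kept as-is
        "Orientation": "Forward",   # placeholder value
    },
)
print(part_declarations["BBa_I0500"]["properties"]["Sequence"])
```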

The hash maps ruleAssertions and ruleNotes in Figure 5 store the assert and note statements as keys, each pointing to a list that contains the individual elements of the statement in reverse Polish (postfix) notation. Postfix notation makes it straightforward to evaluate the truth value of each assert or note statement. The statements have global scope, so every time a Device is created the program goes through each list and applies these statements to the Device.

Figure 5: Populating Global Data Structures with Rule Declarations, Assertions, and Notes
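The postfix evaluation described above can be sketched with a small stack machine in Python. This is an illustration under several assumptions: the logical tokens `AND`, `OR` and `NOT`, the rule operator `BEFORE`, and the names `eval_rule`, `eval_postfix` and `violated_assertions` are all hypothetical and are not taken from the Eugene grammar.

```python
def eval_rule(rule, device):
    """Evaluate one (left, operator, right) rule against an ordered component list."""
    left, op, right = rule
    if op == "BEFORE":  # placeholder operator: left must appear before right
        return (left in device and right in device
                and device.index(left) < device.index(right))
    raise ValueError("unsupported operator: " + op)


def eval_postfix(tokens, rules, device):
    """Evaluate a statement given as a postfix token list; operands are rule names."""
    stack = []
    for tok in tokens:
        if tok == "NOT":
            stack.append(not stack.pop())
        elif tok in ("AND", "OR"):
            b = stack.pop()
            a = stack.pop()
            stack.append((a and b) if tok == "AND" else (a or b))
        else:
            stack.append(eval_rule(rules[tok], device))
    if len(stack) != 1:
        raise ValueError("malformed postfix statement")
    return stack[0]


def violated_assertions(rule_assertions, rules, device):
    """Apply every globally scoped assertion to a newly created Device."""
    return [stmt for stmt, tokens in rule_assertions.items()
            if not eval_postfix(tokens, rules, device)]


rules = {"r1": ("p1", "BEFORE", "p2"), "r2": ("p2", "BEFORE", "p1")}
# Statement text is the key; the value is the same statement in postfix order.
rule_assertions = {"r1 AND NOT r2": ["r1", "r2", "NOT", "AND"]}

print(violated_assertions(rule_assertions, rules, ["p1", "p2"]))
print(violated_assertions(rule_assertions, rules, ["p2", "p1"]))
```

Storing the statement in postfix order means no parentheses or precedence handling is needed at evaluation time; a single left-to-right pass with a stack is enough.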
