Team:Berkeley Software/Eugene Implementation
From 2009.igem.org
Implementation
Header File Creation
Header files give the language access to the many predefined Parts already stored in databases. For convenient data exchange over the Internet, XML could be used to read information from a database; the data is then converted into Eugene syntax to form the header files. As a result, the language definitions are not just abstract statements but are tied to existing designs. There are three main header files: PropertyDefintion.h, PartDefinition.h and PartDeclaration.h, shown in Figure 1.
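The conversion step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the XML layout, the function name xml_to_declarations, the "Forward" value and the emitted declaration syntax are all hypothetical and are not taken from the Eugene tooling itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML export of one Part record. The real database schema is not
# described here, and the "Forward" value is purely illustrative.
SAMPLE_XML = """
<parts>
  <part type="Promoter" name="BBa_I0500">
    <property label="ID">BBa_I0500</property>
    <property label="Orientation">Forward</property>
  </part>
</parts>
"""


def xml_to_declarations(xml_text):
    """Return one declaration line per <part> element (assumed syntax)."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for part in root.iter("part"):
        # Quote every property value in document order.
        values = ", ".join(f'"{prop.text}"' for prop in part.iter("property"))
        lines.append(f'{part.get("type")} {part.get("name")}({values});')
    return lines


header_lines = xml_to_declarations(SAMPLE_XML)
```

Writing header_lines to a file would give a generated declaration header in this assumed format; a real converter would also need to emit the Property and Part definitions.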
Eugene Main File
The main .eug file can include the header files, which need to be specified at the top:
include PropertyDefintion.h, PartDefinition.h, PartDeclaration.h;
The main file will generally consist of custom Part definitions/declarations, device constructs, rule implementations and control statements.
ANTLR
[http://www.antlr.org ANTLR] is an LL(*) recursive-descent parser generator that accepts lexer, parser, and tree grammars. It is used as the parser generator for Eugene code because it allows a grammar to be reused with different semantic actions and can generate parsers in other languages, which can be useful when implementing Eugene alongside other tools. After some preprocessing of the header files, a data structure is created that can be applied either directly to Spectacles or, after conversion from our internal data structure to XML, to other visual tools.
Data Structure
The data structure consists of four main classes, which directly relate to the Eugene syntax. Each instance of these classes is referenced by the user-defined name from the Eugene files and stored in global hash maps according to the class type.
Classes
- The Primitive class acts as a container for numbers, text, lists and boolean values.
- The Part class stores an instance of a Part and its Property values as a hash map of Property labels referencing Primitive data types. Each Part instance points to the Part definition it came from through its type field. An image path can be bound to a Part, and individual Part instances can have different images if the user specifies them.
- The Device class stores an instance of a Device and the ordered list of the names of its components. An image path can also be associated with a specific instance.
- The Rule class stores an instance of a Rule definition, where the rule statement is broken into three components: the left operand, the operator and the right operand.
Figure 1: Class Organization for Eugene
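The four classes listed above could be modelled roughly as below. This is a minimal sketch in Python rather than the project's actual code: the field names, the use of dataclasses, the BEFORE operator and the component names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Values a Primitive may hold: numbers, text, lists and booleans.
PrimitiveValue = Union[int, float, str, bool, list]


@dataclass
class Primitive:
    type: str              # type label such as "txt"
    value: PrimitiveValue


@dataclass
class Part:
    name: str              # user-defined instance name
    type: str              # name of the Part definition it came from
    properties: Dict[str, Primitive] = field(default_factory=dict)
    image_path: Optional[str] = None


@dataclass
class Device:
    name: str
    components: List[str] = field(default_factory=list)  # ordered names
    image_path: Optional[str] = None


@dataclass
class Rule:
    name: str
    left: str              # left operand
    operator: str          # hypothetical operator name, e.g. "BEFORE"
    right: str             # right operand


promoter = Part("BBa_I0500", "Promoter",
                {"Sequence": Primitive("txt", "GATCTtta…")})
device = Device("device1", ["BBa_I0500", "rbs1"])
rule = Rule("r1", "BBa_I0500", "BEFORE", "rbs1")
```

Storing component names rather than object references in Device mirrors the description above, where every instance is looked up by name in a global hash map.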
Global Data Structure
The data structure is divided between the storage of Part and Property definitions and the actual instances corresponding to their classes, for efficient and immediate access to the data. Every instance is referenced by name and stored in a hash map according to the class it belongs to. Parts, Devices, Rules and Primitives are kept in separate hash maps.
The hash map propertyDefinitions stores the defined Property labels and their types. For instance, the Sequence property is of type txt, as shown in Figure 2. The hash map partDefinitions stores the defined Parts with their Property labels, as well as any image associated with the specific Part. For example, the Part Promoter has the properties ID, Sequence and Orientation.
Figure 2: Populating Global Data Structures with Property and Part Definitions
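As a rough illustration of how the two definition maps might be populated, consider the Python sketch below. The helper names define_property and define_part are hypothetical, and only the txt type of Sequence comes from the text; the types given to ID and Orientation are assumptions.

```python
# Property label -> type name.
property_definitions = {}
# Part name -> its Property labels and an optional image path.
part_definitions = {}


def define_property(label, type_name):
    """Register a Property label with its type."""
    property_definitions[label] = type_name


def define_part(name, labels, image=None):
    """Register a Part definition; every label must already be defined."""
    unknown = [label for label in labels if label not in property_definitions]
    if unknown:
        raise KeyError(f"undefined properties: {unknown}")
    part_definitions[name] = {"properties": list(labels), "image": image}


define_property("ID", "txt")           # type assumed for illustration
define_property("Sequence", "txt")     # type stated in the text
define_property("Orientation", "txt")  # type assumed for illustration
define_part("Promoter", ["ID", "Sequence", "Orientation"])
```

Checking labels against property_definitions at definition time is a design choice of this sketch, not a documented behaviour of Eugene.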
The hash map partDeclarations contains the declared Part instances. Each instance contains a list of Property values. For example, the Part instance BBa_I0500 is of type Promoter and has the properties ID, Sequence and Orientation, where Sequence is of type “txt” and has “GATCTtta…” as its value, as shown in Figure 3. Similarly, deviceDeclarations, ruleDeclarations and primitiveDeclarations store the instance names referring to class instances of Device, Rule and Primitive respectively.
Figure 3: Populating Global Data Structures with Primitive, Part, and Device Declarations
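The declaration step can be sketched in Python as follows. This is a simplified illustration, not the project's implementation: the declare_part helper, the nested dictionary layout and the "Forward" orientation value are assumptions, and the sequence is left truncated exactly as in the example above.

```python
# Definition maps, pre-filled for the Promoter example. Only the "txt" type
# of Sequence comes from the text; the other types are assumed.
property_definitions = {"ID": "txt", "Sequence": "txt", "Orientation": "txt"}
part_definitions = {"Promoter": ["ID", "Sequence", "Orientation"]}

# Instance name -> declared Part instance.
part_declarations = {}


def declare_part(part_type, name, values):
    """Store a Part instance, pairing each value with its Property label."""
    labels = part_definitions[part_type]
    if len(values) != len(labels):
        raise ValueError(f"{part_type} expects {len(labels)} values")
    part_declarations[name] = {
        "type": part_type,  # points back to the Part definition
        "properties": {
            label: {"type": property_definitions[label], "value": value}
            for label, value in zip(labels, values)
        },
    }


# "Forward" is an illustrative value; the sequence is truncated in the source.
declare_part("Promoter", "BBa_I0500", ["BBa_I0500", "GATCTtta…", "Forward"])
sequence = part_declarations["BBa_I0500"]["properties"]["Sequence"]
```

Here sequence holds the type and value recorded for the Sequence property, so a lookup by instance name followed by Property label reaches the stored Primitive in two hash map accesses.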
The hash maps ruleAssertions and ruleNotes in Figure 4 store the assert and note statements as keys, each pointing to a list that contains the individual elements of the statement in reverse Polish (postfix) notation. Postfix notation makes it straightforward to evaluate the truth value of each assert or note statement. The statements have global scope, so every time a Device is created the program goes through each list and applies these statements to the Device.
Figure 4: Populating Global Data Structures with Rule Declarations, Assertions, and Notes
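Evaluating a statement stored in postfix form only needs a stack. The Python sketch below shows the idea under stated assumptions: the AND, OR and NOT tokens, the BEFORE operator and its semantics, the statement text used as a key and all instance names are hypothetical and serve only to illustrate the mechanism.

```python
def evaluate_postfix(tokens, rule_results):
    """Evaluate a boolean statement given in reverse Polish notation."""
    stack = []
    for token in tokens:
        if token == "NOT":
            stack.append(not stack.pop())
        elif token in ("AND", "OR"):
            right = stack.pop()
            left = stack.pop()
            stack.append((left and right) if token == "AND" else (left or right))
        else:
            # Any other token is a rule name; push its truth value.
            stack.append(rule_results[token])
    if len(stack) != 1:
        raise ValueError("malformed postfix statement")
    return stack[0]


def rule_holds(rule, components):
    """Check one (left, operator, right) rule against an ordered device."""
    left, operator, right = rule
    if operator == "BEFORE":  # assumed semantics: left appears before right
        return (left in components and right in components
                and components.index(left) < components.index(right))
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical global maps: rule name -> rule, statement -> postfix tokens.
rule_declarations = {
    "r1": ("promoter1", "BEFORE", "rbs1"),
    "r2": ("terminator1", "BEFORE", "promoter1"),
}
rule_assertions = {"Assert(r1 AND NOT r2)": ["r1", "r2", "NOT", "AND"]}


def check_device(components):
    """Apply every globally scoped assertion to a newly created device."""
    results = {name: rule_holds(rule, components)
               for name, rule in rule_declarations.items()}
    return {statement: evaluate_postfix(tokens, results)
            for statement, tokens in rule_assertions.items()}


valid = check_device(["promoter1", "rbs1", "terminator1"])
invalid = check_device(["terminator1", "promoter1", "rbs1"])
```

With these inputs the assertion holds for the first device and fails for the second, because r2 becomes true once the terminator precedes the promoter.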