Team:Berkeley Software/Eugene
From 2009.igem.org
Eugene
Introduction
With the rise of partification in synthetic biology, there needs to be a formal specification to describe standard biological parts, especially when designing complex devices. The specification needs to be both human-writable and human-readable: a language that raises the level of abstraction at which bioengineers can work. Eugene is such a language.
Engineering at the part level requires both flexibility and rigidity. Eugene allows the user to mix custom parts with predefined parts from established databases. Parts can encapsulate an arbitrary amount of information, anything from a DNA sequence to experimental history. When designing a device in Eugene, parts can be freely stitched together on a whim, or strictly joined together based on rules.
The design process is meant to be systematic yet intuitive. The user considers what information parts include, constructs parts to be used in the design, enforces restrictions on how parts can interact, and creates devices that are composites of the parts or other devices. Because the design is textual, a device specified in Eugene is portable and lends itself to translation into other formats, such as XML. In synthetic biology, the notion of a part changes regularly and is debated tirelessly. Thus, Eugene tries to be adaptable and expressive in any climate. This page goes into the language specification in detail.
Language Definition
In this section we describe the elements of the language: primitive data types, properties, parts, devices, rules, and conditional execution. The relationships between these language elements are shown in Figure 2, where each subsequent category is built upon the previous one.
Primitives
The language supports five predefined primitives. These are txt, num, boolean, txt[], and num[]. Strings (sequences of characters) are represented through the data type “txt”, where the actual text is specified in double quotes. Real numbers and integers are supported by the data type “num” and logical values by the data type “boolean”. Ordered lists of num and txt values can be created and individual members inside a list accessed by specifying an integer in the range from 0 to |List| - 1.
Examples (1) and (2) are two real code snippets of how primitives can be specified in Eugene. “listOfSequences” is simply a list of 3 arbitrary DNA sequences. “specificSequence” is the last element of “listOfSequences” (i.e. “ATCG”). Examples (3) and (4) show how the data type “num” can support integers and decimals.
txt[] listOfSequences = ["ATG", "TCG", "ATCG"];    (1)
txt specificSequence = listOfSequences[2];    (2)
num[] listOfNumbers = [2.5, 10, 3.4, 6];    (3)
num one = listOfNumbers[0];    (4)
Properties
Properties represent characteristics of interest. They are defined in terms of primitives and associated with Parts. For example, a user could define the properties Sequence (the DNA sequence), ID (the UUID in a relational database that may hold the part), or Orientation (e.g. a forward or backward promoter). Examples (5)-(8) show how such properties would be defined. Every Property definition must use one of the five primitive types. In a Part definition, properties are bound to the Part as placeholders, and their values are supplied later in Part declarations. Properties have to be defined before Parts can use them. The user can create new Property labels or use those created by other users and captured in “header files”. For example, the following Properties are predefined in the header file PropertyDefinition.h and do not need to be defined again if the header file is included in the main program:
Property ID(txt); // in header file    (5)
Property Sequence(txt); // in header file    (6)
Property Orientation(txt); // in header file    (7)
Property RelativeStrength(num); // in header file    (8)
Parts
The data type Part represents a standard biological Part. A Part can be defined empty initially and then property labels can be added through the function addProperties() or properties can be bound to a Part during the definition.
Part Definition
Part definitions do not construct any Parts, but rather specify which Parts can be constructed. This can be done in a header file or in the main program. When the header files PartDefinition.h and PropertyDefinition.h are included, the following Parts and their corresponding property labels are predefined. For instance, the Part “Promoter” has three properties associated with it, and all instances of Promoter will inherit ID, Sequence and Orientation:
Part Promoter(ID, Sequence, Orientation);    (9)
Part ORF(ID, Sequence, Orientation);    (10)
Part RBS(ID, Sequence, Orientation);    (11)
Part Terminator(ID, Sequence, Orientation);    (12)
Part RestrictionSite(ID, Sequence, Orientation);    (13)
Part PrimerSite(ID, Sequence, Orientation);    (14)
If some properties are unknown at the time of the Part definition, the Part can be defined either empty or with only the known properties. Property labels can be added later through the function addProperties(), provided the property labels have been created beforehand. RBS will have four property labels after the following statement:
RBS.addProperties(RelativeStrength);    (15)
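The ordering constraints described above (properties are defined first, Parts bind existing property labels, and addProperties() can only add labels that already exist) can be modelled in a few lines of Python. This is a minimal sketch for illustration only, not Eugene's implementation: the class Registry and its method names are hypothetical, and only the labels and primitive type names come from the examples above.

```python
# Hypothetical Python model of Eugene's Property and Part definitions.
PRIMITIVES = {"txt", "num", "boolean", "txt[]", "num[]"}


class Registry:
    """Tracks defined Property labels and the labels bound to each Part."""

    def __init__(self):
        self.properties = {}  # property label -> primitive type name
        self.parts = {}       # Part name -> ordered list of property labels

    def define_property(self, label, primitive):
        # A Property must be defined by one of the five primitive types.
        if primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive type: {primitive!r}")
        self.properties[label] = primitive

    def define_part(self, name, *labels):
        # A Part may be defined empty or with any already-defined labels.
        self._require_defined(labels)
        self.parts[name] = list(labels)

    def add_properties(self, part_name, *labels):
        # Mirrors RBS.addProperties(RelativeStrength) from example (15).
        self._require_defined(labels)
        self.parts[part_name].extend(labels)

    def _require_defined(self, labels):
        for label in labels:
            if label not in self.properties:
                raise KeyError(f"Property {label!r} must be defined first")


registry = Registry()
registry.define_property("ID", "txt")
registry.define_property("Sequence", "txt")
registry.define_property("Orientation", "txt")
registry.define_property("RelativeStrength", "num")
registry.define_part("RBS", "ID", "Sequence", "Orientation")
registry.add_properties("RBS", "RelativeStrength")
```

After the last call, registry.parts["RBS"] holds four labels in definition order, matching the statement that RBS has four property labels after example (15).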
Part Declaration
Part declarations create instances of predefined Parts and assign values to their properties. If the declaration specifies a plain list of values, every property is assumed to be assigned a value, and the order of the values corresponds to the order of the properties in the Part definition, as shown in example (17). Otherwise, “dot notation” followed by the name of the property can be used, in which case the order is irrelevant, as shown in example (16). The Part instance BBa_K112234_rbs has three properties associated with the Part RBS: ID, Sequence and Orientation. The identification label of a particular part from a database is stored in the ID placeholder to allow future access to the database. Sequence stores the DNA of a Part, while Orientation specifies the direction of the Part. Since dot notation is used in example (16), the ID value can be left out of the statement. Part declarations can be found in the header file PartDeclarations.h and are predefined if the header files are included in the main program.
RBS BBa_K112234_rbs (.Sequence("GATCTtaattgcggagacttt"), .Orientation("Forward"));    (16)
RBS BBa_K112234_rbs ("BBa_K112234_rbs", "GATCTtaattgcggagacttt", "Forward");    (17)
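The difference between the two declaration styles can be illustrated with a small Python model. This is a sketch under stated assumptions rather than Eugene's actual behaviour: the class PartType and its declare() method are hypothetical, Python keyword arguments stand in for dot notation, and leaving an omitted property as None is an assumption about how unassigned properties are treated.

```python
# Hypothetical Python model of positional vs. dot-notation Part declarations.
class PartType:
    """A Part definition: a name plus an ordered list of property labels."""

    def __init__(self, name, *labels):
        self.name = name
        self.labels = list(labels)

    def declare(self, instance_name, *values, **named):
        if values and named:
            raise ValueError("use positional values or named values, not both")
        if values:
            # Positional style, as in example (17): one value per property,
            # in the order given by the Part definition.
            if len(values) != len(self.labels):
                raise ValueError("a value is required for every property")
            properties = dict(zip(self.labels, values))
        else:
            # Named style, as in example (16): order is irrelevant and
            # properties may be omitted (assumed here to stay unset).
            unknown = sorted(set(named) - set(self.labels))
            if unknown:
                raise KeyError(f"not properties of {self.name}: {unknown}")
            properties = {label: named.get(label) for label in self.labels}
        return {"name": instance_name, "type": self.name,
                "properties": properties}


rbs = PartType("RBS", "ID", "Sequence", "Orientation")

by_name = rbs.declare("BBa_K112234_rbs",
                      Orientation="Forward",
                      Sequence="GATCTtaattgcggagacttt")
by_position = rbs.declare("BBa_K112234_rbs",
                          "BBa_K112234_rbs", "GATCTtaattgcggagacttt", "Forward")
```

Both instances end up with the same Sequence and Orientation. Only the named form leaves ID unset, which corresponds to omitting the ID value in example (16).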
Devices
Devices represent composites of standard biological Parts and/or other Devices. In a Device declaration, the same Part or Device can be used more than once. Property values of a Device can be accessed with the dot operator; the value returned is a list formed from the property values of its members, in order. If the property is a txt or num, a txt[] or num[] is returned. If the property is a txt[] or num[], a txt[] or num[] is also returned, consisting of the members' lists appended together. For example, the sequence of Device BBa_K112133 is the ordered union of the sequence of Part BBa_K112126 and the sequence of Device BBa_K112234. These two Devices are shown in Figures 3a and 3b, where the icons are Visual BioBrick Open Language (vBOL) symbols (http://openwetware.org/wiki/Endy:Notebook/BioBrick_Open_Graphical_Language).
Figure 3a: Device BBa_K112133, consisting of one Part Promoter BBa_K112126 and one Device BBa_K112234
Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf);
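The way a Device's property value is assembled from its members can be sketched in Python. This is an illustrative model only, not Eugene's implementation: the classes Part and Device and the method get() are hypothetical stand-ins for the dot operator, and the promoter and ORF sequences are placeholders, since their real sequences are not given on this page.

```python
# Hypothetical Python model of property lookup on nested Devices.
class Part:
    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties

    def get(self, label):
        # Always answer with a list: a scalar becomes a one-element list,
        # a list-valued property is copied as-is.
        value = self.properties[label]
        return list(value) if isinstance(value, list) else [value]


class Device:
    def __init__(self, name, *members):
        self.name = name
        self.members = members  # Parts and/or other Devices, in order

    def get(self, label):
        # Append each member's values in declaration order; nested Devices
        # are flattened because they also return lists.
        values = []
        for member in self.members:
            values.extend(member.get(label))
        return values


rbs = Part("BBa_K112234_rbs", Sequence="GATCTtaattgcggagacttt")
orf = Part("BBa_K112234_orf", Sequence="<orf sequence placeholder>")
promoter = Part("BBa_K112126", Sequence="<promoter sequence placeholder>")

inner = Device("BBa_K112234", rbs, orf)
outer = Device("BBa_K112133", promoter, inner)

sequences = outer.get("Sequence")
```

Here sequences lists the promoter's sequence first, followed by the RBS and ORF sequences contributed by the nested Device, which is the ordered union described above.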