Team:Berkeley Software/Eugene

From 2009.igem.org

Revision as of 20:28, 19 October 2009 by Adam z liu


Eugene

Introduction

With the rise of part-based design in synthetic biology, a formal specification is needed to describe standard biological parts, especially when designing complex devices. The specification must be both human-writable and human-readable: a language that raises the level of abstraction at which bioengineers work. Eugene is such a language. Engineering at the part level requires both flexibility and rigidity. Eugene lets the user mix custom parts with predefined parts from established databases. Parts can encapsulate an arbitrary amount of information, from a DNA sequence to experimental history. When designing a device in Eugene, parts can be stitched together freely or joined strictly according to rules. The design process is meant to be systematic yet intuitive: the user decides what information parts include, constructs the parts used in the design, enforces restrictions on how parts can interact, and creates devices that are composites of parts or other devices. Because a Eugene design is textual, it is portable and easily translated into other formats, such as XML. In synthetic biology the notion of a part changes regularly and is debated tirelessly, so Eugene aims to remain adaptable and expressive whichever definition prevails. This page describes the language specification in detail.

Content:
  • Language Definition
  • Examples
  • Implementation
  • Results
  • Conclusions


  • Language Definition

    In this section we describe the elements of the language: primitive data types, properties, parts, devices, rules, and conditional execution. Figure 2 shows the relationships between these elements; each category is built upon the previous one.


    Figure 2: Relationship between Eugene Categories

    Primitives
    The language supports five predefined primitives: txt, num, boolean, txt[], and num[]. Strings (sequences of characters) are represented by the data type “txt”, with the text itself enclosed in double quotes. Real numbers and integers are supported by the data type “num”, and logical values by the data type “boolean”. Ordered lists of num and txt values can be created, and individual members of a list are accessed with an integer index in the range 0 to |List| - 1.

    Examples (1) and (2) are two real code snippets showing how primitives are specified in Eugene. “listOfSequences” is a list of three arbitrary DNA sequences, and “specificSequence” is its last element (i.e. “ATCG”). Examples (3) and (4) show that the data type “num” supports both integers and decimals.

    txt[] listOfSequences = ["ATG", "TCG", "ATCG"];    // (1)
    txt specificSequence = listOfSequences[2];         // (2)
    num[] listOfNumbers = [2.5, 10, 3.4, 6];           // (3)
    num one = listOfNumbers[0];                        // (4)
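    To make the typing and indexing rules concrete, here is a small Python model of the five primitives. It is not Eugene code: the helper names `conforms` and `element` are hypothetical, and mapping txt to `str`, num to `int`/`float` and boolean to `bool` is an assumption made only for illustration.

```python
# Hypothetical Python model of Eugene's five primitive types (not Eugene itself).
PRIMITIVES = {"txt", "num", "boolean", "txt[]", "num[]"}


def _is_num(v) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def conforms(type_name: str, value) -> bool:
    """Return True if a Python value fits the named Eugene primitive."""
    if type_name == "txt":
        return isinstance(value, str)
    if type_name == "num":
        return _is_num(value)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "txt[]":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    if type_name == "num[]":
        return isinstance(value, list) and all(_is_num(v) for v in value)
    raise ValueError(f"unknown primitive: {type_name}")


def element(lst: list, i: int):
    """Zero-based access; valid indices are 0 .. len(lst) - 1 (no negative indices)."""
    if not 0 <= i < len(lst):
        raise IndexError(f"index {i} outside 0..{len(lst) - 1}")
    return lst[i]


listOfSequences = ["ATG", "TCG", "ATCG"]          # mirrors example (1)
specificSequence = element(listOfSequences, 2)   # mirrors example (2)
listOfNumbers = [2.5, 10, 3.4, 6]                # mirrors example (3)
one = element(listOfNumbers, 0)                  # mirrors example (4)
```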



    Properties
    Properties represent characteristics of interest; each is typed by a primitive and associated with Parts. For example, a user could define a property “Sequence” (the DNA sequence), “ID” (the UUID of the part in a relational database that may hold it), or “Orientation” (e.g. a forward or backward promoter). Examples (5) to (8) show how such properties are defined. Every Property must be typed with one of the five primitive types. In Part definitions, properties are bound to the Part as placeholders for the values assigned in Part declarations, so Properties have to be defined before Parts can use them. The user can create new Property labels or use those created by other users and captured in header files. For example, the following Properties are predefined in the header file PropertyDefinition.h and do not need to be defined again if the header file is included in the main program:

    Property ID(txt);                 // in header file   (5)
    Property Sequence(txt);           // in header file   (6)
    Property Orientation(txt);        // in header file   (7)
    Property RelativeStrength(num);   // in header file   (8)
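    The rule that every Property must be typed by a primitive can be sketched as a small Python data type. This is an illustrative model only: the class `Property` and the dictionary `PROPERTY_DEFINITIONS` are hypothetical stand-ins for examples (5) to (8) and for the header file PropertyDefinition.h.

```python
from dataclasses import dataclass

PRIMITIVES = ("txt", "num", "boolean", "txt[]", "num[]")


@dataclass(frozen=True)
class Property:
    """A named property label typed by one Eugene primitive (illustrative model)."""
    name: str
    type_name: str

    def __post_init__(self):
        # Property definitions must use one of the five primitive types.
        if self.type_name not in PRIMITIVES:
            raise TypeError(f"{self.name}: '{self.type_name}' is not a primitive type")


# Hypothetical stand-in for the predefined labels of examples (5)-(8)
PROPERTY_DEFINITIONS = {
    p.name: p
    for p in (
        Property("ID", "txt"),
        Property("Sequence", "txt"),
        Property("Orientation", "txt"),
        Property("RelativeStrength", "num"),
    )
}
```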



    Parts
    The data type Part represents a standard biological Part. A Part can be defined empty and have property labels added later through the function addProperties(), or its properties can be bound to it in the definition itself.

    Part Definition
    Part definitions do not construct any Parts; they specify which Parts can be constructed. Definitions can appear in a header file or in the main program. When the header files PartDefinition.h and PropertyDefinition.h are included, the following Parts and their property labels are predefined. For instance, the Part “Promoter” has three properties associated with it, and all instances of Promoter inherit ID, Sequence and Orientation:

    Part Promoter(ID, Sequence, Orientation);          // (9)
    Part ORF(ID, Sequence, Orientation);               // (10)
    Part RBS(ID, Sequence, Orientation);               // (11)
    Part Terminator(ID, Sequence, Orientation);        // (12)
    Part RestrictionSite(ID, Sequence, Orientation);   // (13)
    Part PrimerSite(ID, Sequence, Orientation);        // (14)


    If some properties are unknown when the Part is defined, the Part can be defined empty or with only the known properties. Property labels can be added later through the function addProperties(), provided the labels have been created beforehand. After the following statement, RBS has four property labels:

    RBS.addProperties(RelativeStrength);   // (15)
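    The following Python sketch models what a Part definition holds: an ordered list of property labels, extendable only with labels that already exist. The class `PartDefinition` and its method `add_properties` are hypothetical analogues of Eugene's Part definition and addProperties(); silently ignoring a label that is already present is an assumption, since the text does not say what Eugene does in that case.

```python
class PartDefinition:
    """Illustrative model: a Part type holding an ordered list of property labels."""

    def __init__(self, name, known_properties, *labels):
        self.name = name
        self._known = set(known_properties)  # labels that have already been defined
        self.properties = []                 # may stay empty: an "empty" Part definition
        self.add_properties(*labels)

    def add_properties(self, *labels):
        # Mirrors addProperties(): each label must already exist as a Property.
        for label in labels:
            if label not in self._known:
                raise NameError(f"Property '{label}' has not been defined")
            if label not in self.properties:  # assumption: duplicates are ignored
                self.properties.append(label)


known = {"ID", "Sequence", "Orientation", "RelativeStrength"}
RBS = PartDefinition("RBS", known, "ID", "Sequence", "Orientation")  # like (11)
RBS.add_properties("RelativeStrength")                                # like (15)
```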

    Part Declaration
    Part declarations create instances of predefined Parts and assign values to their properties. If the declaration gives a plain list of values, every property must be assigned a value, and the order of the values follows the order of the properties in the Part definition, as in example (17). Alternatively, “dot notation” (a dot followed by the property name) can be used, in which case the order is irrelevant, as in example (16). The Part instance BBa_K112234_rbs has the three properties of the Part RBS as defined in (11): ID, Sequence and Orientation. ID stores the identification label of the part in a database, to allow later access to that database; Sequence stores the DNA of the Part; and Orientation specifies its direction. Because (16) uses dot notation, the ID value can be left out of the statement. Part declarations can be found in the header file PartDeclarations.h and are predefined if the header files are included in the main program.

    RBS BBa_K112234_rbs(.Sequence("GATCTtaattgcggagacttt"), .Orientation("Forward"));   // (16) dot notation
    RBS BBa_K112234_rbs("BBa_K112234_rbs", "GATCTtaattgcggagacttt", "Forward");          // (17) value list
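    The two declaration styles can be modeled in Python as positional versus keyword arguments. This is a sketch under assumptions: the function `declare` is hypothetical, a Part instance is represented as a plain dict, omitted properties are stored as `None`, and mixing the two styles in one declaration is rejected because the text presents them as alternatives.

```python
def declare(part_properties, *positional, **named):
    """Build a Part instance as a dict of property values (illustrative model).

    positional: one value per property, in definition order, as in (17).
    named: property=value pairs in any order, as in (16); omitted ones become None.
    """
    if positional and named:
        raise ValueError("use either a value list or dot notation, not both")
    if positional:
        if len(positional) != len(part_properties):
            raise ValueError("a value list must assign every property")
        return dict(zip(part_properties, positional))
    unknown = set(named) - set(part_properties)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    return {p: named.get(p) for p in part_properties}


RBS_PROPERTIES = ["ID", "Sequence", "Orientation"]  # properties of RBS as in (11)
rbs_16 = declare(RBS_PROPERTIES, Sequence="GATCTtaattgcggagacttt", Orientation="Forward")
rbs_17 = declare(RBS_PROPERTIES, "BBa_K112234_rbs", "GATCTtaattgcggagacttt", "Forward")
```

    With keyword arguments the order is irrelevant, exactly as with dot notation, while the positional form depends on the definition order.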



    Devices
    Devices represent a composite of standard biological Parts and/or other Devices. In a Device declaration, the same Part or Device can be used more than once. Property values of Devices can be accessed with the dot operator; the result is a list that collects the property values of the Device's members in order. If the property is a txt or num, a txt[] or num[] is returned. If the property is a txt[] or num[], a txt[] or num[] is also returned, consisting of the members' lists appended together. For example, the sequence of Device BBa_K112133 is the ordered union of the sequence of Part BBa_K112126 and the sequence of Device BBa_K112234. These two Devices are shown in Figures 3a and 3b, where the icons are drawn with [http://openwetware.org/wiki/Endy:Notebook/BioBrick_Open_Graphical_Language Visual BioBrick Open Language symbols (vBOL)].

    Table 1: Relationship between vBOL and Eugene

    vBOL | Description | Eugene
    Spectacles screenshot BBa K112133.jpg | Figure 3a: [http://partsregistry.org/Part:BBa_K112133 Device BBa_K112133], consisting of one [http://partsregistry.org/Part:BBa_K112126 Part Promoter BBa_K112126] and one [http://partsregistry.org/Part:BBa_K112234 Device BBa_K112234] | Device BBa_K112133(BBa_K112126, BBa_K112234);
    Spectacles screenshot BBa K112234.jpg | Figure 3b: [http://partsregistry.org/Part:BBa_K112234 Device BBa_K112234], consisting of one Part Ribosome Binding Site and one Part Open Reading Frame | Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf);
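    The way a Device property collects its members' values can be sketched recursively in Python. This is a model, not Eugene's implementation: the classes `Part` and `Device` and the function `property_values` are hypothetical, returning an empty list for a missing property is an assumption, and the promoter and ORF sequences below are placeholders rather than registry data.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str
    props: dict


@dataclass
class Device:
    name: str
    members: list = field(default_factory=list)  # Parts and/or Devices, in order


def property_values(component, prop):
    """Ordered list of a property's values across all Parts inside a component.

    Scalar values (txt/num) are collected one by one; list values (txt[]/num[])
    are appended together, as described for Device property access.
    """
    if isinstance(component, Part):
        value = component.props.get(prop)
        if value is None:  # assumption: a missing property contributes nothing
            return []
        return list(value) if isinstance(value, list) else [value]
    result = []
    for member in component.members:  # nested Devices are handled recursively
        result.extend(property_values(member, prop))
    return result


promoter = Part("BBa_K112126", {"Sequence": "PROMOTER_SEQ"})   # placeholder sequence
rbs = Part("BBa_K112234_rbs", {"Sequence": "GATCTtaattgcggagacttt"})
orf = Part("BBa_K112234_orf", {"Sequence": "ORF_SEQ"})          # placeholder sequence
BBa_K112234 = Device("BBa_K112234", [rbs, orf])
BBa_K112133 = Device("BBa_K112133", [promoter, BBa_K112234])
seqs = property_values(BBa_K112133, "Sequence")
```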

    Individual members of a Device can be accessed with square brackets and an index; the first member has index zero. Square brackets can be stacked for Devices nested within Devices. For example, the first element BBa_K112234_rbs of Device BBa_K112234 can be accessed through Device BBa_K112133 with the following notation:

    BBa_K112133[1][0]   // references BBa_K112234_rbs   (18)
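    Stacked brackets behave like chained indexing in Python. In this hypothetical model a `Device` class implements `__getitem__` with zero-based, non-negative indices, and Parts are represented by their names as plain strings for brevity.

```python
class Device:
    """Illustrative model of a Device as an ordered list of Parts and Devices."""

    def __init__(self, name, *members):
        self.name = name
        self.members = list(members)

    def __getitem__(self, i):
        # Zero-based like Eugene's square brackets; negative indices are rejected
        if not 0 <= i < len(self.members):
            raise IndexError(f"{self.name}[{i}] is out of range")
        return self.members[i]


BBa_K112234 = Device("BBa_K112234", "BBa_K112234_rbs", "BBa_K112234_orf")
BBa_K112133 = Device("BBa_K112133", "BBa_K112126", BBa_K112234)
first_rbs = BBa_K112133[1][0]  # stacked brackets, as in example (18)
```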



    Rules
    Rules provide the ability to validate Device declarations. A Rule declaration by itself does not perform any validation; it takes effect only when it is noted, asserted, or used as an expression inside an if-statement. A Rule declaration is a single statement consisting of a rule operator and its operands: a left and a right operand, except for the unary NOTCONTAINS. The rule operators BEFORE, AFTER, WITH, NOTWITH, NEXTTO, NOTCONTAINS and NOTMORETHAN can be applied to Part instances or Device instances. With the relational operators <, <=, >, >=, != and ==, the operands are property values of Part or Device instances, or a primitive compared with a property value of one Part or Device. These relational operators are overloaded for text, which is compared alphabetically. Table 2 summarizes the operators for Eugene rules.

    Table 2: Eugene Operators for Specifying Rules

    Operator | Meaning

    Compositional Operators
    BEFORE | operand 1 appears before operand 2 in a Device
    AFTER | operand 1 appears after operand 2 in a Device
    WITH | operand 1 appears together with operand 2 in a Device
    NOTWITH | operand 1 does not appear together with operand 2 in a Device
    NEXTTO | operand 1 is adjacent to operand 2 in a Device
    NOTMORETHAN | operand 1 (a Part instance) occurs at most operand 2 times in a Device
    NOTCONTAINS | unary operator: its operand is not contained in the Device

    Comparison Operators
    < | less than for numbers; comes before alphabetically for text
    <= | less than or equal to for numbers; comes before alphabetically or is equal for text
    > | greater than for numbers; comes after alphabetically for text
    >= | greater than or equal to for numbers; comes after alphabetically or is equal for text
    != | not equal to
    == | equal to

    Boolean Operators
    AND | operand 1 AND operand 2
    OR | operand 1 OR operand 2
    NOT | NOT operand
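    The compositional operators can be read as predicates over a Device's ordered list of Parts. The Python functions below are a hypothetical model: BEFORE follows the reading of rule r1 in Table 3 (every occurrence of operand 1 precedes every occurrence of operand 2), treating a rule as true when one operand is absent is an assumption, and the Device is assumed to be flattened to Part names.

```python
def positions(parts, x):
    """Indices at which x occurs in an ordered list of Part names."""
    return [i for i, p in enumerate(parts) if p == x]


def BEFORE(parts, a, b):
    # Every a precedes every b; vacuously true if either is absent (assumption).
    pa, pb = positions(parts, a), positions(parts, b)
    return not pa or not pb or max(pa) < min(pb)


def AFTER(parts, a, b):
    return BEFORE(parts, b, a)


def WITH(parts, a, b):
    return a in parts and b in parts


def NOTWITH(parts, a, b):
    return not WITH(parts, a, b)


def NEXTTO(parts, a, b):
    # True if some occurrence of a is directly adjacent to some occurrence of b.
    return any(abs(i - j) == 1 for i in positions(parts, a) for j in positions(parts, b))


def NOTMORETHAN(parts, a, n):
    return parts.count(a) <= n


def NOTCONTAINS(parts, a):
    return a not in parts


# BBa_K112133 flattened to its Parts (assumption: rules see nested members)
device = ["BBa_K112126", "BBa_K112234_rbs", "BBa_K112234_orf"]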


    Table 3: Examples of Rule Declarations

    Eugene Syntax | Description
    Rule r1(BBa_K112234_rbs BEFORE BBa_K112234_orf); | All Parts BBa_K112234_rbs have to come before all Parts BBa_K112234_orf
    Rule r2(BBa_K112234_rbs WITH BBa_K112234_orf); | The Part BBa_K112234_rbs has to be contained together with BBa_K112234_orf inside a Device
    Rule r3(BBa_K112126 NEXTTO BBa_K112234); | The Part BBa_K112126 has to be next to BBa_K112234 when a Device is declared
    num x = 2; Rule r4(BBa_K112234_rbs NOTMORETHAN x); | The Part BBa_K112234_rbs cannot occur more than x (= 2) times in a Device
    Rule r5(NOTCONTAINS BBa_B0032); | A Device cannot contain the Part BBa_B0032
    Rule r6(BBa_K112234_rbs.Sequence != BBa_K112234_orf.Sequence); | The sequence of BBa_K112234_rbs has to differ from the sequence of BBa_K112234_orf
    Rule r7(BBa_K112234_rbs.RelativeStrength > BBa_B0032.RelativeStrength); | The “RelativeStrength” Property value of Part BBa_K112234_rbs has to be greater than the “RelativeStrength” Property value of Part BBa_B0032
    num relativeS = BBa_B0032.RelativeStrength; Rule r8(p.RelativeStrength > relativeS); | A similar comparison that uses the variable “relativeS”, where p stands for a Part instance
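    Comparison rules such as r6 to r8 reduce to relational operators applied to property values. The Python sketch below maps Table 2's comparison operators onto the standard `operator` module; the function `compare` and all RelativeStrength numbers are hypothetical. Python orders text by Unicode code point, which matches alphabetical order only when letter case agrees (so "Z" sorts before "a"); the text does not specify how Eugene handles mixed case.

```python
import operator

# Relational operators from Table 2, mapped onto Python's operator module.
COMPARISONS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "!=": operator.ne, "==": operator.eq,
}


def compare(left, op, right):
    """Evaluate a comparison rule on two property values (illustrative model).

    Numbers compare numerically; text compares lexicographically by code point.
    """
    if isinstance(left, str) != isinstance(right, str):
        raise TypeError("cannot compare txt with num")
    return COMPARISONS[op](left, right)


# Hypothetical property values, loosely modeled on rules r6-r8 in Table 3
rbs = {"Sequence": "GATCTtaattgcggagacttt", "RelativeStrength": 0.8}
orf = {"Sequence": "ATGAAA"}
b0032 = {"RelativeStrength": 0.3}

r6 = compare(rbs["Sequence"], "!=", orf["Sequence"])
r7 = compare(rbs["RelativeStrength"], ">", b0032["RelativeStrength"])
relativeS = b0032["RelativeStrength"]
r8 = compare(rbs["RelativeStrength"], ">", relativeS)
```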



    Asserting and Noting Rules
    To take effect, rules must be asserted or noted after they are declared. An Assert or Note statement applies to every Device declared after it: each time a new Device is declared, it is validated against all existing Assert and Note statements. Rule instances can be combined in these statements with the logical operators AND, OR and NOT. Assertions and notes differ in how strongly a violation is handled, as described below. If no violation is found, the program continues running.

    Rule Assertion
    Assert statements are strong: the program terminates with an error as soon as a Device composition violates one. The following statement requires every new Device to violate both r4 (BBa_K112234_rbs would have to occur more than x times) and r2 (BBa_K112234_rbs must not be contained together with BBa_K112234_orf). In the Device BBa_K112234, BBa_K112234_rbs occurs once and is a component together with BBa_K112234_orf, so both r4 and r2 hold, the assertion is false, and the program terminates with an error.

    Assert ((NOT r4) AND (NOT r2));
    Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf);
    



    Rule Notes
    Notes issue warnings in the output when a violation occurs, but the program continues running. In the following example the Device BBa_K112133 satisfies the first note. The second note, however, is violated, because BBa_K112234_rbs does come before BBa_K112234_orf and r1 therefore holds, so the program issues a warning.

    Note (r2 AND r3);
    Note (NOT r1);
    Device BBa_K112133(BBa_K112126, BBa_K112234);
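    The different consequences of Assert and Note can be modeled in Python as "raise an exception" versus "emit a warning". This sketch is hypothetical throughout: the `Validator` class, its methods `add_assert`, `add_note` and `declare_device`, and the exception `RuleViolation` are illustrative names, rules are plain predicates, and Devices are assumed to be flattened to ordered lists of Part names.

```python
import warnings


class RuleViolation(Exception):
    """Raised when an asserted rule combination fails on a new Device."""


class Validator:
    """Illustrative model: Assert/Note statements apply to every later Device."""

    def __init__(self):
        self.assertions = []  # (description, predicate) pairs
        self.notes = []

    def add_assert(self, description, predicate):
        self.assertions.append((description, predicate))

    def add_note(self, description, predicate):
        self.notes.append((description, predicate))

    def declare_device(self, name, parts):
        # Validation runs each time a Device is declared.
        for description, check in self.notes:
            if not check(parts):
                warnings.warn(f"{name} violates Note({description})")
        for description, check in self.assertions:
            if not check(parts):
                raise RuleViolation(f"{name} violates Assert({description})")
        return parts


def r1(parts):  # BBa_K112234_rbs BEFORE BBa_K112234_orf
    rbs = [i for i, p in enumerate(parts) if p == "BBa_K112234_rbs"]
    orf = [i for i, p in enumerate(parts) if p == "BBa_K112234_orf"]
    return not rbs or not orf or max(rbs) < min(orf)


def r2(parts):  # BBa_K112234_rbs WITH BBa_K112234_orf
    return "BBa_K112234_rbs" in parts and "BBa_K112234_orf" in parts


def r4(parts):  # BBa_K112234_rbs NOTMORETHAN x, with x = 2
    return parts.count("BBa_K112234_rbs") <= 2


BBa_K112133_parts = ["BBa_K112126", "BBa_K112234_rbs", "BBa_K112234_orf"]
v = Validator()
v.add_note("r2", r2)
v.add_note("NOT r1", lambda parts: not r1(parts))
v.declare_device("BBa_K112133", BBa_K112133_parts)  # warns once, keeps running
```

    Declaring ["BBa_K112234_rbs", "BBa_K112234_orf"] on a validator holding the assertion (NOT r4) AND (NOT r2) raises `RuleViolation`, mirroring the Assert example.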
    




    Conditional Statements
    Conditional statements alter the flow of execution so that a block of code runs only when a condition holds. Eugene supports two kinds of if-statements: the rule-validating if-statement and the standard if-statement. The logical operators AND, OR and NOT can combine conditions within either kind, but rule conditions and standard conditions cannot be mixed in one statement.

    Rule validating if-Statement
    Rules can be checked not only through Assert and Note statements but also in an if-statement. In this approach only specific rules are considered, since they might not apply to all Devices. The statement specifies a list of Devices and a logical combination of rule instances to check on that list. Suppose we want to test a rule only on the Device instance BBa_K112133: the Promoter BBa_K112126 should come before the Ribosome Binding Site BBa_K112234_rbs. The following conditional statement performs this check; here the if-statement evaluates to true:

    Rule r7(BBa_K112126 BEFORE BBa_K112234_rbs);
    if(on (BBa_K112133) r7) {
    	// block statement, executed if r7 holds on BBa_K112133
    } else {
    	// block statement, executed otherwise
    }
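    A rule-validating if-statement evaluates a rule only on the Devices it names. In this hypothetical Python model, `on` returns true only if the rule holds on every listed Device (an assumption, since the text shows a single Device), `before` builds a BEFORE rule, and Devices are flattened lists of Part names.

```python
def before(a, b):
    """Build a BEFORE rule: every occurrence of a precedes every occurrence of b."""
    def rule(parts):
        pa = [i for i, p in enumerate(parts) if p == a]
        pb = [i for i, p in enumerate(parts) if p == b]
        return not pa or not pb or max(pa) < min(pb)
    return rule


def on(devices, rule):
    """Evaluate a rule only on the listed Devices; true if it holds on all of them."""
    return all(rule(parts) for parts in devices)


BBa_K112133 = ["BBa_K112126", "BBa_K112234_rbs", "BBa_K112234_orf"]  # flattened
r7 = before("BBa_K112126", "BBa_K112234_rbs")
if on([BBa_K112133], r7):
    outcome = "true branch"
else:
    outcome = "false branch"
```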
    


    Standard if-Statement
    Expressions not pertaining to rules and Devices can be evaluated by the standard if-statement which supports the relational operators <, <=, >, >=, !=, == as well as the logical operators AND, OR, NOT.

    boolean test = true;
    if(test) {
    	Assert(ruleWith);
    } else {
    	Assert(NOT ruleWith);
    }
    Device BBa_K112133(BBa_K112126, BBa_K112234);
    
    



    Header Files
    Including header files makes predefined Properties, Parts and Part instances available to the program. Hiding the low-level details of sequences and Parts in header files keeps the main file manageable: the user only needs to define Devices there, so programs can be written quickly and are less error-prone. Header files also allow each lab to maintain its own libraries. At the same time, the language still lets users change or declare other Properties, Parts and Part instances.



    Examples

    Several Device constructs were selected from the Registry of [http://partsregistry.org/ Standard Biological Parts] to show the creation of visual and textual designs. The visual representations were produced with [http://openwetware.org/wiki/Endy:Notebook/BioBrick_Open_Graphical_Language Visual BioBrick Open Language symbols] and Spectacles, a visual editing tool, while Eugene was used for the textual representations. Because of the one-to-one correspondence between standard biological parts and Eugene syntax, conversion between the two is straightforward in both directions.
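    The one-to-one correspondence can be illustrated with a round trip between an ordered list of component names and a Eugene Device declaration. The Python functions `to_eugene` and `from_eugene` are hypothetical and handle only the simple form `Device Name(member, member, ...);` used on this page, not property values or nested syntax.

```python
import re

# Matches a single Device declaration such as
# "Device BBa_K112234(BBa_K112234_rbs, BBa_K112234_orf);"
DEVICE_DECL = re.compile(r"^\s*Device\s+(\w+)\s*\((.*)\)\s*;\s*$")


def to_eugene(name, members):
    """Emit a Device declaration from an ordered list of component names."""
    return f"Device {name}({', '.join(members)});"


def from_eugene(line):
    """Parse a Device declaration back into (name, ordered member list)."""
    match = DEVICE_DECL.match(line)
    if match is None:
        raise ValueError(f"not a Device declaration: {line!r}")
    name, body = match.groups()
    members = [m.strip() for m in body.split(",") if m.strip()]
    return name, members


decl = to_eugene("BBa_K112133", ["BBa_K112126", "BBa_K112234"])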

    [http://partsregistry.org/Part:BBa_K112809 BBa_K112809] is a T4 lysis Device with Pbad as the inducible Promoter. The lysis Device allows easy release of any product produced in the cell. The main file in Figure 3b shows the construct as an ordered list of Parts, while the header files are imported to hide the details of the Part declarations. With a single line of declaration, a complicated Device construct can be reproduced while its detailed information remains accessible.

    Figure 3a: Visual representation of BBa_K112809


    Figure 3b: Textual representation of BBa_K112809



    [http://partsregistry.org/Part:BBa_E7104 BBa_E7104] is a GFP Reporter Device. Again, a new Device can be created with very few lines of code, as Figure 4 demonstrates.

    Figure 4a: Visual representation of BBa_E7104


    Figure 4b: Textual representation of BBa_E7104



    [http://partsregistry.org/Part:BBa_K118021 BBa_K118021] in Figure 5a is a Promoter characterization Device. The second Part in [http://partsregistry.org/Part:BBa_K118021 BBa_K118021] is a Reporter as specified by the [http://partsregistry.org/ Standard Registry of Parts]. It has no associated image, since technically [http://partsregistry.org/Part:BBa_J33204 BBa_J33204] is a Device. However, with the currently available standards and specifications there is no way to break this Device cleanly into separate existing Parts. For example, this would require a way of specifying point mutations, which could become a feature of Devices; this is not yet supported by Eugene but is being considered for future releases. Therefore, [http://partsregistry.org/Part:BBa_J33204 BBa_J33204] was specified in the code as the Part Reporter rather than as a Device.

    Figure 5a: Visual representation of BBa_K118021


    Figure 5b: Textual representation of BBa_K118021



    [http://partsregistry.org/Part:BBa_I8510 BBa_I8510] in Figure 6a is an Inverter that takes 3OC6HSL as input and produces lacZalpha as output. It also contains an orthogonal GFP protein generator.

    Figure 6a: Visual representation of BBa_I8510


    Figure 6b: Textual representation of BBa_I8510


