Team:Berkeley Software/Data Model

From 2009.igem.org

Revision as of 04:32, 20 October 2009 by Sixpi (Talk | contribs)


Clotho Infrastructure and Reconfigurable Data Model

Contents

Introduction

With the growing number of parts available to synthetic biologists, the amount of data needed to identify and describe these parts grows as well. However, there is a lack of a unified data model to organize this data. As a result, each institution's database of parts looks slightly different from the databases at other institutions, making tool development for the wider community difficult. Clotho Classic had a database connection manager that could connect to many different types of databases. However, this manager was limited and could only connect to databases with structures similar to its internal data model. Clotho builds upon this idea, and now includes a much more robust database connection module that can deal with a larger range of database organizational patterns. This is accomplished by allowing the user to extend the core set of keywords that the tools linked into Clotho may use. This functionality will allow Clotho to be useful during while the standards in the synthetic biology community continue to be developed.



Feature Overview

The new Clotho infrastructure provides three useful services to all Clotho Tools. First, all Clotho Tools implement a ClothoTool interface, which allows them to communicate with other ClothoTools through the Tool API. Second, the Clotho infrastructure provides a robust database connection where database tuples are returned as Java objects (using [http://en.wikipedia.org/wiki/Object_Relational_Mapping Object Relational Mapping]). Finally, Clotho's data model is reconfigurable at runtime, which means that Clotho Tools don't need to be tied down to a specific data model.

Technical Overview

This technical overview assumes some basic familiarity with the Java programming language, as well as come basic understanding of relational databases.

BerkeleySoftwareClothoInfrastructure.png

The new Clotho infrastructure has been greatly simplified since last year. It is now far easier to develop a new tool that can operate within Clotho, and these tools have more powerful API functions at their disposal. The above picture shows the relationships between the three services Clotho provides to tools. Tools can speak to each other through the Clotho Tool Core shown on the right. Tools can also access the Data Core through the Data API to talk to external databases. On the left hand side we see a diagram of how Clotho Keywords are read in at runtime to create an internal data model, and this process is explained in more detail below.

Clotho Keywords

Clotho keywords provide the flexibility in both tool-to-tool interaction and tool-to-database communication. A keyword is a name given to some type of a data synthetic biologists would like to keep track of. We separate keywords into two categories: object keywords and field keywords. An object keyword is a name for a collection of data that describes some entity. For example, biobrick and person could be two object keywords. A field keyword, on the other hand, is a name given to a primitive piece of data. For example, sequence or name are two field keywords. An object keyword will generally have one or more field keywords associated with it, such as sequence for biobrick or name for person.

These keywords and the relationship between them are described in keyword files, which are XML files in a format that Clotho can understand. These files specify what keywords tools may use. Clotho comes with a default core keyword file that all the tools written by the Berkeley iGEM team use, but a developer can easily write their own keyword file to allow their own tools to work with data not described in the core Clotho data model.

BerkeleySoftwareClothoKeyword.png

Here is an sample keyword file. As we can see, the top level XML element is named Objects. This is required for all keyword XML files. The next element, globalFields, is optional. This element is a list of field elements, and is meant for fields that all objects should have so the user does not need to retype the same fields for multiple objects. Each field element has a name element, and an optional description element. If there is a globalFields element, it must be the first element in Objects. After the globalFields element, we have a list of object elements, each object element having a name and a list of field elements.

After we are done inputting keyword files, we open up then Database Connection Manager through the menu of the main Clotho window. We create a new connection by inputting our information, and a mapping display will pop up. This process allows us to associate objects with tables in a given database, and fields to columns of these tables. When we are done mapping keywords, the database connection manager can attempt to look for connections between objects based on database metadata. For example, suppse the biobricks table in the database has a foreign key to the person table, and the biobricks table has been mapped to the biobrick object and the person table has been mapped to the person object. Then the manager will create a connection between the biobrick and person objects.

The mapped object and field keywords are then used by a code generator to create the files used by the Hibernate library. Hibernate provides a way to store and retrieve data from a database as Java objects. It requires a set of Java classes that conform to the [http://en.wikipedia.org/wiki/JavaBean JavaBean] standard, as well as a set of XML mapping files. Our code generator creates these files at runtime, compiles them, and links them into Clotho so that Hibernate can function.

Tool API

The Tool API provides a standardized way for tools to interact, since all Clotho Tools must implement our ClothoTool interface in order to be loaded by the Tool Core. For example, take our sequence view tool. This tool is just a simple sequence editor based on [http://www.biology.utah.edu/jorgensen/wayned/ape/ ApE]. We would like this tool to be able to import sequence data from any other tool easily. Using the Tool API, this operation is very simple - we merely call the getData() method with the argument "sequence", since all Clotho Tools must have a getData() method. So in this example, the sequence view can query the Tool Core for a list of all active tools, display the list, and then import a sequence from a user-selected tool.

We realize that our Tool API doesn't solve all tool-to-tool communication problems - if two tools need to work very closely together, their code will still be more interdependent. However, our Tool API does provide a way to make tools that integrate easily with new tools.

Data API

The Data API provides tools with a simple and robust interface to external data sources. Using the same Clotho keywords as described earlier, tools can ask for data packaged up into Java objects. These objects will have types as specified by mapped object keywords. For example, the parts manager needs to work with biobrick and collections, so it will use Data API to ask for biobrick and collection objects. If these keywords were not mapped during the database connection process, then the parts manager will notify the user of an error. Otherwise, all the data that describes the biobricks and collections the parts manager needs use are returned as easy to use Java objects that can be queried for the data in their fields.

The Data API returns objects from the database as Datum objects. These Datum objects have data stored in their fields, as specified by the keywords files. For example, a biobrick object would have a sequence field. Datum objects present a set of getter and setter methods that take field keywords as an argument, and return the data in the given field.

Database Connection Manager Tutorial