Team:Berkeley Software/Data Model

From 2009.igem.org

Revision as of 00:51, 18 October 2009 by Sixpi (Talk | contribs)







Clotho Infrastructure and Reconfigurable Data Model


Introduction

As the number of parts available to synthetic biologists grows, so does the amount of data needed to identify and describe them. However, there is no unified data model for organizing this data. As a result, each institution's parts database looks slightly different from those at other institutions, which makes it difficult to develop tools for the wider community. Clotho Classic had a database connection manager that could connect to many different types of databases, but it was limited to databases whose structure resembled its internal data model. Clotho builds upon this idea and now includes a much more robust database connection module that can handle a larger range of database organizational patterns. It does so by allowing the user to extend the core set of keywords that the tools linked into Clotho may use. This functionality will allow Clotho to remain useful while standards in the synthetic biology community continue to be developed.




High level overview of the new Clotho infrastructure.

The picture above may look intimidating, but in actuality, the new Clotho infrastructure has been greatly simplified since last year. It is now far easier to develop a new tool that can operate within Clotho, and these tools have more powerful API functions at their disposal. There is now just one simple Tool API for tool-to-tool interaction, and database communication has been standardized with the creation of the Data API. The Data API leverages the power of Hibernate, an external library that does most of the database interaction for us. Although tools need a

Clotho Keywords

Clotho keywords provide flexibility in both tool-to-tool interaction and tool-to-database communication. A keyword is a name given to some type of data that synthetic biologists would like to keep track of. We separate keywords into two categories: object keywords and field keywords. An object keyword is a name for a collection of data that describes some entity. For example, biobrick and person could be two object keywords. A field keyword, on the other hand, is a name given to a primitive piece of data. For example, sequence and name are two field keywords. An object keyword will generally have one or more field keywords associated with it, such as sequence for biobrick or name for person.

These keywords and the relationships between them are described in keyword files, which are XML files in a format that Clotho can understand. These files specify which keywords tools may use. Clotho comes with a default core keyword file that all the tools written by the Berkeley iGEM team use, but developers can easily write their own keyword files to allow their tools to work with data not described in the core Clotho data model.

BerkeleySoftwareClothoKeyword.png
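The split between object keywords and field keywords can be sketched in a few lines of Java. This is only an illustration of the idea: the class names (FieldKeyword, ObjectKeyword, KeywordSet) are hypothetical and are not taken from the Clotho source, and the sketch builds the keywords in memory rather than reading a keyword file, since the XML format is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of Clotho keywords; none of these class names come from Clotho itself.

/** A field keyword names one primitive piece of data, e.g. "sequence" or "name". */
final class FieldKeyword {
    final String name;

    FieldKeyword(String name) {
        this.name = name;
    }
}

/** An object keyword names a collection of data describing an entity, e.g. "biobrick". */
final class ObjectKeyword {
    final String name;
    private final List<FieldKeyword> fields = new ArrayList<>();

    ObjectKeyword(String name) {
        this.name = name;
    }

    /** Associates a field keyword with this object keyword and returns this for chaining. */
    ObjectKeyword withField(String fieldName) {
        fields.add(new FieldKeyword(fieldName));
        return this;
    }

    boolean hasField(String fieldName) {
        for (FieldKeyword field : fields) {
            if (field.name.equals(fieldName)) {
                return true;
            }
        }
        return false;
    }
}

/** In-memory stand-in for the set of keywords that a keyword file would declare. */
final class KeywordSet {
    private final Map<String, ObjectKeyword> objects = new LinkedHashMap<>();

    KeywordSet add(ObjectKeyword keyword) {
        objects.put(keyword.name, keyword);
        return this;
    }

    boolean defines(String objectName) {
        return objects.containsKey(objectName);
    }

    ObjectKeyword get(String objectName) {
        return objects.get(objectName);
    }

    /** Builds the two example keywords used in the text above. */
    static KeywordSet example() {
        return new KeywordSet()
            .add(new ObjectKeyword("biobrick").withField("sequence"))
            .add(new ObjectKeyword("person").withField("name"));
    }
}
```

A tool could then check KeywordSet.example().defines("biobrick") before relying on that keyword. Extending the model means adding another ObjectKeyword to the set, which mirrors how a developer-written keyword file extends the core one.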

Tool API

The Tool API provides a standardized way for tools to interact. Any Java class that implements our ClothoTool interface must provide methods that other tools can use to get and send data. Since the Tool Core can be queried for a list of all the active tools, it is easy to write a tool that pulls data from any other tool, even one that did not exist when you wrote yours. For example, we could add a function to our sequence view tool that allows the user to select another tool to get a sequence from. Since all tools have the getData() method, as long as the selected tool is able to return a sequence, the sequence view tool does not need to know which tool it is asking for data.
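The sequence view scenario can be sketched as follows. Only the names ClothoTool and getData() come from the text. The method signatures, the Object return type, getName(), and the ToolCoreSketch, SequenceSourceTool, CounterTool and SequenceViewSketch classes are all assumptions made for illustration, and the real interface may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Assumed shape only: the text names ClothoTool and getData(), but not their signatures.
interface ClothoTool {
    String getName();   // hypothetical helper, not mentioned in the text
    Object getData();   // the Object return type is an assumption
}

/** Hypothetical stand-in for the Tool Core's registry of active tools. */
final class ToolCoreSketch {
    private final List<ClothoTool> activeTools = new ArrayList<>();

    void register(ClothoTool tool) {
        activeTools.add(tool);
    }

    /** Returns a copy so callers cannot modify the registry. */
    List<ClothoTool> getActiveTools() {
        return new ArrayList<>(activeTools);
    }
}

/** A tool whose data is a DNA sequence. */
final class SequenceSourceTool implements ClothoTool {
    private final String name;
    private final String sequence;

    SequenceSourceTool(String name, String sequence) {
        this.name = name;
        this.sequence = sequence;
    }

    public String getName() {
        return name;
    }

    public Object getData() {
        return sequence;
    }
}

/** A tool whose data is not a sequence. */
final class CounterTool implements ClothoTool {
    public String getName() {
        return "counter";
    }

    public Object getData() {
        return Integer.valueOf(42);
    }
}

final class SequenceViewSketch {
    /** Pulls a sequence from any tool without knowing its concrete class. */
    static Optional<String> pullSequence(ClothoTool source) {
        Object data = source.getData();
        if (data instanceof String) {
            return Optional.of((String) data);
        }
        return Optional.empty();
    }
}
```

Because pullSequence depends only on the interface, a tool written later can act as a sequence source without any change to the sequence view code.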

Data API

The Data API provides tools with a simple and robust interface to external data sources. Using the same Clotho keywords described earlier, tools can ask for data packaged up into Java objects. These objects will have types as specified by mapped object keywords. For example, the parts manager needs to work with biobricks and collections, so it will use the Data API to ask for biobrick and collection objects. If these keywords were not mapped during the database connection process, then the parts manager will notify the user of an error. Otherwise, all the data describing the biobricks and collections that the parts manager needs to use is returned as easy-to-use Java objects that can be queried for the data in their fields.
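The request-by-keyword behavior described above can be sketched with an in-memory stand-in. Everything named here (DataObject, UnmappedKeywordException, DataApiSketch and their methods) is hypothetical. The actual Data API is backed by Hibernate and a database connection, neither of which is modeled in this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data object: field values keyed by field keyword. */
final class DataObject {
    final String objectKeyword;
    private final Map<String, String> fields = new HashMap<>();

    DataObject(String objectKeyword) {
        this.objectKeyword = objectKeyword;
    }

    DataObject set(String fieldKeyword, String value) {
        fields.put(fieldKeyword, value);
        return this;
    }

    /** Returns the field's value, or null if the field was never set. */
    String get(String fieldKeyword) {
        return fields.get(fieldKeyword);
    }
}

/** Raised when a tool asks for an object keyword that was never mapped. */
final class UnmappedKeywordException extends RuntimeException {
    UnmappedKeywordException(String keyword) {
        super("Keyword not mapped: " + keyword);
    }
}

/** In-memory stand-in for the Data API; no database or Hibernate involved. */
final class DataApiSketch {
    private final Map<String, List<DataObject>> mapped = new HashMap<>();

    /** Simulates mapping a keyword during the database connection process. */
    void mapKeyword(String objectKeyword) {
        mapped.putIfAbsent(objectKeyword, new ArrayList<>());
    }

    void store(DataObject object) {
        List<DataObject> bucket = mapped.get(object.objectKeyword);
        if (bucket == null) {
            throw new UnmappedKeywordException(object.objectKeyword);
        }
        bucket.add(object);
    }

    /** Returns every stored object of the given keyword, or fails if it is unmapped. */
    List<DataObject> getAll(String objectKeyword) {
        List<DataObject> found = mapped.get(objectKeyword);
        if (found == null) {
            throw new UnmappedKeywordException(objectKeyword);
        }
        return new ArrayList<>(found);
    }
}
```

A tool such as the parts manager would catch the exception and report the unmapped keyword to the user, and would otherwise read fields from the returned objects with get("sequence") and similar calls.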