Team:Berkeley Software/Data Model

From 2009.igem.org

(Difference between revisions)
 
(33 intermediate revisions not shown)
Line 2: Line 2:
{{Template:BerkeleySoftwareProject
{{Template:BerkeleySoftwareProject
|video=
|video=
-
<br /><br /><br /><br /><center>Video placeholder</center>
+
<center><html><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/2Eoap_HqbyU&hl=en&fs=1&rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/2Eoap_HqbyU&hl=en&fs=1&rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></html></center>
-
 
+
-
|rightSideBox=
+
-
<center>[[Image:DM.png|410px]]</center>
+
|title=
|title=
-
Data Model
+
Clotho Infrastructure and Reconfigurable Data Model
|intro=
|intro=
-
With the growing number of parts available to synthetic biologists, the amount of data needed to identify and describe these parts grows as well. However, there is a lack of a unified data model to organize this data. As a result, each institution's database of parts looks slightly different from the databases at other institutions, making tool development for the wider community difficult. Clotho Classic had a database connection manager that could connect to many different types of databases. However, this manager was limited and could only connect to databases with structures similar to its internal data. Clotho builds upon this idea, and now includes a much more robust database connection module that can deal with a larger range of database organizational patterns. This is accomplished by allowing the user to extend the core set of keywords that the tools linked into Clotho may use. This functionality will allow Clotho to be useful during while the standards in the synthetic biology community continue to be developed.
+
[[Image:DM.png|300px|right|thumb|<span style="font-family: Georgia, serif;">Clotho Data Model - Smart APIs (Courtesy of [http://christinetsin.designbinder.com/ Christine Tsin])</span>]]
 +
 
 +
<font size="4">W</font>ith the growing number of parts available to synthetic biologists, the amount of data needed to identify and describe these parts grows as well. However, there is a lack of a unified data model to organize this data. As a result, each institution's database of parts looks slightly different from the databases at other institutions, making tool development for the wider community difficult if not impossible. [https://2008.igem.org/Team:UC_Berkeley_Tools Clotho Classic] had a database connection manager that could connect to many different types of databases. However, this manager was limited and could only connect to databases with structures very similar to its internal data model. Clotho builds upon this idea, and now includes a much more robust database connection module that can deal with a larger range of database organizational patterns. This is accomplished by allowing the user to extend the core set of ''keywords'' that the tools linked into Clotho may use. This functionality will allow Clotho to be useful while the standards in the synthetic biology community continue to be developed. This page describes how this data model works as well as presents the APIs available to Clotho tool developers.
|content=
|content=
-
Put content here
+
===Motivation===
 +
[[Image:ClothoVsClothoClassic.png|thumb|left|450px|<span style="font-family: Georgia, serif;">Clotho Classic vs. Revamped Clotho: Comm. and Data APIs</span>]]
 +
Clotho Classic was a toolbox useful for ''part development''. This summer, as we began to build new tools to be used in ''device development'', we started to run into the limitations of creating tools that fit into the Clotho Classic infrastructure. One annoyance was the difficulty in integrating the new tools with existing tools. Tool-to-tool communication through the Clotho Classic infrastructure was limited, so existing tools that needed to work together had to make use of tool specific interfaces. This meant that integrating a new tool required taking all of the existing tools into account, which is clearly sustainable if we hope to make many tools. As the figure on the left shows, tool communication required multiple function calls which corresponded to a series of ''request'' and ''response'' style handshakes. Not only were there multiple function calls, but also each tool had to "know" about the other tools which meant developers had to be very aware of the underlying software architecture.
 +
 
 +
Another limitation was the fixed data model. Clotho Classic promised to be compatible with any database, but in reality the target database had to be organized similarly to Clotho Classic's internal data model. This greatly limited Clotho Classic's usefulness. Indeed, the Clotho Classic team tried to work with the iGEM team from Davidson to get Clotho Classic to work with their database, but was unsuccessful due largely to the disparity between the two databases' organization. We wanted to prevent events of this nature from happening in the future, but attempts during the summer to hash out a standard synthetic biology data exchange model proved unsuccessful. We decided that a truly reconfigurable data model was the best solution for the time being until a standard data model for synthetic biology is developed. As the figure on the left shows, now database communication uses a standardized data structure as well as a simplified, standard yet powerful query API.
 +
 
 +
Lastly, even assuming we had a database Clotho Classic could connect to, developing tools that relied heavily on database interactions was difficult. Clotho Classic did provide a set of data API functions. In theory, these could be used to interact with the database. However, in practice these functions were unwieldy and difficult to use. Members of the development team for Clotho Classic in the Spring found that hand crafting their own SQL queries to be easier than using the existing data API. Considering the increasing amount of synthetic biology data stored in databases, tools that can interact with this data are also needed, and rapid development of these tools depends on a good API for database interaction.
 +
<br clear="all">
 +
 
 +
===Feature Overview===
 +
The new Clotho infrastructure addresses the limitations described above and provides three useful services to all Clotho Tools.
 +
 
 +
*First, all Clotho Tools implement a ClothoTool interface, which allows them to communicate with other ClothoTools through the Tool API.
 +
 
 +
*Second, the Clotho infrastructure provides a robust database connection where database tuples are returned as Java objects (using [http://en.wikipedia.org/wiki/Object_Relational_Mapping Object Relational Mapping]).
 +
 
 +
*Finally, Clotho's data model is reconfigurable at runtime, which means that Clotho Tools don't need to be tied down to a specific data model.
 +
 
 +
===Technical Overview===
 +
This technical overview assumes some familiarity with the Java programming language, as well as knowledge of some basic database terminology.
 +
 
 +
[[Image:BerkeleySoftwareClothoInfrastructure.png|thumb|850px|center|<span style="font-family: Georgia, serif;">Top level Clotho infrastructure file. This is a conceptual overview - the Java classes are not structured in quite this way.</span>]]
 +
 
 +
The new Clotho infrastructure has been greatly simplified since last year, and this simplification was driven by a desire to provide tools with the services described above. It is now far easier to develop a new tool that can operate within Clotho, and these tools have more powerful API functions at their disposal. The above picture shows the relationships between the three services Clotho provides to tools. Tools can speak to each other through the Clotho Tool Core shown on the right. Tools can also access the Data Core through the Data API to talk to external databases. On the left hand side we see a diagram of how Clotho Keywords are read in at runtime to create an internal data model, and this process is explained in more detail below.
 +
 
 +
====Clotho Keywords====
 +
Clotho keywords provide the flexibility in both tool-to-tool interaction and tool-to-database communication. A keyword is a name given to some type of a data synthetic biologists would like to keep track of. We separate keywords into two categories: object keywords and field keywords. An object keyword is a name for a collection of data that describes some entity. For example, biobrick and person could be two object keywords. A field keyword, on the other hand, is a name given to a primitive piece of data. For example, sequence or name are two field keywords. An object keyword will generally have one or more field keywords associated with it, such as sequence for biobrick or name for person.
 +
 
 +
These keywords and the relationship between them are described in keyword files, which are XML files in a format that Clotho can understand. These files specify what keywords tools may use. Clotho comes with a default core keyword file that all the tools written by the Berkeley iGEM team use, but a developer can easily write their own keyword file to allow their own tools to work with data not described in the core Clotho data model.
 +
 
 +
[[Image:BerkeleySoftwareClothoKeyword.png|thumb|400px|right|<span style="font-family: Georgia, serif;">Example Clotho Keyword XML file</span>]]
 +
 
 +
Here is an sample keyword file. As we can see, the top level XML element is named Objects. This is required for all keyword XML files. The next element, globalFields, is optional. This element is a list of field elements, and is meant for fields that all objects should have so the user does not need to retype the same fields for multiple objects. Each field element has a name element, and an optional description element. If there is a globalFields element, it must be the first element in Objects. After the globalFields element, we have a list of object elements, each object element having a name and a list of field elements.
 +
 
 +
After we are done inputting keyword files, we open up then Database Connection Manager through the menu of the main Clotho window. We create a new connection by inputting our information, and a mapping display will pop up. This process allows us to associate objects with tables in a given database, and fields to columns of these tables. When we are done mapping keywords, the database connection manager can attempt to look for connections between objects based on database metadata. For example, suppse the biobricks table in the database has a foreign key to the person table, and the biobricks table has been mapped to the biobrick object and the person table has been mapped to the person object. Then the manager will create a connection between the biobrick and person objects.
 +
 
 +
The mapped object and field keywords are then used by a code generator to create the files used by the Hibernate library. Hibernate provides a way to store and retrieve data from a database as Java objects. It requires a set of Java classes that conform to the [http://en.wikipedia.org/wiki/JavaBean JavaBean] standard, as well as a set of XML mapping files. Our code generator creates these files at runtime, compiles them, and links them into Clotho so that Hibernate can function.
 +
 
 +
The end goal is to create a simple and robust way to reconfigure the data model used by Clotho, so that tool development can proceed independently of data model development to some degree.
 +
 
 +
====Tool API====
 +
[[Image:CommAPIJavaDoc.png|thumb|400px|left|<span style="font-family: Georgia, serif;">Comm. API Java Doc</span>]]
 +
The Tool API provides a standardized way for tools to interact, since all Clotho Tools must implement our ClothoTool interface in order to be loaded by the Tool Core. For example, take our sequence view tool. This tool is just a simple sequence editor based on [http://www.biology.utah.edu/jorgensen/wayned/ape/ ApE]. We would like this tool to be able to import sequence data from any other tool easily. Using the Tool API, this operation is very simple - we merely call the getData() method with the argument "sequence", since all Clotho Tools must have a getData() method. So in this example, the sequence view can query the Tool Core for a list of all active tools, display the list, and then import a sequence from a user-selected tool.
 +
 
 +
In Clotho Classic, integrating a new tool meant the developer had to take into account all other existing tools, and old tools might also have to be rewritten to communicate with the new tool. In Clotho, all Clotho Tools can talk to each other through the tool core, which greatly simplifies the integration of new tools.
 +
 
 +
We realize that our Tool API doesn't solve all tool-to-tool communication problems - if two tools need to work very closely together, their code will still be more interdependent. However, our Tool API does provide a way to make tools that integrate easily with new tools.
 +
<br clear="all">
 +
====Data API====
 +
[[Image:DataAPIJavaDoc.png|thumb|400px|right|<span style="font-family: Georgia, serif;">Data API Java Doc</span>]]
 +
The Data API provides tools with a simple and robust interface to external data sources. Using the same Clotho keywords as described earlier, tools can ask for data packaged up into Java objects. Specifically, the Data API returns objects from the database as Datum objects. These Datum objects have data stored in their fields, as specified by the keywords files. For example, a biobrick object would have a sequence field. Datum objects present a set of getter and setter methods that take field keywords as an argument, and return the data in the given field. These objects will have types as specified by mapped object keywords.
 +
 
 +
In practice, these objects are easier to work with that raw data from a database connection. For example, the parts manager needs to work with biobrick and collections, so it will use Data API to ask for biobrick and collection objects. If these keywords were not mapped during the database connection process, then the parts manager will notify the user of an error. Otherwise, all the data that describes the biobricks and collections the parts manager needs use are returned as easy to use Java objects that can be queried for the data in their fields. In contrast, Java SQL libraries will return data in table form, and the programmer must take the effort to parse these tables to obtain the data they need.
 +
<br clear="all">
 +
===Database Connection Manager Tutorial===
 +
[[Image:BerkeleySoftwareDatabaseConnectionManager.png|thumb|left|275px|<span style="font-family: Georgia, serif;">Main database connection manager window</span>]]
 +
Database connections are initiated through the menu in the main Clotho window. Saved connections can be managed using the first tab, and new connections can be made on the second tab, which is shown above. At the moment, Clotho only supports connections to mySQL databases, but it will be able to connect to other SQL database implementations in the future. All of the five fields are required, and this information should be requested from your database administrator. Upon clicking connect, the window below should appear.
 +
 
 +
<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>
 +
[[Image:BerkeleySoftwareDatabaseMapper.png|thumb|right|350px|<span style="font-family: Georgia, serif;">Mapping window</span>]]
 +
The mapping window has four lists as labelled. Clicking on any object keyword will populate the field keyword list with the field keywords associated with the selected object keyword. Clicking on any table will populate the column list with the columns in the selected table. A selected object keyword can be mapped to a table, and once this is done, the field keywords can be mapped to columns. Once you are done binding all of your keywords, click on Finish Binding, which will cause Clotho to attempt to generate the Hibernate files required for a connection. If this process is successful, you will be prompted to save the connection if you wish. Saved connections can be zipped up and distributed to other lab members. This tool is meant to be simple to use, but only the database administrator should need to go through this mapping process, and the rest of the lab can re-use the generated connection files.
}}
}}
 +
__NOTOC__

Latest revision as of 07:18, 26 April 2010


Clotho Infrastructure and Reconfigurable Data Model

Introduction

Clotho Data Model - Smart APIs (Courtesy of [http://christinetsin.designbinder.com/ Christine Tsin])

With the growing number of parts available to synthetic biologists, the amount of data needed to identify and describe these parts grows as well. However, there is a lack of a unified data model to organize this data. As a result, each institution's database of parts looks slightly different from the databases at other institutions, making tool development for the wider community difficult if not impossible. Clotho Classic had a database connection manager that could connect to many different types of databases. However, this manager was limited and could only connect to databases with structures very similar to its internal data model. Clotho builds upon this idea, and now includes a much more robust database connection module that can deal with a larger range of database organizational patterns. This is accomplished by allowing the user to extend the core set of keywords that the tools linked into Clotho may use. This functionality will allow Clotho to be useful while the standards in the synthetic biology community continue to be developed. This page describes how this data model works as well as presents the APIs available to Clotho tool developers.



Motivation

Clotho Classic vs. Revamped Clotho: Comm. and Data APIs

Clotho Classic was a toolbox useful for part development. This summer, as we began to build new tools to be used in device development, we started to run into the limitations of creating tools that fit into the Clotho Classic infrastructure. One annoyance was the difficulty in integrating the new tools with existing tools. Tool-to-tool communication through the Clotho Classic infrastructure was limited, so existing tools that needed to work together had to make use of tool specific interfaces. This meant that integrating a new tool required taking all of the existing tools into account, which is clearly sustainable if we hope to make many tools. As the figure on the left shows, tool communication required multiple function calls which corresponded to a series of request and response style handshakes. Not only were there multiple function calls, but also each tool had to "know" about the other tools which meant developers had to be very aware of the underlying software architecture.

Another limitation was the fixed data model. Clotho Classic promised to be compatible with any database, but in reality the target database had to be organized similarly to Clotho Classic's internal data model. This greatly limited Clotho Classic's usefulness. Indeed, the Clotho Classic team tried to work with the iGEM team from Davidson to get Clotho Classic to work with their database, but was unsuccessful due largely to the disparity between the two databases' organization. We wanted to prevent events of this nature from happening in the future, but attempts during the summer to hash out a standard synthetic biology data exchange model proved unsuccessful. We decided that a truly reconfigurable data model was the best solution for the time being until a standard data model for synthetic biology is developed. As the figure on the left shows, now database communication uses a standardized data structure as well as a simplified, standard yet powerful query API.

Lastly, even assuming we had a database Clotho Classic could connect to, developing tools that relied heavily on database interactions was difficult. Clotho Classic did provide a set of data API functions. In theory, these could be used to interact with the database. However, in practice these functions were unwieldy and difficult to use. Members of the development team for Clotho Classic in the Spring found that hand crafting their own SQL queries to be easier than using the existing data API. Considering the increasing amount of synthetic biology data stored in databases, tools that can interact with this data are also needed, and rapid development of these tools depends on a good API for database interaction.

Feature Overview

The new Clotho infrastructure addresses the limitations described above and provides three useful services to all Clotho Tools.

  • First, all Clotho Tools implement a ClothoTool interface, which allows them to communicate with other ClothoTools through the Tool API.
  • Second, the Clotho infrastructure provides a robust database connection where database tuples are returned as Java objects (using [http://en.wikipedia.org/wiki/Object_Relational_Mapping Object Relational Mapping]).
  • Finally, Clotho's data model is reconfigurable at runtime, which means that Clotho Tools don't need to be tied down to a specific data model.

Technical Overview

This technical overview assumes some familiarity with the Java programming language, as well as knowledge of some basic database terminology.

Top level Clotho infrastructure file. This is a conceptual overview - the Java classes are not structured in quite this way.

The new Clotho infrastructure has been greatly simplified since last year, and this simplification was driven by a desire to provide tools with the services described above. It is now far easier to develop a new tool that can operate within Clotho, and these tools have more powerful API functions at their disposal. The above picture shows the relationships between the three services Clotho provides to tools. Tools can speak to each other through the Clotho Tool Core shown on the right. Tools can also access the Data Core through the Data API to talk to external databases. On the left hand side we see a diagram of how Clotho Keywords are read in at runtime to create an internal data model, and this process is explained in more detail below.

Clotho Keywords

Clotho keywords provide the flexibility in both tool-to-tool interaction and tool-to-database communication. A keyword is a name given to some type of a data synthetic biologists would like to keep track of. We separate keywords into two categories: object keywords and field keywords. An object keyword is a name for a collection of data that describes some entity. For example, biobrick and person could be two object keywords. A field keyword, on the other hand, is a name given to a primitive piece of data. For example, sequence or name are two field keywords. An object keyword will generally have one or more field keywords associated with it, such as sequence for biobrick or name for person.

These keywords and the relationship between them are described in keyword files, which are XML files in a format that Clotho can understand. These files specify what keywords tools may use. Clotho comes with a default core keyword file that all the tools written by the Berkeley iGEM team use, but a developer can easily write their own keyword file to allow their own tools to work with data not described in the core Clotho data model.

Example Clotho Keyword XML file

Here is an sample keyword file. As we can see, the top level XML element is named Objects. This is required for all keyword XML files. The next element, globalFields, is optional. This element is a list of field elements, and is meant for fields that all objects should have so the user does not need to retype the same fields for multiple objects. Each field element has a name element, and an optional description element. If there is a globalFields element, it must be the first element in Objects. After the globalFields element, we have a list of object elements, each object element having a name and a list of field elements.

After we are done inputting keyword files, we open up then Database Connection Manager through the menu of the main Clotho window. We create a new connection by inputting our information, and a mapping display will pop up. This process allows us to associate objects with tables in a given database, and fields to columns of these tables. When we are done mapping keywords, the database connection manager can attempt to look for connections between objects based on database metadata. For example, suppse the biobricks table in the database has a foreign key to the person table, and the biobricks table has been mapped to the biobrick object and the person table has been mapped to the person object. Then the manager will create a connection between the biobrick and person objects.

The mapped object and field keywords are then used by a code generator to create the files used by the Hibernate library. Hibernate provides a way to store and retrieve data from a database as Java objects. It requires a set of Java classes that conform to the [http://en.wikipedia.org/wiki/JavaBean JavaBean] standard, as well as a set of XML mapping files. Our code generator creates these files at runtime, compiles them, and links them into Clotho so that Hibernate can function.

The end goal is to create a simple and robust way to reconfigure the data model used by Clotho, so that tool development can proceed independently of data model development to some degree.

Tool API

Comm. API Java Doc

The Tool API provides a standardized way for tools to interact, since all Clotho Tools must implement our ClothoTool interface in order to be loaded by the Tool Core. For example, take our sequence view tool. This tool is just a simple sequence editor based on [http://www.biology.utah.edu/jorgensen/wayned/ape/ ApE]. We would like this tool to be able to import sequence data from any other tool easily. Using the Tool API, this operation is very simple - we merely call the getData() method with the argument "sequence", since all Clotho Tools must have a getData() method. So in this example, the sequence view can query the Tool Core for a list of all active tools, display the list, and then import a sequence from a user-selected tool.

In Clotho Classic, integrating a new tool meant the developer had to take into account all other existing tools, and old tools might also have to be rewritten to communicate with the new tool. In Clotho, all Clotho Tools can talk to each other through the tool core, which greatly simplifies the integration of new tools.

We realize that our Tool API doesn't solve all tool-to-tool communication problems - if two tools need to work very closely together, their code will still be more interdependent. However, our Tool API does provide a way to make tools that integrate easily with new tools.

Data API

Data API Java Doc

The Data API provides tools with a simple and robust interface to external data sources. Using the same Clotho keywords as described earlier, tools can ask for data packaged up into Java objects. Specifically, the Data API returns objects from the database as Datum objects. These Datum objects have data stored in their fields, as specified by the keywords files. For example, a biobrick object would have a sequence field. Datum objects present a set of getter and setter methods that take field keywords as an argument, and return the data in the given field. These objects will have types as specified by mapped object keywords.

In practice, these objects are easier to work with that raw data from a database connection. For example, the parts manager needs to work with biobrick and collections, so it will use Data API to ask for biobrick and collection objects. If these keywords were not mapped during the database connection process, then the parts manager will notify the user of an error. Otherwise, all the data that describes the biobricks and collections the parts manager needs use are returned as easy to use Java objects that can be queried for the data in their fields. In contrast, Java SQL libraries will return data in table form, and the programmer must take the effort to parse these tables to obtain the data they need.

Database Connection Manager Tutorial

Main database connection manager window

Database connections are initiated through the menu in the main Clotho window. Saved connections can be managed using the first tab, and new connections can be made on the second tab, which is shown above. At the moment, Clotho only supports connections to mySQL databases, but it will be able to connect to other SQL database implementations in the future. All of the five fields are required, and this information should be requested from your database administrator. Upon clicking connect, the window below should appear.











Mapping window

The mapping window has four lists as labelled. Clicking on any object keyword will populate the field keyword list with the field keywords associated with the selected object keyword. Clicking on any table will populate the column list with the columns in the selected table. A selected object keyword can be mapped to a table, and once this is done, the field keywords can be mapped to columns. Once you are done binding all of your keywords, click on Finish Binding, which will cause Clotho to attempt to generate the Hibernate files required for a connection. If this process is successful, you will be prompted to save the connection if you wish. Saved connections can be zipped up and distributed to other lab members. This tool is meant to be simple to use, but only the database administrator should need to go through this mapping process, and the rest of the lab can re-use the generated connection files.