IllinoisTools/11 July 2009

7/5- 7/11
Donny

I created a graph library, which I call dongraphlib, to handle our specific functions. I took a lot of code from NetworkX and pygraph, but made modifications and left out everything we have no need for. I included a bidirectional Dijkstra algorithm to make the shortest-path calculation go extremely fast (less than a second!).
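
For anyone curious how a bidirectional Dijkstra search works, here is a minimal sketch. It is not the dongraphlib code. The graph format (a dict of dicts with non-negative edge weights) and the function name `bidirectional_dijkstra` are assumptions made just for this illustration. The idea is to search forward from the source and backward from the target at the same time, and to stop once the two frontiers together can no longer beat the best path found so far.

```python
import heapq
from itertools import count


def bidirectional_dijkstra(graph, source, target):
    """Return (distance, path) from source to target, or None if unreachable.

    graph: dict mapping node -> dict of {neighbor: non-negative weight}
    (directed; this format is an assumption for the sketch).
    """
    if source == target:
        return 0, [source]

    # The backward search needs every edge reversed.
    reverse = {}
    for u, nbrs in graph.items():
        reverse.setdefault(u, {})
        for v, w in nbrs.items():
            reverse.setdefault(v, {})[u] = w

    adj = (graph, reverse)                    # index 0 = forward, 1 = backward
    dist = ({source: 0}, {target: 0})         # tentative distances per side
    pred = ({source: None}, {target: None})   # predecessor on each side
    settled = (set(), set())
    tie = count()                             # tie-breaker so nodes are never compared
    heaps = ([(0, next(tie), source)], [(0, next(tie), target)])

    best, meet = float("inf"), None
    side = 0
    while heaps[0] and heaps[1]:
        # No undiscovered path can be shorter than the sum of both frontier keys.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        d, _, u = heapq.heappop(heaps[side])
        if u not in settled[side]:
            settled[side].add(u)
            for v, w in adj[side].get(u, {}).items():
                nd = d + w
                if v not in dist[side] or nd < dist[side][v]:
                    dist[side][v] = nd
                    pred[side][v] = u
                    heapq.heappush(heaps[side], (nd, next(tie), v))
                    # If the other search has reached v, we have a candidate path.
                    if v in dist[1 - side] and nd + dist[1 - side][v] < best:
                        best, meet = nd + dist[1 - side][v], v
        side = 1 - side                       # alternate between the two searches

    if meet is None:
        return None

    # Stitch the two half-paths together at the meeting node.
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = pred[0][node]
    path.reverse()
    node = pred[1][meet]
    while node is not None:
        path.append(node)
        node = pred[1][node]
    return best, path


# Toy graph with made-up nodes and weights.
toy = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(bidirectional_dijkstra(toy, "A", "D"))
```

On a large graph, each search only has to explore roughly half the distance, which is usually where the speedup over a one-sided search comes from.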

Palak

This week I worked on creating a script that determines whether or not a given reaction is reversible. I also worked on a few administrative tasks. I spoke with the Office of the Vice Chancellor of Research to discuss what kind of support they could provide us. I also took a look at a few grants that we could apply for in order to get funding for next year.

Kanishka

I worked on running BLAST searches against the BioBrick database. It took Riyad and me several days to get it to work.

Riyad

Kanishka and I wrote a program to BLAST a sequence that our program spits out and compare it to the entire BioBrick registry.

Nate

Alright, this week was fun filled. At the beginning of the week, it was decided that we should get a list of reactions organized by organism. As an example, HSA would have a list of the reactions that occur in it. I had tried doing this before, but I was having difficulty organizing the data. Part of the problem was that I was using the KEGG API, which is just plain awful for our purposes. Fortunately, Donny and I were able to find a series of FTP files that list every reaction in each reaction pathway for each organism. Therefore, I set out to write a program in Python to retrieve this data using the urllib module, the FTP files and the KEGG API. My initial program was of course really slow since I was using the KEGG API. Fortunately, Donny pointed out my silly error and helped me write a much faster program that only used the FTP files via the urllib module. Unfortunately, the program we wrote (which I named "wonderful.py") was still too slow for our purposes (on the order of days/weeks). Fortunately, Donny and Riyad were able to come up with a different program that used a different method of organizing reactions by organism.

Now that we have this data, the next step is to get lists of enzymes organized by reaction. I had already obtained a list of lists like this, but that data came from an old, slow program that I had written (which used the KEGG API). I needed a much faster means to collect and organize this data, because it has to be refreshed often as KEGG is updated. Therefore, I set out to write a new program that would use just the FTP files. I was able to complete a program that did just that, and it is awesome fast.

Next I looked into collecting genes and organizing them by enzyme. In evaluating the gene data for enzymes on the enzyme FTP, I realized that KEGG only has genes listed for HALF of the enzymes listed. This poses a problem for us, because our program is supposed to output genes for the enzymes that facilitate the reactions needed to convert compound A to compound Z. I also noted some other weird little things about the KEGG database, but they seem trivial in regard to our overall goals.
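
To make the data organization concrete, here is a rough sketch of the two bookkeeping steps described above: grouping reactions by organism, and checking which enzymes have no genes listed. This is not the code from any of the programs mentioned. The record layout, the function names and all of the IDs below are made up for illustration, and the real KEGG files would need their own parsing first.

```python
from collections import defaultdict


def reactions_by_organism(records):
    """Group reaction IDs by organism.

    records: iterable of (organism, pathway, reaction) tuples
    (a hypothetical layout, assumed to be produced by an earlier parsing step).
    """
    grouped = defaultdict(set)
    for organism, _pathway, reaction in records:
        grouped[organism].add(reaction)   # a set drops repeats across pathways
    return dict(grouped)


def enzymes_missing_genes(enzyme_genes):
    """Return the sorted list of enzymes whose gene list is empty."""
    return sorted(enzyme for enzyme, genes in enzyme_genes.items() if not genes)


# Made-up sample data, not real KEGG entries.
records = [
    ("hsa", "pathway_1", "reaction_1"),
    ("hsa", "pathway_2", "reaction_1"),
    ("hsa", "pathway_2", "reaction_2"),
    ("eco", "pathway_1", "reaction_3"),
]
enzyme_genes = {
    "enzyme_1": ["gene_a", "gene_b"],
    "enzyme_2": [],
    "enzyme_3": ["gene_c"],
    "enzyme_4": [],
}

by_org = reactions_by_organism(records)
missing = enzymes_missing_genes(enzyme_genes)
print(by_org)
print(missing, len(missing) / len(enzyme_genes))
```

Keeping the list of enzymes with no genes around, rather than silently dropping them, would let the final output flag which steps of a pathway have no gene to report.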