Team:Illinois-Tools/Notebook/Week4
From 2009.igem.org
Revision as of 21:15, 29 June 2009
6/14-6/20
Donny
I have compiled a dictionary of over 200 chemicals determined by the EPA to be toxic, linked to their respective KEGG IDs. I followed the tutorials for creating a webpage with Django and have decided that this will be a good fit for our project. The dictionary of chemicals was made into an HTML drop-down menu. In the upcoming weeks, I will link the HTML form to our SQL database so that it can be automatically updated as the EPA list is updated. I am working on a simple layout for the page where the user can select a toxin from a list as well as a desired product; the page will respond to these selections and indicate that it recognizes the choices made. Although this is an unimpressive feat, it will lay some foundation for our page's I/O. I am also currently investigating how to improve the pathfinding algorithm that Palak found online, which is nowhere near efficient enough for our purposes. Ideally, this algorithm will be able to quickly scan the huge network of reactions and show every possible series of reactions that can go from input compound to output compound. Unfortunately, this is only the beginning, because we will then have to determine our method of choosing which path will be the most realistic and also optimal in the sense of production.
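Rendering a name-to-KEGG-ID dictionary as a drop-down could look roughly like this (a minimal sketch; the function name and the chemical entries are placeholders for illustration, not the team's actual 200-chemical list or its real KEGG IDs):

```python
# Sketch: turn a {chemical name: KEGG compound ID} dict into an HTML
# <select> menu. Entries here are illustrative placeholders only.

def toxins_to_select(toxins):
    """Render a name -> KEGG-ID dict as an HTML drop-down menu."""
    options = "\n".join(
        f'  <option value="{kegg_id}">{name}</option>'
        for name, kegg_id in sorted(toxins.items())
    )
    return f'<select name="toxin">\n{options}\n</select>'

toxins = {"Chemical A": "cpd:C00000", "Chemical B": "cpd:C99999"}
html = toxins_to_select(toxins)
```

In a Django view this string (or, more idiomatically, the dict itself) would be passed to a template for rendering.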
In other news, I'm still not on the Illinois iGEM mailing list so I don't find out about things like team beer pong tournaments and birthday celebrations. (I assume I'm not missing much- but if I don't get invited to summer thanksgiving I'll be pretty upset).
Ankit
So after receiving Donny's intimidating email, I finally feel like I should write in the lab notebook. I am still without my laptop, which makes it harder to work on code. For the last week I have been working on getting the COBRA source code working with our program. I found a program called OMPC. Apparently, OMPC allows running MATLAB m-files with the Python interpreter: it reads m-files and translates them into Python-compatible code.
Not having a laptop, I have been unable to check whether this program works. Hopefully my laptop comes in tomorrow so I can check it out, because it would save us a lot of time by not having to convert the 30-40 COBRA source files from MATLAB to Python by hand.
Moreover, I am still concerned about our project. It seems as if we are just doing what the Maranas group has already done with OptStrain (OptStrain: a computational framework for redesign of microbial production systems. {Maranas 2004}). They have already done what we are doing, but their code is written in GAMS and isn't open source. Hopefully when our program is done, it will have some awesome cool features.
P.S. On an unrelated note, and knowing that none of the technicians currently repairing my laptop are reading this, I would just like to say: I WANT MY LAPTOP NOW!
Kanishka
Last week I had the MCATs, so I took a little break, but now I'm back. I've been trying to learn Python and have been following Dr. Bhalerao's book, "Python Cookbook". My goal is to finish the book by the end of the week and learn some cool stuff. I will also refer to another book that I have torrented (shhh! downloaded), "Programming Python". I have also been messing around with the KEGG API, trying to figure out how it works.
Riyad
This week I learned a good deal about making standalone MATLAB functions, which would allow us to use the COBRA functions outside of a MATLAB environment. I also started doing research into the S-matrix and found some MATLAB functions that might come into future use. I continued to get familiar with Python and made a random letter generator that spits out "A", "T", "C", and "G" randomly... It was fun to make, but I'm not sure how practical it is.
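A generator like the one described might look something like this (a minimal sketch; the exact code isn't in the notebook, and the seed parameter is added here just to make the output reproducible):

```python
import random

def random_bases(n, seed=None):
    """Return a string of n random DNA bases drawn from A, T, C, G."""
    rng = random.Random(seed)  # seeded so runs can be reproduced
    return "".join(rng.choice("ATCG") for _ in range(n))

seq = random_bases(10, seed=42)
```

Not obviously practical on its own, as noted, but handy later for generating test sequences.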
Nate
This week has been very exciting in terms of coding and programming. I had fun learning about dictionaries and how to make and use them in Python. I was also able to manipulate the KEGG database in a number of ways through the KEGG API. Looking through the KEGG API, there are many commands, such as get_reactions_by_enzyme and get_enzymes_by_compound. However, I realized there is no command to get_reactions_by_organism. Since our program will go from compound A to compound B through the various reactions listed on KEGG, I felt this type of function would be very useful to us, for instance in determining a host organism by seeing which organisms each reaction can take place in. Therefore, using the new tools I had learned for Python and the KEGG API, I started making programs that compiled lists of information from the KEGG database that I could use to perform this get_reactions_by_organism lookup myself, and hopefully vice versa for get_organisms_by_reaction. Before I started writing code, I made a map of going from organisms, to genes, to enzymes, and finally to reactions. My first task was to compile a list of organisms and the 3- or 4-letter codes that KEGG has assigned to each organism. These code names are needed when doing certain queries to the KEGG API; in particular, I needed them for the organisms-to-genes step via the command get_genes_by_organisms. I ended up writing my code to retrieve all of the genes by code name for every organism listed in the KEGG database and compiled these lists of genes into one HUGE list of lists. I then exported this extremely large list of lists of genes by organism to a file via the pickle module for Python. This file is the biggest I have ever seen, coming in around 200 MB. LOL.... The program also took a long time to run, about 3 or 4 hours to finish.
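Saving a big nested structure with pickle, as described above, looks roughly like this (the organism codes and gene IDs are placeholders, not actual KEGG query results):

```python
import pickle

# Hypothetical mapping of KEGG organism codes to their gene lists;
# real data would come from KEGG API queries.
genes_by_organism = {
    "eco": ["eco:b0001", "eco:b0002"],
    "sce": ["sce:YAL001C"],
}

# Serialize the whole structure to disk in one call...
with open("genes_by_organism.pkl", "wb") as f:
    pickle.dump(genes_by_organism, f)

# ...and load it back later without re-querying KEGG for 3-4 hours.
with open("genes_by_organism.pkl", "rb") as f:
    restored = pickle.load(f)
```

Caching the data locally this way is what makes the one-time 3-4 hour download tolerable.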
Since this list is so huge, I decided to just work with E. coli for now and move on to S. cerevisiae and other organisms later. My next step was to get enzymes by genes. I did this for E. coli only and compiled and exported the list to a text file. This also took quite some time. I started working on getting reactions by enzyme when I learned something interesting, yet annoying. Enzymes are given a code name in the form 'ec:1.1.1.3' in the KEGG database, and this code name is not organism-specific. Therefore, getting reactions by enzyme has the potential of retrieving reactions that do not occur in E. coli, since the enzyme name is a generic name for any enzyme that performs a reaction. Sometimes enzymes can facilitate more than one reaction, and not all of the reactions an enzyme can perform necessarily occur in E. coli. Therefore, I discovered I must find another way that is organism-specific throughout the whole process of going from organism to reaction. My new flow was from organism, to gene, to KO number (orthology?? not fully understood), to pathway, to reactions that occur in that pathway. I got all the way through to reactions for E. coli, but I realized I had another problem. Yes, I had a list of reactions for E. coli, but I did not know which gene was associated with which reaction. That, I realized, is very important to know and will be very useful for our program. To my dismay, I could not think of a way to get reactions by gene, or vice versa, through the KEGG API alone. Therefore, I used the KEGG FTP to obtain a list of reactions, for which I hope to get enzymes, and then the genes for all of those enzymes. This I will work with next week. On a side note, I thought of some fun features that we could have as part of our program.
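Parsing the FTP reaction list into a reaction-to-enzyme mapping could be sketched like this. The field layout below (ENTRY and ENZYME fields, records separated by `///`) is an assumption based on KEGG's flat-file style, and the sample records are made up, not a real KEGG dump:

```python
def reactions_to_enzymes(flatfile_text):
    """Extract {reaction ID: [EC numbers]} from KEGG-style flat-file records."""
    mapping = {}
    for record in flatfile_text.split("///"):
        entry, enzymes = None, []
        for line in record.splitlines():
            if line.startswith("ENTRY"):
                entry = line.split()[1]          # e.g. "R00351"
            elif line.startswith("ENZYME"):
                enzymes = line.split()[1:]       # e.g. ["2.3.3.1", "2.3.3.3"]
        if entry:
            mapping[entry] = enzymes
    return mapping

sample = """ENTRY       R00351  Reaction
ENZYME      2.3.3.1         2.3.3.3
///
ENTRY       R00352  Reaction
ENZYME      2.3.3.8
///
"""
rxn2ec = reactions_to_enzymes(sample)
```

Once reactions map to EC numbers, the existing get_genes_by_enzyme-style queries can close the loop back to genes.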
One idea I had was converting genes that qualify from the KEGG database into BioBricks by basically adding the DNA needed to fulfill the requirements of being a 'BioBrick'. Another idea was to optimize all of the genes output by our program for the host organism chosen by the user, through codon optimization. Each organism has varying levels of charged tRNAs; the idea would be to alter the codons so that the percentage of each codon reflects the percentage of the corresponding charged tRNAs. Right now these ideas are still being considered; they might be fun to do once our program is finished.
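The codon-reweighting idea could be sketched like this. The weight table below is made up for illustration; a real one would reflect measured tRNA abundances for the chosen host, and would cover all twenty amino acids:

```python
import random

# Hypothetical relative codon weights for a host organism
# (only leucine and methionine shown, for illustration).
CODON_WEIGHTS = {
    "L": {"CTG": 0.5, "CTC": 0.3, "TTA": 0.2},
    "M": {"ATG": 1.0},
}

def reweight(protein, weights, seed=None):
    """Pick a codon for each amino acid in proportion to the host's weights.

    Over a long sequence, each codon's frequency approaches its weight,
    mirroring the charged-tRNA percentages described above.
    """
    rng = random.Random(seed)
    codons = []
    for aa in protein:
        table = weights[aa]
        codons.append(rng.choices(list(table), weights=list(table.values()))[0])
    return "".join(codons)

dna = reweight("MLL", CODON_WEIGHTS, seed=0)
```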
Palak
This week we changed the direction of our project slightly. We decided to add an option that lets the user choose a shortest path according to a chosen factor, such as lowest energy usage or least number of cofactors. This let us create weighted graphs of the reactions and then find the path of least weight for the given factor, eliminating the need to find all paths, which we found wasn't feasible. Because of this, the flux-analysis functions were not crucial right away, so I put that part of the project on hold and started working on another aspect: creating databases. We found that having our own database of reactions and their properties would speed up our program, so I explored different possibilities for creating it. We worked with MySQL and SQLite, and Ankit and I determined that SQLite was sufficient for our purposes. The two of us spent the rest of the week learning how to use SQLite3 and integrate it with Django.
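The least-weight search described above is a standard Dijkstra traversal; a minimal sketch, assuming the reaction network is stored as nested dicts and using made-up weights (which could stand in for energy usage or cofactor count):

```python
import heapq

def least_weight_path(graph, start, goal):
    """Dijkstra's algorithm: cheapest path through a weighted reaction graph.

    graph maps compound -> {neighbor compound: edge weight}.
    Returns (total weight, path) or None if goal is unreachable.
    """
    queue = [(0, start, [start])]   # priority queue ordered by running cost
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

# Toy reaction network with made-up weights.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
}
result = least_weight_path(network, "A", "D")
```

Swapping in a different weight function (energy vs. cofactors) only changes the numbers on the edges, not the search itself.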