From 2009.igem.org

Revision as of 00:24, 19 October 2009 by Naoiwamoto (Talk | contribs)


Notebook HEARTBEAT

Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool) project. This notebook covers the work on three sublanes: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), plus some additional work on the logo and the wiki design. Have fun!

July

7-27-2009

Meeting with Oliver Pelz
- Discuss general ideas of our Database Structure and Content
- Introduction to PromoterSweep (LINK). PromoterSweep screens a given sequence for conserved regions, which gives us consensus sequences, and then screens these for TFBS by database search (TRANSFAC, JASPAR) (LINK)
- Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
- We decide on MySQL as an appropriate database system for this task; it will also allow a graphical representation of the database on the web later.
- GUI on wiki: which language? PHP? JavaScript?
- Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of Promoter Database (DoOP, UCSC, EnsEMBL) (LINK)

Aim: create the database by the end of August

August

Week	Days
32	8-3-2009	8-4-2009	8-5-2009	8-6-2009	8-7-2009	-	-
33	-	-	8-12-2009	-	8-14-2009	-	-
34	8-17-2009	8-18-2009	8-19-2009	8-20-2009	8-21-2009	-	-

8-3-2009

First contact with MySQL
Start making an overview of other teams' projects
Configuring our Virtual Server

8-4-2009

Official Team Meeting (LINK) @ BQ seminar room 43: preparing presentation & writing meeting report
Start installing development environment on our internal server
- GNOME
- MediaWiki

8-5-2009

Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
- Integrating ideas of PromoterSweep, TRANSFAC as well as DoOP/cisRED
- select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for Wetlab
- select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc)
- future experimental validation: ChIP-on-Chip
  - for this we need a TFBS-free sequence
- idea: plot histogram of TFBS relative to TSS
  - problem: choice of sequence: upstream only? include downstream?
- new programming languages: R and Perl
- next meeting: Friday after team meeting

Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
- Introduction to PromoterSweep
- Structure and analysis principles of PromoterSweep
- Output is stored in an XML file, so we have to parse the XML.
- Oliver Pelz will help us with the programming

Protocol of the meeting can be downloaded from here.

Start working with MySQL
request UNIX/HUSAR/HPC access at DKFZ (Nao)
first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, ConTra (LINKS)

8-6-2009

Meeting with Oliver Pelz
- defining the workflow with PromoterSweep and Matrix Profile Search, and introduction to different motif discovery algorithms

installation of NX server for access to the internal server from Windows
configure development environment (printing from Linux, configure MediaWiki)
defining basic concept of database construction
- we select annotated promoter sequences in DoOP
- we make a selection of pathway of interest using KEGG
- narrow down the number of target promoter sequences to fewer than 10,000.

8-7-2009

Official Team Meeting on Scheduling
Meeting with Anna-Lena and Tobias
- Introduction into R
- Tobias will give us access to their computing cluster (Group Roland Eils)
- Promoter Selection: DoOP, EnsEMBL, or UCSC?

HUSAR account arrived
installation of R, R editor and Perl editor
further configuration of our internal server / MediaWiki

8-10-2009

first contact with R and Perl
playing around with R and Perl
playing around with R library: Biobase
check working on DKFZ cluster

8-11-2009

defining programming languages: Perl, R, MySQL
retrieving first PromoterSweep output files

Meeting with Marti
- ideas for modeling
  - we will have at least three colors which overlap in their spectra.
  - a very nice approach will be Fuzzy Logic Modeling.
  - idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
  - idea 2: create & visualize fancy and fuzzy data from in silico simulation
- combine: promoter, output and graphic representation (graphics!)
- next meeting with Marti: end of next week.

extract NCBI Entrez Gene IDs with R and Perl
MAC addresses registered for the BioQuant network

8-12-2009

configure Perl working environment
study structure of DoOP database
download DoOP and load DoOP database into MySQL

8-13-2009

trying out some DoOP queries
download fasta sequences from UCSC gene browser (LINK)
mapping of NCBI Entrez Gene IDs with RefSeq IDs
configure Perl working environment on Windows XP
contact Endre Sebestyen concerning the Perl module Bio-DoOP-DoOP (LINK)
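
The Entrez-to-RefSeq mapping step from this day can be sketched roughly as follows. This is a minimal illustration in Python (the notebook itself names Perl and R for this work), assuming a hypothetical two-column, tab-separated input of Entrez Gene ID and RefSeq ID; the file layout, the function name `load_entrez_to_refseq` and the example rows are all assumptions, not the team's actual script or data.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: column 1 = Entrez Gene ID, column 2 = RefSeq ID.
# The two-column layout is an assumption made for this sketch.
EXAMPLE_TABLE = (
    "#entrez_id\trefseq_id\n"
    "7157\tNM_000546\n"
    "7157\tNM_001126112\n"
    "4609\tNM_002467\n"
    "7157\tNM_000546\n"
)

def load_entrez_to_refseq(handle):
    """Map each Entrez Gene ID to the list of RefSeq IDs seen for it.

    One gene can have several transcripts, so the values are lists.
    Comment lines (starting with '#'), blank lines and duplicate pairs
    are skipped.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        entrez_id, refseq_id = row[0], row[1]
        if refseq_id not in mapping[entrez_id]:
            mapping[entrez_id].append(refseq_id)
    return dict(mapping)

mapping = load_entrez_to_refseq(io.StringIO(EXAMPLE_TABLE))
```

Keeping a list per gene avoids silently dropping alternative transcripts, which matters if the RefSeq ID is later used as a table key.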

8-14-2009

start PromoterSweep Analysis over Weekend

8-18-2009

Tim, Stephen, from here on you have to enter your own work yourselves!

study the output file of PromoterSweep: check the general structure and pick out useful information.
the result is grouped into: General Info, Best Genomic Mapping, Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
the sections of interest will be selected, collected and prepared for entry into the MySQL DB
discuss table structure of our database

What should our database be called? - Brainstorming -
- SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
- MAY contain: position, heartbeat, prediction, assembly, eukaryotes
- and still more keywords to come

8-19-2009

parse PromoterSweep XML file into a tab-separated text file (PERL CODE?)
- the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
- this raised several programming problems with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), so we are studying the structure and basic concepts of hashes and keys
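
The parsing step above can be sketched as follows. This is only an illustration in Python rather than the Perl the notebook mentions, and the PromoterSweep XML schema is not shown here, so every tag and attribute name (`gene`, `tfbs`, `refseq`, `tf`, ...) and all values in `SAMPLE_XML` are invented placeholders. The nested loop plays the role of the "hash of hashes" mentioned above: gene-level fields are collected once and repeated for each binding site.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout and placeholder values; NOT the real PromoterSweep schema.
SAMPLE_XML = """
<promotersweep>
  <gene refseq="NM_EXAMPLE1" entrez="0001" ensembl="ENSG_EXAMPLE1" tss="1000">
    <description>example gene</description>
    <tfbs tf="NFkB" start="120" end="129" motif="ACGTACGTAC" quality="good"/>
    <tfbs tf="p53" start="640" end="645" motif="ACGTAC" quality="medium"/>
  </gene>
</promotersweep>
"""

COLUMNS = ["refseq_id", "tf_name", "tfbs_start", "tfbs_end", "motif",
           "quality", "tss", "entrez_id", "ensembl_id", "description"]

def xml_to_rows(xml_text):
    """Return one dict per binding site, repeating the gene-level fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for gene in root.iter("gene"):
        gene_fields = {
            "refseq_id": gene.get("refseq", ""),
            "tss": gene.get("tss", ""),
            "entrez_id": gene.get("entrez", ""),
            "ensembl_id": gene.get("ensembl", ""),
            "description": gene.findtext("description", default=""),
        }
        for site in gene.iter("tfbs"):
            row = dict(gene_fields)
            row.update({
                "tf_name": site.get("tf", ""),
                "tfbs_start": site.get("start", ""),
                "tfbs_end": site.get("end", ""),
                "motif": site.get("motif", ""),
                "quality": site.get("quality", ""),
            })
            rows.append(row)
    return rows

def rows_to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = xml_to_rows(SAMPLE_XML)
tsv_text = rows_to_tsv(rows)
```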

8-20-2009

preliminary decision on our table structure
- Table: Main_Info
  - RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
- Table: Gene_Info
  - Ensembl_ID, Gene Symbol, Gene Description.
- we use the RefSeq ID as the key connecting these two tables.
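
A minimal sketch of this two-table layout, assuming column names and types that the notebook does not specify. It uses Python's built-in `sqlite3` as a stand-in so the example runs without a server; the project itself chose MySQL, and the sample rows are invented placeholders.

```python
import sqlite3

def build_db():
    """Create the two tables in an in-memory SQLite database (MySQL stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Gene_Info (
            refseq_id        TEXT PRIMARY KEY,
            ensembl_id       TEXT,
            gene_symbol      TEXT,
            gene_description TEXT
        );
        CREATE TABLE Main_Info (
            refseq_id    TEXT NOT NULL REFERENCES Gene_Info(refseq_id),
            tf           TEXT,
            motif_start  INTEGER,
            motif_end    INTEGER,
            motif_score  REAL,
            tfbs_quality TEXT,
            tss_db_info  TEXT
        );
    """)
    return conn

def load_example_rows(conn):
    """Insert invented placeholder rows, not real PromoterSweep results."""
    conn.execute("INSERT INTO Gene_Info VALUES (?, ?, ?, ?)",
                 ("NM_EXAMPLE1", "ENSG_EXAMPLE1", "GENE1", "example gene"))
    conn.executemany(
        "INSERT INTO Main_Info VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("NM_EXAMPLE1", "p53", 640, 645, 0.81, "medium", "DoOP"),
         ("NM_EXAMPLE1", "NFkB", 120, 129, 0.93, "good", "DoOP")])
    conn.commit()

def tfs_for_gene(conn, gene_symbol):
    """Join both tables on the RefSeq ID and list binding sites for one gene."""
    cur = conn.execute(
        """SELECT m.tf, m.motif_start, m.motif_end
           FROM Main_Info AS m
           JOIN Gene_Info AS g ON g.refseq_id = m.refseq_id
           WHERE g.gene_symbol = ?
           ORDER BY m.motif_start""",
        (gene_symbol,))
    return cur.fetchall()

conn = build_db()
load_example_rows(conn)
sites = tfs_for_gene(conn, "GENE1")
```

Storing the gene description once in Gene_Info and referencing it by RefSeq ID keeps Main_Info narrow, since a single promoter can carry many binding sites.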

8-21-2009

update the script for parsing the PromoterSweep output files due to unexpected errors
The PromoterSweep result contains TSS information derived from different promoter databases. Which one should we rely on if they differ from each other?
- We give the highest priority to the DoOP database, since its RefSeq ID results agree well with those of other databases (e.g. DBTSS).
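
The priority rule can be expressed as a small lookup. This is a sketch under assumptions: only the "DoOP first" rule comes from the notes, while the ordering of the remaining databases, the function name `pick_tss` and the example positions are hypothetical.

```python
# DoOP comes first as decided above; the position of any other database in
# this list is an assumption made for the sketch.
TSS_PRIORITY = ["DoOP", "DBTSS"]

def pick_tss(tss_by_db, priority=TSS_PRIORITY):
    """Return (database, tss) from the highest-priority database with a value.

    tss_by_db maps a database name to a TSS position, or to None when that
    database has no annotation. Databases missing from the priority list are
    used as a last resort, in alphabetical order so the result is deterministic.
    """
    for db in priority:
        if tss_by_db.get(db) is not None:
            return db, tss_by_db[db]
    for db in sorted(tss_by_db):
        if tss_by_db[db] is not None:
            return db, tss_by_db[db]
    return None

# Made-up positions for illustration.
choice = pick_tss({"DBTSS": 1005, "DoOP": 1000})
```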

order MATLAB (http://www.mathworks.com/) iGEM licence

search for a tool to use MySQL from the R programming environment
wiki: write a short article about the German Cancer Research Center (DKFZ)

Meeting with Anna-Lena: what to do once we have established our database
- two strategies:
  - manually select interesting transcription factors and analyse them using database queries
  - plot histograms of TFBS occurrence within the target promoter sequence (TSS - 1000bp upstream) for each TF and make a systematic analysis
- we go for both!
- idea for the future: we can analyze combinatorial appearance of distinct TF pairs
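
Both the histogram strategy and the TF-pair idea can be sketched in a few lines. This is an illustration in Python rather than the R the team planned to use; the tuple layout `(refseq_id, tf, distance_upstream_bp)`, the 100 bp bin width and the sample sites are assumptions, not project data.

```python
from collections import Counter
from itertools import combinations

# Invented example sites: (promoter RefSeq ID, TF name, distance upstream of TSS in bp).
SITES = [
    ("P1", "NFkB", 50), ("P1", "HIF", 420),
    ("P2", "NFkB", 80), ("P2", "NFkB", 950),
    ("P2", "p53", 300), ("P2", "HIF", 10),
]

def position_histogram(sites, tf, bin_width=100, window=1000):
    """Count binding sites of one TF per distance bin within the window.

    Bin 0 covers 0 to bin_width-1 bp upstream of the TSS, and so on.
    Sites outside the window are ignored; window is assumed to be a
    multiple of bin_width.
    """
    counts = [0] * (window // bin_width)
    for _, name, dist in sites:
        if name == tf and 0 <= dist < window:
            counts[dist // bin_width] += 1
    return counts

def pair_counts(sites):
    """Count in how many promoters each unordered TF pair occurs together."""
    tfs_per_promoter = {}
    for refseq_id, name, _ in sites:
        tfs_per_promoter.setdefault(refseq_id, set()).add(name)
    counts = Counter()
    for tfs in tfs_per_promoter.values():
        for pair in combinations(sorted(tfs), 2):
            counts[pair] += 1
    return counts

nfkb_hist = position_histogram(SITES, "NFkB")
pairs = pair_counts(SITES)
```

Counting each pair once per promoter, rather than once per site, keeps a promoter with many repeated sites of the same TF from dominating the co-occurrence counts.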

We have a name for our database - we call it -

- wait for it -

HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)

8-22-2009

Insert Amplification of mitoneet-eGFP by PCR

8-24-2009

Restriction digest of mutagenized plasmids (PstI) and analysis on gel

Amplified inserts were gel-purified

What worked: eBFP, eBFP+NLS, eBFP_kozak, eBFP+NLS_kozak, eGFP, eGFP_kozak
What didn't: NLS, NLS_kozak, eGFP+mitoneet, mitoneet, eGFP+mitoneet_kozak, mitoneet_kozak

8-26-2009

BBBing of insert sequences

PCR of cherry, cherry_myrpalm, myrpalm, NLS with kozak primers to amplify cherry_kozak, cherry_myrpalm_kozak, myrpalm_kozak, NLS_kozak

Restriction digest of localisation sequences and fluorophores with NheI and SpeI; the restricted plasmid was provided by the Synthetic Promoter Group and treated with SAP
Ligation with p31
Transformation of DH5alpha with the ligated plasmids
Plating of transformed cells on Amp plates

8-27-2009

Ligation and transformation did not work (no colonies, except for two on the NLS plate)
New PCR with fluorophores and localisation sequences to get higher amounts

Gel purification of: eGFP, eGFP_kozak, eBFP, eBFP_NLS, eBFP_kozak, eBFP_NLS_kozak, NLS_kozak, cherry, cherry_myrpalm, myrpalm, cherry_kozak, cherry_myrpalm_kozak, myrpalm_kozak

8-28-2009

BBBing of insert sequences 2.0

Restriction digest of fluorophores and localisation sequences with SpeI and NheI (1 h, Buffer 2, BSA)
Restriction digest of p49 with SpeI and NheI (1 h, Buffer 2, BSA), followed by SAP treatment (30 min) and purification
Nanodrop of the digest shows no DNA in the samples -- purification was possibly unsuccessful

8-29-2009

BBBing of insert sequences 2.1

Restriction digest of fluorophores and localisation sequences with SpeI and NheI (1 h, Buffer 2, BSA)
Restriction digest of p49 with SpeI and NheI (1 h, Buffer 2, BSA), followed by SAP treatment (30 min) and purification

8-31-2009

BBBing of insert sequences 2.1 (part 2)

Ligation of insert sequences with restricted p49
Transformation
Plating -> wrong resistance

Team:Heidelberg/Notebook modeling