Team:Heidelberg/Notebook modeling

{| = Notebook HEARTBEAT = Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Sites Assembly and Engineering Tool) project. This notebook covers the work on three sublanes: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), plus some additional work on logo and wiki design. Have fun!
 * -valign="top" border="0" style="margin-left: 2px;"
 * width="650px" style="padding: 0 15px 15px 20px; background-color:#ede8e2"|

Contents

 * July


 * August


 * September


 * October

7-27-2009

 * Meeting with Oliver Pelz
 * Discuss general ideas of our Database Structure and Content
 * An introduction to PromoterSweep (LINK). PromoterSweep screens a given sequence for conserved regions, yielding consensus sequences, and then screens these for TFBS by database search (TRANSFAC, JASPAR) (LINK)
 * Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
 * We decide on MySQL as an appropriate tool for this challenge; it will also allow a graphical representation of the database on the web later.
 * GUI on wiki: which language? php? javascript?
 * Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of Promoter Database (DoOP, UCSC, EnsEMBL) (LINK)


 * aim: create database by the end of August


August

8-3-2009

 * First contact with MySQL
 * Start making an overview of other team's projects
 * Configuring our Virtual Server

8-4-2009

 * Official Team Meeting (LINK) @ BQ seminar room 43: preparing presentation & writing meeting report
 * Start installing development environment on our internal server
 * GNOME
 * Mediawiki

8-5-2009

 * Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
 * Integrating ideas of PromoterSweep, Transfac as well as DoOP/CisRED
 * select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for Wetlab
 * select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc)
 * future experimental validation: ChIP-on-Chip
 * for this we need a TFBS-free sequence
 * idea: plot histogram of TFBS relative to TSS
 * problem: choice of sequence: upstream only? include downstream?
 * new programming languages: R and perl
 * next meeting: Friday after team meeting


 * Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
 * An introduction into PromoterSweep
 * Structure and analysis principles of PromoterSweep
 * Output is stored in an XML file. This means we have to parse the XML.
 * Oliver Pelz will help us with programming


 * The minutes of the meeting can be downloaded from here.


 * Start working with MySQL
 * request UNIX/HUSAR/HPC access at DKFZ (Nao)
 * first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, contra

8-6-2009

 * Meeting with Oliver Pelz
 * defining workflow with PromoterSweep, Matrix Profile Search and introduction into different Motif Discovery Algorithms


 * installation of NX server for access to the internal server from Windows
 * configure development environment (printing from Linux, configure Mediawiki)
 * defining basic concept of database construction
 * we select annotated promoter sequences in DoOP
 * we make a selection of pathway of interest using KEGG
 * narrow down the number of target promoter sequences to fewer than 10,000.

8-7-2009

 * Official Team Meeting on Scheduling
 * Meeting with Anna-Lena and Tobias
 * Introduction into R
 * Tobias will give us access to their computing cluster (Group Roland Eils)
 * Promoter Selection: DoOP, EnsEMBL, or UCSC?


 * HUSAR account arrived
 * installation of R, R editor and perl editor
 * further configuration of our internal server / mediawiki
 * writing first perl program - "Hi there"


8-10-2009

 * first contact with R and perl
 * playing around with R and perl
 * playing around with R library: Biobase
 * check working on DKFZ cluster

8-11-2009

 * defining programming languages: perl, R, MySQL
 * retrieving first Promotersweep output files


 * Meeting with Marti
 * ideas for modeling
 * we will have at least three colors which overlap in their spectra.
 * a very nice approach will be Fuzzy Logic Modeling.
 * idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
 * idea 2: create & visualize fancy and fuzzy data from in silico simulation
 * combine: promoter, output and graphic representation
 * next meeting with Marti: end of next week.


 * extract NCBI Entrez Gene IDs with R and perl
 * MAC addresses registered for bioquant network

8-12-2009

 * configure perl working environment
 * study structure of DoOP database
 * download DoOP and load DoOP database into MySQL

8-13-2009

 * trying out some DoOP queries
 * download fasta sequences from UCSC gene browser
 * mapping of NCBI Entrez Gene IDs with RefSeq IDs
 * configure perl working environment on Windows XP
 * contact Endre Sebestyen concerning the perl module Bio-DoOP-DoOP

8-14-2009

 * parse UCSC fasta sequences according to our selection
 * write parsed sequences into multifasta format
 * start PromoterSweep Analysis over Weekend
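The two parsing steps above (select sequences, write multifasta) can be illustrated with a minimal sketch. It is written in Python rather than the perl used in the project, and it assumes that the first whitespace-separated token of each FASTA header is the sequence ID; the real UCSC header layout may differ. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines: Iterable[str], wanted_ids) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (assumed: first header token) is wanted."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            yield header, seq


def to_multifasta(records, width: int = 60) -> str:
    """Format records as multifasta text with wrapped sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "".join(line + "\n" for line in out)
```

A selection list (for example the RefSeq IDs chosen via KEGG pathways) would be passed as `wanted_ids`, and the returned text written to a single file for the PromoterSweep run.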


8-18-2009
Tim, Stephen, from here on you have to enter your items yourselves!


 * study output file of PromoterSweep. check out general structure and pick up useful information.
 * result is grouped in: General Info, Best Genomic Mapping,  Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
 * upon selection, sections of interest will be collected and made ready for entry into MySQL DB
 * discuss table structure of our database


 * What should our database be called? - Brainstorming -
 * SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
 * MAY contain: position, heartbeat, prediction, assembly, eukaryotes
 * and still more keywords to come
 * establishing local@host access to mysql

8-19-2009

 * parse Promotersweep xml file into tab-separated text file
 * the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS Quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
 * this raised several programming problems with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), so we spent time on
 * studying structure and basic concepts of hash & key
 * including parsed data into mysql database
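As an illustration of the XML-to-table step, here is a minimal Python sketch (the project itself used perl). The PromoterSweep XML schema is not documented in this notebook, so every element and attribute name below (`gene`, `tfbs`, `refseq`, `quality`, ...) is an invented placeholder, as is the example record. The nested loop mirrors the "hash of hashes" structure mentioned above: one gene record holding several binding-site records.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented example document; the real PromoterSweep output looks different.
EXAMPLE_XML = """
<promotersweep>
  <gene refseq="NM_000001" entrez="1234" tss="1000">
    <tfbs tf="Sp1" start="950" end="959" motif="GGGGCGGGGC" quality="good"/>
    <tfbs tf="HIF1" start="700" end="707" motif="TACGTGCG" quality="weak"/>
  </gene>
</promotersweep>
"""

COLUMNS = ["refseq", "tf", "start", "end", "motif", "quality", "tss", "entrez"]


def xml_to_rows(xml_text):
    """Flatten gene -> binding-site nesting into one dict per binding site."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for gene in root.iter("gene"):
        for site in gene.iter("tfbs"):
            rows.append({
                "refseq": gene.get("refseq", ""),
                "tf": site.get("tf", ""),
                "start": site.get("start", ""),
                "end": site.get("end", ""),
                "motif": site.get("motif", ""),
                "quality": site.get("quality", ""),
                "tss": gene.get("tss", ""),
                "entrez": gene.get("entrez", ""),
            })
    return rows


def rows_to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The resulting text can be bulk-loaded into a database table whose columns follow the same order.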

8-20-2009

 * pre-decision for our table-structure
 * Table: Main_Info
 * RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
 * Table: Gene_Info
 * Ensembl_ID, Gene Symbol, Gene Description.
 * we go for the RefSeq ID to be the key connecting these two tables.
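The two-table layout can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server, and the column names, types and demo rows are assumptions, not the actual HEARTBEAT schema or data.

```python
import sqlite3

# Assumed column names and types; RefSeq ID links the two tables.
SCHEMA = """
CREATE TABLE Gene_Info (
    refseq_id        TEXT PRIMARY KEY,
    ensembl_id       TEXT,
    gene_symbol      TEXT,
    gene_description TEXT
);
CREATE TABLE Main_Info (
    refseq_id    TEXT NOT NULL REFERENCES Gene_Info(refseq_id),
    tf           TEXT NOT NULL,
    motif_start  INTEGER,
    motif_end    INTEGER,
    motif_score  REAL,
    tfbs_quality TEXT,
    tss_source   TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with a few invented placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute(
        "INSERT INTO Gene_Info VALUES (?, ?, ?, ?)",
        ("NM_DEMO1", "ENSG_DEMO1", "GENE_A", "placeholder gene"),
    )
    con.executemany(
        "INSERT INTO Main_Info VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("NM_DEMO1", "Sp1", 950, 959, 0.97, "good", "DoOP"),
            ("NM_DEMO1", "HIF1", 700, 707, 0.81, "weak", "DoOP"),
        ],
    )
    return con


def sites_for_tf(con, tf):
    """Join both tables on the RefSeq ID and list binding sites of one TF."""
    cur = con.execute(
        "SELECT g.gene_symbol, m.refseq_id, m.motif_start, m.motif_end "
        "FROM Main_Info AS m JOIN Gene_Info AS g ON g.refseq_id = m.refseq_id "
        "WHERE m.tf = ? ORDER BY m.motif_start",
        (tf,),
    )
    return cur.fetchall()
```

Keeping the gene annotation in its own table means each gene description is stored once, however many binding sites its promoter has.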

8-21-2009

 * update script for parsing the Promotersweep output files due to unexpected errors
 * we forgot to include "weak" as a category for the TFBS quality - added!
 * PromoterSweep result contains information about TSS derived from different promoter databases. On which should we rely, if they differ from each other?
 * We give the DoOP database the highest priority, since it agrees well with the RefSeq ID results when compared to other databases (e.g. DBTSS).


 * order MATLAB iGEM licence


 * search for a tool to use MySQL in R programming environment
 * wiki: write a short article about the German Cancer Research Center (DKFZ)


 * Meeting with Anna-Lena: once we established our database... then
 * two strategies:
 * manually select interesting transcription factors and analyse them using database queries
 * plot histograms of TFBS occurrence within the target promoter sequence (TSS - 1000bp upstream) for each TF and make systematic analysis
 * we go for both!
 * idea for the future: we can analyze combinatorial appearance of distinct TF pairs


 * We have a name for our database - we call it -

- wait for it -

HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)

8-24-2009

 * Meeting with Marti: defining output modeling strategies
 * "exclusive promoters"
 * a model for predicting the behaviour of activation of one, two, three... promoters at the same time.
 * the potential of this model lies in the possibility to model single as well as many pathways in combination and even check for synergistic effects
 * modeling logic: quantitative ODE VS. quantitative & qualitative fuzzy logic
 * "error checking"
 * what to capture/measure: affinity of transcription factor binding to DNA
 * calculate score / reliability
 * phenotypic measurement
 * if we have time in the end: model/experiment optimization by wetlab-drylab-rounds (GRAPHIC)
 * if we do not have much time: figure out where the catch is
 * modeling layers & final visualization
 * (i) capture affinity - (ii) model gene expression - (iii) pathway activity - (iv) fancy visualization (Mathworks Simulink?)
 * plot: time course, dynamic affinity
 * keep in mind the possible high amount of False Positives using promoter search/analysis

8-25-2009

 * official Team Meeting also with Mr. Kai Ludwig (LANGE + PFLANZ) as guest for Logo / Title Claim discussion


 * so far we have 1753 promoter sequences analyzed by PromoterSweep!


 * Meeting with Daniela (Nao): Cell Profiler for capturing biological images & data analysis based on MATLAB


 * working with the R package RMySQL as the pipeline between R and MySQL
 * create a list of useful RMySQL commands

8-26-2009

 * Workflow for plotting histograms (SOURCE CODE/S?)
 * make MySQL query using R
 * make list of TFs, avoid duplicates using perl
 * pick up each TF (perl/R) and plot histogram (R)


 * create MySQL command list including combinatorial queries

8-27-2009

 * check HEARTBEAT DB for duplicate entries
 * how should we plot the histogram?
 * (a) histogram - how "wide" should each bin be? 100bp? 50bp? 20bp?
 * (b) plot probability density
 * study Transfac PWM (position weight matrices) for
 * difference in consensus sequences (also ask Anna-Lena)
 * different PWM types (vertebrates, plant, insect, fungi, bacteria, nematodes...)
 * positive control: when histograms are generated and plotted, check distribution of Sp1


 * so far we have 3640 promoter sequences "sweeped"!


 * access from R to mysql at the local@host server established
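The bin-width question raised above can be explored with a small sketch. It is plain Python (the project used R for plotting) and assumes positions are given as negative offsets in bp relative to the TSS, covering 1000 bp upstream; both the convention and the function names are assumptions made for illustration.

```python
from collections import Counter


def bin_positions(positions, bin_width, upstream=1000):
    """Count TFBS positions per bin; returns (bin_start, count) pairs."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for pos in positions:
        if -upstream <= pos < 0:
            # Left edge of the bin that contains this position.
            left = -upstream + ((pos + upstream) // bin_width) * bin_width
            counts[left] += 1
    return [(start, counts.get(start, 0))
            for start in range(-upstream, 0, bin_width)]


def to_density(hist, bin_width):
    """Rescale counts so that the bars integrate to 1 (option b)."""
    total = sum(count for _, count in hist)
    if total == 0:
        return [(start, 0.0) for start, _ in hist]
    return [(start, count / (total * bin_width)) for start, count in hist]
```

Running `bin_positions` on the same hits with widths of 100, 50 and 20 bp shows the usual trade-off: wide bins smooth out noise but can hide a narrow peak near the TSS, while narrow bins need many hits per TF to be readable.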

8-28-2009

 * dealing with perl - work out how to pass variables between perl and R


8-31-2009

September

9-1-2009

 * derive transcription factor data using R and MySQL
 * plot HEARTBEAT TF hit distribution as histograms & density functions for different PWM subsets (all, vertebrates only, single matrices and joined TFs)
 * further completion of the database

9-2-2009

 * discussion on how to analyse the distributions we obtained statistically
 * ideas: define maximum and variance -> Nao
 * look for motif sequences -> Tim


 * we have 4476 sequences analysed by Promotersweep so far!
 * but we are expecting 4700 sequences - check missing ones!

9-3-2009

 * internal team meeting: Tim, Lars, Stephen, Nao
 * select especially interesting TFs
 * criteria: (a) good hits in our distributions; (b) easy experimental handling
 * we go for HIF, SREBP and VDR to analyse and make synthetic promoter design
 * Transfac PWM: the annotation of some matrices is problematic
 * which "spacer" sequences should we use in order to generate TFBS-free sequence parts?


 * rational design of synthetic promoters
 * Tim: SREBP, Nao: VDR
 * both go for a total number of 10 sequences
 * strategies:
 * single TFs: search for density maxima
 * check combinatorial appearance and design promoter sequences with multiple binding TFs
 * use spacer sequences generated by Lars and check for TFBS using Transfac
 * sequence length: max. 1000bp


 * back-up idea: if synthesis does not work for a long (~1000bp) sequence then try to work out a protocol for a two-step promoter synthesis combining one empty (TFBS free) sequence with another which consists of many TF and activator binding sites.

9-4-2009

 * work with Transfac PWM: structure, description, and using consensus sequence
 * write script to get the IDs and frequencies for all co-occurring TFBS of VDR and SREBP
 * write script for generating consensus sequence based on Transfac PWM and replacing ambiguity code with A, C, G or T (Getconsensus.pl, MakeConsensus.pl)
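The consensus step can be sketched in Python as below. This is not the content of Getconsensus.pl or MakeConsensus.pl, which is not shown in the notebook. The thresholds used (one base if it exceeds 50% and is at least twice as frequent as the runner-up, a two-base IUPAC code if the top two together exceed 75%, otherwise N) are an assumed rule, and resolving ambiguity by taking the most frequent base is one possible choice among several.

```python
BASES = "ACGT"

# IUPAC codes for one- and two-base sets.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def iupac_consensus(pwm):
    """pwm: list of dicts (one per position) mapping base -> count."""
    out = []
    for column in pwm:
        total = sum(column.get(b, 0) for b in BASES)
        if total == 0:
            out.append("N")
            continue
        ranked = sorted(BASES, key=lambda b: column.get(b, 0), reverse=True)
        first, second = ranked[0], ranked[1]
        f1 = column.get(first, 0) / total
        f2 = column.get(second, 0) / total
        if f1 > 0.5 and f1 >= 2 * f2:
            out.append(first)
        elif f1 + f2 > 0.75:
            out.append(IUPAC[frozenset((first, second))])
        else:
            out.append("N")
    return "".join(out)


def concrete_consensus(pwm):
    """Replace ambiguity by the most frequent base (ties: A before C, G, T)."""
    return "".join(max(BASES, key=lambda b: col.get(b, 0)) for col in pwm)
```

For promoter design the concrete version is the one that can be synthesized; the IUPAC version is mainly useful to see how degenerate each position of the matrix is.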


 * Wiki Meeting (Nao)
 * Logo choice & modification
 * choose header pics
 * navigation layout
 * develop a catchy, cool homepage

9-5-2009

 * Meeting with Tim, design synthetic promoter sequences
 * check spacer sequence (200bp) for TFBS: one TFBS found; remove it by cutting, which shortens the sequence to 190bp
 * Kid3 is a repressor!

9-6-2009

 * design more synthetic promoter sequences by a manual iterative process consisting of (i) TFBS check and (ii) TFBS removal & filling up with random sequence


 * aim: creation of an automatic design tool for synthetic promoters which includes sequence design, Transfac search as well as filling the sequence up with spacer sequences.


9-7-2009

 * check designed sequences for restriction sites (CheckRestrictionsites.pl)
 * finish creating sequences
 * take the CMV core promoter into account when calculating the relative position of TFBS to the TSS
 * create sequences for negative control
 * pure TFBS free sequence
 * sequences with TFBS at minima of the density function
 * checking all sequences for further binding sites with the Transfac Match tool
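A restriction-site check of the kind described above might look like the following Python sketch. It is not CheckRestrictionsites.pl itself. The enzyme list is an assumption (the four sites of the BioBrick prefix and suffix); the notebook does not say which enzymes were checked. Both strands are scanned by also searching for the reverse complement of each recognition sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumed enzyme set: EcoRI, XbaI, SpeI and PstI recognition sequences.
SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def reverse_complement(seq):
    """Return the reverse complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, sites=SITES):
    """Return sorted (enzyme, 0-based start, strand) hits on both strands."""
    seq = seq.upper()
    hits = []
    for name, site in sites.items():
        # A palindromic site equals its reverse complement, so the set
        # holds one pattern and the hit is reported once, as "+".
        for pattern in sorted({site, reverse_complement(site)}):
            strand = "+" if pattern == site else "-"
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

An empty result for a designed promoter means none of the listed sites occurs on either strand, so the sequence will not be cut during assembly with these enzymes.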

9-8-2009

 * check restriction sites for reverse complementary strand
 * add flanking sites with restriction sites and spacer nucleotides to our designed sequences
 * is there any possibility to automate Transfac queries?
 * work with combined / joined MySQL query structures
 * or solve this process by simply writing new temporary tables?


 * workflow summary (short) for manual designing of a synthetic promoter:
 * (A) use random sequence
 * (B) check TF-matrices
 * (C) validate TFs (mouse? human? repressor?)
 * (D) check Transfac and restriction sites


 * Phone conference with Kai Ludwig, Logo & Web Design (Nao)


 * official Team Meeting


 * wiki closure on Oct 21st!

9-9-2009

 * modify synthetic promoter sequences to be ready for ordering
 * Sweep more promoter sequences using Promotersweep
 * start Modeling
 * revise and improve HEARTBEAT
 * discuss differences between PWMs

9-10-2009

 * still modifying synthetic sequences to be ready for shipping
 * we have altogether 25 designed promoter sequences!

9-11-2009

 * Software Meeting (Stephen, Tim, Nao)
 * compatibility with mediawiki: HTML, perl, php, R, java?
 * GUI design
 * simple interface: single TF, auxiliary TFs, #TFBS, sequence length
 * "interactive": multiple TF, choosing auxiliary TFs, additional information (see Eukaryopedia), density function plot & histogram
 * "hyper-interactive" step-by-step design & creation


 * Modeling Meeting with Marti and Anna-Lena (Tim, Nao)
 * aim: fancy visualization to show expectation & prediction providing pathway insights
 * TODO/QUESTIONS
 * what is the stimulus? collect possible inputs!
 * measurable outcome: experiments & pathways
 * quality of synthetic sequence: error checking
 * we need to define the quality of our sequences
 * LEVELS of modeling
 * (1) DNA (2) expression/transcriptional activity (3) output
 * each with corresponding measurement


 * general modeling scheme: input - "What we are affecting" - possible outcomes
 * how? We use fuzzy logic


9-14-2009

 * collect input for inducing the system (e.g. p53: CPT, Pifithrin-alpha; NFkB: TNF-alpha etc.)
 * phone conference with Kai Ludwig
 * learn how to include Perl code into html code
 * learn how to use embperl
 * configure apache2 server such that embperl can be interpreted
 * try to make offline use of embperl working
 * try to find nice html editor for ubuntu - (seamonkey, Amaya)

9-15-2009

 * create network picture for meeting tomorrow
 * Logo discussion
 * Read paper: Fuzzy Logic Modeling of Signaling Networks (Aldridge 2009)
 * learn data management of virtual server
 * get an overview about the apache2 file and security system

9-16-2009

 * Modeling Meeting with Marti (Douaa, Tim, Nao)
 * update on available drugs/sequences
 * decide what to model: (A) error checking, and (B) differential expression?
 * use natural promoters to build up model for prediction of activity of synthetic promoters
 * Discussion of TF score
 * Transfac sequence alignment score
 * promotersweep binding site quality
 * relative position to TSS: How?
 * (A) peak width & amplitude, (B) distance to maximal peak & position, (C) number of PEAK, (D) "sliding window" and calculate area under curve, (E) #TFBS (also for comparison of different synthetic promoters)
 * biophysical affinity using TRAP
 * first model: build up either on CMV or on JeT
 * potential: integrate many stimuli -> find out crosstalks of pathways?


 * TODO (meeting)
 * collect data
 * define WHAT we want to model
 * summarize available sequences
 * try to formulate IF ... THEN "sentences"
 * check MATLAB & MATLAB Fuzzy Logic Toolbox availability

9-17-2009

 * internal Team Meeting
 * find error.log files on the server and learn how to use it

9-18-2009

9-20-2009

 * learn how to use tag language of embperl
 * learn how to write loops with embperl
 * access of input variables in embperl -- using the %fdat hash

9-21-2009

 * struggling with how to use R from embperl

9-22-2009

 * Wiki Meeting (Dani, Cori, Nao)
 * install image processing tool
 * design wiki, brainstorming for possible navigation bars
 * Wiki Phone Meeting with Kai Ludwig (Nao)
 * design header & presentation-master as well as team shirts


 * Seminar: Martijn Luijsterburg (Karolinska Institute) - Heterochromatin Protein 1 is involved in the DNA damage response. Host: Thomas Höfer, Bioquant

9-23-2009

 * Modeling Meeting with Marti, Anna-Lena (Tim, Nao)
 * contact database group (TP3)
 * statistics: characterizing peaks
 * we go for area under the curve and affinity. optionally we can choose Transfac sequence score and peak height & width
 * strategy to convince the wetlab people of the importance of modeling at the meeting this coming Friday.
 * MATLAB license?
 * logical gates: try to start creating model topology after Friday


 * Presentation: Marti Bernado Faura (Bioquant, University of Heidelberg): Data-driven Fuzzy Logic modeling of Programmed Cell Death
 * intro into fuzzy logic
 * system development & work flow of fuzzy logic
 * fuzzy inference & model prediction
 * model types: MISO / MIMO


 * Wrap-up meeting: Team HEARTBEAT (Tim, Nao)
 * split up computational work into three tracks: HEARTBEAT DB, HEARTBEAT GUI and modeling
 * database: documentation (until Oct 18), peak characterization, calculate absolute density function
 * GUI: based on embperl, design according to our new wiki
 * modeling: MATLAB license, collect sequences & input data, develop network model, include pathways


 * literature work

9-24-2009

 * prepare slides for meeting tomorrow
 * pathway search: TNF-alpha/NFkB, VDR, SREBP and crosstalks. NFkB has a lot of pathway crosstalks, while SREBP and VDR show an interesting connection. Upon induction, SREBP activates VDR.

9-25-2009

 * Team Meeting (Wetlab, Nao)
 * short progress report of all of us
 * modeling: discussing scheme, modeling elements and strategies


9-28-2009

 * Wiki Phone Meeting with Kai Ludwig (Nao)

9-29-2009

 * designed synthetic promoters (HB_0001 - HB_0025) will be joined to the CMV core promoter since the JeT core promoter contains an Sp1 site. All other sequences (e.g. randomly synthesized ones) are coupled with the JeT core promoter.
 * literature studies on combinatorial cis-regulation as well as on modeling of the lambda-switch
 * prepare slides for the next modeling meeting

9-30-2009

 * Wiki Meeting (Dani, Nao)
 * MATLAB license order (Jens)
 * postpone Yara meeting (Wetlab, Tim)


 * got sequences from Lars
 * got qRT-PCR setup from Chenchen


 * Modeling Meeting with Marti & Anna-Lena (Tim, Nao)
 * still need to collect FACS and microscopy results
 * discuss our network prediction model using TNF-alpha as an example
 * maybe we can use the lambda switch paper as a good starting point for our modeling


10-1-2009

 * Wiki & Presentation Meeting with Dani (Nao)

10-2-2009

 * some wiki work


10-6-2009

 * Internal Team Meeting
 * check out number and measurement plans of randomly assembled synthetic promoters (5x NFkB, 5x p53, 2x pPARg, 2x SREBP)


 * Wiki Meeting (Corinna, Daniela, Nao)
 * discuss design of the top page and possible features
 * try out CSS design

10-7-2009

 * Wiki Design (Nao)
 * Wiki Phone Meeting with Kai Ludwig (Nao)


 * MATLAB has arrived!
 * literature work


 * Wetlab Meeting: progress report on measurement of random assembled synthetic promoters
 * think about the whole storyboard of our presentation at the jamboree

10-8-2009

 * Short Meeting with Roland
 * image processing work for wiki

10-9-2009

10-13-2009

 * Measurement discussion with Lars: REU/RMPU, defining equations for mammalian systems
 * literature work on PoPS paper (Kelly JR et al.) and apply their equations


 * Marti Modeling Meeting (Anna-Lena, Tim, Nao)
 * Journal Club (Tim, Nao)
 * summary of last Thursday's Team Meeting


 * Marti: start modeling using MATLAB and Fuzzy Logic Toolbox (FLT), playing around with FLT and tutorial

10-14-2009
Nao
 * develop a first fuzzy inference system (FIS) for testing
 * Marti Modeling Meeting, specify model topology
 * collect data: FACS (Cori), Microscopy (Hannah), Sequence & TECAN (Lars)
 * start calculating position score using R
 * translating project abstract

10-15-2009
Nao
 * calculate affinity score using TRAP (Anna-Lena)
 * collect ideas for integrating TF-wise scores in order to calculate final position/affinity score for one sequence: median, mean, maximum, weighted mean?
 * all data analysis is stored in three sheets (SequenceAnalysis, ResultSummary and CalculateTRAP)
 * from now on we concentrate on FACS measurements because they are the most reliable ones (TECAN used only for scanning)
 * fill up TRAP data with missing transcription factors
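The candidate ways of integrating TF-wise scores into one value per sequence can be compared side by side with a short Python sketch. The score values and weights in any real run would come from the analysis sheets; the function name and the idea of weighting (for example by TFBS quality) are assumptions made for illustration.

```python
from statistics import mean, median


def integrate_scores(scores, weights=None):
    """Summarize per-TF scores of one sequence with several aggregates."""
    if not scores:
        raise ValueError("at least one score is required")
    summary = {
        "mean": mean(scores),
        "median": median(scores),
        "maximum": max(scores),
    }
    if weights is not None:
        if len(weights) != len(scores):
            raise ValueError("weights and scores must have equal length")
        weight_sum = sum(weights)
        if weight_sum <= 0:
            raise ValueError("weights must sum to a positive value")
        summary["weighted_mean"] = (
            sum(s * w for s, w in zip(scores, weights)) / weight_sum
        )
    return summary
```

The maximum reflects only the single best binding site, whereas mean and median are pulled down by many weak sites, which matters when sequences with different numbers of TFBS are compared.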

10-16-2009
Nao
 * Anna-Lena Meeting: discuss how to integrate sequence scores
 * get & check p53, pPARg and random SREBP sequences
 * go through FACS results
 * add HEARTBEAT sequences for data analysis
 * modeling documentation
 * parsing experimental setups for modeling use
 * Chenchen qRT-PCR results


 * define possible modeling layers
 * first layer: input
 * drug type, pathway, drug mode of action, drug concentration, targeted cells, incubation time
 * sequence type, position score, affinity score
 * we choose position & affinity score, sequence type and the presence of stimulation. Time as well as different concentration (unfortunately no data available) can be added in future
 * second layer: promoters
 * 6 constitutives, 3 standards, 6 inducible available
 * data analysis narrows this to 5 constitutives, 3 standards and 4 inducible
 * HEARTBEAT sequences have to be measured a.s.a.p.


 * Marti Modeling Meeting
 * try to define some fuzzy rules
 * we assume better binding -> better expression
 * define membership functions
 * start modeling with NFkB results
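The rule "better binding -> better expression" can be written down as a tiny fuzzy system. The sketch below is plain Python, not the MATLAB Fuzzy Logic Toolbox model built in the project. It assumes a binding score normalized to [0, 1], three triangular membership functions and invented output levels, and it uses a simple zero-order Sugeno-style weighted average instead of Mamdani defuzzification.

```python
def triangle(x, a, b, c):
    """Triangular membership: 0 at feet a and c, 1 at peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed membership functions over a binding score in [0, 1].
MEMBERSHIP = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Rules: IF binding is X THEN expression is the level below (invented values).
EXPRESSION_LEVEL = {"low": 0.1, "medium": 0.5, "high": 0.9}


def fuzzify(score):
    """Degree of membership of the score in each fuzzy set."""
    score = min(1.0, max(0.0, score))
    return {name: triangle(score, *abc) for name, abc in MEMBERSHIP.items()}


def predict_expression(score):
    """Weighted average of rule outputs, weighted by rule activation."""
    degrees = fuzzify(score)
    total = sum(degrees.values())
    return sum(degrees[n] * EXPRESSION_LEVEL[n] for n in degrees) / total
```

With these shapes the memberships sum to 1 for every score in [0, 1], so the prediction rises smoothly from 0.1 to 0.9; changing the shape of the membership functions is what later tunes the model to the measured data.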

All
 * Internal Team Meeting
 * reminder: wiki task, wiki to do
 * Official Team Meeting

10-17-2009

 * final decision: we go for maximum of position and affinity score
 * added HB sequences for data analysis table; as soon as results are there we can model designed synthetic promoters
 * define shape of membership functions
 * literature search for missing activity values?
 * still TODO: check out p53 results since the p53-NFkB crosstalk is really interesting!

10-18-2009
Nao
 * SREBP/VDR paper arrived
 * finish data analysis
 * studying & playing around with MATLAB FLT, programming from both FLT GUI and MATLAB command line
 * define our work to be (i) error checking and (ii) exclusive pathway modeling
 * high potential of this model lies in its plug'n'play structure, with a high capacity for integrating more inputs, outputs and also the middle layer (promoter diversity)

10-19-2009
Nao
 * define final network structure
 * wiki work
 * reading RFC documentation and correction
 * we call this project HEARTBEAT fuzzy network (FN)
 * HB FN documentation and first results!
 * creating two fuzzy controllers: inducible NFkB and constitutive
 * how do we integrate the data? combine via Simulink!

10-20-2009

 * Creating, developing, integrating and combining fuzzy network modeling (MATLAB, Simulink)
 * first analysis of HB sequences
 * HEARTBEAT FN documentation

10-22-2009

 * FROZEN WIKI!!!!



 * width="250px" style="padding: 0 20px 15px 15px; background-color:#d8d5d0"|


 * }