Team:Heidelberg/Notebook modeling

From 2009.igem.org

Notebook HEARTBEAT

Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Sites Assembly and Engineering Tool) project. This notebook covers the work on three sublanes: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), plus some additional work on logo and wiki design. Have fun!

July

7-27-2009

  • Meeting with Oliver Pelz
    • Discuss general ideas of our Database Structure and Content
    • An introduction to PromoterSweep (LINK). PromoterSweep screens a given sequence for conserved regions, yielding consensus sequences, and then screens these for TFBS using database searches (TRANSFAC, JASPAR) (LINK)
    • Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
    • We decide on MySQL as an appropriate database system for this task; it also allows a graphical representation of the database on the web later.
    • GUI on wiki: which language? php? javascript?
    • Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of Promoter Database (DoOP, UCSC, EnsEMBL) (LINK)
  • aim: create the database by the end of August

August

Week  Mon  Tue  Wed  Thu  Fri  Sat  Sun
31    -    -    -    -    -    1    2
32    3    4    5    6    7    8    9
33    10   11   12   13   14   15   16
34    17   18   19   20   21   22   23
35    24   25   26   27   28   29   30
36    31   -    -    -    -    -    -

8-3-2009

  • First contact with MySQL
  • Start making an overview of other teams' projects
  • Configuring our Virtual Server

8-4-2009

  • Official Team Meeting (LINK) @ BQ seminar room 43: preparing presentation & writing meeting report
  • Start installing development environment on our internal server
    • GNOME
    • Mediawiki

8-5-2009

  • Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
    • Integrating ideas of PromoterSweep, Transfac as well as DoOP/CisRED
    • select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for Wetlab
    • select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc)
    • future experimental validation: ChIP-on-Chip
      • for this we need a TFBS-free sequence
    • idea: plot histogram of TFBS relative to TSS
      • problem: choice of sequence: upstream only? include downstream?
    • new programming languages: R and perl
    • next meeting: Friday after team meeting
  • Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
    • An introduction into PromoterSweep
    • Structure and analysis principles of PromoterSweep
    • Output is stored in an XML file. This means we have to parse the xml code.
    • Oliver Pelz will help us with the programming
  • Protocol of the meeting can be downloaded from here.
  • Start working with MySQL
  • request UNIX/HUSAR/HPC access at DKFZ (Nao)
  • first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, contra

8-6-2009

  • Meeting with Oliver Pelz
    • defining workflow with PromoterSweep, Matrix Profile Search and introduction into different Motif Discovery Algorithms
  • installation of NX server for access to the internal server from Windows
  • configure development environment (printing from Linux, configure Mediawiki)
  • defining basic concept of database construction
    • we select annotated promoter sequences in DoOP
    • we make a selection of pathway of interest using KEGG
    • narrow down the number of target promoter sequences to fewer than 10,000.

8-7-2009

  • Official Team Meeting on Scheduling
  • Meeting with Anna-Lena and Tobias
    • Introduction into R
    • Tobias will give us access to their computing cluster (Group Roland Eils)
    • Promoter Selection: DoOP, EnsEMBL, or UCSC?
  • HUSAR account arrived
  • installation of R, R editor and perl editor
  • further configuration of our internal server / mediawiki
  • writing first perl program - "Hi there"

8-10-2009

  • first contact with R and perl
  • playing around with R and perl
  • playing around with R library: Biobase
  • check working on DKFZ cluster

8-11-2009

  • defining programming languages: perl, R, MySQL
  • retrieving first Promotersweep output files
  • Meeting with Marti
    • ideas for modeling
      • we will have at least three colors which overlap in their spectra.
      • a very nice approach will be Fuzzy Logic Modeling.
      • idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
      • idea 2: create&visualize fancy and fuzzy data from in silico simulation
    • combine: promoter, output and graphic representation
    • next meeting with Marti: end of next week.
  • extract NCBI Entrez Gene IDs with R and perl
  • MAC addresses registered for the Bioquant network

8-12-2009

  • configure perl working environment
  • study structure of DoOP database
  • download DoOP and load DoOP database into MySQL

8-13-2009

  • trying out some DoOP queries
  • download fasta sequences from UCSC gene browser
  • mapping of NCBI Entrez Gene IDs with RefSeq IDs
  • configure perl working environment on Windows XP
  • contact Endre Sebestyen concerning the perl module Bio-DoOP-DoOP

8-14-2009

  • parse UCSC fasta sequences according to our selection
  • write parsed sequences into multifasta format
  • start PromoterSweep Analysis over Weekend
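
The parsing and multifasta steps of 8-14 could look roughly like the sketch below. It is written in Python purely for illustration (the notebook's own scripts were in perl), and the header layout, the function names and the example IDs are assumptions, not the team's actual code.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records, wanted_ids):
    """Keep records whose first header token is in wanted_ids (assumed layout)."""
    for header, seq in records:
        if header.split()[0] in wanted_ids:
            yield header, seq


def to_multifasta(records, width=60):
    """Format records as one multi-FASTA string with wrapped sequence lines."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical input: IDs and sequences are made up for the example.
example = """>NM_000001 demo promoter A
ACGTACGT
ACGT
>NM_000002 demo promoter B
TTTTGGGG
""".splitlines()

selected = list(select_records(read_fasta(example), {"NM_000001"}))
multifasta = to_multifasta(selected)
```

Here only the first record survives the selection, and its two sequence lines are joined before being written out again.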

8-18-2009

Tim, Stephen, from here on you have to enter your own items yourselves!

  • study the output file of PromoterSweep: check the general structure and pick out useful information.
  • result is grouped in: General Info, Best Genomic Mapping, Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
  • upon selection, sections of interest will be collected and made ready for entry into MySQL DB
  • discuss table structure of our database
  • What should our database be called? - Brainstorming -
    • SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
    • MAY contain: position, heartbeat, prediction, assembly, eukaryotes
    • and still more keywords to come
  • establishing local@host access to mysql

8-19-2009

  • parse Promotersweep xml file into tab-separated text file
    • the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS Quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
    • this raised several programming problems with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), therefore:
  • studying structure and basic concepts of hash & key
  • including parsed data into mysql database
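
As a rough illustration of the XML-to-text step, the following Python sketch flattens a nested XML file into tab-separated rows. The real PromoterSweep schema is not shown in this notebook, so every element and attribute name below is hypothetical, and only a subset of the listed columns is included.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout: the real PromoterSweep schema is not shown here.
EXAMPLE_XML = """
<promotersweep>
  <promoter refseq="NM_000001" tss="1000">
    <site tf="Sp1" start="850" end="859" motif="GGGGCGGGGC" quality="good"/>
    <site tf="HIF1" start="400" end="407" motif="TACGTGCG" quality="weak"/>
  </promoter>
</promotersweep>
"""

COLUMNS = ["refseq", "tf", "start", "end", "motif", "quality", "tss"]


def xml_to_rows(xml_text):
    """Flatten promoter/site elements into one dict per binding site."""
    root = ET.fromstring(xml_text)
    rows = []
    for promoter in root.iter("promoter"):
        for site in promoter.iter("site"):
            row = dict(site.attrib)                  # per-site fields
            row["refseq"] = promoter.get("refseq")   # inherited from the parent
            row["tss"] = promoter.get("tss")
            rows.append(row)
    return rows


def rows_to_tsv(rows):
    """Write rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(EXAMPLE_XML)
tsv = rows_to_tsv(rows)
```

Each row is a plain dictionary, the Python counterpart of the perl hashes mentioned above; the list of rows corresponds to an array of hashes.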

8-20-2009

  • pre-decision for our table-structure
    • Table: Main_Info
      • RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
    • Table: Gene_Info
      • Ensembl_ID, Gene Symbol, Gene Description.
    • we go for the RefSeq ID to be the key connecting these two tables.
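
The two-table layout can be tried out quickly with the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names, types and example rows are assumptions derived from the field list above, not the final HEARTBEAT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE Gene_Info (
    refseq_id   TEXT PRIMARY KEY,
    ensembl_id  TEXT,
    gene_symbol TEXT,
    description TEXT
);
CREATE TABLE Main_Info (
    refseq_id    TEXT REFERENCES Gene_Info(refseq_id),
    tf           TEXT,
    motif_start  INTEGER,
    motif_end    INTEGER,
    motif_score  REAL,
    tfbs_quality TEXT,
    tss_source   TEXT
);
""")

# Made-up example rows.
conn.execute("INSERT INTO Gene_Info VALUES (?, ?, ?, ?)",
             ("NM_000001", "ENSG_DEMO", "DEMO1", "made-up example gene"))
conn.executemany("INSERT INTO Main_Info VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("NM_000001", "Sp1", 850, 859, 0.95, "good", "DoOP"),
    ("NM_000001", "HIF1", 400, 407, 0.80, "weak", "DoOP"),
])

# Join the two tables on the shared RefSeq ID.
joined = conn.execute("""
    SELECT g.gene_symbol, m.tf, m.motif_start
    FROM Main_Info AS m
    JOIN Gene_Info AS g ON g.refseq_id = m.refseq_id
    ORDER BY m.motif_start
""").fetchall()
```

Keeping the gene description in its own table avoids repeating it for every binding site of the same promoter.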

8-21-2009

  • update script for parsing the Promotersweep output files due to unexpected errors
  • we forgot to include "weak" as a category for the TFBS quality - added!
  • PromoterSweep result contains information about TSS derived from different promoter databases. On which should we rely, if they differ from each other?
    • We give the DoOP database the highest priority, since its entries agree well with the RefSeq ID results when compared to other databases (e.g. DBTSS).
  • search for a tool to use MySQL in R programming environment
  • wiki: write a short article about the German Cancer Research Center (DKFZ)
  • Meeting with Anna-Lena: once we established our database... then
    • two strategies:
      • manually select interesting transcription factors and analyse them using database queries
      • plot histograms of TFBS occurrence within the target promoter sequence (TSS - 1000bp upstream) for each TF and make a systematic analysis
    • we go for both!
    • idea for the future: we can analyze combinatorial appearance of distinct TF pairs
  • We have a name for our database - we call it -


- wait for it -


HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)


8-24-2009

  • Meeting with Marti: defining output modeling strategies
    • "exclusive promoters"
      • a model for predicting the behaviour of activation of one, two, three... promoters at the same time.
      • the potential of this model lies in the possibility to model single as well as many pathways in combination and even check for synergistic effects
      • modeling logic: quantitative ODE VS. quantitative & qualitative fuzzy logic
    • "error checking"
      • what to capture/measure: affinity of transcription factor binding to DNA
        • calculate score / reliability
        • phenotypic measurement
      • if we have time in the end: model/experiment optimization by wetlab-drylab-rounds (GRAFIK)
      • if we do not have much time: figure out where the catch is
    • modeling layers & final visualization
      • (i) capture affinity - (ii) model gene expression - (iii) pathway activity - (iv) fancy visualization (Mathworks Simulink?)
      • plot: time course, dynamic affinity
      • keep in mind the possible high amount of False Positives using promoter search/analysis

8-25-2009

  • official Team Meeting also with Mr. Kai Ludwig (LANGE + PFLANZ) as guest for Logo / Title Claim discussion
  • so far we have 1753 promoter sequences analyzed by PromoterSweep!
  • Meeting with Daniela (Nao): Cell Profiler for capturing biological images & data analysis based on MATLAB
  • working with R module RMySQL for using the pipeline between R and MySQL
  • create a list of useful RMySQL commands

8-26-2009

  • Workflow for plotting histograms (SOURCE CODE/S?)
    • make MySQL query using R
    • make list of TFs, avoid duplicates using perl
    • pick up each TF (perl/R) and plot histogram (R)
  • create MySQL command list including combinatorial queries
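
The three workflow steps (query, duplicate-free TF list, one histogram per TF) can be sketched as follows. The notebook splits this work between perl and R; the version below is a single Python illustration, and the hit list stands in for a real query result with made-up values.

```python
from collections import defaultdict

# Hypothetical query result: (TF name, TFBS position relative to the TSS in bp).
hits = [("Sp1", -60), ("Sp1", -45), ("HIF1", -400), ("Sp1", -950), ("HIF1", -380)]


def positions_by_tf(rows):
    """Group positions per TF; the dict keys form the duplicate-free TF list."""
    grouped = defaultdict(list)
    for tf, pos in rows:
        grouped[tf].append(pos)
    return dict(grouped)


def histogram(positions, bin_width=100, start=-1000, stop=0):
    """Count positions per bin over the half-open interval [start, stop)."""
    n_bins = (stop - start) // bin_width
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < stop:
            counts[(pos - start) // bin_width] += 1
    return counts


grouped = positions_by_tf(hits)
tf_names = sorted(grouped)              # each TF appears exactly once
sp1_counts = histogram(grouped["Sp1"])  # ten 100bp bins from -1000 to 0
```

With these example values the two Sp1 sites close to the TSS fall into the last bin and the distal one into the first.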

8-27-2009

  • check HEARTBEAT DB for duplicate entries
  • how should we plot the histogram?
    • (a) histogram - how "wide" should each bin be? 100bp? 50bp? 20bp?
    • (b) plot probability density
  • study Transfac PWM (position weight matrices) for
    • difference in consensus sequences (also ask Anna-Lena)
    • different PWM types (vertebrates, plant, insect, fungi, bacteria, nematodes...)
    • positive control: when histograms are generated and plotted, check distribution of Sp1
  • so far we have 3640 promoter sequences "sweeped"!
  • access from R to mysql at the local@host server established
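
To compare the candidate bin widths on an equal footing, counts can be normalized to a density so that the area under each histogram is 1. The sketch below is a minimal Python illustration with made-up positions; it is not the R code used for the actual plots.

```python
def binned_density(positions, bin_width, start=-1000, stop=0):
    """Return per-bin density so that sum(density) * bin_width == 1."""
    n_bins = (stop - start) // bin_width
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < stop:
            counts[(pos - start) // bin_width] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / (total * bin_width) for c in counts]


# Made-up TFBS positions relative to the TSS (bp).
positions = [-60, -45, -55, -950, -400, -380]

# One density per candidate bin width from the list above.
densities = {width: binned_density(positions, width) for width in (100, 50, 20)}
```

Narrow bins resolve peak positions more finely but leave fewer hits per bin, which is the trade-off behind the 100bp / 50bp / 20bp question.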

8-28-2009

  • dealing with perl - passing variables between perl and R

8-31-2009

September

Week  Mon  Tue  Wed  Thu  Fri  Sat  Sun
36    -    1    2    3    4    5    6
37    7    8    9    10   11   12   13
38    14   15   16   17   18   19   20
39    21   22   23   24   25   26   27
40    28   29   30   -    -    -    -

9-1-2009

  • derive transcription factor data using R and MySQL
  • plot HEARTBEAT TF hit distribution as histograms & density functions for different PWM subsets (all, vertebrates only, single matrices and joined TFs)
  • further completion of the database

9-2-2009

  • discussion on how to make statistical studies on our gained distributions
    • ideas: define maximum and variance -> Nao
  • look for motif sequences -> Tim
  • we have 4476 sequences analysed by Promotersweep so far!
    • but we are expecting 4700 sequences - check missing ones!

9-3-2009

  • internal team meeting: Tim, Lars, Stephen, Nao
    • select especially interesting TFs
      • criteria: (a) good hits in our distributions; (b) easy experimental handling
      • we go for HIF, SREBP and VDR to analyse and make synthetic promoter design
  • Transfac PWM: some matrices have annotation inconsistencies
  • which "spacer" sequences should we use in order to generate TFBS-free sequence parts?
  • rational design of synthetic promoters
    • Tim: SREBP, Nao: VDR
    • both go for a total number of 10 sequences
    • strategies:
      • single TFs: search for density maxima
      • check combinatorial appearance and design promoter sequences with multiple binding TFs
    • use spacer sequences generated by Lars and check for TFBS using Transfac
    • sequence length: max. 1000bp
  • back-up idea: if synthesis does not work for a long (~1000bp) sequence then try to work out a protocol for a two-step promoter synthesis combining one empty (TFBS free) sequence with another which consists of many TF and activator binding sites.
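
The "combinatorial appearance" check from the design strategies above amounts to counting which other TFs have binding sites in the same promoters as a chosen TF. A minimal Python sketch, with a made-up promoter-to-TF mapping in place of real database output:

```python
from collections import Counter

# Hypothetical hits: promoter ID -> set of TFs with a predicted binding site.
promoter_tfs = {
    "NM_000001": {"VDR", "Sp1", "SREBP"},
    "NM_000002": {"VDR", "Sp1"},
    "NM_000003": {"SREBP", "NFkB"},
}


def cooccurring(promoter_tfs, target):
    """Count, over all promoters containing `target`, every other TF present."""
    counts = Counter()
    for tfs in promoter_tfs.values():
        if target in tfs:
            counts.update(tfs - {target})
    return counts


vdr_partners = cooccurring(promoter_tfs, "VDR")
```

In this toy data set Sp1 shares two promoters with VDR and SREBP shares one; `vdr_partners.most_common()` would rank the partners by frequency.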

9-4-2009

  • work with Transfac PWM: structure, description, and using consensus sequence
  • write script to get the IDs and frequencies of all co-occurring TFBS of VDR and SREBP
  • write script for generating consensus sequence based on Transfac PWM and replacing ambiguity code with A, C, G or T
    Getconsensus.pl, MakeConsensus.pl
  • Wiki Meeting (Nao)
    • Logo choice & modification
    • choose header pics
    • navigation layout
    • develop a catchy, cool homepage
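
The consensus scripts from this entry (Getconsensus.pl, MakeConsensus.pl) are not reproduced in the notebook. The Python sketch below only illustrates the general idea under stated assumptions: the threshold rule for choosing an IUPAC symbol is a simplified one chosen for the example, and ambiguous positions are resolved to their most frequent base, which may differ from what the original scripts did. The count matrix is made up.

```python
# Two-base IUPAC ambiguity codes.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus_symbol(column):
    """Pick a consensus symbol for one PWM column (dict base -> count).

    Simplified rule, assumed for illustration: a single base if it holds more
    than 50% and at least twice the runner-up; a two-base code if the top two
    bases together exceed 75%; otherwise N.
    """
    total = sum(column.values())
    ranked = sorted(column, key=lambda base: (-column[base], base))
    first, second = ranked[0], ranked[1]
    if column[first] > 0.5 * total and column[first] >= 2 * column[second]:
        return first
    if column[first] + column[second] > 0.75 * total:
        return IUPAC[frozenset(first + second)]
    return "N"


def resolve(column):
    """Replace an ambiguous position by its most frequent base (ties: alphabetical)."""
    return min(column, key=lambda base: (-column[base], base))


# Made-up count matrix with four positions.
pwm = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
    {"A": 0, "C": 1, "G": 1, "T": 8},
]

consensus = "".join(consensus_symbol(col) for col in pwm)  # with ambiguity codes
resolved = "".join(resolve(col) for col in pwm)            # plain A/C/G/T only
```

The resolved string is what can be placed directly into a designed promoter sequence, since a synthesized sequence cannot contain ambiguity codes.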

9-5-2009

  • Meeting with Tim, design synthetic promoter sequences
  • check spacer sequence (200bp) for TFBS: one TFBS found; remove it by cutting it out, shortening the sequence to 190bp
  • Kid3 is a repressor!

9-6-2009

  • design more synthetic promoter sequences by manual iteration process which consists of (i) TFBS check and (ii) TFBS removal & filling up random sequence
  • aim: creation of an automatic design tool for synthetic promoters which includes sequence design, Transfac search and filling up the sequence with spacer sequences.
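
The manual iteration of (i) TFBS check and (ii) TFBS removal and refilling can be expressed as a simple loop. The Python sketch below is only an illustration: exact string matching against a short, hypothetical motif list replaces the Transfac search, and only the forward strand is checked.

```python
import random

# Hypothetical motif set standing in for a Transfac search.
MOTIFS = ["GGGCGG", "TGACGTCA"]


def find_motif(seq, motifs):
    """Return (start, motif) of the leftmost motif occurrence, or None."""
    best = None
    for motif in motifs:
        idx = seq.find(motif)
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, motif)
    return best


def scrub(seq, motifs, rng, max_rounds=1000):
    """Repeat: (i) check for a motif, (ii) overwrite it with random bases."""
    for _ in range(max_rounds):
        hit = find_motif(seq, motifs)
        if hit is None:
            return seq
        start, motif = hit
        filler = "".join(rng.choice("ACGT") for _ in motif)
        seq = seq[:start] + filler + seq[start + len(motif):]
    raise RuntimeError("no motif-free sequence found within max_rounds")


rng = random.Random(0)  # fixed seed keeps the example reproducible
spacer = "ATATGGGCGGATATTGACGTCAATAT"
clean = scrub(spacer, MOTIFS, rng)
```

Because the random filler can itself create a new site, the check is repeated after every replacement, just as in the manual procedure.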


9-7-2009

  • check designed sequences for restriction sites
    CheckRestrictionsites.pl
  • finish creating sequences
  • consider CMV core promoter into the calculation of the relative position of TFBS to the TSS
  • create sequences for negative control
    • pure TFBS free sequence
    • sequences with TFBS at minima of the density function
  • checking all sequences for further binding sites with the Transfac match tool
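
CheckRestrictionsites.pl itself is not shown in the notebook. As an illustration of the kind of check involved, the Python sketch below scans a sequence for a few recognition sites on both strands. The enzyme selection is an assumption made for the example; the recognition sequences are the standard ones for these enzymes.

```python
# Example enzymes; which ones the team actually checked is not stated.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA", "SpeI": "ACTAGT",
         "PstI": "CTGCAG", "BsaI": "GGTCTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, sites=SITES):
    """Return sorted (enzyme, strand, position) hits.

    Positions are 0-based on the forward strand; "-" marks a site that reads
    as the recognition sequence on the reverse complementary strand.
    """
    seq = seq.upper()
    hits = set()
    for enzyme, site in sites.items():
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            if strand == "-" and pattern == site:
                continue  # palindromic site: already reported on "+"
            start = seq.find(pattern)
            while start != -1:
                hits.add((enzyme, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


demo = "AAGAATTCAAGAGACCAA"
hits = find_sites(demo)
```

Palindromic sites such as EcoRI are found by the forward scan alone; the reverse-strand scan matters for non-palindromic sites such as BsaI, which the demo sequence contains only on the reverse strand.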

9-8-2009

  • check restriction sites for reverse complementary strand
  • add flanking sites with restriction sites and spacer nucleotides to our designed sequences
  • is there any possibility to automate Transfac queries?
  • work with combined / joined MySQL query structures
  • or solve this process by simply writing new temporary tables?
  • workflow summary (short) for manual designing of a synthetic promoter:
    • (A) use random sequence
    • (B) check TF-matrices
    • (C) validate TFs (mouse? human? repressor?)
    • (D) check Transfac and restriction sites
  • Phone conference with Kai Ludwig, Logo & Web Design (Nao)
  • official Team Meeting
  • wiki closure on Oct 21st!

9-9-2009

  • modify synthetic promoter sequences to be ready for ordering
  • Sweep more promoter sequences using Promotersweep
  • start Modeling
  • revise and improve HEARTBEAT
  • discuss differences between PWMs

9-10-2009

  • still modifying synthetic sequences to be ready for shipping
  • we have altogether 25 designed promoter sequences!

9-11-2009

  • Software Meeting (Stephen, Tim, Nao)
    • compatibility with mediawiki: HTML, perl, php, R, java?
    • GUI design
      • simple interface: single TF, auxiliary TFs, #TFBS, sequence length
      • "interactive": multiple TF, choosing auxiliary TFs, additional information (see Eukaryopedia), density function plot & histogram
      • "hyper-interactive" step-by-step design & creation
  • Modeling Meeting with Marti and Anna-Lena (Tim, Nao)
    • aim: fancy visualization to show expectation & prediction providing pathway insights
    • TODO/QUESTIONS
      • what is the stimulus? collect possible inputs!
      • measurable outcome: experiments & pathways
      • quality of synthetic sequence: error checking
        • we need to define the quality of our sequences
    • LEVELS of modeling
      • (1) DNA (2) expression/transcriptional activity (3) output
      • each with corresponding measurement
  • general modeling scheme: input - "What we are affecting" - possible outcomes
  • how? We use fuzzy logic

9-14-2009

  • collect input for inducing the system (e.g. p53: CPT, Pifithrin-alpha; NFkB: TNF-alpha etc.)
  • phone conference with Kai Ludwig
  • learn how to include Perl code into html code
    • learn how to use embperl
    • configure apache2 server such that embperl can be interpreted
    • try to make offline use of embperl working
  • try to find nice html editor for ubuntu - (seamonkey, Amaya)

9-15-2009

  • create network picture for meeting tomorrow
  • Logo discussion
  • Read paper: Fuzzy Logic Modeling of Signaling Networks (Aldridge 2009)
  • learn data management of virtual server
  • get an overview about the apache2 file and security system

9-16-2009

  • Modeling Meeting with Marti (Douaa, Tim, Nao)
    • update on available drugs/sequences
    • decide what to model: (A) error checking, and (B) differential expression?
    • use natural promoters to build up model for prediction of activity of synthetic promoters
    • Discussion of TF score
      • Transfac sequence alignment score
      • promotersweep binding site quality
      • relative position to TSS: How?
        • (A) peak width & amplitude, (B) distance to maximal peak & position, (C) number of PEAK, (D) "sliding window" and calculate area under curve, (E) #TFBS (also for comparison of different synthetic promoters)
      • biophysical affinity using TRAP
    • first model: build up either on CMV or on JeT
    • potential: integrate many stimuli -> find out crosstalks of pathways?
  • TODO (meeting)
    • collect data
    • define WHAT we want to model
    • summarize available sequences
    • try to formulate IF ... THEN "sentences"
    • check MATLAB & MATLAB Fuzzy Logic Toolbox availability
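
Option (D) from the TF score discussion, a sliding window with the area under the curve, can be sketched as below. This is a minimal Python illustration using the trapezoidal rule on a made-up density curve; window size and step are free parameters, not values chosen by the team.

```python
def sliding_window_auc(density, window, step=1.0):
    """Trapezoidal area under `density` for every window of `window` points.

    `step` is the spacing between neighbouring density values (e.g. in bp).
    """
    areas = []
    for i in range(len(density) - window + 1):
        chunk = density[i:i + window]
        area = sum((a + b) / 2.0 * step for a, b in zip(chunk, chunk[1:]))
        areas.append(area)
    return areas


# Made-up density values along the promoter.
density = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

areas = sliding_window_auc(density, window=3)
best_start = max(range(len(areas)), key=areas.__getitem__)  # window with largest area
```

The window with the largest area marks the region where binding sites of the TF are most concentrated, which combines peak height and width into one number.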

9-17-2009

  • internal Team Meeting
  • find error.log files on the server and learn how to use it

9-18-2009

9-20-2009

  • learn how to use tag language of embperl
    • learn how to write loops with embperl
    • access of input variables in embperl -- using the %fdat hash

9-21-2009

  • struggling with how to use R from embperl

9-22-2009

  • Wiki Meeting (Dani, Cori, Nao)
    • install image processing tool
    • design wiki, brainstorming for possible navigation bars
  • Wiki Phone Meeting with Kai Ludwig (Nao)
    • design header & presentation-master as well as team shirts
  • Seminar: Martijn Luijsterburg (Karolinska Institute) - Heterochromatin Protein 1 is involved in the DNA damage response. Host: Thomas Höfer, Bioquant

9-23-2009

  • Modeling Meeting with Marti, Anna-Lena (Tim, Nao)
    • contact database group (TP3)
    • statistics: characterizing peaks
      • we go for area under the curve and affinity. optionally we can choose Transfac sequence score and peak height & width
    • strategy to convince the wetlab people of the importance of modeling during the meeting on the upcoming Friday.
    • MATLAB license?
    • logical gates: try to start creating model topology after Friday
  • Presentation: Marti Bernado Faura (Bioquant, University of Heidelberg): Data-driven Fuzzy Logic modeling of Programmed Cell Death
    • intro into fuzzy logic
    • system development & work flow of fuzzy logic
    • fuzzy inference & model prediction
    • model types: MISO / MIMO
  • Wrap-up meeting: Team HEARTBEAT (Tim, Nao)
    • split up computational work into three tracks: HEARTBEAT DB, HEARTBEAT GUI and modeling
      • database: documentation (until Oct 18), peak characterization, calculate absolute density function
      • GUI: based on embperl, design according to our new wiki
      • modeling: MATLAB license, collect sequences & input data, develop network model, include pathways
  • literature work

9-24-2009

  • prepare slides for meeting tomorrow
  • pathway search: TNF-alpha/NFkB, VDR, SREBP and crosstalks. NFkB has a lot of pathway crosstalks, while SREBP and VDR show an interesting connection. Upon induction, SREBP activates VDR.

9-25-2009

  • Team Meeting (Wetlab, Nao)
    • short progress report of all of us
    • modeling: discussing scheme, modeling elements and strategies


9-28-2009

  • Wiki Phone Meeting with Kai Ludwig (Nao)

9-29-2009

  • designed synthetic promoters (HB_0001 - HB_0025) will be joined to the CMV core promoter since the JeT core promoter contains an Sp1 site. All other sequences (e.g. randomly synthesized ones) are coupled with the JeT core promoter.
  • literature studies on combinatorial cis-regulation as well as on modeling of the lambda-switch
  • prepare slides for the next modeling meeting

9-30-2009

  • Wiki Meeting (Dani, Nao)
  • MATLAB license order (Jens)
  • postpone Yara meeting (Wetlab, Tim)
  • got sequences from Lars
  • got qRT-PCR setup from Chenchen
  • Modeling Meeting with Marti & Anna-Lena (Tim, Nao)
    • still need to collect FACS and microscopy results
    • discuss our network prediction model using TNF-alpha as an example
    • maybe we can use the lambda switch paper as a good starting point for our modeling

October

Week  Mon  Tue  Wed  Thu  Fri  Sat  Sun
40    -    -    -    1    2    3    4
41    5    6    7    8    9    10   11
42    12   13   14   15   16   17   18
43    19   20   21   22   23   24   25
44    26   27   28   29   30   31   -

10-1-2009

  • Wiki & Presentation Meeting with Dani (Nao)

10-2-2009

  • some wiki work

10-5-2009

10-6-2009

  • Internal Team Meeting
    • check out number and measurement plans of randomly assembled synthetic promoters (5x NFkB, 5x p53, 2x pPARg, 2x SREBP)
  • Wiki Meeting (Corinna, Daniela, Nao)
    • discuss design of the top page and possible features
    • try out CSS design

10-7-2009

  • Wiki Design (Nao)
  • Wiki Phone Meeting with Kai Ludwig (Nao)
  • MATLAB has arrived!
  • literature work
  • Wetlab Meeting: progress report on measurement of random assembled synthetic promoters
  • think about the overall storyboard of our presentation at the Jamboree

10-8-2009

  • Short Meeting with Roland
  • image processing work for wiki

10-9-2009

10-12-2009

10-13-2009

  • Measurement discussion with Lars: REU/RMPU, defining equations for mammalian systems
  • literature work on PoPS paper (Kelly JR et al.) and apply their equations
  • Marti Modeling Meeting (Anna-Lena, Tim, Nao)
    • Journal Club (Tim, Nao)
    • summary of the Team Meeting from last Thursday
  • Marti: start modeling using MATLAB and Fuzzy Logic Toolbox (FLT), playing around with FLT and tutorial

10-14-2009

Nao

  • develop a first fuzzy inference system (FIS) for testing
  • Marti Modeling Meeting, specify model topology
  • collect data: FACS (Cori), Microscopy (Hannah), Sequence & TECAN (Lars)
  • start calculating position score using R
  • translating project abstract

10-15-2009

Nao

  • calculate affinity score using TRAP (Anna-Lena)
  • collect ideas for integrating TF-wise scores in order to calculate the final position/affinity score for one sequence: median, mean, maximum, weighted mean?
  • all data analysis is stored in three sheets (SequenceAnalysis, ResultSummary and CalculateTRAP)
  • from now on we concentrate on FACS measurements because they are the most reliable ones (TECAN used only for scanning)
  • fill up TRAP data with missing transcription factors
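
The candidate ways of combining TF-wise scores into one sequence score can be compared side by side. The Python sketch below uses made-up scores and weights; how weights would be chosen for the weighted mean is left open in the notebook, so the values here are purely illustrative.

```python
from statistics import mean, median


def weighted_mean(scores, weights):
    """Weighted average of scores; weights must not sum to zero."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Made-up per-TF scores for one sequence and hypothetical weights.
tf_scores = [0.2, 0.5, 0.9]
weights = [1, 1, 2]

summary = {
    "median": median(tf_scores),
    "mean": mean(tf_scores),
    "maximum": max(tf_scores),
    "weighted_mean": weighted_mean(tf_scores, weights),
}
```

The maximum, which the 10-17 entry settles on, lets the single best binding site determine the score, whereas mean and median are pulled down by weak additional sites.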

10-16-2009

Nao

  • Anna-Lena Meeting: discuss how to integrate sequence scores
  • get & check p53, pPARg and random SREBP sequences
  • go through FACS results
  • add HEARTBEAT sequences for data analysis
  • modeling documentation
  • parsing experimental setups for modeling use
  • Chenchen qRT-PCR results
  • define possible modeling layers
    • first layer: input
      • drug type, pathway, drug mode of action, drug concentration, targeted cells, incubation time
      • sequence type, position score, affinity score
      • we choose position & affinity score, sequence type and the presence of stimulation. Time as well as different concentration (unfortunately no data available) can be added in future
    • second layer: promoters
      • 6 constitutives, 3 standards, 6 inducible available
      • data analysis narrows this to 5 constitutives, 3 standards and 4 inducible
      • HEARTBEAT sequences have to be measured a.s.a.p.
  • Marti Modeling Meeting
    • try to define some fuzzy rules
    • we assume better binding -> better expression
    • define membership functions
    • start modeling with NFkB results
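
The assumption "better binding -> better expression" can be turned into a toy fuzzy rule base as sketched below. This is plain Python, not the MATLAB Fuzzy Logic Toolbox model: the ramp-shaped membership functions, their breakpoints and the output levels are all invented for illustration, and the rule outputs are combined by a weighted average (a zero-order Sugeno-style shortcut) rather than by Mamdani defuzzification. Scores are assumed to lie between 0 and 1.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership falling linearly from 1 at `lo` to 0 at `hi`."""
    return 1.0 - ramp_up(x, lo, hi)


# Each rule: (membership function of the binding score, expression level).
# Breakpoints and output levels are made up for the example.
RULES = [
    (lambda s: ramp_down(s, 0.2, 0.8), 0.1),  # IF binding is poor THEN expression is low
    (lambda s: ramp_up(s, 0.2, 0.8), 0.9),    # IF binding is good THEN expression is high
]


def predict_expression(score):
    """Weighted average of rule outputs, weighted by rule activation."""
    fired = [(member(score), level) for member, level in RULES]
    total = sum(weight for weight, _ in fired)
    return sum(weight * level for weight, level in fired) / total


low = predict_expression(0.0)
mid = predict_expression(0.5)
high = predict_expression(1.0)
```

Between the two breakpoints both rules fire partially, so the predicted expression changes gradually instead of switching at a hard threshold.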

All

  • Internal Team Meeting
    • reminder: wiki task, wiki to do
  • Official Team Meeting

10-17-2009

  • final decision: we go for maximum of position and affinity score
  • added HB sequences for data analysis table; as soon as results are there we can model designed synthetic promoters
  • define shape of membership functions
  • literature search for missing activity values?
  • still TODO: check out p53 results since the p53-NFkB crosstalk is really interesting!

10-18-2009

Nao

  • SREBP/VDR paper arrived
  • finish data analysis
  • study & playing around with MATLAB FLT, programming from both FLT GUI and MATLAB command line
  • define our work to be (i) error checking and (ii) exclusive pathway modeling
  • the high potential of this model lies in its plug'n'play structure, with a high capacity for integrating more inputs, outputs and also the middle layer (promoter diversity)

10-19-2009

Nao

  • define final network structure
  • wiki work
  • reading RFC documentation and correction
  • we call this project HEARTBEAT fuzzy network (FN)
  • HB FN documentation and first results!
    • creating two fuzzy controllers: inducible NFkB and constitutive
  • how do we integrate the data? combine via Simulink!

10-20-2009

  • Creating, developing, integrating and combining fuzzy network modeling (MATLAB, Simulink)
  • first analysis of HB sequences
  • HEARTBEAT FN documentation

10-21-2009

10-22-2009

  • FROZEN WIKI!!!!
