Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Sites Assembly and Engineering Tool) project. This notebook covers the work on three tracks: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), plus some additional work on logo and wiki design. Have fun!
- Meeting with Oliver Pelz
- Discuss general ideas of our Database Structure and Content
- An introduction to PromoterSweep (LINK). PromoterSweep screens a given sequence for conserved regions, which yields consensus sequences, and then screens these for TFBS by database search (TRANSFAC, JASPAR) (LINK)
- Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
- We decide on MySQL as an appropriate tool for this challenge; it will also allow a graphical representation of the database on the web later.
- Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of Promoter Database (DoOP, UCSC, EnsEMBL) (LINK)
- aim: create database until end of August
- First contact with MySQL
- Start making an overview of other team's projects
- Configuring our Virtual Server
- Official Team Meeting (LINK) @ BQ seminar room 43: preparing presentation & writing meeting report
- Start installing developing environment on our internal server
- Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
- Integrating ideas of PromoterSweep, Transfac as well as DoOP/CisRED
- select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for Wetlab
- select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc)
- future experimental validation: ChIP-on-Chip
- for this we need a TFBS-free sequence
- idea: plot histogram of TFBS relative to TSS
- problem: choice of sequence: upstream only? include downstream?
- new programming languages: R and perl
- next meeting: Friday after team meeting
- Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
- An introduction to PromoterSweep
- Structure and analysis principles of PromoterSweep
- Output is stored in an XML file. This means we have to parse the XML.
- Oliver Pelz will help us with the programming
- Protocol of the meeting can be downloaded from here.
- Start working with MySQL
- request UNIX/HUSAR/HPC access at DKFZ (Nao)
- first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, contra
- Meeting with Oliver Pelz
- defining workflow with PromoterSweep, Matrix Profile Search and introduction to different Motif Discovery Algorithms
- installation of NX server for access to the internal server from Windows
- configure developing environment (printing from Linux, configure Mediawiki)
- defining basic concept of database construction
- we select annotated promoter sequences in DoOP
- we make a selection of pathway of interest using KEGG
- narrow down number of target promoter sequences <10000.
- Official Team Meeting on Scheduling
- Meeting with Anna-Lena and Tobias
- Introduction to R
- Tobias will give us access to their computing cluster (Group Roland Eils)
- Promoter Selection: DoOP, EnsEMBL, or UCSC?
- HUSAR account arrived
- installation of R, R editor and perl editor
- further configuration of our internal server / mediawiki
- writing first perl program - "Hi there"
- first contact with R and perl
- playing around with R and perl
- playing around with R library: Biobase
- check working on DKFZ cluster
- defining programming languages: perl, R, MySQL
- retrieving first Promotersweep output files
- Meeting with Marti
- ideas for modeling
- we will have at least three colors which overlap in their spectra.
- a very nice approach will be Fuzzy Logic Modeling.
- idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
- idea 2: create&visualize fancy and fuzzy data from in silico simulation
- combine: promoter, output and graphic representation
- next meeting with Marti: end of next week.
- extract NCBI Entrez Gene IDs with R and perl
- MAC addresses registered for Bioquant network
- configure perl working environment
- study structure of DoOP database
- download DoOP and load DoOP database into MySQL
- trying out some DoOP queries
- download fasta sequences from UCSC gene browser
- mapping of NCBI Entrez Gene IDs with RefSeq IDs
- configure perl working environment on Windows XP
- contact Endre Sebestyen concerning the perl module Bio-DoOP-DoOP
- parse UCSC fasta sequences according to our selection
- write parsed sequences into multifasta format
- start PromoterSweep Analysis over Weekend
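The parse-and-write steps above can be sketched as follows. This is a minimal illustration in Python rather than the perl scripts actually used, and it assumes that the sequence ID is the first whitespace-separated token of each FASTA header; the real UCSC header layout and the function names here are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(records, wanted_ids):
    """Keep records whose first header token is in wanted_ids (assumed ID position)."""
    for header, seq in records:
        tokens = header.split()
        if tokens and tokens[0] in wanted_ids:
            yield header, seq


def to_multifasta(records, width=60):
    """Format records as multi-FASTA text, wrapping sequences at `width` characters."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

In practice `lines` would be an open file handle on the downloaded FASTA file and `wanted_ids` the set of selected RefSeq IDs.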
Tim, Stephen, from here on you have to add your own entries yourselves!
- study outputfile of PromoterSweep. check out general structure and pick up useful information.
- result is grouped in: General Info, Best Genomic Mapping, Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
- upon selection, sections of interest will be collected and made ready for entry into MySQL DB
- discuss table structure of our database
- How should our database be called? - Brainstorming -
- SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
- MAY contain: position, heartbeat, prediction, assembly, eukaryotes
- and still more keywords to come
- establishing local@host access to mysql
- parse Promotersweep xml file into tab-separated text file
- the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS Quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
- this raised several programming problems around working with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), hence
- studying structure and basic concepts of hash & key
- including parsed data into mysql database
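As a rough illustration of the XML-to-text step, here is a minimal Python sketch (the project itself used perl). The element and attribute names (`promoter`, `binding_site`, `refseq`, `tf`, `start`, `end`, `quality`) are purely hypothetical placeholders; the real PromoterSweep XML schema is not reproduced here, and only a subset of the columns listed above is shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column names of the tab-separated output (subset, chosen for illustration).
FIELDS = ["refseq_id", "tf", "start", "end", "quality"]


def xml_to_tsv(xml_text):
    """Flatten a (hypothetical) PromoterSweep-like XML document into TSV text."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    # Hypothetical layout: <promoter refseq="..."> containing <binding_site .../> children.
    for promoter in root.iter("promoter"):
        refseq = promoter.get("refseq", "")
        for site in promoter.iter("binding_site"):
            writer.writerow([
                refseq,
                site.get("tf", ""),
                site.get("start", ""),
                site.get("end", ""),
                site.get("quality", ""),
            ])
    return buf.getvalue()
```

The nested loop plays the role of the "hash of arrays" structure mentioned above: one promoter maps to many binding sites, and each pair becomes one output row.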
- pre-decision for our table-structure
- Table: Main_Info
- RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
- Table: Gene_Info
- Ensembl_ID, Gene Symbol, Gene Description.
- we go for the RefSeq ID to be the key connecting these two tables.
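The two-table layout with the RefSeq ID as the connecting key could look roughly like this. The sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the column names, types and the example rows are assumptions made for illustration, not the actual HEARTBEAT schema or data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Gene_Info (
    refseq_id   TEXT PRIMARY KEY,
    ensembl_id  TEXT,
    gene_symbol TEXT,
    description TEXT
);
CREATE TABLE Main_Info (
    refseq_id    TEXT NOT NULL REFERENCES Gene_Info(refseq_id),
    tf           TEXT NOT NULL,
    motif_start  INTEGER,
    motif_end    INTEGER,
    motif_score  REAL,
    tfbs_quality TEXT,
    tss_source   TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with made-up example rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute(
        "INSERT INTO Gene_Info VALUES (?, ?, ?, ?)",
        ("NM_TEST1", "ENSG_TEST1", "GENE1", "made-up gene"),
    )
    con.executemany(
        "INSERT INTO Main_Info VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("NM_TEST1", "Sp1", -60, -51, 0.95, "good", "DoOP"),
            ("NM_TEST1", "HIF", -310, -303, 0.81, "weak", "DoOP"),
        ],
    )
    return con


def hits_for_tf(con, tf):
    """Join both tables on the RefSeq ID and list all hits of one TF."""
    return con.execute(
        "SELECT g.gene_symbol, m.motif_start, m.motif_end, m.tfbs_quality "
        "FROM Main_Info AS m JOIN Gene_Info AS g ON g.refseq_id = m.refseq_id "
        "WHERE m.tf = ? ORDER BY m.motif_start",
        (tf,),
    ).fetchall()
```

Keeping the gene annotation in its own table avoids repeating the gene description for every binding site of the same promoter.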
- update script for parsing the Promotersweep output files due to unexpected errors
- we forgot to include "weak" as a category for the TFBS quality - added!
- PromoterSweep result contains information about TSS derived from different promoter databases. On which should we rely, if they differ from each other?
- We give the highest priority to the DoOP database, since it shows good agreement with the RefSeq ID results when compared to other databases (e.g. DBTSS).
- search for a tool to use MySQL in R programming environment
- wiki: write a short article about the German Cancer Research Center (DKFZ)
- Meeting with Anna-Lena: once we established our database... then
- two strategies:
- manually select interesting transcription factors and analyse them using database queries
- plot histograms of TFBS occurrence within the target promoter sequence (TSS - 1000bp upstream) for each TF and make systematic analysis
- we go for both!
- idea for the future: we can analyze combinatorial appearance of distinct TF pairs
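The idea of analysing combinatorial appearance could start from a simple co-occurrence count. The sketch below is an assumption about how such a count might be set up, not an implemented HEARTBEAT feature: it takes (RefSeq ID, TF name) pairs, as they might come out of a database query, and counts on how many promoters each TF pair appears together.

```python
from collections import Counter
from itertools import combinations


def count_tf_pairs(hits):
    """Count, per unordered TF pair, the number of promoters carrying both TFs.

    hits: iterable of (refseq_id, tf_name) tuples; repeated hits of the same
    TF on the same promoter are counted once.
    """
    tfs_by_promoter = {}
    for refseq_id, tf in hits:
        tfs_by_promoter.setdefault(refseq_id, set()).add(tf)
    pairs = Counter()
    for tfs in tfs_by_promoter.values():
        # sorted() makes (A, B) and (B, A) the same key
        pairs.update(combinations(sorted(tfs), 2))
    return pairs
```

Raw counts like these would still need a comparison against the frequency expected by chance before calling a pair "combinatorial".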
- We have a name for our database - we call it -
- wait for it -
HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)
- Meeting with Marti: defining output modeling strategies
- "exclusive promoters"
- a model for predicting the behaviour of activation of one, two, three... promoters at the same time.
- the potential of this model lies in the possibility to model single as well as many pathways in combination and even check for synergistic effects
- modeling logic: quantitative ODE VS. quantitative & qualitative fuzzy logic
- "error checking"
- what to capture/measure: affinity of transcription factor binding to DNA
- calculate score / reliability
- phenotypic measurement
- if we have time in the end: model/experiment optimization by wetlab-drylab-rounds (GRAFIK)
- if we do not have much time: figure out where the catch is
- modeling layers & final visualization
- (i) capture affinity - (ii) model gene expression - (iii) pathway activity - (iv) fancy visualization (Mathworks Simulink?)
- plot: time course, dynamic affinity
- keep in mind the possible high amount of False Positives using promoter search/analysis
- official Team Meeting also with Mr. Kai Ludwig (LANGE + PFLANZ) as guest for Logo / Title Claim discussion
- so far we have 1753 promoter sequences analyzed by PromoterSweep!
- Meeting with Daniela (Nao): Cell Profiler for capturing biological images & data analysis based on MATLAB
- working with R module RMySQL for using the pipeline between R and MySQL
- create a list of useful RMySQL commands
- Workflow for plotting histograms (SOURCE CODE/S?)
- make MySQL query using R
- make list of TFs, avoid duplicates using perl
- pick up each TF (perl/R) and plot histogram (R)
- create MySQL command list including combinatorial queries
- check HEARTBEAT DB for duplicate entries
- how should we plot the histogram?
- (a) histogram - how "wide" should each bin be? 100bp? 50bp? 20bp?
- (b) plot probability density
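To make the bin-width question concrete, here is a small Python sketch (the actual plots were made in R). It assumes positions are given in bp relative to the TSS, with negative values upstream, and covers the 1000bp upstream window; the function names and the convention that position 0 is excluded are assumptions.

```python
from collections import Counter


def position_histogram(positions, bin_width=50, upstream=1000):
    """Count TFBS positions per bin over the window [-upstream, 0).

    Returns a list of (bin_start, count) tuples, ordered from distal to proximal.
    """
    if upstream % bin_width != 0:
        raise ValueError("upstream must be a multiple of bin_width")
    counts = Counter()
    for pos in positions:
        if -upstream <= pos < 0:
            counts[(pos + upstream) // bin_width] += 1
    n_bins = upstream // bin_width
    return [(-upstream + i * bin_width, counts.get(i, 0)) for i in range(n_bins)]


def to_density(hist, bin_width):
    """Normalize counts so that the histogram integrates to 1 over the window."""
    total = sum(count for _, count in hist)
    if total == 0:
        return [(start, 0.0) for start, _ in hist]
    return [(start, count / (total * bin_width)) for start, count in hist]
```

Running `position_histogram` with `bin_width` set to 100, 50 and 20 on the same TF makes the trade-off visible: wide bins smooth away peaks, narrow bins become noisy for TFs with few hits.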
- study Transfac PWM (position weight matrices) for
- difference in consensus sequences (also ask Anna-Lena)
- different PWM types (vertebrates, plant, insect, fungi, bacteria, nematodes...)
- positive control: when histograms are generated and plotted, check distribution of Sp1
- so far we have 3640 promoter sequences "sweeped"!
- access from R to mysql at the local@host server established
- dealing with perl - introduce passing of variables between perl and R
- derive transcription factor data using R and MySQL
- plot HEARTBEAT TF hit distribution as histograms & density functions for different PWM subsets (all, vertebrates only, single matrices and joined TFs)
- further completion of the database
- discussion on how to make statistical studies on our gained distributions
- ideas: define maximum and variance -> Nao
- look for motif sequences -> Tim
- we have 4476 sequences analysed by Promotersweep so far!
- but we are expecting 4700 sequences - check missing ones!
- internal team meeting: Tim, Lars, Stephen, Nao
- select especially interesting TFs
- criteria: (a) good hits in our distributions; (b) easy experimental handling
- we go for HIF, SREBP and VDR to analyse and make synthetic promoter design
- Transfac PWM: some matrices have annotation problems
- which "spacer" sequences should we use in order to generate TFBS-free sequence parts?
- rational design of synthetic promoters
- Tim: SREBP, Nao: VDR
- both go for a total number of 10 sequences
- single TFs: search for density maxima
- check combinatorial appearance and design promoter sequences with multiple binding TFs
- use spacer sequences generated by Lars and check for TFBS using Transfac
- sequence length: max. 1000bp
- back-up idea: if synthesis does not work for a long (~1000bp) sequence then try to work out a protocol for a two-step promoter synthesis combining one empty (TFBS free) sequence with another which consists of many TF and activator binding sites.
- Wiki Meeting (Nao)
- Logo choice & modification
- choose header pics
- navigation layout
- develop a catchy, cool homepage
- Meeting with Tim, design synthetic promoter sequences
- check spacer sequence (200bp) for TFBS: one TFBS found; remove it by cutting and shortening the sequence to 190bp
- Kid3 is a repressor!
- design more synthetic promoter sequences by manual iteration process which consists of (i) TFBS check and (ii) TFBS removal & filling up random sequence
- aim: creation of an automatic design tool for synthetic promoters which includes sequence design, Transfac search as well as filling the sequence up with spacer sequences.
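The manual iteration of (i) TFBS check and (ii) TFBS removal and refilling with random sequence can be sketched as a loop. This is only an illustration under strong assumptions: exact matching of short placeholder motif strings on both strands stands in for a real Transfac PWM search, and the motifs used in the usage note are arbitrary examples.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hits(seq, motifs):
    """Return sorted (start, length) for every motif occurrence on either strand."""
    hits = set()
    for motif in motifs:
        for pattern in {motif, revcomp(motif)}:
            start = seq.find(pattern)
            while start != -1:
                hits.add((start, len(pattern)))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


def design_spacer(length, motifs, rng, max_rounds=1000):
    """Iterate: check for motif hits, overwrite each hit with random bases."""
    seq = [rng.choice("ACGT") for _ in range(length)]
    for _ in range(max_rounds):
        hits = find_hits("".join(seq), motifs)
        if not hits:
            return "".join(seq)
        for start, n in hits:
            for i in range(start, start + n):
                seq[i] = rng.choice("ACGT")
    raise RuntimeError("no motif-free sequence found within max_rounds")
```

For example, `design_spacer(200, ["GGGCGG", "TGACTCA"], random.Random(0))` returns a 200bp sequence without exact matches to the two placeholder motifs. A real tool would replace `find_hits` with a PWM-based scan and would also need the repressor and species checks from the manual workflow.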
- check designed sequences for restriction sites
- finish creating sequences
- consider CMV core promoter into the calculation of the relative position of TFBS to the TSS
- create sequences for negative control
- pure TFBS free sequence
- sequences with TFBS at minima of the density function
- checking all sequences for further binding sites with the Transfac match tool
- check restriction sites for reverse complementary strand
- add flanking sites with restriction sites and spacer nucleotides to our designed sequences
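Checking both strands matters for non-palindromic recognition sites. The following Python sketch shows one way to do the scan; the enzyme list is an assumption (four enzymes commonly used in BioBrick assembly plus BsaI as a non-palindromic example) and not necessarily the set checked in this project.

```python
# Recognition sequences (5'->3'); which enzymes to check is an assumption.
SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "BsaI": "GGTCTC",  # non-palindromic: the reverse strand must be checked too
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_restriction_sites(seq, sites=SITES):
    """Return sorted (enzyme, strand, start) for all sites on either strand.

    `start` is the 0-based index on the given (forward) sequence.
    """
    seq = seq.upper()
    found = []
    for name, site in sites.items():
        patterns = [("+", site)]
        rc = revcomp(site)
        if rc != site:  # palindromic sites are reported once, on "+"
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                found.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(found)
```

An empty result for a designed sequence means none of the listed sites occurs on either strand, so the flanking restriction sites added afterwards stay unique.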
- is there any possibility to automate Transfac queries?
- work with combined / joined MySQL query structures
- or solve this process by simply writing new temporary tables?
- workflow summary (short) for manual designing of a synthetic promoter:
- (A) use random sequence
- (B) check TF-matrices
- (C) validate TFs (mouse? human? repressor?)
- (D) check Transfac and restriction sites
- Phone conference with Kai Ludwig, Logo & Web Design (Nao)
- wiki closure on Oct 21st!
- modify synthetic promoter sequences to be ready for ordering
- Sweep more promoter sequences using Promotersweep
- start Modeling
- revise and improve HEARTBEAT
- discuss differences between PWMs
- still modifying synthetic sequences to be ready for shipping
- we have altogether 25 designed promoter sequences!
- Software Meeting (Stephen, Tim, Nao)
- compatibility with mediawiki: HTML, perl, php, R, java?
- GUI design
- simple interface: single TF, auxiliary TFs, #TFBS, sequence length
- "interactive": multiple TF, choosing auxiliary TFs, additional information (see Eukaryopedia), density function plot & histogram
- "hyper-interactive" step-by-step design & creation
- Modeling Meeting with Marti and Anna-Lena (Tim, Nao)
- aim: fancy visualization to show expectation & prediction providing pathway insights
- what is the stimulus? collect possible inputs!
- measurable outcome: experiments & pathways
- quality of synthetic sequence: error checking
- we need to define the quality of our sequences
- LEVELS of modeling
- (1) DNA (2) expression/transcriptional activity (3) output
- each with corresponding measurement
- general modeling scheme: input - "What we are affecting" - possible outcomes
- how? We use fuzzy logic
- collect input for inducing the system (e.g. p53: CPT, Pifithrin-alpha; NFkB: TNF-alpha etc.)
- phone conference with Kai Ludwig
- learn how to include Perl code into html code
- learn how to use embperl
- configure apache2 server such that embperl can be interpreted
- try to make offline use of embperl working
- try to find nice html editor for ubuntu - (seamonkey, Amaya)
- create network picture for meeting tomorrow
- Logo discussion
- Read paper: Fuzzy Logic Modeling of Signaling Networks (Aldridge 2009)
- learn data management of virtual server
- get an overview about the apache2 file and security system
- Modeling Meeting with Marti (Douaa, Tim, Nao)
- update on available drugs/sequences
- decide what to model: (A) error checking, and (B) differential expression?
- use natural promoters to build up model for prediction of activity of synthetic promoters
- Discussion of TF score
- Transfac sequence alignment score
- promotersweep binding site quality
- relative position to TSS: How?
- (A) peak width & amplitude, (B) distance to maximal peak & position, (C) number of PEAK, (D) "sliding window" and calculate area under curve, (E) #TFBS (also for comparison of different synthetic promoters)
- biophysical affinity using TRAP
- first model: build up either on CMV or on JeT
- potential: integrate many stimuli -> find out crosstalks of pathways?
- TODO (meeting)
- collect data
- define WHAT we want to model
- summarize available sequences
- try to formulate IF ... THEN "sentences"
- check MATLAB & MATLAB Fuzzy Logic Toolbox availability
- internal Team Meeting
- find error.log files on the server and learn how to use it
- learn how to use tag language of embperl
- learn how to write loops with embperl
- access of input variables in embperl -- using the %fdat hash
- struggling with how to use R from embperl
- Wiki Meeting (Dani, Cori, Nao)
- install image processing tool
- design wiki, brainstorming for possible navigation bars
- Wiki Phone Meeting with Kai Ludwig (Nao)
- design header & presentation-master as well as team shirts
- Seminar: Martijn Luijsterburg (Karolinska Institute) - Heterochromatin Protein 1 is involved in the DNA damage response. Host: Thomas Höfer, Bioquant
- Modeling Meeting with Marti, Anna-Lena (Tim, Nao)
- contact database group (TP3)
- statistics: characterizing peaks
- we go for area under the curve and affinity. optionally we can choose Transfac sequence score and peak height & width
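One possible reading of the "sliding window" plus "area under the curve" idea is sketched below in Python: the position score of a site is the integral of the TF's binned density over a window centred on that site. The window size, step and function names are assumptions; the score definition actually used may differ.

```python
def window_auc(density, bin_width, center, half_window):
    """Integrate a binned density over [center - half_window, center + half_window].

    density: list of (bin_start, density_value) tuples with equal bin_width.
    """
    lo, hi = center - half_window, center + half_window
    area = 0.0
    for start, value in density:
        overlap = min(hi, start + bin_width) - max(lo, start)
        if overlap > 0:
            area += value * overlap
    return area


def best_position(density, bin_width, half_window, step=10):
    """Slide the window across the binned region and return the best centre.

    Raises ValueError if the window does not fit into the binned region.
    """
    lo = min(start for start, _ in density) + half_window
    hi = max(start for start, _ in density) + bin_width - half_window
    return max(
        range(lo, hi + 1, step),
        key=lambda c: window_auc(density, bin_width, c, half_window),
    )
```

With a density normalized to integrate to 1, the score of a window can be read as the fraction of all observed hits of that TF that fall into it.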
- strategy to convince the wetlab people of the importance of modeling during the meeting this coming Friday.
- MATLAB license?
- logical gates: try to start creating model topology after Friday
- Presentation: Marti Bernado Faura (Bioquant, University of Heidelberg): Data-driven Fuzzy Logic modeling of Programmed Cell Death
- intro into fuzzy logic
- system development & work flow of fuzzy logic
- fuzzy inference & model prediction
- model types: MISO / MIMO
- Wrap-up meeting: Team HEARTBEAT (Tim, Nao)
- split up computational work into three tracks: HEARTBEAT DB, HEARTBEAT GUI and modeling
- database: documentation (until Oct 18), peak characterization, calculate absolute density function
- GUI: based on embperl, design according to our new wiki
- modeling: MATLAB license, collect sequences & input data, develop network model, include pathways
- prepare slides for meeting tomorrow
- pathway search: TNF-alpha/NFkB, VDR, SREBP and crosstalks. NFkB has a lot of pathway crosstalks, while SREBP and VDR show an interesting connection. Upon induction, SREBP activates VDR.
- Team Meeting (Wetlab, Nao)
- short progress report of all of us
- modeling: discussing scheme, modeling elements and strategies
- Wiki Phone Meeting with Kai Ludwig (Nao)
- designed synthetic promoters (HB_0001 - HB_0025) will be joined to the CMV core promoter since the JeT core promoter contains an Sp1 site. All other sequences (e.g. randomly synthesized ones) are coupled with the JeT core promoter.
- literature studies on combinatorial cis-regulation as well as on modeling of the lambda-switch
- prepare slides for the next modeling meeting
- Wiki Meeting (Dani, Nao)
- MATLAB license order (Jens)
- postpone Yara meeting (Wetlab, Tim)
- got sequences from Lars
- got qRT-PCR setup from Chenchen
- Modeling Meeting with Marti & Anna-Lena (Tim, Nao)
- still need to collect FACS and microscopy results
- discuss our network prediction model using TNF-alpha as an example
- maybe we can use the lambda switch paper as a good starting point for our modeling
- Wiki & Presentation Meeting with Dani (Nao)
- Internal Team Meeting
- check out number and measurement plans of randomly assembled synthetic promoters (5x NFkB, 5x p53, 2x pPARg, 2x SREBP)
- Wiki Meeting (Corinna, Daniela, Nao)
- discuss design of the top page and possible features
- try out CSS design
- Wiki Design (Nao)
- Wiki Phone Meeting with Kai Ludwig (Nao)
- MATLAB has arrived!
- literature work
- Wetlab Meeting: progress report on measurement of random assembled synthetic promoters
- think about the overall storyboard of our presentation at the jamboree
- Short Meeting with Roland
- image processing work for wiki
- Measurement discussion with Lars: REU/RMPU, defining equations for mammalian systems
- literature work on PoPS paper (Kelly JR et al.) and apply their equations
- Marti Modeling Meeting (Anna-Lena, Tim, Nao)
- Journal Club (Tim, Nao)
- summary of last Thursday's Team Meeting
- Marti: start modeling using MATLAB and Fuzzy Logic Toolbox (FLT), playing around with FLT and tutorial
- develop a first fuzzy inference system (FIS) for testing
- Marti Modeling Meeting, specify model topology
- collect data: FACS (Cori), Microscopy (Hannah), Sequence & TECAN (Lars)
- start calculating position score using R
- translating project abstract
- calculate affinity score using TRAP (Anna-Lena)
- collect ideas for integrating TFwise scores in order to calculate final position/affinity score for one sequence: median, mean, maximum, weighted mean?
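The candidate ways of integrating TF-wise scores into one sequence score can be compared side by side. The sketch below is a plain Python illustration; what the weights would represent (for example binding site quality) is an assumption and is not specified here.

```python
from statistics import mean, median


def combine_scores(scores, weights=None):
    """Summarize per-TF scores of one sequence with several candidate statistics."""
    if not scores:
        raise ValueError("scores must not be empty")
    summary = {
        "mean": mean(scores),
        "median": median(scores),
        "maximum": max(scores),
    }
    if weights is not None:
        if len(weights) != len(scores) or sum(weights) <= 0:
            raise ValueError("weights must match scores and sum to a positive value")
        summary["weighted_mean"] = (
            sum(s * w for s, w in zip(scores, weights)) / sum(weights)
        )
    return summary
```

The mean and median dilute one strong site among several weak ones, whereas the maximum reports only the strongest site and ignores how many other sites are present.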
- all data analysis is stored in three sheets (SequenceAnalysis, ResultSummary and CalculateTRAP)
- from now on we concentrate on FACS measurements because they are the most reliable ones (TECAN used only for scanning)
- fill up TRAP data with missing transcription factors
- Anna-Lena Meeting: discuss how to integrate sequence scores
- get & check p53, pPARg and random SREBP sequences
- go through FACS results
- add HEARTBEAT sequences for data analysis
- modeling documentation
- parsing experimental setups for modeling use
- Chenchen qRT-PCR results
- define possible modeling layers
- first layer: input
- drug type, pathway, drug mode of action, drug concentration, targeted cells, incubation time
- sequence type, position score, affinity score
- we choose position & affinity score, sequence type and the presence of stimulation. Time as well as different concentrations (unfortunately no data available) can be added in the future
- second layer: promoters
- 6 constitutives, 3 standards, 6 inducible available
- data analysis narrows this to 5 constitutives, 3 standards and 4 inducible
- HEARTBEAT sequences have to be measured a.s.a.p.
- Marti Modeling Meeting
- try to define some fuzzy rules
- we assume better binding -> better expression
- define membership functions
- start modeling with NFkB results
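To illustrate what rules and membership functions of the kind "better binding -> better expression" can look like, here is a deliberately small Python sketch. It is not the MATLAB Fuzzy Logic Toolbox model: it uses a simplified zero-order Sugeno-style weighted average, and the three triangular sets, the score range [0, 1] and the rule output levels (0.1, 0.5, 0.9) are all assumptions chosen for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Fuzzy sets for a binding score in [0, 1] (assumed shapes).
BINDING_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Rules: IF binding is <set> THEN expression is <level> (assumed crisp levels).
RULE_OUTPUT = {"low": 0.1, "medium": 0.5, "high": 0.9}


def predict_expression(binding_score):
    """Weighted average of rule outputs, weighted by rule firing strength."""
    x = min(1.0, max(0.0, binding_score))  # clamp to the defined range
    strengths = {name: tri(x, *abc) for name, abc in BINDING_SETS.items()}
    total = sum(strengths.values())
    return sum(strengths[name] * RULE_OUTPUT[name] for name in strengths) / total
```

A binding score of 0.75 belongs half to "medium" and half to "high", so the predicted relative expression lies halfway between the two rule outputs. Adding a second input such as the presence of stimulation would mean combining firing strengths, for example with a minimum operator.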
- Internal Team Meeting
- reminder: wiki task, wiki to do
- Official Team Meeting
- final decision: we go for maximum of position and affinity score
- added HB sequences to the data analysis table; as soon as results are there we can model designed synthetic promoters
- define shape of membership functions
- literature search for missing activity values?
- still TODO: check out p53 results since the p53-NFkB crosstalk is really interesting!
- SREBP/VDR paper arrived
- finish data analysis
- study & playing around with MATLAB FLT, programming from both FLT GUI and MATLAB command line
- define our work to be (i) error checking and (ii) exclusive pathway modeling
- the high potential of this model lies in its plug'n'play structure, with a high capacity for integrating more inputs, outputs and also the middle layer (promoter diversity)
- define final network structure
- wiki work
- reading RFC documentation and correction
- we call this project HEARTBEAT fuzzy network (FN)
- HB FN documentation and first results!
- creating two fuzzy controllers: inducible NFkB and constitutive
- how do we integrate the data? combine via Simulink!
- Creating, developing, integrating and combining fuzzy network modeling (MATLAB, Simulink)
- first analysis of HB sequences
- HEARTBEAT FN documentation