Team:Heidelberg/Notebook modeling

From 2009.igem.org

Revision as of 01:35, 22 October 2009 by DouaaM.


Notebook HEARTBEAT

Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Sites Assembly and Engineering Tool) project. This notebook comprises the work on three sublanes: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), as well as some additional work on the logo and wiki design. Have fun!

Contents

July

August

September

October

July

7-27-2009

Meeting with Oliver Pelz
- Discuss general ideas for our database structure and content
- An introduction to PromoterSweep. PromoterSweep screens a given sequence for conserved regions, giving us consensus sequences, and then screens them for TFBS by database search (TRANSFAC, JASPAR)
- Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
- We choose MySQL as an appropriate tool for this task, since it also allows a graphical representation of the database on the web later.
- GUI on wiki: which language? PHP? JavaScript?
- Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of promoter database (DoOP, UCSC, EnsEMBL)

aim: create the database by the end of August

August

Week	Days
	Mon	Tue	Wed	Thu	Fri	Sat	Sun
31	-	-	-	-	-	1	2
32	3	4	5	6	7	8	9
33	10	11	12	13	14	15	16
34	17	18	19	20	21	22	23
35	24	25	26	27	28	29	30
36	31	-	-	-	-	-	-

8-3-2009

First contact with MySQL
Start making an overview of other teams' projects
Configuring our Virtual Server

8-4-2009

Official Team Meeting @ BQ seminar room 43: preparing presentation & writing meeting report
Start installing the development environment on our internal server
- GNOME
- Mediawiki

8-5-2009

Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
- Integrating ideas of PromoterSweep, Transfac as well as DoOP/CisRED
- select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for the wet lab
- select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc.)
- future experimental validation: ChIP-on-Chip
  - for this we need a TFBS-free sequence
- idea: plot histogram of TFBS relative to TSS
  - problem: choice of sequence: upstream only? include downstream?
- new programming languages: R and perl
- next meeting: Friday after team meeting

Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
- An introduction into PromoterSweep
- Structure and analysis principles of PromoterSweep
- Output is stored in an XML file, so we have to parse the XML code.
- Oliver Pelz will help us with programming

Protocol of the meeting can be downloaded from here.

Start working with MySQL
request UNIX/HUSAR/HPC access at DKFZ (Nao)
first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, contra

8-6-2009

Meeting with Oliver Pelz
- defining workflow with PromoterSweep, Matrix Profile Search and introduction into different Motif Discovery Algorithms

installation of NX server for access to the internal server from Windows
configure developing environment (printing from Linux, configure Mediawiki)
defining basic concept of database construction
- we select annotated promoter sequences in DoOP
- we select pathways of interest using KEGG
- narrow down the number of target promoter sequences to <10,000

8-7-2009

Official Team Meeting on Scheduling
Meeting with Anna-Lena and Tobias
- Introduction into R
- Tobias will give us access to their computing cluster (Group Roland Eils)
- Promoter Selection: DoOP, EnsEMBL, or UCSC?

HUSAR account arrived
installation of R, R editor and perl editor
further configuration of our internal server / mediawiki
writing first perl program - "Hi there"

8-10-2009

first contact with R and perl
playing around with R and perl
playing around with R library: Biobase
check working on DKFZ cluster

8-11-2009

defining programming languages: perl, R, MySQL
retrieving first Promotersweep output files

Meeting with Marti
- ideas for modeling
  - we will have at least three colors which overlap in their spectra.
  - a very nice approach will be Fuzzy Logic Modeling.
  - idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
  - idea 2: create & visualize fancy and fuzzy data from in silico simulation
- combine: promoter, output and graphic representation
- next meeting with Marti: end of next week.

extract NCBI Entrez Gene IDs with R and perl
MAC addresses registered for the BioQuant network

8-12-2009

configure perl working environment
study structure of DoOP database
download DoOP and load DoOP database into MySQL

8-13-2009

trying out some DoOP queries
download fasta sequences from the UCSC Genome Browser
mapping of NCBI Entrez Gene IDs with RefSeq IDs
configure perl working environment on Windows XP
contact Endre Sebestyen concerning the perl module Bio-DoOP-DoOP

8-14-2009

parse UCSC fasta sequences according to our selection
write parsed sequences into multifasta format
start PromoterSweep analysis over the weekend
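A minimal Python sketch of the parse-and-write step above (not our original perl script). It assumes the UCSC download is ordinary FASTA text and that the first word of each header is the ID we selected on; `parse_fasta` and `to_multifasta` are hypothetical names.

```python
from typing import Dict, Iterable


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {id: sequence}; id = first word of the header line."""
    records: Dict[str, str] = {}
    current_id = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def to_multifasta(records: Dict[str, str], selected: Iterable[str], width: int = 60) -> str:
    """Write the selected records (in the given order) as multi-FASTA; unknown ids are skipped."""
    lines = []
    for rec_id in selected:
        seq = records.get(rec_id)
        if seq is None:
            continue
        lines.append(">" + rec_id)
        # wrap the sequence at `width` characters per line
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""
```

Skipping unknown IDs silently keeps the batch running; in practice one would log them to find promoters missing from the download.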

8-18-2009

Tim, Stephen, from here on you have to enter your own stuff yourselves!

study the output file of PromoterSweep: check out its general structure and pick out useful information.
The result is grouped into: General Info, Best Genomic Mapping, Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
upon selection, sections of interest will be collected and made ready for entry into MySQL DB
discuss table structure of our database

How should our database be called? - Brainstorming -
- SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
- MAY contain: position, heartbeat, prediction, assembly, eukaryotes
- and still more keywords to come
establish localhost access to MySQL

8-19-2009

parse Promotersweep xml file into tab-separated text file
- the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS Quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
- this caused several programming problems with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), so we studied the structure and basic concepts of hashes & keys
load the parsed data into the MySQL database
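The PromoterSweep XML schema is not recorded here, so the element and attribute names below (`promoter`, `site`, `refseq`, `tf`, `pos`, `motif`, `quality`) are hypothetical placeholders. The sketch shows the same three steps in Python: flatten the XML into one row per binding site, group hits as a "hash of hashes" (RefSeq ID -> TF -> positions), and write tab-separated text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Column order of the tab-separated output (a subset of the fields listed above).
FIELDS = ["refseq_id", "tf_name", "position", "motif", "quality"]


def parse_hits(xml_text):
    """Flatten a (hypothetical) PromoterSweep-like XML document into one dict per binding site."""
    root = ET.fromstring(xml_text)
    hits = []
    for promoter in root.iter("promoter"):
        refseq = promoter.get("refseq")
        for site in promoter.iter("site"):
            hits.append({
                "refseq_id": refseq,
                "tf_name": site.get("tf"),
                "position": int(site.get("pos")),
                "motif": site.get("motif"),
                "quality": site.get("quality"),
            })
    return hits


def group_by_promoter(hits):
    """'Hash of hashes': RefSeq ID -> TF name -> list of positions."""
    table = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        table[hit["refseq_id"]][hit["tf_name"]].append(hit["position"])
    return table


def to_tsv(hits):
    """Header line plus one tab-separated line per hit."""
    lines = ["\t".join(FIELDS)]
    lines.extend("\t".join(str(hit[f]) for f in FIELDS) for hit in hits)
    return "\n".join(lines) + "\n"
```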

8-20-2009

pre-decision for our table-structure
- Table: Main_Info
  - RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
- Table: Gene_Info
  - Ensembl_ID, Gene Symbol, Gene Description.
- we go for the RefSeq ID to be the key connecting these two tables.
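A sketch of this two-table layout, using Python's built-in `sqlite3` as a stand-in for MySQL. The column names are our guesses at the fields listed above, and `hits_with_symbols` is a hypothetical helper showing the join on the RefSeq ID.

```python
import sqlite3


def build_db():
    """Create Main_Info and Gene_Info in an in-memory SQLite database (stand-in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Gene_Info (
            refseq_id        TEXT PRIMARY KEY,  -- key shared with Main_Info
            ensembl_id       TEXT,
            gene_symbol      TEXT,
            gene_description TEXT
        );
        CREATE TABLE Main_Info (
            refseq_id    TEXT NOT NULL REFERENCES Gene_Info(refseq_id),
            tf           TEXT NOT NULL,
            motif_start  INTEGER,
            motif_end    INTEGER,
            motif_score  REAL,
            tfbs_quality TEXT,
            tss_db       TEXT
        );
    """)
    return con


def hits_with_symbols(con, tf):
    """All binding sites of one TF, joined to the gene symbol via the RefSeq ID."""
    return con.execute(
        """SELECT m.refseq_id, g.gene_symbol, m.motif_start
           FROM Main_Info AS m
           JOIN Gene_Info AS g ON m.refseq_id = g.refseq_id
           WHERE m.tf = ?
           ORDER BY m.motif_start""",
        (tf,),
    ).fetchall()
```

Keeping gene annotation in its own table means each gene description is stored once, however many TFBS hits its promoter has.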

8-21-2009

update script for parsing the Promotersweep output files due to unexpected errors
we forgot to include "weak" as a category for the TFBS quality - added!
PromoterSweep results contain information about the TSS derived from different promoter databases. Which one should we rely on if they differ from each other?
- We give the highest priority to the DoOP database, since its RefSeq ID results agree well with other databases (e.g. DBTSS).
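One way to encode this rule, as a sketch: walk a fixed priority list and take the first database that reports a TSS. Only "DoOP first" comes from our decision; the rest of the order (DBTSS next, then any other database alphabetically) is an assumption for illustration.

```python
# Only "DoOP first" is our decision; the rest of the order is an assumption.
TSS_PRIORITY = ["DoOP", "DBTSS"]


def choose_tss(tss_by_db):
    """Pick (database, TSS position) from {database: TSS}, preferring databases in TSS_PRIORITY."""
    if not tss_by_db:
        return None

    def rank(db):
        # Listed databases rank by list index; unlisted ones come after, alphabetically.
        if db in TSS_PRIORITY:
            return (0, TSS_PRIORITY.index(db), "")
        return (1, 0, db)

    best = min(tss_by_db, key=rank)
    return best, tss_by_db[best]
```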

order MATLAB iGEM licence (http://www.mathworks.com/)

search for a tool to use MySQL in R programming environment
wiki: write a short article about the German Cancer Research Center (DKFZ)

Meeting with Anna-Lena: once we have established our database, then
- two strategies:
  - manually select interesting transcription factors and analyse them using database queries
  - plot histograms of TFBS occurrence within the target promoter sequence (from 1000bp upstream to the TSS) for each TF and make a systematic analysis
- we go for both!
- idea for the future: we can analyze combinatorial appearance of distinct TF pairs
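Combinatorial appearance of TF pairs could be counted per promoter as in this hypothetical sketch: each promoter contributes one count to every unordered pair of distinct TFs found in it, and `partners` lists how often a given TF co-occurs with each other TF. The input format (RefSeq ID -> list of TF names) is an assumption.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(tfs_by_promoter):
    """Count, for each unordered TF pair, the number of promoters containing both TFs."""
    counts = Counter()
    for tfs in tfs_by_promoter.values():
        # set(): several sites of the same TF in one promoter count once
        for pair in combinations(sorted(set(tfs)), 2):
            counts[pair] += 1
    return counts


def partners(counts, tf):
    """How often `tf` co-occurs with each other TF."""
    result = Counter()
    for (a, b), n in counts.items():
        if a == tf:
            result[b] += n
        elif b == tf:
            result[a] += n
    return result
```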

We have a name for our database - we call it -

- wait for it -

HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)


8-24-2009

Meeting with Marti: defining output modeling strategies
- "exclusive promoters"
  - a model for predicting the behaviour of activation of one, two, three... promoters at the same time.
  - the potential of this model lies in the possibility to model single as well as many pathways in combination and even check for synergistic effects
  - modeling logic: quantitative ODE VS. quantitative & qualitative fuzzy logic
- "error checking"
  - what to capture/measure: affinity of transcription factor binding to DNA
    - calculate score / reliability
    - phenotypic measurement
  - if we have time in the end: model/experiment optimization by wetlab-drylab rounds (FIGURE)
  - if we do not have much time: figure out where the catch is
- modeling layers & final visualization
  - (i) capture affinity - (ii) model gene expression - (iii) pathway activity - (iv) fancy visualization (Mathworks Simulink?)
  - plot: time course, dynamic affinity
  - keep in mind the possible high amount of False Positives using promoter search/analysis

8-25-2009

official Team Meeting also with Mr. Kai Ludwig (LANGE + PFLANZ) as guest for Logo / Title Claim discussion

so far we have 1753 promoter sequences analyzed by PromoterSweep!

Meeting with Daniela (Nao): Cell Profiler for capturing biological images & data analysis based on MATLAB

working with the R package RMySQL as the pipeline between R and MySQL
create a list of useful RMySQL commands

8-26-2009

Workflow for plotting histograms (SOURCE CODE/S?)
- make MySQL query using R
- make list of TFs, avoid duplicates using perl
- pick up each TF (perl/R) and plot histogram (R)
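The same workflow, sketched in Python with `sqlite3` in place of R/RMySQL and SQL `DISTINCT` in place of the perl de-duplication step. The table `hits(tf, pos)` is hypothetical, with `pos` taken relative to the TSS (negative = upstream); only the window from 1000bp upstream to the TSS is binned.

```python
import sqlite3
from collections import Counter


def tf_histograms(con, bin_width=50, upstream=1000):
    """Per-TF histogram of site positions in [-upstream, 0] relative to the TSS."""
    n_bins = upstream // bin_width
    # Step 2: unique list of TFs (SQL DISTINCT instead of a perl de-duplication).
    tfs = [row[0] for row in con.execute("SELECT DISTINCT tf FROM hits ORDER BY tf")]
    histograms = {}
    for tf in tfs:
        # Step 3: positions of this TF inside the window ...
        positions = [row[0] for row in con.execute(
            "SELECT pos FROM hits WHERE tf = ? AND pos BETWEEN ? AND 0", (tf, -upstream))]
        # ... then bin counts; pos == 0 (the TSS itself) goes into the last bin.
        bins = Counter(min((p + upstream) // bin_width, n_bins - 1) for p in positions)
        histograms[tf] = [bins.get(i, 0) for i in range(n_bins)]
    return histograms
```

The returned lists can be handed to any plotting tool; bin 0 is the most upstream window.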

create MySQL command list including combinatorial queries

8-27-2009

check HEARTBEAT DB for duplicate entries
how should we plot the histogram?
- (a) histogram - how "wide" should be each bin? 100bp? 50bp? 20bp?
- (b) plot probability density
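For option (b), a Gaussian kernel density estimate avoids choosing a bin width but needs a bandwidth instead. This sketch implements the kernel sum directly and uses Silverman's rule of thumb (h = 1.06 * SD * n^(-1/5)) as one common default; R's `density()` defaults to a slightly different rule (`bw.nrd0`), so numbers will not match R exactly.

```python
import math


def gaussian_kde(positions, grid, bandwidth):
    """Gaussian kernel density estimate of `positions`, evaluated at each point of `grid`."""
    n = len(positions)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        for x in grid
    ]


def silverman_bandwidth(positions):
    """Silverman's rule of thumb: 1.06 * sample SD * n^(-1/5)."""
    n = len(positions)
    mean = sum(positions) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in positions) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)
```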
study Transfac PWM (position weight matrices) for
- difference in consensus sequences (also ask Anna-Lena)
- different PWM types (vertebrates, plant, insect, fungi, bacteria, nematodes...)
- positive control: when histograms are generated and plotted, check distribution of Sp1

so far we have 3640 promoter sequences "swept"!

access from R to MySQL on the localhost server established

8-28-2009

dealing with perl: passing variables between perl and R

8-31-2009

September

Week	Days
	Mon	Tue	Wed	Thu	Fri	Sat	Sun
36	-	1	2	3	4	5	6
37	7	8	9	10	11	12	13
38	14	15	16	17	18	19	20
39	21	22	23	24	25	26	27
40	28	29	30	-	-	-	-

9-1-2009

derive transcription factor data using R and MySQL
plot HEARTBEAT TF hit distribution as histograms & density functions for different PWM subsets (all, vertebrates only, single matrices and joined TFs)
further completion of the database

9-2-2009

discussion on how to make statistical studies on our gained distributions
- ideas: define maximum and variance -> Nao
look for motif sequences -> Tim
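A minimal sketch of the "maximum and variance" idea: bin the TFBS positions, report the center of the most populated bin as the peak, plus the mean and sample variance of the positions inside the window. The bin width (50bp) and window (1000bp upstream of the TSS) are assumptions.

```python
def distribution_summary(positions, bin_width=50, upstream=1000):
    """Return (peak bin center, mean, sample variance) of positions within [-upstream, 0]."""
    inside = [p for p in positions if -upstream <= p <= 0]
    if len(inside) < 2:
        raise ValueError("need at least two positions inside the window")
    n_bins = upstream // bin_width
    counts = [0] * n_bins
    for p in inside:
        counts[min((p + upstream) // bin_width, n_bins - 1)] += 1
    peak_bin = max(range(n_bins), key=counts.__getitem__)  # first bin wins ties
    peak_center = -upstream + (peak_bin + 0.5) * bin_width
    mean = sum(inside) / len(inside)
    variance = sum((p - mean) ** 2 for p in inside) / (len(inside) - 1)
    return peak_center, mean, variance
```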

we have 4476 sequences analysed by Promotersweep so far!
- but we are expecting 4700 sequences - check missing ones!

9-3-2009

internal team meeting: Tim, Lars, Stephen, Nao
- select especially interesting TFs
  - criteria: (a) good hits in our distributions; (b) easy experimental handling
  - we go for HIF, SREBP and VDR to analyse and make synthetic promoter design
Transfac PWM: some matrices have annotation inconsistencies
which "spacer" sequences should we use in order to generate TFBS-free sequence parts?

rational design of synthetic promoters
- Tim: SREBP, Nao: VDR
- both go for a total number of 10 sequences
- strategies:
  - single TFs: search for density maxima
  - check combinatorial appearance and design promoter sequences with multiple binding TFs
- use spacer sequences generated by Lars and check for TFBS using Transfac
- sequence length: max. 1000bp

back-up idea: if synthesis does not work for a long (~1000bp) sequence then try to work out a protocol for a two-step promoter synthesis combining one empty (TFBS free) sequence with another which consists of many TF and activator binding sites.

9-4-2009

work with Transfac PWM: structure, description, and using consensus sequence
write script to get the IDs and frequencies of all TFBS co-occurring with VDR and SREBP
write script for generating consensus sequence based on Transfac PWM and replacing ambiguity code with A, C, G or T
```
Getconsensus.pl, MakeConsensus.pl
```
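The two scripts are not reproduced here. As a sketch of the same two steps in Python: take the most frequent base per matrix position (TRANSFAC matrices list counts in A, C, G, T order), and resolve IUPAC ambiguity codes to one concrete base. Resolving ambiguity at random with a fixed seed is our assumption, as is breaking count ties in favour of the first base in A, C, G, T order.

```python
import random

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_from_counts(matrix):
    """Most frequent base per position; `matrix` rows are (A, C, G, T) counts."""
    return "".join("ACGT"[max(range(4), key=row.__getitem__)] for row in matrix)


def resolve_ambiguity(seq, rng=None):
    """Replace every IUPAC ambiguity code by one of the bases it allows (chosen at random)."""
    rng = rng or random.Random(0)  # fixed seed so designs are reproducible
    return "".join(rng.choice(IUPAC[base]) for base in seq.upper())
```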

Wiki Meeting (Nao)
- Logo choice & modification
- choose header pics
- navigation layout
- develop a catchy, cool homepage

9-5-2009

Meeting with Tim, design synthetic promoter sequences
check spacer sequence (200bp) for TFBS: one TFBS found; removed it by cutting, which shortened the sequence to 190bp
Kid3 is a repressor!

9-6-2009

design more synthetic promoter sequences by manual iteration process which consists of (i) TFBS check and (ii) TFBS removal & filling up random sequence

aim: create an automatic design tool for synthetic promoters which includes sequence design, Transfac search, and filling the sequence up with spacer sequences.
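The manual loop (TFBS check, then removal and refilling) could be automated roughly as below. This is a simplified stand-in, not the Transfac Match tool: it looks for exact, uppercase motif strings on both strands and destroys each hit by mutating one random base inside it, so the sequence length never changes. The motif list is hypothetical; GGGCGG (the Sp1 GC box core) is used only as an example.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, motifs):
    """(start, end) of every exact motif match on either strand, overlaps included."""
    hits = []
    for motif in motifs:
        for target in {motif, revcomp(motif)}:  # set: palindromes counted once
            start = seq.find(target)
            while start != -1:
                hits.append((start, start + len(target)))
                start = seq.find(target, start + 1)
    return sorted(hits)


def clean_spacer(seq, motifs, rng=None, max_rounds=1000):
    """Mutate single bases until no motif matches remain; sequence length is unchanged."""
    rng = rng or random.Random(0)
    bases = list(seq.upper())
    for _ in range(max_rounds):
        hits = find_sites("".join(bases), motifs)
        if not hits:
            return "".join(bases)
        start, end = hits[0]
        i = rng.randrange(start, end)
        # pick a different base so the current match is always destroyed
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    raise RuntimeError("motifs still present after max_rounds")
```

A mutation can occasionally create a new match elsewhere; re-scanning after every change, as the loop does, catches that, just like re-running Transfac in the manual process.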

9-7-2009

check designed sequences for restriction sites
```
CheckRestrictionsites.pl
```
finish creating sequences
take the CMV core promoter into account when calculating the relative position of TFBS to the TSS
create sequences for negative control
- pure TFBS free sequence
- sequences with TFBS at minima of the density function
check all sequences for further binding sites with the Transfac Match tool

9-8-2009

check restriction sites for reverse complementary strand
add flanking sites with restriction sites and spacer nucleotides to our designed sequences
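A sketch of the restriction-site check on both strands. Which enzymes we screened for is not recorded here; the panel below (the BioBrick RFC10 sites plus the Type IIS enzyme BsaI) is an assumption. For palindromic sites such as EcoRI the reverse complement is the same string, so the reverse-strand search only matters for non-palindromic sites like BsaI.

```python
# Assumed enzyme panel: BioBrick RFC10 sites plus the non-palindromic Type IIS site BsaI.
SITES = {
    "EcoRI": "GAATTC", "XbaI": "TCTAGA", "SpeI": "ACTAGT",
    "PstI": "CTGCAG", "NotI": "GCGGCCGC", "BsaI": "GGTCTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def restriction_hits(seq, sites=SITES):
    """(enzyme, strand, position) for each site; positions are 0-based on the + strand."""
    seq = seq.upper()
    hits = []
    for name, site in sites.items():
        rc = revcomp(site)
        # palindromic site: the + strand search already covers both strands
        patterns = [("+", site)] if rc == site else [("+", site), ("-", rc)]
        for strand, pattern in patterns:
            i = seq.find(pattern)
            while i != -1:
                hits.append((name, strand, i))
                i = seq.find(pattern, i + 1)
    return sorted(hits, key=lambda h: (h[2], h[0]))
```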
is there any way to automate Transfac queries?
work with combined / joined MySQL query structures
or solve this process by simply writing new temporary tables?

workflow summary (short) for manual designing of a synthetic promoter:
- (A) use random sequence
- (B) check TF-matrices
- (C) validate TFs (mouse? human? repressor?)
- (D) check Transfac and restriction sites

Phone conference with Kai Ludwig, Logo & Web Design (Nao)

official Team Meeting

wiki closure on Oct 21st!

9-9-2009

modify synthetic promoter sequences to be ready for ordering
Sweep more promoter sequences using Promotersweep
start Modeling
revise and improve HEARTBEAT
discuss differences between PWMs

9-10-2009

still modifying synthetic sequences to be ready for shipping
we have altogether 25 designed promoter sequences!

9-11-2009

Software Meeting (Stephen, Tim, Nao)
- compatibility with mediawiki: HTML, perl, php, R, java?
- GUI design
  - simple interface: single TF, auxiliary TFs, #TFBS, sequence length
  - "interactive": multiple TF, choosing auxiliary TFs, additional information (see Eukaryopedia), density function plot & histogram
  - "hyper-interactive" step-by-step design & creation

Modeling Meeting with Marti and Anna-Lena (Tim, Nao)
- aim: fancy visualization to show expectation & prediction providing pathway insights
- TODO/QUESTIONS
  - what is the stimulus? collect possible inputs!
  - measurable outcome: experiments & pathways
  - quality of synthetic sequence: error checking
    - we need to define the quality of our sequences
- LEVELS of modeling
  - (1) DNA (2) expression/transcriptional activity (3) output
  - each with corresponding measurement

general modeling scheme: input - "What we are affecting" - possible outcomes
how? We use fuzzy logic

9-14-2009

collect input for inducing the system (e.g. p53: CPT, Pifithrin-alpha; NFkB: TNF-alpha etc.)
phone conference with Kai Ludwig
learn how to include Perl code into html code
- learn how to use embperl
- configure apache2 server such that embperl can be interpreted
- try to make offline use of embperl working
try to find nice html editor for ubuntu - (seamonkey, Amaya)

9-15-2009

create network picture for meeting tomorrow
Logo discussion
Read paper: Fuzzy Logic Modeling of Signaling Networks (Aldridge 2009) (see References)
learn data management of virtual server
get an overview of the apache2 file and security system

9-16-2009

Modeling Meeting with Marti (Douaa, Tim, Nao)
- update on available drugs/sequences
- decide what to model: (A) error checking, and (B) differential expression?
- use natural promoters to build up model for prediction of activity of synthetic promoters
- Discussion of TF score
  - Transfac sequence alignment score
  - promotersweep binding site quality
  - relative position to TSS: How?
    - (A) peak width & amplitude, (B) distance to maximal peak & position, (C) number of peaks, (D) "sliding window" and calculate area under curve, (E) #TFBS (also for comparison of different synthetic promoters)
  - biophysical affinity using TRAP (REFERENCE)
- first model: build up either on CMV or on JeT
- potential: integrate many stimuli -> find out crosstalks of pathways?
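For option (D), the area under the density curve can be computed with the trapezoidal rule over a sliding window. In this sketch the curve is given as sampled points `xs` (position, bp, sorted ascending) and `ys` (density); the window width is a free parameter and the function names are hypothetical.

```python
def trapezoid(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def best_window_auc(xs, ys, window):
    """Slide a window of width `window` over sorted xs; return (window start, area) with the largest area."""
    best = None
    for i, x0 in enumerate(xs):
        # extend the window to the last sample point within x0 + window
        j = i
        while j + 1 < len(xs) and xs[j + 1] <= x0 + window:
            j += 1
        area = trapezoid(xs[i:j + 1], ys[i:j + 1])
        if best is None or area > best[1]:
            best = (x0, area)
    return best
```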

TODO (meeting)
- collect data
- define WHAT we want to model
- summarize available sequences
- try to formulate IF ... THEN "sentences"
- check MATLAB & MATLAB Fuzzy Logic Toolbox availability

9-17-2009

internal Team Meeting
find the error.log files on the server and learn how to use them

9-18-2009

9-20-2009

learn how to use the tag language of embperl
- learn how to write loops with embperl
- access of input variables in embperl -- using the %fdat hash

9-21-2009

struggling with how to use R from embperl

9-22-2009

Wiki Meeting (Dani, Cori, Nao)
- install image processing tool
- design wiki, brainstorming for possible navigation bars
Wiki Phone Meeting with Kai Ludwig (Nao)
- design header & presentation-master as well as team shirts

Seminar: Martijn Luijsterburg (Karolinska Institute) - Heterochromatin Protein 1 is involved in the DNA damage response. Host: Thomas Höfer, Bioquant

9-23-2009

Modeling Meeting with Marti, Anna-Lena (Tim, Nao)
- contact database group (TP3)
- statistics: characterizing peaks
  - we go for area under the curve and affinity. optionally we can choose Transfac sequence score and peak height & width
- strategy to convince the wetlab people of the importance of modeling during the meeting this coming Friday.
- MATLAB license?
- logical gates: try to start creating model topology after Friday

Presentation: Marti Bernado Faura (Bioquant, University of Heidelberg): Data-driven Fuzzy Logic modeling of Programmed Cell Death
- intro into fuzzy logic
- system development & work flow of fuzzy logic
- fuzzy inference & model prediction
- model types: MISO / MIMO

Wrap-up meeting: Team HEARTBEAT (Tim, Nao)
- split up computational work into three tracks: HEARTBEAT DB, HEARTBEAT GUI and modeling
  - database: documentation (until Oct 18), peak characterization, calculate absolute density function
  - GUI: based on embperl, design according to our new wiki
  - modeling: MATLAB license, collect sequences & input data, develop network model, include pathways

literature work

9-24-2009

prepare slides for meeting tomorrow
pathway search: TNF-alpha/NFkB, VDR, SREBP and crosstalks. NFkB has a lot of pathway crosstalks, while SREBP and VDR show an interesting connection. Upon induction, SREBP activates VDR.

9-25-2009

Team Meeting (Wetlab, Nao)
- short progress report of all of us
- modeling: discussing scheme, modeling elements and strategies

9-28-2009

Wiki Phone Meeting with Kai Ludwig (Nao)

9-29-2009

designed synthetic promoters (HB_0001 - HB_0025) will be joined to the CMV core promoter, since the JeT core promoter contains an Sp1 site. All other sequences (e.g. randomly synthesized ones) are coupled with the JeT core promoter.
literature studies on combinatorial cis-regulation as well as on modeling of the lambda switch
prepare slides for the next modeling meeting

9-30-2009

Wiki Meeting (Dani, Nao)
MATLAB license order (Jens)
postpone Yara meeting (Wetlab, Tim)

got sequences from Lars
got qRT-PCR setup from Chenchen

Modeling Meeting with Marti & Anna-Lena (Tim, Nao)
- still need to collect FACS and microscopy results
- discuss our network prediction model using TNF-alpha as an example
- maybe we can use the lambda switch paper as a good starting point for our modeling

October

Week	Days
	Mon	Tue	Wed	Thu	Fri	Sat	Sun
40	-	-	-	1	2	3	4
41	5	6	7	8	9	10	11
42	12	13	14	15	16	17	18
43	19	20	21	22	23	24	25
44	26	27	28	29	30	31	-

10-1-2009

Wiki & Presentation Meeting with Dani (Nao)

10-2-2009

some wiki work

10-5-2009

10-6-2009

Internal Team Meeting
- check out number and measurement plans of randomly assembled synthetic promoters (5x NFkB, 5x p53, 2x pPARg, 2x SREBP)

Wiki Meeting (Corinna, Daniela, Nao)
- discuss design of the top page and possible features
- try out CSS design

10-7-2009

Wiki Design (Nao)
Wiki Phone Meeting with Kai Ludwig (Nao)

MATLAB has arrived!
literature work

Wetlab Meeting: progress report on measurement of randomly assembled synthetic promoters
think about the whole storyboard of our presentation at the jamboree

10-8-2009

Short Meeting with Roland
image processing work for wiki

10-9-2009

10-12-2009

10-13-2009

Measurement discussion with Lars: REU/RMPU, defining equations for mammalian systems
literature work on PoPS paper (Kelly JR et al., see references) and apply their equations
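As we understand Kelly et al., a promoter's activity in relative promoter units is the ratio of its per-cell GFP synthesis rate to that of a reference promoter measured under the same conditions. The sketch below estimates each rate as the least-squares slope of fluorescence over time divided by the mean OD, a crude simplification that assumes roughly constant cell density. How REU/RMPU was finally defined for our mammalian system is not recorded here, so treat this only as an illustration.

```python
from statistics import mean


def slope(ts, ys):
    """Least-squares slope of ys against ts."""
    t_bar, y_bar = mean(ts), mean(ys)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den


def relative_promoter_units(ts, gfp_test, od_test, gfp_ref, od_ref):
    """(dF/dt / OD)_test divided by (dF/dt / OD)_reference, measured at the same time points."""
    rate_test = slope(ts, gfp_test) / mean(od_test)
    rate_ref = slope(ts, gfp_ref) / mean(od_ref)
    return rate_test / rate_ref
```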

Marti Modeling Meeting (Anna-Lena, Tim, Nao)
- Journal Club (Tim, Nao)
- summary of last Thursday's team meeting

Marti: start modeling using MATLAB and Fuzzy Logic Toolbox (FLT), playing around with FLT and tutorial

10-14-2009

Nao

develop a first fuzzy inference system (FIS) for testing
Marti Modeling Meeting, specify model topology
collect data: FACS (Cori), Microscopy (Hannah), Sequence & TECAN (Lars)
start calculating position score using R
translating project abstract

10-15-2009

Nao

calculate affinity score using TRAP (Anna-Lena)
collect ideas for integrating TF-wise scores in order to calculate a final position/affinity score for one sequence: median, mean, maximum, weighted mean?
all data analysis is stored in three sheets (SequenceAnalysis, ResultSummary and CalculateTRAP)
from now on we concentrate on FACS measurements because they are the most reliable ones (TECAN used only for scanning)
fill up TRAP data with missing transcription factors
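The candidate ways of combining TF-wise scores into one score per sequence can be compared with a small helper like this hypothetical `combine_scores`. The weights for the weighted mean are whatever one chooses (e.g. per TF), not values from our data.

```python
from statistics import mean, median


def combine_scores(scores, method="max", weights=None):
    """Collapse the TF-wise scores of one sequence into a single score."""
    if not scores:
        raise ValueError("no scores given")
    if method == "max":
        return max(scores)
    if method == "mean":
        return mean(scores)
    if method == "median":
        return median(scores)
    if method == "weighted_mean":
        if weights is None or len(weights) != len(scores):
            raise ValueError("need one weight per score")
        return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```

The maximum rewards a single strong site, while the mean and median penalize sequences that also carry many weak sites; which behaviour fits the measured expression is exactly what the data analysis has to decide.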

10-16-2009

Nao

Anna-Lena Meeting: discuss how to integrate sequence scores
get & check p53, pPARg and random SREBP sequences
go through FACS results
add HEARTBEAT sequences for data analysis
modeling documentation
parsing experimental setups for modeling use
Chenchen qRT-PCR results

define possible modeling layers
- first layer: input
  - drug type, pathway, drug mode of action, drug concentration, targeted cells, incubation time
  - sequence type, position score, affinity score
  - we choose position & affinity score, sequence type and the presence of stimulation. Time as well as different concentration (unfortunately no data available) can be added in future
- second layer: promoters
  - 6 constitutives, 3 standards, 6 inducible available
  - data analysis narrows this to 5 constitutives, 3 standards and 4 inducible
  - HEARTBEAT sequences have to be measured a.s.a.p.

Marti Modeling Meeting
- try to define some fuzzy rules
- we assume better binding -> better expression
- define membership functions
- start modeling with NFkB results
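To illustrate what "fuzzy rules" and "membership functions" mean here, the following is a minimal Mamdani-style inference in plain Python, not the MATLAB Fuzzy Logic Toolbox model. It uses triangular "low"/"high" membership functions on scores scaled to [0, 1], min for AND, max for OR and aggregation, and centroid defuzzification. The two rules encode "better binding -> better expression"; all parameters are assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical membership functions on a [0, 1] score scale.
LOW = (-1.0, 0.0, 1.0)
HIGH = (0.0, 1.0, 2.0)


def infer_expression(position, affinity, steps=101):
    """Two-rule Mamdani inference; returns a defuzzified expression level in [0, 1]."""
    # Rule 1: IF position is high AND affinity is high THEN expression is high (AND = min).
    fire_high = min(tri(position, *HIGH), tri(affinity, *HIGH))
    # Rule 2: IF position is low OR affinity is low THEN expression is low (OR = max).
    fire_low = max(tri(position, *LOW), tri(affinity, *LOW))
    num = den = 0.0
    for k in range(steps):
        y = k / (steps - 1)
        # Clip each output set at its rule strength, aggregate with max.
        mu = max(min(fire_high, tri(y, *HIGH)), min(fire_low, tri(y, *LOW)))
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.5  # centroid; 0.5 if no rule fires
```

With these triangular sets the output never reaches exactly 0 or 1 (the centroid of "high" alone is about 0.67), which is a known property of centroid defuzzification rather than a bug.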

All

Internal Team Meeting
- reminder: wiki task, wiki to do
Official Team Meeting
- see here (LINK) for protocol

10-17-2009

final decision: we go for maximum of position and affinity score
added HB sequences to the data analysis table; as soon as results are there we can model the designed synthetic promoters
define shape of membership functions
literature search for missing activity values?
still TODO: check out p53 results since the p53-NFkB crosstalk is really interesting!

10-18-2009

Nao

SREBP/VDR paper arrived (see references)
finish data analysis
study and play around with MATLAB FLT, programming from both the FLT GUI and the MATLAB command line
define our work to be (i) error checking and (ii) exclusive pathway modeling
the high potential of this model lies in its plug'n'play structure, with a high capacity for integrating more inputs, outputs and also the middle layer (promoter diversity)

10-19-2009

Nao

define final network structure (FIGURE)?
wiki work
reading RFC documentation and correction
we call this project HEARTBEAT fuzzy network (FN)
HB FN documentation and first results!
- creating two fuzzy controllers: inducible NFkB and constitutive
how do we integrate the data? combine via Simulink!

10-20-2009

Creating, developing, integrating and combining fuzzy network modeling (MATLAB, Simulink)
first analysis of HB sequences
HEARTBEAT FN documentation

10-21-2009

10-22-2009

FROZEN WIKI!!!!

Retrieved from "http://2009.igem.org/Team:Heidelberg/Notebook_modeling"