Team:Heidelberg/Notebook modeling

From 2009.igem.org


Revision as of 00:26, 19 October 2009

Notebook HEARTBEAT

Welcome to the notebook of the HEARTBEAT (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool) project. This notebook covers the work on three sublanes: HEARTBEAT database (DB), HEARTBEAT graphical user interface (GUI) and HEARTBEAT fuzzy modeling (FN), plus some additional work on logo and wiki design. Have fun!


Contents

  • August
  • September
  • October


July

7-27-2009

  • Meeting with Oliver Pelz
    • Discuss general ideas of our Database Structure and Content
    • An introduction to PromoterSweep (LINK). PromoterSweep screens a given sequence for conserved regions, giving us consensus sequences, and then screens these for TFBS by database search (TRANSFAC, JASPAR) (LINK)
    • Our new database should contain the following information: promoter sequence, TFs, TFBS, position of TFBS, number of binding TFBS, "host organism"
    • We decide to use MySQL as an appropriate solution for this challenge, which will also allow a graphical representation of the database on the web later.
    • GUI on wiki: which language? php? javascript?
    • Problems: access to PromoterSweep (Husar Bioinformatics Group, DKFZ), choice of Promoter Database (DoOP, UCSC, EnsEMBL) (LINK)
  • aim: create database by the end of August

August

Week  Mon        Tue        Wed        Thu        Fri        Sat  Sun
32    8-3-2009   8-4-2009   8-5-2009   8-6-2009   8-7-2009   -    -
33    -          -          8-12-2009  -          8-14-2009  -    -
34    8-17-2009  8-18-2009  8-19-2009  8-20-2009  8-21-2009  -    -

8-3-2009

  • First contact with MySQL
  • Start making an overview of other team's projects
  • Configuring our Virtual Server

8-4-2009

  • Official Team Meeting (LINK) @ BQ seminar room 43: preparing presentation & writing meeting report
  • Start installing development environment on our internal server
    • GNOME
    • MediaWiki

8-5-2009

  • Meeting with Tobias Bauer & Anna-Lena Kranz (Theoretical Bioinformatics, DKFZ) @ TP3, DKFZ
    • Integrating ideas of PromoterSweep, TRANSFAC as well as DoOP/cisRED
    • select "interesting" TFs (e.g. HIF, NFkB, c-myc, p53) for Wetlab
    • select "interesting" pathways (e.g. cell cycle, inflammation, metabolism etc)
    • future experimental validation: ChIP-on-Chip
      • for this we need a TFBS-free sequence
    • idea: plot histogram of TFBS relative to TSS
      • problem: choice of sequence: upstream only? include downstream?
    • new programming languages: R and Perl
    • next meeting: Friday after team meeting
  • Meeting with Karl-Heinz Glatting (HUSAR, DKFZ) @ TP3, DKFZ
    • An introduction to PromoterSweep
    • Structure and analysis principles of PromoterSweep
    • Output is stored in an XML file, so we have to parse the XML.
    • Oliver Pelz will help us with the programming
  • Protocol of the meeting can be downloaded from here.
  • Start working with MySQL
  • request UNIX/HUSAR/HPC access at DKFZ (Nao)
  • first contact with several databases: EnsEMBL, Compara, cisRED, DoOP, TiProD, contra (LINKS)

8-6-2009

  • Meeting with Oliver Pelz
    • defining workflow with PromoterSweep, Matrix Profile Search and introduction to different motif discovery algorithms
  • installation of NX server for access to the internal server from Windows
  • configure development environment (printing from Linux, configure MediaWiki)
  • defining basic concept of database construction
    • we select annotated promoter sequences in DoOP
    • we make a selection of pathway of interest using KEGG
    • narrow down the number of target promoter sequences to fewer than 10,000.

8-7-2009

  • Official Team Meeting on Scheduling
  • Meeting with Anna-Lena and Tobias
    • Introduction into R
    • Tobias will give us access to their computing cluster (Group Roland Eils)
    • Promoter Selection: DoOP, EnsEMBL, or UCSC?
  • HUSAR account arrived
  • installation of R, R editor and perl editor
  • further configuration of our internal server / mediawiki

8-10-2009

  • first contact with R and perl
  • playing around with R and perl
  • playing around with R library: Biobase
  • check working on DKFZ cluster

8-11-2009

  • defining programming languages: perl, R, MySQL
  • retrieving first PromoterSweep output files
  • Meeting with Marti
    • ideas for modeling
      • we will have at least three colors which overlap in their spectra.
      • a very nice approach will be Fuzzy Logic Modeling.
      • idea 1: error checking of affinity: compare expectation to experimental results and figure out where the error is hiding
      • idea 2: create & visualize fancy and fuzzy data from in silico simulation
    • combine: promoter, output and graphic representation (GRAPHIC!)
    • next meeting with Marti: end of next week.
  • extract NCBI Entrez Gene IDs with R and perl
  • MAC addresses registered for BioQuant network

8-12-2009

  • configure perl working environment
  • study structure of DoOP database
  • download DoOP and load DoOP database into MySQL

8-13-2009

  • trying out some DoOP queries
  • download fasta sequences from UCSC gene browser (LINK)
  • mapping of NCBI Entrez Gene IDs with RefSeq IDs
  • configure perl working environment on Windows XP
  • contact Endre Sebestyen concerning the perl module Bio-DoOP-DoOP (LINK)
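
The ID mapping step mentioned above (NCBI Entrez Gene IDs to RefSeq IDs) might look roughly like the sketch below. The team worked in R and Perl; Python is used here only for illustration. The two-column tab-separated input format and all IDs are invented placeholders, not the team's data.

```python
from collections import defaultdict

# Hypothetical mapping table: Entrez Gene ID <tab> RefSeq ID (all IDs are invented).
MAPPING_TEXT = "1001\tNM_000001\n1001\tNM_000003\n1002\tNM_000002\n"


def load_mapping(text):
    """Return {entrez_id: [refseq_id, ...]}; one gene may map to several transcripts."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        entrez_id, refseq_id = line.split("\t")
        mapping[entrez_id].append(refseq_id)
    return dict(mapping)


entrez_to_refseq = load_mapping(MAPPING_TEXT)

# Genes without a RefSeq entry simply yield an empty list.
for gene in ["1001", "1002", "9999"]:
    print(gene, entrez_to_refseq.get(gene, []))
```

Keeping a list per gene avoids silently dropping transcripts when one Entrez Gene ID has more than one RefSeq ID.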

8-14-2009

  • start PromoterSweep Analysis over Weekend

8-18-2009

Tim, Stephen, from here on you have to enter your own notes yourselves!

  • study the PromoterSweep output file: check out its general structure and pick out useful information.
  • result is grouped in: General Info, Best Genomic Mapping, Promoter DB Search Result, Graphical Overview, Combined Binding Sites, TSS and Exon Info, Profile Matrices and Generated Output Files.
  • upon selection, sections of interest will be collected and made ready for entry into MySQL DB
  • discuss table structure of our database
  • How should our database be called? - Brainstorming -
    • SHOULD contain: iGEM, Transcription Factor, Binding Site, Promoter, synthetic biology, Heidelberg
    • MAY contain: position, heartbeat, prediction, assembly, eukaryotes
    • and still more keywords to come

8-19-2009

  • parse PromoterSweep XML file into tab-separated text file (PERL CODE?)
    • the text file should contain: RefSeq ID, TF name, TFBS position, TF motif sequence, TFBS quality, TSS, Entrez ID, EnsEMBL ID, further gene description.
    • this raised several programming problems with multiple arrays, hashes and their combinations (arrays of hashes, hashes of hashes, etc.), hence:
  • studying structure and basic concepts of hashes and keys
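
The parsing step above could be sketched roughly as follows. The notebook does not show the PromoterSweep XML layout, so the element and attribute names used here (`promoter`, `tfbs`, `refseq`, `tss`, `tf`, `start`, `end`, `motif`, `quality`) are hypothetical placeholders. Python stands in for the Perl script the team wrote. A dictionary of lists of dictionaries plays the role of a Perl hash of arrays of hashes.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the real PromoterSweep XML schema is not shown in the notebook.
EXAMPLE_XML = """
<results>
  <promoter refseq="NM_000001" tss="1000">
    <tfbs tf="TF_A" start="120" end="131" motif="ACGTACGTACGT" quality="good"/>
    <tfbs tf="TF_B" start="640" end="649" motif="TTGACGTCAA" quality="weak"/>
  </promoter>
  <promoter refseq="NM_000002" tss="1000">
    <tfbs tf="TF_A" start="300" end="311" motif="ACGTACGAACGT" quality="good"/>
  </promoter>
</results>
"""

COLUMNS = ["refseq", "tf", "start", "end", "motif", "quality", "tss"]


def parse_sites(xml_text):
    """Return {refseq_id: [site_dict, ...]}, one dict per predicted binding site."""
    sites = {}
    root = ET.fromstring(xml_text.strip())
    for promoter in root.iter("promoter"):
        refseq = promoter.get("refseq")
        for tfbs in promoter.iter("tfbs"):
            row = dict(tfbs.attrib)          # tf, start, end, motif, quality
            row["refseq"] = refseq
            row["tss"] = promoter.get("tss")
            sites.setdefault(refseq, []).append(row)
    return sites


def to_tsv(sites):
    """Flatten the nested structure into tab-separated text, one binding site per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for refseq in sorted(sites):
        for row in sites[refseq]:
            writer.writerow(row)
    return buffer.getvalue()


sites = parse_sites(EXAMPLE_XML)
tsv_text = to_tsv(sites)
print(tsv_text)
```

Grouping sites by RefSeq ID first and flattening afterwards keeps the nested structure available for later per-promoter checks.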

8-20-2009

  • pre-decision for our table-structure
    • Table: Main_Info
      • RefSeq ID, TF, TF motif start & end position, TFBS motif score, TFBS quality, TSS database info
    • Table: Gene_Info
      • Ensembl_ID, Gene Symbol, Gene Description.
    • we go for the RefSeq ID to be the key connecting these two tables.
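
A minimal sketch of this two-table layout, under several assumptions: an in-memory SQLite database stands in for the MySQL server, the column names and types are guesses derived from the field list above, and the inserted rows are made up purely to demonstrate the join on the RefSeq ID.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names and types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gene_Info (
    refseq_id        TEXT PRIMARY KEY,
    ensembl_id       TEXT,
    gene_symbol      TEXT,
    gene_description TEXT
);
CREATE TABLE Main_Info (
    refseq_id    TEXT REFERENCES Gene_Info(refseq_id),
    tf           TEXT,
    motif_start  INTEGER,
    motif_end    INTEGER,
    motif_score  REAL,
    tfbs_quality TEXT,
    tss_source   TEXT
);
""")

# Made-up example rows, only to show how the two tables connect.
conn.execute(
    "INSERT INTO Gene_Info VALUES (?, ?, ?, ?)",
    ("NM_000001", "ENSG_EXAMPLE", "GENE1", "example gene"),
)
conn.executemany(
    "INSERT INTO Main_Info VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("NM_000001", "TF_A", 120, 131, 0.92, "good", "DoOP"),
        ("NM_000001", "TF_B", 640, 649, 0.71, "weak", "DoOP"),
    ],
)

# The RefSeq ID is the key that connects binding sites to gene annotation.
rows = conn.execute("""
    SELECT g.gene_symbol, m.tf, m.motif_start, m.tfbs_quality
    FROM Main_Info AS m
    JOIN Gene_Info AS g ON g.refseq_id = m.refseq_id
    ORDER BY m.motif_start
""").fetchall()
print(rows)
```

Splitting gene annotation from binding-site records means each gene description is stored once, however many TFBS are predicted for its promoter.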

8-21-2009

  • update script for parsing the PromoterSweep output files due to unexpected errors
  • we forgot to include "weak" as a category for the TFBS quality - added!
  • The PromoterSweep result contains TSS information derived from different promoter databases. Which one should we rely on if they differ?
    • We give highest priority to the DoOP database, since it agrees well with the RefSeq ID results when compared to other databases (e.g. DBTSS).
  • order Matlab (http://www.mathworks.com/) iGEM licence
  • search for a tool to use MySQL in the R programming environment
  • wiki: write a short article about the German Cancer Research Center (DKFZ)
  • Meeting with Anna-Lena: what to do once we have established our database
    • two strategies:
      • manually select interesting transcription factors and analyse them using database queries
      • plot histograms of TFBS occurrence within the target promoter sequence (from the TSS to 1000 bp upstream) for each TF and analyse them systematically
    • we go for both!
    • idea for the future: we can analyze combinatorial appearance of distinct TF pairs
  • We have a name for our database - we call it -


- wait for it -


HEARTBEAT database (Heidelberg Artificial Transcription Factor Binding Site Engineering and Assembly Tool)
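
The histogram strategy from the meeting with Anna-Lena (TFBS occurrence per TF within 1000 bp upstream of the TSS) could be sketched as below. The team planned to do this in R. This Python version uses an assumed bin width of 100 bp and invented site positions, so it only illustrates the binning idea.

```python
from collections import Counter, defaultdict

BIN_WIDTH = 100  # assumed bin size in bp
WINDOW = 1000    # promoter window: from the TSS to 1000 bp upstream

# Made-up (TF, distance upstream of the TSS in bp) pairs; real values would come from the database.
sites = [
    ("TF_A", 35), ("TF_A", 80), ("TF_A", 420),
    ("TF_B", 910), ("TF_B", 950), ("TF_B", 1200),
]


def tfbs_histograms(sites, bin_width=BIN_WIDTH, window=WINDOW):
    """Count binding sites per distance bin for each TF, ignoring sites outside the window."""
    histograms = defaultdict(Counter)
    for tf, distance in sites:
        if 0 <= distance < window:
            bin_start = distance // bin_width * bin_width
            histograms[tf][bin_start] += 1
    return histograms


histograms = tfbs_histograms(sites)

# Simple text rendering: one row per bin, one '#' per binding site.
for tf in sorted(histograms):
    for bin_start in range(0, WINDOW, BIN_WIDTH):
        count = histograms[tf][bin_start]
        print(f"{tf}  -{bin_start + BIN_WIDTH:4d}..-{bin_start:<4d} {'#' * count}")
```

Keeping one counter per TF makes it easy to compare positional preferences between factors, or to extend the loop later to count co-occurring TF pairs.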


8-22-2009

  • Insert Amplification of mitoneet-eGFP by PCR

8-24-2009

  • Restriction digest of mutagenized plasmids (PstI) and analysis on gel
  • Amplified inserts were gel-purified


  • What worked: eBFP, eBFP+NLS, eBFP_kozak, eBFP+NLS_kozak, eGFP, eGFP_kozak
  • What didn't: NLS, NLS_kozak, eGFP+mitoneet, mitoneet, eGFP+mitoneet_kozak, mitoneet_kozak

8-26-2009

BBBing of insert sequences

  • PCR of cherry, cherry_myrpalm, myrpalm, NLS with kozak primers to amplify cherry_kozak, cherry_myrpalm_kozak, myrpalm_kozak, NLS_kozak
  • Restriction of localisation sequences and fluorophores with NheI and SpeI; the restricted plasmid was provided by the Synthetic Promoter Group and treated with SAP
  • Ligation with p31
  • Transformation of DH5alpha with ligated plasmids
  • Plating of transformed cells on Amp plates

8-27-2009

  • Ligation and transformation did not work (no colonies, except for two on the NLS)
  • New PCR with fluorophores and localisation sequences, to get higher amounts
  • Gel purification of: eGFP, eGFP_kozak, eBFP, eBFP_NLS, eBFP_kozak, eBFP_NLS_kozak, NLS_kozak, cherry, cherry_myrpalm, myrpalm, cherry_kozak, cherry_myrpalm_kozak, myrpalm_kozak

8-28-2009

BBBing of insert sequences 2.0

  • Restriction digest of fluorophores and localisation sequences with SpeI and NheI (1 h, Buffer 2, BSA)
  • Restriction digest of p49 with SpeI and NheI (1 h, Buffer 2, BSA) and SAP (30 min), purification
  • Nanodrop of the digest shows no DNA in the samples, so the purification may have been unsuccessful

8-29-2009

BBBing of insert sequences 2.1

  • Restriction digest of fluorophores and localisation sequences with SpeI and NheI (1 h, Buffer 2, BSA)
  • Restriction digest of p49 with SpeI and NheI (1 h, Buffer 2, BSA) and SAP (30 min), purification

8-31-2009

BBBing of insert sequences 2.1 (part 2)

  • Ligation of insert sequences with restricted p49
  • Transformation
  • Plating -> wrong resistance