Team:Warsaw/Modelling/Structural

From 2009.igem.org

Revision as of 23:54, 15 October 2009 by Seth

Introduction

Because the native conformation of the secretion peptide from hemolysin A has not been determined, we decided to use several computational structure prediction methods to find the three-dimensional structure of this domain. Additionally, we attempted to obtain theoretical models of the proapoptotic fusion proteins and the naturally occurring proteins used in our project.

Fundamental basis

Protein folding

Protein folding is the physical process by which a polypeptide chain folds from a random coil into a highly specific, functional three-dimensional structure. Shortly after translation from mRNA, each protein molecule exists as an unfolded chain with no characteristic conformation. The amino acids then interact with each other to produce a well-defined three-dimensional structure known as the native state. The resulting conformation is determined by the amino acid sequence.

Fusion proteins

Fusion proteins are created by joining two or more genes that originally encoded separate polypeptide chains. Expression of the fusion gene results in a single polypeptide with functional properties derived from each of the original proteins. Recombinant fusion proteins are created artificially by recombinant DNA techniques for use in biological research or to produce altered proteins with new features.

In most cases the functionality of the joined proteins is preserved in the fusion. This is possible because of the intrinsic modularity of protein domains: the fragment of a polypeptide that corresponds to a given domain may be removed from or added to the rest of the molecule without destroying its native capabilities.

Nevertheless, it is highly advisable to predict the three-dimensional structure of the fusion protein or of the artificially attached domains. Knowledge of the spatial organization of a protein is an extremely useful prerequisite for understanding its function and for modifying it rationally.

Methods

Computation

We chose the following servers to predict the secondary structures and to build full models of the proteins of interest.

[http://www.bioinfo.pl/ BioInfoBank Meta Server]

This server collects structural models from a set of prediction servers and assesses them using the 3D-Jury consensus approach.
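The consensus idea can be illustrated with a short sketch. It is a simplified illustration only and not the actual 3D-Jury implementation: it assumes a precomputed, symmetric matrix of pairwise similarities between models and scores each model by its mean similarity to all the others. The function name, the threshold parameter and the numbers in the matrix are hypothetical.

```python
def consensus_scores(similarity, threshold=0.0):
    """Score each model by its mean similarity to all other models.

    similarity[i][j] is any symmetric similarity between models i and j
    (higher means more alike). Pairs below `threshold` contribute zero.
    This is a simplification of consensus ranking, not the 3D-Jury code.
    """
    n = len(similarity)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # a model is not compared with itself
            if similarity[i][j] >= threshold:
                total += similarity[i][j]
        scores.append(total / (n - 1) if n > 1 else 0.0)
    return scores


# Hypothetical similarity matrix for three models (illustrative numbers only).
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
scores = consensus_scores(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print("best model index:", best)
```

The model that agrees most with the rest of the set is ranked first, which reflects the assumption that structural features recurring across independent predictions are more likely to be correct.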

[http://zhang.bioinformatics.ku.edu/I-TASSER/ I-TASSER]

The I-TASSER server is an Internet service for protein structure and function prediction. Models are built from multiple-threading alignments by LOMETS and iterative TASSER simulations.

[http://robetta.bakerlab.org/ Robetta]

Robetta is a full-chain protein structure prediction server. It parses protein chains into putative domains and models those domains either by homology modeling or by de novo modeling.

[http://www.reading.ac.uk/bioinf/ModFOLD/ The ModFOLD Model Quality Assessment Server]

ModFOLD is a server that provides a single score and a p-value for the predicted quality of a single 3D model of a protein structure, as well as rankings of multiple 3D models of the same protein target according to predicted model quality. It can also predict local quality within multiple models.


A more detailed description of the methods used is available here

Evaluation

We used the following measures of model validity:

  • Ramachandran plot
  • RMSD
  • TM-score
  • C-score


A more detailed description of these measures is available here
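Two of the measures listed above, RMSD and TM-score, can be sketched in a few lines. The sketch below assumes that both structures are given as equally long lists of Cα coordinates that have already been superimposed and aligned residue by residue; the real TM-score is the maximum over superpositions, so the function here only evaluates one fixed superposition. The function names and the toy coordinates are hypothetical, and the 0.5 Å floor on d0 is a guard for short chains.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(model, native):
    """TM-score of `model` against `native` for one fixed superposition."""
    if len(model) != len(native) or not native:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    length = len(native)
    # d0 scales distances by the target length so the score is length-independent.
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)  # guard for short chains (assumption of this sketch)
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
                for p, q in zip(model, native))
    return total / length


# Hypothetical 100-residue chain on a straight line, and a copy shifted by 1 Å.
native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
model = [(x, y + 1.0, z) for x, y, z in native]
print(round(rmsd(model, native), 3), round(tm_score(model, native), 3))
```

RMSD grows without bound and is dominated by the worst-fitting residues, whereas the TM-score lies between 0 and 1 and down-weights large deviations, which is why the two measures can rank the same models differently.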

Results

2D-predictions

The secondary structures of our proteins of interest were predicted with programs available on bioinformatics metaservers. All structures (except one, which was predicted by i-TASSER) were obtained using the meta.bioinfo.pl server. Detailed information about the secondary structures is available here

Full models

Secretion peptide

It was advisable to elucidate the three-dimensional structure of the hemolysin A domain responsible for its secretion. For commonly used large protein tags such as GST, it is known that the added domain usually folds autonomously and does not disrupt the native structure of the rest of the molecule. However, no data are available on the influence of the aforementioned secretion domain on correct folding.


Model quality scores (TM-score calculated by ModFOLD), ranging from 0.3015 to 0.1873:

  • 0.3015 – model1 (MUSTER)
  • 0.2813 – model2 (Modeller)
  • 0.2754 – model3 (Modeller)
  • 0.2585 – model4 (TASSER)

Selected RMSD values (in Ångström units):

  • Model1 (TASSER) vs Model3 (Modeller): 4.315
  • Model1 (TASSER) vs Model1 (MUSTER): 3.116
  • Model2 (TASSER) vs Model3 (Modeller): 3.468
  • Model2 (TASSER) vs Model1 (MUSTER): 3.874

Visualisation of the modelled domain (predicted by Modeller)
Structural alignment of the three top models


Predicted residue errors for model from Modeller (left) and i-TASSER (right)
Ramachandran plots for model from Modeller (left) and i-TASSER (right)

The accuracy of the predicted structures is moderate. All generated models resemble each other; however, the RMSD values among them show that the similarity of these structures is limited. The global geometry of the modelled domain is preserved across the models, but the spatial positions of the amino acid residues differ in each case. The mediocre TM-score indicates that the global topology of the models may not correspond to the real structure of the secretion peptide.
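As a rough guide to reading such scores, the sketch below applies two commonly quoted TM-score reference points: values below about 0.17 are typical of unrelated structures, and values above about 0.5 usually indicate the same fold. These thresholds are an assumption here, since they were derived for TM-scores between two known structures and the quality scores predicted by ModFOLD may be calibrated differently. The function name and labels are hypothetical; the four scores are those reported above for the secretion peptide models.

```python
def interpret_tm_score(score):
    """Map a TM-score to a coarse, conventional interpretation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("TM-score must lie between 0 and 1")
    if score < 0.17:
        return "indistinguishable from random"
    if score > 0.5:
        return "likely the same fold"
    return "uncertain topology"


# Scores reported for the four secretion peptide models.
secretion_peptide_scores = {
    "model1": 0.3015,
    "model2": 0.2813,
    "model3": 0.2754,
    "model4": 0.2585,
}

for name, score in secretion_peptide_scores.items():
    print(name, score, interpret_tm_score(score))
```

Under these assumed thresholds all four models fall into the intermediate band, which is consistent with describing their accuracy as moderate.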


Bax with secretion domain

Structural alignment of one predicted model (green) with secretion peptide model (purple) and crystal structure of bax core domain (yellow)
Structural alignment of one predicted model (green) with secretion peptide model (purple) and NMR structure of bax core domain (yellow)
3D histogram representing Ramachandran plot for the best obtained model

Most of the obtained models are incorrect at first sight and do not form valid proteins. Only the structure predicted by the TASSER server seems to be physically acceptable. It should be noted that the resolution of these models is low, especially for the secretion domain. Many residues have improper dihedral angle values. Despite these results, the conformation of the part of the protein that corresponds to Bax appears not to have been altered by the presence of the additional domain. Unfortunately, the calculated TM-score indicates that the global topology of the models may not correspond to the real structure of the fusion protein.
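The dihedral angles behind a Ramachandran plot can be computed from four consecutive backbone atoms. The following is a minimal sketch using only the standard library; the helper names and test points are hypothetical. For residue i, φ is the dihedral of C(i-1), N(i), Cα(i), C(i), and ψ is the dihedral of N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: cis (0 degrees), trans (180 degrees) and a 90 degree twist.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
twist = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(cis, trans, twist)
```

Plotting the (φ, ψ) pair of every residue and checking how many fall outside the sterically allowed regions gives the kind of assessment referred to above.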

Top model quality scores (TM-score calculated by ModFOLD):

  • 0.2470 – model1
  • 0.2463 – model2

Selected RMSD values (in Ångström units):

  • 1.083 (crystal structure of Bax vs model)
  • 13.411 (between the compared models)

Predicted residue errors for the best models

p53 with both signal domains

Structural alignment of model generated by LOMETS (green), modelled secretion domain (cyan) and experimentally resolved p53 core domain (purple)
Structural alignment of model generated by i-TASSER (red), modelled secretion domain (creamy) and experimentally resolved p53 core domain (purple)

The best models were created by the simple threading program LOMETS, but the structures found by the TASSER server were almost the same. The main difference between the models created by these programs was the topology of the secretion domain. LOMETS correctly recognized the structure of the secretion peptide but was unable to reconstruct the geometry of p53. The physical quality of all obtained models was evaluated with the ModFOLD server. The assessment reveals that the resolution of all predicted structures is low.

Top model quality scores (TM-score calculated by ModFOLD):

  • 0.1995 (LOMETS)
  • 0.1790 (i-TASSER)

Selected RMSD values (in Ångström units):

  • mod5 (i-TASSER) vs secretion domain (Modeller): RMSD = 11.635 (175 to 175 atoms)
  • mod5 (i-TASSER) vs crystal structure: 192 atoms aligned, RMSD = 1.135 (171 to 171 atoms)
  • mod1 (LOMETS) vs secretion domain (Modeller): 190 atoms aligned, RMSD = 3.129 (170 to 170 atoms)

However, some structural similarity can be found between the model of the hemolysin A secretion domain and the corresponding domain in the obtained models. For the p53 core domain the situation is better: the RMSD between the modelled protein core and the experimentally resolved structures from the PDB is surprisingly low. Unfortunately, the validity of the other parts of the molecule is below the level of confidence and appears to have no statistical significance. As mentioned before, in the models created by LOMETS the resolution of the secretion domain is acceptable.

Ramachandran plots for the models from LOMETS (left) and i-TASSER (center), compared with the plot calculated for the native structure of p53 (right)
Predicted residue errors for the best models created by LOMETS (left) and i-TASSER (right)

Listeriolysin with secretion domain

Structural alignment of listeriolysin model (green) with closely related perfringolysin O crystal structure (orange) and the theoretical model of secretion domain (grey)
Predicted residue errors for the top model created by LOMETS

Most of the obtained models are physically incorrect and are unlikely to represent valid proteins. Only one structure, predicted by TASSER, seems to be acceptable. It should be noted that the resolution of this model is low, especially for the secretion domain. Many residues have improper dihedral angle values. Despite these findings, the conformation of the part of the protein that corresponds to listeriolysin appears not to have been altered by the presence of the additional domain. Unfortunately, the calculated TM-score suggests that the global topology of the models may not correspond to the real structure of the fusion protein.

Calculated RMSD value (in Ångström units):

  • mod1 (i-TASSER) vs crystal structure of perfringolysin: RMSD = 11.635 (175 to 175 atoms)

Top model quality score (TM-score calculated by ModFOLD): 0.2296

Ramachandran plots for the model from i-TASSER (left) and for the native structure of perfringolysin (right)

Invasin

Alignment of two theoretical models: the first generated by Modeller (orange) and the second by i-TASSER (blue)
Alignment of model calculated by i-TASSER and the crystal structure of invasin from Yersinia pseudotuberculosis

Due to the lack of a proper structural template, the best solution was to create a model of the domain responsible for invasiveness. Using the crystal structure of the related invasin from Yersinia pseudotuberculosis, a full structural model of the invasive domain was created. Alignment to the known structure of the similar invasin indicates that the global geometry of both proteins is closely related. It should be emphasised that the spatial organisation of the essential amino acid residues in the part of the molecule that interacts with the integrin receptor is almost the same.

Evaluation of the models:

  • RMSD = 3.274 (between the two theoretical models)
  • Top model quality scores: 0.2236 (i-TASSER), 0.2205 (Modeller)

Ramachandran plots for the model from i-TASSER (left) and for the native structure of the related invasin (right)
Predicted residue errors for the models created by i-TASSER (left) and Modeller (right)


PhoP and PhoQ

The two proteins are responsible for induction of the pH-dependent promoter, which is a pivotal element of our system. Because the full structures of both PhoP and PhoQ were unknown, we decided to predict them by means of structural modeling.

Accurate model of PhoP created by Modeller
Structural alignment of four models of PhoP protein

PhoP was an easy target due to the presence of very similar proteins in the PDB database. Models generated by four programs have the same geometry and the RMSD between the structures is minimal (<0.25). All structures were validated using the ModFOLD server. The analysis reveals that all of them appear to be correct and that the differences among the best structures are not significant. These findings indicate that the obtained models are notably congruent with the native structure.

Predicted residue errors for the models created by Modeller (left), i-TASSER (center) and MUSTER (right)
Ramachandran plot for the top model created by Modeller

Calculated RMSD values (in Ångström units):

  • RMSD = 0.340 (Modeller vs i-TASSER)
  • RMSD = 0.561 (Modeller vs MUSTER)
  • RMSD = 0.151 (Modeller vs LOMETS)

For the 15 top models, the TM-scores calculated by ModFOLD were above 0.6.

PhoQ was a more difficult target for all the programs used. Most of the models found were physically incorrect at first sight. Only i-TASSER was able to create structures that resemble protein molecules. However, structural alignments between the models reveal a low level of similarity. Some parts were closely related, but their spatial organisation within the molecule differed in each case. The low TM-scores suggest that the probability that these structures are valid may not be significant.

Top model quality scores (TM-score calculated by ModFOLD): 0.2908, 0.2456, 0.2393
Structural alignment of three models of PhoQ generated by i-TASSER
Ramachandran plot for one model created by i-TASSER
Predicted residue errors for the models created by i-TASSER