Team:Warsaw/Modelling/Structural
From 2009.igem.org
Introduction
Fundamental basis
Protein folding
Protein folding is the physical process by which a polypeptide chain folds from a random coil into a highly specific, functional three-dimensional structure. Shortly after translation from mRNA, each protein molecule exists as an unfolded chain with no characteristic conformation. Its amino acids then interact with each other to create a well-defined three-dimensional structure known as the native state. The resulting conformation is determined by the amino acid sequence.
Fusion proteins
Fusion proteins are created by joining two or more genes that originally encoded separate polypeptide chains. Expression of the fusion gene results in a single polypeptide with functional properties derived from each of the original proteins. Recombinant fusion proteins are created artificially by DNA recombination, for use in biological research or to produce altered proteins with new features.
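At the DNA level, joining two genes into one in-frame fusion means removing the stop codon of the upstream coding sequence while keeping the reading frame intact, so that translation continues into the next domain. The sketch below illustrates only this bookkeeping, assuming each input is a complete open reading frame (starting with ATG, length divisible by three, ending in a stop codon). The function names and toy sequences are hypothetical, and no linker or cloning-specific sequence is added.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(seq: str) -> list[int]:
    """Return codon indices of in-frame stop codons, ignoring the final codon."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return [i for i, codon in enumerate(codons) if codon in STOP_CODONS]


def fuse_orfs(*orfs: str) -> str:
    """Join ORFs into one in-frame fusion ORF (hypothetical helper).

    Assumes every ORF starts with ATG, has a length divisible by 3 and ends
    with a stop codon. The stop codon of every ORF except the last one is
    removed so that translation continues into the next domain.
    """
    parts = []
    for i, orf in enumerate(orfs):
        orf = orf.upper()
        if len(orf) % 3 != 0:
            raise ValueError(f"ORF {i} has length {len(orf)}, not a multiple of 3")
        if i < len(orfs) - 1 and orf[-3:] in STOP_CODONS:
            orf = orf[:-3]  # drop the upstream stop codon
        parts.append(orf)
    fused = "".join(parts)
    if internal_stops(fused):
        raise ValueError("fusion contains an internal in-frame stop codon")
    return fused


# Hypothetical toy sequences, not real genes.
print(fuse_orfs("ATGAAATAA", "ATGGGGTGA"))  # ATGAAAATGGGGTGA
```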
In most cases the components of a fusion protein keep their function. This is possible because protein domains are intrinsically modular: the fragment of the polypeptide corresponding to a given domain can be removed from, or added to, the rest of the molecule without destroying its native capabilities.
Nevertheless, it is highly advisable to predict the three-dimensional structure of the fusion protein, or at least of the artificially attached domains. Knowledge of the spatial organization of a protein is an extremely useful prerequisite for understanding its function and for modifying it rationally.
Methods
Computation
We chose the following servers to predict the secondary structures and build full models of the proteins of interest.
[http://www.bioinfo.pl/ BioInfoBank Meta Server]
This server collects structural models from many prediction servers and assesses them with the powerful 3D-Jury consensus approach.
[http://zhang.bioinformatics.ku.edu/I-TASSER/ I-TASSER]
The I-TASSER server is an Internet service for protein structure and function prediction. Models are built from multiple-threading alignments generated by LOMETS and refined by iterative TASSER simulations.
[http://robetta.bakerlab.org/ Robetta]
Robetta is a full-chain protein structure prediction server. It parses protein chains into putative domains and models those domains either by homology modeling or by de novo modeling.
[http://www.reading.ac.uk/bioinf/ModFOLD/ The ModFOLD Model Quality Assessment Server]
ModFOLD is a server that provides a single score and a p-value describing the predicted quality of a single 3D model of a protein structure, and ranks multiple 3D models of the same protein target by predicted model quality. It can also predict the local quality within multiple models.
A more detailed description of the methods used is available here.
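Consensus methods such as 3D-Jury rest on the idea that a fold predicted independently by many servers is more likely to be correct than one predicted only once. The sketch below is a simplified illustration of that idea, not the actual 3D-Jury implementation: it assumes a precomputed symmetric matrix of pairwise model similarities (for example, the number of Cα atoms lying within a distance cutoff after superposition) and scores each model by its mean similarity to all the others. The function names, model names and similarity values are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def consensus_scores(similarity: Sequence[Sequence[float]]) -> list[float]:
    """Mean similarity of each model to every other model.

    similarity[i][j] is any symmetric pairwise structural similarity
    (higher means more similar); the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("a consensus needs at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def rank_models(names: Sequence[str],
                similarity: Sequence[Sequence[float]]) -> list[tuple[str, float]]:
    """Sort models from the highest to the lowest consensus score."""
    scores = consensus_scores(similarity)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical example: model A agrees with both others,
# while models B and C disagree with each other.
sim = [
    [0, 80, 75],
    [80, 0, 20],
    [75, 20, 0],
]
print(rank_models(["A", "B", "C"], sim))  # [('A', 77.5), ('B', 50.0), ('C', 47.5)]
```

The model supported by the largest number of other predictions ranks first, even if no single server scored it highest.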
Evaluation
We used the following measures to assess the validity of the models:
- Ramachandran plot
- RMSD
- TM-score
- C-score
A more detailed description of the methods used is available here.
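The C-score listed above is presumably the confidence score that I-TASSER reports for each of its models. As published by the I-TASSER authors, it combines the significance of the LOMETS threading alignments with the convergence of the structure-assembly simulations: C-score = ln[(M / M_tot) × (1 / ⟨RMSD⟩) × ∏ Z(i) / Z0(i)], where M is the number of decoys in the model's cluster, M_tot the total number of decoys, ⟨RMSD⟩ the average RMSD of the cluster decoys to the centroid, Z(i) the highest Z-score of the alignments from the i-th threading program and Z0(i) that program's Z-score cutoff. It typically ranges from -5 to 2, with higher values meaning higher confidence. The function below is a minimal sketch that only evaluates this formula from given inputs; the inputs are hypothetical and the sketch does not reproduce how I-TASSER obtains them.

```python
import math
from typing import Sequence


def c_score(m: int, m_tot: int, mean_rmsd: float,
            z: Sequence[float], z0: Sequence[float]) -> float:
    """Evaluate the I-TASSER C-score formula for given (hypothetical) inputs.

    m         -- number of decoys in the model's cluster
    m_tot     -- total number of decoys
    mean_rmsd -- average RMSD (Å) of the cluster decoys to the centroid
    z, z0     -- best Z-score and Z-score cutoff for each threading program
    """
    if len(z) != len(z0):
        raise ValueError("z and z0 must have the same length")
    if not 0 < m <= m_tot or mean_rmsd <= 0:
        raise ValueError("need 0 < m <= m_tot and mean_rmsd > 0")
    if any(zi <= 0 for zi in z) or any(c <= 0 for c in z0):
        raise ValueError("Z-scores and cutoffs must be positive")
    # The log of the product is written as a sum of logs.
    return (math.log(m / m_tot) - math.log(mean_rmsd)
            + sum(math.log(zi / c) for zi, c in zip(z, z0)))


# Hypothetical inputs: half of all decoys in the cluster, 5 Å spread,
# one threading program whose best Z-score equals its cutoff.
print(round(c_score(500, 1000, 5.0, [1.0], [1.0]), 3))  # -2.303
```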
Results
2D-predictions
The secondary structures of our proteins of interest were predicted with programs available through bioinformatics meta-servers. All structures except one (predicted by I-TASSER) were obtained from the meta.bioinfo.pl server. Detailed information about the secondary structures is available here.
Full models
Secretion peptide
It was advisable to elucidate the three-dimensional structure of the hemolysin A domain responsible for its secretion. For commonly used large protein tags such as GST, it is known that the added domain usually folds autonomously and does not disrupt the native structure of the rest of the molecule. However, no data are available on the influence of this secretion domain on correct folding.
Model quality scores (TM-score calculated by ModFOLD) ranged from 0.1873 to 0.3015. The top four models were:
- model1 (MUSTER): 0.3015
- model2 (Modeller): 0.2813
- model3 (Modeller): 0.2754
- model4 (TASSER): 0.2585
Selected RMSD values (in Å):
- Model1 (TASSER) vs Model3 (Modeller): 4.315
- Model1 (TASSER) vs Model1 (MUSTER): 3.116
- Model2 (TASSER) vs Model3 (Modeller): 3.468
- Model2 (TASSER) vs Model1 (MUSTER): 3.874
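RMSD values such as these are normally obtained after the two structures have been optimally superimposed on a set of matched atoms: RMSD = sqrt((1/N) Σ |x_i - y_i|²) over the N matched atom pairs. The page does not state which tool computed them, so the following is only a minimal sketch of the standard approach, assuming NumPy and an already known atom correspondence (for example, matched Cα atoms). It uses the Kabsch algorithm to find the optimal rotation; the coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal superposition.

    Row i of p must correspond to row i of q (e.g. matched C-alpha atoms).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of the same shape (N, 3)")
    # Remove the translational difference by centring both sets.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 would signal a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy check with hypothetical coordinates: a rotated and shifted copy of a
# structure superimposes perfectly, so its RMSD is essentially zero.
p = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.5]])
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
q = p @ rot_z_90.T + np.array([3.0, -2.0, 1.0])
print(round(kabsch_rmsd(p, q), 6))
```

The reflection guard matters for proteins: a mirror image of a model is not the same structure, so it must not be allowed to superimpose onto the original.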
The accuracy of the predicted structures is moderate. All generated models resemble each other; however, the pairwise RMSD values (3.1 to 4.3 Å) show that this similarity is limited. The global geometry of the modelled domain is preserved, but the spatial positions of individual amino acid residues differ from model to model. The mediocre TM-scores indicate that the global topology of the models may not correspond to the real structure of the secretion peptide.
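Unlike RMSD, the TM-score weights each aligned residue pair by its distance, so a few badly placed residues do not dominate the result, and it is normalised by the length of the target. For a given superposition, TM-score = (1/L) Σ 1 / (1 + (d_i / d0)²) with d0 = 1.24 × (L - 15)^(1/3) - 1.8, where L is the target length and d_i the distance between the i-th pair of aligned residues. The sketch below evaluates this expression for one fixed superposition only; the reference TM-score program additionally searches over superpositions to maximise the score, so for the same residue correspondence this value is a lower bound on its result. The 0.5 Å floor on d0 is an assumption following the common convention for short chains.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (Å) used by TM-score for a target of L residues."""
    if target_length <= 15:
        raise ValueError("the d0 formula assumes a target longer than 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor that keeps d0 positive for short chains


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances -- distance (Å) between each aligned residue pair after
                 superposition; unaligned residues are simply absent.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical 100-residue target: 50 residues placed perfectly and
# 50 unaligned gives 0.5, because each aligned pair contributes at most 1/L.
print(tm_score([0.0] * 50, 100))  # 0.5
```

As a rough guide from the TM-score literature, scores around 0.17 are typical of unrelated structures, while scores above 0.5 generally indicate the same fold.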
Bax with secretion domain
Most of the obtained models are clearly incorrect at first sight and do not form valid proteins. Only the structures predicted by the TASSER server seem physically acceptable. Note that the resolution of these models is low, especially for the secretion domain, and many residues have improper dihedral angles. Despite these results, the conformation of the part of the protein corresponding to Bax appears not to be altered by the presence of the additional domain. Unfortunately, the calculated TM-score indicates that the global topology of the models may not correspond to the real structure of the fusion protein.
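Improper dihedral angles are what a Ramachandran plot reveals: for every residue it plots the backbone torsion angles φ (C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue). A minimal pure-Python sketch of computing them from backbone coordinates follows. The input layout (one (N, CA, C) coordinate triple per residue) and the helper names are assumptions for illustration, not the format of any particular tool.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (between -180 and 180) about p1-p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: Sequence[Tuple[Vec, Vec, Vec]]
            ) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) for each residue; None where a neighbour is missing."""
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = (dihedral(n, ca, c, backbone[i + 1][0])
               if i + 1 < len(backbone) else None)
        angles.append((phi, psi))
    return angles


# Toy check: rotating the last point a quarter turn about the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))  # 90.0
```

Plotting the resulting (φ, ψ) pairs against the allowed regions gives the Ramachandran plot used for validation.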
Top model quality scores (TM-score calculated by ModFOLD):
- model1: 0.2470
- model2: 0.2463
Selected RMSD values (in Å):
- crystal structure of Bax vs model: 1.083
- between the compared models: 13.411
p53 with both signal domains
The best models were created by the simple threading programs of LOMETS, but the structures found by the I-TASSER server were almost the same. The major difference between the models created by these programs was the topology of the secretion domain: LOMETS correctly recognized the structure of the secretion peptide but was unable to reconstruct the geometry of p53. The physical quality of all obtained models was evaluated with the ModFOLD server, and the assessment shows that the resolution of all predicted structures is low.
Top model quality scores (TM-score calculated by ModFOLD):
- LOMETS: 0.1995
- I-TASSER: 0.1790
Selected RMSD values (in Å):
- mod5 (I-TASSER) vs secretion domain (Modeller): RMSD = 11.635 (175 to 175 atoms)
- mod5 (I-TASSER) vs crystal structure: 192 atoms aligned, RMSD = 1.135 (171 to 171 atoms)
- mod1 (LOMETS) vs secretion domain (Modeller): 190 atoms aligned, RMSD = 3.129 (170 to 170 atoms)
Nevertheless, some structural similarity can be found between the secretion domain of hemolysin A and the modelled domain. For the p53 core domain the situation is better: the RMSD between the modelled core and the experimentally resolved structure from the PDB is surprisingly low (1.135 Å), i.e. the similarity is high. Unfortunately, the validity of the other parts of the molecule is below the level of confidence and appears to have no significant statistical meaning. As mentioned above, the resolution of the secretion domain is acceptable in the models created by LOMETS.
Listeriolysin with secretion domain
Most of the obtained models are physically incorrect and are unlikely to represent valid proteins. Only one structure, predicted by TASSER, seems acceptable. Note that the resolution of this model is low, especially for the secretion domain, and many residues have improper dihedral angles. Despite these findings, the conformation of the part of the protein corresponding to listeriolysin appears not to be altered by the presence of the additional domain. Unfortunately, the calculated TM-score suggests that the global topology of the models may not correspond to the real structure of the fusion protein.
Calculated RMSD value (in Å): mod1 (I-TASSER) vs crystal structure of perfringolysin, RMSD = 11.635 (175 to 175 atoms)
Top model quality score (TM-score calculated by ModFOLD): 0.2296
Invasin
Because no proper structural template was available, the best solution was to create a model of the domain responsible for invasiveness. Using the crystal structure of the related invasin from Yersinia pseudotuberculosis, we built a full structural model of the invasive domain. Alignment to the known structure of this similar invasin indicates that the global geometry of both proteins is closely related. Importantly, the spatial organisation of the essential amino acid residues in the part of the molecule that interacts with the integrin receptor is almost the same.
Evaluation of the models
RMSD between the two theoretical models: 3.274 Å
Top model quality scores: 0.2236 (I-TASSER), 0.2205 (Modeller)
PhoP and PhoQ
These two proteins are responsible for inducing the pH-dependent promoter, which is a pivotal element of our system. Because the full structures of PhoP and PhoQ were unknown, we decided to predict them by structural modeling.
PhoP was an easy target because very similar proteins are present in the PDB. Models generated by four programs have the same geometry, and the RMSD between the structures is minimal (0.151 to 0.561 Å). All structures were validated with the ModFOLD server; the analysis shows that all of them appear to be correct and that the differences among the best structures are not significant. These findings indicate that the obtained models are closely congruent with the native structure.
Calculated RMSD values (in Å):
- Modeller vs I-TASSER: 0.340
- Modeller vs MUSTER: 0.561
- Modeller vs LOMETS: 0.151
For the top 15 models, the TM-scores calculated by ModFOLD were above 0.6.
PhoQ was a more difficult target for all the programs used. Most of the models found were physically incorrect at first sight. Only I-TASSER was able to create structures resembling protein molecules. However, structural alignments between these models reveal a low level of similarity. Some parts were closely related, but their spatial organisation within the molecule differed in each case. The low TM-scores suggest that these structures may not be valid.
Top model quality scores (TM-score calculated by ModFOLD): 0.2908, 0.2456, 0.2393