Team:Warsaw/Modelling/Structural
From 2009.igem.org
Introduction
Fundamental basis
Protein folding
Protein folding is the physical process by which a polypeptide chain folds from a random coil into its highly specific, functional three-dimensional structure. Shortly after translation from mRNA, each protein molecule exists as an unfolded chain with no characteristic conformation. Its amino acids then interact with each other to form a well-defined three-dimensional structure known as the native state. This resulting conformation is determined by the amino acid sequence.
Fusion proteins
Fusion proteins are proteins created by joining two or more genes that originally encoded separate polypeptide chains. Expression of such a fusion gene results in a single polypeptide with functional properties derived from each of the original proteins. Recombinant fusion proteins are created artificially by recombinant DNA techniques, either for use in biological research or to produce altered proteins with new features.
In most cases the individual parts of a fusion protein remain functional. This is possible because protein domains are intrinsically modular: the fragment of a polypeptide that corresponds to a given domain can often be removed from, or added to, the rest of the molecule without destroying its native capabilities.
However, it is highly advisable to predict the three-dimensional structure of the fusion protein, or at least of the artificially attached domains. Knowing the spatial organisation of a protein is an extremely useful prerequisite for understanding its function and for modifying it rationally.
Methods
Computation
We chose the following servers to predict the secondary structures and full 3D models of the proteins of interest.
[http://www.bioinfo.pl/ BioInfoBank Meta Server]
This server collects structural models from a number of prediction servers and assesses them using the powerful 3D-Jury consensus approach.
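The idea behind a consensus approach is that a model which agrees with many independently produced models is more likely to be correct. The sketch below is a simplified consensus ranking in that spirit, not a reproduction of 3D-Jury's exact scoring. The function name `consensus_ranking` and the pluggable `similarity` function are hypothetical; in practice `similarity` could be a structural measure such as the TM-score of one model against another.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Model = TypeVar("Model")


def consensus_ranking(
    models: Dict[str, Model],
    similarity: Callable[[Model, Model], float],
) -> List[Tuple[str, float]]:
    """Rank models by their mean similarity to every other model (higher is better).

    Hypothetical simplified consensus score; 3D-Jury's own scoring differs in detail.
    """
    names = list(models)
    if len(names) < 2:
        raise ValueError("consensus needs at least two models")
    scores: Dict[str, float] = {}
    for a in names:
        # Sum the agreement of model `a` with all the other models.
        total = sum(similarity(models[a], models[b]) for b in names if b != a)
        scores[a] = total / (len(names) - 1)
    # Best-supported model first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a toy one-dimensional "model" and similarity `1 / (1 + |x - y|)`, the two models that lie close together outrank the isolated one, which is exactly the behaviour a consensus method relies on.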
[http://zhang.bioinformatics.ku.edu/I-TASSER/ I-TASSER]
The I-TASSER server is an Internet service for protein structure and function prediction. Models are built from multiple-threading alignments by LOMETS and iterative TASSER simulations.
[http://robetta.bakerlab.org/ Robetta]
Robetta is a full-chain protein structure prediction server. It parses protein chains into putative domains and models those domains either by homology modeling or by de novo modeling.
[http://www.reading.ac.uk/bioinf/ModFOLD/ The ModFOLD Model Quality Assessment Server]
ModFOLD provides a single score and a p-value describing the predicted quality of a single 3D model of a protein structure, and it ranks multiple 3D models of the same protein target according to their predicted quality. It can also predict the local quality within multiple models.
A more detailed description of the methods used is available here.
Evaluation
We used the following measures to assess the validity of the models:
Ramachandran plot
A Ramachandran plot shows the backbone dihedral angles phi against psi for each amino acid residue in a protein structure. Since these angles are largely responsible for the backbone conformation, the plot indirectly reveals the local geometry of the polypeptide chain. If many residues in the analysed structure have unexpected combinations of phi and psi values, the structure may be incorrect.
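The points on a Ramachandran plot come from backbone atom coordinates: phi of residue i is the dihedral C(i-1), N(i), CA(i), C(i), and psi is N(i), CA(i), C(i), N(i+1). The sketch below computes both from a chain given as a list of dictionaries with `"N"`, `"CA"` and `"C"` coordinates; that input layout and the function names are our own assumptions, not the format of any particular tool. Classifying the angles into allowed and disallowed regions is left out.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]
Residue = Dict[str, Vec]  # hypothetical layout: {"N": xyz, "CA": xyz, "C": xyz}


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (between -180 and 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(chain: Sequence[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; phi of the first and psi of the last residue are None."""
    angles: List[Tuple[Optional[float], Optional[float]]] = []
    for i, res in enumerate(chain):
        phi: Optional[float] = None
        psi: Optional[float] = None
        if i > 0:
            # phi: C(i-1) - N(i) - CA(i) - C(i)
            phi = dihedral(chain[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(chain) - 1:
            # psi: N(i) - CA(i) - C(i) - N(i+1)
            psi = dihedral(res["N"], res["CA"], res["C"], chain[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

Plotting the resulting (phi, psi) pairs as a scatter plot gives the Ramachandran plot; residues falling far from the densely populated regions are the ones worth inspecting.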
RMSD
One of the most widely accepted measures of the difference between two conformations of a molecule is the least root-mean-square deviation (RMSD), i.e. the RMSD obtained after the two structures have been optimally superposed. To calculate the RMSD of a pair of structures, each structure is represented as a 3N-length vector of coordinates, where N is the number of atoms compared. The RMSD is the square root of the average of the squared distances between corresponding atoms of the two structures, so it measures the average atomic displacement between the two conformations.
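The definition above translates directly into code. The sketch below takes the two conformations as flat 3N-length vectors (x1, y1, z1, x2, ...), as in the text, and assumes they have already been superposed; finding the optimal superposition (for example with the Kabsch algorithm), which is what makes it the *least* RMSD, is not included. The function name `rmsd` is our own.

```python
import math
from typing import Sequence


def rmsd(u: Sequence[float], v: Sequence[float]) -> float:
    """RMSD between two conformations given as flat 3N-length coordinate vectors.

    Assumes the structures are already optimally superposed.
    """
    if len(u) != len(v) or len(u) == 0 or len(u) % 3 != 0:
        raise ValueError("vectors must be non-empty, equally long, and of length 3N")
    n_atoms = len(u) // 3
    # Summing squared differences over all 3N components equals summing
    # squared inter-atomic distances over the N atoms.
    squared = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.sqrt(squared / n_atoms)
```

For two atoms where only the second one is displaced by 1 Å, the squared distances are 0 and 1, so the RMSD is sqrt(1/2), about 0.707 Å.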
TM-score
TM-score is a recently proposed scale for measuring the structural similarity between two structures. It was proposed to address a weakness of RMSD, namely its sensitivity to local errors. Because RMSD averages the distances over all residue pairs in the two structures, a local error (e.g. a misoriented tail) produces a large RMSD value even when the global topology is correct. In TM-score, however, small distances are weighted more strongly than large ones, which makes the score insensitive to local modelling errors. TM-score lies in the range [0,1]; a TM-score > 0.5 indicates a model with the correct topology, and a TM-score < 0.17 indicates only random similarity. These cutoffs do not depend on protein length.
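The weighting can be made concrete. For a fixed superposition, TM-score = (1/L_target) · Σ 1 / (1 + (d_i / d0)²), where d_i is the distance between the i-th aligned residue pair and d0 = 1.24 · (L_target − 15)^(1/3) − 1.8 Å. The real TM-score is the maximum of this quantity over superpositions; the sketch below evaluates only one given set of distances. The function names and the 0.5 Å floor on d0 for very short chains are our own assumptions.

```python
from typing import Sequence


def d0(l_target: int) -> float:
    """Length-dependent distance scale (Angstrom) used by TM-score."""
    # Assumption: use a 0.5 A floor for short chains, where the formula
    # becomes negative or (for L <= 15) undefined.
    if l_target <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], l_target: int) -> float:
    """TM-score for one fixed superposition.

    distances: distances (Angstrom) between aligned residue pairs.
    l_target:  length of the target protein; unaligned residues contribute 0.
    """
    scale = d0(l_target)
    # Each pair contributes between 0 and 1; close pairs contribute almost 1,
    # distant pairs almost nothing, so one bad region cannot dominate the score.
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_target
```

As an illustration of the contrast with RMSD: for a 100-residue target with all residues 1 Å off, moving a single residue to 20 Å raises the RMSD from 1.0 to about 2.2 Å, while the TM-score drops by less than 0.01 (from about 0.930 to about 0.921).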
C-score
C-score is a confidence score for estimating the quality of models predicted by the I-TASSER server. It is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. C-score typically lies in the range [-5,2], where a higher C-score signifies a model with higher confidence and vice versa. In a benchmark test on a set of 500 non-homologous proteins, C-score was found to be highly correlated with TM-score and RMSD.
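To our understanding of the I-TASSER documentation, the two ingredients mentioned above are combined as C-score = ln( (M / M_tot) · (1 / ⟨RMSD⟩) · Π_i Z(i) / Z0(i) ), where M is the number of decoys in the cluster the model comes from, M_tot the total number of decoys, ⟨RMSD⟩ the average RMSD of the cluster decoys to the cluster centroid, and Z(i) the highest threading Z-score from the i-th threading program with Z0(i) its cutoff. Treat the exact form as our reading of that documentation; the function name and argument names below are hypothetical.

```python
import math
from typing import Sequence


def c_score(m_cluster: int, m_total: int, mean_rmsd: float,
            z: Sequence[float], z0: Sequence[float]) -> float:
    """Sketch of the I-TASSER C-score (our reading of its published definition).

    m_cluster: decoys in the model's cluster (M)
    m_total:   all decoys from the assembly simulations (M_tot)
    mean_rmsd: average RMSD of cluster decoys to the cluster centroid
    z, z0:     top threading Z-score per threading program and its cutoff
    """
    if len(z) != len(z0):
        raise ValueError("z and z0 must have the same length")
    # Convergence term: large, tight clusters raise confidence.
    value = (m_cluster / m_total) / mean_rmsd
    # Threading term: templates scoring well above their cutoffs raise confidence.
    for zi, z0i in zip(z, z0):
        value *= zi / z0i
    return math.log(value)
```

Both a larger, tighter decoy cluster and stronger threading hits push the score upward, which matches the description of a higher C-score meaning higher confidence.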