Team:Warsaw/Modeling methods

From 2009.igem.org

Short description of the programs and servers used

Robetta and Rosetta

Robetta is a full-chain protein structure prediction server. It parses protein chains into putative domains with the Ginzu protocol and models those domains either by homology modeling or by de novo modeling.

The first part of the modeling process determines the locations of putative domains in the query sequence, assigns each domain to the appropriate protocol, and identifies any likely homologs with experimentally characterized structures. The target sequence is scanned with successively less confident methods to assign predicted domains. Once those regions are identified, cut points in the putative linkers are determined and, if possible, a single parent PDB chain is associated with each putative domain. The homology modeling or de novo protocol is then initiated for each putative domain.
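The cascade of successively less confident methods can be illustrated with a minimal sketch. It is not the Ginzu implementation: the detector names, the rule that an earlier hit is never overwritten, and the "de_novo" label for unassigned regions are all assumptions made for the illustration.

```python
from typing import Callable, List, Tuple

# (start, end, label), end exclusive. Labels and detectors are hypothetical.
Region = Tuple[int, int, str]
Detector = Callable[[str], List[Tuple[int, int]]]


def assign_domains(sequence: str,
                   detectors: List[Tuple[str, Detector]]) -> List[Region]:
    """Run detectors from most to least confident and collect domain regions.

    Assumption of this sketch: a hit from a later (less confident) detector
    is accepted only if it does not overlap a region assigned earlier.
    """
    covered = [False] * len(sequence)
    regions: List[Region] = []
    for name, detect in detectors:
        for start, end in detect(sequence):
            if not any(covered[start:end]):
                regions.append((start, end, name))
                for i in range(start, end):
                    covered[i] = True
    # Stretches that no detector claimed are left for de novo modeling.
    start = None
    for i, flag in enumerate(covered + [True]):   # sentinel closes a final gap
        if not flag and start is None:
            start = i
        elif flag and start is not None:
            regions.append((start, i, "de_novo"))
            start = None
    return sorted(regions)
```

Called with a confident detector that reports residues 0-40 and a weaker one that reports 30-60 and 60-80, the sketch keeps 0-40 and 60-80 and marks the remaining stretches for de novo modeling.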

For domains with a structural homolog, Robetta uses a dedicated alignment method to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments, in a fashion similar to the de novo protocol but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment insertion, producing a large decoy ensemble from which the final models are selected. To do this, Robetta generates three- and nine-residue fragment libraries that represent local conformations seen in the PDB, and then assembles models by fragment insertion using a scoring function that favors protein-like features [1, 2].
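Fragment insertion can be sketched as a Monte Carlo loop over backbone torsion angles. The version below is a toy, not the Rosetta protocol: the representation of a model as a list of (phi, psi) pairs, the Metropolis acceptance rule, the extended starting chain, and all function and parameter names are assumptions chosen for the illustration. The library is assumed to contain only fragments that fit inside the chain.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Torsions = Tuple[float, float]   # (phi, psi) in degrees for one residue
Fragment = List[Torsions]        # consecutive residues, e.g. 3 or 9 long


def fragment_insertion(
    n_residues: int,
    library: Dict[int, List[Fragment]],   # start index -> candidate fragments
    score: Callable[[List[Torsions]], float],
    steps: int = 1000,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Torsions], float]:
    """Toy fragment-insertion search; lower scores are better."""
    rng = random.Random(seed)
    model: List[Torsions] = [(-150.0, 150.0)] * n_residues  # extended chain
    current = score(model)
    starts = sorted(library)
    for _ in range(steps):
        # Pick a window and overwrite its torsions with a library fragment.
        start = rng.choice(starts)
        fragment = rng.choice(library[start])
        trial = list(model)
        trial[start:start + len(fragment)] = fragment
        trial_score = score(trial)
        # Metropolis criterion: always accept improvements, sometimes
        # accept a worse score so the search can leave local minima.
        delta = trial_score - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            model, current = trial, trial_score
    return model, current
```

Running the function many times with different seeds would give a decoy ensemble in the sense used above; the scoring function is left to the caller because the real one is far richer than anything a short example can show.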

I-TASSER

I-TASSER is a full-chain protein structure prediction server based on an iterative algorithm that combines threading with de novo modeling. After a protein sequence is submitted, the server first tries to retrieve template proteins of similar folds (or super-secondary structures) from the PDB library.

In the next step, fragments excised from the PDB templates are reassembled into full-length models by replica-exchange Monte Carlo simulations, with the threading-unaligned regions built by de novo modeling. In cases where no appropriate template is identified, I-TASSER builds the whole structure without a structural template.
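The exchange step of a replica-exchange Monte Carlo simulation can be sketched as follows. This shows the textbook swap criterion between two replicas at different temperatures, not I-TASSER's own code; the function names, the unit Boltzmann constant, and the simple left-to-right sweep over neighboring temperatures are assumptions of the sketch.

```python
import math
import random
from typing import List


def swap_probability(energy_i: float, temp_i: float,
                     energy_j: float, temp_j: float,
                     k_b: float = 1.0) -> float:
    """Probability of exchanging the conformations held at two temperatures.

    Standard criterion: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies: List[float], temps: List[float],
                  rng: random.Random) -> List[int]:
    """Sweep over neighboring temperature slots and try to swap replicas.

    energies[r] is the current energy of replica r. Returns, for each
    temperature slot, the index of the replica that now occupies it.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], temps[k], energies[b], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order
```

A low-energy conformation found at a high temperature is always handed to the colder neighbor, while the reverse exchange is accepted only with a Boltzmann-weighted probability; this is what lets the cold replicas escape local minima.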

Finally, the fragment assembly simulation is performed again, starting from the centroids of the clusters formed from the first-round decoys, with spatial restraints collected by additional software used to guide the simulation. The purpose of the second iteration is to remove steric clashes as well as to refine the global topology of the cluster centroids. The decoys generated in this simulation are subsequently clustered and the structures with the lowest energy are selected. The final atomic details are built from the selected decoys through optimization of the hydrogen-bonding network [3, 4].

MUSTER

MUSTER is a threading program that extends the secondary-structure-enhanced sequence profile-profile alignment algorithm (PPA). It improves on the PPA results by adding structure-derived features such as depth-dependent structure profiles and a hydrophobic scoring matrix. These features are incorporated into the dynamic programming procedure [5].
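How extra terms enter a dynamic programming alignment can be shown with a small sketch. It is not MUSTER's scoring function: only two terms are used (a profile-profile dot product and a secondary structure match), and the weights, the linear gap penalty, and the global alignment scheme are assumptions made to keep the example short.

```python
from typing import List, Sequence


def alignment_score(
    query_profile: Sequence[Sequence[float]],     # per-residue frequency vectors
    template_profile: Sequence[Sequence[float]],
    query_ss: str,                                # predicted secondary structure
    template_ss: str,                             # observed secondary structure
    ss_weight: float = 0.5,
    gap_penalty: float = 1.0,
) -> float:
    """Optimal global alignment score under a composite match score."""
    n, m = len(query_profile), len(template_profile)
    best: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        best[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Sequence term: similarity of the two profile columns.
            profile_term = sum(
                q * t for q, t in zip(query_profile[i - 1], template_profile[j - 1])
            )
            # Structural term: reward agreement of secondary structure states.
            ss_term = ss_weight if query_ss[i - 1] == template_ss[j - 1] else -ss_weight
            best[i][j] = max(
                best[i - 1][j - 1] + profile_term + ss_term,  # align i with j
                best[i - 1][j] - gap_penalty,                 # gap in template
                best[i][j - 1] - gap_penalty,                 # gap in query
            )
    return best[n][m]
```

Further structure-derived features would be added as more weighted terms in the match score, leaving the recursion itself unchanged.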

MODELLER

MODELLER is a computer program used for comparative modeling of protein three-dimensional structures. The user provides an alignment of the sequence to be modeled with known related structures, from which the program creates the spatial restraints needed to limit the exploration of conformational space. The restraints can operate on distances, angles, dihedral angles, pairs of dihedral angles and some other spatial features defined by atoms or pseudo atoms. Presently, MODELLER automatically derives the restraints only from the known related structures and their alignment with the target sequence.

A 3D model is obtained by optimization of a molecular probability density function (PDF). The density function for comparative modeling is optimized with the variable target function procedure in Cartesian space, which employs conjugate gradients and molecular dynamics with simulated annealing. The PDFs restrain Cα-Cα distances, main-chain N-O distances, and main-chain and side-chain dihedral angles. A smoothing procedure is used in the derivation of these relationships to minimize the problem of a sparse database. The molecular PDF is optimized so that the resulting model violates the input restraints as little as possible [6].
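The idea of scoring a model against restraints expressed as probability densities can be sketched in a few lines. This is not MODELLER code: it assumes independent Gaussian distance restraints only, and takes the objective to be the negative logarithm of their product, so that a lower value means fewer and smaller violations. The names and the restraint tuple layout are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
# (atom index i, atom index j, mean distance, standard deviation), hypothetical layout
Restraint = Tuple[int, int, float, float]


def objective(coords: List[Point], restraints: List[Restraint]) -> float:
    """Negative log of a product of Gaussian distance PDFs (lower is better)."""
    total = 0.0
    for i, j, mean, sigma in restraints:
        d = math.dist(coords[i], coords[j])
        # -ln of a normal density with the given mean and sigma, evaluated at d
        total += 0.5 * ((d - mean) / sigma) ** 2
        total += math.log(sigma * math.sqrt(2.0 * math.pi))
    return total
```

For a single restraint of 3.8 Å between two Cα atoms, coordinates placed exactly 3.8 Å apart give the minimum of the objective, and any stretching or compression raises it. An optimizer such as conjugate gradients would move the coordinates to lower this value over all restraints at once.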

The program can perform many additional tasks, including de novo modeling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, and comparison of protein structures [7].

3D-Jury

3D-Jury is a versatile system that generates meta-predictions from sets of models created with a variable set of methods. No prior knowledge of the characteristics of these methods is required. Its main processing step is the comparison of models, an approach similar to that employed in the field of ab initio fold recognition. 3D-Jury takes as input groups of models generated by a set of servers but ignores the confidence scores those servers assign. All models are compared with each other, and a similarity score is assigned to each pair that is sufficiently similar after optimal superposition [8].
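The consensus step can be sketched as follows, assuming the pairwise similarities after superposition have already been computed and are passed in as a matrix. The threshold below which a pair is treated as not similar, its default value, and the normalization by the number of models are assumptions of this sketch rather than a statement of the published scoring scheme.

```python
from typing import List


def consensus_scores(similarity: List[List[float]],
                     threshold: float = 40.0) -> List[float]:
    """Score each model by its similarity to all the other models.

    similarity[a][b] is a precomputed pairwise score (for example, the
    number of residue pairs that superimpose closely). Pairs scoring
    below the threshold contribute nothing.
    """
    n = len(similarity)
    scores = []
    for a in range(n):
        total = sum(
            similarity[a][b]
            for b in range(n)
            if b != a and similarity[a][b] >= threshold
        )
        scores.append(total / n)
    return scores


def best_model(similarity: List[List[float]], threshold: float = 40.0) -> int:
    """Index of the model with the highest consensus score."""
    scores = consensus_scores(similarity, threshold)
    return max(range(len(scores)), key=scores.__getitem__)
```

The model that resembles the largest number of other predictions wins, regardless of which server produced it or how confident that server was.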

ModFOLD

ModFOLD is a tool for evaluating model accuracy. It relies on neural networks trained to discriminate between models on the basis of the TM-score. The models for the training set were built from a large set of alignments using an in-house program, which simply mapped aligned residues in the target onto the full backbone coordinates of the template and renumbered them. The target-template pairs were generated from an all-against-all comparison of the sequences from a non-redundant fold library [9].
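For reference, the TM-score that serves as the training target can be computed from per-residue distances as in the sketch below. It evaluates the score for one given superposition only; the full definition takes the maximum over superpositions, which is omitted here. The lower bound of 0.5 Å on the distance scale for very short targets is an assumption of this sketch.

```python
from typing import Sequence


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance in angstroms between each aligned residue pair
    (model vs. native). Unaligned target residues contribute nothing.
    """
    if target_length > 15:
        # Length-dependent distance scale from the TM-score definition.
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.0
    d0 = max(d0, 0.5)   # assumed floor so short targets stay well defined
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfect model scores 1, and because the sum is divided by the target length rather than by the number of aligned residues, a model covering only half of the target perfectly scores 0.5.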

References

  1. Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, Malmström L, Robertson T, Baker D. De novo prediction of three-dimensional structures for major protein families. J Mol Biol. 2002 Sep 6;322(1):65-78. PMID: 12215415
  2. Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res. 2006;34(17):e112. Epub 2006 Sep 13. PMID: 16971460
  3. Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 2008 Jan 23;9:40. PMID: 18215316
  4. Wu S, Skolnick J, Zhang Y. Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol. 2007 May 8;5:17. PMID: 17488521
  5. Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins. 2008 Aug;72(2):547-56. PMID: 18247410
  6. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993 Dec 5;234(3):779-815. PMID: 8254673
  7. Fiser A, Do RK, Sali A. Modeling of loops in protein structures. Protein Sci. 2000 Sep;9(9):1753-73. PMID: 11045621
  8. Ginalski K, Elofsson A, Fischer D, Rychlewski L. 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics. 2003 May 22;19(8):1015-8. PMID: 12761065
  9. McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics. 2007 Sep 18;8:345. PMID: 17877795