Pipeline for Protein 3D Structure Prediction
Protein Structure Prediction Pipeline - PSPP

This software suite performs protein structure prediction on a genomic scale, with prediction accuracy that depends on the sequence homology [1] between each query and previously known protein structures [2]. Sequences with high homology to known structures can be predicted at atomic resolution, and the resulting models might be used to identify targets for drug design. Sequences with lower similarity to known protein structures may still be annotated at the fold-topology level, which can provide clues to their function.

This application directs each protein sequence in a set to one of three software programs, based on the level of sequence similarity (homology) between the query sequence and proteins of known structure in the Protein Data Bank [2]: a) a comparative modeller (PSI-BLAST/Nest), b) a fold recognition/threading program (PROSPECT II/Nest) [3], or c) an ab initio folder (Rosetta) [4]. The comparative modeller and the fold recognition program can each process one protein per CPU per hour, whereas the ab initio folder processes one protein per 16 CPUs per day.
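To make the routing and the throughput figures concrete, here is a minimal Python sketch. The decision rule, the score fields (`best_blast_evalue`, `best_threading_zscore`) and both cutoff values are hypothetical placeholders chosen for illustration; the description above does not state PSPP's actual selection criteria. The CPU-hour costs follow directly from the stated throughputs: one CPU-hour per protein for the comparative modeller and for fold recognition, and 16 CPUs × 24 hours = 384 CPU-hours per protein for the ab initio folder.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# CPU-hours per protein, derived from the stated throughputs:
#   comparative modelling, fold recognition: 1 protein per CPU per hour -> 1 CPU-hour
#   ab initio (Rosetta): 1 protein per 16 CPUs per day -> 16 * 24 = 384 CPU-hours
CPU_HOURS_PER_PROTEIN = {
    "comparative": 1.0,
    "fold_recognition": 1.0,
    "ab_initio": 16 * 24.0,
}

# HYPOTHETICAL cutoffs for illustration only; not PSPP's actual criteria.
BLAST_EVALUE_CUTOFF = 1e-3
THREADING_ZSCORE_CUTOFF = 5.0


@dataclass
class QueryHits:
    """Best scores for one query against known structures (hypothetical fields)."""
    name: str
    best_blast_evalue: Optional[float]      # None if PSI-BLAST found no PDB hit
    best_threading_zscore: Optional[float]  # None if threading found nothing


def route(q: QueryHits) -> str:
    """Pick one of the three methods, checking from highest to lowest similarity."""
    if q.best_blast_evalue is not None and q.best_blast_evalue <= BLAST_EVALUE_CUTOFF:
        return "comparative"
    if q.best_threading_zscore is not None and q.best_threading_zscore >= THREADING_ZSCORE_CUTOFF:
        return "fold_recognition"
    return "ab_initio"


def estimate_cpu_hours(queries: list[QueryHits]) -> tuple[Counter, float]:
    """Count queries per method and total the CPU-hours they would need."""
    counts = Counter(route(q) for q in queries)
    total = sum(CPU_HOURS_PER_PROTEIN[method] * n for method, n in counts.items())
    return counts, total


if __name__ == "__main__":
    batch = [
        QueryHits("seqA", 1e-40, None),  # close homolog -> comparative modelling
        QueryHits("seqB", 0.5, 7.2),     # remote fold match -> fold recognition
        QueryHits("seqC", None, 2.1),    # no usable template -> ab initio
    ]
    counts, total = estimate_cpu_hours(batch)
    print(dict(counts))
    print(total)  # 1 + 1 + 384 = 386.0
```

The cost table shows why the order of the checks matters: every sequence that falls through to ab initio folding costs about 384 times as much CPU time as one handled by either template-based method.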

The application runs in parallel and is available to DoD HPC users. It has a graphical user interface to submit and monitor jobs and view results. It is currently deployed on jaws at the Maui High Performance Computing Center.


[1] Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25:3389.
[2] Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002 Jun;58:899.
[3] Kim D, Xu D, Guo JT, Ellrott K, Xu Y. PROSPECT II: protein structure prediction program for genome-scale applications. Protein Eng. 2003 Sep;16:641.
[4] Simons KT, Kooperberg C, Huang E, Baker D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997 Apr 25;268:209.
Planned Upgrades
  • Adding sequence-level analysis, such as secondary structure prediction, transmembrane helix prediction, and disorder prediction
  • Incorporating automated domain prediction for larger sequences
  • Enhancing the GUI, including dynamic output, job monitoring, and job restarting
Publications
Lee, M., I.-C. Yeh, N. Zavaljevski, P. Wilson, and J. Reifman. A software pipeline for protein structure prediction. Paper presented at the 25th Army Science Conference. Orlando, FL. 2006 November 17-21:1-8.