Pipeline for Protein Function Annotation - PIPA

PIPA (Pipeline for Protein Function Annotation) is a bioinformatics pipeline that predicts the functions and properties of a protein directly from its amino acid sequence. It annotates protein functions by combining the results of multiple integrated programs and databases and expressing them as common Gene Ontology (GO) [1] terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which builds new, customized profile databases to improve the prediction of particular protein functions; (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO; and (3) a consensus algorithm, which generates consensus annotations from the integrated programs and databases.
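
How steps (2) and (3) fit together can be illustrated with a minimal sketch. It assumes that each integrated source reports its own identifiers, that a lookup table translates them into GO terms, and that a GO term is kept when enough distinct sources support it. The table entries, the function names `map_to_go` and `consensus`, and the vote threshold are all hypothetical; this description does not specify PIPA's actual mapping tables or consensus rule.

```python
# Hypothetical mapping from source-specific identifiers to GO terms.
# Entries are illustrative only and are not taken from PIPA's mapping tables.
TERM_TO_GO = {
    ("pfam", "PF00069"): {"GO:0004672", "GO:0005524"},
    ("cdd", "cd00180"): {"GO:0004672"},
    ("ec", "2.7.11.1"): {"GO:0004674"},
}


def map_to_go(source, term):
    """Translate one source-specific prediction into a set of GO terms."""
    return TERM_TO_GO.get((source, term), set())


def consensus(predictions, min_sources=2):
    """Return GO terms supported by at least `min_sources` distinct sources.

    `predictions` is a list of (source, term) pairs for a single protein.
    Simple vote counting is an assumption made for this sketch.
    """
    support = {}
    for source, term in predictions:
        for go_id in map_to_go(source, term):
            support.setdefault(go_id, set()).add(source)
    return sorted(go_id for go_id, sources in support.items()
                  if len(sources) >= min_sources)


hits = [("pfam", "PF00069"), ("cdd", "cd00180"), ("ec", "2.7.11.1")]
print(consensus(hits))  # ['GO:0004672']
```

Mapping every source into one vocabulary before voting is what makes predictions from different classification schemes comparable; lowering `min_sources` to 1 would return the union of all mapped terms instead.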

PIPA, deployed on two Linux clusters, jvn at the Army Research Laboratory Major Shared Resource Center and jaws at the Maui High Performance Computing Center, includes the following features:

  • A newly developed Catalytic Families (CatFam) profile database, which provides more precise enzyme function predictions [2].
  • A set of integrated protein function prediction programs and databases, such as the 11 member databases of InterPro [3] and CDD [4].
  • A consensus function annotation based on functions predicted from the multiple independent programs.
  • A throughput of 4,000 protein annotations in ~6 hours using 64 CPUs [5].
  • Easy access through a Web-based graphical user interface (GUI) using the User Interface Toolkit (UIT).
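
The throughput figure in the list above can be turned into per-protein cost with simple arithmetic. The sketch below takes the approximate "~6 hours" as exactly 6 and assumes all 64 CPUs are busy for the whole run, so the results are rough estimates rather than measured values.

```python
proteins = 4000
wall_hours = 6   # approximate ("~6 hours"), treated as exact here
cpus = 64

wall_seconds = wall_hours * 3600
proteins_per_hour = proteins / wall_hours
# Total CPU time divided by proteins, assuming every CPU is fully used.
cpu_seconds_per_protein = cpus * wall_seconds / proteins

print(f"{proteins_per_hour:.0f} proteins/hour")                   # 667 proteins/hour
print(f"{cpu_seconds_per_protein:.1f} CPU-seconds per protein")   # 345.6 CPU-seconds per protein
```

Under these assumptions each protein costs a little under 6 CPU-minutes, which suggests why the pipeline is run on clusters for genome-scale annotation.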
      

[1] Available at www.geneontology.org.
[2] Yu, C., N. Zavaljevski, V. Desai, and J. Reifman. Genome-wide enzyme annotation with precision control: Catalytic Families (CatFam) databases. Proteins: Structure, Function, and Bioinformatics. 2008 (in press).
[3] Available at www.ebi.ac.uk/interpro.
[4] Available at www.ncbi.nlm.nih.gov/sites/entrez?db=cdd.
[5] Yu, C., N. Zavaljevski, V. Desai, S. Johnson, F. J. Stevens, and J. Reifman. The development of PIPA: An integrated and automated pipeline for genome-wide protein function annotation. BMC Bioinformatics. 2008 January 29; 9:52.

Publications
Yu, C., N. Zavaljevski, V. Desai, S. Johnson, F. J. Stevens, and J. Reifman. The development of PIPA: An integrated and automated pipeline for genome-wide protein function annotation. BMC Bioinformatics. 2008 January 29; 9:52.
Yu, C., V. Desai, N. Zavaljevski, and J. Reifman. PIPA: A high-throughput pipeline for protein function annotation. Paper presented at HPCMP Users Group Conference. Seattle, WA. 2008 July 14-17.