RaptorX-SS8 [Click here to DOWNLOAD]
1.
Overview
RaptorX-SS8
is a software package that predicts both 3-class and 8-class protein secondary
structure using a probabilistic graphical model Conditional Neural
Fields (CNFs).
This package takes as input PSSM (position specific score matrix)
generated by
PSIBLAST, the physico-chemical properties of amino acids and their
statistical properties
to predict secondary structure. For each position of a given protein, RaptorX-SS8
outputs the probability of this position belonging to each of the
three or eight
secondary structure types. The type with the highest probability is
used as the
predicted type. The technical detail of this software is described in
the following paper.
Zhiyong Wang, Feng Zhao, Jian Peng, and Jinbo Xu. Protein 8-class Secondary Structure Prediction Using Conditional Neural Fields, Proceedings of IEEE BIBM 2010, Dec 2010, Hong Kong.
This software has been compiled and tested on a Ubuntu 9.04 Linux server (kernel 2.6.28-19-server) with Quad-Core AMD Opteron(tm) Processors.
2. Installation
Before installation, please make sure that Perl v5.10.0 is properly installed on your computer systems. To install the package, first create a new folder and uncompress all the files in the package to the folder and then run setup.pl to setup RaptorX-SS8 as follows.
>perl ./setup.pl -home [the full path of this folder] -blast [the full-path executable file for the Psiblast program, usually psiblast or blastpgp] -nr [the full-path file of the non-redundent (NR) database (no suffix)]
For example, in the case that the NR database, including files nr.00.phr, nr.00.pin, nr.00.psq ... nr.03.psq, is kept in the folder /home/bob/db/nr/ ,
>perl ./setup.pl -home /home/bob/raptorxss8 -blast /usr/bin/psiblast -nr /home/bob/db/nr/nr
3. Run RaptorX-SS8
a) Eight-class secondary structure prediction:
To predict 8-class secondary structure for a protein sequence in 1azz.seq, you can use the following command.
>./bin/run_raptorx-ss8.pl examples/1aaz.seq
RaptorX-SS8 will generate a single result file 1aaz.ss8 in current directory.
b) Three-class secondary structure prediction:
To predict 3-class secondary structure for a protein sequence, you can use the following command.
>./bin/run_raptorx-ss3.pl examples/1aaz.seq
RaptorX-SS8
will produce two result files 1aaz.ss3 and 1aaz.horiz in current
directory.
4. File formats
The input file can be in a FASTA-formatted file or a plain text file. The amino acid type 'X' is allowable in the input file. See examples/1aaz.seq and examples/T0643.seq for two example input files. The ".ss8" file contains two comment lines starting with "#" followed by prediction results. The results are formatted as a table with 11 columns and as many rows as the length of the protein sequence. Each row corresponds to one residue in the sequence. The first column is the residue index number. The 2nd and 3rd columns are the amino acid type of the residue and the predicted secondary structure type, respectively. The 4th-11th columns are the eight probability values for the 8 secondary structure types in the order of H, G, I, E, B, T, S and C.
Similar to the ".ss8" file, the ".ss3" file contains the 3-class prediction result. The ".ss3" file has a very similar format as the ".ss2" file generated by PSIPRED. The prediction result is formatted as a table with 6 columns and as many rows as the length of the sequence. Each row corresponds to one residue in the sequence. The 1st column is the residue index number. The 2nd and 3rd columns are the amino acid type of the residue and the predicted secondary structure type, respectively. The 4th-6th columns are the three probability values for the 3 secondary structure types in the order of H(alpha-helix), E(beta-strand) and C(loop).
The ".horiz" file has a similar format as the ".horiz" file generated by PSIPRED, which contains a confidence value, the predicted secondary structure type and amino acid type for each residue in the protein.
5. Contact:
Zhiyong Wang (zywang@ttic.edu) and Jinbo Xu (jinboxu@gmail.com)