INTRODUCTION

EPAD is an innovative position-specific distance-dependent statistical potential for protein structure and functional study. EPAD has different energy profiles for the same type of atom pairs, depending on their sequence positions. This is very different from currently popular potentials such as DFIRE and DOPE, which have a single energy profile for the same type of atom pairs across all proteins.

Experiments show that EPAD greatly outperforms currently popular higher-resolution full-atom potentials in several decoy discrimination tests and also in ab initio folding. This implies that statistical potentials can be significantly improved using evolutionary information.

For more details, please see the REFERENCES.

INSTALLATION

Unpack the EPAD.tar.gz file; you will find three sub-directories: bin, config, and SEQ. The bin directory contains the executables for generating the necessary preliminary data files from a given sequence and for calculating the potential of decoys (EPADCalc); the config directory contains the configuration files and model files that EPAD uses to generate potentials; the SEQ directory holds user-specific data.

EPAD needs to run some executables from PSI-BLAST, PSIPRED and HHpred to generate the preliminary data files. Those executables are already included in this package. By default, PSI-BLAST uses the NR database, which can be downloaded from ftp://ftp.ncbi.nih.gov/blast/db/.

Please modify the file generateEPADfeature.sh by setting the BLASTDB parameter to the correct NR database location in the TODO section:

# TODO: please specify the directory where you have installed NR database
export BLASTDB=~/NR/nr

Or use the following command for bash:

BLASTDB=~/NR/nr; export BLASTDB
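
Before running EPAD, it can help to confirm that BLASTDB really points at a database. The following is a minimal sketch, not part of the EPAD package: the function name blastdb_present is hypothetical, and it assumes the usual protein BLAST database layout, in which a multi-volume database has an alias file ending in .pal and each volume has an index file ending in .pin.

```shell
# Hypothetical helper (not shipped with EPAD): succeed if a protein BLAST
# database with the given path prefix (e.g. ~/NR/nr) appears to be present.
blastdb_present() {
    prefix=$1
    # An empty prefix means BLASTDB was never set.
    [ -n "$prefix" ] || return 1
    # A multi-volume database has an alias file, <prefix>.pal.
    [ -e "$prefix.pal" ] && return 0
    # Otherwise look for an index file: <prefix>.pin or <prefix>.NN.pin.
    for f in "$prefix".pin "$prefix".*.pin; do
        [ -e "$f" ] && return 0
    done
    return 1
}

if blastdb_present "${BLASTDB:-}"; then
    echo "NR database found at $BLASTDB"
else
    echo "No BLAST database found at '${BLASTDB:-}'; check the BLASTDB setting" >&2
fi
```

The check only looks for files on disk; it does not verify that the database is complete or readable by PSI-BLAST.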

RUN EPAD

EPAD works in two steps.

Step 1. Preparation

EPAD accepts a FASTA-format sequence file as its input. Given a sequence file named *.seq, copy it to the SEQ directory, then run generateEPADfeature.sh.
Here is an example (suppose the sequence file is 1a19.seq):

./generateEPADfeature.sh 1a19

The only parameter is the name of the target protein.

The above command will generate feature files (e.g. 1a19.epad and 1a19.epad_local) in the SEQ/EPAD directory.
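
The whole preparation step can be wrapped in a small script. The sketch below is illustrative only: the function names features_ready and prepare_target are hypothetical, and it assumes that it is run from the directory containing generateEPADfeature.sh and the SEQ directory, and that the feature files are named as in the example above.

```shell
# Hypothetical helper: succeed if both feature files mentioned above
# exist and are non-empty for the given target name.
features_ready() {
    target=$1
    feature_dir=${2:-SEQ/EPAD}   # default location described in this README
    [ -s "$feature_dir/$target.epad" ] && [ -s "$feature_dir/$target.epad_local" ]
}

# Hypothetical wrapper for Step 1: copy the sequence file into SEQ,
# run the feature generation script, then verify the result.
prepare_target() {
    seq_file=$1
    target=$(basename "$seq_file" .seq)   # 1a19.seq -> 1a19
    # Skip the copy when the file is already the one inside SEQ.
    [ "$seq_file" -ef "SEQ/$target.seq" ] || cp "$seq_file" SEQ/ || return 1
    ./generateEPADfeature.sh "$target"
    # Do not rely on the script's exit status; check the files themselves.
    features_ready "$target"
}
```

For example, prepare_target /path/to/1a19.seq returns success only if SEQ/EPAD/1a19.epad and SEQ/EPAD/1a19.epad_local were produced.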

Step 2. Calculating Potential

When the feature files are ready, you can run EPADCalc in the bin directory to calculate the potentials of the decoys. We provide an example shell script, runEPAD_example.sh, for calculating the potentials of a list of decoys of the same target protein sequence.

Example:

./runEPAD_example.sh 1a19

The only parameter is the target protein name.

Output:

The above command will generate a new data file named 1a19.EPAD in the working directory. It contains the potentials for each decoy structure.

Please note:
1. There is a DECOYS folder, which contains the 1a19.lst file and a sub-folder 1a19 containing 100 decoys from the Rosetta decoy set as an example. The 1a19.lst file lists the names of the 100 decoys.

2. We assume that the amino acids in the sequence file are numbered from 1 to the sequence length. The residues in the decoy structures must match the sequence file. Missing residues are allowed, as long as the remaining residues are numbered the same as in the sequence file.
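
The numbering requirement in note 2 can be checked before running EPADCalc. The following is a rough sketch under stated assumptions, not part of EPAD: the function name check_numbering is hypothetical, the decoys are assumed to be in standard fixed-column PDB format, and only the 20 standard amino acids are recognized (a modified residue such as MSE would be reported as a mismatch).

```shell
# Hypothetical helper: check_numbering SEQ_FILE PDB_FILE
# Succeeds if every CA atom in the PDB file carries a residue number
# between 1 and the sequence length and a residue type matching the
# sequence at that position. Missing residues are not reported.
check_numbering() {
    awk '
        BEGIN {
            # Three-letter to one-letter residue codes.
            map = "ALA A ARG R ASN N ASP D CYS C GLN Q GLU E GLY G HIS H ILE I"
            map = map " LEU L LYS K MET M PHE F PRO P SER S THR T TRP W TYR Y VAL V"
            n = split(map, t, " ")
            for (i = 1; i < n; i += 2) one[t[i]] = t[i + 1]
        }
        # First file: sequence. Skip FASTA header lines, join the rest.
        NR == FNR {
            if ($0 !~ /^>/) { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
            next
        }
        # Second file: PDB. Use one CA atom per residue.
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            name = substr($0, 18, 3)       # residue name, columns 18-20
            num = substr($0, 23, 4) + 0    # residue number, columns 23-26
            expected = substr(seq, num, 1)
            if (num < 1 || num > length(seq) || one[name] != expected) {
                printf "residue %d (%s) does not match the sequence (%s)\n", num, name, expected
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1" "$2"
}
```

For example, check_numbering SEQ/1a19.seq some_decoy.pdb prints one line per mismatching residue and returns a non-zero status if any were found; some_decoy.pdb stands for any decoy file of that target.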

REFERENCES

1. Feng Zhao, Jinbo Xu, A Position-Specific Distance-Dependent Statistical Potential for Protein Structure and Functional Study, Structure, Volume 20, Issue 6, 6 June 2012, Pages 1118-1126, ISSN 0969-2126