INTRODUCTION
EPAD is an innovative position-specific, distance-dependent statistical
potential for the study of protein structure and function.
EPAD has different energy profiles for the same type of atom pairs, depending on their sequence positions. This is very different from currently popular potentials such as DFIRE and DOPE, which have a single energy profile for the same type of atom pairs across all proteins.
Experiments show that EPAD greatly outperforms currently popular higher-resolution full-atom potentials in several decoy discrimination tests and in ab initio folding. This implies that statistical potentials can be significantly improved using evolutionary information.
For more details, please see the REFERENCES.
INSTALLATION
Unzip the EPAD.tar.gz file. You will find three sub-directories: bin, config,
and SEQ. The bin directory contains the executables for generating the
necessary preliminary data files from a given sequence and for calculating the
potential for decoys (EPADCalc); the config directory contains the
configuration files and model files that EPAD uses to generate potentials; the
SEQ directory is for user-specific data.
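As a quick sanity check after unpacking, the layout described above can be verified with a few lines of shell. This is only a sketch: the helper name check_epad_layout is hypothetical and not part of the EPAD package, and whether the archive unpacks into the current directory or into a directory of its own is an assumption you should check on your system.

```shell
# Hypothetical helper (not part of the EPAD package): report whether the
# three sub-directories named above exist under a given directory.
check_epad_layout() {
    # $1: directory that should contain bin, config and SEQ (defaults to ".")
    root=${1:-.}
    rc=0
    for d in bin config SEQ; do
        if [ ! -d "$root/$d" ]; then
            echo "missing sub-directory: $root/$d" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Typical use, assuming EPAD.tar.gz sits in the current directory and
# unpacks its sub-directories there:
#   tar -xzf EPAD.tar.gz
#   check_epad_layout .
```

The function returns a non-zero status and names each missing sub-directory on standard error, so it can be used in a larger setup script.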
EPAD needs to run some executables from PSI-BLAST, PSIPRED and HHpred to
generate the preliminary data files. Those executables are already included in
this package. By default, PSI-BLAST uses the NR database (available at
ftp://ftp.ncbi.nih.gov/blast/db/).
Please modify the file generateEPADfeature.sh by setting the BLASTDB parameter
to the correct NR database location in the TODO section:
# TODO: please specify the directory where you have installed NR database
export BLASTDB=~/NR/nr
Or use the following command for bash:
BLASTDB=~/NR/nr; export BLASTDB
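Before running EPAD it may be worth confirming that BLASTDB points at something that exists. The sketch below assumes that BLASTDB is a path prefix such as ~/NR/nr and that the database consists of files named with that prefix plus an extension; the helper name blastdb_looks_present is hypothetical. It only checks that such files exist, not that the database is complete or valid.

```shell
# Hypothetical helper (not part of the EPAD package): succeed if at least
# one file named "<prefix>.<something>" exists for the given database prefix.
blastdb_looks_present() {
    # $1: database path prefix, e.g. "$HOME/NR/nr"
    for f in "$1".*; do
        # If the glob matched nothing, $f is the literal pattern and
        # the -e test fails, so the loop falls through to "return 1".
        [ -e "$f" ] && return 0
    done
    return 1
}

# Typical use:
#   export BLASTDB=~/NR/nr
#   blastdb_looks_present "$BLASTDB" || echo "no files found for $BLASTDB" >&2
```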
RUN EPAD
EPAD works in two steps.
Step 1. Preparation
EPAD accepts a FASTA-format sequence file as its
input. Given a sequence file named *.seq, copy this file to the SEQ
directory, then run generateEPADfeature.sh.
Here is an example (suppose the sequence file is 1a19.seq):
./generateEPADfeature.sh 1a19
The only parameter is the name of the target protein.
The above command will generate feature files (e.g. 1a19.epad and
1a19.epad_local) in the SEQ/EPAD directory.
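A small check like the following can confirm that Step 1 produced its output before moving on. It is a sketch only: the helper name features_ready is hypothetical, and it assumes that exactly the two files named above (<target>.epad and <target>.epad_local in SEQ/EPAD) are what the next step needs.

```shell
# Hypothetical helper (not part of the EPAD package): succeed only if both
# feature files for a target exist and are non-empty.
features_ready() {
    # $1: directory containing the SEQ sub-directory, $2: target name
    [ -s "$1/SEQ/EPAD/$2.epad" ] && [ -s "$1/SEQ/EPAD/$2.epad_local" ]
}

# Typical use, run from the directory that contains SEQ:
#   cp 1a19.seq SEQ/
#   ./generateEPADfeature.sh 1a19
#   features_ready . 1a19 && echo "feature files found for 1a19"
```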
Step 2. Calculating Potential
When the feature files are
ready, you can run EPADCalc in the bin directory to calculate the potentials of the decoys. We
provide an example shell script, runEPAD_example.sh, for calculating the
potentials of a list of decoys that share the same target protein sequence.
Example:
./runEPAD_example.sh 1a19
The only parameter is the target protein name.
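Before scoring a decoy list it can help to confirm that every listed decoy is actually on disk. The sketch below assumes the example layout shipped with the package (a DECOYS folder holding <target>.lst and a <target> sub-folder) and further assumes that each line of the .lst file is a file name inside that sub-folder; the helper name missing_decoys is hypothetical.

```shell
# Hypothetical helper (not part of the EPAD package): print the name of
# every decoy that is listed in <target>.lst but absent from <target>/.
missing_decoys() {
    # $1: DECOYS directory, $2: target name
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue          # skip blank lines
        [ -f "$1/$2/$name" ] || printf '%s\n' "$name"
    done < "$1/$2.lst"
}

# Typical use:
#   missing_decoys DECOYS 1a19
```

No output means every listed decoy was found.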
Please note:
1. There is a DECOYS folder, which contains the 1a19.lst file and a sub-folder 1a19 containing
100 decoys from the Rosetta
decoy set as an example. The 1a19.lst file lists the names of the 100 decoys.
2. We assume that the amino acids in the sequence file are numbered from 1 to the sequence length.
The amino acids (residues) in the decoy structures must match the sequence file. Missing residues are allowed, as
long as the remaining residues are numbered the same as in the sequence file.
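The numbering rule in note 2 can be checked mechanically. The following is a minimal sketch, assuming the decoys are in PDB format with standard fixed-column ATOM records and contain only the 20 standard amino acids; the helper name check_decoy_numbering is hypothetical. For every C-alpha atom it looks up the residue number in the sequence and reports any position where the residue type differs or the number falls outside the sequence.

```shell
# Hypothetical helper (not part of the EPAD package): compare the residue
# numbers and types in a PDB-format decoy against a FASTA sequence file.
check_decoy_numbering() {
    # $1: FASTA sequence file, $2: decoy file (PDB format assumed)
    awk '
        BEGIN {
            # Map three-letter residue names to one-letter codes.
            split("ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL", t, " ")
            codes = "ARNDCQEGHILKMFPSTWYV"
            for (i = 1; i <= 20; i++) one[t[i]] = substr(codes, i, 1)
        }
        NR == FNR {
            # First file: concatenate all non-header lines into the sequence.
            if ($0 !~ /^>/) { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
            next
        }
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            num = substr($0, 23, 4) + 0        # residue number, columns 23-26
            res = substr($0, 18, 3)            # residue name, columns 18-20
            if (num < 1 || num > length(seq) || substr(seq, num, 1) != one[res]) {
                printf "mismatch at residue %d: decoy has %s\n", num, res
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}

# Typical use (the decoy file name here is a placeholder):
#   check_decoy_numbering SEQ/1a19.seq DECOYS/1a19/some_decoy.pdb
```

Gaps in the numbering pass the check, in line with the note that missing residues are allowed. Non-standard residue names are reported as mismatches because they have no entry in the lookup table.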