Pairwise Sequence Alignment

Needleman-Wunsch algorithm: Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443-53.

Smith-Waterman algorithm: T. F. Smith and M. S. Waterman. Identification of Common Molecular Subsequences. Journal of Molecular Biology, 147:195-197, 1981.

Multiple Sequence Alignment

C. Notredame, D. G. Higgins and J. Heringa. T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment. J. Mol. Biol. (2000) 302, 205-217.

Wallace IM, O'Sullivan O, Higgins DG and Notredame C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006 Mar 23;34(6):1692-9.

Edgar RC, Batzoglou S. Multiple sequence alignment. Curr Opin Struct Biol. 2006 Jun;16(3):368-73. Epub 2006 May 5.

J. D. Thompson, D. G. Higgins and T. J. Gibson. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994, Vol. 22, No. 22, 4673-4680

R. C. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004, Vol. 32, No. 5, 1792-1797

Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Research, 2005

Alignment Scoring Function

Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3." M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found., Washington, DC.

Schwartz, R.M. & Dayhoff, M.O. (1978) "Matrices for detecting distant relationships." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3." M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found., Washington, DC.

Altschul, S.F. (1991) "Amino acid substitution matrices from an information theoretic perspective." J. Mol. Biol. 219:555-565.

States, D.J., Gish, W., Altschul, S.F. (1991) "Improved sensitivity of nucleic acid database searches using application-specific scoring matrices." Methods 3:66-70.

Henikoff, S. & Henikoff, J.G. (1992) "Amino acid substitution matrices from protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919.

Altschul, S.F. (1993) "A protein alignment scoring system sensitive at all evolutionary distances." J. Mol. Evol. 36:290-300.

BLAST for Homology Search

Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.

PatternHunter for Homology Search

Bin Ma, John Tromp, Ming Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440-445. March 2002.

Ming Li, Bin Ma, Derek Kisman, John Tromp. PatternHunter II: Highly Sensitive and Fast Homology Search. Journal of Bioinformatics and Computational Biology, 2(3):417-439. 2004.

Jinbo Xu, Daniel G. Brown, Ming Li, Bin Ma. Optimizing multiple spaced seeds for homology search. Journal of Computational Biology, 2005. Accepted November 2004.

Daniel G. Brown. Optimizing Multiple Seeds for Protein Homology Search. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(1):29-38. 2005.

Brona Brejova, Daniel G. Brown, Tomas Vinar. Vector seeds: an extension to spaced seeds. Journal of Computer and System Sciences, 70(3):364-380. 2005.

Daniel G. Brown, Ming Li, Bin Ma. A tutorial of recent developments in the seeding of local alignment. Journal of Bioinformatics and Computational Biology, 2(4):819-842. 2004.

Compressive Genomics

Po-Ru Loh, Michael Baym and Bonnie Berger. Compressive Genomics. Nature Biotechnology, 2012.

Noah M. Daniels, Andrew Gallant, Jian Peng, Lenore J. Cowen, Michael Baym and Bonnie Berger. Compressive genomics for protein databases. Bioinformatics, 2013.

Significance of Alignment and Homology Search

Karlin, S. & Altschul, S.F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. PNAS 87:2264-2268.

Karlin, S. & Altschul, S.F. (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. PNAS 90:5873-5877.

Dembo, A., Karlin, S. & Zeitouni, O. (1994) Limit distribution of maximal non-aligned two-sequence segmental score. Ann. Prob. 22:2022-2039.

Altschul, S.F. (1997) "Evaluating the statistical significance of multiple distinct local alignments." In "Theoretical and Computational Methods in Genome Research." (S. Suhai, ed.), pp. 1-14, Plenum, New York.

Schäffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF. (2001) "Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements." Nucleic Acids Res. 2001 Jul 15;29(14):2994-3005.

Hidden Markov Model for Sequence Analysis

Yoon BJ. Hidden Markov Models and their Applications in Biological Sequence Analysis. Curr Genomics. 2009 Sep;10(6):402-15.

Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63.

Söding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005 Apr 1;21(7):951-60. Epub 2004 Nov 5.

Protein Structure Alignment

Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710.

Y. Zhang, J. Skolnick, TM-align: A protein structure alignment algorithm based on TM-score, Nucleic Acids Research, 2005 33: 2302-2309.

Matthew Menke, Bonnie Berger and Lenore Cowen. Matt: Local Flexibility Aids Protein Multiple Structure Alignment

Sheng Wang, Jianzhu Ma, Jian Peng and Jinbo Xu. Protein structure alignment beyond spatial proximity. Scientific Reports, 2013.

Protein Local Structure Prediction

D. T. Jones. Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J. Mol. Biol. (1999) 292, 195-202.

Sheng Wang, Jian Peng, Jianzhu Ma, Jinbo Xu. Protein secondary structure prediction using deep convolutional neural fields. Scientific Reports, 2016

Yuedong Yang, Jianzhao Gao, Jihua Wang, Rhys Heffernan, Jack Hanson, Kuldip Paliwal and Yaoqi Zhou. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Briefings in Bioinformatics, 2016.

Feng Zhao, Shuaicheng Li, Beckett W. Sterner and Jinbo Xu. Discriminative learning for protein conformation sampling. Proteins, 2008

Remote Homology Detection & Protein Threading

Profile HMM: http://bioinformatics.oxfordjournals.org/cgi/reprint/14/9/755

HHpred: Protein homology detection by HMM-HMM comparison.

Jianzhu Ma, Sheng Wang, Zhiyong Wang, Jinbo Xu. MRFalign: Protein Homology Detection through Alignment of Markov Random Fields. PLoS Computational Biology, 2014

Protein Contact Prediction and Contact-Assisted Folding

Sheng Wang, Siqi Sun, Zhen Li, Renyu Zhang, Jinbo Xu. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model. PLOS Computational Biology, 2017.

Jones DT, Buchan DW, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012 Jan 15;28(2):184-90. doi: 10.1093/bioinformatics/btr638. Epub 2011 Nov 17.

Seemayer S, Gruber M, Söding J. CCMpred--fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics. 2014 Nov 1;30(21):3128-30. doi: 10.1093/bioinformatics/btu500. Epub 2014 Jul 26.

Thomas A. Hopf, Lucy J. Colwell, Robert Sheridan, Burkhard Rost, Chris Sander, Debora S. Marks. Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing. Cell, 2012

Protein Template-free Modeling

Thomas Hamelryck, John T. Kent and Anders Krogh. Sampling Realistic Protein Conformations Using Local Structural Bias.

Zhao F, Li S, Sterner BW and Xu J. Discriminative learning for protein conformation sampling.

Feng Zhao, Jian Peng, Joe DeBartolo, Karl F. Freed, Tobin R. Sosnick and Jinbo Xu. A Probabilistic Graphical Model for Ab Initio Folding

RNA Secondary Structure Prediction

Covariance model of RNA: RNA sequence analysis using covariance models.

E. P. Nawrocki, D. L. Kolbe and S. R. Eddy. Infernal 1.0: Inference of RNA Alignments. Bioinformatics, 25:1335-1337, 2009.

CONTRAfold: RNA secondary structure prediction without physics-based models

Chuong B. Do, Chuan-Sheng Foo and Serafim Batzoglou: A max-margin model for efficient simultaneous alignment and folding of RNA sequences. ISMB 2008: 68-76

RNA 3D Structure Prediction

A Probabilistic Model of RNA Conformational Space

Automated de novo prediction of native-like RNA tertiary structures

The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data

Protein Interaction Network Alignment

Pairwise Global Alignment of Protein Interaction Networks by Matching Neighborhood Topology

IsoRankN: spectral methods for global alignment of multiple protein networks

Jason Flannick, Antal F. Novak, Chuong B. Do, Balaji S. Srinivasan, Serafim Batzoglou: Automatic Parameter Learning for Multiple Network Alignment. RECOMB 2008: 214-231

Somaye Hashemifar and Jinbo Xu. HubAlign: an accurate and efficient method for global alignment of protein-protein interaction networks. Bioinformatics, 2014

Somaye Hashemifar, Qixing Huang and Jinbo Xu. Joint alignment of multiple protein-protein interaction networks via convex optimization. Journal of Computational Biology, 2016