Spring 2014: Introduction to Bioinformatics and Computational Biology (under construction)

Schedule: Tuesday & Thursday 1:30-2:50pm (starting from April 8, 2014)

Location: TTI-C conference room 526 on the 5th floor, 6045 S Kenwood Ave, Chicago, IL 60637

Instructor: Jinbo Xu (jinboxu@gmail.com, office: TTI-C room 528)


UChicago students can register for this course through the University of Chicago course registration system.

Auditors are also welcome.



With the availability of large-scale genomic, expression and structural data, mathematics, statistics and computer science are being used extensively to understand biological data at the molecular level. This course focuses on the application of machine learning and computer algorithms to problems in molecular biology. In particular, it covers fundamental computational molecular biology problems including sequence alignment, homology search, RNA/protein structure analysis and prediction, gene expression, biological network analysis and next-generation sequencing.


Non-biology students are strongly encouraged to read the following materials before attending this class, since they will not be covered in class.

1. The Department of Energy's Primer on Molecular Genetics.

2. The Department of Energy's Overview of the Human Genome Project.

3. Hunter's Molecular Biology for Computer Scientists.

4. National New Biology Initiative: A New Biology for the 21st Century.


This course will cover the following topics.

  1. homology search (1 week)
  2. sequence alignment and motif discovery (1 week)
  3. next-gen sequencing and genome assembly (1 week)
  4. protein sequence/structure analysis including alignment, classification, structure and function prediction (2 weeks)
  5. RNA sequence/structure analysis including alignment, classification and prediction (1 week)
  6. gene expression analysis (1 week)
  7. biological network analysis (1 week)
  8. phylogeny (1 week)

A tentative reading list is available here.

Intended Audience


Graduate students or senior undergraduate students with Math/CS/statistics/biology background.

To complete the assignments and the final research project, students may program in C/C++, Perl, Python, Java, Matlab or R.

If you cannot program in any of these languages, you may instead conduct an in-depth biological analysis using existing bioinformatics tools.



There is no examination for this course. The final grade has three components: assignments, one final research project, and attendance. For an assignment, you can re-implement a popular algorithm or conduct an experiment comparing several popular bioinformatics tools, then summarize your work in a technical report (around 5 pages). The assignments account for 35% of the final grade. The final research project requires you to develop a new algorithm for a bioinformatics problem. You are not required to come up with extremely innovative ideas, although this is highly encouraged; incremental improvement over existing algorithms is acceptable. Please hand in a report on the final research project, which accounts for 55% of the final grade. All students are required to finish both the assignments and the final research project; however, undergraduate students will be graded more generously. Attendance earns the remaining 10%.

Example assignments

1)      Redevelop the PAM and BLOSUM matrices and compare them with the published matrices.

2)      Conduct experiments to compare PSI-BLAST, CS-BLAST and HHblits.

3)      Re-implement the dynamic programming algorithm for local sequence alignment and compare your code with established tools such as FASTA, BLAST and an existing Smith-Waterman implementation.

4)      Design an experiment to study how accurate the BLAST E-value estimation is (for protein homology search). Use a random model taught in class for both the query sequence and the database.

5)      Benchmark several multiple sequence alignment tools such as ProbCons, T-Coffee and MUSCLE.

6)      Conduct experiments to compare a few protein structure prediction web servers.

7)      Conduct experiments to compare a few protein function prediction web servers.


The assignment is due in the middle of the winter quarter. You can use existing libraries or Matlab to implement your algorithm. However, please clearly state your own contribution in your report. If you rely on existing bioinformatics libraries, please put more effort into the analysis of the results.
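
To give a sense of the scale of example assignment 3, here is a minimal Python sketch of Smith-Waterman local alignment. The function name smith_waterman and the scoring values (match=2, mismatch=-1, linear gap=-2) are illustrative assumptions, not parameters specified by the course; a real submission would more likely use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Assumes a simple match/mismatch score and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The sketch uses O(mn) time and memory and reports only one optimal alignment, so comparing it against FASTA or BLAST on realistic databases would mainly highlight the speed/sensitivity trade-off made by the heuristic tools.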

Example research projects

Please choose one of the following topics. You are also encouraged to propose your own topic. However, you cannot work on the same topic for both your assignments and your research project.

1.      Develop new algorithms for pairwise or multiple protein-protein interaction network alignment

2.      Develop new algorithms for network motif discovery

3.      Develop new algorithms for the generation of degree-preserving random networks

4.      Develop new algorithms for protein interface alignment

5.      Develop new algorithms for alignment of protein binding sites

6.      Develop new algorithms for protein binding site prediction

7.      Develop new algorithms for RNA pseudoknot prediction
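
As a baseline for topic 3, the sketch below shows the standard double-edge-swap approach to generating a degree-preserving random network, written in Python. It is only a starting point under stated assumptions: the input is a simple undirected graph given as a list of node pairs, and the names degree_preserving_shuffle, degrees, n_swaps and max_tries are hypothetical, chosen for illustration. A project would be expected to improve on such a baseline, for example in mixing speed or sampling uniformity.

```python
import random
from collections import Counter


def degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_shuffle(edges, n_swaps, seed=None, max_tries=None):
    """Randomize a simple undirected graph by repeated double-edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and
    (c, b), which leaves every node degree unchanged. Swaps that would
    create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    if max_tries is None:
        max_tries = 100 * n_swaps

    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 == new2 or new1 in edge_set or new2 in edge_set:
            continue  # would create a duplicate edge
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1

    return edge_list
```

A quick sanity check is to confirm that degrees(original) equals degrees(shuffled) and that the shuffled edge list contains no self-loops or repeated edges.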


If you need your final grade in order to graduate, please talk to me and hand in the final project early. If you need more time to complete the research project, please also talk to me.