This is the course webpage for the Winter 2016 version of TTIC 31190: Natural Language Processing.
For the Spring 2018 course, go here.
Quarter: Winter 2016
Time: Tuesday/Thursday 10:35-11:50 am
Location: Room 526 (fifth floor), TTIC
Instructor: Kevin Gimpel
Instructor Office Hours: Mondays 3-4pm, Room 531, TTIC, or by appointment
Teaching Assistant: Lifu Tu
Teaching Assistant Office Hours: Thursdays 4-5pm, Room 501, TTIC
Midterm: February 18
Prerequisites:
There are no formal course prerequisites, but I will assume that you have some programming experience and that you are familiar with basic concepts from probability and calculus. No specific programming language will be required.
Undergraduates are welcome to take the course if they have the relevant background, though they may not be able to enroll online. Please bring an enrollment approval form to me at the first lecture and I will sign it.
Contents:
Textbooks
Grading
Topics
Project
Collaboration Policy
Lateness Policy
Textbooks
All textbooks are optional.
Primary:
SLP2: Daniel Jurafsky and James H. Martin. Speech and Language Processing (2nd Edition). Pearson: Prentice Hall. 2009.
SLP3: Drafts of some chapters of the 3rd edition are freely available online.
Additional readings drawn from the following:
TAOD: Guy Lebanon. The Analysis of Data, Volume 1: Probability. (freely available online)
IIR: Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008. (freely available online)
Grading
- 3 assignments (15% each)
- midterm exam (15%)
- course project (35%):
  - project proposal and meeting with instructor (10%)
  - class presentation (5%)
  - final report (20%)
- class participation (5%)
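As a rough illustration of how these weights combine into a final score, here is a minimal sketch. The component names, the 0-100 score scale, and the `final_grade` helper are assumptions made for the example; any lateness penalty is not modeled, and this is not the official grade calculation.

```python
# Weights in percent, copied from the grading breakdown above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "assignment_1": 15,
    "assignment_2": 15,
    "assignment_3": 15,
    "midterm": 15,
    "project_proposal": 10,      # proposal and meeting with instructor
    "project_presentation": 5,
    "project_report": 20,
    "participation": 5,
}


def final_grade(scores):
    """Return the weighted average of component scores.

    Assumes every score is on a 0-100 scale (an assumption of this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# The three project components add up to the 35% project share,
# and all components together add up to 100%.
project_share = sum(w for name, w in WEIGHTS.items() if name.startswith("project_"))
total_share = sum(WEIGHTS.values())
```

For example, calling `final_grade` with a score of 80 for every component gives 80, since the weights sum to 100%.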
Topics
- Introduction: NLP applications, challenges of computing with language, experimental principles
- Text Classification: spam classification, sentiment analysis, topic classification, classifiers, linear models, features, naive Bayes, training linear classifiers via loss function optimization, loss functions, stochastic gradient descent
- Words: what is a word?, tokenization, word senses, morphology
- Lexical Semantics: distributional semantics, word embeddings, word clustering
- Language Modeling: n-gram models, smoothing
- Sequence Labeling: part-of-speech tagging, named entity recognition, hidden Markov models, conditional random fields, dynamic programming, Viterbi
- Syntax: weighted context-free grammars, dependency syntax, parsing algorithms
- Neural Methods in NLP: representation learning, neural language models, neural methods for classification
- Structured Neural Networks: recurrent, recursive, and convolutional nets for NLP
- Semantics: compositionality, semantic role labeling, frame semantics, semantic parsing
- Unsupervised NLP: unsupervised tagging/parsing, Brown clustering, topic models
- Machine Translation: word alignment, translation modeling, decoding
- Other NLP Tasks: coreference resolution, question answering, summarization, conversational agents
Project
The project proposal is due Feb. 16. Details are provided here.
The final project report is due March 17. Details are provided here.
We recommend that you use the ACL2016 LaTeX style files: http://acl2016.org/files/acl2016.zip.
Analogous files for Microsoft Word are available here.
Collaboration Policy
You are welcome to discuss assignments with others in the course, but solutions and code must be written individually.
The project may be done individually or in a group of two. Each member of the group will receive the same grade for the project.
Lateness Policy
We want you to do the assignments, even if that means turning them in late (whether partially or fully). There will likely be a penalty assessed if assignments are turned in after the due dates, but we will continue to accept late submissions of assignments until the end of the quarter.