This class is an introduction to statistical natural language processing (NLP) for graduate students. The goal is to introduce students to the key challenges and foundational methods of NLP. Specifically, we will study syntactic parsing for constituent and dependency representations and look into shallow representations of semantics (semantic role labeling), topic models, and distributional semantics methods. Several lectures will cover important NLP applications such as statistical machine translation and summarization. We will also consider some machine learning background that is crucial in modern NLP: discriminative and generative models of structures, latent variable models and, time permitting, Bayesian modeling methods and representation learning techniques.

Blackboard will be used for semi-urgent, up-to-date information. The only exceptions are lecture slides and reading recommendations, which will be posted here.
We will use Jurafsky and Martin's "Speech and Language Processing" (Edition 2) as the main textbook. Sections and chapters related to specific lectures are listed below. As optional reading, I suggest the Manning and Schuetze textbook "Foundations of Statistical Natural Language Processing". Note, however, that much of the material presented in the lectures is not covered in either book.
Assignments

There will be four (non-programming) assignments. They will be posted on Blackboard in due time, and the submission procedures will be described there as well.
|Oct 26||Introduction to NLP, project discussion|
|Oct 29||Topic models (start)||Not in the textbook; suggested extra reading: PLSA, LDA, Gibbs sampling for LDA|
|Nov 2||Topic models|
|Nov 5||Applications / generalizations of topic models, Hidden Markov Models, decoding algorithm (Viterbi)||Reading: J&M 5.1 - 5.5; 6.1 - 6.4|
|Nov 9||Hidden Markov Models: discriminative estimation (structured perceptron), unsupervised estimation (forward-backward)||Reading: from Nov 5 plus J&M 12.1-4, 13.1-4, 14.1-7|
|Nov 12||Hidden Markov Models: discriminative estimation (CRF), neural sequence models (RNNs / encoder-decoder)|
|Nov 16||Syntactic (constituent) parsing||Reading: J&M 12.1-4, 13.1-4, 14.1-7|
|Nov 23||Syntactic (constituent) parsing (continued), dependency syntax||Reading: J&M 12.1-4, 13.1-4, 14.1-7|
|Nov 26||Syntactic dependency parsing (animation of transition-based parsing)|
|Nov 30||Distributional semantics, preliminary set of topics for the exam|
|Dec 4||Machine translation (slides from previous year)||Reading: J&M 25|
|Dec 7||Machine translation (no slides for the moment, not part of the exam)|
|Dec 10||Towards machine reading and reasoning|
Lecture slides will be made available for download after each lecture.