Unsupervised and Weakly-Supervised Probabilistic Modeling of Text

Instructor: Ivan Titov
Time: Friday, 2.15 - 3.45 pm
Location: Building C 7.2, room 2.11
Office hours: Friday, 4 pm - 5 pm in Ivan's office (C7.4, 3.22), or send me a message by e-mail

Short Description

In this seminar, we will focus on generative (mostly Bayesian) models of text. We will start with the most basic topic models (Latent Dirichlet Allocation) and then move on to more recent and advanced generative models. These models induce topic segmentation, topic hierarchies, and shallow semantic and syntactic representations, and some of them use a form of supervision. We plan to consider both inference techniques for these models (Expectation Maximization, more general variational methods, Markov chain Monte Carlo methods, belief propagation) and their application to various natural language problems (e.g., segmentation, summarization, sentiment analysis, grounded language acquisition).

The goals of this seminar are both to understand the methodology (classes of models considered in NLP, approximation techniques for learning and inference) and to learn about interesting applications of generative methods that use little or no supervision.

The term paper is due on September 26 (see requirements below).



Attendance policy

You can skip ONE class without giving me any explanation (unless it is the class in which you are presenting). If you need to skip more, you will need to write an additional critical review for every paper presented while you were absent.


Critical reviews

Term paper


Grading criteria:

Length: 12 - 15 pages

Deadline: September 26. I recommend submitting it soon after your presentation, as it will probably be easier to write then.

Submission: as a PDF, sent to me by e-mail


April 16

April 23

April 30

May 7

May 14

May 21

May 28

June 4

June 11

June 18

June 25

July 2

July 9

July 16

Full List of Papers

The papers are listed approximately in the order in which they will be presented. There may be some changes, though, so check the schedule above for details. The list in Google Docs (you should have a link) is meant to be more up to date.