This class is an introduction to statistical natural language processing (NLP) for graduate students. The goal is to introduce students to the key challenges and foundational methods of NLP. Specifically, we will study syntactic parsing for constituent and dependency representations, look into shallow representations of semantics (semantic role labeling), and consider statistical machine translation. We will also cover some background from machine learning (specifically, discriminative and generative models of structures) that is crucial in modern NLP.
We will use Jurafsky and Martin's "Speech and Language Processing" (2nd edition) as the main textbook. Sections and chapters related to specific lectures are listed below. Optionally, I suggest also consulting the Manning and Schuetze textbook "Foundations of Statistical Natural Language Processing". Note, however, that some of the material presented in the lectures is not covered in either of them.
The practical assignments should be delivered via email to milosh.stanojevic at gmail.com for each of the assignments.
The assignments are done in groups of 3 students (a group of 2 requires special permission from the assistant). Each project must be done in a different group, although the different stages of Project 1 must, of course, be done in the same group.
The assignment should be submitted via email to milosh.stanojevic at gmail.com, along with a short report and a very brief README. No treebank data (or model parameters or model predictions) should be submitted. The subject of your submission message should specify the assignment number and stage, and list the names of the students in the group.
The project description for Project 1 (stages 1 and 2) is online on Blackboard (including data and initial source code).
The list of papers for the assignment is online. A formal description of the requirements will be online soon. However, you can already start reading and forming teams. You must choose one topic and review all the papers associated with that topic:
|Topic 1||Morphologically rich languages, joint modeling of morphology and syntax. The papers are Goldberg and Tsarfaty (2008) and Bohnet et al. (2013). Optionally, also consider joint dependency parsing and PoS tagging: Bohnet and Nivre (2012), which should help you follow Bohnet et al. (2013) more easily.|
|Topic 2||Learning grammar and parsing states' refinements using neural network methods. The papers are Titov and Henderson (2010) and Socher et al. (2013). Optionally: evaluation of the approach of Titov and Henderson (2010) on 10 languages, Titov and Henderson (2007).|
|Topic 3||CCG parsing and its extensions for parsing into semantic representations (on restricted domains): Curran and Clark (2007) and Collins and Zettlemoyer (2005).|
|Oct 29||Introduction to NLP, modeling sequences (HMM)||Reading: J&M 5.1 - 5.5; 6.1 - 6.4|
|Nov 5||Modeling sequences: discriminative methods; syntax: introduction and basics||Reading: J&M 12.1 - 12.4, 14.1, optionally 12.5 - 12.7|
|Nov 12||Syntax: parsing with (P)CFGs (CKY), syntactic language models, parser evaluation||Reading: J&M 13.1 - 13.4, 14.2 - 14.4, 14.7|
|Nov 19||Syntax: problems of PCFGs, grammar annotation approaches (structural annotation, lexicalization), discriminative methods||Reading: J&M 14.1 - 14.6|
|Nov 26||Dependency parsing: graph-based and transition-based methods|
|Dec 3||Syntax: relying on larger tree fragments - DOP grammars (guest lecture: Jelle Zuidema)|
|Dec 10||Machine translation (guest lecture: Khalil Sima'an)|
Lecture slides will be made available for download after each lecture.