
OVERVIEW

The digital world is full of information, most of which is communicated in language (webpages, social media posts, news articles, long-form stories, etc.). How can we use computers to understand this information and use it to perform meaningful tasks? This course will give you the opportunity to learn the latest machine learning approaches to language processing. Research is also a core component of the course: working in groups of 3-4 students, you'll carry out an original research project.

LEARNING OBJECTIVES

By the end of the course, you will:

Understand the mathematical foundations of modern approaches to language processing
Be familiar with open problems and research topics around training, inference, and evaluation of NLP systems
Conduct substantial, original NLP research (e.g., critically read and understand papers published at top conferences, and execute your own ideas to answer novel research questions)

PREREQUISITES

NLP: No previous experience expected or necessary. Note in particular that this course is meant to replace, not follow, 6.4610; if you've taken 6.4610, you may see some repeated content.
Machine Learning: basic ML knowledge from 6.390 or a graduate ML class
Probability and Statistics: (e.g., 6.370, 6.380, 18.05)
Multivariable Calculus: (e.g., 18.02)
Linear Algebra: (e.g., 18.061)
Algorithms: (e.g., 6.120, 6.121)
Programming: knowledge of Python and at least one class with substantial object-oriented programming (e.g., 6.101)

STAFF

Instructors: Jacob Andreas, Chris Tanner, Omar Khattab

Teaching Assistants: Clarise Han, Joanna Kondylis, Maggie Lin, Linlu Qiu, Alana Renda, Zekai Wang, Zhaofeng Wu

LOGISTICS

LECTURE

Tuesdays and Thursdays @ 2:30pm - 4:00pm in 54-100
Lectures are in-person
Attendance and active participation are highly encouraged, as they help create an enriching learning environment for everyone

KEY DATES

Quiz 1: Mar 10
Quiz 2: Apr 21
HW1: released Feb 12, due Feb 26
HW2: released Feb 26, due Mar 19
HW3: released Mar 19, due Apr 9
Pre-proposal: due Mar 3
Proposal: due Mar 19
Poster sessions: May 7 and May 12
Project report: due May 12

We will make exceptions (e.g., makeup quizzes) for official Institute activities, excused illnesses, or personal emergencies. We cannot grant exam accommodations for job interviews, personal travel, or conflicts with other classes.

COURSE STRUCTURE

Information will be delivered mainly through lectures, which will take place every time the class meets (aside from the research project poster sessions). Your learning will be assessed via three homework assignments, two quizzes, and a significant research project (in groups of 3-4 students).

3 homework assignments (30% of grade, 10% each): There will be three equally weighted, individual homework assignments. See the Collaboration policy below for details.
2 quizzes (30% of grade, 15% each): The quizzes are intended to assess students' knowledge of foundational content. They will be conducted in class (Mar 10 and Apr 21) on paper, closed-book, and will include a combination of multiple-choice and free-response questions. The quizzes will not ask students to write any code on paper. Further details will be presented closer to the quiz dates.
Research project (40% of grade): Throughout the semester, students will work in groups of three or four on a research project of their choosing, and each group will present its final poster on May 7 or May 12. We'll provide support throughout the semester for coming up with project ideas, and you'll have several project-related deliverables along the way:
- A project pre-proposal (ungraded)
- A project proposal (10% of your grade)
- A final report (25% of your grade)
- A poster presentation (5% of your grade)

GRADING

Students will have a total of three free late days to use throughout the semester without any penalty. NOTE: valid excuses (e.g., medical excuses) do not count toward your three allotted "free" late days. Each late day used beyond these three will result in a deduction of 10 percentage points from that assignment's score.

For example, suppose a student has already used all three free late days earlier in the semester and then turns in another homework assignment one day late. If the graded homework received an 88%, it will be reduced to a 78% for being one day late. If that student had instead turned in the assignment two days late (a grand total of five late days used), that assignment would have received a 68%.
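The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated policy, not an official grading script: the function name is hypothetical, free late days are assumed to be spent before any penalty applies, and the floor at 0 is an assumption the policy does not state.

```python
FREE_LATE_DAYS = 3      # free late days per semester, per the policy above
PENALTY_PER_DAY = 10    # percentage points deducted per late day beyond the free ones


def late_adjusted_score(raw_score, days_late, free_days_remaining):
    """Hypothetical helper: return (adjusted score, free late days left afterward).

    Free late days are used first; each remaining late day costs PENALTY_PER_DAY
    points. Clamping at 0 is an assumption, not part of the stated policy.
    """
    if days_late < 0 or free_days_remaining < 0:
        raise ValueError("late days cannot be negative")
    free_used = min(days_late, free_days_remaining)
    penalized_days = days_late - free_used
    adjusted = max(0.0, raw_score - PENALTY_PER_DAY * penalized_days)
    return adjusted, free_days_remaining - free_used


# The student from the example: no free days left, 88% raw score.
one_day = late_adjusted_score(88, days_late=1, free_days_remaining=0)   # 88 - 10 = 78
two_days = late_adjusted_score(88, days_late=2, free_days_remaining=0)  # 88 - 20 = 68
```

A student who still has free days left pays no penalty until those days run out; for instance, two days late with three free days remaining keeps the full score and leaves one free day.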

We’ll be using the standard grade cutoffs:

[90, 100] = A
[80, 90) = B
[70, 80) = C
[60, 70) = D
[0, 60) = F
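To show how the component weights from the Course Structure section combine with these cutoffs, here is a minimal Python sketch. It is illustrative only: the component keys and function names are hypothetical, and it assumes each component is scored out of 100 and that the final percentage is not rounded before the cutoffs are applied (the syllabus does not specify rounding).

```python
# Weights in percentage points, taken from the Course Structure section.
WEIGHTS = {
    "hw1": 10, "hw2": 10, "hw3": 10,      # homework: 30% total
    "quiz1": 15, "quiz2": 15,             # quizzes: 30% total
    "proposal": 10, "report": 25, "poster": 5,  # research project: 40% total
}
assert sum(WEIGHTS.values()) == 100

# Lower bound of each letter band, checked from highest to lowest.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def final_percentage(scores):
    """Weighted average of component scores (each assumed to be out of 100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(percentage):
    """Map a final percentage to a letter using half-open bands like [80, 90)."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


example = {"hw1": 100, "hw2": 100, "hw3": 100, "quiz1": 80, "quiz2": 80,
           "proposal": 90, "report": 85, "poster": 100}
pct = final_percentage(example)  # (3000 + 2400 + 900 + 2125 + 500) / 100
```

Keeping the weights as integer percentage points means whole-number scores sum exactly before the single division, so a final grade sitting exactly on a cutoff (e.g., 90.0) is not nudged across it by floating-point error.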

COURSE POLICIES

COLLABORATION

The homework assignments must be completed individually. However, no student should feel alone in the course, so we encourage you to discuss the assignments with your classmates, but only at the conceptual level. That is, no student should ever see another student's solutions or code, and your code must be written exclusively by you. Posting or sharing your homework assignment online (even if it contains only the questions and no solutions) violates our academic policy and will be reported to the university. This includes posting your assignment on GitHub. Do not do this. In other words, your homework assignment is a private copy that only you can see. If you're unsure whether something is allowed, please speak with us first. Any violation of the above constitutes Academic Dishonesty and will result in a zero on the assignment in question and a letter to file.

USE OF AI TOOLS

This class is about AI-based approaches to language processing. However, using AI for learning comes with considerable tradeoffs: in our experience, students who use AI assistance to complete homework, summarize readings, or write parts of their reports trade actual learning for speed, and come away with a shallow understanding of the course material, or none at all.

For this class, you MAY NOT use AI tools to write code for your homework assignments or to draft any written deliverables (homework, project proposals, or reports). This ensures you learn the content and skills of the course. You MAY use AI to generate code for your project, to assist with research on related work and citations, and for word-level thesaurus support.

To cite AI use, include an acknowledgement statement at the end of any deliverable for which you used generative AI. In the acknowledgement, describe how you used generative AI and provide links to the prompts and outputs.

GETTING HELP

The general staff mailing list is nlp-staff@mit.edu. Please use it for all communication related to course logistics. If you have personal matters that you want to discuss directly with the instructors, please email Jacob (jda@mit.edu), Omar (okhattab@mit.edu), or Chris (cwt@mit.edu). If you need an extension for a homework assignment, please cc your GradSupport / Student Support Services contact on all correspondence.

QUICK ACCESS

Canvas: Lecture slides, homework assignments, and course announcements
General Staff Mailing List: Use nlp-staff@mit.edu for all communication related to course logistics
Course Notes: Updated throughout the semester
Lecture Slides
Panopto