Banner art: Peter Miller, Communication, 1940s

CS 2731 Introduction to Natural Language Processing

University of Pittsburgh, Fall 2023
Time: MW 2:30-3:45pm
Location: Sennott Square 6110
Instructor: Michael Miller Yoder, PhD
You can call me "Michael" (preferred), "Prof. Yoder", or "Dr. Yoder"
Instructor contact: mmy29@pitt.edu
Instructor office hours: W 1:30-2:30pm and by appointment, Sennott Square 6505
TA: Sabit (Pantho) Hassan
TA office hours: Tu 2:45-3:45pm, Sennott Square 5106
Textbook: [J+M] Jurafsky and Martin, Speech and Language Processing, 3e draft, 2023-01-07 (free online)

Schedule

Subject to change. Last revised 2023-11-30. All due dates are at 11:59pm ET except where indicated (reading quizzes and discussion posts are due at 12pm).

Session Date Topic Readings Assignments
Module 1: Introduction and text processing
1 08-28 M Course, NLP intro Project survey out
2 08-30 W Text normalization J+M 2 Project survey due 08-31
09-04 M Labor Day. No class.
Module 2: Text classification and representation learning
3 09-06 W Bag-of-words, tf-idf, PPMI J+M 6.3-6.7 Reading quiz due 12pm;
HW1 out
4 09-11 M Naive Bayes J+M 4-4.5 Reading quiz due 12pm;
Project teams matched
5 09-13 W Classifier evaluation J+M 4.7-4.10,
Bender & Friedman 2018 (data statements)
Mitchell et al. 2019 (model cards)
Discussion post due 12pm;
HW1 due 09-17
6 09-18 M Logistic regression part 1 J+M 5-5.3 Reading quiz due 12pm;
Project area and type of contribution form out
7 09-20 W Logistic regression part 2 J+M 5.4-5.9, 5.11 Reading quiz due 12pm;
Project area and type of contribution form due 09-21
8 09-25 M Vector semantics, static word embeddings J+M 6-6.2, 6.8-6.13,
Blodgett et al. 2020
Discussion post due 12pm;
HW2 out
9 09-27 W Feedforward neural networks J+M 7-7.1, 7.3-7.4, 7.6, 7.8 Reading quiz due 12pm
Module 3: Language modeling
10 10-02 M N-gram language models part 1 J+M 3-3.2 Reading quiz due 12pm
11 10-04 W N-gram language models part 2, RNNs part 1 J+M 3.3-3.6, 3.9 Reading quiz due 12pm;
HW2 due 10-05
12 10-09 M RNNs part 2, encoder-decoder J+M 9-9.2, 9.6-9.9 Reading quiz due 12pm
13 10-11 W Transformers part 1, beam search J+M 10-10.2, 10.4 Reading quiz due 12pm;
Project proposal due 10-12
14 10-16 M Transformers part 2, pretraining, BERT and GPT J+M 10.7, 11-11.3.2,
Yiu et al. 2023
Discussion post due 12pm
15 10-18 W Project proposal presentations HW3 out 10-20
16 10-23 M BERT/LLMs discussion and lab day Bring your laptop to class
17 10-25 W Project work time Bring your laptop to class
Module 4: Sequence labeling
18 10-30 M POS tagging, NER, HMMs part 1 J+M 8-8.4.4 Reading quiz due 12pm;
HW4 out
19 11-01 W HMMs part 2, Viterbi alg, neural sequence labeling J+M 8.4.5-8.4.6, 9.3.1, 11.3.3-11.3.4 Reading quiz due 12pm;
HW3 due 11-02
Module 5: Parsing
20 11-06 M Constituency parsing, CFGs J+M 17-17.3, 17.8.1 Reading quiz due 12pm
21 11-08 W Dependency parsing J+M 18-18.2, 18.4-18.5 Reading quiz due 12pm;
HW4 due 11-09
Module 6: Application areas
22 11-13 M Machine translation part 1 J+M 13-13.2,
Bender 2019 (optional, for discussion post)
Optional discussion post (extra credit) due 12pm
23 11-15 W Machine translation part 2 J+M 13.3-13.7 Project basic working systems due 11-16
Thanksgiving Recess 11-19 to 11-26
24 11-27 M Speech technologies, ASR, TTS J+M 16-16.3, 16.5-16.8
25 11-29 W Dialogue, chatbots part 1 J+M 15-15.2 Project peer review due 11-30
26 12-04 M Dialogue, chatbots part 2 J+M 15.3-15.7
27 12-06 W Computational social science, digital humanities
28 12-13 W Final project presentations Final projects due 12-14

Assessments

Percentage of final grade by component:

Final project: 46%
  Survey response: 1%
  Project area and type of contribution form: 2%
  Proposal: 7%
  Basic working system: 5%
  Final report and code: 31%
Homeworks (4 total, 11% each): 44%
  Homework 1: Vector space word similarity
  Homework 2: Text classification
  Homework 3: Language modeling
  Homework 4: Sequence labeling
Quizzes, discussion posts: 10%
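As an illustration only, the weights above can be combined into a final percentage as in the sketch below. It assumes every component is scored on a 0-100 scale; the names `WEIGHTS` and `final_grade` and the component keys are hypothetical and not part of any official course tooling. The module-level check confirms that the project (46%), homeworks (44%), and quizzes/discussion posts (10%) add up to 100%.

```python
# Weights (percent of the final grade) copied from the assessment breakdown.
# Component keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "survey_response": 1,
    "project_form": 2,
    "proposal": 7,
    "basic_working_system": 5,
    "final_report_and_code": 31,
    "homework_1": 11,
    "homework_2": 11,
    "homework_3": 11,
    "homework_4": 11,
    "quizzes_and_discussion_posts": 10,
}

# Final project 46 + homeworks 44 + quizzes/discussion posts 10 = 100.
assert sum(WEIGHTS.values()) == 100


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted final percentage.

    Assumes `scores` maps every key in WEIGHTS to a score between 0 and 100.
    """
    return sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)
```

For example, `final_grade({name: 100 for name in WEIGHTS})` returns `100.0`, and a perfect final report with every other component at 0 contributes 31 points.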

Course description

Computer programs that automatically process human language, such as chatbots, translation systems, and speech recognition systems, have become a part of everyday life. This course provides an introduction to the artificial intelligence research field that brought about these systems: natural language processing (NLP). Students will become familiar with foundational tasks in NLP such as language modeling, text classification, and sequence modeling. The course will cover both classic and contemporary approaches to these tasks, as well as how they are applied in language technologies. Topics of ethics, fairness, and bias in AI are incorporated throughout the course.

Learning objectives

The overarching learning objective of this course is for students to be able to structure an NLP system that gets a desired outcome from language data, such as may be required in a future job or research problem. This ability requires the development of many constituent skills. At the end of the course, students will be able to:

Learning resources

Textbook: Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd edition draft, 2023-01-07. Available completely free online: https://web.stanford.edu/~jurafsky/slp3/

Software and programming languages: Python and associated data science libraries (pandas, numpy, scipy) are the preferred software for completing coding portions of homework assignments. Students wishing to use non-Python tools for homeworks should ask the instructor first. Final projects may be completed with any programming language or tools.

Tutorials on Python and data science:

Course infrastructure

The most recent syllabus, including a schedule, will be posted on the course website. This syllabus will contain links to homework and final project descriptions. Homeworks and the final project should be submitted through Canvas. Quizzes and discussion boards (including prompts) will be on Canvas. Course announcements will be posted on Canvas, and questions should be submitted through Canvas (or by email to the instructor or TA).

Policies

Grading scale

Range Letter grade
93.0 – 100% A
90.0 – <93.0% A-
86.7 – <90.0% B+
83.3 – <86.7% B
80.0 – <83.3% B-
76.7 – <80.0% C+
73.3 – <76.7% C
70.0 – <73.3% C-
66.7 – <70.0% D+
63.3 – <66.7% D
60.0 – <63.3% D-
< 60% F

The instructor reserves the right to change the grading scale depending on class performance, but only in the direction of raising grades for students. Feel free to stop by the instructor’s office hours or make an additional appointment anytime to talk about any issues you might have with your grade.
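To make the boundaries in the grading scale concrete, here is a minimal sketch that maps a final percentage to a letter grade using the published ranges. The function name `letter_grade` is hypothetical, and the sketch reflects only the table as written, not any upward adjustment the instructor may make. Python's standard `bisect.bisect_right` counts how many lower bounds are less than or equal to the score, which places a score that lands exactly on a boundary (for example 90.0) in the higher range, matching the inclusive lower bounds in the table.

```python
import bisect

# Inclusive lower bound of each range in the grading scale, in ascending order.
CUTOFFS = [60.0, 63.3, 66.7, 70.0, 73.3, 76.7, 80.0, 83.3, 86.7, 90.0, 93.0]
# One more letter than cutoffs: scores below 60.0 map to "F".
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(percent: float) -> str:
    """Map a final percentage (0-100) to a letter grade."""
    # bisect_right returns the number of cutoffs <= percent,
    # which is exactly the index of the matching letter.
    return LETTERS[bisect.bisect_right(CUTOFFS, percent)]
```

For instance, `letter_grade(86.7)` returns `"B+"`, while `letter_grade(86.6)` returns `"B"`.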

Late work policy

Please contact the instructor and TA before the deadline if you need an extension due to unforeseen circumstances. We are happy to extend deadlines for deaths and funerals, illnesses, mental health crises or episodes, weddings, important religious and national holidays, job interviews, and other circumstances. There is no shame in asking; we care about your well-being more than we care about deadlines.

Unless you let us know beforehand (or an adverse event occurred very close to the deadline), the late penalty for homework assignments and project milestones is 2.5% per day for up to 5 days, including weekend days and holidays. Reading quizzes and discussion posts will be given half credit if they are submitted between noon, when they are due, and the start of class at 2:30pm.

Academic integrity policy

Students in this course will be expected to comply with the University of Pittsburgh’s Policy on Academic Integrity. Any student suspected of violating this obligation for any reason during the semester will be required to participate in the procedural process, initiated at the instructor level, as outlined in the University Guidelines on Academic Integrity. To learn more about Academic Integrity, visit the Academic Integrity Guide for an overview of the topic. For hands-on practice, complete the Academic Integrity Modules.

Generative AI policy

You are welcome to use generative AI programs (ChatGPT, DALL-E, etc.) as a student in this course. Since much of this course is about developing such tools in NLP, using currently available tools could not only aid you in the coursework but also expose you to the current capabilities and limitations of such systems.

However, your ethical responsibilities as a student remain the same. You must follow the University of Pittsburgh’s Policy on Academic Integrity. Here are some principles to keep in mind that can help you determine whether or not a specific use of generative AI is acceptable in this course (for all forms of generation: writing, code, images or other forms). Please ask the instructor if you are not sure about a specific use. You will not be blamed or retaliated against for asking.

Adapted from faculty in the Carnegie Mellon University Heinz College of Information Systems and Public Policy, with guidance from the Carnegie Mellon University Eberly Center for Teaching Excellence.

Disability rights

The teaching staff of this course view disabilities as deficits not in disabled people but in the institutions and societies that are structured to disadvantage disabled people. If you have a disability (visible or invisible), please let us know as soon as possible (you don’t need to tell us the nature of the disability). You are encouraged to work with Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, drsrecep@pitt.edu, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will work with you to determine reasonable accommodations for this course. This might include lecture materials that are usable by people with visual disabilities, sign language interpretation, captioning, flexible due dates, etc.

Adapted from policies by David Mortensen and Lori Levin at Carnegie Mellon University.

Religious Observances

The observance of religious holidays (activities observed by a religious group of which a student is a member) and cultural practices are an important reflection of diversity. As your instructor, I am committed to providing equivalent educational opportunities to students of all belief systems. At the beginning of the semester, you should review the course requirements to identify foreseeable conflicts with assignments, exams, or other required attendance. Please contact me as early as possible to allow time for us to discuss and make fair and reasonable adjustments to the schedule and/or tasks.

Statement on scholarly discourse

In this course we will be discussing some complex issues on which all of us have strong feelings and, in many cases, unfounded attitudes. It is essential that we approach this endeavor with our minds open to evidence that may conflict with our presuppositions. Moreover, it is vital that we treat each other’s opinions and comments with courtesy even when they diverge and conflict with our own. We must avoid personal attacks and the use of ad hominem arguments to invalidate each other’s positions. Instead, we must develop a culture of civil argumentation, wherein all positions have the right to be defended and argued against in intellectually reasoned ways. It is this standard that everyone must accept in order to stay in this class, a standard that applies to all inquiry in the university, but whose observance is especially important in a course whose subject matter is so emotionally charged.

Adapted from a California State University course: Race, Racism and Critical Thinking.

Gender-inclusive language statement

Language is gender-inclusive and non-sexist when we use words that affirm and respect how people describe, express, and experience their gender. Gender-inclusive/non-sexist language acknowledges people of all genders (for example, first-year student versus freshman, chair versus chairman, humankind versus mankind, everyone versus ladies and gentlemen, etc.). It also affirms non-binary gender identifications, and recognizes both gender identity and expression. Identities including trans, intersex, and genderqueer reflect personal descriptions, expressions, and experiences. Just as sexist language excludes women’s experiences, gendered language excludes the experiences of individuals whose identities may not fit the gender binary, and/or who may not identify with the sex they were assigned at birth. Students, faculty, and staff have the right to control their own identity and to be referred to by the name and pronouns with which they identify. People also have the right to maintain their privacy regarding information they do not wish to share about their identities, including gender identity and pronouns.

Source: University of Pittsburgh School of Social Work

Student wellness

College/Graduate school can be an exciting and challenging time for students. Taking time to maintain your well-being and seek appropriate support can help you achieve your goals and lead a fulfilling life. It can be helpful to remember that we all benefit from assistance and guidance at times, and there are many resources available to support your well-being while you are at Pitt. You are encouraged to visit Thrive@Pitt to learn more about well-being and the many campus resources available to help you thrive.

If you or anyone you know experiences overwhelming academic stress, persistent difficult feelings and/or challenging life events, you are strongly encouraged to seek support. In addition to reaching out to friends and loved ones, consider connecting with a faculty member you trust for assistance connecting to helpful resources.

The University Counseling Center is also here for you. You can call 412-648-7930 at any time to connect with a clinician. If you or someone you know is feeling suicidal, please call the University Counseling Center at any time at 412-648-7930. You can also contact Resolve Crisis Network at 888-796-8226.

Equity and inclusion

The University of Pittsburgh does not tolerate any form of discrimination, harassment, or retaliation based on disability, race, color, religion, national origin, ancestry, genetic information, marital status, familial status, sex, age, sexual orientation, veteran status or gender identity or other factors as stated in the University’s Title IX policy. The University is committed to taking prompt action to end a hostile environment that interferes with the University’s mission. For more information about policies, procedures, and practices, visit the Civil Rights & Title IX Compliance web page.

I ask that everyone in the class strive to help ensure that other members of this class can learn in a supportive and respectful environment. If there are instances of the aforementioned issues, please contact the Title IX Coordinator by calling 412-648-7860 or emailing titleixcoordinator@pitt.edu. Reports can also be filed online. You may also choose to report this to a faculty/staff member; they are required to communicate this to the University’s Office of Diversity and Inclusion. If you wish to maintain complete confidentiality, you may also contact the University Counseling Center (412-648-7930).