Banner art: Peter Miller, Communication, 1940s

CS 2731 / ISSP 2230 Introduction to Natural Language Processing

University of Pittsburgh, Spring 2024
Time: MW 3:00-4:15pm
Location: Sennott Square 6110
Instructor: Michael Miller Yoder, PhD
Please call me "Michael"
Instructor contact: mmy29@pitt.edu
Instructor office hours: W 1-2pm and by appointment, Sennott Square 6505
TA: Bhiman Kumar Baghel
TA office hours: M 9-11am online (see Zoom link on Canvas under 'Syllabus'), and by appointment
Textbook (free online): [J+M] Jurafsky and Martin, Speech and Language Processing, 3e draft, 2023-01-07
[J+M2024] Jurafsky and Martin, Speech and Language Processing, 3e draft, 2024-02-03

Schedule

Subject to change. Last revised 2024-03-28. All due dates are at 11:59pm ET except when indicated.

Session | Date | Topic | Readings | Assignments
Module 1: Introduction and text processing
1 | 01-08 M | Course, NLP intro | | Project survey out
2 | 01-10 W | Text normalization | J+M 2-2.4, 2.6 |
 | 01-15 M | MLK Day. No class. | |
Module 2: Text classification and representation learning
3 | 01-17 W | Bag-of-words, tf-idf, PPMI | J+M 6.3-6.7 | Reading quiz due 1pm; HW1 out; Project survey due 01-18
4 | 01-22 M | Naive Bayes | J+M 4-4.5 | Reading quiz due 1pm; Project teams matched
5 | 01-24 W | Classifier evaluation | J+M 4.7-4.10; Bender & Friedman 2018 (data statements); Mitchell et al. 2019 (model cards) | Discussion post due 1pm; Project pre-proposal form out
6 | 01-29 M | Logistic regression part 1 | J+M 5-5.3 | Reading quiz due 1pm; HW2 out 01-30
7 | 01-31 W | Logistic regression part 2 | J+M 5.4-5.9, 5.11 | Reading quiz due 1pm; HW1 due 02-01
8 | 02-05 M | Vector semantics, static word embeddings | J+M 6-6.2, 6.8-6.13; Blodgett et al. 2020 | Discussion post due 1pm; Project pre-proposal form due
9 | 02-07 W | Feedforward neural networks | J+M 7-7.1, 7.3-7.4, 7.6, 7.8 | Reading quiz due 1pm
Module 3: Language models and conditional language models
10 | 02-12 M | N-gram language models part 1 | J+M 3-3.2 | Reading quiz due 1pm
11 | 02-14 W | N-gram language models part 2, RNNs part 1 | J+M 3.3-3.6, 3.9 | Reading quiz due 1pm; HW2 due 02-15
12 | 02-19 M | RNNs part 2, encoder-decoder | J+M 9-9.2, 9.6-9.9 | Reading quiz due 1pm; HW3 out 02-20
13 | 02-21 W | Transformers part 1, beam search | J+M 10-10.2, 10.4 | Reading quiz due 1pm; Project proposal and literature review due 02-22
14 | 02-26 M | Transformers part 2, pretraining, BERT and GPT | J+M 10.7, 11-11.3.2; Yiu et al. 2023 | Discussion post due 1pm
15 | 02-28 W | BERT/LLMs discussion and lab day | | Bring a laptop to class
16 | 03-04 M | Project proposal presentations | | HW4 out 03-05
17 | 03-06 W | Project work time | | Bring a laptop to class; HW3 due 03-10
Spring Break 03-10 to 03-17
Module 4: Sequence labeling
18 | 03-18 M | POS tagging, NER, HMMs part 1 | J+M 8-8.4.4 | Reading quiz due 11:59pm
19 | 03-20 W | HMMs part 2, Viterbi alg, neural sequence labeling | J+M 8.4.5-8.4.6, 9.3.1, 11.3.3-11.3.4 | Reading quiz due 11:59pm
Module 5: Parsing
20 | 03-25 M | Constituency parsing, CFGs | J+M 17-17.3, 17.8.1 | Reading quiz due 11:59pm; HW4 due
21 | 03-27 W | Dependency parsing | J+M 18-18.2, 18.4-18.5 | Reading quiz due 11:59pm; Project peer review due
Module 6: Application areas
22 | 04-01 M | Machine translation part 1 | J+M 13-13.2; Bender 2019 |
23 | 04-03 W | Machine translation part 2 | J+M 13.3-13.7; J+M2024 13.3, 13.5-13.8 | Project basic working systems due 04-04
24 | 04-08 M | Speech technologies, ASR, TTS | J+M 16-16.3, 16.5-16.8; J+M2024 16-16.3, 16.5-16.8 |
25 | 04-10 W | Dialogue, chatbots part 1 | J+M 15-15.2; J+M2024 15-15.1, 15.4 |
26 | 04-15 M | Dialogue, chatbots part 2 | J+M 15.3-15.7; J+M2024 15.2-15.3, 15.5-15.6 |
27 | 04-17 W | Computational social science, digital humanities | |
28 | 04-24 W | Final project presentations | | Final projects due 04-25

Assessments

Description | Points | Percentage of final grade
Final project total | 222 | 44.4
 Survey response | 5 | 1
 Project pre-proposal form | 10 | 2
 Proposal and literature review | 35 | 7
 Peer review | 2 | 0.4
 Basic working system report | 30 | 6
 Final report | 140 | 28
Homework assignments total | 224 | 44.8
 Each homework (4 total) | 56 | 11.2
Reading quizzes total | 33 | 6.6
 Each reading quiz (13 total, 2 lowest scores dropped) | 3 | 0.6
Discussion posts total | 21 | 4.2
 Each discussion post (3 required) | 7 | 1.4
Grand total | 500 | 100
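
As an illustration only, the arithmetic in the table above can be sketched in Python. The point values come from the table; the function and variable names are hypothetical, and this is not an official grade calculator.

```python
# Point values come from the assessments table; all names here are hypothetical.
GRAND_TOTAL = 500
QUIZZES_DROPPED = 2


def reading_quiz_points(quiz_scores):
    """Sum quiz scores after dropping the two lowest (each quiz is out of 3)."""
    kept = sorted(quiz_scores)[QUIZZES_DROPPED:]
    return sum(kept)


def final_percentage(project_points, homework_scores, quiz_scores, discussion_scores):
    """Return the final grade as a percentage of the 500-point grand total."""
    earned = (
        project_points
        + sum(homework_scores)
        + reading_quiz_points(quiz_scores)
        + sum(discussion_scores)
    )
    return 100 * earned / GRAND_TOTAL


# Example: full marks everywhere except two missed reading quizzes.
quizzes = [3] * 11 + [0, 0]
print(final_percentage(222, [56] * 4, quizzes, [7] * 3))  # 100.0
```

Because the two lowest quiz scores are dropped, missing two reading quizzes does not lower the total in this sketch.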

Course description

Computer programs that automatically process human language, such as chatbots, translation systems, and speech recognition systems, have become a part of everyday life. This course provides an introduction to the artificial intelligence research field that brought about these systems: natural language processing (NLP). Students will become familiar with foundational tasks in NLP such as language modeling, text classification, and sequence modeling. The course will cover both classic and contemporary approaches to these tasks, as well as how they are applied in language technologies. Topics of ethics, fairness, and bias in AI are incorporated throughout the course.

Learning objectives

The overarching learning objective of this course is for students to be able to structure an NLP system to get a desired outcome from language data that may be required in a future job or research problem. This ability requires the development of many constituent skills. At the end of the course, students will be able to:

Learning resources

Textbook: Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd edition draft, 2023-01-07 or 2024-02-03. Available completely free online: https://web.stanford.edu/~jurafsky/slp3/

Software and programming languages: Python and associated data science libraries (pandas, numpy, scipy) are the preferred software for completing coding portions of homework assignments. Students wishing to use non-Python tools for homeworks should ask the instructor first. Final projects may be completed with any programming language or tools.

Tutorials on Python and data science:

Course infrastructure

The most recent syllabus, including a schedule, will be posted on the course website. This syllabus will contain links to homework and final project descriptions. Homeworks and the final project should be submitted through Canvas. Quizzes and discussion boards (including prompts) will be on Canvas. Course announcements will be given on Canvas, and questions should be submitted through Canvas (or over email to the instructor or TA).

Policies

Grading scale

Range | Letter grade
93.0 – 100% | A
90.0 – <93.0% | A-
86.7 – <90.0% | B+
83.3 – <86.7% | B
80.0 – <83.3% | B-
76.7 – <80.0% | C+
73.3 – <76.7% | C
70.0 – <73.3% | C-
66.7 – <70.0% | D+
63.3 – <66.7% | D
60.0 – <63.3% | D-
< 60% | F

The instructor reserves the right to change the grading scale depending on class performance, but only in the direction of raising grades for students. Feel free to stop by the instructor’s office hours or make an additional appointment anytime to talk about any issues you might have with your grade.

Late work and assignment resubmission policy

Please contact the instructor and TA before the deadline if you need an extension due to unforeseen circumstances. We are happy to extend deadlines for deaths and funerals, illnesses, mental health crises or episodes, weddings, important religious and national holidays, job interviews, and other circumstances. There is no shame in asking; we care about your well-being more than we care about deadlines.

Unless you let us know beforehand (or an adverse event occurred very close to the deadline), the late penalty is 2.5% per day, including weekend days and holidays, for all assignments. The latest you may turn assignments in is 2 weeks after the deadline, excluding the final project report, which must be turned in by the deadline.
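
A minimal sketch of how this penalty might be computed, for illustration only. The policy does not say how partial days are counted or what the 2.5% is taken from, so this sketch assumes that any part of a day counts as a full day and that the penalty is 2.5% of the assignment's maximum points per day. All names are hypothetical, and the datetimes are assumed to be in the same time zone.

```python
import math
from datetime import datetime

LATE_PENALTY_PER_DAY = 0.025  # 2.5% per day, from the policy above
MAX_DAYS_LATE = 14            # 2 weeks after the deadline


def days_late(deadline, submitted):
    """Count late days; ASSUMPTION: any part of a day counts as a full day."""
    seconds = (submitted - deadline).total_seconds()
    if seconds <= 0:
        return 0
    return math.ceil(seconds / 86400)


def score_after_penalty(raw_score, max_points, deadline, submitted,
                        is_final_report=False):
    """Apply the late penalty; ASSUMPTION: penalty is a share of max_points."""
    late = days_late(deadline, submitted)
    if late == 0:
        return raw_score
    # The final report must be on time; other work is not accepted after 2 weeks.
    if is_final_report or late > MAX_DAYS_LATE:
        return 0.0
    return max(0.0, raw_score - LATE_PENALTY_PER_DAY * late * max_points)


# Example: a 56-point homework submitted about a day and a half late.
deadline = datetime(2024, 2, 1, 23, 59)
submitted = datetime(2024, 2, 3, 12, 0)
print(round(score_after_penalty(50, 56, deadline, submitted), 2))  # 47.2
```

Extensions arranged with the instructor and TA before the deadline are outside the scope of this sketch.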

If you are unsatisfied with your grade on an assignment and wish to resubmit work, talk with the instructor. Resubmissions are handled case by case, but are generally accepted in cases where parts of the assignment are missing.

Academic integrity policy

Students in this course will be expected to comply with the University of Pittsburgh’s Policy on Academic Integrity. Any student suspected of violating this obligation for any reason during the semester will be required to participate in the procedural process, initiated at the instructor level, as outlined in the University Guidelines on Academic Integrity. To learn more about Academic Integrity, visit the Academic Integrity Guide for an overview of the topic. For hands-on practice, complete the Academic Integrity Modules.

Generative AI policy

You are welcome to use generative AI programs (ChatGPT, DALL-E, etc.) as a student in this course. Since much of this course is about developing such tools in NLP, using currently available tools could not only aid you in the coursework but also expose you to the current capabilities and limitations of such systems.

However, your ethical responsibilities as a student remain the same. You must follow the University of Pittsburgh’s Policy on Academic Integrity. Here are some principles to keep in mind that can help you determine whether or not a specific use of generative AI is acceptable in this course (for all forms of generation: writing, code, images or other forms). Please ask the instructor if you are not sure about a specific use. You will not be blamed or retaliated against for asking.

Adapted from faculty in the Carnegie Mellon University Heinz College of Information Systems and Public Policy, with guidance from the Carnegie Mellon University Eberly Center for Teaching Excellence.

Disability rights

The teaching staff of this course view disabilities as deficits not in disabled people but in the institutions and societies that are structured to disadvantage disabled people. If you have a disability (visible or invisible), please let us know as soon as possible (you don’t need to tell us the nature of the disability). You are encouraged to work with Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, drsrecep@pitt.edu, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will work with you to determine reasonable accommodations for this course. This might include lecture materials that are usable by people with visual disabilities, sign language interpretation, captioning, flexible due dates, etc.

Adapted from policies by David Mortensen and Lori Levin at Carnegie Mellon University.

Religious Observances

The observance of religious holidays (activities observed by a religious group of which a student is a member) and cultural practices are an important reflection of diversity. As your instructor, I am committed to providing equivalent educational opportunities to students of all belief systems. At the beginning of the semester, you should review the course requirements to identify foreseeable conflicts with assignments, exams, or other required attendance. Please contact me as early as possible to allow time for us to discuss and make fair and reasonable adjustments to the schedule and/or tasks.

Statement on scholarly discourse

In this course we will be discussing some complex issues on which all of us have strong feelings and, in many cases, unfounded attitudes. It is essential that we approach this endeavor with our minds open to evidence that may conflict with our presuppositions. Moreover, it is vital that we treat each other’s opinions and comments with courtesy even when they diverge and conflict with our own. We must avoid personal attacks and the use of ad hominem arguments to invalidate each other’s positions. Instead, we must develop a culture of civil argumentation, wherein all positions have the right to be defended and argued against in intellectually reasoned ways. It is this standard that everyone must accept in order to stay in this class; a standard that applies to all inquiry in the university, but whose observance is especially important in a course whose subject matter is so emotionally charged.

Adapted from a California State University course: Race, Racism and Critical Thinking.

Student wellness

College/Graduate school can be an exciting and challenging time for students. Taking time to maintain your well-being and seek appropriate support can help you achieve your goals and lead a fulfilling life. It can be helpful to remember that we all benefit from assistance and guidance at times, and there are many resources available to support your well-being while you are at Pitt. You are encouraged to visit Thrive@Pitt to learn more about well-being and the many campus resources available to help you thrive.

If you or anyone you know experiences overwhelming academic stress, persistent difficult feelings and/or challenging life events, you are strongly encouraged to seek support. In addition to reaching out to friends and loved ones, consider connecting with a faculty member you trust for assistance connecting to helpful resources.

The University Counseling Center is also here for you. You can call 412-648-7930 at any time to connect with a clinician. If you or someone you know is feeling suicidal, please call the University Counseling Center at any time at 412-648-7930. You can also contact Resolve Crisis Network at 888-796-8226.

Equity and inclusion

The University of Pittsburgh does not tolerate any form of discrimination, harassment, or retaliation based on disability, race, color, religion, national origin, ancestry, genetic information, marital status, familial status, sex, age, sexual orientation, veteran status or gender identity or other factors as stated in the University’s Title IX policy. The University is committed to taking prompt action to end a hostile environment that interferes with the University’s mission. For more information about policies, procedures, and practices, visit the Civil Rights & Title IX Compliance web page.

I ask that everyone in the class strive to help ensure that other members of this class can learn in a supportive and respectful environment. If there are instances of the aforementioned issues, please contact the Title IX Coordinator by calling 412-648-7860 or emailing titleixcoordinator@pitt.edu. Reports can also be filed online. You may also choose to report this to a faculty/staff member; they are required to communicate this to the University’s Office of Diversity and Inclusion. If you wish to maintain complete confidentiality, you may also contact the University Counseling Center (412-648-7930).