CS 2731 Introduction to Natural Language Processing

School of Computing and Information, University of Pittsburgh
Fall 2025

Peter Miller, Communication, 1940s

Time MW 2:30-3:45pm
Location Sennott Square 5129
Instructor Michael Miller Yoder, PhD. Please call me “Michael”
Instructor contact mmyoder@pitt.edu or through Canvas messages
Instructor office hours By appointment in person at SENSQ 6309 or on Zoom
Book an appointment
TA Zhaoyi (Joey) Hou
TA contact joey.hou@pitt.edu
TA office hours By appointment
Textbook (free online) [J+M] Jurafsky and Martin, Speech and Language Processing, 3e draft, 2025-01-12

Schedule

Subject to change. Last revised 2025-08-20. All due dates are at 11:59pm ET except when indicated.

Session Date Topic Readings Assignments
Module 1: Introduction and text processing
1 08-25 M Course, NLP intro HW0 out
2 08-27 W Text processing J+M 2-2.3, 2.5-2.7 HW0 due 08-28
09-01 M Labor Day. No class.
3 09-03 W Machine learning intro, NLP tasks and applications Project idea form out 09-04
Module 2: N-grams and statistical NLP
4 09-08 M Bag-of-words and n-grams J+M 6.3-6.7
5 09-10 W N-gram language models J+M 3-3.6, 3.8 In-class quiz; HW1 out; Project idea form due 09-11
6 09-15 M Text classification J+M 4 (intro), 4.7-4.10
7 09-17 W Project match day, CRCD tutorial HW1 due 09-18
8 09-22 M Logistic regression part 1 J+M 5-5.3
9 09-24 W Logistic regression part 2 J+M 5.4-5.6, 5.9, 5.11 In-class quiz; HW2 out
Module 3: Neural networks and word2vec
10 09-29 M Vector semantics, word2vec J+M 6-6.2, 6.8-6.11, 6.13
11 10-01 W Neural networks part 1 J+M 7-7.1, 7.3 In-class quiz
12 10-06 M Neural networks part 2 J+M 7.4-7.5, 7.8, 8-8.2
Module 4: LLMs
13 10-08 W Transformers part 1 J+M 9-9.2 In-class quiz; HW2 due 10-09
14 10-13 M Transformers part 2, introduction to LLMs J+M 9.4-9.6, 10-10.3, 10.6-10.7
15 10-15 W Project peer group feedback Project proposal due 10-16
16 10-20 M Project proposal presentations
17 10-22 W BERT J+M 11-11.2, 11.4-11.6 In-class quiz; HW3 out
18 10-27 M Prompting and post-training LLMs J+M 12-12.5, 12.8
19 10-29 W Guest lecture
20 11-03 M Reasoning in LLMs DeepSeek-R1, ACL 2023 Tutorial: Complex Reasoning in Natural Language
Module 5: Sequence labeling and parsing
21 11-05 W Sequence labeling J+M 17-17.3, 11.5 In-class quiz; HW3 due 11-06; HW4 out
22 11-10 M Dependency parsing J+M 19-19.2, 19.4-19.5
Module 6: NLP applications and ethics
23 11-12 W Machine translation J+M 13-13.3, 13.6-13.8 Project progress report due 11-13
24 11-17 M Information retrieval, RAG J+M 14-14.3.1, 14.5
25 11-19 W Dialogue systems, chatbots J+M 15 HW4 due 11-20
Thanksgiving Break 11-24 to 11-27
26 12-01 M Social factors, bias and ethics in NLP Hovy and Spruit 2016, Blodgett et al. 2020
27 12-03 W Project work time (in class)
28 12-08 M Project presentations Project report due 12-09

Assessments

Description Points Percentage of final grade
Homework assignments total 430 43
 Homework 0 6 0.6
 Homework 1 105 10.5
 Homework 2 105 10.5
 Homework 3 105 10.5
 Homework 4 105 10.5
Project total 410 41
 Project idea form 5 0.5
 Project proposal 85 8.5
 Project progress report 85 8.5
 Project report 235 23.5
Quizzes total 60 6
 Each quiz (6 total, lowest score dropped) 12 1.2
Participation total 100 10
 Attendance 60 6
 Engagement 40 4
Grand total 1000 100
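
As an unofficial illustration of the "lowest score dropped" rule, the quiz total can be sketched in Python as below. The function name `quiz_total` is hypothetical, and treating a missed quiz as a score of 0 is an assumption that the syllabus does not state; ask the instructor how missed quizzes are actually handled.

```python
NUM_QUIZZES = 6       # from the assessments table
POINTS_PER_QUIZ = 12  # from the assessments table


def quiz_total(scores):
    """Sum quiz scores after dropping the single lowest one.

    Assumption (not stated in the syllabus): a quiz that was not
    taken counts as 0, so it can be the score that gets dropped.
    """
    if len(scores) > NUM_QUIZZES:
        raise ValueError("more scores than quizzes")
    if any(s < 0 or s > POINTS_PER_QUIZ for s in scores):
        raise ValueError("each score must be between 0 and 12")
    # Pad missing quizzes with zeros, then drop the lowest score.
    padded = list(scores) + [0] * (NUM_QUIZZES - len(scores))
    return sum(sorted(padded)[1:])


print(quiz_total([12, 10, 8, 12, 11, 0]))  # the 0 is dropped
```

With five quizzes kept at 12 points each, the maximum is 60 points, which matches the quizzes total in the table.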

Participation grade

In-class, collaborative activities are better learning experiences when students come to class and participate. To encourage participation, there is a participation grade worth 10% of the total course grade. The majority of that grade comes from attendance, which will be taken via Top Hat on randomly selected class sessions. The rest of the grade is based on engagement: asking questions in class or elsewhere (such as during office hours), or participating in in-class activities. If you do any of this basic engagement, you will receive full engagement credit.

Course description

Computer programs that automatically process human language, such as chatbots, translation systems, and speech recognition systems, have become a part of everyday life. This course provides an introduction to the subfield of artificial intelligence that brought about these systems: natural language processing (NLP). Students will become familiar with foundational tasks in NLP such as language modeling, text classification, and sequence modeling. The course will cover both classic and contemporary approaches to these tasks, as well as how they are applied in language technologies. Topics of ethics, fairness, and bias in AI are incorporated throughout the course.

Learning objectives

The overarching learning objective of this course is for students to be able to design, implement, and evaluate natural language processing (NLP) systems to get desired output from language data. This skill will prepare students for NLP projects in future jobs or research problems. Being able to develop NLP systems requires many constituent skills. At the end of the course, students will be able to:

  • Relate a new problem to the most relevant existing NLP tasks, such as text classification, text generation, sequence modeling, language modeling, information retrieval, machine translation, and dialogue systems
  • Choose relevant baseline machine learning approaches to try on a new task
  • Explain the basics of language structure that are relevant to NLP. These include syntax and semantics from linguistics
  • Preprocess text data into a machine-readable format
  • Define and scope an objective in terms of a machine learning or NLP system. This includes determining if human annotation is needed and if machine learning is needed
  • Extract features from text that are required for running machine learning models
  • Choose suitable ML algorithms for a new NLP task
  • Evaluate machine learning algorithms, choices of training data and other NLP system decisions
  • Identify potential ethical pitfalls (such as imbalanced training data, model amplification of biases) in an NLP system and ways to address them
  • Communicate motivation, key components, and implications of an approach to NLP tasks in writing

Prerequisites

  • CS 1501: Algorithms and Data Structures 2 (grade C or better)
  • Some basic Python knowledge will be assumed. If you are not familiar with Python, see the learning resources below.

Learning resources

Textbook: Dan Jurafsky and James H. Martin, Speech and Language Processing, 3rd edition draft, 2025-01-12. Available completely free online: https://web.stanford.edu/~jurafsky/slp3/

Software and programming languages: Python and associated data science libraries (pandas, numpy, scipy) are the preferred software for completing coding portions of homework assignments. Some basic knowledge of Python is a prerequisite of the course, as some of the homework assignments require Python. The project may be completed with any programming language or tools.

Tutorials on Python and data science:

Course infrastructure and communication

The most recent syllabus, including a schedule, is posted here on the course website. This syllabus will contain links to homework and final project descriptions. Homework assignments and the project should be submitted through Canvas. Course announcements will be given on Canvas, and questions should be submitted through Canvas (or over email to teaching staff).

Feel free to email or send a Canvas message to teaching staff about any concerns or questions at any time. Teaching staff will respond during hours that work best for them; please feel no obligation to respond outside of your regular working hours.

Policies

Grading scale

Range Letter grade
92.5 – 100% A
90.0 – <92.5% A-
87.5 – <90.0% B+
82.5 – <87.5% B
80.0 – <82.5% B-
77.5 – <80.0% C+
72.5 – <77.5% C
70.0 – <72.5% C-
67.5 – <70.0% D+
62.5 – <67.5% D
60.0 – <62.5% D-
< 60% F
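
The scale above can be read as a simple lookup, sketched here in Python for illustration only. The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and the sketch assumes percentages are compared without rounding, which the syllabus does not specify.

```python
# Lower bound of each range in the grading scale, highest first.
GRADE_CUTOFFS = [
    (92.5, "A"), (90.0, "A-"),
    (87.5, "B+"), (82.5, "B"), (80.0, "B-"),
    (77.5, "C+"), (72.5, "C"), (70.0, "C-"),
    (67.5, "D+"), (62.5, "D"), (60.0, "D-"),
]


def letter_grade(percentage):
    """Map a final percentage (0-100) to a letter grade.

    Assumption: no rounding is applied before the comparison.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    for cutoff, letter in GRADE_CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# The course has 1000 points in total, so percentage = points / 10.
print(letter_grade(874 / 10))
```

Because the course totals 1000 points, dividing earned points by 10 gives the percentage to look up.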

Feel free to contact the instructor or schedule an office hours appointment to talk about any issues you might have with your grade.

Late work policy

Students are granted 5 total late days across all homework assignments and quizzes without penalty. After those five late days, you will be penalized 10% for each day that your submission is late up to a maximum of 40%. Group project work will be penalized 10% for each day late up to a maximum of 40%. No late work will be accepted for the project report.
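
One possible reading of the penalty arithmetic is sketched below in Python. This is not an official calculator: the function name `late_penalty` is hypothetical, and it assumes lateness is counted in whole days and that remaining free late days are used up before any penalty applies. For group project work, which has no free late days, pass 0 for the second argument. The project report, which cannot be submitted late, is not modeled.

```python
PENALTY_PER_DAY = 10  # percent deducted per penalized day
MAX_PENALTY = 40      # percent, the cap stated in the policy


def late_penalty(days_late, free_late_days_left=0):
    """Return (penalty_percent, free_late_days_remaining).

    Assumptions (not stated in the syllabus): days are whole
    numbers, and free late days are consumed before any penalty.
    """
    if days_late < 0 or free_late_days_left < 0:
        raise ValueError("day counts must be non-negative")
    free_used = min(days_late, free_late_days_left)
    penalized_days = days_late - free_used
    penalty = min(PENALTY_PER_DAY * penalized_days, MAX_PENALTY)
    return penalty, free_late_days_left - free_used


# Homework submitted 3 days late with 1 free late day remaining.
print(late_penalty(3, 1))
```

Under these assumptions the penalty stops growing once a submission is four penalized days late.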

Assignment resubmission policy

If you are unsatisfied with your grade on an assignment and wish to resubmit work, talk with the instructor. Resubmissions are handled case by case, but are generally accepted in cases where parts of the assignment are missing (sections of the rubric are 0). Updated or added text in resubmitted reports must be highlighted in yellow. Resubmissions are subject to an automatic 10% deduction and must be submitted by 11:59pm on the last day of class. Only 1 resubmission per homework assignment will be accepted.

Academic integrity policy

Students in this course will be expected to comply with the University of Pittsburgh’s Policy on Academic Integrity. Any student suspected of violating this obligation for any reason during the semester will be required to participate in the procedural process, initiated at the instructor level, as outlined in the University Guidelines on Academic Integrity. To learn more about Academic Integrity, visit the Academic Integrity Guide for an overview of the topic. For hands-on practice, complete the Academic Integrity Modules.

Generative AI policy

You are allowed to use generative AI programs (ChatGPT, etc.) as a student in this course in limited circumstances. Since much of this course is about developing such tools in NLP, using currently available tools can expose you to the current capabilities and limitations of such systems.

However, your ethical responsibilities as a student remain the same. You must follow the University of Pittsburgh’s Policy on Academic Integrity. Here are some principles to keep in mind that can help you determine whether or not a specific use of generative AI is acceptable in this course (for all forms of generation: writing, code, images or other forms). Please ask the instructor if you are not sure about a specific use. You will not be blamed or retaliated against for asking.

  • Use as an aid, not for a finished product. LLMs could be used in this course to generate ideas, draft bibliographies and study guides, or revise existing writing. Using them to draft entire homework or project reports is not acceptable, even if students revise the draft, since being able to communicate NLP procedures and research is a learning objective. Also keep in mind that language models have no notion of reality and will hallucinate facts and citations.

  • Cite its use. The University of Pittsburgh’s academic integrity policy applies to all uncited or improperly cited use of content, whether that work is created by human beings alone or in collaboration with a generative AI. If you use a generative AI tool to develop content for an assignment, you are required to cite the tool’s contribution to your work. In practice, cutting and pasting content from any source without citation is plagiarism. Likewise, paraphrasing content from a generative AI without citation is plagiarism. Similarly, using any generative AI tool without appropriate acknowledgement will be treated as plagiarism. See the APA guidelines on how to cite ChatGPT. Citing your use of LLMs will also inform teaching staff on how such tools are being used in education for developing better future policies.

  • You are responsible for the work you turn in. As we will discuss in this course, LLMs and other generative AI systems can and do generate biased, socially problematic language and assert unfounded claims. Ultimately the text you submit will be treated as reflecting your own work, and you are responsible for it.

Adapted from faculty in the Carnegie Mellon University Heinz College of Information Systems and Public Policy, with guidance from the Carnegie Mellon University Eberly Center for Teaching Excellence.

Disability rights

The teaching staff of this course view disabilities as deficits not in disabled people but in the institutions and societies that are structured to disadvantage disabled people. If you have a disability (visible or invisible), please let us know as soon as possible. You don’t need to tell us the nature of the disability. You are encouraged to work with Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, drsrecep@pitt.edu, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will work with you to determine reasonable accommodations for this course. This might include lecture materials that are usable by people with visual disabilities, sign language interpretation, captioning, flexible due dates, etc.

Adapted from policies by David Mortensen and Lori Levin at Carnegie Mellon University.

Religious Observances

The observance of religious holidays (activities observed by a religious group of which a student is a member) and cultural practices is an important reflection of diversity. As your instructor, I am committed to providing equivalent educational opportunities to students of all belief systems. At the beginning of the semester, you should review the course requirements to identify foreseeable conflicts with assignments or other required attendance. Please contact me as early as possible to allow time for us to discuss and make fair and reasonable adjustments to the schedule and/or tasks.

Statement on scholarly discourse

In this course we will be discussing some complex issues on which all of us have strong feelings and, in many cases, unfounded attitudes. It is essential that we approach this endeavor with our minds open to evidence that may conflict with our presuppositions. Moreover, it is vital that we treat each other’s opinions and comments with courtesy even when they diverge and conflict with our own. We must avoid personal attacks and the use of ad hominem arguments to invalidate each other’s positions. Instead, we must develop a culture of civil argumentation, wherein all positions have the right to be defended and argued against in intellectually reasoned ways.

Adapted from a California State University course: Race, Racism and Critical Thinking.

Student wellness

College can be an exciting and challenging time for students. Taking time to maintain your well-being and seek appropriate support can help you achieve your goals and lead a fulfilling life. It can be helpful to remember that we all benefit from assistance and guidance at times, and there are many resources available to support your well-being while you are at Pitt. You are encouraged to visit Thrive@Pitt to learn more about well-being and the many campus resources available to help you thrive.

If you or anyone you know experiences overwhelming academic stress, persistent difficult feelings and/or challenging life events, you are strongly encouraged to seek support. In addition to reaching out to friends and loved ones, consider connecting with a faculty member you trust for assistance connecting to helpful resources.

The University Counseling Center is also here for you. You can call 412-648-7930 at any time to connect with a clinician. If you or someone you know is feeling suicidal, please call the University Counseling Center at any time at 412-648-7930. You can also contact Resolve Crisis Network at 888-796-8226.

Equity and inclusion

The University of Pittsburgh does not tolerate any form of discrimination, harassment, or retaliation based on disability, race, color, religion, national origin, ancestry, genetic information, marital status, familial status, sex, age, sexual orientation, veteran status or gender identity or other factors as stated in the University’s Title IX policy. The University is committed to taking prompt action to end a hostile environment that interferes with the University’s mission. For more information about policies, procedures, and practices, visit the Civil Rights & Title IX Compliance web page.

I ask that everyone in the class strive to help ensure that other members of this class can learn in a supportive and respectful environment. If there are instances of the aforementioned issues, please contact the Title IX Coordinator by calling 412-648-7860 or emailing titleixcoordinator@pitt.edu. Reports can also be filed online. You may also choose to report this to a faculty/staff member; they are required to communicate this to the University’s Office of Diversity and Inclusion. If you wish to maintain complete confidentiality, you may also contact the University Counseling Center (412-648-7930).