Project
Last revised 2025-08-20.
A major component of this course is a hands-on final project guided by students’ own interests. In this project, students will demonstrate an ability to summarize current approaches and challenges in a subfield of NLP and implement some sort of contribution (however small) to this NLP area of research or practice.
Projects will be done in groups of 2-4 students. Groups will be formed during an in-class project match day, largely based on interest in the same project ideas.
Project idea form
Due 09-11.
Fill out project ideas you might be interested in working on in this form. You can fill out ideas from the example projects listed below or one of your own ideas. For your own ideas, consider what research you’re interested in, what system you’d like to build that processes language in some form, interesting text datasets you’d like to work on, really anything! It is best if your idea has a dataset in mind, but this is not required.
You can fill out as many ideas as you’d like with this form. Ideas do not have to be fully sketched out. Submitting an idea does not mean you will necessarily work on it. These ideas will be presented to all students anonymously. Each student must submit at least one idea for credit on this assignment, even if it’s just chosen from the example projects.
Example projects
Some of these projects are drawn from “shared tasks” where NLP researchers compete for the best performance on certain datasets. Others are based on ideas and projects from prior students and from the instructor.
1. Text classification
- Classify adversarial prompts for LLMs based on attack type, using publicly available red-teaming datasets.
- Given a review of a restaurant, determine what type of restaurant it is, using this Yelp dataset.
- Given a short essay in response to a troubling news article, predict the level of empathy. See WASSA 2024 shared task Track 3.
- Predict emotion labels from tweets across many languages. See WASSA 2024 shared task.
- Given a news article and a list of “entities” (people, organizations, etc.), predict roles such as protagonist, antagonist, and innocent. See SemEval 2025 Task 10, Subtask 1 on entity framing.
- Predict news genre or media “frames” such as morality, economic, or crime and punishment from news articles in multiple languages. See SemEval 2023 Task 3, Subtasks 1 or 2.
- Predict whether text was written by humans or generated by AI. Tasks include predicting for data across languages and for academic essays. See GenAI Content Detection Workshop, Task 1 or 2.
- Classify tweets as sexist or not, or predict the “intent” of sexist tweets as direct, reported, or judgemental. See EXIST 2024 Task 1 or Task 2.
- Predict if similar words are redundant or not with the Semantic Pleonasm corpus developed right here at Pitt.
- From a set of descriptions of characters, develop a classifier to predict which ones will generate the most fanfiction. This could be a lens into online community and media norms.
- Predict “speech acts”, intentions behind utterances, based on emojis with a dataset assembled by former students in the class.
2. Machine translation
- Train translation models for literary text and evaluate on a dataset of Korean-English webnovels.
- Translate customer service chats between languages. See the WMT 2024 Chat Shared Task.
- Translate code-mixed Hinglish to English. See the WMT 2022 Code-mixed Machine Translation Task.
- Create a system to automatically correct (post-edit) machine translations. See the WMT 2022 Automatic Post-Editing Shared Task.
3. Information retrieval and extraction
- Given a query, retrieve the most relevant passages from regulatory documents: https://www.codabench.org/competitions/3527/
- Extract important entities from scientific articles with the SCIRex dataset.
4. Question answering
- Train a system to predict abstract terms related to a passage and answer multiple choice questions. See SemEval 2021 Task 4.
5. Analysis and annotation of datasets
- Visualize similarities in US state legislature bill texts and predict bill passage using data from LegiScan (example repo here).
- Develop an annotation guide and start annotating a new dataset of online gaming voice chat for hateful, abusive, and offensive language.
- Hate speech is culturally specific, yet the majority of NLP work focuses on English in North American and European contexts. A quantitative analysis of different features of datasets annotated for hate speech in multiple languages and from multiple cultural contexts would illuminate global similarities and culturally specific differences.
- Fanfiction, online writing by fans of media works, is known for celebrating queer identity but still may center the experiences of white authors and characters. Use FanfictionNLP to compare representations of characters of color to white characters in fanfiction at scale.
- Quantitative analysis of hateful, white supremacist narratives usually centers on contemporary online discourse. Yet much white supremacist language and many of its narratives have roots that predate online discourse. Compare the narratives, topics, and themes presented in historical and contemporary white supremacist discourse with data provided by the instructor.
- Explore similarities and differences between language in podcasts and Reddit communities based on those podcasts using a dataset assembled by former students in the class.
- Computational analysis of Nakba narratives. See workshop and datasets.
- Examine the framing of different entities in police Facebook posts from the Plain View Project.
- Analyze how different newspapers cover topics differently in English-language editorials from Sri Lankan newspapers. Data is provided by the instructor and a collaborator at Carnegie Mellon University.
6. Survey papers
- Survey how NLP is used and applied in other fields before and after LLMs. What have been our most useful contributions to scholars in the social sciences, physical sciences, or humanities? This survey would assemble papers across disciplines that mention NLP and summarize what is most useful, what is lacking, and what approaches from NLP could be helpful to others.
- Computational social science using NLP generally relies on data from online communities. But this misses offline interactions and the practices of those who are not active online. Survey datasets and approaches that apply quantitative and computational techniques to recordings of offline linguistic interaction.
- A growing area of research in computational social science aims to capture the framing and portrayal of entities across large text corpora (such as in news media). Survey existing approaches and challenges.
7. Other
- Evaluate LLMs for their factuality in summarization of class reflections using a dataset provided by the instructor and Prof. Diane Litman.
- Evaluate the fairness of quality scores automatically assigned to student reflections using a dataset provided by the instructor and Prof. Diane Litman.
- New identity terms are commonly developed in online communities, some of them hateful. Develop methods to find in-group hate jargon and identity terms.
- Build networks of characters and predict relations among characters in fiction using this dataset.
- Stancetaking, a concept from sociolinguistics, is when speakers take an evaluative position toward something, often a nuanced one (e.g. “No, I actually don’t like Taylor Swift’s music that much, but she’s great as a person”). Develop automated methods for identifying the “stance object”, i.e., who or what the speaker is evaluating, likely from Reddit data.
- Automatically summarize movies based on their subtitles from this dataset developed by former students in the class.
Project group match day
In class 09-17.
Students will form groups of 2-4 people around a list of potential projects submitted by the class in the project idea form.
Project peer group feedback
In class 10-15.
In class before the proposal is due, you will be matched with another group that will review your proposal and provide guided feedback.
Project proposal
Due 10-16.
Please submit one proposal per group on Canvas. There is no required length or format for this proposal, but it is recommended to use the ACL format in which the final report will be written. The proposal will contain answers to a series of questions TBD. It will also include a peer review in which you rate your own performance and the performance of other group members through the form here.
Project proposal presentation
In class 10-20.
Groups will make a brief presentation to the class outlining their proposed project, with Q&A and opportunities for feedback from other students. Please plan for a presentation of at most 5 minutes, not including Q&A, which will be held right afterward for each group. Slides will be added to a shared PowerPoint presentation. Presentations are not graded. Cover at least these key points:
- Project motivation
- Briefly, what 1-2 other related papers have done
- What data you are planning to use
- What approach/methods you plan to take
- How you will evaluate your approach
Progress report
Due 11-13.
A brief progress report describing a basic working system. This report should be in the ACL format that the final report will use.
Part 1: Data basic statistics and exploratory analysis
In this part, please provide the following information about your dataset. It’s fine to be working with multiple datasets; just complete this for each one, or for the final dataset you will be using if you are combining datasets. A minimal code sketch for computing these statistics appears after the list.
- The number of rows (datapoints) in the dataset and what each datapoint corresponds to. If you are splitting the dataset into training, test, and possibly dev sets, how many rows are in each?
- The number of columns in the dataset you will be using and what each corresponds to.
- If applicable, the distribution of the target labels you are predicting. For a binary sentiment classification task, for example, how many rows in each set (except the test set) are labeled negative or positive sentiment? This can be in a table or graph format.
- Optionally, any other distribution or data visualization that you think is helpful for understanding your dataset or task.
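If your dataset is tabular (e.g., a CSV file), a few lines of pandas can produce most of these numbers. The sketch below is only an illustration: the file paths and the "text"/"label" column names are hypothetical placeholders for whatever your dataset actually uses.

```python
# Minimal sketch (not required) for computing basic dataset statistics with pandas.
# File paths and the "label" column name are hypothetical -- substitute your own.
import pandas as pd

splits = {
    "train": "train.csv",  # hypothetical paths to your dataset splits
    "dev": "dev.csv",
    "test": "test.csv",
}

for name, path in splits.items():
    df = pd.read_csv(path)
    print(f"{name}: {len(df)} rows, {len(df.columns)} columns")
    print("columns:", list(df.columns))
    if name != "test":  # don't report the test-set label distribution
        print(df["label"].value_counts())  # distribution of target labels
```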
Part 2: Some kind of result
Please provide one (hopefully quantitative) result from your work so far. A good example would be a performance metric from your baseline approach on a dev or test set, but it could also be some other finding you have so far. If you’re not that far yet, you can instead provide an example of working input and output from your system or part of a system, or some sort of plot or other output. You can be up front about challenges you are facing for which you might need help; to get a good grade, I’ll just be looking for some sort of output from a working system or part of a system. If you are unsure what this means for your project, contact the instructor.
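If your project is a classification task and you are unsure where to start, one low-effort way to get a first quantitative result is to score a trivial baseline on your dev set. The sketch below is only an illustration, using scikit-learn and the same hypothetical train.csv/dev.csv files and "text"/"label" column names as the sketch above; adapt or replace it for your actual task and data.

```python
# Minimal sketch of a first quantitative result for a classification project:
# a majority-class baseline and a TF-IDF + logistic regression baseline,
# both evaluated on a dev split. File and column names are hypothetical.
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv")  # hypothetical paths, as in the sketch above
dev = pd.read_csv("dev.csv")

# Majority-class baseline: always predicts the most frequent training label.
majority = DummyClassifier(strategy="most_frequent")
majority.fit(train["text"], train["label"])

# Simple learned baseline: TF-IDF features + logistic regression.
tfidf_lr = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
tfidf_lr.fit(train["text"], train["label"])

for name, model in [("majority class", majority), ("tf-idf + logreg", tfidf_lr)]:
    preds = model.predict(dev["text"])
    print(f"{name}: accuracy={accuracy_score(dev['label'], preds):.3f}, "
          f"macro F1={f1_score(dev['label'], preds, average='macro'):.3f}")
```

Even a number from a baseline as simple as these gives you something concrete to report and compare against later.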
Part 3: Open questions and challenges
Please describe any open questions or challenges your group has at this point. Will you need any resources other than the ones provided in class (OpenAI API access, CRCD access) or have any other questions? Also describe if the roles for each of your team members have changed since the proposal and if so, what the new roles are.
Final presentation
In class 12-08.
Groups will present their finished work to the class, with Q&A and feedback opportunities from students. Please prepare a presentation of at most 8 minutes. Cover at least these key points:
- Project motivation (briefly)
- Task description, including example input and output
- Data
- Methods
- Results or findings
Final report
Due 12-09.
At the end of the course, groups will provide a written report of their project. This project includes a quantitative comparison between at least two NLP systems on a clearly specified task or tasks. One of these is generally a more traditional NLP approach and the other involves LLMs, though your group’s project may vary if you have discussed this with the instructor.
This report will be in the ACL format found here (Overleaf template here). The report should be a maximum of 8 pages, not including the limitations, ethics, group member task breakdown, and references sections or appendices. Outstanding reports would be of a quality and structure that could be submitted to an NLP workshop or conference, but other types of projects can also achieve an A. There is flexibility in section names, but please provide information about the following aspects of the project:
1. Project motivation
2. Literature review. Please provide full citations in a references section for works cited throughout the paper (not just URLs).
3. Data
4. Methods. Please clearly specify which techniques are novel/your own versus methods taken directly or indirectly from prior work (which is also fine).
5. Results
6. Discussion
7. Future work. This is a good place to describe things you thought about but never had time to complete!
8. Limitations (doesn’t count toward the page limit)
9. Ethical issues (doesn’t count toward the page limit)
10. Group member task breakdown (doesn’t count toward the page limit). This section details the high-level tasks that each group member completed.
11. References (doesn’t count toward the page limit)
12. Appendices (optional, doesn’t count toward the page limit). Additional figures or explanation in one or more appendices is allowed, but they will not necessarily be considered in grading.
How your project will be graded
To get an A, your group’s project should make progress toward an achievable, concrete contribution specified in your project proposal. The project does not necessarily need to be successful in the sense that it outperforms baselines or contributes to our knowledge of a phenomenon. Sometimes ideas don’t work, and that’s okay. But you need to provide evidence of progress toward that contribution. If you are building a dataset, for example, the dataset needs to be built in some form, even if it is not as large or as useful as you had hoped. If you are evaluating a new method for a task, you must have an implementation that tests that method against other baselines, even if it doesn’t perform as well as you had hoped or you didn’t get to evaluate it against all the baselines you wanted to. If you are doing a survey, you must distill a sufficient number of papers into themes that comprehensively describe a research area, even if you don’t end up finding groundbreaking gaps in knowledge that must be addressed. Feel free to take on riskier ideas, but only if you know you’ll have something to show for it at the end. During the planning phase, teaching staff will guide you, through the proposal, toward scoping a project that should fulfill this goal.