Project
Last revised 2025-02-27.
A major component of this course is a hands-on final project. In this project, students will demonstrate an ability to build and evaluate an NLP system that takes in language data and automatically produces some sort of output.
Projects will be done in groups of 2-4 students. Groups will be assigned by teaching staff based on interests, skills, and group preferences from students.
Project idea submission form
Due 01-30.
With this form, you can submit project ideas you might be interested in working on. You can choose ideas from the example projects listed below or propose your own. For your own ideas, consider what computer system you’d like to build that processes language in some form, interesting text datasets you’d like to work on, really anything! It is best if your idea has a dataset in mind, but this is not required.
You can submit as many ideas as you’d like with this form. Ideas do not have to be fully sketched out. Submitting an idea does not mean you will necessarily work on it. These ideas will be presented to all students anonymously. Each student must submit at least one idea for credit on this assignment, even if it’s just chosen from the example projects.
Example projects
Many of these projects are drawn from “shared tasks” where NLP researchers compete for the best performance on certain datasets. The instructor will provide data for these projects, though it may still require further preprocessing for use.
1. Text classification
- Given a short essay in response to a troubling news article, predict the level of empathy. See WASSA 2024 shared task Track 3.
- Predict emotion labels from tweets across many languages. See WASSA 2024 shared task
- Given a news article and a list of “entities” (people, organizations, etc), predict roles such as protagonist, antagonist, and innocent. See SemEval 2025 Task 10, Subtask 1 on entity framing
- Predict news genre or media “frames” such as morality, economic, or crime and punishment from news articles in multiple languages. See SemEval 2023 Task 3, Subtasks 1 or 2
- Predict whether text was written by humans or generated by AI. Tasks include predicting for data across languages and for academic essays. See GenAI Content Detection Workshop, Task 1 or 2
- Classify tweets as sexist or not, or predict the “intent” of sexist tweets as direct, reported or judgemental. See EXIST 2024 Task 1 or Task 2
- Predict if similar words are redundant or not with the Semantic Pleonasm corpus developed right here at Pitt.
- Predict whether bills will pass in US state legislatures (Minnesota, Pennsylvania, or Virginia) based solely on the text, or in combination with other metadata such as the party sponsoring the bill.
2. Machine translation
- Translate customer service chats between languages. See the WMT 2024 Chat Shared Task
- Translate code-mixed Hinglish to English. See the WMT 2022 Code-mixed Machine Translation Task
- Create a system to automatically correct (post-edit) machine translations. See the WMT 2022 Automatic Post-Editing Shared Task
3. Information retrieval and extraction
- Given a query, retrieve the most relevant passages from regulatory documents: https://www.codabench.org/competitions/3527/
- Extract important entities from scientific articles with the SCIRex dataset
4. Question answering
- Train a system to predict abstract terms related to a passage and answer multiple choice questions. See SemEval 2021 Task 4.
Project group match day
In class 02-05.
Students will form groups of 2-4 people around the following list of potential projects. Note that this list contains many more project ideas than there will be groups, so not every idea will be taken up.
Project idea list
- Predict whether text was written by humans or generated by AI. Tasks include predicting for data across languages and for academic essays. See GenAI Content Detection Workshop, Task 1 or 2 and dataset. This project is the same as project 1.5 in the example projects list above.
- Train a system to predict abstract terms related to a passage and answer multiple choice questions. See SemEval 2021 Task 4 and dataset. This project is the same as project 4.1 in the example projects list above.
- Classify adversarial prompts for LLMs based on attack type, using publicly available red-teaming datasets.
- Build an information retrieval system that finds relevant legal precedents for new cases.
- Given a review of a restaurant, determine what type of restaurant it is from this Yelp dataset
- Determine if a YouTube comment is like-farming/baiting. Examples include “Only people from TikTok are allowed to like this comment,” “For every like this gets I’ll do a pushup,” and “Like if you’re a true fan.” This can be important since spam accounts tend to copy and paste legitimate messages with problematic usernames or profile pictures.
- Compare hate speech by groups outside of the US with hateful language used by US groups. This project would use NLP techniques to analyze narrative, topic and style differences.
- Analyze how women’s commentary online on computer science topics (for instance on Stack Overflow or YouTube) is received by other users. This could involve sentiment analysis.
- Classify tweet replies as bots or not.
- Predict bullish/bearish sentiment toward companies from sites like r/WSB or Yahoo Finance.
- Develop language technologies for endangered languages and less commonly spoken languages. This could involve building language models or machine translation systems for Ligurian, or automatically labeling syntax in Yupik or other languages.
- Predict if a given text is a “dad joke” using this dataset.
- Compare sentiment toward bikes and bicyclists across biking-oriented and non-biking-oriented subreddits, with a focus on Pittsburgh.
- Build a system to automatically simplify complex articles or papers into simple, easily digestible versions.
Project proposal
Due 02-28.
Please submit one per group on Canvas. This proposal will be a report with answers to the series of questions below; there is no required length or format. It will include a peer review where you will rate your own performance and the performance of other group members through the form here.
- What is the problem or task you are focusing on?
- What is the format of the input and output of this task? For example, each input could be a sentence of text and the output could be a label from a discrete set of possible labels. Provide at least one example of input and output from your data.
- What data are you using? Please explain where these datasets are from and how they were constructed. Provide links to any URLs if the data is hosted online or links to papers if the dataset is published somewhere. If the data has labels or “gold” text that you are predicting or generating, where do those labels come from?
- What approach are you taking to building an NLP system to handle this task? What software packages are you planning to use to build this system? With few exceptions, the approach should draw on statistical approaches we’ve covered in class so far, such as n-gram representations of text. Talk to the instructor if you are not sure about this.
- How are you evaluating your approach? What performance metrics are you going to use?
- What kinds of ethical issues may be raised by your model or data?
- What are the proposed steps needed to complete the project? This should be in some detail, for example, loading and potentially cleaning the data, training models, trying different parameters, evaluating models, etc.
- What are the roles and tasks of each person in the group? Though group members will contribute in various capacities, it is best if each person is responsible for at least one aspect of the project.
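To illustrate the questions above about input/output format and n-gram representations, here is a minimal sketch in Python. The sentiment task, the example sentence, and the function name `ngram_counts` are hypothetical and chosen only for illustration; your own task, tokenization, and tooling will likely differ.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 1) -> Counter:
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    # Slide a window of length n over the token list.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)


# Hypothetical input/output pair for a sentiment classification task.
example_input = "The food was great and the service was great"
example_output = "positive"

unigrams = ngram_counts(example_input, n=1)
bigrams = ngram_counts(example_input, n=2)
print(unigrams[("great",)])       # 2
print(bigrams[("was", "great")])  # 2
```

Counts like these can serve as features for a statistical classifier. Whitespace tokenization is the simplest possible choice here; a real system would probably handle punctuation and casing more carefully.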
Project proposal presentation
In class 03-10.
Groups will make a brief presentation to the class outlining their proposed project, with Q&A and opportunities for feedback from other students. Please plan for maximum 5-minute presentations not including Q&A, which will be held right afterward for each group. Please add your slides to this shared PowerPoint presentation. Presentations are not graded. Cover at least these key points:
- Project motivation
- What data you are planning to use
- What approach/methods you plan to take
- How you will evaluate your approach
Progress report
Due 03-27.
The progress report will contain a substantive update on your group’s progress using traditional (usually n-gram based) approaches on your task, as well as a description of how you will use LLMs for your task. You do not have to repeat information from the project proposal except for basic descriptions of the project and Part 1 information if you already provided it in the proposal. Here are the details:
Part 1: Data basic statistics and exploratory analysis
In this part, please provide the following information about your dataset. It’s fine to be working with multiple datasets; just complete this for each one or for a final dataset you will be using if you are combining datasets.
- The number of rows (datapoints) in the dataset and what each datapoint corresponds to. If you are splitting the dataset into training, test, and possibly dev sets, how many rows are in each?
- The number of columns in the dataset you will be using and what each corresponds to.
- If applicable, the distribution of the target labels you are predicting. So for a binary sentiment classification task, how many rows in each set (except the test set) are marked negative or positive sentiment? This can be in a table or graph format.
- Optionally, any other distribution or data visualization that you think is helpful for understanding your dataset or task.
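The statistics listed above can be gathered with a few lines of Python. This is only a sketch under stated assumptions: the toy `train` and `dev` splits, the `label` column name, and the helper `describe` are hypothetical stand-ins for your own data and code.

```python
from collections import Counter

# Hypothetical toy dataset: each row is one datapoint with a text and a label.
train = [
    {"text": "loved it", "label": "positive"},
    {"text": "hated it", "label": "negative"},
    {"text": "would go again", "label": "positive"},
]
dev = [{"text": "not for me", "label": "negative"}]


def describe(split_name, rows):
    """Return the row count, column names, and label counts for one split."""
    columns = sorted(rows[0].keys()) if rows else []
    labels = Counter(row["label"] for row in rows)
    return {
        "split": split_name,
        "rows": len(rows),
        "columns": columns,
        "labels": dict(labels),
    }


# Summarize every split except the test set.
for name, rows in [("train", train), ("dev", dev)]:
    print(describe(name, rows))
```

The resulting dictionaries can be copied into a table in the report, or the label counts can be plotted as a bar chart.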
Part 2: A result from baseline (traditional) approach
In your proposal, you described an initial baseline approach to your task, which for most groups was using n-gram features in some way. Please provide one (hopefully quantitative) result from your work so far in this direction. Ideally this would be a performance metric result from your baseline approach on a dev or test set. But if you’re not that far yet, you can also provide an example of working input and output from your system or part of a system, or some sort of plot or other output. You can be up front about challenges you are facing for which you might need help. To get a good grade, I’ll just be looking for some sort of output from a working system or part of a system. If you are unsure what this means for your project, contact the instructor.
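As one example of what a quantitative result might look like for a classification task, the sketch below computes accuracy and macro-averaged F1 from gold and predicted labels. The label lists are made-up placeholders, and these two metrics are only an assumption about what suits your task; other tasks (translation, retrieval) call for different metrics.

```python
def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical dev-set labels and baseline predictions.
gold = ["pos", "neg", "pos", "neg"]
pred = ["pos", "pos", "pos", "neg"]

print(f"accuracy = {accuracy(gold, pred):.3f}")  # accuracy = 0.750
print(f"macro F1 = {macro_f1(gold, pred):.3f}")  # macro F1 = 0.733
```

Macro F1 weights every label equally, which can be more informative than accuracy when the label distribution is skewed.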
Part 3: LLM proposal
In the project, you will be comparing your baseline system’s performance to that of an LLM. Please describe how you might use an LLM programmatically to attempt your task. The simplest way to do this would be in a “zero-shot” setting where you simply ask the LLM to do the task, but even that requires setting up and passing your data to the LLM and evaluating it. Please describe what you plan to do and which LLM you plan on using. You can also propose using more advanced approaches such as in-context learning (few-shot prompting), chain-of-thought prompting, prompt optimization or fine-tuning. Not all groups have to use an LLM here if you have already talked to the instructor; in that case, please describe the rest of the approach you will be taking to complete the project.
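A zero-shot setup of the kind described above might be sketched as follows. Everything here is an assumption for illustration: the label set, the prompt wording, and the function names are hypothetical, and `fake_llm` is a stand-in for a real model call rather than any actual API. The point is the overall shape: build a prompt per datapoint, call the model, and map its free-form answer back onto your label set so it can be evaluated like the baseline.

```python
from typing import Callable, List

LABELS = ["positive", "negative"]  # hypothetical label set


def build_prompt(text: str) -> str:
    """Format one datapoint as a zero-shot classification prompt."""
    options = ", ".join(LABELS)
    return (
        f"Classify the sentiment of the text as one of: {options}.\n"
        f"Text: {text}\n"
        "Answer with the label only."
    )


def parse_label(response: str, default: str = "negative") -> str:
    """Map a free-form model response onto one of the known labels."""
    cleaned = response.strip().lower()
    for label in LABELS:
        if label in cleaned:
            return label
    # Fall back to a default when no known label appears in the response.
    return default


def zero_shot_predict(texts: List[str], llm: Callable[[str], str]) -> List[str]:
    """Run every text through the model and return parsed labels."""
    return [parse_label(llm(build_prompt(t))) for t in texts]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; replace with your own client code."""
    return "Positive." if "great" in prompt else "Negative."


predictions = zero_shot_predict(["great food", "bad food"], fake_llm)
print(predictions)  # ['positive', 'negative']
```

Passing the model as a plain callable keeps the prompting and parsing logic independent of whichever LLM you end up using. Few-shot prompting would extend `build_prompt` with labeled examples before the text to classify.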
Part 4: Open questions and challenges
Please describe any open questions or challenges your group has at this point. Will you need any resources other than the ones provided in class (OpenAI API access, CRCD access) or have any other questions? Also describe if the roles for each of your team members have changed since the proposal and if so, what the new roles are.
Deliverable
Assemble your results and writing for each part in a document to submit as a PDF on Canvas. There is no required format for this document other than being in PDF format.
Final presentation
In class TBD.
Requirements TBD. Groups will present their finished work to the class, with Q&A and feedback opportunities from students. Please prepare a maximum 5-minute presentation.
Final report
Due 04-24.
Requirements TBD. At the end of the course, groups will provide a written report of their project.