CSCE 689, Fall 2025

Special Topics in Trustworthy Natural Language Processing

Course Information

Lectures

Instructor

Grading

Assignments

Late Policy

Schedule

Week Date Topic Readings Presenter
W1 8/25 Course Overview [slides] Instructor
8/27 NLP Basics [slides] Machine Learning Basics, Word Representations Instructor
W2 9/1 Labor Day (No Class)
9/3 NLP Basics Language Modeling, Convolutional Neural Network Instructor
W3 9/8 NLP Basics Recurrent Neural Network, Sequence-to-Sequence Instructor
9/10 NLP Basics Attention, Transformers Instructor
9/11 LaTeX Assignment Due
W4 9/15 NLP Basics Contextualized Representations, Pre-Training, Text Similarity Instructor
9/17 NLP Basics Large Language Models, Vision-Language Models Instructor
W5 9/22 Human Preference Alignment Training language models to follow instructions with human feedback, NeurIPS 2022
Direct Preference Optimization: Your Language Model is Secretly a Reward Model, NeurIPS 2023
SimPO: Simple Preference Optimization with a Reference-Free Reward, NeurIPS 2024
Understanding R1-Zero-Like Training: A Critical Perspective, arXiv 2025
Instructor
9/24 Bias Detection and Mitigation Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS 2016
Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints, EMNLP 2017
BLIND: Bias Removal With No Demographics, ACL 2023
On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, ACL 2023
Instructor
W6 9/29 AI-Generated Text Detection Defending Against Neural Fake News, NeurIPS 2019
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, ICML 2023
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, ICLR 2024
A Watermark for Large Language Models, ICML 2023
Instructor
10/1 Adversarial Attacks and Jailbreaking Universal Adversarial Triggers for Attacking and Analyzing NLP, EMNLP 2019
BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP 2020
Towards Robustness Against Natural Language Word Substitutions, ICLR 2021
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models, NeurIPS 2024
Student
10/2 Literature Review Due
W7 10/6 Backdoor Attacks and Data Poisoning Concealed Data Poisoning Attacks on NLP Models, NAACL 2021
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models, arXiv 2024
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge, ICLR 2025
Student
10/8 Invited Talk (Remote) Title: TBD
Speaker: I-Hung Hsu, Research Scientist at Google
10/9 Proposal Due
W8 10/13 Fall Break (No Class)
10/15 Project Highlight Presentations
W9 10/20 Multimodal Models When and why vision-language models behave like bags-of-words, and what to do about it?, ICLR 2023
What's "up" with vision-language models? Investigating their struggle with spatial reasoning, EMNLP 2023
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, CVPR 2024
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas, ICML 2025
Student
10/22 In-Context Learning Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, EMNLP 2022
Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning, EMNLP-Findings 2023
What Makes a Good Order of Examples in In-Context Learning, ACL 2024
Revisiting Demonstration Selection Strategies in In-Context Learning, ACL 2024
Student
W10 10/27 Position Bias Lost in the Middle: How Language Models Use Long Contexts, TACL 2023
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization, ACL-Findings 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding, NeurIPS 2024
Eliminating Position Bias of Language Models: A Mechanistic Approach, ICLR 2025
Student
10/29 Long-Context Language Models Focused Transformer: Contrastive Training for Context Scaling, NeurIPS 2023
Extending Context Window of Large Language Models via Positional Interpolation, arXiv 2023
LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models, NAACL 2024
YaRN: Efficient Context Window Extension of Large Language Models, ICLR 2024
Student
W11 11/3 Hallucinations FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation, EMNLP 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023
How Language Model Hallucinations Can Snowball, ICML 2024
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps, EMNLP 2024
Student
11/5 Multilingual Models Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models, ACL 2024
How do Large Language Models Handle Multilingualism?, NeurIPS 2024
Do Multilingual LLMs Think In English?, arXiv 2025
Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning, ACL 2025
Student
11/6 Midterm Report Due
W12 11/10 Model Explainability Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic 2023
Inference to the Best Explanation in Large Language Models, ACL 2024
Are self-explanations from Large Language Models faithful?, ACL 2024
Multi-Level Explanations for Generative Language Models, ACL 2025
Student
11/12 Model Reasoning Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023
Faithful Logical Reasoning via Symbolic Chain-of-Thought, ACL 2024
ProcessBench: Identifying Process Errors in Mathematical Reasoning, ACL 2025
s1: Simple test-time scaling, arXiv 2025
Student
W13 11/17 Model Editing Locating and Editing Factual Associations in GPT, NeurIPS 2022
Mass-Editing Memory in a Transformer, ICLR 2023
PMET: Precise Model Editing in a Transformer, AAAI 2024
A Unified Framework for Model Editing, EMNLP-Findings 2024
Student
11/19 Tool-Augmented Language Models Toolformer: Language Models Can Teach Themselves to Use Tools, NeurIPS 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, ICLR 2024
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets, ICLR 2024
Student
W14 11/24 Invited Talk (Remote) Title: TBD
Speaker: Fei Wang, Research Scientist at Google
11/26 Reading Day (No Class)
W15 12/1 Project Presentations
12/3 Project Presentations
W16 12/9 Final Report Due