CSCE 689 - Special Topics in Trustworthy Natural Language Processing

CSCE 689, Fall 2025

Course Information

Lectures

Time: Monday/Wednesday 4:10pm – 5:25pm
Location: HRBB 126

Instructor

Kuan-Hao Huang
Email: khhuang [at] tamu [dot] edu
Office: PETR 219
Office Hours: Wednesday 2pm – 3pm

Grading

Assignments

LaTeX Assignment (1%) [Due: 9/11]
Paper Summary (10%)
Feedback Form (10%)
Topic Study (30%)

Literature Review (15%) [Due: 10/2]
Topic Presentation (15%)

Course Project (49%)

Project Proposal (5%) [Due: 10/9]
Project Highlight Presentation (5%) [Due: 10/22]
Midterm Report (10%) [Due: 11/6]
Final Presentation (12%) [Due: 12/1]
Final Report (17%) [Due: 12/9]

Late Policy

Literature Review, Project Proposal, Midterm Report, Final Report

1 day late: 10% penalty
2 days late: 20% penalty
3 days late: 30% penalty
4 days late: 50% penalty
5 or more days late: 100% penalty

LaTeX Assignment, Topic Presentation Slides, Feedback Form, and Others

No late submissions allowed
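
The tiered penalties above amount to a small lookup table. The following is a minimal sketch of that arithmetic, not an official grading script. The names `LATE_PENALTY`, `late_penalty`, and `adjusted_score` are hypothetical, and two points are assumptions the policy does not spell out: that the penalty is taken as a fraction of the earned score, and that lateness is counted in whole days. It applies only to the Literature Review, Project Proposal, Midterm Report, and Final Report; the other items accept no late submissions.

```python
# Tiers copied from the late policy: days late -> fraction of credit deducted.
LATE_PENALTY = {0: 0.0, 1: 0.10, 2: 0.20, 3: 0.30, 4: 0.50}


def late_penalty(days_late: int) -> float:
    """Return the fraction deducted for a submission that is `days_late` days late."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Five or more days late carries a 100% penalty.
    return LATE_PENALTY.get(days_late, 1.0)


def adjusted_score(raw_score: float, days_late: int) -> float:
    """Apply the late penalty to an earned score.

    Assumption: the penalty is a fraction of the earned score; the policy
    does not say whether it is taken from the earned or the maximum score.
    """
    return raw_score * (1.0 - late_penalty(days_late))
```

Under these assumptions, `adjusted_score(80, 4)` evaluates to `40.0`, and any submission five or more days late receives no credit.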

Schedule

Week	Date	Topic	Readings	Presenter
W1	8/25	Course Overview		Instructor
	8/27	NLP Basics	Machine Learning Basics, Word Representations	Instructor
W2	9/1	Labor Day (No Class)
	9/3	NLP Basics	Word Representations, Language Modeling	Instructor
W3	9/8	NLP Basics	Convolutional Neural Network, Recurrent Neural Network	Instructor
	9/10	NLP Basics	Attention, Transformers	Instructor
	9/11	LaTeX Assignment Due
W4	9/15	NLP Basics	Contextualized Representations, Pre-Training, Large Language Models	Instructor
	9/17	NLP Basics	Text Similarity, Retrieval-Augmented Generation, Vision-Language Models	Instructor
W5	9/22	Human Preference Alignment	Training language models to follow instructions with human feedback, NeurIPS 2022; Direct Preference Optimization: Your Language Model is Secretly a Reward Model, NeurIPS 2023; SimPO: Simple Preference Optimization with a Reference-Free Reward, NeurIPS 2024; Understanding R1-Zero-Like Training: A Critical Perspective, arXiv 2025	Instructor
	9/24	Bias Detection and Mitigation	Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS 2016; Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints, EMNLP 2017; BLIND: Bias Removal With No Demographics, ACL 2023; On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, ACL 2023	Instructor
W6	9/29	AI-Generated Text Detection	Defending Against Neural Fake News, NeurIPS 2019; DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, ICML 2023; Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, ICLR 2024; A Watermark for Large Language Models, ICML 2023	Instructor
	10/1	Adversarial Attacks and Jailbreaking	Universal Adversarial Triggers for Attacking and Analyzing NLP, EMNLP 2019; BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP 2020; Towards Robustness Against Natural Language Word Substitutions, ICLR 2021; JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models, NeurIPS 2024	Student
	10/2	Literature Review Due
W7	10/6	Backdoor Attacks and Data Poisoning	Concealed Data Poisoning Attacks on NLP Models, NAACL 2021; BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models, arXiv 2024; Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations, NAACL-Findings 2025; BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge, ICLR 2025	Student
	10/8	Invited Talk (Remote)	Title: Beyond Single-Step: Evolving LLM Reasoning with Step-wise Learning and Persistent Memory; Speaker: I-Hung Hsu, Research Scientist at Google
	10/9	Project Proposal Due
W8	10/13	Fall Break (No Class)
	10/15	Multimodal Models	When and why vision-language models behave like bags-of-words, and what to do about it?, ICLR 2023; What's "up" with vision-language models? Investigating their struggle with spatial reasoning, EMNLP 2023; Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, CVPR 2024; Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas, ICML 2025	Student
W9	10/20	In-Context Learning	Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, EMNLP 2022; Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning, EMNLP-Findings 2023; What Makes a Good Order of Examples in In-Context Learning, ACL 2024; Revisiting Demonstration Selection Strategies in In-Context Learning, ACL 2024	Student
	10/22	Project Highlight Presentations (Remote)
W10	10/27	Position Bias	Lost in the Middle: How Language Models Use Long Contexts, TACL 2023; Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization, ACL-Findings 2024; Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding, NeurIPS 2024; Eliminating Position Bias of Language Models: A Mechanistic Approach, ICLR 2025	Student
	10/29	Long-Context Language Models	Focused Transformer: Contrastive Training for Context Scaling, NeurIPS 2023; Extending Context Window of Large Language Models via Positional Interpolation, arXiv 2023; LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models, NAACL 2024; YaRN: Efficient Context Window Extension of Large Language Models, ICLR 2024	Student
W11	11/3	Hallucinations	FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation, EMNLP 2023; SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023; How Language Model Hallucinations Can Snowball, ICML 2024; Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps, EMNLP 2024	Student
	11/5	Multilingual Models	Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models, ACL 2024; How do Large Language Models Handle Multilingualism?, NeurIPS 2024; Do Multilingual LLMs Think In English?, arXiv 2025; Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning, ACL 2025	Student
	11/6	Midterm Report Due
W12	11/10	Model Explainability	Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, Anthropic 2023; Inference to the Best Explanation in Large Language Models, ACL 2024; Are self-explanations from Large Language Models faithful?, ACL 2024; Multi-Level Explanations for Generative Language Models, ACL 2025	Student
	11/12	Model Reasoning	Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023; Faithful Logical Reasoning via Symbolic Chain-of-Thought, ACL 2024; ProcessBench: Identifying Process Errors in Mathematical Reasoning, ACL 2025; s1: Simple test-time scaling, arXiv 2025	Student
W13	11/17	Model Editing	Locating and Editing Factual Associations in GPT, NeurIPS 2022; Mass-Editing Memory in a Transformer, ICLR 2023; PMET: Precise Model Editing in a Transformer, AAAI 2024; A Unified Framework for Model Editing, EMNLP-Findings 2024	Student
	11/19	Tool-Augmented Language Models	Toolformer: Language Models Can Teach Themselves to Use Tools, NeurIPS 2023; HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023; ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, ICLR 2024; CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets, ICLR 2024	Student
W14	11/24	Invited Talk (Remote)	Title: From Risk to Resilience: Addressing Misalignment in (Multimodal) Large Language Models; Speaker: Fei Wang, Research Scientist at Google
	11/26	Reading Day (No Class)
W15	12/1	Project Presentations (Remote)
	12/3	Project Presentations (Remote)
W16	12/9	Final Report Due