CSCE 689 - Special Topics in Trustworthy Natural Language Processing

Course Information

Lectures

Instructor

Grading

Assignments

Late Policy

Schedule

Week Date Topic Details Note
W1 8/19 Course Overview [slides]
8/21 Natural Language Processing Basics [slides] Common NLP Tasks, Training Pipelines, Word Representations
8/23 Natural Language Processing Basics [slides] Word Representations, Tokenization
W2 8/26 Natural Language Processing Basics [slides] Tokenization, Convolutional Neural Network, Recurrent Neural Network, Long Short-Term Memory
8/28 Natural Language Processing Basics [slides] Long Short-Term Memory, Attention, Transformers
8/30 Natural Language Processing Basics [slides] Transformers, Contextualized Representations, Pre-Training
W3 9/2 Labor Day (No Class)
9/4 Natural Language Processing Basics [slides] Pre-Training, Language Models
9/6 Natural Language Processing Basics [slides] Large Language Models, Prompting, In-Context Learning, Instruction Tuning
W4 9/9 Adversarial Attacks and Defenses [slides] [Instructor] Generating Natural Language Adversarial Examples, EMNLP 2018
[Instructor] BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP 2020
[Instructor] Universal Adversarial Triggers for Attacking and Analyzing NLP, EMNLP 2019
Summary Due
9/11 Adversarial Attacks and Defenses [slides] [Instructor] Certified Robustness to Adversarial Word Substitutions, EMNLP 2019
[Instructor] Towards Robustness Against Natural Language Word Substitutions, ICLR 2021
[Instructor] Universal and Transferable Adversarial Attacks on Aligned Language Models, arXiv 2023
9/13 Adversarial Attacks and Defenses [Student] Adversarial Example Generation with Syntactically Controlled Paraphrase Networks, NAACL 2018
[Student] Jailbreaking Black Box Large Language Models in Twenty Queries, arXiv 2023
W5 9/16 Backdoor Attacks and Data Poisoning [slides] [Instructor] Weight Poisoning Attacks on Pre-trained Models, ACL 2020
[Instructor] Concealed Data Poisoning Attacks on NLP Models, NAACL 2021
[Instructor] Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer, EMNLP 2021
Summary Due
9/18 Backdoor Attacks and Data Poisoning [slides] [Instructor] Poisoning Language Models During Instruction Tuning, ICML 2023
[Instructor] Rethinking Stealthiness of Backdoor Attack against NLP Models, EMNLP 2021
[Instructor] ONION: A Simple and Effective Defense Against Textual Backdoor Attacks, EMNLP 2021
9/20 Backdoor Attacks and Data Poisoning [Student] Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder, EMNLP-Findings 2020
[Student] RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models, EMNLP 2021
W6 9/23 AI-Generated Text Detection [slides] [Instructor] Defending Against Neural Fake News, NeurIPS 2019
[Instructor] DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, ICML 2023
[Instructor] Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, ICLR 2024
Summary Due
9/25 AI-Generated Text Detection [slides] [Instructor] A Watermark for Large Language Models, ICML 2023
[Instructor] SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation, NAACL 2024
9/27 AI-Generated Text Detection [Student] RADAR: Robust AI-Text Detection via Adversarial Learning, NeurIPS 2023
[Student] Paraphrasing Evades Detectors of AI-Generated Text, But Retrieval is An Effective Defense, NeurIPS 2023
W7 9/30 Model Uncertainty [slides] [Instructor] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016
[Instructor] Calibration of Pre-trained Transformers, EMNLP 2020
[Instructor] Uncertainty Estimation in Autoregressive Structured Prediction, ICLR 2021
Summary Due
10/2 Model Uncertainty [slides] [Instructor] Teaching Models to Express Their Uncertainty in Words, TMLR 2022
[Instructor] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback, EMNLP 2023
[Instructor] R-Tuning: Instructing Large Language Models to Say 'I Don't Know', NAACL 2024
10/4 Model Uncertainty [Student] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation, ICLR 2023
[Student] Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling, ICML 2024
W8 10/7 Fall Break (No Class)
10/9 Invited Talk (Remote) Machine Unlearning: the general theory and LLM practice for privacy
Speaker: Eli Chien
10/11 Team Project Highlights
W9 10/14 Model Explainability and Interpretability [slides] [Instructor] Rationalizing Neural Predictions, EMNLP 2016
[Instructor] “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, KDD 2016
[Instructor] Towards Explainable NLP: A Generative Explanation Framework for Text Classification, ACL 2019
Summary Due
10/16 Model Explainability and Interpretability [slides] [Instructor] A Unified Approach to Interpreting Model Predictions, NeurIPS 2017
[Instructor] Understanding Black-box Predictions via Influence Functions, ICML 2017
[Instructor] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, arXiv 2022
10/18 Model Explainability and Interpretability [Student] Reframing Human-AI Collaboration for Generating Free-Text Explanations, NAACL 2022
[Student] Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023
W10 10/21 Bias Detection and Mitigation [slides] [Instructor] Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS 2016
[Instructor] The Woman Worked as a Babysitter: On Biases in Language Generation, EMNLP 2019
[Instructor] On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, ACL 2023
Summary Due
10/23 Bias Detection and Mitigation [slides] [Instructor] Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints, EMNLP 2017
[Instructor] Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal, ACL 2022
[Instructor] BLIND: Bias Removal With No Demographics, ACL 2023
10/25 Bias Detection and Mitigation [Student] Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection, ACL 2020
[Student] From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models, ACL 2023
W11 10/28 Human Preference Alignment [slides] [Instructor] Fine-Tuning Language Models from Human Preferences, arXiv 2019
[Instructor] Training language models to follow instructions with human feedback, NeurIPS 2022
Summary Due
10/30 Human Preference Alignment [slides] [Instructor] Direct Preference Optimization: Your Language Model is Secretly a Reward Model, NeurIPS 2023
[Instructor] mDPO: Conditional Preference Optimization for Multimodal Large Language Models, arXiv 2024
[Instructor] KTO: Model Alignment as Prospect Theoretic Optimization, ICML 2024
11/1 Human Preference Alignment [Student] SimPO: Simple Preference Optimization with a Reference-Free Reward, arXiv 2024
[Student] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, ICML 2024
W12 11/4 Hallucinations and Misinformation Control [slides] [Instructor] Do Language Models Know When They're Hallucinating References?, EACL 2024
[Instructor] How Language Model Hallucinations Can Snowball, ICML 2024
[Instructor] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023
Summary Due
11/6 Hallucinations and Misinformation Control [slides] [Instructor] Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024
[Instructor] FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation, EMNLP 2023
11/8 Hallucinations and Misinformation Control [Student] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, EMNLP 2023
[Student] Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension, ICML 2024
W13 11/11 Robustness of Multimodal Models (Remote) [slides] [Instructor] Learning Transferable Visual Models From Natural Language Supervision, ICML 2021
[Instructor] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, ICML 2022
[Instructor] Visual Instruction Tuning, NeurIPS 2023
Summary Due
11/13 Robustness of Multimodal Models (Remote) [slides] [Instructor] When and why vision-language models behave like bags-of-words, and what to do about it?, ICLR 2023
[Instructor] Text encoders bottleneck compositionality in contrastive vision-language models, EMNLP 2023
[Instructor] Paxion: Patching Action Knowledge in Video-Language Foundation Models, NeurIPS 2023
11/15 Robustness of Multimodal Models [Student] Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models, ICML 2024
[Student] On the Robustness of Large Multimodal Models Against Image Adversarial Attacks, CVPR 2024
W14 11/18 Robustness of Multimodal Models [Student] CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV 2023
[Student] Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, CVPR 2024
11/20 Project Presentations
11/22 Project Presentations
W15 11/25 Project Presentations
11/27 Reading Day (No Class)
11/29 Thanksgiving (No Class)