CSCE 689 - Special Topics in Trustworthy Natural Language Processing

Course Information

Lectures

Instructor

Grading

Assignments

Late Policy

Schedule

Week Date Topic Details Note
W1 8/19 Course Overview [slides]
8/21 Natural Language Processing Basics [slides] Common NLP Tasks, Training Pipelines, Word Representations
8/23 Natural Language Processing Basics [slides] Word Representations, Tokenization
W2 8/26 Natural Language Processing Basics [slides] Tokenization, Convolutional Neural Network, Recurrent Neural Network, Long Short-Term Memory
8/28 Natural Language Processing Basics [slides] Long Short-Term Memory, Attention, Transformers
8/30 Natural Language Processing Basics [slides] Transformers, Contextualized Representations, Pre-Training
W3 9/2 Labor Day (No Class)
9/4 Natural Language Processing Basics [slides] Pre-Training, Language Models
9/6 Natural Language Processing Basics [slides] Large Language Models, Prompting, In-Context Learning, Instruction Tuning
W4 9/9 Adversarial Attacks and Defenses [slides] [Instructor] Generating Natural Language Adversarial Examples, EMNLP 2018
[Instructor] BERT-ATTACK: Adversarial Attack Against BERT Using BERT, EMNLP 2020
[Instructor] Universal Adversarial Triggers for Attacking and Analyzing NLP, EMNLP 2019
Summary Due
9/11 Adversarial Attacks and Defenses [slides] [Instructor] Certified Robustness to Adversarial Word Substitutions, EMNLP 2019
[Instructor] Towards Robustness Against Natural Language Word Substitutions, ICLR 2021
[Instructor] Universal and Transferable Adversarial Attacks on Aligned Language Models, arXiv 2023
9/13 Adversarial Attacks and Defenses [Student] Adversarial Example Generation with Syntactically Controlled Paraphrase Networks, NAACL 2018
[Student] Jailbreaking Black Box Large Language Models in Twenty Queries, arXiv 2023
W5 9/16 Backdoor Attacks and Data Poisoning [slides] [Instructor] Weight Poisoning Attacks on Pre-trained Models, ACL 2020
[Instructor] Concealed Data Poisoning Attacks on NLP Models, NAACL 2021
[Instructor] Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer, EMNLP 2021
Summary Due
9/18 Backdoor Attacks and Data Poisoning [slides] [Instructor] Poisoning Language Models During Instruction Tuning, ICML 2023
[Instructor] Rethinking Stealthiness of Backdoor Attack against NLP Models, EMNLP 2021
[Instructor] ONION: A Simple and Effective Defense Against Textual Backdoor Attacks, EMNLP 2021
9/20 Backdoor Attacks and Data Poisoning [Student] Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder, EMNLP-Findings 2020
[Student] RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models, EMNLP 2021
W6 9/23 AI-Generated Text Detection [slides] [Instructor] Defending Against Neural Fake News, NeurIPS 2019
[Instructor] DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, ICML 2023
[Instructor] Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, ICLR 2024
Summary Due
9/25 AI-Generated Text Detection [slides] [Instructor] A Watermark for Large Language Models, ICML 2023
[Instructor] SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation, NAACL 2024
9/27 AI-Generated Text Detection [Student] RADAR: Robust AI-Text Detection via Adversarial Learning, NeurIPS 2023
[Student] Paraphrasing Evades Detectors of AI-Generated Text, But Retrieval is An Effective Defense, NeurIPS 2023
W7 9/30 Model Uncertainty [slides] [Instructor] Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016
[Instructor] Calibration of Pre-trained Transformers, EMNLP 2020
[Instructor] Uncertainty Estimation in Autoregressive Structured Prediction, ICLR 2021
Summary Due
10/2 Model Uncertainty [slides] [Instructor] Teaching Models to Express Their Uncertainty in Words, TMLR 2022
[Instructor] Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback, EMNLP 2023
[Instructor] R-Tuning: Instructing Large Language Models to Say 'I Don't Know', NAACL 2024
10/4 Model Uncertainty [Student] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation, ICLR 2023
[Student] Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling, ICML 2024
W8 10/7 Fall Break (No Class)
10/9 Invited Talk (Remote) Machine Unlearning: the general theory and LLM practice for privacy
Speaker: Eli Chien
10/11 Team Project Highlights
W9 10/14 Model Explainability and Interpretability [slides] [Instructor] Rationalizing Neural Predictions, EMNLP 2016
[Instructor] “Why Should I Trust You?”: Explaining the Predictions of Any Classifier, KDD 2016
[Instructor] Towards Explainable NLP: A Generative Explanation Framework for Text Classification, ACL 2019
Summary Due
10/16 Model Explainability and Interpretability [slides] [Instructor] A Unified Approach to Interpreting Model Predictions, NeurIPS 2017
[Instructor] Understanding Black-box Predictions via Influence Functions, ICML 2017
[Instructor] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, arXiv 2022
10/18 Model Explainability and Interpretability [Student] Reframing Human-AI Collaboration for Generating Free-Text Explanations, NAACL 2022
[Student] Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023
W10 10/21 Bias Detection and Mitigation [slides] [Instructor] Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, NeurIPS 2016
[Instructor] The Woman Worked as a Babysitter: On Biases in Language Generation, EMNLP 2019
[Instructor] On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning, ACL 2023
Summary Due
10/23 Bias Detection and Mitigation [slides] [Instructor] Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints, EMNLP 2017
[Instructor] Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal, ACL 2022
[Instructor] BLIND: Bias Removal With No Demographics, ACL 2023
10/25 Bias Detection and Mitigation [Student] Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection, ACL 2020
[Student] From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models, ACL 2023
W11 10/28 Human Preference Alignment [slides] [Instructor] Fine-Tuning Language Models from Human Preferences, arXiv 2019
[Instructor] Training language models to follow instructions with human feedback, NeurIPS 2022
Summary Due
10/30 Human Preference Alignment [slides] [Instructor] Direct Preference Optimization: Your Language Model is Secretly a Reward Model, NeurIPS 2023
[Instructor] mDPO: Conditional Preference Optimization for Multimodal Large Language Models, arXiv 2024
[Instructor] KTO: Model Alignment as Prospect Theoretic Optimization, ICML 2024
11/1 Human Preference Alignment [Student] SimPO: Simple Preference Optimization with a Reference-Free Reward, arXiv 2024
[Student] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, ICML 2024
W12 11/4 Hallucinations and Misinformation Control [slides] [Instructor] Do Language Models Know When They're Hallucinating References?, EACL 2024
[Instructor] How Language Model Hallucinations Can Snowball, ICML 2024
[Instructor] SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models, EMNLP 2023
Summary Due
11/6 Hallucinations and Misinformation Control [slides] [Instructor] Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation, ICLR 2024
[Instructor] FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation, EMNLP 2023
11/8 Hallucinations and Misinformation Control [Student] SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency, EMNLP 2023
[Student] Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension, ICML 2024
W13 11/11 Robustness of Multimodal Models (Remote) [slides] [Instructor] Learning Transferable Visual Models From Natural Language Supervision, ICML 2021
[Instructor] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, ICML 2022
[Instructor] Visual Instruction Tuning, NeurIPS 2023
Summary Due
11/13 Robustness of Multimodal Models (Remote) [slides] [Instructor] When and why vision-language models behave like bags-of-words, and what to do about it?, ICLR 2023
[Instructor] Text encoders bottleneck compositionality in contrastive vision-language models, EMNLP 2023
[Instructor] Paxion: Patching Action Knowledge in Video-Language Foundation Models, NeurIPS 2023
11/15 Robustness of Multimodal Models [Student] Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models, ICML 2024
[Student] On the Robustness of Large Multimodal Models Against Image Adversarial Attacks, CVPR 2024
W14 11/18 Robustness of Multimodal Models [Student] CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning, ICCV 2023
[Student] Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs, CVPR 2024
11/20 Project Presentations
11/22 Project Presentations
W15 11/25 Project Presentations
11/27 Reading Day (No Class)
11/29 Thanksgiving (No Class)