Adversarial Retrieval and LLMs Syllabus (Advanced Topic)
Course Organization
This course examines how large language models handle — and fail to handle — factual knowledge, and how adversaries exploit these failure modes in information retrieval and generation systems. Lectures are organized in four modules that move from internal model mechanics outward to ecosystem-level attacks.
For a broader overview of the Trust and Safety space, see the Trust & Safety class.
Before taking the class, read through the LLM background literature, which is considered a prerequisite.
Lectures and Delivery Order
Lecture 1: Memorization, Generalization, and Specialization in LLMs
- Source: Memorization, Generalization, and Specialization in LLMs
Introduces the core tension at the heart of the course: LLMs memorize training data (enabling recall but risking privacy leakage and stale knowledge) while also generalizing (enabling zero-shot tasks but introducing hallucinations). Covers finetuning vs. zero-shot prompting on QA tasks, the VoLTA vision-language model as an extended case, and why retrieval-augmented generation (RAG) reduces memorization-driven errors. Establishes the vocabulary for Lectures 2–4.
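The RAG idea the lecture closes on can be sketched in a few lines: instead of letting the model answer from parametric memory, retrieve supporting text and build a grounded prompt. This is an illustrative toy only; the bag-of-words retriever and the corpus below are assumptions of the sketch, not course-provided code.

```python
# Toy RAG sketch: ground the prompt in retrieved text so the answer
# comes from the corpus rather than from parametric memorization.
# The word-overlap scorer below stands in for a real dense retriever.

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the most word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble a grounded prompt from the top retrieved documents."""
    context = "\n".join(retrieve(query, corpus))
    return (f"Answer using only the context.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")

corpus = ["The capital of France is Paris.", "Bananas are yellow."]
print(build_prompt("What is the capital of France", corpus))
```

Because the answer is constrained to freshly retrieved context, stale or mis-memorized parametric facts matter less, which is the lecture's argument for RAG.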
Lecture 2: LLM Hallucinations and Knowledge Conflicts
- Source: LLM Hallucinations and Knowledge Conflicts
Deepens the hallucination picture by distinguishing faithfulness hallucinations (model contradicts its context) from factuality hallucinations (model contradicts the world). Introduces knowledge conflicts — situations where parametric knowledge, retrieved knowledge, and real-world facts diverge — and discusses how RLHF safety tuning interacts with faithfulness. Covers entity substitution frameworks and conflict-inducing dataset construction.
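Entity substitution, as used for conflict-inducing dataset construction, can be sketched minimally: perturb an entity in a retrieved passage so the context now contradicts world knowledge, then observe whether the model answers from the context (faithful) or from memory (factuality-driven). The function name and example passage below are illustrative assumptions, not the framework's actual API.

```python
# Hypothetical sketch of entity substitution for building a
# conflict-inducing QA example (names and data are illustrative).

def substitute_entity(context: str, original: str, replacement: str) -> str:
    """Swap an entity in a passage to create a knowledge conflict
    between the retrieved context and the model's parametric knowledge."""
    if original not in context:
        raise ValueError(f"entity {original!r} not found in context")
    return context.replace(original, replacement)

passage = "The Eiffel Tower is located in Paris, France."
conflicted = substitute_entity(passage, "Paris", "Rome")
print(conflicted)  # The Eiffel Tower is located in Rome, France.
```

A faithful model prompted with the perturbed passage should answer "Rome"; answering "Paris" signals that parametric knowledge overrode the context.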
Lecture 3: Adversarial Adaptation in Information Systems
- Source: Adversarial Adaptation In Information Systems
Broadens scope from model internals to the adversarial information ecosystem. Uses a "means, motives, and opportunities" framework to analyze how actors adapt content to manipulate search rankings, social platforms, and recommendation systems. Covers SEO manipulation, social bot adaptation, memorialization hacking, and the trustworthiness/pluralism tradeoff that constrains platform interventions. This lecture is the conceptual bridge between the model-focused material (Lectures 1–2) and the attack-focused material (Lecture 4).
Lecture 4: Adversarial Attacks on IR Systems
- Source: Adversarial Attacks on IR Systems
Catalogues specific technical attacks against information retrieval systems: malicious text and image encoding, gradient-based multi-view topic attacks, poisoned corpus attacks, and RAG-specific poisoning. Applies the means/motives/opportunities framework to SEO attack vectors, including evidence that unreliable news sites are disproportionately linked by paid schemes. Covers the AREA (Adversarial REtrieval Attack) literature.
Tutorial Sessions
Tutorial: How Do I Make a Good Classifier?
A Python-focused practical guide to binary classification: data collection, annotation and inter-rater reliability (Cohen's Kappa, Fleiss' Kappa, Krippendorff's Alpha), preprocessing, class imbalance handling, model selection, hyperparameter tuning, and evaluation (precision, recall, F1, ROC-AUC). Best delivered as a lab session after Lecture 2, when students have encountered hallucination/conflict classification tasks in context.
AI Worksheets
Extended worksheet collection supporting the readings and lectures. Covers alignment, reasoning, RAG, vision-language models, and adversarial scenarios.
Trust
- Hallucinations + misinformation (SegSub)
- Typologies
- Generality vs specialization
- Need for RAG to stay up to date
Safety
- Adversarial information retrieval
- Jailbreaks of agentic AI (benchmark papers, sudo rm -rf agent security)
- Ethics
Readings
See Advanced Topics/Adversarial Retrieval and LLMs/LLM background literature for primers on model design and training processes.
Alignment
- Readings:
- Optional:
Reasoning
- Readings:
- Optional:
Claims against Reasoning
- Readings:
- Optional:
Retrieval-Augmented Generation
- REALM: Retrieval-Augmented Language Model Pre-Training (v)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (v)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (v)
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
- When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively
Hallucination
Vision-Language Models
- Readings:
- Optional: