Adversarial Retrieval and LLMs Syllabus (Advanced Topic)

Course Organization

This course examines how large language models handle — and fail to handle — factual knowledge, and how adversaries exploit these failure modes in information retrieval and generation systems. Lectures are organized in four modules that move from internal model mechanics outward to ecosystem-level attacks.

For a broader overview of the Trust and Safety space, see the Trust & Safety class.

Before taking the class, read through the LLM background literature (considered a prerequisite).

Lectures and Delivery Order

Lecture 1: Memorization, Generalization, and Specialization in LLMs

  • Source: Memorization, Generalization, and Specialization in LLMs
    Introduces the core tension at the heart of the course: LLMs memorize training data (enabling recall but risking privacy leakage and stale knowledge) while also generalizing (enabling zero-shot tasks but introducing hallucinations). Covers finetuning vs. zero-shot prompting on QA tasks, the VoLTA vision-language model as an extended case study, and why retrieval-augmented generation (RAG) reduces memorization-driven errors (a minimal RAG sketch follows this entry). Establishes the vocabulary for Lectures 2–4.
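
A minimal sketch of the RAG pattern described above, using a toy TF-IDF retriever over a hand-written corpus; the passages, the retrieve() helper, and the prompt template are invented for illustration, not taken from the lecture materials:

```python
# Minimal sketch of the RAG pattern from Lecture 1: answer from retrieved
# context rather than from parametric memory alone. Corpus and prompt
# template are hypothetical stand-ins, not a specific library's API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The 2024 Olympic Games were held in Paris, France.",
    "Mount Everest is 8,849 metres tall.",
    "Water boils at 100 degrees Celsius at sea level.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k corpus passages most similar to the query (TF-IDF cosine)."""
    vectorizer = TfidfVectorizer().fit(corpus + [query])
    scores = cosine_similarity(vectorizer.transform([query]),
                               vectorizer.transform(corpus))[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def rag_prompt(query: str) -> str:
    """Ground generation in retrieved text so stale parametric memory matters less."""
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

print(rag_prompt("Where were the 2024 Olympics held?"))
```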

Lecture 2: LLM Hallucinations and Knowledge Conflicts

  • Source: LLM Hallucinations and Knowledge Conflicts
    Deepens the hallucination picture by distinguishing faithfulness hallucinations (the model contradicts its context) from factuality hallucinations (the model contradicts the world). Introduces knowledge conflicts — situations where parametric knowledge, retrieved knowledge, and real-world facts diverge — and discusses how RLHF safety tuning interacts with faithfulness. Covers entity substitution frameworks and conflict-inducing dataset construction (a toy illustration follows this entry).
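
A toy illustration of the entity-substitution idea, assuming a simple whole-word swap; the passage and substitution map below are invented examples, not drawn from any specific published framework:

```python
# Toy illustration of entity substitution for conflict-inducing data
# (Lecture 2): swap a named entity in an evidence passage so the retrieved
# context contradicts what the model has likely memorized. The passage and
# substitution map are invented examples.
import re

SUBSTITUTIONS = {"Paris": "Lyon"}  # hypothetical entity swap

def induce_conflict(passage: str) -> str:
    """Replace each source entity with its substitute, matching whole words only."""
    for original, substitute in SUBSTITUTIONS.items():
        passage = re.sub(rf"\b{re.escape(original)}\b", substitute, passage)
    return passage

evidence = "The Eiffel Tower is located in Paris."
print(induce_conflict(evidence))  # "The Eiffel Tower is located in Lyon."
# A faithful model should now answer "Lyon" from the context; answering
# "Paris" signals parametric knowledge overriding the provided evidence.
```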

Lecture 3: Adversarial Adaptation in Information Systems

  • Source: Adversarial Adaptation in Information Systems
    Broadens scope from model internals to the adversarial information ecosystem. Uses a "means, motives, and opportunities" framework to analyze how actors adapt content to manipulate search rankings, social platforms, and recommendation systems. Covers SEO manipulation, social bot adaptation, memorialization hacking, and the trustworthiness/pluralism tradeoff that constrains platform interventions. This lecture is the conceptual bridge between the model-focused material (Lectures 1–2) and the attack-focused material (Lecture 4).

Lecture 4: Adversarial Attacks on IR Systems

  • Source: Adversarial Attacks on IR Systems
    Catalogues specific technical attacks against information retrieval systems: malicious text and image encoding, gradient-based multi-view topic attacks, poisoned corpus attacks, and RAG-specific poisoning (a toy demonstration follows this entry). Applies the means/motives/opportunities framework to SEO attack vectors, including evidence that unreliable news sites are disproportionately linked to by paid schemes. Covers the AREA (Adversarial REtrieval Attack) literature.
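
A toy demonstration of why corpus poisoning works, using a simple lexical (TF-IDF) retriever; the query, passages, and fabricated claim are invented, and real attacks in this literature typically optimize against neural retrievers rather than lexical ones:

```python
# Toy demonstration of corpus/RAG poisoning (Lecture 4): a passage stuffed
# with the target query's terms can outrank genuine evidence under a simple
# lexical retriever. Illustrative only; attacks in the AREA literature are
# usually gradient-based and target neural retrievers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

query = "who invented the telephone"
benign = "Alexander Graham Bell patented the telephone in 1876."
poisoned = ("Who invented the telephone? The telephone was invented by "
            "ACME Corp. Invented the telephone: ACME Corp.")  # fabricated claim

vectorizer = TfidfVectorizer().fit([benign, poisoned, query])
scores = cosine_similarity(vectorizer.transform([query]),
                           vectorizer.transform([benign, poisoned]))[0]
print(f"benign score:   {scores[0]:.3f}")
print(f"poisoned score: {scores[1]:.3f}")  # typically higher: it echoes the query
```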

Tutorial Sessions

Tutorial: How Do I Make a Good Classifier?
A Python-focused practical guide to binary classification: data collection, annotation and inter-rater reliability (Cohen's Kappa, Fleiss' Kappa, Krippendorff's Alpha), preprocessing, class imbalance handling, model selection, hyperparameter tuning, and evaluation (precision, recall, F1, ROC-AUC). Best delivered as a lab session after Lecture 2, once students have encountered hallucination/conflict classification tasks in context (a condensed sketch of the pipeline follows below).
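
A condensed sketch of the lab's pipeline on synthetic data, covering agreement measurement, imbalance-aware training, and evaluation; the simulated annotators and generated features are stand-ins for a real annotated dataset:

```python
# Sketch of the lab's core loop: inter-rater agreement, imbalance-aware
# training, and evaluation. Data is synthetic; a real lab would swap in
# annotated text and learned features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score, precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import train_test_split

# Inter-rater reliability: Cohen's Kappa between two simulated annotators.
rng = np.random.default_rng(0)
rater_a = rng.integers(0, 2, size=200)
rater_b = np.where(rng.random(200) < 0.85, rater_a, 1 - rater_a)  # ~85% agreement
print(f"Cohen's Kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")

# Imbalanced binary classification handled via class weighting.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Evaluation: precision/recall/F1 on the minority class plus ROC-AUC.
pred = clf.predict(X_te)
p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f} ROC-AUC={auc:.2f}")
```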

AI Worksheets
Extended worksheet collection supporting the readings and lectures. Covers alignment, reasoning, RAG, vision-language models, and adversarial scenarios.


Trust

  • Hallucinations + misinformation (SegSub)
  • Typologies
  • Generality vs specialization
  • Need for RAG to stay up to date...

Safety

  • Adversarial information retrieval
  • Jailbreaks of agentic AI (benchmark papers, sudo rm -rf agent security)
  • Ethics

Readings

See Advanced Topics/Adversarial Retrieval and LLMs/LLM background literature for primers on model design and training processes.

Alignment

Reasoning

Claims against Reasoning

Retrieval-Augmented Generation

Hallucination

Vision-Language Models