Curriculum

Trust & Safety: Platforms, Policies, and Products

The three course assignments are scaffolded to follow this arc: the Discord Bot covers reactive moderation (Weeks 2–6), the Bluesky Labeler covers proactive moderation (Weeks 6–10), and the Podcast Factchecking project covers applied T&S research (Weeks 10–15).


Assignment Overview

# Assignment Format Milestones Approx. Due
1 Discord Bot Individual (M1) + Group (M2, M3) M1: Abuse study + policy; M2: Reporting flow + bot; M3: Smart classifier (extra credit) M1: Wk 4 / M2: Wk 6 / M3: Wk 6
2 Bluesky Moderation Individual or small group M1–M4: Core labeler; M5: Policy proposal M1–M4: Wk 9 / M5: Wk 10
3 Podcast Factchecking Individual Data collection, analysis + report Data + analysis: Wk 13 / Report: Wk 15

Week-by-Week Schedule


Week 1 — Foundations of Trust & Safety

Lecture 1: Introduction to Trust & Safety

Lecture 2: Large-Scale T&S Systems in Practice

  • Source: Large Scale Trust & Safety Systems
  • Content: how industry T&S infrastructure is organized at scale; the organizational modules and overarching problems; provides the map for the rest of the semester
  • Note: this lecture situates everything covered in Lectures 3–30; use it as a semester roadmap that students can refer back to throughout the course.

Reading: 📘 tsbook ch1"Fighting the Forever War" (Stamos, Grossman, Pfefferkorn)

Active Learning: Play Moderator Mayhem individually before class; debrief in-class on the ambiguity revealed by the game.

Note on scope: Use Dotmocracy in this first week to have students vote on which abuse types the class will cover in depth. This shapes the harm-specific weeks (Weeks 7–9, 14) and ensures the class addresses what students find most pressing.


Week 2 — Content Policy and Moderation Frameworks

Lecture 3: Pitfalls of Binary Classification

  • Source: Pitfalls of Binary Classification
  • Content: enforcing platform policies against concrete examples; borderline cases; contrasting policies for the same harm; why binary framing breaks down at the margins
  • Active learning strategies built in:
    • "Set It Up" — classify 3 examples against a reference policy; identify the obvious violation and the obvious non-violation
    • Think-pair-share — discuss borderline cases in pairs, then share with class (use clickers for voting)
    • Contrasting cases — given two policies for the same harm, diagnose where each breaks down (false positives, false negatives, ambiguity, multiple thresholds needed)
  • Reference policies: Pitfalls of Binary Classification links to Political Ad Policies case

Lecture 4: Content Moderation I — History, Models, and the Anti-Censorship Ethos

  • Source: Content Moderation
  • Content: early internet norms; anti-censorship ethos and how it shaped moderation; models for content moderation; framing effects of regulatory language; scale considerations

Assigned: Discord Bot Milestone 1 — Abuse Study and Content Policy (individual; due end of Week 4)

Reading: TSPA T&S Fundamentals handbook; TSPA library selections at instructor discretion


Week 3 — Content Moderation at Scale and Community Governance

Lecture 5: Content Moderation II — Commercial Moderators, Community Norms, and Scale

  • Source: Content Moderation (continuation)
  • Content: what commercial content moderation looks like at platforms like Twitch; moderator actions (remove, warn, ban, shadow block); relationship between moderators and platform admins; the psychological burden on moderators
  • Exercise: 🔗 Content Moderation exercises doc
  • Short case video: 🎥 Case video / Lesson plan

Lecture 6: Community Moderation and Self-Governance

Reading: TSPA Handbook, Content Moderation and Operations


Week 4 — Metrics and Measurement

Lecture 7: Technical Lab — Discord Bot Setup Workshop

  • Source: Assignments/Discord Bot/discord_bot_assignment/ starter code
  • Content: what makes a good user reporting flow; tradeoffs between specificity and usability; the behind-the-scenes moderator review flow; multi-tier review; outcomes (remove, warn, shadow-block, escalate)
  • Lab: Python async programming; discord.py API; bot initialization; the mod channel pattern; forwarding messages; emoji reactions for moderator workflows
  • Lab: students fork the repo and get their bots running in their group channels; TAs available

Lecture 8: Metrics and Measurement I — What to Measure and Why

  • Source: Metrics and Measurement
  • Also available: 🔗 Google Slides (fall 2023)
  • Content: what is a metric; generic platform metrics (DAU, retention) vs. T&S-specific metrics; defining "success" in T&S; prevalence estimates; common industry metrics; developing new metrics
  • Exercise: 🔗 Metrics exercises doc — students design metrics for their chosen Discord Bot abuse type

DUE: Discord Bot M1 — Abuse Study and Content Policy

ASSIGNED: Discord Bot M2 — Content Moderation Bot (group; due end of Week 6)

Forming groups: after M1 is submitted, form interdisciplinary groups of 4–5. Ideally mix students with technical and policy backgrounds if the course is cross-listed.

Reading: Metrics exercises doc; T&S reading list Module C


Lecture 9: Governments and the Internet I — U.S. Law and Section 230

  • Source: Government Regulation
  • Content: why the U.S. became the center of the internet economy; overall U.S. approach to internet policy; Section 230 ("the 26 words that created the internet") — text, scope, limits, and contested interpretations
  • In-class activity: read Section 230 aloud in full; discuss its implications for T&S enforcement

Lecture 10: Governments and the Internet II — International Regulation and Copyright

Reading: T&S reading list Module B, The Twenty-Six Words That Created the Internet by Jeff Kosseff


Week 6 — Proactive Moderation and the Bluesky Labeler

Lecture 11: Proactive Moderation — Classifiers, Hash-Matching, and Labeling Systems

  • Source: 📓 Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-lecture.ipynb
  • Content: text classification pipelines; perceptual hashing (PHash, PhotoDNA); the labeler architecture on Bluesky (AppView → Relay → PDS); differences between proactive automated detection and reactive user-reporting
  • In-class coding demo: run the Bluesky starter code; attach a test label; view it in the Bluesky UI

Lecture 12: Decentralized Moderation and the AT Protocol

  • Content: federated/decentralized platform architectures; how AT Protocol enables user-configurable filtering; trade-offs between centralized and decentralized moderation; third-party labelers as a governance model
  • Reference: Bluesky moderation architecture docs; list of labelers

DUE: Discord Bot M2 — Content Moderation Bot. Extra credit: Discord Bot M3 Smart Classifier

ASSIGNED: Bluesky Moderation M1–M4 (setup, T&S words, news citation, perceptual hashing; due end of Week 9)

Reading: Weapons of Math Destruction by Cathy O'Neil


Week 7 — Harassment and Hate Speech

Lecture 13: Harassment and Hate Speech I — Definitions, Scale, and Policy

  • Source: Harassment and Hate Speech
  • Content: spectrum of harassment; hate speech definitions and jurisdictional variation; which identities are most targeted; the role of anonymity and pseudonymity; platform policy evolution

Lecture 14: Harassment and Hate Speech II — Automated Detection and Exercises

Reading: 📘 tsbook ch6Harassment (full chapter)


Week 8 — Terrorism, Radicalization, and Extremism

Lecture 15: Terrorism, Radicalization, and Extremism

Lecture 16: CVE, Counter-Speech, and Policy Responses

Reading: T&S reading list Module G on Terrorism.


Week 9 — Platform Manipulation and Identity

Lecture 17: Authentication, Identity, and Platform Manipulation

Lecture 18: Spam, Fraud, and Account Integrity

  • Content: spam taxonomy; online fraud (scams, phishing, impersonation); account takeovers; the arms race between spammers and platforms; ML-based abuse detection at account level
  • Reading: 📘 tsbook ch2 — Spam and Online Fraud (Nelly Agbogu case study is an excellent discussion anchor)

DUE: Bluesky M1–M4 (T&S words labeler, news citation labeler, perceptual hash dog labeler)
ASSIGNED: Bluesky M5 — Policy Proposal Labeler (due end of Week 10)

Reading: T&S reading list Module K on Authenticity.


Week 10 — Misinformation I: Definitions, Spread, and Detection

Lecture 19: Misinformation — Definitions, Spread, and Platform Responses

Lecture 20: Misinformation Detection Tutorial

  • Source: Misinformation Detection Tutorial (IC2S2 2025)
  • Content: hands-on NLP tutorial — claim-level check-worthiness detection using the CT24 dataset; full pipeline from data loading through feature engineering, classifier training (logistic regression, BERT-based), evaluation (precision, recall, F1), and error analysis; AI-generated misinformation detection techniques; Podcast Factchecking preview using the PodChecker system (Irmetova et al., 2026)
  • This is a lab-style session; students should have Python and the tutorial dependencies installed before class.

Reading: T&S reading list Module F on the Information Environment.

DUE: Bluesky M5
ASSIGNED: Podcast Factchecking (data + analysis due Week 13; final report due Week 15)

Coordinating M5 with earlier work: students should implement the same abuse type they researched for Discord Bot M1, making the policy proposal in M5 a direct technical extension of the written analysis from Week 4. Consider also requiring students to apply a counter-intervention framing from Lecture 21 (Adversarial Adaptation) to their M5 policy design.


Week 11 — Misinformation II: Source Credibility, Interventions, and Adversarial Limits

Lecture 21: Source Credibility and Misinformation Source Detection

  • Source: Source Credibility
  • Content: what makes a source credible; SEO-based misinformation source detection using CommonCrawl webgraphs; backlinking patterns as credibility signals; multi-class classification of news domains; feature importances for predicting credibility vs. political reliability; limitations (implied content, domain decay, propaganda vs. opinion)

Lecture 22: Intervention Effectiveness — Misinformation and Search Rankings

  • Source: Intervention Effectiveness — Misinformation and Search Rankings
  • Content: small-scale PageRank-based interventions; personalized PageRank and authority-based reranking; large-scale link scheme removal; "multi-category" scheme removal as a more precise intervention tool; traffic estimates from CommonCrawl vs. SimilarWeb; design principles for robust interventions; open problems and future directions

Reading: Sample of papers from the special topic reading list on Misinformation, at least 1 paper per category.


Week 12 — Attack Surfaces and Technical Defense

Lecture 23: Adversarial Adaptation and the Limitations of Interventions

  • Source: Adversarial Adaptation and the Limitations of Interventions
  • Content: how adversaries adapt to interventions over time (SEO gaming, platform manipulation, bot evolution); the credibility–pluralism tradeoff — credibility-based filtering reduces source diversity; assortativity in news transition matrices; Wasserstein distances to quantify polarization effects; principles for adversarially robust policy design
  • This lecture directly sets up the proactive moderation problem introduced in Week 11: reactive interventions are always lagging, which motivates automated proactive systems.

Lecture 24: Types of Attack Surfaces I — Safety Perspective

  • Source: Types of Attack Surfaces
  • Content: attack surface taxonomy; how bad actors exploit platform features; API abuse; content injection; account compromise vectors

Reading: T&S reading list Module L on Attack Surfaces


Week 13 — Emerging Technologies

Lecture 25: Types of Attack Surfaces II — Security Perspective

Lecture 26: Emerging Topics I — AI in Trust and Safety

DUE: Podcast Factchecking data + analysis
ASSIGNED: Course paper writeup. Reference: Trust & Safety Journal for graduate-level research directions, and ICWSM / IC2S2 for computational social science.

Reading: T&S reading list Module M on Emerging Technologies


Week 14 — Emerging Technologies II

Lecture 27: Emerging Topics II — Adversarial Retrieval and LLMs

  • Source: Emerging Topics — Adversarial Retrieval
  • Content: how adversaries manipulate retrieval-augmented generation (RAG) systems and search indexes; corpus poisoning and gradient-based attacks on IR systems; SEO manipulation as an information operation; connecting the misinformation interventions from Weeks 9–10 to the LLM attack surface; adversarial means, motives, and opportunities in the AI era

Lecture 28: Emerging Topics III — LLM Hallucinations and Knowledge Conflicts

  • Source: Emerging Topics — LLM Hallucinations and Knowledge Conflicts
  • Content: faithfulness vs. factuality hallucinations; knowledge conflicts between parametric memory, retrieved context, and ground truth; RLHF safety tuning and jailbreaking; detection and mitigation approaches; implications for T&S practitioners deploying LLM-based moderation or fact-checking systems

Reading: Sample of papers from the special topic reading list on Adversarial Retrieval and LLMs, at least 1 paper per category.


Week 15 — Research Methods and Student Presentations

Lecture 29: Project Presentations I

  • Format: groups present Discord Bot M3 results (or Podcast Factchecking for individually structured courses); guest judges from industry where possible (see Consortium member list in the README for potential invitees)
  • ~8 minutes per group + Q&A; rubric focuses on policy motivation, technical implementation, testing and evaluation, and ethical reflection

Lecture 30: Project Presentations II + Course Debrief

  • Format: remaining presentations + open debrief
  • Discussion: what has changed in T&S since the semester began? What did the class get wrong? What questions remain?
  • Optional: play Trust & Safety Tycoon as a closing reflection on organizational complexity

DUE: Reports due


Child Safety, Sexual Exploitation, and Platform Well-Being

⚠️ Content Warning: These topics cover deeply sensitive material. They are for further reading only and are not examinable or covered directly in class. They are included here for completeness. If this content causes any distress, please reach out for support. Links to the student centers for mental health and wellbeing are provided.

Child and Adult Sexual Exploitation

  • Source: 🔗 Google Slides
  • Content: CSAM definitions and legal landscape; PhotoDNA and hash-matching at scale; NCMEC partnerships and reporting obligations; grooming detection; sextortion; proxy content for classroom exercises
  • Exercises: 🔗 CASE exercises doc
  • Short case video: 🔗 CASE case video
  • Reading: 📘 tsbook ch7Child Sexual Exploitation

Suicide, Self-Harm, and Platform Well-Being


Reading Resources

  • Primary textbook: 📘 tsbook — The book is a living draft; chapters on misinformation, extremism, and emerging tech may be added.
  • Consortium reading list: 🔗 Full reading list (Google Docs) — organized by module, aligns directly with lecture sequence above.
  • TSPA curriculum: T&S Fundamentals and Library

Credits

The lectures draw on the Trust & Safety Teaching Consortium materials.
Likewise, assignments are accredited where previous materials have been drawn on.


Active Learning Strategies Summary

Strategy Week Description
Moderator Mayhem 1 Individual in-game decisions; debrief reveals policy ambiguity
Dotmocracy 1 Students vote on which abuse types the class covers in depth
"Set It Up" 2 Classify 3 cases against a reference policy; identify clear and unclear violations
Think-pair-share 2, 7 Borderline case analysis; clicker voting to reveal class distribution
Contrasting cases 2 Two policies for the same harm; diagnose where each breaks down
Structured debate 7 Safety vs. censorship; platform deplatforming decisions (e.g., Jan. 6th)
Misinformation Detection Lab 10 Hands-on CT24 check-worthiness classifier (Lecture 20 tutorial session)
Fishbowl (optional) 2–7 Students act out cases; rest of class makes moderation determinations. High risk — requires careful topic selection and opt-out option.
Trust & Safety Tycoon 15 Closing game on organizational tradeoffs
Full-course exercise 1–15 🔗 Full-course exercise doc from Consortium Introduction module