Curriculum
Trust & Safety: Platforms, Policies, and Products
The three course assignments are scaffolded to follow this arc: the Discord Bot covers reactive moderation (Weeks 2–6), the Bluesky Labeler covers proactive moderation (Weeks 6–10), and the Podcast Factchecking project covers applied T&S research (Weeks 10–15).
Assignment Overview
| # | Assignment | Format | Milestones | Approx. Due |
|---|---|---|---|---|
| 1 | Discord Bot | Individual (M1) + Group (M2, M3) | M1: Abuse study + policy; M2: Reporting flow + bot; M3: Smart classifier (extra credit) | M1: Wk 4 / M2: Wk 6 / M3: Wk 6 |
| 2 | Bluesky Moderation | Individual or small group | M1–M4: Core labeler; M5: Policy proposal | M1–M4: Wk 9 / M5: Wk 10 |
| 3 | Podcast Factchecking | Individual | Data collection, analysis + report | Data + analysis: Wk 13 / Report: Wk 15 |
Week-by-Week Schedule
Week 1 — Foundations of Trust & Safety
Lecture 1: Introduction to Trust & Safety
- Source: Introduction to Trust and Safety
- Content: purpose and history of T&S; high-level taxonomy of abuse types (CSAM, hate speech, spam, terrorism, self-harm, fraud, etc.); reactive vs. proactive models; building T&S teams; overview of automated technologies
- Supplement: 🎥 20-min lecture on intelligence in T&S by Inbal Goldberger (ActiveFence)
- Short case video: 🎥 Case video / Transcript
Lecture 2: Large-Scale T&S Systems in Practice
- Source: Large Scale Trust & Safety Systems
- Content: how industry T&S infrastructure is organized at scale; the organizational modules and overarching problems; provides the map for the rest of the semester
- Note: this lecture situates everything covered in Lectures 3–30; use it as a semester roadmap that students can refer back to throughout the course.
Reading: 📘 tsbook ch1 — "Fighting the Forever War" (Stamos, Grossman, Pfefferkorn)
Active Learning: Play Moderator Mayhem individually before class; debrief in-class on the ambiguity revealed by the game.
Note on scope: Use Dotmocracy in this first week to have students vote on which abuse types the class will cover in depth. This shapes the harm-specific weeks (Weeks 7–9, 14) and ensures the class addresses what students find most pressing.
Week 2 — Content Policy and Moderation Frameworks
Lecture 3: Pitfalls of Binary Classification
- Source: Pitfalls of Binary Classification
- Content: enforcing platform policies against concrete examples; borderline cases; contrasting policies for the same harm; why binary framing breaks down at the margins
- Active learning strategies built in:
- "Set It Up" — classify 3 examples against a reference policy; identify the obvious violation and the obvious non-violation
- Think-pair-share — discuss borderline cases in pairs, then share with class (use clickers for voting)
- Contrasting cases — given two policies for the same harm, diagnose where each breaks down (false positives, false negatives, ambiguity, multiple thresholds needed)
- Reference policies: Pitfalls of Binary Classification links to Political Ad Policies case
Lecture 4: Content Moderation I — History, Models, and the Anti-Censorship Ethos
- Source: Content Moderation
- Content: early internet norms; anti-censorship ethos and how it shaped moderation; models for content moderation; framing effects of regulatory language; scale considerations
Assigned: Discord Bot Milestone 1 — Abuse Study and Content Policy (individual; due end of Week 4)
Reading: TSPA T&S Fundamentals handbook; TSPA library selections at instructor discretion
Week 3 — Content Moderation at Scale and Community Governance
Lecture 5: Content Moderation II — Commercial Moderators, Community Norms, and Scale
- Source: Content Moderation (continuation)
- Content: what commercial content moderation looks like at platforms like Twitch; moderator actions (remove, warn, ban, shadow block); relationship between moderators and platform admins; the psychological burden on moderators
- Exercise: 🔗 Content Moderation exercises doc
- Short case video: 🎥 Case video / Lesson plan
Lecture 6: Community Moderation and Self-Governance
- Source: 🔗 Seering (KAIST) community moderation slides
- Content: community-based moderation models (Reddit, Discord, Wikipedia); how rules emerge; when community governance succeeds or fails
- Supplement: Seering KAIST syllabus for framing
Reading: TSPA Handbook, Content Moderation and Operations
Week 4 — Metrics and Measurement
Lecture 7: Technical Lab — Discord Bot Setup Workshop
- Source: Assignments/Discord Bot/discord_bot_assignment/starter code
- Content: what makes a good user reporting flow; tradeoffs between specificity and usability; the behind-the-scenes moderator review flow; multi-tier review; outcomes (remove, warn, shadow-block, escalate)
- Lab: Python async programming; discord.py API; bot initialization; the mod channel pattern; forwarding messages; emoji reactions for moderator workflows
- Lab: students fork the repo and get their bots running in their group channels; TAs available
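The emoji-reaction review pattern from the lab can be sketched as plain dispatch logic before wiring it into discord.py. A minimal sketch; the emoji choices and action names below are illustrative, not taken from the starter code:

```python
# Hypothetical mapping from moderator emoji reactions (in the mod channel)
# to moderation outcomes; real bots would act on these via the discord.py API.
MOD_ACTIONS = {
    "\N{CROSS MARK}": "remove",              # delete the reported message
    "\N{WARNING SIGN}": "warn",              # warn the author
    "\N{NO ENTRY}": "ban",                   # ban the author
    "\N{GHOST}": "shadow_block",             # hide without notifying
    "\N{UPWARDS BLACK ARROW}": "escalate",   # send to second-tier review
}

def resolve_action(reaction_emoji: str) -> str:
    """Map a moderator's emoji reaction to a moderation outcome."""
    return MOD_ACTIONS.get(reaction_emoji, "no_action")
```

Keeping the policy-to-action mapping in pure functions like this makes the review flow testable without a live bot.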
Lecture 8: Metrics and Measurement I — What to Measure and Why
- Source: Metrics and Measurement
- Also available: 🔗 Google Slides (fall 2023)
- Content: what is a metric; generic platform metrics (DAU, retention) vs. T&S-specific metrics; defining "success" in T&S; prevalence estimates; common industry metrics; developing new metrics
- Exercise: 🔗 Metrics exercises doc — students design metrics for their chosen Discord Bot abuse type
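Prevalence estimation, one of the lecture's core metrics, can be demonstrated in a few lines. A minimal sketch using a normal-approximation confidence interval (the function name and sample numbers are illustrative):

```python
import math

def prevalence_estimate(violations: int, sample_size: int, z: float = 1.96):
    """Estimate violating-content prevalence from a random content sample,
    with a normal-approximation 95% confidence interval."""
    p = violations / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# e.g. 12 violating items found in a random sample of 1,000 reviewed posts
p, lo, hi = prevalence_estimate(12, 1000)
```

Useful in exercises for showing why small samples give wide intervals at low prevalence.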
DUE: Discord Bot M1 — Abuse Study and Content Policy
ASSIGNED: Discord Bot M2 — Content Moderation Bot (group; due end of Week 6)
Forming groups: after M1 is submitted, form interdisciplinary groups of 4–5. Ideally mix students with technical and policy backgrounds if the course is cross-listed.
Reading: Metrics exercises doc; T&S reading list Module C
Week 5 — Government Regulation and Legal Frameworks
Lecture 9: Governments and the Internet I — U.S. Law and Section 230
- Source: Government Regulation
- Content: why the U.S. became the center of the internet economy; overall U.S. approach to internet policy; Section 230 ("the 26 words that created the internet") — text, scope, limits, and contested interpretations
- In-class activity: read Section 230 aloud in full; discuss its implications for T&S enforcement
Lecture 10: Governments and the Internet II — International Regulation and Copyright
- Source: Government Regulation (continuation)
- Supplement: 🔗 Copyright Safe Harbors and DMCA slides by Justin Francese (U of Oregon)
- Content: EU Digital Services Act; Australia's eSafety Commissioner model; notice-and-takedown; copyright and safe harbor; how different regulatory frameworks shape platform behavior
- Supplement: 🎥 16-min lecture on Australia's eSafety Commissioner
Reading: T&S reading list Module B, The Twenty-Six Words That Created the Internet by Jeff Kosseff
Week 6 — Proactive Moderation and the Bluesky Labeler
Lecture 11: Proactive Moderation — Classifiers, Hash-Matching, and Labeling Systems
- Source: 📓 Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-lecture.ipynb
- Content: text classification pipelines; perceptual hashing (PHash, PhotoDNA); the labeler architecture on Bluesky (PDS → Relay → AppView); differences between proactive automated detection and reactive user-reporting
- In-class coding demo: run the Bluesky starter code; attach a test label; view it in the Bluesky UI
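The intuition behind perceptual hashing can be shown with a toy average-hash ("aHash") in pure Python, assuming the image has already been downscaled to a small grayscale grid (production systems like PhotoDNA are far more robust; this is a classroom sketch only):

```python
def average_hash(pixels):
    """Bit i is 1 if pixel i is above the mean grayscale value.
    `pixels` is a flat list standing in for a downscaled image."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(h1, h2):
    """Number of differing bits; a small distance means a near-duplicate."""
    return sum(a != b for a, b in zip(h1, h2))

img       = [10, 200, 30, 220, 15, 210, 25, 215, 12]
near_dup  = [11, 198, 33, 219, 14, 212, 24, 216, 13]   # slightly re-encoded copy
different = [200, 10, 220, 30, 210, 15, 215, 25, 212]  # unrelated image
h = average_hash(img)
```

The near-duplicate hashes to the same bits as the original while the unrelated image does not, which is exactly the property hash-matching pipelines exploit.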
Lecture 12: Decentralized Moderation and the AT Protocol
- Content: federated/decentralized platform architectures; how AT Protocol enables user-configurable filtering; trade-offs between centralized and decentralized moderation; third-party labelers as a governance model
- Reference: Bluesky moderation architecture docs; list of labelers
DUE: Discord Bot M2 — Content Moderation Bot. Extra credit: Discord Bot M3 Smart Classifier
ASSIGNED: Bluesky Moderation M1–M4 (setup, T&S words, news citation, perceptual hashing; due end of Week 9)
Reading: Weapons of Math Destruction by Cathy O'Neil
Week 7 — Harassment and Hate Speech
Lecture 13: Harassment and Hate Speech I — Definitions, Scale, and Policy
- Source: Harassment and Hate Speech
- Content: spectrum of harassment; hate speech definitions and jurisdictional variation; which identities are most targeted; the role of anonymity and pseudonymity; platform policy evolution
Lecture 14: Harassment and Hate Speech II — Automated Detection and Exercises
- Source: Harassment and Hate Speech (continuation)
- Supplement: 🔗 McLester hate speech and harassment slides (UAB)
- Supplement: 🎥 25-min lecture on online hate speech by Mark Schneider (UPenn)
- Exercises: 🔗 Harassment and Hate Speech exercises doc
- In-class structured debate: safety (public harm) vs. censorship (freedom of speech); how do different platforms' decisions reflect this tradeoff?
Reading: 📘 tsbook ch6 — Harassment (full chapter)
Week 8 — Terrorism, Radicalization, and Extremism
Lecture 15: Terrorism, Radicalization, and Extremism
- Source: Terrorism Radicalization and Extremism
- Also available: 🔗 Google Slides (fall 2023)
- Content: definitions; radicalization models; online recruitment pipelines; counter-terrorism vs. counter-violent extremism (CVE); the live-streaming of attacks; role of platform algorithms in amplification
- Supplement: 🎥 30-min lecture on terrorism, radicalization, and extremism by Marten Risius (U of Queensland)
Lecture 16: CVE, Counter-Speech, and Policy Responses
- Source: Terrorism Radicalization and Extremism (exercises + discussion)
- Exercises: 🔗 Terrorism, Radicalization, and Extremism exercises doc
- Case study: GIFCT hash-sharing database; Jan. 6th Capitol Riot deplatforming decisions
Reading: T&S reading list Module G on Terrorism.
Week 9 — Platform Manipulation and Identity
Lecture 17: Authentication, Identity, and Platform Manipulation
- Source: Authentication, Identity, and Platform Manipulation
- Content: authentication models; real-name vs. pseudonymous policies; coordinated inauthentic behavior (CIB); sockpuppet networks; astroturfing; state-sponsored information operations
- Exercises: 🔗 Auth/Identity exercises doc
- Supplement: 🔗 McLester investigations and intelligence lecture
Lecture 18: Spam, Fraud, and Account Integrity
- Content: spam taxonomy; online fraud (scams, phishing, impersonation); account takeovers; the arms race between spammers and platforms; ML-based abuse detection at account level
- Reading: 📘 tsbook ch2 — Spam and Online Fraud (Nelly Agbogu case study is an excellent discussion anchor)
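Account-level abuse detection can be previewed with a toy risk score. Everything here (features, weights, thresholds) is illustrative; real systems learn these signals from labeled data rather than hand-tuning them:

```python
def spam_risk_score(account_age_days: float, msgs_per_hour: float,
                    duplicate_ratio: float, link_ratio: float) -> float:
    """Toy account-level spam risk score in [0, 1].
    Hypothetical features: account newness, send-rate burstiness,
    fraction of duplicate messages, fraction of messages with links."""
    newness = 1.0 / (1.0 + account_age_days)      # new accounts are riskier
    burstiness = min(msgs_per_hour / 100.0, 1.0)  # very high send rates
    score = (0.25 * newness + 0.25 * burstiness
             + 0.3 * duplicate_ratio + 0.2 * link_ratio)
    return min(score, 1.0)

new_spammer = spam_risk_score(account_age_days=1, msgs_per_hour=120,
                              duplicate_ratio=0.9, link_ratio=0.8)
old_regular = spam_risk_score(account_age_days=400, msgs_per_hour=2,
                              duplicate_ratio=0.05, link_ratio=0.1)
```

A good discussion prompt: which of these features could a motivated spammer cheaply evade (the arms race from the lecture)?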
DUE: Bluesky M1–M4 (T&S words labeler, news citation labeler, perceptual hash dog labeler)
ASSIGNED: Bluesky M5 — Policy Proposal Labeler (due end of Week 10)
Reading: T&S reading list Module K on Authenticity.
Week 10 — Misinformation I: Definitions, Spread, and Detection
Lecture 19: Misinformation — Definitions, Spread, and Platform Responses
- Source: Misinformation
- Content: the mis/dis/mal taxonomy (Wardle & Derakhshan information disorder framework); how false information spreads through social networks; the role of algorithmic amplification; platform interventions (labels, friction, removal, amplification reduction); the fact-checking ecosystem
- Supplement: 🎥 45-min lecture on misinformation by Sarah Shirazyan (Stanford Law)
Lecture 20: Misinformation Detection Tutorial
- Source: Misinformation Detection Tutorial (IC2S2 2025)
- Content: hands-on NLP tutorial — claim-level check-worthiness detection using the CT24 dataset; full pipeline from data loading through feature engineering, classifier training (logistic regression, BERT-based), evaluation (precision, recall, F1), and error analysis; AI-generated misinformation detection techniques; Podcast Factchecking preview using the PodChecker system (Irmetova et al., 2026)
- This is a lab-style session; students should have Python and the tutorial dependencies installed before class.
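For students previewing the lab, the shape of a check-worthiness classifier can be shown with a tiny pure-Python Naive Bayes model. This is a stand-in sketch with made-up training sentences; the actual tutorial uses the CT24 dataset and stronger models (logistic regression, BERT):

```python
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def train_nb(examples):
    """Train a bag-of-words Naive Bayes model on (text, label) pairs."""
    counts = {"checkworthy": Counter(), "not": Counter()}
    priors = Counter()
    for text, label in examples:
        priors[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts["checkworthy"]) | set(counts["not"])
    return counts, priors, vocab

def predict(model, text):
    counts, priors, vocab = model
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            lp += math.log((counts[label][tok] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training data: factual claims vs. chatter
train = [
    ("the vaccine causes autism in children", "checkworthy"),
    ("unemployment rose 5 percent last quarter", "checkworthy"),
    ("i love this song so much", "not"),
    ("good morning everyone have a nice day", "not"),
]
model = train_nb(train)
```

The lab then replaces each piece (tokenizer, features, model, evaluation) with the real pipeline, which makes the error-analysis step concrete.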
Reading: T&S reading list Module F on the Information Environment.
DUE: Bluesky M5
ASSIGNED: Podcast Factchecking (data + analysis due Week 13; final report due Week 15)
Coordinating M5 with earlier work: students should implement the same abuse type they researched for Discord Bot M1, making the policy proposal in M5 a direct technical extension of the written analysis from Week 4. Consider also requiring students to apply a counter-intervention framing from Lecture 23 (Adversarial Adaptation) to their M5 policy design.
Week 11 — Misinformation II: Source Credibility, Interventions, and Adversarial Limits
Lecture 21: Source Credibility and Misinformation Source Detection
- Source: Source Credibility
- Content: what makes a source credible; SEO-based misinformation source detection using CommonCrawl webgraphs; backlinking patterns as credibility signals; multi-class classification of news domains; feature importances for predicting credibility vs. political reliability; limitations (implied content, domain decay, propaganda vs. opinion)
Lecture 22: Intervention Effectiveness — Misinformation and Search Rankings
- Source: Intervention Effectiveness — Misinformation and Search Rankings
- Content: small-scale PageRank-based interventions; personalized PageRank and authority-based reranking; large-scale link scheme removal; "multi-category" scheme removal as a more precise intervention tool; traffic estimates from CommonCrawl vs. SimilarWeb; design principles for robust interventions; open problems and future directions
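Personalized PageRank, the core mechanism behind authority-based reranking in this lecture, fits in a short power-iteration sketch. The graph and seed set below are hypothetical, and this is a generic implementation, not the lecture's code:

```python
def personalized_pagerank(links, seeds, alpha=0.85, iters=50):
    """Personalized PageRank by power iteration. `links` maps node -> list of
    out-neighbors; teleportation goes to `seeds` (e.g. trusted news domains)
    instead of uniformly to all nodes."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - alpha) * teleport[n] for n in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = alpha * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # dangling node: redistribute its mass to the seed set
                for n in nodes:
                    new[n] += alpha * rank[u] * teleport[n]
        rank = new
    return rank

# Toy webgraph: trusted.org is the seed; spam.biz sits in its own link island
graph = {
    "trusted.org": ["news.com"],
    "news.com": ["trusted.org", "blog.net"],
    "blog.net": ["trusted.org"],
    "spam.biz": ["spam.biz"],
}
pr = personalized_pagerank(graph, seeds={"trusted.org"})
```

Because teleportation only returns to the seed, the isolated spam domain's score decays toward zero, which is the reranking intuition behind the intervention.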
Reading: Sample of papers from the special topic reading list on Misinformation, at least 1 paper per category.
Week 12 — Attack Surfaces and Technical Defense
Lecture 23: Adversarial Adaptation and the Limitations of Interventions
- Source: Adversarial Adaptation and the Limitations of Interventions
- Content: how adversaries adapt to interventions over time (SEO gaming, platform manipulation, bot evolution); the credibility–pluralism tradeoff — credibility-based filtering reduces source diversity; assortativity in news transition matrices; Wasserstein distances to quantify polarization effects; principles for adversarially robust policy design
- This lecture builds directly on the intervention designs from Week 11: reactive interventions always lag adversary adaptation, which motivates the automated proactive systems introduced in Week 6.
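The Wasserstein-distance measurement of polarization effects can be demonstrated with a one-dimensional sketch. For equal-size samples, 1-D earth mover's distance is just the average gap between sorted order statistics; the numbers below are hypothetical and this is not the lecture's code:

```python
def wasserstein_1d(xs, ys):
    """1-D earth mover's distance between two equal-size empirical samples:
    average absolute gap between sorted order statistics."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical ideological positions of consumed news (-1 = left, +1 = right),
# before and after a credibility-based filter narrows the source pool:
before = [-0.9, -0.4, 0.0, 0.3, 0.8]
after = [-0.5, -0.2, 0.0, 0.2, 0.4]
shift = wasserstein_1d(before, after)
```

A large shift quantifies the credibility–pluralism tradeoff: the filter changed not just which sources survive but the shape of the audience's exposure distribution.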
Lecture 24: Types of Attack Surfaces I — Safety Perspective
- Source: Types of Attack Surfaces
- Content: attack surface taxonomy; how bad actors exploit platform features; API abuse; content injection; account compromise vectors
Reading: T&S reading list Module L on Attack Surfaces
Week 13 — Emerging Technologies
Lecture 25: Types of Attack Surfaces II — Security Perspective
- Source: Types of Attack Surfaces (continuation)
- Content: platform defenses; CAPTCHAs and bot detection; rate limiting; shadow-banning; detection pipelines
- Exercises: 🔗 Attack Surfaces exercises doc
Lecture 26: Emerging Topics I — AI in Trust and Safety
- Source: Emerging Topics — AI in Trust and Safety
- Also available: 🔗 Google Slides (January 2024)
- Content: AI and ML in T&S (generative AI, detection, red-teaming); AR/VR harm areas; emerging platforms and harm surfaces; T&S career pathways; what skills employers look for
- Interactive demo: AI-generated content identification exercise (slides include an interactive breakout)
- Exercises: 🔗 Emerging Technologies exercises doc
DUE: Podcast Factchecking data + analysis
ASSIGNED: Course paper writeup. Reference: Trust & Safety Journal for graduate-level research directions, and ICWSM / IC2S2 for computational social science.
Reading: T&S reading list Module M on Emerging Technologies
Week 14 — Emerging Technologies II
Lecture 27: Emerging Topics II — Adversarial Retrieval and LLMs
- Source: Emerging Topics — Adversarial Retrieval
- Content: how adversaries manipulate retrieval-augmented generation (RAG) systems and search indexes; corpus poisoning and gradient-based attacks on IR systems; SEO manipulation as an information operation; connecting the misinformation interventions from Weeks 10–11 to the LLM attack surface; adversarial means, motives, and opportunities in the AI era
Lecture 28: Emerging Topics III — LLM Hallucinations and Knowledge Conflicts
- Source: Emerging Topics — LLM Hallucinations and Knowledge Conflicts
- Content: faithfulness vs. factuality hallucinations; knowledge conflicts between parametric memory, retrieved context, and ground truth; RLHF safety tuning and jailbreaking; detection and mitigation approaches; implications for T&S practitioners deploying LLM-based moderation or fact-checking systems
Reading: Sample of papers from the special topic reading list on Adversarial Retrieval and LLMs, at least 1 paper per category.
Week 15 — Research Methods and Student Presentations
Lecture 29: Project Presentations I
- Format: groups present Discord Bot M3 results (or Podcast Factchecking for individually structured courses); guest judges from industry where possible (see Consortium member list in the README for potential invitees)
- ~8 minutes per group + Q&A; rubric focuses on policy motivation, technical implementation, testing and evaluation, and ethical reflection
Lecture 30: Project Presentations II + Course Debrief
- Format: remaining presentations + open debrief
- Discussion: what has changed in T&S since the semester began? What did the class get wrong? What questions remain?
- Optional: play Trust & Safety Tycoon as a closing reflection on organizational complexity
DUE: Podcast Factchecking final report
Child Safety, Sexual Exploitation, and Platform Well-Being
⚠️ Content Warning: These topics cover deeply sensitive material. They are for further reading only and are not examinable or covered directly in class. They are included here for completeness. If this content causes any distress, please reach out for support. Links to the student centers for mental health and wellbeing are provided.
Child and Adult Sexual Exploitation
- Source: 🔗 Google Slides
- Content: CSAM definitions and legal landscape; PhotoDNA and hash-matching at scale; NCMEC partnerships and reporting obligations; grooming detection; sextortion; proxy content for classroom exercises
- Exercises: 🔗 CASE exercises doc
- Short case video: 🔗 CASE case video
- Reading: 📘 tsbook ch7 — Child Sexual Exploitation
Suicide, Self-Harm, and Platform Well-Being
- Source: 🔗 Google Slides (fall 2023)
- Content: safe messaging guidelines; the role of algorithmic amplification in self-harm content; contagion effects; platform design for well-being; tension between supporting at-risk users and removing harmful content; mental health resources for moderators
- Supplement: 🎥 45-min lecture by Katherine Keyes (Columbia University)
- Exercises: 🔗 Suicide, Self-Harm, and Well-Being exercises doc
Reading Resources
- Primary textbook: 📘 tsbook — The book is a living draft; chapters on misinformation, extremism, and emerging tech may be added.
- Consortium reading list: 🔗 Full reading list (Google Docs) — organized by module, aligns directly with lecture sequence above.
- TSPA curriculum: T&S Fundamentals and Library
Credits
The lectures draw on the Trust & Safety Teaching Consortium materials.
Assignments likewise credit the prior materials they draw on.
Active Learning Strategies Summary
| Strategy | Week | Description |
|---|---|---|
| Moderator Mayhem | 1 | Individual in-game decisions; debrief reveals policy ambiguity |
| Dotmocracy | 1 | Students vote on which abuse types the class covers in depth |
| "Set It Up" | 2 | Classify 3 cases against a reference policy; identify clear and unclear violations |
| Think-pair-share | 2, 7 | Borderline case analysis; clicker voting to reveal class distribution |
| Contrasting cases | 2 | Two policies for the same harm; diagnose where each breaks down |
| Structured debate | 7 | Safety vs. censorship; platform deplatforming decisions (e.g., Jan. 6th) |
| Misinformation Detection Lab | 10 | Hands-on CT24 check-worthiness classifier (Lecture 20 tutorial session) |
| Fishbowl (optional) | 2–7 | Students act out cases; rest of class makes moderation determinations. High risk — requires careful topic selection and opt-out option. |
| Trust & Safety Tycoon | 15 | Closing game on organizational tradeoffs |
| Full-course exercise | 1–15 | 🔗 Full-course exercise doc from Consortium Introduction module |