Trust & Safety: Platforms, Policies, and Products

The three course assignments are scaffolded as an arc: the Discord Bot covers reactive moderation (Weeks 2–6), the Bluesky Labeler covers proactive moderation (Weeks 6–10), and the Podcast Factchecking project covers applied T&S research (Weeks 10–15).

Assignment Overview

#	Assignment	Format	Milestones	Approx. Due
1	Discord Bot	Individual (M1) + Group (M2, M3)	M1: Abuse study + policy; M2: Reporting flow + bot; M3: Smart classifier (extra credit)	M1: Wk 4 / M2: Wk 6 / M3: Wk 6
2	Bluesky Moderation	Individual or small group	M1–M4: Core labeler; M5: Policy proposal	M1–M4: Wk 9 / M5: Wk 10
3	Podcast Factchecking	Individual	Data collection, analysis + report	Data + analysis: Wk 13 / Report: Wk 15

Week-by-Week Schedule

Week 1 — Foundations of Trust & Safety

Lecture 1: Introduction to Trust & Safety

Source: Introduction to Trust and Safety
Content: purpose and history of T&S; high-level taxonomy of abuse types (CSAM, hate speech, spam, terrorism, self-harm, fraud, etc.); reactive vs. proactive models; building T&S teams; overview of automated technologies
Supplement: 🎥 20-min lecture on intelligence in T&S by Inbal Goldberger (ActiveFence)
Short case video: 🎥 Case video / Transcript

Lecture 2: Large-Scale T&S Systems in Practice

Source: Large Scale Trust & Safety Systems
Content: how industry T&S infrastructure is organized at scale; the organizational modules and overarching problems; provides the map for the rest of the semester
Note: this lecture situates everything covered in Lectures 3–30; use it as a semester roadmap that students can refer back to throughout the course.

Reading: 📘 tsbook ch1 — "Fighting the Forever War" (Stamos, Grossman, Pfefferkorn)

Active Learning: Play Moderator Mayhem individually before class; debrief in-class on the ambiguity revealed by the game.

Note on scope: Use Dotmocracy in this first week to have students vote on which abuse types the class will cover in depth. The vote shapes the harm-specific weeks (Weeks 7–11) and ensures the class addresses the harms students find most pressing.

Week 2 — Content Policy and Moderation Frameworks

Lecture 3: Pitfalls of Binary Classification

Source: Pitfalls of Binary Classification
Content: enforcing platform policies against concrete examples; borderline cases; contrasting policies for the same harm; why binary framing breaks down at the margins
Active learning strategies built in:
- "Set It Up" — classify 3 examples against a reference policy; identify the obvious violation and the obvious non-violation
- Think-pair-share — discuss borderline cases in pairs, then share with class (use clickers for voting)
- Contrasting cases — given two policies for the same harm, diagnose where each breaks down (false positives, false negatives, ambiguity, multiple thresholds needed)
Reference policies: the Pitfalls of Binary Classification source links to the Political Ad Policies case.

Lecture 4: Content Moderation I — History, Models, and the Anti-Censorship Ethos

Source: Content Moderation
Content: early internet norms; anti-censorship ethos and how it shaped moderation; models for content moderation; framing effects of regulatory language; scale considerations

ASSIGNED: Discord Bot Milestone 1 — Abuse Study and Content Policy (individual; due end of Week 4)

Reading: TSPA T&S Fundamentals handbook; TSPA library selections at instructor discretion

Week 3 — Content Moderation at Scale and Community Governance

Lecture 5: Content Moderation II — Commercial Moderators, Community Norms, and Scale

Source: Content Moderation (continuation)
Content: what commercial content moderation looks like at platforms like Twitch; moderator actions (remove, warn, ban, shadow block); relationship between moderators and platform admins; the psychological burden on moderators
Exercise: 🔗 Content Moderation exercises doc
Short case video: 🎥 Case video / Lesson plan

Lecture 6: Community Moderation and Self-Governance

Source: 🔗 Seering (KAIST) community moderation slides
Content: community-based moderation models (Reddit, Discord, Wikipedia); how rules emerge; when community governance succeeds or fails
Supplement: Seering KAIST syllabus for framing

Reading: TSPA Handbook, Content Moderation and Operations

Week 4 — Metrics and Measurement

Lecture 7: Technical Lab — Discord Bot Setup Workshop

Source: Assignments/Discord Bot/discord_bot_assignment/ starter code
Content: what makes a good user reporting flow; tradeoffs between specificity and usability; the behind-the-scenes moderator review flow; multi-tier review; outcomes (remove, warn, shadow-block, escalate)
Lab: Python async programming; discord.py API; bot initialization; the mod channel pattern; forwarding messages; emoji reactions for moderator workflows
Activity: students fork the repo and get their bots running in their group channels, with TAs available to help.
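The moderator review flow covered in this lab can be sketched as a small routing function. This is a hypothetical, library-free illustration. The `Report` fields, the 0–3 severity scale, and `route_report` are assumptions and are not part of the starter code. In the bot itself, a decision like this could be triggered by a discord.py event handler (for example `on_raw_reaction_add` when a moderator reacts in the mod channel).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical moderation outcomes mirroring the lecture's list."""
    NO_ACTION = "no_action"
    WARN = "warn"
    SHADOW_BLOCK = "shadow_block"
    REMOVE = "remove"
    ESCALATE = "escalate"


@dataclass
class Report:
    """A user report as it might arrive in the mod channel (illustrative fields)."""
    message_id: int
    reason: str          # category the reporter picked, e.g. "spam"
    severity: int        # assumed scale: 0 = low ... 3 = imminent harm
    prior_strikes: int   # earlier confirmed violations by the reported user


def route_report(report: Report) -> Outcome:
    """First-tier review: pick an outcome or escalate to a second tier."""
    if report.severity >= 3:
        # Imminent-harm reports go straight to a senior reviewer.
        return Outcome.ESCALATE
    if report.severity == 0:
        return Outcome.NO_ACTION
    if report.reason == "spam":
        # Spam is hidden from others without alerting the sender.
        return Outcome.SHADOW_BLOCK
    # Repeat offenders have the message removed; first-time offenders are warned.
    return Outcome.REMOVE if report.prior_strikes > 0 else Outcome.WARN


reports = [
    Report(1, "harassment", 2, prior_strikes=0),
    Report(2, "harassment", 2, prior_strikes=3),
    Report(3, "spam", 1, prior_strikes=0),
    Report(4, "threat", 3, prior_strikes=0),
]
for r in reports:
    # prints "1 warn", "2 remove", "3 shadow_block", "4 escalate" on separate lines
    print(r.message_id, route_report(r).value)
```

Keeping the decision logic in a pure function, separate from the Discord event handlers, lets groups unit-test their policy without a running bot.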

Lecture 8: Metrics and Measurement I — What to Measure and Why

Source: Metrics and Measurement
Also available: 🔗 Google Slides (fall 2023)
Content: what is a metric; generic platform metrics (DAU, retention) vs. T&S-specific metrics; defining "success" in T&S; prevalence estimates; common industry metrics; developing new metrics
Exercise: 🔗 Metrics exercises doc — students design metrics for their chosen Discord Bot abuse type
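A prevalence estimate is typically computed from a uniform random sample of labeled items, reported with a confidence interval. The sketch below uses the Wilson score interval, which behaves better than the normal approximation when prevalence is tiny. The sample counts are hypothetical. Whether the sampled unit is a post or a view is a metric-design choice worth discussing in the exercise, but the arithmetic is the same either way.

```python
import math


def prevalence_estimate(num_violating: int, sample_size: int, z: float = 1.96):
    """Return (point_estimate, lower, upper) for a uniform random sample.

    Uses the Wilson score interval; z = 1.96 gives roughly 95% coverage.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = num_violating / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p + z2 / (2 * sample_size)) / denom
    half = z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical sample: 12 violating items among 10,000 randomly sampled items.
p, lo, hi = prevalence_estimate(12, 10_000)
print(f"prevalence ~ {p:.3%} (95% CI {lo:.3%} to {hi:.3%})")
```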

DUE: Discord Bot M1 — Abuse Study and Content Policy

ASSIGNED: Discord Bot M2 — Content Moderation Bot (group; due end of Week 6)

Forming groups: after M1 is submitted, students form interdisciplinary groups of 4–5. If the course is cross-listed, aim to mix students with technical and policy backgrounds in each group.

Reading: Metrics exercises doc; T&S reading list Module C

Week 5 — Government Regulation and Legal Frameworks

Lecture 9: Governments and the Internet I — U.S. Law and Section 230

Source: Government Regulation
Content: why the U.S. became the center of the internet economy; overall U.S. approach to internet policy; Section 230 ("the 26 words that created the internet") — text, scope, limits, and contested interpretations
In-class activity: read Section 230 aloud in full; discuss its implications for T&S enforcement

Lecture 10: Governments and the Internet II — International Regulation and Copyright

Source: Government Regulation (continuation)
Supplement: 🔗 Copyright Safe Harbors and DMCA slides by Justin Francese (U of Oregon)
Content: EU Digital Services Act; Australia's eSafety Commissioner model; notice-and-takedown; copyright and safe harbor; how different regulatory frameworks shape platform behavior
Supplement: 🎥 16-min lecture on Australia's eSafety Commissioner

Reading: T&S reading list Module B; The Twenty-Six Words That Created the Internet by Jeff Kosseff

Week 6 — Proactive Moderation and the Bluesky Labeler

Lecture 11: Proactive Moderation — Classifiers, Hash-Matching, and Labeling Systems

Source: 📓 Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-lecture.ipynb
Content: text classification pipelines; perceptual hashing (PHash, PhotoDNA); the labeler architecture on Bluesky (data flows PDS → Relay → AppView); differences between proactive automated detection and reactive user reporting
In-class coding demo: run the Bluesky starter code; attach a test label; view it in the Bluesky UI
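Perceptual hashing maps visually similar images to nearby bit strings, so matching becomes a Hamming-distance comparison rather than an exact-bytes comparison. The sketch below uses a simple difference hash (dHash) as a stand-in. It is neither PHash, which uses a DCT, nor PhotoDNA, which is proprietary. It assumes the image has already been converted to grayscale and resized to 8 × 9 pixels, and the match threshold of 10 bits is an illustrative assumption.

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash of a grayscale image already resized to 8 rows x 9 columns.

    Each bit records whether a pixel is brighter than its right-hand neighbour,
    so a uniform brightness shift leaves the hash unchanged.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # 8 rows x 8 comparisons = 64 bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 10) -> bool:
    """Treat hashes within `threshold` bits as the same image (assumed cutoff)."""
    return hamming(a, b) <= threshold


# Synthetic 8 x 9 grayscale "image" and two altered copies of it.
original = [[(r * 31 + c * 17) % 200 for c in range(9)] for r in range(8)]
brighter = [[v + 40 for v in row] for row in original]  # uniform brightness shift
noisy = [row[:] for row in original]
noisy[3][4] += 20  # one altered pixel

h = dhash(original)
print(hamming(h, dhash(brighter)))  # 0
print(hamming(h, dhash(noisy)))     # 1 (one pixel affects at most two adjacent comparisons)
```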

Lecture 12: Decentralized Moderation and the AT Protocol

Content: federated/decentralized platform architectures; how AT Protocol enables user-configurable filtering; trade-offs between centralized and decentralized moderation; third-party labelers as a governance model
Reference: Bluesky moderation architecture docs; list of labelers
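On Bluesky, a labeler does not delete content. It publishes label records that point at a piece of content, and users can choose, per label, whether their client hides, warns on, or shows labeled items. The sketch below builds label records for posts that match a word list. It loosely follows field names from the `com.atproto.label.defs#label` lexicon (`src`, `uri`, `val`, `neg`, `cts`). The word list, the label value `t-and-s`, and the DID are hypothetical placeholders, and signing and publishing the labels are out of scope.

```python
import re
from datetime import datetime, timezone

# Hypothetical inputs; the assignment's actual word list and label value may differ.
TS_TERMS = ["trust and safety", "content moderation", "ban"]
LABELER_DID = "did:plc:example-labeler"  # placeholder labeler DID


def make_labels(post_uri: str, text: str, terms=TS_TERMS, value: str = "t-and-s"):
    """Return a list with one label record if `text` contains any term, else []."""
    # Whole-word, case-insensitive match so "ban" does not fire on "bananas".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    if not pattern.search(text):
        return []
    return [{
        "src": LABELER_DID,   # who issued the label
        "uri": post_uri,      # what the label applies to
        "val": value,         # the label value clients filter on
        "neg": False,         # True would retract a previously issued label
        "cts": datetime.now(timezone.utc).isoformat(),  # creation timestamp
    }]


uri = "at://did:plc:example/app.bsky.feed.post/abc123"  # placeholder post URI
print(make_labels(uri, "Content moderation is hard at scale"))
print(make_labels(uri, "I love bananas"))  # []
```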

DUE: Discord Bot M2 — Content Moderation Bot. Extra credit: Discord Bot M3 Smart Classifier

ASSIGNED: Bluesky Moderation M1–M4 (setup, T&S words, news citation, perceptual hashing; due end of Week 9)

Reading: Weapons of Math Destruction by Cathy O'Neil

Week 7 — Harassment and Hate Speech

Lecture 13: Harassment and Hate Speech I — Definitions, Scale, and Policy

Source: Harassment and Hate Speech
Content: spectrum of harassment; hate speech definitions and jurisdictional variation; which identities are most targeted; the role of anonymity and pseudonymity; platform policy evolution

Lecture 14: Harassment and Hate Speech II — Automated Detection and Exercises

Source: Harassment and Hate Speech (continuation)
Supplement: 🔗 McLester hate speech and harassment slides (UAB)
Supplement: 🎥 25-min lecture on online hate speech by Mark Schneider (UPenn)
Exercises: 🔗 Harassment and Hate Speech exercises doc
In-class structured debate: safety (public harm) vs. censorship (freedom of speech); how do different platforms' decisions reflect this tradeoff?

Reading: 📘 tsbook ch6 — Harassment (full chapter)

Week 8 — Terrorism, Radicalization, and Extremism

Lecture 15: Terrorism, Radicalization, and Extremism

Source: Terrorism Radicalization and Extremism
Also available: 🔗 Google Slides (fall 2023)
Content: definitions; radicalization models; online recruitment pipelines; counter-terrorism vs. countering violent extremism (CVE); the live-streaming of attacks; the role of platform algorithms in amplification
Supplement: 🎥 30-min lecture on terrorism, radicalization, and extremism by Marten Risius (U of Queensland)

Lecture 16: CVE, Counter-Speech, and Policy Responses

Source: Terrorism Radicalization and Extremism (exercises + discussion)
Exercises: 🔗 Terrorism, Radicalization, and Extremism exercises doc
Case study: GIFCT hash-sharing database; Jan. 6th Capitol Riot deplatforming decisions

Reading: T&S reading list Module G on Terrorism.

Week 9 — Platform Manipulation and Identity

Lecture 17: Authentication, Identity, and Platform Manipulation

Source: Authentication, Identity, and Platform Manipulation
Content: authentication models; real-name vs. pseudonymous policies; coordinated inauthentic behavior (CIB); sockpuppet networks; astroturfing; state-sponsored information operations
Exercises: 🔗 Auth/Identity exercises doc
Supplement: 🔗 McLester investigations and intelligence lecture

Lecture 18: Spam, Fraud, and Account Integrity

Content: spam taxonomy; online fraud (scams, phishing, impersonation); account takeovers; the arms race between spammers and platforms; ML-based abuse detection at account level
Reading: 📘 tsbook ch2 — Spam and Online Fraud (Nelly Agbogu case study is an excellent discussion anchor)

DUE: Bluesky M1–M4 (T&S words labeler, news citation labeler, perceptual hash dog labeler)
ASSIGNED: Bluesky M5 — Policy Proposal Labeler (due end of Week 10)

Reading: T&S reading list Module K on Authenticity.

Week 10 — Misinformation I: Definitions, Spread, and Detection

Lecture 19: Misinformation — Definitions, Spread, and Platform Responses

Source: Misinformation
Content: the mis/dis/mal taxonomy (Wardle & Derakhshan information disorder framework); how false information spreads through social networks; the role of algorithmic amplification; platform interventions (labels, friction, removal, amplification reduction); the fact-checking ecosystem
Supplement: 🎥 45-min lecture on misinformation by Sarah Shirazyan (Stanford Law)

Lecture 20: Misinformation Detection Tutorial

Source: Misinformation Detection Tutorial (IC2S2 2025)
Content: hands-on NLP tutorial — claim-level check-worthiness detection using the CT24 dataset; full pipeline from data loading through feature engineering, classifier training (logistic regression, BERT-based), evaluation (precision, recall, F1), and error analysis; AI-generated misinformation detection techniques; Podcast Factchecking preview using the PodChecker system (Irmetova et al., 2026)
This is a lab-style session; students should have Python and the tutorial dependencies installed before class.
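The evaluation and error-analysis steps of the tutorial pipeline reduce to counting true positives, false positives, and false negatives. The dependency-free sketch below assumes binary labels where 1 means check-worthy, and the labels themselves are made up. In the tutorial, a library routine such as scikit-learn's `precision_recall_fscore_support` could do the same job.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier predictions (1 = check-worthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
prec, rec, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")  # precision=0.60 recall=0.75 f1=0.67

# Error analysis starts from the misses: check-worthy claims the model skipped.
missed = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t == 1 and p == 0]
print(missed)  # [3]
```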

Reading: T&S reading list Module F on the Information Environment.

DUE: Bluesky M5
ASSIGNED: Podcast Factchecking (data + analysis due Week 13; final report due Week 15)

Coordinating M5 with earlier work: students should implement the same abuse type they researched for Discord Bot M1, so that the policy proposal in M5 becomes a direct technical extension of their written analysis from Week 4. Consider also asking students to anticipate how adversaries would adapt to their M5 policy design; this counter-intervention framing is covered in depth later, in Lecture 23 (Adversarial Adaptation).

Week 11 — Misinformation II: Source Credibility, Interventions, and Adversarial Limits

Lecture 21: Source Credibility and Misinformation Source Detection

Source: Source Credibility
Content: what makes a source credible; SEO-based misinformation source detection using CommonCrawl webgraphs; backlinking patterns as credibility signals; multi-class classification of news domains; feature importances for predicting credibility vs. political reliability; limitations (implied content, domain decay, propaganda vs. opinion)

Lecture 22: Intervention Effectiveness — Misinformation and Search Rankings

Source: Intervention Effectiveness — Misinformation and Search Rankings
Content: small-scale PageRank-based interventions; personalized PageRank and authority-based reranking; large-scale link scheme removal; "multi-category" scheme removal as a more precise intervention tool; traffic estimates from CommonCrawl vs. SimilarWeb; design principles for robust interventions; open problems and future directions
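Authority-based reranking with personalized PageRank can be illustrated with plain power iteration. Teleportation is restricted to a set of trusted seed domains, so sites that receive links only from a closed link scheme earn no authority. The graph, domain names, and seed set below are hypothetical. This is a textbook formulation, not necessarily the exact method used in the lecture.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, max_iter=200, tol=1e-10):
    """Power-iteration PageRank whose teleport step jumps only to `seeds`.

    graph: dict mapping each node to the list of nodes it links to.
    seeds: non-empty set of trusted nodes; passing every node gives
    ordinary (global) PageRank.
    """
    if not seeds:
        raise ValueError("seeds must be non-empty")
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(max_iter):
        new = {n: (1 - alpha) * teleport[n] for n in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            if outs:
                share = alpha * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling node: return its mass to the teleport distribution.
                for v in nodes:
                    new[v] += alpha * rank[u] * teleport[v]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical link graph: a trusted cluster plus a closed link farm boosting a target.
graph = {
    "news.example": ["wire.example", "factcheck.example"],
    "factcheck.example": ["news.example", "wire.example"],
    "wire.example": ["news.example"],
    "farm1.example": ["farm2.example", "target.example"],
    "farm2.example": ["farm1.example", "target.example"],
    "target.example": ["farm1.example"],
}
trusted = {"news.example", "factcheck.example"}

for node, score in sorted(personalized_pagerank(graph, trusted).items(), key=lambda kv: -kv[1]):
    print(f"{node:20s} {score:.3f}")
```

Under global PageRank (seeding every node), the link farm does pass authority to `target.example`. Under trusted seeding, its score stays at zero.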

Reading: a sample of papers from the special topic reading list on Misinformation (at least one paper per category).

Week 12 — Attack Surfaces and Technical Defense

Lecture 23: Adversarial Adaptation and the Limitations of Interventions

Source: Adversarial Adaptation and the Limitations of Interventions
Content: how adversaries adapt to interventions over time (SEO gaming, platform manipulation, bot evolution); the credibility–pluralism tradeoff — credibility-based filtering reduces source diversity; assortativity in news transition matrices; Wasserstein distances to quantify polarization effects; principles for adversarially robust policy design
This lecture connects back to the proactive moderation problem introduced in Week 6: reactive interventions always lag behind adapting adversaries, which motivates automated proactive systems.
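For distributions over an ordered scale, the 1-Wasserstein distance is the area between the two cumulative distribution functions. An example of such a scale is a hypothetical −2 to +2 partisan-leaning score for the news sources users are exposed to. The sketch below computes this distance directly. The scale and the before/after distributions are invented for illustration and are not results from the lecture. SciPy's `scipy.stats.wasserstein_distance` computes the same quantity from values plus weights.

```python
from itertools import accumulate


def wasserstein_1d(support, p, q):
    """1-Wasserstein distance between two pmfs on the same sorted 1-D support.

    Equals the area between the CDFs:
    sum over gaps of |F_p(x_i) - F_q(x_i)| * (x_{i+1} - x_i).
    """
    if not (len(support) == len(p) == len(q)):
        raise ValueError("support, p and q must have the same length")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # zip stops at the shorter support[1:], skipping the last point,
    # where both CDFs reach 1.
    return sum(
        abs(fp - fq) * (right - left)
        for fp, fq, left, right in zip(cdf_p, cdf_q, support, support[1:])
    )


leaning = [-2, -1, 0, 1, 2]              # hypothetical leaning scale
before = [0.10, 0.20, 0.40, 0.20, 0.10]  # exposure before an intervention (made up)
after = [0.05, 0.25, 0.50, 0.15, 0.05]   # exposure after an intervention (made up)
print(round(wasserstein_1d(leaning, before, after), 3))  # 0.2
```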

Lecture 24: Types of Attack Surfaces I — Safety Perspective

Source: Types of Attack Surfaces
Content: attack surface taxonomy; how bad actors exploit platform features; API abuse; content injection; account compromise vectors

Reading: T&S reading list Module L on Attack Surfaces

Week 13 — Emerging Technologies

Lecture 25: Types of Attack Surfaces II — Security Perspective

Source: Types of Attack Surfaces (continuation)
Content: platform defenses; CAPTCHAs and bot detection; rate limiting; shadow-banning; detection pipelines
Exercises: 🔗 Attack Surfaces exercises doc
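Rate limiting is often implemented as a token bucket. Each account, IP, or API key gets a bucket that refills at a steady rate, and each action spends a token, so short bursts are allowed while sustained high-volume activity is throttled. The minimal sketch below uses arbitrary capacity and refill values. The injectable clock exists only so the behavior can be demonstrated deterministically.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling `refill_per_sec` tokens per second."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject the action."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock for a reproducible demo."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
bucket = TokenBucket(capacity=3, refill_per_sec=1, clock=clock)
burst = [bucket.allow() for _ in range(5)]
print(burst)  # [True, True, True, False, False]
clock.t = 2.0
later = [bucket.allow() for _ in range(3)]
print(later)  # [True, True, False]
```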

Lecture 26: Emerging Topics I — AI in Trust and Safety

Source: Emerging Topics — AI in Trust and Safety
Also available: 🔗 Google Slides (January 2024)
Content: AI and ML in T&S (generative AI, detection, red-teaming); AR/VR harm areas; emerging platforms and harm surfaces; T&S career pathways; what skills employers look for
Interactive demo: AI-generated content identification exercise (slides include an interactive breakout)
Exercises: 🔗 Emerging Technologies exercises doc

DUE: Podcast Factchecking data + analysis
ASSIGNED: Course paper writeup. References: the Trust & Safety Journal for graduate-level research directions; ICWSM and IC2S2 for computational social science.

Reading: T&S reading list Module M on Emerging Technologies

Week 14 — Emerging Technologies II

Lecture 27: Emerging Topics II — Adversarial Retrieval and LLMs

Source: Emerging Topics — Adversarial Retrieval
Content: how adversaries manipulate retrieval-augmented generation (RAG) systems and search indexes; corpus poisoning and gradient-based attacks on IR systems; SEO manipulation as an information operation; connecting the misinformation interventions from Weeks 10–11 to the LLM attack surface; adversarial means, motives, and opportunities in the AI era

Lecture 28: Emerging Topics III — LLM Hallucinations and Knowledge Conflicts

Source: Emerging Topics — LLM Hallucinations and Knowledge Conflicts
Content: faithfulness vs. factuality hallucinations; knowledge conflicts between parametric memory, retrieved context, and ground truth; RLHF safety tuning and jailbreaking; detection and mitigation approaches; implications for T&S practitioners deploying LLM-based moderation or fact-checking systems

Reading: a sample of papers from the special topic reading list on Adversarial Retrieval and LLMs (at least one paper per category).

Week 15 — Research Methods and Student Presentations

Lecture 29: Project Presentations I

Format: groups present Discord Bot M3 results (or Podcast Factchecking for individually structured courses); guest judges from industry where possible (see Consortium member list in the README for potential invitees)
~8 minutes per group + Q&A; rubric focuses on policy motivation, technical implementation, testing and evaluation, and ethical reflection

Lecture 30: Project Presentations II + Course Debrief

Format: remaining presentations + open debrief
Discussion: what has changed in T&S since the semester began? What did the class get wrong? What questions remain?
Optional: play Trust & Safety Tycoon as a closing reflection on organizational complexity

DUE: Final reports

Child Safety, Sexual Exploitation, and Platform Well-Being

⚠️ Content Warning: These topics involve deeply sensitive material. They are provided for further reading only, are not examinable, and are not covered directly in class; they are included here for completeness. If this content causes you any distress, please reach out for support. Links to the student centers for mental health and well-being are provided.

Child and Adult Sexual Exploitation

Source: 🔗 Google Slides
Content: CSAM definitions and legal landscape; PhotoDNA and hash-matching at scale; NCMEC partnerships and reporting obligations; grooming detection; sextortion; proxy content for classroom exercises
Exercises: 🔗 CASE exercises doc
Short case video: 🔗 CASE case video
Reading: 📘 tsbook ch7 — Child Sexual Exploitation

Suicide, Self-Harm, and Platform Well-Being

Source: 🔗 Google Slides (fall 2023)
Content: safe messaging guidelines; the role of algorithmic amplification in self-harm content; contagion effects; platform design for well-being; tension between supporting at-risk users and removing harmful content; mental health resources for moderators
Supplement: 🎥 45-min lecture by Katherine Keyes (Columbia University)
Exercises: 🔗 Suicide, Self-Harm, and Well-Being exercises doc

Reading Resources

Primary textbook: 📘 tsbook. The book is a living draft; chapters on misinformation, extremism, and emerging tech may be added.
Consortium reading list: 🔗 Full reading list (Google Docs), organized by module and aligned directly with the lecture sequence above.
TSPA curriculum: T&S Fundamentals and Library

Credits

The lectures draw on the Trust & Safety Teaching Consortium materials.
Likewise, the assignments credit the previous materials they draw on.

Active Learning Strategies Summary

Strategy	Week	Description
Moderator Mayhem	1	Individual in-game decisions; debrief reveals policy ambiguity
Dotmocracy	1	Students vote on which abuse types the class covers in depth
"Set It Up"	2	Classify 3 cases against a reference policy; identify the clear violation and the clear non-violation
Think-pair-share	2, 7	Borderline case analysis; clicker voting to reveal class distribution
Contrasting cases	2	Two policies for the same harm; diagnose where each breaks down
Structured debate	7	Safety vs. censorship; platform deplatforming decisions (e.g., Jan. 6th)
Misinformation Detection Lab	10	Hands-on CT24 check-worthiness classifier (Lecture 20 tutorial session)
Fishbowl (optional)	2–7	Students act out cases; the rest of the class makes moderation determinations. High risk: requires careful topic selection and an opt-out option.
Trust & Safety Tycoon	15	Closing game on organizational tradeoffs
Full-course exercise	1–15	🔗 Full-course exercise doc from Consortium Introduction module

Curriculum