Course Overview
Is this class for me?
Interests
- Ambiguities in ethics and platform policy → Try Moderator Mayhem
- Complexity of organizational problems and tradeoffs associated with trust & safety → Try Trust & Safety Tycoon
Prerequisites
- Requires coding in Python; intro CS background assumed
- For research components, familiarity with at least one of:
  - HCI / user studies
  - ML methods
  - Data science and statistics
Alternative Special Topics
If your interest lies in a specific subtopic of Trust & Safety, check out one of the special topics related to this class:
Learning Outcomes
Overall T&S:
- An understanding of the most pressing challenges for online global communication platforms
- Foundational knowledge of current research in Online Trust and Safety
- A draft policy proposal and a working implementation of that proposal
Content Moderation:
- Understand the breadth of models for content moderation (reactive, proactive, community-governed, automated)
- Conceptualize different approaches to moderating a space and reflect on how these models could evolve
- Critically assess how automated content moderation mechanisms handle borderline and ambiguous cases
Algorithmic Tradeoffs:
- Identify critical issues and ethical dilemmas in algorithmic systems, especially in T&S
- Analyze technical, social, and policy-based responses to online harms (misinformation, extremist content, harassment, and others)
- Develop in-depth knowledge of at least one selected harm type across the policy, technical, and organizational dimensions
Organizational Dynamics:
- Understand platform T&S operations through the following categories:
  - Account vs. content moderation
  - Methods of Access vs. Harm
  - Organizational vs. technical complexity (Content-Neutral Outcomes)
Assignments
The three assignments are designed to build on one another: reactive moderation → proactive moderation → applied research.
1. Assignment 1 - Discord Bot — Reactive Moderation
Source: Trust and Safety Engineering (Stanford CS152) + Cornell Tech CS 5342
A three-milestone project in which you act as the T&S team at a social media platform:
- M1 (individual, Week 4): Abuse Research Report (2000–4000 words) covering one abuse type: description, actor/victim profiles, details, relevant technologies, and specific recommendations, plus a Policy Comparison Table for three platforms.
- M2 (group, Week 6): Design and implement a user reporting flow and behind-the-scenes moderator flow as a Discord bot in Python.
- M3 (group, Week 6, extra credit): Extend the bot with automated detection — a classifier trained or prompted on your chosen abuse type.
Full spec: Assignments/Discord Bot/Discord Bot.md
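One way to think about the M2 reporting flow is as a small state machine that is independent of the Discord client. The sketch below is an illustration only, not the assignment's starter code: the `Report` class, the `ReportState` values, and the category list are all hypothetical names chosen for this example, and a real bot would connect `handle_user_reply` to its message handler.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ReportState(Enum):
    AWAITING_CATEGORY = auto()
    AWAITING_DETAILS = auto()
    IN_REVIEW = auto()
    RESOLVED = auto()


# Hypothetical abuse categories; a real bot would derive these from its policy.
CATEGORIES = {"spam", "harassment", "fraud"}


@dataclass
class Report:
    reporter: str
    message_id: int
    state: ReportState = ReportState.AWAITING_CATEGORY
    category: Optional[str] = None
    details: str = ""
    outcome: Optional[str] = None

    def handle_user_reply(self, text: str) -> str:
        """Advance the user-facing flow by one step and return the bot's reply."""
        text = text.strip()
        if self.state is ReportState.AWAITING_CATEGORY:
            if text.lower() not in CATEGORIES:
                return "Please choose one of: " + ", ".join(sorted(CATEGORIES))
            self.category = text.lower()
            self.state = ReportState.AWAITING_DETAILS
            return "Thanks. Add any details, or reply 'skip'."
        if self.state is ReportState.AWAITING_DETAILS:
            self.details = "" if text.lower() == "skip" else text
            self.state = ReportState.IN_REVIEW
            return "Your report was sent to the moderators."
        return "This report is already with the moderators."

    def moderator_decide(self, action: str) -> None:
        """Moderator-side step: record an outcome for a report under review."""
        if self.state is not ReportState.IN_REVIEW:
            raise ValueError("report is not ready for review")
        self.outcome = action
        self.state = ReportState.RESOLVED
```

Keeping the flow logic separate from the chat client makes it possible to unit-test each transition without connecting to Discord.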
2. Assignment 2 - Bluesky Moderation — Proactive Moderation
Source: Cornell Tech CS 5342
Build a Bluesky labeler — a service that attaches categorical labels to posts and accounts. Users who subscribe to your labeler can configure how labels affect what they see.
- M1: Labeler setup (AT Protocol, Bluesky account, starter code)
- M2: Label posts matching T&S-related words and domains (text matching)
- M3: Label posts linking to specific news sources (domain matching)
- M4: Label dog photos using perceptual hashing (image matching)
- M5 (policy proposal, Week 10): Extend your labeler to handle a harm of your choice; document your process, testing, and ethical analysis in a 10-minute video
Full spec: Assignments/Bluesky Moderation/Bluesky Moderation.md
Starter code: Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-assign3/
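The three matching techniques named in M2 through M4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the starter code: the watch lists, function names, and the choice of average hashing are all hypothetical, and the actual assignment may use a different perceptual hash or an image library. The hash function here takes an already downscaled grayscale grid; a real pipeline would first resize and convert the image.

```python
import re
from urllib.parse import urlparse

# Hypothetical watch lists; the assignment supplies its own inputs.
WATCH_WORDS = {"scam", "phishing"}
WATCH_DOMAINS = {"example.com"}

URL_RE = re.compile(r"https?://\S+")


def matches_words(text):
    """Whole-word, case-insensitive match against WATCH_WORDS."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return bool(tokens & WATCH_WORDS)


def matches_domains(text):
    """True if any URL in the text points at a watched domain or a subdomain of it."""
    for url in URL_RE.findall(text):
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in WATCH_DOMAINS):
            return True
    return False


def average_hash(pixels):
    """Average hash of a grayscale grid: one bit per pixel, set if pixel >= mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes; small means visually similar."""
    return bin(a ^ b).count("1")
```

Matching on whole tokens avoids flagging "scampi" for "scam", and comparing hostnames rather than raw substrings avoids flagging `notexample.com` for `example.com`. For images, two hashes are usually treated as a match when their Hamming distance falls below a threshold chosen through testing.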
3. Assignment 3 - Podcast Factchecking — Applied T&S Research
Source: Irmetova, Liu, Teleki, Carragher, Zhang, & Caverlee (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts.
Collect and analyze podcast data through the lens of a fact-checking or trust-and-safety application. The reference implementation (PodChecker) provides a claim-extraction and credibility-analysis pipeline; students may extend, replicate, or critically analyze it using a different dataset or harm type.
Full spec: Assignments/Podcast Factchecking/Podcast Factchecking.md
Code: Assignments/Podcast Factchecking/PodChecker/
Curriculum
Texts
- Working textbook: tsbook (Stamos, Grossman, Pfefferkorn) — available chapters: Introduction, Spam/Fraud, Harassment, Child Sexual Exploitation
- TSPA resources: Handbook and Library
- Consortium: reading list
Lectures
The course lectures are modeled on the Teaching Trust & Safety Consortium.
See Curriculum for the full week-by-week schedule.
| Wk | Topic | Slides | Assignment Milestone |
|---|---|---|---|
| 1 | Foundations of T&S | 01 Introduction · 02 Large Scale | |
| 2 | Content Policy + Binary Classification | Pitfalls of Binary Classification · 04 Content Moderation | Discord Bot M1 assigned |
| 3 | Content Moderation at Scale + Community Governance | 04 Content Moderation (cont.) | |
| 4 | Reactive Moderation + Discord Lab + Metrics | 03 Metrics | Discord Bot M1 due; M2 assigned |
| 5 | Government Regulation + Legal Frameworks | 07 Government Regulation | |
| 6 | Proactive Moderation + Bluesky Intro | Bluesky lecture notebook | Discord Bot M2 due; Bluesky M1–M4 assigned |
| 7 | Harassment and Hate Speech | 06 Harassment | |
| 8 | Terrorism, Radicalization, Extremism | 05 Terrorism | |
| 9 | Platform Manipulation + Identity | 06 Authentication | Bluesky M1–M4 due; M5 assigned |
| 10 | Misinformation I: Definitions + Detection Tutorial | 08 Misinformation · 09 Detection Tutorial | Bluesky M5 due; Podcast assigned |
| 11 | Misinformation II: Source Credibility + Interventions | 10 Source Credibility · 11 Interventions | |
| 12 | Attack Surfaces + Adversarial Adaptation | 12 Adversarial Adaptation · 12 Attack Surfaces | |
| 13 | Emerging Technologies I: AI in T&S | 12 Attack Surfaces (cont.) · 13 Emerging Topics I | Podcast data + analysis due |
| 14 | Emerging Technologies II–III: Adversarial Retrieval · LLM Hallucinations | 14 Emerging Topics II · 15 Emerging Topics III | |
| 15 | Research Methods + Presentations | — | Podcast report due |
A Note on Sensitive Content
This course covers material that many students will find disturbing or personally resonant, including hate speech (Week 7) and terrorism (Week 8). Materials on child exploitation and suicide/self-harm are included as optional further reading and are not covered directly in class. Following the Consortium's guidance:
- Topics are announced at least one week in advance
- Students may skip any class covering sensitive content and associated readings without penalty
- Assignments and homework cannot be skipped; however, students may choose the area of research for these
- Written summaries of key policy points are provided as alternatives for skipped sessions
- If you find course content affecting your well-being, please reach out — resources include the course teaching