Course Overview

Is this class for me?

Interests

Ambiguities in ethics and platform policy → Try Moderator Mayhem
Complexity of organizational problems and tradeoffs associated with trust & safety → Try Trust & Safety Tycoon

Prerequisites

Requires coding in Python; intro CS background assumed
For research components, familiarity with at least one of:
- HCI / user studies
- ML methods
- Data science and statistics

Alternative Special Topics

If your interest lies in a specific subtopic of Trust & Safety, check out one of the special topics related to this class:

Learning Outcomes

Overall T&S:

An understanding of the most pressing challenges for online global communication platforms
Foundational knowledge of current research in Online Trust and Safety
A draft policy proposal and a working implementation of that proposal

Content Moderation:

Understand the breadth of models for content moderation (reactive, proactive, community-governed, automated)
Conceptualize different approaches to moderating a space and reflect on how these models could evolve
Critically assess how automated content moderation mechanisms handle borderline and ambiguous cases

Algorithmic Tradeoffs:

Identify critical issues and ethical dilemmas in algorithmic systems, especially in T&S
Analyze technical, social, and policy-based responses to online harms (misinformation, extremist content, harassment, and others)
Develop in-depth knowledge of at least one selected harm type across the policy, technical, and organizational dimensions

Organizational Dynamics:

Understand platform T&S operations through the following categories:
- Account vs. content moderation
- Methods of Access vs. Harm
- Organizational vs. technical complexity (Content-Neutral Outcomes)

Assignments

The three assignments are designed to build on one another: reactive moderation → proactive moderation → applied research.

1. Assignment 1 - Discord Bot — Reactive Moderation

Source: Trust and Safety Engineering (Stanford CS152) + Cornell Tech CS 5342

A three-milestone project in which you act as the T&S team at a social media platform:

M1 (individual, Week 4): Abuse Research Report (2000–4000 words) covering one abuse type: description, actor/victim profiles, details, relevant technologies, and specific recommendations, plus a Policy Comparison Table covering three platforms.
M2 (group, Week 6): Design and implement a user reporting flow and behind-the-scenes moderator flow as a Discord bot in Python.
M3 (group, Week 6, extra credit): Extend the bot with automated detection — a classifier trained or prompted on your chosen abuse type.
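The user reporting flow in M2 is essentially a small state machine. The following is a minimal sketch in plain Python, independent of any Discord library, so the logic can be tested before it is wired into a bot. The state names, the three abuse categories, and the `Report` and `ModQueue` classes are hypothetical illustrations, not part of the assignment's starter code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    """Hypothetical states for a user-facing report conversation."""
    AWAITING_MESSAGE = auto()   # waiting for a link/ID of the reported message
    AWAITING_CATEGORY = auto()  # waiting for the user to pick an abuse category
    COMPLETE = auto()           # report finished and ready for moderators


# Hypothetical abuse categories; replace with your chosen abuse type's taxonomy.
CATEGORIES = {"1": "harassment", "2": "spam", "3": "other"}


@dataclass
class Report:
    reporter: str
    state: State = State.AWAITING_MESSAGE
    message_ref: Optional[str] = None
    category: Optional[str] = None

    def handle(self, text: str) -> str:
        """Advance the flow by one user message and return the bot's reply."""
        text = text.strip()
        if self.state is State.AWAITING_MESSAGE:
            self.message_ref = text
            self.state = State.AWAITING_CATEGORY
            return "What kind of abuse is this? 1) harassment 2) spam 3) other"
        if self.state is State.AWAITING_CATEGORY:
            if text not in CATEGORIES:
                # Stay in the same state until we get a valid choice.
                return "Please reply with 1, 2, or 3."
            self.category = CATEGORIES[text]
            self.state = State.COMPLETE
            return "Thanks, your report has been sent to the moderation team."
        return "This report is already complete."


@dataclass
class ModQueue:
    """Behind-the-scenes queue that moderators review, oldest report first."""
    pending: List[Report] = field(default_factory=list)

    def submit(self, report: Report) -> None:
        if report.state is not State.COMPLETE:
            raise ValueError("only completed reports can be queued")
        self.pending.append(report)

    def next_report(self) -> Optional[Report]:
        return self.pending.pop(0) if self.pending else None
```

In a real bot you would create a `Report` when a user starts a report and pass each of their follow-up messages to `handle()` from the bot's message handler; a fuller moderator flow would also record the moderator's decision and any action taken.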

Full spec: Assignments/Discord Bot/Discord Bot.md

2. Assignment 2 - Bluesky Moderation — Proactive Moderation

Source: Cornell Tech CS 5342

Build a Bluesky labeler — a service that attaches categorical labels to posts and accounts. Users who subscribe to your labeler can configure how labels affect what they see.

M1: Labeler setup (AT Protocol, Bluesky account, starter code)
M2: Label posts matching T&S-related words and domains (text matching)
M3: Label posts linking to specific news sources (domain matching)
M4: Label dog photos using perceptual hashing (image matching)
M5 (policy proposal, Week 10): Extend your labeler to handle a harm of your choice; document your process, testing, and ethical analysis in a 10-minute video
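The matching steps in M2–M4 reduce to three small checks: phrase matching on post text, domain matching on linked URLs, and perceptual-hash matching on images. The following is a minimal sketch, assuming post text and, for images, an 8x8 grayscale pixel grid have already been extracted (for example with Pillow's `Image.convert("L").resize((8, 8))`). The term list, domain table, label values, and distance threshold are hypothetical, and the assignment's starter code may structure this differently. URLs are pulled from the text with a regex for simplicity, whereas real Bluesky posts also carry links in facets and embeds. The image check uses average hashing (aHash), one common perceptual hash.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule lists and label values; the assignment defines its own.
TS_TERMS = ["trust and safety", "content moderation", "moderator"]
NEWS_DOMAINS = {"example.com": "example-news"}  # domain -> label value
URL_RE = re.compile(r"https?://\S+")


def match_terms(text: str, terms=TS_TERMS) -> bool:
    """True if any term appears as a whole word/phrase, case-insensitively."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def match_domains(text: str, domains=NEWS_DOMAINS) -> set:
    """Labels for listed domains (or their subdomains) linked in the text."""
    labels = set()
    for url in URL_RE.findall(text):
        host = (urlparse(url).hostname or "").lower()
        for domain, label in domains.items():
            if host == domain or host.endswith("." + domain):
                labels.add(label)
    return labels


def average_hash(pixels) -> int:
    """64-bit average hash of an 8x8 grid of grayscale values (0-255).

    Each bit is 1 where the pixel is brighter than the grid's mean,
    read row by row with the first pixel as the most significant bit.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def label_post(text: str, image_pixels=None, known_hashes=(), max_distance=5) -> set:
    """Apply all three hypothetical rules and return the set of label values."""
    labels = set()
    if match_terms(text):
        labels.add("t-and-s")
    labels |= match_domains(text)
    if image_pixels is not None:
        h = average_hash(image_pixels)
        if any(hamming(h, k) <= max_distance for k in known_hashes):
            labels.add("dog")
    return labels
```

Comparing hashes by Hamming distance rather than exact equality is what makes the image match perceptual: a re-encoded or lightly edited copy of a known image usually flips only a few bits. The threshold trades missed matches against false positives and is worth tuning on your own test images.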

Full spec: Assignments/Bluesky Moderation/Bluesky Moderation.md
Starter code: Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-assign3/

3. Assignment 3 - Podcast Factchecking — Applied T&S Research

Source: Irmetova, Liu, Teleki, Carragher, Zhang, & Caverlee (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts.

Collect and analyze podcast data through the lens of a fact-checking or trust-and-safety application. The reference implementation (PodChecker) provides a claim-extraction and credibility-analysis pipeline; students may extend, replicate, or critically analyze it using a different dataset or harm type.

Full spec: Assignments/Podcast Factchecking/Podcast Factchecking.md
Code: Assignments/Podcast Factchecking/PodChecker/

Curriculum

Texts

Working textbook: tsbook (Stamos, Grossman, Pfefferkorn) — available chapters: Introduction, Spam/Fraud, Harassment, Child Sexual Exploitation
TSPA resources: Handbook and Library
Consortium: reading list

Lectures

The course lectures are modeled on the Teaching Trust & Safety Consortium's curriculum.
See Curriculum for the full week-by-week schedule.

Wk	Topic	Slides	Assignment Milestone
1	Foundations of T&S	01 Introduction · 02 Large Scale
2	Content Policy + Binary Classification	Pitfalls of Binary Classification · 04 Content Moderation	Discord Bot M1 assigned
3	Content Moderation at Scale + Community Governance	04 Content Moderation (cont.)
4	Reactive Moderation + Discord Lab + Metrics	03 Metrics	Discord Bot M1 due; M2 assigned
5	Government Regulation + Legal Frameworks	07 Government Regulation
6	Proactive Moderation + Bluesky Intro	Bluesky lecture notebook	Discord Bot M2 due; Bluesky M1–M4 assigned
7	Harassment and Hate Speech	06 Harassment
8	Terrorism, Radicalization, Extremism	05 Terrorism
9	Platform Manipulation + Identity	06 Authentication	Bluesky M1–M4 due; M5 assigned
10	Misinformation I: Definitions + Detection Tutorial	08 Misinformation · 09 Detection Tutorial	Bluesky M5 due; Podcast assigned
11	Misinformation II: Source Credibility + Interventions	10 Source Credibility · 11 Interventions
12	Attack Surfaces + Adversarial Adaptation	12 Adversarial Adaptation · 12 Attack Surfaces
13	Emerging Technologies I: AI in T&S	12 Attack Surfaces (cont.) · 13 Emerging Topics I	Podcast data + analysis due
14	Emerging Technologies II–III: Adversarial Retrieval · LLM Hallucinations	14 Emerging Topics II · 15 Emerging Topics III
15	Research Methods + Presentations	—	Podcast report due

A Note on Sensitive Content

This course covers material that many students will find disturbing or personally resonant, including hate speech (Week 7) and terrorism (Week 8). Materials on child exploitation and suicide/self-harm are included as optional further reading and are not covered directly in class. Following the Consortium's guidance:

Topics are announced at least one week in advance
Students may skip any class covering sensitive content and associated readings without penalty
Assignments and homework cannot be skipped; however, students may choose the research area for them
Written summaries of key policy points are provided as alternatives for skipped sessions
If you find course content affecting your well-being, please reach out — resources include the course teaching