Course Overview

Is this class for me?

Interests

Prerequisites

  • Requires coding in Python; intro CS background assumed
  • For research components, familiarity with at least one of:
    • HCI / user studies
    • ML methods
    • Data science and statistics

Alternative Special Topics

If your interest lies in a specific subtopic of Trust & Safety, check out one of the special topics courses related to this class:


Learning Outcomes

Overall T&S:

  • An understanding of the most pressing challenges for online global communication platforms
  • Foundational knowledge of current research in Online Trust and Safety
  • A draft policy proposal and a working implementation of that proposal

Content Moderation:

  • Understand the breadth of models for content moderation (reactive, proactive, community-governed, automated)
  • Conceptualize different approaches to moderating a space and reflect on how these models could evolve
  • Critically assess how automated content moderation mechanisms handle borderline and ambiguous cases

Algorithmic Tradeoffs:

  • Identify critical issues and ethical dilemmas in algorithmic systems, especially in T&S
  • Analyze technical, social, and policy-based responses to online harms (misinformation, extremist content, harassment, and others)
  • Develop in-depth knowledge of at least one selected harm type across the policy, technical, and organizational dimensions

Organizational Dynamics:

  • Understand platform T&S operations through the following categories:
    • Account vs. content moderation
    • Methods of Access vs. Harm
    • Organizational vs. technical complexity (Content-Neutral Outcomes)

Assignments

The three assignments are designed to build on one another: reactive moderation → proactive moderation → applied research.

1. Assignment 1 - Discord Bot — Reactive Moderation

Source: Trust and Safety Engineering (Stanford CS152) + Cornell Tech CS 5342

A three-milestone project in which you act as the T&S team at a social media platform:

  • M1 (individual, Week 4): Abuse Research Report (2000–4000 words) covering one abuse type: description, actor/victim profiles, details, relevant technologies, and specific recommendations. The report is accompanied by a Policy Comparison Table covering three platforms.
  • M2 (group, Week 6): Design and implement a user reporting flow and behind-the-scenes moderator flow as a Discord bot in Python.
  • M3 (group, Week 6, extra credit): Extend the bot with automated detection — a classifier trained or prompted on your chosen abuse type.
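
As a rough illustration of what the M2 reporting flow involves, the sketch below models a user report as a small state machine that hands completed reports to a moderator queue. It is not the assignment's starter code and deliberately avoids any Discord API calls; the class names, the category list, and the reply strings are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    AWAITING_MESSAGE = auto()   # user must identify the message being reported
    AWAITING_CATEGORY = auto()  # user must pick an abuse category
    COMPLETE = auto()           # report is ready for the moderator queue
    CANCELLED = auto()


# Hypothetical categories; a real bot would derive these from its policy.
CATEGORIES = ("spam", "harassment", "fraud")


@dataclass
class Report:
    reporter: str
    state: State = State.AWAITING_MESSAGE
    message_link: Optional[str] = None
    category: Optional[str] = None

    def handle(self, text: str) -> str:
        """Advance the flow by one user message and return the bot's reply."""
        text = text.strip()
        if self.state in (State.COMPLETE, State.CANCELLED):
            return "This report is already closed."
        if text.lower() == "cancel":
            self.state = State.CANCELLED
            return "Report cancelled."
        if self.state is State.AWAITING_MESSAGE:
            self.message_link = text
            self.state = State.AWAITING_CATEGORY
            return "Which category? " + ", ".join(CATEGORIES)
        # Remaining case: AWAITING_CATEGORY
        if text.lower() not in CATEGORIES:
            return "Unknown category. Choose one of: " + ", ".join(CATEGORIES)
        self.category = text.lower()
        self.state = State.COMPLETE
        return "Thanks, your report was sent to the moderators."


@dataclass
class ModeratorQueue:
    pending: List[Report] = field(default_factory=list)

    def submit(self, report: Report) -> None:
        """Accept only finished reports so moderators never see partial ones."""
        if report.state is not State.COMPLETE:
            raise ValueError("only completed reports can be queued")
        self.pending.append(report)
```

Keeping the flow logic separate from the chat client in this way makes it possible to unit-test the user and moderator paths before wiring them to a bot.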

Full spec: Assignments/Discord Bot/Discord Bot.md

2. Assignment 2 - Bluesky Moderation — Proactive Moderation

Source: Cornell Tech CS 5342

Build a Bluesky labeler — a service that attaches categorical labels to posts and accounts. Users who subscribe to your labeler can configure how labels affect what they see.

  • M1: Labeler setup (AT Protocol, Bluesky account, starter code)
  • M2: Label posts matching T&S-related words and domains (text matching)
  • M3: Label posts linking to specific news sources (domain matching)
  • M4: Label dog photos using perceptual hashing (image matching)
  • M5 (policy proposal, Week 10): Extend your labeler to handle a harm of your choice; document your process, testing, and ethical analysis in a 10-minute video
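
The three matching techniques named in M2–M4 can be sketched in plain Python, as below. This is a minimal illustration under stated assumptions, not the starter code: the word list, the domain map, and the label names are hypothetical, no AT Protocol calls are made, and the hash shown is a simple average hash over a grayscale grid that is assumed to be already resized (a real pipeline would first decode, grayscale, and downscale the image with an image library).

```python
import re
from typing import Dict, List, Set
from urllib.parse import urlparse

# Hypothetical inputs; the assignment supplies its own word and domain lists.
FLAGGED_WORDS: Set[str] = {"phishing", "scam"}
NEWS_DOMAINS: Dict[str, str] = {"example.com": "example-news"}

URL_RE = re.compile(r"https?://\S+")


def word_labels(text: str) -> Set[str]:
    """Text matching: label a post if any flagged word appears as a whole token."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return {"t-and-s"} if tokens & FLAGGED_WORDS else set()


def domain_labels(text: str) -> Set[str]:
    """Domain matching: label a post by the hostnames of the links it contains."""
    labels: Set[str] = set()
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in NEWS_DOMAINS:
            labels.add(NEWS_DOMAINS[host])
    return labels


def average_hash(pixels: List[List[int]]) -> int:
    """Image matching: one bit per pixel, set when the pixel is >= the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")
```

Two images would then be treated as a match when the Hamming distance between their hashes falls below a threshold chosen during testing; exact-equality comparison would miss re-encoded or lightly edited copies.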

Full spec: Assignments/Bluesky Moderation/Bluesky Moderation.md
Starter code: Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-assign3/

3. Assignment 3 - Podcast Factchecking — Applied T&S Research

Source: Irmetova, Liu, Teleki, Carragher, Zhang, & Caverlee (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts.

Collect and analyze podcast data through the lens of a fact-checking or trust-and-safety application. The reference implementation (PodChecker) provides a claim-extraction and credibility-analysis pipeline; students may extend, replicate, or critically analyze it using a different dataset or harm type.

Full spec: Assignments/Podcast Factchecking/Podcast Factchecking.md
Code: Assignments/Podcast Factchecking/PodChecker/


Curriculum

Texts

  • Working textbook: tsbook (Stamos, Grossman, Pfefferkorn) — available chapters: Introduction, Spam/Fraud, Harassment, Child Sexual Exploitation
  • TSPA resources: Handbook and Library
  • Consortium: reading list

Lectures

The course lectures are modeled on the Teaching Trust & Safety Consortium.
See Curriculum for the full week-by-week schedule.

Wk | Topic | Slides | Assignment Milestone
---|-------|--------|---------------------
1 | Foundations of T&S | 01 Introduction · 02 Large Scale |
2 | Content Policy + Binary Classification | Pitfalls of Binary Classification · 04 Content Moderation | Discord Bot M1 assigned
3 | Content Moderation at Scale + Community Governance | 04 Content Moderation (cont.) |
4 | Reactive Moderation + Discord Lab + Metrics | 03 Metrics | Discord Bot M1 due; M2 assigned
5 | Government Regulation + Legal Frameworks | 07 Government Regulation |
6 | Proactive Moderation + Bluesky Intro | Bluesky lecture notebook | Discord Bot M2 due; Bluesky M1–M4 assigned
7 | Harassment and Hate Speech | 06 Harassment |
8 | Terrorism, Radicalization, Extremism | 05 Terrorism |
9 | Platform Manipulation + Identity | 06 Authentication | Bluesky M1–M4 due; M5 assigned
10 | Misinformation I: Definitions + Detection Tutorial | 08 Misinformation · 09 Detection Tutorial | Bluesky M5 due; Podcast assigned
11 | Misinformation II: Source Credibility + Interventions | 10 Source Credibility · 11 Interventions |
12 | Attack Surfaces + Adversarial Adaptation | 12 Adversarial Adaptation · 12 Attack Surfaces |
13 | Emerging Technologies I: AI in T&S | 12 Attack Surfaces (cont.) · 13 Emerging Topics I | Podcast data + analysis due
14 | Emerging Technologies II–III: Adversarial Retrieval · LLM Hallucinations | 14 Emerging Topics II · 15 Emerging Topics III |
15 | Research Methods + Presentations | | Podcast report due

A Note on Sensitive Content

This course covers material that many students will find disturbing or personally resonant, including hate speech (Week 7) and terrorism (Week 8). Materials on child exploitation and suicide/self-harm are included as optional further reading and are not covered directly in class. Following the Consortium's guidance:

  • Topics are announced at least one week in advance
  • Students may skip any class covering sensitive content and associated readings without penalty
  • Assignments and homework cannot be skipped; however, students may choose the area of research for these
  • Written summaries of key policy points are provided as alternatives for skipped sessions
  • If you find course content affecting your well-being, please reach out — resources include the course teaching