<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Trust & Safety Class]]></title><description><![CDATA[Obsidian digital garden]]></description><link>http://github.com/dylang/node-rss</link><image><url>site-lib/media/favicon.png</url><title>Trust &amp; Safety Class</title><link></link></image><generator>Webpage HTML Export plugin for Obsidian</generator><lastBuildDate>Fri, 01 May 2026 16:36:40 GMT</lastBuildDate><atom:link href="site-lib/rss.xml" rel="self" type="application/rss+xml"/><pubDate>Fri, 01 May 2026 16:36:35 GMT</pubDate><ttl>60</ttl><dc:creator></dc:creator><item><title><![CDATA[Curriculum]]></title><description><![CDATA[The three course assignments are scaffolded to follow this arc: the Discord Bot covers reactive moderation (Weeks 2–6), the Bluesky Labeler covers proactive moderation (Weeks 6–10), and the Podcast Factchecking project covers applied T&amp;S research (Weeks 10–15).
Lecture 1: Introduction to Trust &amp; Safety
Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/01. Introduction_to_Trust_and_Safety.pdf" data-href="Lessons/Slides/01. Introduction_to_Trust_and_Safety.pdf" href="lessons/slides/01.-introduction_to_trust_and_safety.html" class="internal-link" target="_self" rel="noopener nofollow">Introduction to Trust and Safety</a>
Content: purpose and history of T&amp;S; high-level taxonomy of abuse types (CSAM, hate speech, spam, terrorism, self-harm, fraud, etc.); reactive vs. proactive models; building T&amp;S teams; overview of automated technologies
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1pjsrJxWQkv722YeqAU6wNajekc3um4S3/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1pjsrJxWQkv722YeqAU6wNajekc3um4S3/view?usp=sharing" target="_self">20-min lecture on intelligence in T&amp;S by Inbal Goldberger (ActiveFence)</a>
<br>Short case video: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1OWhVDHPRhhmjse5e5r_pFRGjJrSvXyYO/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1OWhVDHPRhhmjse5e5r_pFRGjJrSvXyYO/view?usp=sharing" target="_self">Case video</a> / <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1xKraaHQ7st4OqKwAGuQ7JdL_f2dpvRVL/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1xKraaHQ7st4OqKwAGuQ7JdL_f2dpvRVL/view?usp=sharing" target="_self">Transcript</a>
Lecture 2: Large-Scale T&amp;S Systems in Practice
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/02. Large Scale Trust &amp; Safety Systems.pdf" data-href="Lessons/Slides/02. Large Scale Trust &amp; Safety Systems.pdf" href="lessons/slides/02.-large-scale-trust-&amp;-safety-systems.html" class="internal-link" target="_self" rel="noopener nofollow">Large Scale Trust &amp; Safety Systems</a>
Content: how industry T&amp;S infrastructure is organized at scale; the organizational modules and overarching problems; provides the map for the rest of the semester
Note: this lecture situates everything covered in Lectures 3–30; use it as a semester roadmap that students can refer back to throughout the course.
<br>Reading: 📘 tsbook <a data-tooltip-position="top" aria-label="https://tsbook.org/ch1-introduction/" rel="noopener nofollow" class="external-link is-unresolved" href="https://tsbook.org/ch1-introduction/" target="_self">ch1</a> — "Fighting the Forever War" (Stamos, Grossman, Pfefferkorn)<br>Active Learning: Play <a data-tooltip-position="top" aria-label="https://moderatormayhem.engine.is/" rel="noopener nofollow" class="external-link is-unresolved" href="https://moderatormayhem.engine.is/" target="_self">Moderator Mayhem</a> individually before class; debrief in-class on the ambiguity revealed by the game.<br>
Note on scope: Use <a data-tooltip-position="top" aria-label="https://dotmocracy.org/" rel="noopener nofollow" class="external-link is-unresolved" href="https://dotmocracy.org/" target="_self">Dotmocracy</a> in this first week to have students vote on which abuse types the class will cover in depth. This shapes the harm-specific weeks (Weeks 7–9, 14) and ensures the class addresses what students find most pressing.
Lecture 3: Pitfalls of Binary Classification
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/InteractiveSlides/03. Pitfalls of Binary Classification" data-href="Lessons/InteractiveSlides/03. Pitfalls of Binary Classification" href=".html" class="internal-link" target="_self" rel="noopener nofollow">Pitfalls of Binary Classification</a>
Content: enforcing platform policies against concrete examples; borderline cases; contrasting policies for the same harm; why binary framing breaks down at the margins
Active learning strategies built in: "Set It Up" — classify 3 examples against a reference policy; identify the obvious violation and the obvious non-violation
Think-pair-share — discuss borderline cases in pairs, then share with class (use clickers for voting)
Contrasting cases — given two policies for the same harm, diagnose where each breaks down (false positives, false negatives, ambiguity, multiple thresholds needed) <br>Reference policies: <a data-tooltip-position="top" aria-label="old/Trust &amp; Safety Class Old/Lessons/InteractiveSlides/03. Pitfalls of Binary Classification" data-href="old/Trust &amp; Safety Class Old/Lessons/InteractiveSlides/03. Pitfalls of Binary Classification" href=".html" class="internal-link" target="_self" rel="noopener nofollow">Pitfalls of Binary Classification</a> links to Political Ad Policies case
Lecture 4: Content Moderation I — History, Models, and the Anti-Censorship Ethos
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/04. Content_Moderation.pdf" data-href="Lessons/Slides/04. Content_Moderation.pdf" href="lessons/slides/04.-content_moderation.html" class="internal-link" target="_self" rel="noopener nofollow">Content Moderation</a>
Content: early internet norms; anti-censorship ethos and how it shaped moderation; models for content moderation; framing effects of regulatory language; scale considerations
ASSIGNED: Discord Bot Milestone 1 — Abuse Study and Content Policy (individual; due end of Week 4)<br>Reading: <a data-tooltip-position="top" aria-label="https://www.tspa.org/curriculum/ts-fundamentals/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/curriculum/ts-fundamentals/" target="_self">TSPA T&amp;S Fundamentals handbook</a>; TSPA library selections at instructor discretion
Lecture 5: Content Moderation II — Commercial Moderators, Community Norms, and Scale
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/04. Content_Moderation.pdf" data-href="Lessons/Slides/04. Content_Moderation.pdf" href="lessons/slides/04.-content_moderation.html" class="internal-link" target="_self" rel="noopener nofollow">Content Moderation</a> (continuation)
Content: what commercial content moderation looks like at platforms like Twitch; moderator actions (remove, warn, ban, shadow block); relationship between moderators and platform admins; the psychological burden on moderators
<br>Exercise: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1Gx9lDmBz1k1WR8pqTBWelISW0h2niH-C3DOv0Fqkk5M/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1Gx9lDmBz1k1WR8pqTBWelISW0h2niH-C3DOv0Fqkk5M/edit?usp=sharing" target="_self">Content Moderation exercises doc</a>
<br>Short case video: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1slH4LVusEilqkmtkAt6UcbPhR3MNcd6H/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1slH4LVusEilqkmtkAt6UcbPhR3MNcd6H/view?usp=sharing" target="_self">Case video</a> / <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1mibJca7DOLJWu1c8zHFjEnt08iDDjIHg/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1mibJca7DOLJWu1c8zHFjEnt08iDDjIHg/view?usp=sharing" target="_self">Lesson plan</a>
Lecture 6: Community Moderation and Self-Governance
<br>Source: 🔗 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1qjB7jacwTUlqoJVuagyfOmjvsqHa5JCu/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1qjB7jacwTUlqoJVuagyfOmjvsqHa5JCu/view?usp=sharing" target="_self">Seering (KAIST) community moderation slides</a>
Content: community-based moderation models (Reddit, Discord, Wikipedia); how rules emerge; when community governance succeeds or fails
<br>Supplement: Seering <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1wjmme3Bw0r5zV-JliV9iYglz-oHUUHqM/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1wjmme3Bw0r5zV-JliV9iYglz-oHUUHqM/view?usp=sharing" target="_self">KAIST syllabus</a> for framing
<br>Reading: TSPA Handbook, <a data-tooltip-position="top" aria-label="https://www.tspa.org/curriculum/ts-fundamentals/content-moderation-and-operations/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/curriculum/ts-fundamentals/content-moderation-and-operations/" target="_self">Content Moderation and Operations</a>
Lecture 7: Technical Lab — Discord Bot Setup Workshop
Source: Assignments/Discord Bot/discord_bot_assignment/ starter code
Content: what makes a good user reporting flow; tradeoffs between specificity and usability; the behind-the-scenes moderator review flow; multi-tier review; outcomes (remove, warn, shadow-block, escalate)
Lab: Python async programming; discord.py API; bot initialization; the mod channel pattern; forwarding messages; emoji reactions for moderator workflows
Lab: students fork the repo and get their bots running in their group channels; TAs available
Lecture 8: Metrics and Measurement I — What to Measure and Why
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/03. Metrics_and_Measurement.pdf" data-href="Lessons/Slides/03. Metrics_and_Measurement.pdf" href="lessons/slides/03.-metrics_and_measurement.html" class="internal-link" target="_self" rel="noopener nofollow">Metrics and Measurement</a>
<br>Also available: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1vEJb6CuR8Tyixk7828Yq_URLRxXYoFhaDRDcm8OA87g/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1vEJb6CuR8Tyixk7828Yq_URLRxXYoFhaDRDcm8OA87g/edit?usp=sharing" target="_self">Google Slides (fall 2023)</a>
Content: what is a metric; generic platform metrics (DAU, retention) vs. T&amp;S-specific metrics; defining "success" in T&amp;S; prevalence estimates; common industry metrics; developing new metrics
<br>Exercise: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/193l_Y3Nct15nRhg3O7yn8FvlD32fJhqUO_CPcqg_cjQ/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/193l_Y3Nct15nRhg3O7yn8FvlD32fJhqUO_CPcqg_cjQ/edit?usp=sharing" target="_self">Metrics exercises doc</a> — students design metrics for their chosen Discord Bot abuse type
DUE: Discord Bot M1 — Abuse Study and Content Policy
ASSIGNED: Discord Bot M2 — Content Moderation Bot (group; due end of Week 6)
Forming groups: after M1 is submitted, form interdisciplinary groups of 4–5. Ideally mix students with technical and policy backgrounds if the course is cross-listed.
<br>Reading: Metrics exercises doc; <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module C
Lecture 9: Governments and the Internet I — U.S. Law and Section 230
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/07. Government_Regulation.pdf" data-href="Lessons/Slides/07. Government_Regulation.pdf" href="lessons/slides/07.-government_regulation.html" class="internal-link" target="_self" rel="noopener nofollow">Government Regulation</a>
Content: why the U.S. became the center of the internet economy; overall U.S. approach to internet policy; Section 230 ("the 26 words that created the internet") — text, scope, limits, and contested interpretations
In-class activity: read Section 230 aloud in full; discuss its implications for T&amp;S enforcement
Lecture 10: Governments and the Internet II — International Regulation and Copyright
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/07. Government_Regulation.pdf" data-href="Lessons/Slides/07. Government_Regulation.pdf" href="lessons/slides/07.-government_regulation.html" class="internal-link" target="_self" rel="noopener nofollow">Government Regulation</a> (continuation)
<br>Supplement: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1sh8YGO7fZlvSxPEftmbgjEF8XvJGnuqivb_eSYbi1MA/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1sh8YGO7fZlvSxPEftmbgjEF8XvJGnuqivb_eSYbi1MA/edit?usp=sharing" target="_self">Copyright Safe Harbors and DMCA slides by Justin Francese (U of Oregon)</a>
Content: EU Digital Services Act; Australia's eSafety Commissioner model; notice-and-takedown; copyright and safe harbor; how different regulatory frameworks shape platform behavior
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1KUqjd1mf2G-YeYpB0P1H9ToGMbXOPoPc/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1KUqjd1mf2G-YeYpB0P1H9ToGMbXOPoPc/view?usp=sharing" target="_self">16-min lecture on Australia's eSafety Commissioner</a>
<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module B; The Twenty-Six Words That Created the Internet by Jeff Kosseff
Lecture 11: Proactive Moderation — Classifiers, Hash-Matching, and Labeling Systems
Source: 📓 Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-lecture.ipynb
Content: text classification pipelines; perceptual hashing (pHash, PhotoDNA); the labeler architecture on Bluesky (PDS → Relay → AppView); differences between proactive automated detection and reactive user reporting
In-class coding demo: run the Bluesky starter code; attach a test label; view it in the Bluesky UI
Lecture 12: Decentralized Moderation and the AT Protocol
Content: federated/decentralized platform architectures; how AT Protocol enables user-configurable filtering; trade-offs between centralized and decentralized moderation; third-party labelers as a governance model
<br>Reference: <a data-tooltip-position="top" aria-label="https://docs.bsky.app/blog/blueskys-moderation-architecture" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.bsky.app/blog/blueskys-moderation-architecture" target="_self">Bluesky moderation architecture docs</a>; <a data-tooltip-position="top" aria-label="https://www.bluesky-labelers.io/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.bluesky-labelers.io/" target="_self">list of labelers</a>
DUE: Discord Bot M2 — Content Moderation Bot. Extra credit: Discord Bot M3 Smart Classifier
ASSIGNED: Bluesky Moderation M1–M4 (setup, T&amp;S words, news citation, perceptual hashing; due end of Week 9)<br>Reading: Weapons of Math Destruction by Cathy O'Neil
Lecture 13: Harassment and Hate Speech I — Definitions, Scale, and Policy
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/06. Harassment_and_Hate_Speech.pdf" data-href="Lessons/Slides/06. Harassment_and_Hate_Speech.pdf" href="lessons/slides/06.-harassment_and_hate_speech.html" class="internal-link" target="_self" rel="noopener nofollow">Harassment and Hate Speech</a>
Content: spectrum of harassment; hate speech definitions and jurisdictional variation; which identities are most targeted; the role of anonymity and pseudonymity; platform policy evolution
Lecture 14: Harassment and Hate Speech II — Automated Detection and Exercises
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/06. Harassment_and_Hate_Speech.pdf" data-href="Lessons/Slides/06. Harassment_and_Hate_Speech.pdf" href="lessons/slides/06.-harassment_and_hate_speech.html" class="internal-link" target="_self" rel="noopener nofollow">Harassment and Hate Speech</a> (continuation)
<br>Supplement: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1Q8UBYDpXC4ixpPEBw9gWH_OpQhAWDbMS/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1Q8UBYDpXC4ixpPEBw9gWH_OpQhAWDbMS/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" target="_self">McLester hate speech and harassment slides (UAB)</a>
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1vrEE1kjftvUKtwbE5oj8Q_7R6pjMWnCK/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1vrEE1kjftvUKtwbE5oj8Q_7R6pjMWnCK/view?usp=sharing" target="_self">25-min lecture on online hate speech by Mark Schneider (UPenn)</a>
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1AMbZfkGPzDPHfKV1n7IGUCplLaWk2GJHxTkNC4N44Mw/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1AMbZfkGPzDPHfKV1n7IGUCplLaWk2GJHxTkNC4N44Mw/edit?usp=sharing" target="_self">Harassment and Hate Speech exercises doc</a>
In-class structured debate: safety (public harm) vs. censorship (freedom of speech); how do different platforms' decisions reflect this tradeoff?
<br>Reading: 📘 tsbook <a data-tooltip-position="top" aria-label="https://tsbook.org/ch6-harassment/" rel="noopener nofollow" class="external-link is-unresolved" href="https://tsbook.org/ch6-harassment/" target="_self">ch6</a> — Harassment (full chapter)
Lecture 15: Terrorism, Radicalization, and Extremism
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/05. Terrorism_Radicalization_and_Extremism.pdf" data-href="Lessons/Slides/05. Terrorism_Radicalization_and_Extremism.pdf" href="lessons/slides/05.-terrorism_radicalization_and_extremism.html" class="internal-link" target="_self" rel="noopener nofollow">Terrorism Radicalization and Extremism</a>
<br>Also available: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1kx4ncmCAG_GZkKaISq2R3A6FtXzG66Jy/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1kx4ncmCAG_GZkKaISq2R3A6FtXzG66Jy/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" target="_self">Google Slides (fall 2023)</a>
Content: definitions; radicalization models; online recruitment pipelines; counter-terrorism vs. counter-violent extremism (CVE); the live-streaming of attacks; role of platform algorithms in amplification
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1vQHw8O4nfeJTvR0X7mypjmzaiyVPMMhk/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1vQHw8O4nfeJTvR0X7mypjmzaiyVPMMhk/view?usp=sharing" target="_self">30-min lecture on terrorism, radicalization, and extremism by Marten Risius (U of Queensland)</a>
Lecture 16: CVE, Counter-Speech, and Policy Responses
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/05. Terrorism_Radicalization_and_Extremism.pdf" data-href="Lessons/Slides/05. Terrorism_Radicalization_and_Extremism.pdf" href="lessons/slides/05.-terrorism_radicalization_and_extremism.html" class="internal-link" target="_self" rel="noopener nofollow">Terrorism Radicalization and Extremism</a> (exercises + discussion)
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1MANATPIXk-8Rrquqdec8eijnGvJPL1r0Ob1FYt1YHZE/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1MANATPIXk-8Rrquqdec8eijnGvJPL1r0Ob1FYt1YHZE/edit?usp=sharing" target="_self">Terrorism, Radicalization, and Extremism exercises doc</a>
Case studies: the GIFCT hash-sharing database; deplatforming decisions after the January 6 Capitol riot
<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module G on Terrorism.
Lecture 17: Authentication, Identity, and Platform Manipulation
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/06. Authentication, Identity, and Platform Manipulation.pdf" data-href="Lessons/Slides/06. Authentication, Identity, and Platform Manipulation.pdf" href="lessons/slides/06.-authentication,-identity,-and-platform-manipulation.html" class="internal-link" target="_self" rel="noopener nofollow">Authentication, Identity, and Platform Manipulation</a>
Content: authentication models; real-name vs. pseudonymous policies; coordinated inauthentic behavior (CIB); sockpuppet networks; astroturfing; state-sponsored information operations
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1Wa9G4fGWTT6oAaql3egv5-yY3jCJx75S5na_kbzOvaM/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1Wa9G4fGWTT6oAaql3egv5-yY3jCJx75S5na_kbzOvaM/edit?usp=sharing" target="_self">Auth/Identity exercises doc</a>
<br>Supplement: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1ZLeB4ApU1ZX0t_I2dCtTxiAe23IRJueg/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1ZLeB4ApU1ZX0t_I2dCtTxiAe23IRJueg/edit?usp=sharing&amp;ouid=108096568393170327223&amp;rtpof=true&amp;sd=true" target="_self">McLester investigations and intelligence lecture</a>
Lecture 18: Spam, Fraud, and Account Integrity
Content: spam taxonomy; online fraud (scams, phishing, impersonation); account takeovers; the arms race between spammers and platforms; ML-based abuse detection at account level
Reading: 📘 tsbook ch2 — Spam and Online Fraud (Nelly Agbogu case study is an excellent discussion anchor)
DUE: Bluesky M1–M4 (T&amp;S words labeler, news citation labeler, perceptual hash dog labeler)
ASSIGNED: Bluesky M5 — Policy Proposal Labeler (due end of Week 10)<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module K on Authenticity.
Lecture 19: Misinformation — Definitions, Spread, and Platform Responses
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/08. Misinformation (information environment).pdf" data-href="Lessons/Slides/08. Misinformation (information environment).pdf" href="lessons/slides/08.-misinformation-(information-environment).html" class="internal-link" target="_self" rel="noopener nofollow">Misinformation</a>
Content: the mis/dis/mal taxonomy (Wardle &amp; Derakhshan information disorder framework); how false information spreads through social networks; the role of algorithmic amplification; platform interventions (labels, friction, removal, amplification reduction); the fact-checking ecosystem
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1aWOFZ92RAyAyfELRsHnRu0Z0h4cLcLlH/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1aWOFZ92RAyAyfELRsHnRu0Z0h4cLcLlH/view?usp=sharing" target="_self">45-min lecture on misinformation by Sarah Shirazyan (Stanford Law)</a>
Lecture 20: Misinformation Detection Tutorial
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/09. Misinformation Detection Tutorial (IC2S2_25).pdf" data-href="Lessons/Slides/09. Misinformation Detection Tutorial (IC2S2_25).pdf" href="lessons/slides/09.-misinformation-detection-tutorial-(ic2s2_25).html" class="internal-link" target="_self" rel="noopener nofollow">Misinformation Detection Tutorial (IC2S2 2025)</a>
Content: hands-on NLP tutorial — claim-level check-worthiness detection using the CT24 dataset; full pipeline from data loading through feature engineering, classifier training (logistic regression, BERT-based), evaluation (precision, recall, F1), and error analysis; AI-generated misinformation detection techniques; Podcast Factchecking preview using the PodChecker system (Irmetova et al., 2026)
This is a lab-style session; students should have Python and the tutorial dependencies installed before class.
<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module F on the Information Environment.
DUE: Bluesky M5
ASSIGNED: Podcast Factchecking (data + analysis due Week 13; final report due Week 15)
Coordinating M5 with earlier work: students should implement the same abuse type they researched for Discord Bot M1, making the policy proposal in M5 a direct technical extension of the written analysis from Week 4. Consider also requiring students to apply a counter-intervention framing from Lecture 23 (Adversarial Adaptation) to their M5 policy design.
Lecture 21: Source Credibility and Misinformation Source Detection
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/10. Source Credibility.pdf" data-href="Lessons/Slides/10. Source Credibility.pdf" href="lessons/slides/10.-source-credibility.html" class="internal-link" target="_self" rel="noopener nofollow">Source Credibility</a>
Content: what makes a source credible; SEO-based misinformation source detection using CommonCrawl webgraphs; backlinking patterns as credibility signals; multi-class classification of news domains; feature importances for predicting credibility vs. political reliability; limitations (implied content, domain decay, propaganda vs. opinion)
Lecture 22: Intervention Effectiveness — Misinformation and Search Rankings
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/11. Intervention Effectiveness Case Study - Misinformation and Search Rankings.pdf" data-href="Lessons/Slides/11. Intervention Effectiveness Case Study - Misinformation and Search Rankings.pdf" href="lessons/slides/11.-intervention-effectiveness-case-study-misinformation-and-search-rankings.html" class="internal-link" target="_self" rel="noopener nofollow">Intervention Effectiveness — Misinformation and Search Rankings</a>
Content: small-scale PageRank-based interventions; personalized PageRank and authority-based reranking; large-scale link scheme removal; "multi-category" scheme removal as a more precise intervention tool; traffic estimates from CommonCrawl vs. SimilarWeb; design principles for robust interventions; open problems and future directions
<br>Reading: a sample of papers from the <a data-tooltip-position="top" aria-label="Misinformation Syllabus (Advanced Topic) > Readings" data-href="Misinformation Syllabus (Advanced Topic)#Readings" href="advanced-topics/misinformation/misinformation-syllabus-(advanced-topic).html#Readings" class="internal-link" target="_self" rel="noopener nofollow">special topic reading list</a> on Misinformation, with at least one paper per category.
Lecture 23: Adversarial Adaptation and the Limitations of Interventions
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/12. Adversarial Adaptation and the Limitations of Interventions.pdf" data-href="Lessons/Slides/12. Adversarial Adaptation and the Limitations of Interventions.pdf" href="lessons/slides/12.-adversarial-adaptation-and-the-limitations-of-interventions.html" class="internal-link" target="_self" rel="noopener nofollow">Adversarial Adaptation and the Limitations of Interventions</a>
Content: how adversaries adapt to interventions over time (SEO gaming, platform manipulation, bot evolution); the credibility–pluralism tradeoff — credibility-based filtering reduces source diversity; assortativity in news transition matrices; Wasserstein distances to quantify polarization effects; principles for adversarially robust policy design
This lecture connects directly back to the proactive moderation problem introduced in Lecture 11: reactive interventions always lag behind adversaries, which motivates automated proactive systems.
Lecture 24: Types of Attack Surfaces I — Safety Perspective
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/12. Types of Attack Surfaces.pdf" data-href="Lessons/Slides/12. Types of Attack Surfaces.pdf" href="lessons/slides/12.-types-of-attack-surfaces.html" class="internal-link" target="_self" rel="noopener nofollow">Types of Attack Surfaces</a>
Content: attack surface taxonomy; how bad actors exploit platform features; API abuse; content injection; account compromise vectors
<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module L on Attack Surfaces
Lecture 25: Types of Attack Surfaces II — Security Perspective
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/12. Types of Attack Surfaces.pdf" data-href="Lessons/Slides/12. Types of Attack Surfaces.pdf" href="lessons/slides/12.-types-of-attack-surfaces.html" class="internal-link" target="_self" rel="noopener nofollow">Types of Attack Surfaces</a> (continuation)
Content: platform defenses; CAPTCHAs and bot detection; rate limiting; shadow-banning; detection pipelines
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1EWQEVh4poRkINWgge4zsJmHp0J8Bni23ovc2j7ktyW4/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1EWQEVh4poRkINWgge4zsJmHp0J8Bni23ovc2j7ktyW4/edit?usp=sharing" target="_self">Attack Surfaces exercises doc</a>
Lecture 26: Emerging Topics I — AI in Trust and Safety
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/13. Emerging_Topics 1 - AI in Trust and Safety.pdf" data-href="Lessons/Slides/13. Emerging_Topics 1 - AI in Trust and Safety.pdf" href="lessons/slides/13.-emerging_topics-1-ai-in-trust-and-safety.html" class="internal-link" target="_self" rel="noopener nofollow">Emerging Topics — AI in Trust and Safety</a>
<br>Also available: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1L7nG9QpmxPsyapbaSktHmK4eOrfXHCvpxgmURetR0Cs/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1L7nG9QpmxPsyapbaSktHmK4eOrfXHCvpxgmURetR0Cs/edit?usp=sharing" target="_self">Google Slides (January 2024)</a>
Content: AI and ML in T&amp;S (generative AI, detection, red-teaming); AR/VR harm areas; emerging platforms and harm surfaces; T&amp;S career pathways; what skills employers look for
Interactive demo: AI-generated content identification exercise (slides include an interactive breakout)
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1sUhWuNvlOtxQPIHIGNEKWblvOWXTGKw1FAkl0vXhCEI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1sUhWuNvlOtxQPIHIGNEKWblvOWXTGKw1FAkl0vXhCEI/edit?usp=sharing" target="_self">Emerging Technologies exercises doc</a>
DUE: Podcast Factchecking data + analysis<br>
ASSIGNED: Course paper writeup. Reference: <a data-tooltip-position="top" aria-label="https://trustandsafetyjournal@stanford.edu" rel="noopener nofollow" class="external-link is-unresolved" href="https://trustandsafetyjournal@stanford.edu" target="_self">Trust &amp; Safety Journal</a> for graduate-level research directions, and ICWSM / IC2S2 for computational social science.<br>Reading: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">T&amp;S reading list</a> Module M on Emerging Technologies
Lecture 27: Emerging Topics II — Adversarial Retrieval and LLMs
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/14. Emerging_Topics 2 - Adversarial Retrieval.pdf" data-href="Lessons/Slides/14. Emerging_Topics 2 - Adversarial Retrieval.pdf" href="lessons/slides/14.-emerging_topics-2-adversarial-retrieval.html" class="internal-link" target="_self" rel="noopener nofollow">Emerging Topics — Adversarial Retrieval</a>
Content: how adversaries manipulate retrieval-augmented generation (RAG) systems and search indexes; corpus poisoning and gradient-based attacks on IR systems; SEO manipulation as an information operation; connecting the misinformation interventions from Weeks 9–10 to the LLM attack surface; adversarial means, motives, and opportunities in the AI era
Lecture 28: Emerging Topics III — LLM Hallucinations and Knowledge Conflicts
<br>Source: <a data-tooltip-position="top" aria-label="Lessons/Slides/15. Emerging_Topics 3 - LLM Hallucinations and Knowledge Conflicts.pdf" data-href="Lessons/Slides/15. Emerging_Topics 3 - LLM Hallucinations and Knowledge Conflicts.pdf" href="lessons/slides/15.-emerging_topics-3-llm-hallucinations-and-knowledge-conflicts.html" class="internal-link" target="_self" rel="noopener nofollow">Emerging Topics — LLM Hallucinations and Knowledge Conflicts</a>
Content: faithfulness vs. factuality hallucinations; knowledge conflicts between parametric memory, retrieved context, and ground truth; RLHF safety tuning and jailbreaking; detection and mitigation approaches; implications for T&amp;S practitioners deploying LLM-based moderation or fact-checking systems
<br>Reading: Sample of papers from the <a data-tooltip-position="top" aria-label="Adversarial Retrieval and LLMs Syllabus (Advanced Topic) > Readings" data-href="Adversarial Retrieval and LLMs Syllabus (Advanced Topic)#Readings" href="advanced-topics/adversarial-retrieval-and-llms/adversarial-retrieval-and-llms-syllabus-(advanced-topic).html#Readings" class="internal-link" target="_self" rel="noopener nofollow">special topic reading list</a> on Adversarial Retrieval and LLMs, at least 1 paper per category.
Lecture 29: Project Presentations I
Format: groups present Discord Bot M3 results (or Podcast Factchecking for individually structured courses); guest judges from industry where possible (see Consortium member list in the README for potential invitees)
~8 minutes per group + Q&amp;A; rubric focuses on policy motivation, technical implementation, testing and evaluation, and ethical reflection
Lecture 30: Project Presentations II + Course Debrief
Format: remaining presentations + open debrief
Discussion: what has changed in T&amp;S since the semester began? What did the class get wrong? What questions remain?
<br>Optional: play <a data-tooltip-position="top" aria-label="https://trustandsafety.fun/" rel="noopener nofollow" class="external-link is-unresolved" href="https://trustandsafety.fun/" target="_self">Trust &amp; Safety Tycoon</a> as a closing reflection on organizational complexity
DUE: Reports
⚠️ Content Warning: These topics cover deeply sensitive material. They are for further reading only and are not examinable or covered directly in class. They are included here for completeness. If this content causes any distress, please reach out for support. Links to the student centers for mental health and wellbeing are provided.
Child and Adult Sexual Exploitation
<br>Source: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1WCbLjWgXPX-Lrfyw86C1oGQmvPMI-3iwpWLvPAiAdws/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1WCbLjWgXPX-Lrfyw86C1oGQmvPMI-3iwpWLvPAiAdws/edit?usp=sharing" target="_self">Google Slides</a>
Content: CSAM definitions and legal landscape; PhotoDNA and hash-matching at scale; NCMEC partnerships and reporting obligations; grooming detection; sextortion; proxy content for classroom exercises
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1n1-7Tbr126lLLlKBLhR-I8RfQlixz7ejL602k0c2g9s/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1n1-7Tbr126lLLlKBLhR-I8RfQlixz7ejL602k0c2g9s/edit?usp=sharing" target="_self">CASE exercises doc</a>
<br>Short case video: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1vsGgr-NkUZUB9FVi7yT0iHmg5a9E_arshz8DHCLGf-Y/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1vsGgr-NkUZUB9FVi7yT0iHmg5a9E_arshz8DHCLGf-Y/edit?usp=sharing" target="_self">CASE case video</a>
<br>Reading: 📘 tsbook <a data-tooltip-position="top" aria-label="https://tsbook.org/ch7-cse/" rel="noopener nofollow" class="external-link is-unresolved" href="https://tsbook.org/ch7-cse/" target="_self">ch7</a> — Child Sexual Exploitation
Suicide, Self-Harm, and Platform Well-Being
<br>Source: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/1-t9-6NuN6CfHHEFCyZcWbjDwEKiXthe46xVO3Jca2Z4/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/1-t9-6NuN6CfHHEFCyZcWbjDwEKiXthe46xVO3Jca2Z4/edit?usp=sharing" target="_self">Google Slides (fall 2023)</a>
Content: safe messaging guidelines; the role of algorithmic amplification in self-harm content; contagion effects; platform design for well-being; tension between supporting at-risk users and removing harmful content; mental health resources for moderators
<br>Supplement: 🎥 <a data-tooltip-position="top" aria-label="https://drive.google.com/file/d/1C3d5EJIezj5xFooI1x3S53iWuAXqbg1q/view?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://drive.google.com/file/d/1C3d5EJIezj5xFooI1x3S53iWuAXqbg1q/view?usp=sharing" target="_self">45-min lecture by Katherine Keyes (Columbia University)</a>
<br>Exercises: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1AIBPHTRaEHMpB7yVtK7Njh1W4wlEz0EQ3oKGDI1uEKU/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1AIBPHTRaEHMpB7yVtK7Njh1W4wlEz0EQ3oKGDI1uEKU/edit?usp=sharing" target="_self">Suicide, Self-Harm, and Well-Being exercises doc</a>
<br>Primary textbook: 📘 <a data-tooltip-position="top" aria-label="https://tsbook.org/" rel="noopener nofollow" class="external-link is-unresolved" href="https://tsbook.org/" target="_self">tsbook</a> — The book is a living draft; chapters on misinformation, extremism, and emerging tech may be added.
<br>Consortium reading list: 🔗 <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">Full reading list (Google Docs)</a> — organized by module, aligns directly with lecture sequence above.
<br>TSPA curriculum: <a data-tooltip-position="top" aria-label="https://www.tspa.org/curriculum/ts-fundamentals/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/curriculum/ts-fundamentals/" target="_self">T&amp;S Fundamentals</a> and <a data-tooltip-position="top" aria-label="https://www.tspa.org/explore/trust-safety-library/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/explore/trust-safety-library/" target="_self">Library</a>
<br>The lectures draw on the <a data-tooltip-position="top" aria-label="https://github.com/PeterCarragher/TeachingTrustSafety" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/PeterCarragher/TeachingTrustSafety" target="_self">Trust &amp; Safety Teaching Consortium</a> materials.
Likewise, assignments credit the earlier materials they draw on.]]></description><link>curriculum.html</link><guid isPermaLink="false">Curriculum.md</guid><pubDate>Fri, 01 May 2026 16:25:22 GMT</pubDate></item><item><title><![CDATA[Course Overview]]></title><description><![CDATA[
Ambiguities in ethics and platform policy → Try <a data-tooltip-position="top" aria-label="Assessing T&amp;S Policies" data-href="Assessing T&amp;S Policies" href="quizzes/assessing-t&amp;s-policies.html" class="internal-link" target="_self" rel="noopener nofollow">Moderator Mayhem</a>
<br>Complexity of organizational problems and tradeoffs associated with trust &amp; safety → Try <a data-tooltip-position="top" aria-label="https://trustandsafety.fun/" rel="noopener nofollow" class="external-link is-unresolved" href="https://trustandsafety.fun/" target="_self">Trust &amp; Safety Tycoon</a>
Requires coding in Python; intro CS background assumed
For research components, familiarity with at least one of:
HCI / user studies
ML methods
Data science and statistics
If your interest lies in a specific subtopic of Trust &amp; Safety, check out one of the special topics related to this class:
<br><a data-tooltip-position="top" aria-label="Misinformation Syllabus (Advanced Topic)" data-href="Misinformation Syllabus (Advanced Topic)" href="advanced-topics/misinformation/misinformation-syllabus-(advanced-topic).html" class="internal-link" target="_self" rel="noopener nofollow">Misinformation</a>
<br><a data-tooltip-position="top" aria-label="Social Network Analysis Syllabus (Advanced Topic)" data-href="Social Network Analysis Syllabus (Advanced Topic)" href="advanced-topics/social-network-analysis/social-network-analysis-syllabus-(advanced-topic).html" class="internal-link" target="_self" rel="noopener nofollow">Social Network Analysis</a>
<br><a data-tooltip-position="top" aria-label="Adversarial Retrieval and LLMs Syllabus (Advanced Topic)" data-href="Adversarial Retrieval and LLMs Syllabus (Advanced Topic)" href="advanced-topics/adversarial-retrieval-and-llms/adversarial-retrieval-and-llms-syllabus-(advanced-topic).html" class="internal-link" target="_self" rel="noopener nofollow">Adversarial Retrieval and LLMs</a>
Overall T&amp;S:
An understanding of the most pressing challenges for online global communication platforms
Foundational knowledge of current research in Online Trust and Safety
A draft policy proposal and a working implementation of that proposal
Content Moderation:
Understand the breadth of models for content moderation (reactive, proactive, community-governed, automated)
Conceptualize different approaches to moderating a space and reflect on how these models could evolve
Critically assess how automated content moderation mechanisms handle borderline and ambiguous cases
Algorithmic Tradeoffs:
Identify critical issues and ethical dilemmas in algorithmic systems, especially in T&amp;S
Analyze technical, social, and policy-based responses to online harms (misinformation, extremist content, harassment, and others)
Develop in-depth knowledge of at least one selected harm type across the policy, technical, and organizational dimensions
Organizational Dynamics:
Understand platform T&amp;S operations through the following categories: Account vs. content moderation
Methods of Access vs. Harm
Organizational vs. technical complexity (Content-Neutral Outcomes)
The three assignments are designed to build on one another: reactive moderation → proactive moderation → applied research.
Source: Trust and Safety Engineering (Stanford CS152) + Cornell Tech CS 5342
A three-milestone project in which you act as the T&amp;S team at a social media platform:
M1 (individual, Week 4): Abuse Research Report (2000–4000 words) covering one abuse type: description, actor/victim profiles, details, relevant technologies, and specific recommendations. Plus a Policy Comparison Table for three platforms.
M2 (group, Week 6): Design and implement a user reporting flow and behind-the-scenes moderator flow as a Discord bot in Python.
M3 (group, Week 6, extra credit): Extend the bot with automated detection — a classifier trained or prompted on your chosen abuse type.
Full spec: Assignments/Discord Bot/Discord Bot.md
Source: Cornell Tech CS 5342
Build a Bluesky labeler — a service that attaches categorical labels to posts and accounts. Users who subscribe to your labeler can configure how labels affect what they see.
M1: Labeler setup (AT Protocol, Bluesky account, starter code)
M2: Label posts matching T&amp;S-related words and domains (text matching)
M3: Label posts linking to specific news sources (domain matching)
M4: Label dog photos using perceptual hashing (image matching)
M5 (policy proposal, Week 10): Extend your labeler to handle a harm of your choice; document your process, testing, and ethical analysis in a 10-minute video
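The text-matching and domain-matching milestones (M2 and M3) can be sketched in a few lines of standard-library Python. This is a hedged illustration only: it does not use the assignment's starter code or the AT Protocol SDK, and the watchlists, label names, and function names below are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical watchlists; the real assignment supplies its own inputs.
FLAGGED_WORDS = {"scam", "phishing"}
FLAGGED_DOMAINS = {"example.com"}

URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(text):
    """Return the lowercased hostnames of all http(s) URLs found in text."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        host = (urlparse(url).hostname or "").lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def labels_for_post(text):
    """Return the (hypothetical) labels that apply to a post's text."""
    labels = []
    # Text matching: compare whole words, not substrings.
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & FLAGGED_WORDS:
        labels.append("flagged-word")
    # Domain matching: exact match or any subdomain of a flagged domain.
    domains = extract_domains(text)
    if any(d == f or d.endswith("." + f)
           for d in domains for f in FLAGGED_DOMAINS):
        labels.append("flagged-domain")
    return labels
```

Matching on whole words and on the dot-separated domain suffix avoids false positives such as `notexample.com`, which a plain substring test would flag.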
Full spec: Assignments/Bluesky Moderation/Bluesky Moderation.md
Starter code: Assignments/Bluesky Moderation/bluesky_labeler_assignment/bluesky-assign3/
Source: Irmetova, Liu, Teleki, Carragher, Zhang, &amp; Caverlee (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts.
Collect and analyze podcast data through the lens of a fact-checking or trust-and-safety application. The reference implementation (PodChecker) provides a claim-extraction and credibility-analysis pipeline; students may extend, replicate, or critically analyze it using a different dataset or harm type.
Full spec: Assignments/Podcast Factchecking/Podcast Factchecking.md
Code: Assignments/Podcast Factchecking/PodChecker/
<br>Working textbook: <a data-tooltip-position="top" aria-label="https://tsbook.org/" rel="noopener nofollow" class="external-link is-unresolved" href="https://tsbook.org/" target="_self">tsbook</a> (Stamos, Grossman, Pfefferkorn) — available chapters: Introduction, Spam/Fraud, Harassment, Child Sexual Exploitation
<br>TSPA resources: <a data-tooltip-position="top" aria-label="https://www.tspa.org/curriculum/ts-fundamentals/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/curriculum/ts-fundamentals/" target="_self">Handbook</a> and <a data-tooltip-position="top" aria-label="https://www.tspa.org/explore/trust-safety-library/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/explore/trust-safety-library/" target="_self">Library</a>
<br>Consortium: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/179j30Z7TxB4b8r_6wPAcAXrJXctbTOqXRtSb-75UvsI/edit?usp=sharing" target="_self">reading list</a>
<br>The course lectures are modeled on the <a data-tooltip-position="top" aria-label="https://github.com/PeterCarragher/TeachingTrustSafety" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/PeterCarragher/TeachingTrustSafety" target="_self">Trust &amp; Safety Teaching Consortium</a> materials.<br>
See <a data-href="Curriculum" href="curriculum.html" class="internal-link" target="_self" rel="noopener nofollow">Curriculum</a> for the full week-by-week schedule.
This course covers material that many students will find disturbing or personally resonant, including hate speech (Week 7) and terrorism (Week 8). Materials on child exploitation and suicide/self-harm are included as optional further reading and are not covered directly in class. Following the Consortium's guidance:
Topics are announced at least one week in advance
Students may skip any class covering sensitive content and associated readings without penalty
Assignments and homework cannot be skipped; however, students may choose the area of research for these
Written summaries of key policy points are provided as alternatives for skipped sessions
If you find course content affecting your well-being, please reach out — resources include the course teaching
]]></description><link>course-overview.html</link><guid isPermaLink="false">Course Overview.md</guid><pubDate>Fri, 01 May 2026 15:21:57 GMT</pubDate></item><item><title><![CDATA[Misinformation Syllabus (Advanced Topic)]]></title><description><![CDATA[This course examines online misinformation and disinformation from interdisciplinary perspectives — drawing on communication studies, political science, cognitive psychology, and computational methods. Lectures move from definitional and theoretical foundations through empirical analysis of spread and vulnerability, to computational detection techniques and platform-level interventions, and finally to the emerging challenge of AI-generated misinformation.
For a broader overview of the Trust and Safety space, see the <a data-tooltip-position="top" aria-label="Course Overview" data-href="Course Overview" href="course-overview.html" class="internal-link" target="_self" rel="noopener nofollow">Trust &amp; Safety class</a>.
Lecture 1: Defining Misinformation (Consortium Information Environment)
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/01. Define Misinfo (Consortium Information Environment).pdf" data-href="Advanced Topics/Misinformation/Slides/01. Define Misinfo (Consortium Information Environment).pdf" href="advanced-topics/misinformation/slides/01.-define-misinfo-(consortium-information-environment).html" class="internal-link" target="_self" rel="noopener nofollow">Define Misinfo (Consortium Information Environment)</a>
Establishes the foundational vocabulary of the course: the distinctions among misinformation, disinformation, and malinformation; the Wardle &amp; Derakhshan information disorder framework; typologies of false and misleading content; and the information environment as the broader context in which misinformation operates. Students leave with a shared conceptual language for the rest of the course.
Lecture 2: Content Moderation Overview (Consortium)
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/02. Content Moderation Overview (Consortium).pdf" data-href="Advanced Topics/Misinformation/Slides/02. Content Moderation Overview (Consortium).pdf" href="advanced-topics/misinformation/slides/02.-content-moderation-overview-(consortium).html" class="internal-link" target="_self" rel="noopener nofollow">Content Moderation Overview (Consortium)</a>
Introduces how platforms respond to misinformation through reactive and proactive moderation. Covers the spectrum of moderation models (removal, labeling, demotion, counter-speech), the role of human reviewers vs. automated systems, and the inherent tradeoffs between free expression and harm reduction. Provides operational context before the course turns to technical detection.
Lecture 3: Detection and Discovery of Misinformation Sources
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/04. Detection and Discovery of Misinformation Sources.pdf" data-href="Advanced Topics/Misinformation/Slides/04. Detection and Discovery of Misinformation Sources.pdf" href="advanced-topics/misinformation/slides/04.-detection-and-discovery-of-misinformation-sources.html" class="internal-link" target="_self" rel="noopener nofollow">Detection and Discovery of Misinformation Sources</a>
Technical lecture covering how to identify and classify misinformation-producing websites using SEO network features, backlinking patterns, and multi-class classification. Key topics: construction of the SEO network from CommonCrawl data, predictive power of network features over credibility labels, limitations of current approaches (implied content, propaganda vs. opinion, link decay). Establishes the computational approach that underpins Lectures 4 and 5.
Lecture 4: Misinformation Resilient Search Rankings
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/05. Misinformation Resilient Search Rankings.pdf" data-href="Advanced Topics/Misinformation/Slides/05. Misinformation Resilient Search Rankings.pdf" href="advanced-topics/misinformation/slides/05.-misinformation-resilient-search-rankings.html" class="internal-link" target="_self" rel="noopener nofollow">Misinformation Resilient Search Rankings</a>
Builds on the source detection methods from Lecture 3 to ask: how do we intervene at the search level? Covers small-scale interventions (PageRank, Personalized PageRank, authority-based reranking), large-scale interventions targeting link schemes, and the design principles that make interventions robust. Discusses evidence that link schemes disproportionately link to unreliable news and that "multi-category" scheme removal has higher marginal effectiveness.
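As a rough illustration of the Personalized PageRank idea mentioned above, the sketch below runs power iteration on a toy link graph with all teleportation mass placed on a seed set of trusted sites. It is an assumption-laden sketch, not the lecture's method: the function name, the toy graph, and the choice to return dangling-node mass to the seeds are all hypothetical.

```python
def personalized_pagerank(links, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    seeds: set of trusted pages that receive all teleportation mass.
    """
    nodes = sorted(set(links)
                   | {t for targets in links.values() for t in targets}
                   | set(seeds))
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)  # start the walk on the seed set
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its mass back to the seed set.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank
```

In this toy setting, pages that are unreachable from the trusted seeds end up with zero score, which is the intuition behind using seed-based reranking to demote link schemes.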
Lecture 5: Credibility Pluralism Tradeoff
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/06. Credibility Pluralism Tradeoff.pdf" data-href="Advanced Topics/Misinformation/Slides/06. Credibility Pluralism Tradeoff.pdf" href="advanced-topics/misinformation/slides/06.-credibility-pluralism-tradeoff.html" class="internal-link" target="_self" rel="noopener nofollow">Credibility Pluralism Tradeoff</a>
Complicates the intervention story. Using CommonCrawl and GDELT data, this lecture demonstrates that credibility-based filtering and viewpoint diversity (pluralism) are in tension: interventions that reduce low-credibility content tend to reduce the diversity of sources users encounter. Introduces assortativity analysis and Wasserstein distances as tools for measuring polarization in news transition matrices.
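For intuition about the Wasserstein distance named above, the one-dimensional case over ordered bins reduces to the area between two cumulative distributions. The sketch below assumes both histograms share the same equally spaced bins (for example, hypothetical positions along a left-to-right lean axis); the lecture's own formulation over news transition matrices is not reproduced here.

```python
def wasserstein_1d(p, q, bin_width=1.0):
    """1-D Wasserstein (earth mover's) distance between two histograms.

    p, q: non-negative weights over the same ordered, equally spaced bins.
    Both are normalized internally, so raw counts are accepted.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    cdf_gap = 0.0   # running difference between the two CDFs
    distance = 0.0
    for a, b in zip(p, q):
        cdf_gap += a / total_p - b / total_q
        distance += abs(cdf_gap) * bin_width
    return distance
```

A distance of zero means the two source distributions coincide; larger values mean more mass has to move further along the axis to turn one distribution into the other.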
Lecture 6: Media Influences — Structural Dimensions of Credibility, Bias, and Ownership
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/07. Media Influences.pdf" data-href="Advanced Topics/Misinformation/Slides/07. Media Influences.pdf" href="advanced-topics/misinformation/slides/07.-media-influences.html" class="internal-link" target="_self" rel="noopener nofollow">Media Influences</a>
Zooms out to the structural and economic dimensions of the media ecosystem: how source credibility, political bias, and corporate ownership interact to shape information quality. Covers multi-agent dynamic scenarios and behavioral determinants of media use. Serves as a research-synthesis and ongoing-work session, best positioned after students have seen the computational interventions of Lectures 3–5.
Tutorial Session: Fact-Checking NLP (IC2S2 2025 Tutorial)
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Misinformation/Slides/03. Misinformation Detection Tutorial (IC2S2_25).pdf" data-href="Advanced Topics/Misinformation/Slides/03. Misinformation Detection Tutorial (IC2S2_25).pdf" href="advanced-topics/misinformation/slides/03.-misinformation-detection-tutorial-(ic2s2_25).html" class="internal-link" target="_self" rel="noopener nofollow">Misinformation Detection Tutorial (IC2S2_25)</a>
Hands-on notebook using the CT24 check-worthiness dataset. Covers the full pipeline: data loading and exploration, feature engineering, training a claim-level check-worthiness classifier, evaluation (precision, recall, F1), and error analysis. Recommended placement: after Lecture 3 (Detection and Discovery), once students have the conceptual vocabulary for detection. Can optionally be extended with the LLM-based detection approaches from the reading list.
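The evaluation step can be sketched without any ML library. The helper below is a minimal illustration, not code from the tutorial notebook; it assumes binary labels where 1 marks a check-worthy claim, which is a hypothetical encoding rather than the CT24 file format.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Error analysis then starts from the false positives and false negatives counted here, for instance by listing the claims behind each and looking for shared patterns.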
<br><a data-tooltip-position="top" aria-label="https://bpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/5/2293/files/2025/04/misinformation-syllabus-2025.pdf" rel="noopener nofollow" class="external-link is-unresolved" href="https://bpb-us-e1.wpmucdn.com/sites.dartmouth.edu/dist/5/2293/files/2025/04/misinformation-syllabus-2025.pdf" target="_self">Dartmouth: Political Misinformation and Conspiracy Theories (Brendan Nyhan)</a><br><a data-tooltip-position="top" aria-label="https://andreasjungherr.net/wp-content/uploads/2025/04/Jungherr-Misinformation-disinformation-and-other-digital-fakery-Summer-2025.pdf" rel="noopener nofollow" class="external-link is-unresolved" href="https://andreasjungherr.net/wp-content/uploads/2025/04/Jungherr-Misinformation-disinformation-and-other-digital-fakery-Summer-2025.pdf" target="_self">Uni Bamberg: Misinformation, Disinformation and Other Digital Fakery (Andreas Jungherr)</a><br><a data-tooltip-position="top" aria-label="https://ils.unc.edu/courses/2022_fall/inls690_290/690Syllabus_Fall2022_signed.pdf" rel="noopener nofollow" class="external-link is-unresolved" href="https://ils.unc.edu/courses/2022_fall/inls690_290/690Syllabus_Fall2022_signed.pdf" target="_self">UNC: Misinformation and Society (Francesca Tripodi)</a><br><a data-tooltip-position="top" aria-label="https://citap.unc.edu/publications/critical-disinformation-studies/" rel="noopener nofollow" class="external-link is-unresolved" href="https://citap.unc.edu/publications/critical-disinformation-studies/" target="_self">UNC: Critical Disinformation Studies: A Syllabus</a><br>Zotero (from King et al., 2025): <a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.zotero.org/groups/5535941/interventions-literature-review/library" target="_self">https://www.zotero.org/groups/5535941/interventions-literature-review/library</a><br>King, Catherine, Peter Carragher, and Kathleen M. Carley. 
"Mapping the Scientific Literature on Misinformation Interventions: A Bibliometric Review." Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media. Vol. 2025. 2025.<br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://workshop-proceedings.icwsm.org/pdf/2025_10.pdf" target="_self">https://workshop-proceedings.icwsm.org/pdf/2025_10.pdf</a><br>Aïmeur, Esma, Sabrine Amri, and Gilles Brassard. "Fake news, disinformation and misinformation in social media: a review." Social Network Analysis and Mining 13.1 (2023): 30. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1007/s13278-023-01028-5" target="_self">https://doi.org/10.1007/s13278-023-01028-5</a><br>Altay, S., Berriche, M., Heuer, H., Farkas, J., &amp; Rathje, S. (2023). A survey of expert views on misinformation: Definitions, determinants, solutions, and future of the field. Harvard Kennedy School Misinformation Review. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.37016/mr-2020-119" target="_self">https://doi.org/10.37016/mr-2020-119</a><br>Broda, E., &amp; Strömbäck, J. (2024). Misinformation, Disinformation, and Fake News: Lessons from an Interdisciplinary, Systematic Literature Review. Annals of the International Communication Association, 48(2), 139–166. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1080/23808985.2024.2323736" target="_self">https://doi.org/10.1080/23808985.2024.2323736</a><br>Ecker, U. K. H., Tay, L. Q., Roozenbeek, J., van der Linden, S., Cook, J., Oreskes, N., &amp; Lewandowsky, S. (2024). Why misinformation must not be ignored. American Psychologist. Advance online publication. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1037/amp0001448" target="_self">https://doi.org/10.1037/amp0001448</a><br>Kapantai, E., Christopoulou, A., Berberidis, C., &amp; Peristeras, V. (2020). A systematic literature review on disinformation: Toward a unified taxonomical framework. New Media &amp; Society, 23(5), 1301-1326. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/1461444820959296" target="_self">https://doi.org/10.1177/1461444820959296</a><br>Murphy, G., de Saint Laurent, C., Reynolds, M., Aftab, O., Hegarty, K., Sun, Y., &amp; Greene, C. M. (2023). What do we study when we study misinformation? A scoping review of experimental research (2016-2022). Harvard Kennedy School (HKS) Misinformation Review. <a data-tooltip-position="top" aria-label="http://doi.org/10.37016/mr-2020-130" rel="noopener nofollow" class="external-link is-unresolved" href="http://doi.org/10.37016/mr-2020-130" target="_self">https://doi.org/10.37016/mr-2020-130</a><br>Pérez-Escolar, M., Lilleker, D., &amp; Tapia-Frade, A. (2023). A systematic literature review of the phenomenon of disinformation and misinformation. Media and Communication, 11(2), 76-87. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.17645/mac.v11i2.6453" target="_self">https://doi.org/10.17645/mac.v11i2.6453</a><br>Saeidnia, H. R., Hosseini, E., Lund, B., Tehrani, M. A., Zaker, S., &amp; Molaei, S. (2025). Artificial intelligence in the battle against disinformation and misinformation: A systematic review of challenges and approaches. Knowledge and Information Systems, 67(4), 3139–3158. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1007/s10115-024-02337-7" target="_self">https://doi.org/10.1007/s10115-024-02337-7</a><br>Tandoc Jr. EC. The facts of fake news: A research review. Sociology Compass. 2019; 13:e12724. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/soc4.12724" target="_self">https://doi.org/10.1111/soc4.12724</a><br>Chadwick, A., &amp; Stanyer, J. (2022). Deception as a Bridging Concept in the Study of Disinformation, Misinformation, and Misperceptions: Toward a Holistic Framework. Communication Theory, 32(1), 1–24. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1093/ct/qtab019" target="_self">https://doi.org/10.1093/ct/qtab019</a><br>Freelon, D., &amp; Wells, C. (2020). Disinformation as Political Communication. Political Communication, 37(2), 145–156. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1080/10584609.2020.1723755" target="_self">https://doi.org/10.1080/10584609.2020.1723755</a><br>Molina, M. D., Sundar, S. S., Le, T., &amp; Lee, D. (2019). “Fake News” Is Not Simply False Information: A Concept Explication and Taxonomy of Online Content. American Behavioral Scientist, 65(2), 180-212. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/0002764219878224" target="_self">https://doi.org/10.1177/0002764219878224</a> (Original work published 2021)<br>Starbird, K. (2024). Facts, frames, and (mis) interpretations: understanding rumors as collective sensemaking. 
Link: <a data-tooltip-position="top" aria-label="https://www.cip.uw.edu/2023/12/06/rumors-collective-sensemaking-kate-starbird/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.cip.uw.edu/2023/12/06/rumors-collective-sensemaking-kate-starbird/" target="_self">Facts, frames, and (mis)interpretations: Understanding rumors as collective sensemaking</a><br>Tandoc, E. C., Lim, Z. W., &amp; Ling, R. (2017). Defining “Fake News”: A typology of scholarly definitions. Digital Journalism, 6(2), 137–153. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1080/21670811.2017.1360143" target="_self">https://doi.org/10.1080/21670811.2017.1360143</a><br>Wardle, C., &amp; Derakhshan, H. (2017). Information disorder: Toward an interdisciplinary framework for research and policymaking (Vol. 27, pp. 1-107). Strasbourg: Council of Europe.<br>Wu, L., Morstatter, F., Carley, K. M., &amp; Liu, H. (2019). Misinformation in social media: definition, manipulation, and detection. ACM SIGKDD Explorations Newsletter, 21(2), 80-90. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3373464.3373475" target="_self">https://doi.org/10.1145/3373464.3373475</a><br>Adams, Z., Osman, M., Bechlivanidis, C., &amp; Meder, B. (2023). (Why) Is Misinformation a Problem? Perspectives on Psychological Science, 18(6), 1436-1463. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/17456916221141344" target="_self">https://doi.org/10.1177/17456916221141344</a>&nbsp; (Original work published 2023)<br>Ecker, U., Roozenbeek, J., Van Der Linden, S., Tay, L. Q., Cook, J., Oreskes, N., &amp; Lewandowsky, S. (2024). Misinformation poses a bigger threat to democracy than you might think. Nature, 630(8015), 29-32. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.nature.com/articles/d41586-024-01587-3" target="_self">https://www.nature.com/articles/d41586-024-01587-3</a><br>McKay, S., &amp; Tenove, C. (2020). Disinformation as a Threat to Deliberative Democracy. Political Research Quarterly, 74(3), 703-717. <a data-tooltip-position="top" aria-label="https://doi.org/10.1177/1065912920938143" rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/1065912920938143" target="_self"></a><a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/1065912920938143" target="_self">https://doi.org/10.1177/1065912920938143</a> (Original work published 2021)<br>Woolley, S. C., &amp; Howard, P. N. (2016). Automation, Algorithms, and Politics | Political Communication, Computational Propaganda, and Autonomous Agents—Introduction. International Journal of Communication, 10(0),<br>Altay, S., Berriche, M., &amp; Acerbi, A. (2023). Misinformation on Misinformation: Conceptual and Methodological Challenges. Social Media + Society, 9(1), 20563051221150412. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/20563051221150412" target="_self">https://doi.org/10.1177/20563051221150412</a><br>Budak, C., Nyhan, B., Rothschild, D. M., Thorson, E., &amp; Watts, D. J. (2024). Misunderstanding the harms of online misinformation. Nature, 630(8015), 45–53. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41586-024-07417-w" target="_self">https://doi.org/10.1038/s41586-024-07417-w</a><br>Harsin, J. (2024). Three Critiques of Disinformation (For-Hire) Scholarship: Definitional Vortexes, Disciplinary Unneighborliness, and Cryptonormativity. Social Media + Society, 10(1). 
<a data-tooltip-position="top" aria-label="https://doi.org/10.1177/20563051231224732" rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/20563051231224732" target="_self"></a><a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/20563051231224732" target="_self">https://doi.org/10.1177/20563051231224732</a><br>Nyhan, B. (2020). Facts and Myths about Misperceptions. Journal of Economic Perspectives, 34(3), 220–236. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1257/jep.34.3.220" target="_self">https://doi.org/10.1257/jep.34.3.220</a><br>Pasquetto, I. V., Lim, G., &amp; Bradshaw, S. (2024). Misinformed about misinformation: On the polarizing discourse on misinformation and its consequences for the field. Harvard Kennedy School (HKS) Misinformation Review, 5(5).<br>Simon, F. M., Altay, S., &amp; Mercier, H. (2023). Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown. Harvard Kennedy School (HKS) Misinformation Review, 4(5).<br>Allen, J., Howland, B., Mobius, M., Rothschild, D., &amp; Watts, D. J. (2020). Evaluating the fake news problem at the scale of the information ecosystem. Science Advances, 6(14), eaay3539. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1126/sciadv.aay3539" target="_self">https://doi.org/10.1126/sciadv.aay3539</a><br>Baribi-Bartov, S., Swire-Thompson, B., &amp; Grinberg, N. (2024). Supersharers of fake news on Twitter. Science, 384(6699), 979–982. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1126/science.adl4435" target="_self">https://doi.org/10.1126/science.adl4435</a><br>Chadwick, A., Vaccari, C., &amp; Kaiser, J. (2022). The Amplification of Exaggerated and False News on Social Media: The Roles of Platform Use, Motivations, Affect, and Ideology. American Behavioral Scientist, 69(2), 113-130. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/00027642221118264" target="_self">https://doi.org/10.1177/00027642221118264</a><br>Goel, P., Green, J., Lazer, D. et al. Using co-sharing to identify use of mainstream news for promoting potentially misleading narratives. Nat Hum Behav (2025). <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-025-02223-4" target="_self">https://doi.org/10.1038/s41562-025-02223-4</a><br>Ozawa, J. V., Woolley, S., &amp; Lukito, J. (2024). Taking the power back: How diaspora community organizations are fighting misinformation spread on encrypted messaging apps. Harvard Kennedy School (HKS) Misinformation Review.<br>Pathak, R., Spezzano, F., &amp; Pera, M. S. (2023). Understanding the contribution of recommendation algorithms on misinformation recommendation and misinformation dissemination on social networks. ACM Transactions on the Web, 17(4), 1-26.<br>Renault, T., Mosleh, M., &amp; Rand, D. G. (2025). Republicans are flagged more often than Democrats for sharing misinformation on X’s Community Notes. Proceedings of the National Academy of Sciences, 122(25), e2502053122. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1073/pnas.2502053122" target="_self">https://doi.org/10.1073/pnas.2502053122</a><br>Tomassi, A., Falegnami, A., &amp; Romano, E. (2024). Mapping automatic social media information disorder. The role of bots and AI in spreading misleading information in society. PLOS ONE, 19(5), e0303183.<br>Vosoughi, S., Roy, D., &amp; Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–1151. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1126/science.aap9559" target="_self">https://doi.org/10.1126/science.aap9559</a><br>Anspach, N. M., &amp; Carlson, T. N. (2024). Not who you think? Exposure and vulnerability to misinformation. 
New Media &amp; Society, 26(8), 4847–4866. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/14614448221130422" target="_self">https://doi.org/10.1177/14614448221130422</a><br>Altay, S., &amp; Acerbi, A. (2024). People believe misinformation is a threat because they assume others are gullible. New Media &amp; Society, 26(11), 6440–6461. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/14614448231153379" target="_self">https://doi.org/10.1177/14614448231153379</a><br>Aslett, K., Sanderson, Z., Godel, W., Persily, N., Nagler, J., &amp; Tucker, J. A. (2024). Online searches to evaluate misinformation can increase its perceived veracity. Nature, 625(7995), 548–556. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41586-023-06883-y" target="_self">https://doi.org/10.1038/s41586-023-06883-y</a><br>Ceylan, G., Anderson, I. A., &amp; Wood, W. (2023). Sharing of misinformation is habitual, not just lazy or biased. Proceedings of the National Academy of Sciences, 120(4), e2216614120. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1073/pnas.2216614120" target="_self">https://doi.org/10.1073/pnas.2216614120</a><br>Ecker, U. K. H., Lewandowsky, S., Cook, J., Schmid, P., Fazio, L. K., Brashier, N., Kendeou, P., Vraga, E. K., &amp; Amazeen, M. A. (2022). The psychological drivers of misinformation belief and its resistance to correction. Nature Reviews Psychology, 1(1), 13–29. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s44159-021-00006-y" target="_self">https://doi.org/10.1038/s44159-021-00006-y</a><br>Ecker, U. K. H., Lewandowsky, S., Fenton, O., &amp; Martin, K. (2014). Do people keep believing because they want to? Preexisting attitudes and the continued influence of misinformation. Memory &amp; Cognition, 42(2), 292–304. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.3758/s13421-013-0358-x" target="_self">https://doi.org/10.3758/s13421-013-0358-x</a><br>Flynn, D. J., Nyhan, B., &amp; Reifler, J. (2017). The Nature and Origins of Misperceptions: Understanding False and Unsupported Beliefs About Politics. Political Psychology, 38(S1), 127–150. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/pops.12394" target="_self">https://doi.org/10.1111/pops.12394</a><br>Kunst, J. R., Gundersen, A. B., Krysińska, I., Piasecki, J., Wójtowicz, T., Rygula, R., van der Linden, S., &amp; Morzy, M. (2024). Leveraging artificial intelligence to identify the psychological factors associated with conspiracy theory beliefs online. Nature Communications, 15(1), 7497. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41467-024-51740-9" target="_self">https://doi.org/10.1038/s41467-024-51740-9</a><br>Lazer, D. M. J., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., Metzger, M. J., Nyhan, B., Pennycook, G., Rothschild, D., Schudson, M., Sloman, S. A., Sunstein, C. R., Thorson, E. A., Watts, D. J., &amp; Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094–1096. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1126/science.aao2998" target="_self">https://doi.org/10.1126/science.aao2998</a><br>Pantazi, M., Hale, S., &amp; Klein, O. (2021). Social and Cognitive Aspects of the Vulnerability to Political Misinformation. Political Psychology, 42(S1), 267–304. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/pops.12797" target="_self">https://doi.org/10.1111/pops.12797</a><br>Pennycook, G., &amp; Rand, D. G. (2019). Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition, 188, 39–50. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1016/j.cognition.2018.06.011" target="_self">https://doi.org/10.1016/j.cognition.2018.06.011</a><br>Sultan, M., Tump, A. N., Ehmann, N., Lorenz-Spreen, P., Hertwig, R., Gollwitzer, A., &amp; Kurvers, R. H. J. M. (2024). Susceptibility to online misinformation: A systematic meta-analysis of demographic and psychological factors. Proceedings of the National Academy of Sciences, 121(47), e2409329121. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1073/pnas.2409329121" target="_self">https://doi.org/10.1073/pnas.2409329121</a><br>Van Bavel, J. J., Harris, E. A., Pärnamets, P., Rathje, S., Doell, K. C., &amp; Tucker, J. A. (2021). Political Psychology in the Digital (mis)Information age: A Model of News Belief and Sharing. Social Issues and Policy Review, 15(1), 84–113. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/sipr.12077" target="_self">https://doi.org/10.1111/sipr.12077</a><br>Weeks, B. E. (2015). Emotions, Partisanship, and Misperceptions: How Anger and Anxiety Moderate the Effect of Partisan Bias on Susceptibility to Political Misinformation. Journal of Communication, 65(4), 699–719. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/jcom.12164" target="_self">https://doi.org/10.1111/jcom.12164</a><br>Arechar, A. A., Allen, J., Berinsky, A. J., Cole, R., Epstein, Z., Garimella, K., Gully, A., Lu, J. G., Ross, R. M., Stagnaro, M. N., Zhang, Y., Pennycook, G., &amp; Rand, D. G. (2023). Understanding and combatting misinformation across 16 countries on six continents. Nature Human Behaviour, 7(9), 1502–1513. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-023-01641-6" target="_self">https://doi.org/10.1038/s41562-023-01641-6</a><br>Aruguete, N., Batista, F., Calvo, E., Guizzo-Altube, M., Scartascini, C., &amp; Ventura, T. (2024). Framing fact-checks as a “confirmation” increases engagement with corrections of misinformation: A four-country study. Scientific Reports, 14(1), 3201. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41598-024-53337-0" target="_self">https://doi.org/10.1038/s41598-024-53337-0</a><br>Bak-Coleman, J. B., Kennedy, I., Wack, M., Beers, A., Schafer, J. S., Spiro, E. S., ... &amp; West, J. D. (2022). Combining interventions to reduce the spread of viral misinformation. Nature Human Behaviour, 6(10), 1372-1380.<br>Ecker, U. K. H., &amp; Ang, L. C. (2019). Political Attitudes and the Processing of Misinformation Corrections. Political Psychology, 40(2), 241–260. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1111/pops.12494" target="_self">https://doi.org/10.1111/pops.12494</a><br>Feuerriegel, S., DiResta, R., Goldstein, J. A., Kumar, S., Lorenz-Spreen, P., Tomz, M., &amp; Pröllochs, N. (2023). Research can help to tackle AI-generated disinformation. Nature Human Behaviour, 7(11), 1818–1821. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-023-01726-2" target="_self">https://doi.org/10.1038/s41562-023-01726-2</a><br>Hoes, E., Aitken, B., Zhang, J., Gackowski, T., &amp; Wojcieszak, M. (2024). Prominent misinformation interventions reduce misperceptions but increase scepticism. Nature Human Behaviour, 8(8), 1545–1553. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-024-01884-x" target="_self">https://doi.org/10.1038/s41562-024-01884-x</a><br>Kozyreva, A., Lorenz-Spreen, P., Herzog, S. M., Ecker, U. K. 
H., Lewandowsky, S., Hertwig, R., Ali, A., Bak-Coleman, J., Barzilai, S., Basol, M., Berinsky, A. J., Betsch, C., Cook, J., Fazio, L. K., Geers, M., Guess, A. M., Huang, H., Larreguy, H., Maertens, R., … Wineburg, S. (2024). Toolbox of individual-level interventions against online misinformation. Nature Human Behaviour, 8(6), 1044–1052. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-024-01881-0" target="_self">https://doi.org/10.1038/s41562-024-01881-0</a><br>Lewandowsky, S., &amp; van der Linden, S. (2021). Countering Misinformation and Fake News Through Inoculation and Prebunking. European Review of Social Psychology, 32(2), 348–384. <a data-tooltip-position="top" aria-label="http://doi.org/10.1080/10463283.2021.1876983" rel="noopener nofollow" class="external-link is-unresolved" href="http://doi.org/10.1080/10463283.2021.1876983" target="_self">doi.org/10.1080/10463283.2021.1876983</a><br>Maertens, R., Roozenbeek, J., Basol, M., &amp; van der Linden, S. (2021). Long-term effectiveness of inoculation against misinformation: Three longitudinal experiments. Journal of Experimental Psychology: Applied, 27(1), 1.<br>Martel, C., &amp; Rand, D. G. (2023). Misinformation warning labels are widely effective: A review of warning effects and their moderating features. Current Opinion in Psychology, 54, 101710. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1016/j.copsyc.2023.101710" target="_self">https://doi.org/10.1016/j.copsyc.2023.101710</a><br>Martel, C., &amp; Rand, D. G. (2024). Fact-checker warning labels are effective even for those who distrust fact-checkers. Nature Human Behaviour, 8(10), 1957–1967. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41562-024-01973-x" target="_self">https://doi.org/10.1038/s41562-024-01973-x</a><br>McCabe, S.D., Ferrari, D., Green, J. et al. 
Post-January 6th deplatforming reduced the reach of misinformation on Twitter. Nature 630, 132–140 (2024). <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41586-024-07524-8" target="_self">https://doi.org/10.1038/s41586-024-07524-8</a><br>Nyhan, B., &amp; Reifler, J. (2010). When Corrections Fail: The Persistence of Political Misperceptions. Political Behavior, 32(2), 303–330. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1007/s11109-010-9112-2" target="_self">https://doi.org/10.1007/s11109-010-9112-2</a><br>Nyhan, B. (2021). Why the backfire effect does not explain the durability of political misperceptions. Proceedings of the National Academy of Sciences, 118(15), e1912440117. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1073/pnas.1912440117" target="_self">https://doi.org/10.1073/pnas.1912440117</a><br>Pennycook, G., &amp; Rand, D. G. (2022). Accuracy prompts are a replicable and generalizable approach for reducing the spread of misinformation. Nature Communications, 13(1), 2333. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41467-022-30073-5" target="_self">https://doi.org/10.1038/s41467-022-30073-5</a><br>van der Linden, S. (2022). Misinformation: Susceptibility, spread, and interventions to immunize the public. Nature Medicine, 28(3), 460–467. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41591-022-01713-6" target="_self">https://doi.org/10.1038/s41591-022-01713-6</a><br>Allen, J., Watts, D. J., &amp; Rand, D. G. (2024). Quantifying the impact of misinformation and vaccine-skeptical content on Facebook. Science, 384(6699), eadk3451. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1126/science.adk3451" target="_self">https://doi.org/10.1126/science.adk3451</a><br>Lenti, J., Mejova, Y., Kalimeri, K., Panisson, A., Paolotti, D., Tizzani, M., &amp; Starnini, M. (2023). Global misinformation spillovers in the vaccination debate before and during the COVID-19 pandemic: multilingual Twitter study. JMIR Infodemiology, 3, e44714.<br>Pielke Jr, R. A. (2004). When scientists politicize science: making sense of controversy over The Skeptical Environmentalist. Environmental Science &amp; Policy, 7(5), 405-417.<br>Vicari, R., &amp; Komendantova, N. (2023). Systematic meta-analysis of research on AI tools to deal with misinformation on social media during natural and anthropogenic hazards and disasters. Humanities and Social Sciences Communications, 10(1), 1-14.<br>West, J. D., &amp; Bergstrom, C. T. (2021). Misinformation in and about science. Proceedings of the National Academy of Sciences, 118(15), e1912444117.<br>Note: This list does not cover the CS literature on fake news detection and related tasks, although there is a large body of work in that space. The selected papers focus on applications of AI in the current “era” of AI.<br>Augenstein, I., Bakker, M., Chakraborty, T., Corney, D., Ferrara, E., Gurevych, I., Hale, S., Hovy, E., Ji, H., Larraz, I., Menczer, F., Nakov, P., Papotti, P., Sahnan, D., Warren, G., &amp; Zagni, G. (2025). Community Moderation and the New Epistemology of Fact Checking on Social Media (No. arXiv:2505.20067). arXiv. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.48550/arXiv.2505.20067" target="_self">https://doi.org/10.48550/arXiv.2505.20067</a><br>Augenstein, I., Baldwin, T., Cha, M., Chakraborty, T., Ciampaglia, G. L., Corney, D., DiResta, R., Ferrara, E., Hale, S., Halevy, A., Hovy, E., Ji, H., Menczer, F., Miguez, R., Nakov, P., Scheufele, D., Sharma, S., &amp; Zagni, G. (2024). 
Factuality challenges in the era of large language models and opportunities for fact-checking. Nature Machine Intelligence, 6(8), 852–863. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s42256-024-00881-z" target="_self">https://doi.org/10.1038/s42256-024-00881-z</a><br>Costello, T. H., Pennycook, G., &amp; Rand, D. G. (2024). Durably reducing conspiracy beliefs through dialogues with AI. Science, 385(6714), eadq1814.<br>Costello, T. H., Pennycook, G., &amp; Rand, D. (2025). Just the facts: How dialogues with AI reduce conspiracy beliefs. OSF Preprint.<br>Luceri, L., Salkar, T. V., Balasubramanian, A., Pinto, G., Sun, C., &amp; Ferrara, E. (2025). Coordinated Inauthentic Behavior on TikTok: Challenges and Opportunities for Detection in a Video-First Ecosystem (No. arXiv:2505.10867). arXiv. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.48550/arXiv.2505.10867" target="_self">https://doi.org/10.48550/arXiv.2505.10867</a><br>Shoaib, M. R., Wang, Z., Ahvanooey, M. T., &amp; Zhao, J. (2023). Deepfakes, Misinformation, and Disinformation in the Era of Frontier AI, Generative AI, and Large AI Models. 2023 International Conference on Computer and Applications (ICCA), 1–7. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1109/ICCA59364.2023.10401723" target="_self">https://doi.org/10.1109/ICCA59364.2023.10401723</a><br>Schmitt, V., Villa-Arenas, L.-F., Feldhus, N., Meyer, J., Spang, R. P., &amp; Möller, S. (2024). The Role of Explainability in Collaborative Human-AI Disinformation Detection. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2157–2174. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3630106.3659031" target="_self">https://doi.org/10.1145/3630106.3659031</a><br>Wang, J., Wang, X., &amp; Yu, A. (2025). 
Tackling misinformation in mobile social networks a BERT-LSTM approach for enhancing digital literacy. Scientific Reports, 15(1), 1118.<br>Xu, D., Fan, S., &amp; Kankanhalli, M. (2023, October). Combating misinformation in the era of generative AI models. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 9291-9298).<br>Yang, K. C., Varol, O., Davis, C. A., Ferrara, E., Flammini, A., &amp; Menczer, F. (2019). Arming the public with artificial intelligence to counter social bots. Human Behavior and Emerging Technologies, 1(1), 48-61.<br>Yi, J., Xu, Z., Huang, T., &amp; Yu, P. (2025). Challenges and Innovations in LLM-Powered Fake News Detection: A Synthesis of Approaches and Future Directions. In Proceedings of the 2025 2nd International Conference on Generative Artificial Intelligence and Information Security (pp. 87–93). Association for Computing Machinery. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3728725.3728739" target="_self">https://doi.org/10.1145/3728725.3728739</a><br>Zhang, Y., Sharma, K., Du, L., &amp; Liu, Y. (2024). Toward Mitigating Misinformation and Social Media Manipulation in LLM Era. Companion Proceedings of the ACM Web Conference 2024, 1302–1305. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3589335.3641256" target="_self">https://doi.org/10.1145/3589335.3641256</a><br>Zhao, Y., Liu, B., Ding, M., Liu, B., Zhu, T., &amp; Yu, X. (2023). Proactive deepfake defence via identity watermarking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4602-4611).<br>Chen, C., &amp; Shu, K. (2024). Combating misinformation in the age of LLMs: Opportunities and challenges. AI Magazine, 45(3), 354-368. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1002/aaai.12188" target="_self">https://doi.org/10.1002/aaai.12188</a><br>-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a data-tooltip-position="top" aria-label="https://llm-misinformation.github.io/" rel="noopener nofollow" class="external-link is-unresolved" href="https://llm-misinformation.github.io/" target="_self">LLMs Meet Misinformation (Canyu Chen and Kai Shu)</a> (Project Website)<br>Chen, C., &amp; Shu, K. (2024). Can LLM-Generated Misinformation Be Detected? In The Twelfth International Conference on Learning Representations.<br>-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a data-tooltip-position="top" aria-label="https://github.com/llm-misinformation/llm-misinformation" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/llm-misinformation/llm-misinformation" target="_self">Can LLM-Generated Misinformation Be Detected (ICLR 2024)</a> (Github Repo)<br>Huang, R., Dugan, L., Yang, Y., &amp; Callison-Burch, C. (2024, November). MiRAGeNews: Multimodal Realistic AI-Generated News Detection. In Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 16436-16448).<br>-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a data-tooltip-position="top" aria-label="https://github.com/nosna/miragenews#miragenews-multimodal-realistic-ai-generated-news-detection" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/nosna/miragenews#miragenews-multimodal-realistic-ai-generated-news-detection" target="_self">MiRAGeNews (Github Repo)</a><br>Lin, L., Gupta, N., Zhang, Y., Ren, H., Liu, C.-H., Ding, F., Wang, X., Li, X., Verdoliva, L., &amp; Hu, S. (2025). Detecting Multimedia Generated by Large AI Models: A Survey (No. arXiv:2402.00045). arXiv. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.48550/arXiv.2402.00045" target="_self">https://doi.org/10.48550/arXiv.2402.00045</a><br>Liu, A., Sheng, Q., &amp; Hu, X. (2024). Preventing and Detecting Misinformation Generated by Large Language Models. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3001–3004. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3626772.3661377" target="_self">https://doi.org/10.1145/3626772.3661377</a><br>Wang, L. Z., Ma, Y., Gao, R., Guo, B., Zhu, H., Fan, W., ... &amp; Ng, K. C. (2024). MegaFake: a theory-driven dataset of fake news generated by large language models. arXiv preprint arXiv:2408.11871.<br>-&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <a data-tooltip-position="top" aria-label="https://github.com/zhe-wang0018/MegaFake" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/zhe-wang0018/MegaFake" target="_self">MegaFake Dataset (Github)</a><br>Zhou, J., Zhang, Y., Luo, Q., Parker, A. G., &amp; De Choudhury, M. (2023, April). Synthetic lies: Understanding AI-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-20).<br>Barman, D., Guo, Z., &amp; Conlan, O. (2024). The dark side of language models: Exploring the potential of LLMs in multimedia disinformation generation and dissemination. Machine Learning with Applications, 100545.<br>Calvo, P., &amp; Saura García, C. (2024). Generative AI and Democracy: the synthetification of public opinion and its impacts. Available at SSRN 4911710.<br>Chu-Ke, C., &amp; Dong, Y. (2024). Misinformation and Literacies in the Era of Generative Artificial Intelligence: A Brief Overview and a Call for Future Research. Emerging Media, 2(1), 70-85. 
<a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1177/27523543241240285" target="_self">https://doi.org/10.1177/27523543241240285</a>&nbsp;<br>De Angelis, L., Baglivo, F., Arzilli, G., Privitera, G. P., Ferragina, P., Tozzi, A. E., &amp; Rizzo, C. (2023). ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Frontiers in Public Health, 11, 1166120.<br>Ferrara, E. (2025). Charting the Landscape of Nefarious Uses of Generative Artificial Intelligence for Online Election Interference (No. arXiv:2406.01862). arXiv. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.48550/arXiv.2406.01862" target="_self">https://doi.org/10.48550/arXiv.2406.01862</a><br>Garry, M., Chan, W. M., Foster, J., &amp; Henkel, L. A. (2024). Large language models (LLMs) and the institutionalization of misinformation. Trends in Cognitive Sciences. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00221-3" target="_self">https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(24)00221-3</a><br>Jaidka, K., Chen, T., Chesterman, S., Hsu, W., Kan, M.-Y., Kankanhalli, M., Lee, M. L., Seres, G., Sim, T., Taeihagh, A., Tung, A., Xiao, X., &amp; Yue, A. (2025). Misinformation, Disinformation, and Generative AI: Implications for Perception and Policy. Digit. Gov.: Res. Pract., 6(1), 11:1-11:15. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3689372" target="_self">https://doi.org/10.1145/3689372</a><br>Schroeder, D. T., Cha, M., Baronchelli, A., Bostrom, N., Christakis, N. A., Garcia, D., Goldenberg, A., Kyrychenko, Y., Leyton-Brown, K., Lutz, N., Marcus, G., Menczer, F., Pennycook, G., Rand, D. G., Schweitzer, F., Summerfield, C., Tang, A., Bavel, J. V., Linden, S. van der, … Kunst, J. R. (2025). 
How Malicious AI Swarms Can Threaten Democracy (No. arXiv:2506.06299). arXiv. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.48550/arXiv.2506.06299" target="_self">https://doi.org/10.48550/arXiv.2506.06299</a><br>Wack, M., Ehrett, C., Linvill, D., &amp; Warren, P. (2025). Generative propaganda: Evidence of AI’s impact from a state-backed disinformation campaign. PNAS Nexus, 4(4), pgaf083. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1093/pnasnexus/pgaf083" target="_self">https://doi.org/10.1093/pnasnexus/pgaf083</a><br>Bashardoust, A., Feuerriegel, S., &amp; Shrestha, Y. R. (2024). Comparing the Willingness to Share for Human-generated vs. AI-generated Fake News. Proc. ACM Hum.-Comput. Interact., 8(CSCW2), 489:1-489:21. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3687028" target="_self">https://doi.org/10.1145/3687028</a><br>Danry, V., Pataranutaporn, P., Groh, M., &amp; Epstein, Z. (2025). Deceptive Explanations by Large Language Models Lead People to Change their Beliefs About Misinformation More Often than Honest Explanations. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–31. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1145/3706598.3713408" target="_self">https://doi.org/10.1145/3706598.3713408</a><br>Groh, M., Sankaranarayanan, A., Singh, N., Kim, D. Y., Lippman, A., &amp; Picard, R. (2024). Human detection of political speech deepfakes across transcripts, audio, and video. Nature Communications, 15(1), 7629. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1038/s41467-024-51998-z" target="_self">https://doi.org/10.1038/s41467-024-51998-z</a><br>Vaccari, C., &amp; Chadwick, A. (2020). Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. 
Social media+ society, 6(1), 2056305120903408.<br>Wittenberg, C., Epstein, Z., Péloquin-Skulski, G., Berinsky, A. J., &amp; Rand, D. G. (2025). Labeling AI-generated media online. PNAS Nexus, 4(6), pgaf170. <a rel="noopener nofollow" class="external-link is-unresolved" href="https://doi.org/10.1093/pnasnexus/pgaf170" target="_self">https://doi.org/10.1093/pnasnexus/pgaf170</a>]]></description><link>advanced-topics/misinformation/misinformation-syllabus-(advanced-topic).html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Misinformation Syllabus (Advanced Topic).md</guid><pubDate>Fri, 01 May 2026 15:12:56 GMT</pubDate></item><item><title><![CDATA[01. sna_animal_networks]]></title><link>advanced-topics/social-network-analysis/slides/01.-sna_animal_networks.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/01. sna_animal_networks.pdf</guid><pubDate>Tue, 28 Apr 2026 16:27:49 GMT</pubDate></item><item><title><![CDATA[Assignment 3 - Podcast Factchecking]]></title><description><![CDATA[Credits: Irmetova, A., Liu, H., Teleki, M., Carragher, P., Zhang, J., &amp; Caverlee, J. (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts. <a data-tooltip-position="top" aria-label="https://github.com/annatastic/PodChecker" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/annatastic/PodChecker" target="_self">GitHub</a>Podcasts are one of the fastest-growing media formats worldwide, yet they receive almost none of the editorial oversight applied to broadcast journalism. Hosts and guests regularly make factual claims — about science, politics, health, history, economics — without correction, rebuttal, or verification. 
This makes podcasts a significant and underexplored surface for trust and safety concerns: misinformation, misleading framing, unverifiable assertions, and coordinated narrative-pushing can all enter the information ecosystem through podcast audio without triggering any of the automated moderation systems that operate on text.
In this assignment, you will use PodChecker — an automated fact-checking pipeline for podcasts — to collect, analyze, and critically evaluate the factual claims made across a corpus of podcast episodes. PodChecker ingests podcast audio (via file upload or RSS feed), transcribes it using OpenAI Whisper, extracts atomic factual claims using an LLM, and fact-checks each claim using Perplexity's web-search API. The result is a claim-level credibility report — verdict (true / false / misleading / unverifiable) with supporting source URLs — and an episode-level credibility score.
Recall that the Bluesky assignment focused on proactive, real-time moderation of individual posts. PodChecker asks a different question: what does applied computational research look like when the medium is audio, the content is long-form, and the platform has no built-in moderation infrastructure? By the end of this assignment you will have hands-on experience with the full pipeline from data collection to analysis to critical evaluation — the same research cycle used in real trust and safety science.
You will select a podcast corpus relevant to a trust and safety harm of your choice, run the PodChecker pipeline across a set of episodes, and produce a written analysis of your findings. In the final milestone, you will either extend the system with a new capability, replicate and apply it to a new domain, or critically evaluate its limitations — documenting your process and findings in a short video presentation.
There is no auto-grader for this assignment. 
Your grade depends on the quality of your corpus selection rationale, the rigor of your analysis, and the depth of your critical reflection — not on whether PodChecker produces a particular score.<br>Podcasts are public media, but analysis of named hosts and guests carries ethical responsibilities. Follow the course's <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1YURao9YSOD9OPVi5A5Jn3u_pyWFfECSdnDfCJbcxrVQ/edit?tab=t.0#bookmark=id.gohxxm3tbr5c" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1YURao9YSOD9OPVi5A5Jn3u_pyWFfECSdnDfCJbcxrVQ/edit?tab=t.0#bookmark=id.gohxxm3tbr5c" target="_self">policy on engaging with harmful content</a> throughout this assignment:
Do not analyze content depicting child exploitation, solicitation of illegal activity, or other severely harmful material. If your chosen podcast unexpectedly contains such content, stop and consult an instructor before proceeding.
Be precise in your claims about speakers. Reporting that PodChecker labeled a claim as "false" is different from asserting that the host deliberately lied. Automated fact-checking has error rates; your writeup should reflect this.
Do not publish or publicly share individual claim-level verdicts about named people without explicit instructor approval. The analysis is for academic purposes.
API costs are real. PodChecker consumes OpenAI Whisper (transcription), GPT-4o (claim extraction), and Perplexity Sonar (fact-checking) API calls. Budget your usage — run on a small sample first, cache results aggressively, and use the MAX_AUDIO_SIZE_MB cap to control costs. Discuss API cost management in your writeup.
Analysis notebook (analysis/my_corpus_analysis.ipynb): a documented Jupyter notebook containing your corpus collection, credibility analysis, and visualizations.
Code for any extensions (Milestone 3, Track A): well-commented Python, placed in analysis/ with a short README explaining how to run it.
<br>A 10-minute recorded video presentation covering all three milestones (see <a class="internal-link" data-href="#presentation-guidelines" href="#presentation-guidelines" target="_self" rel="noopener nofollow">Presentation Guidelines</a>).
Your presentation slides or any other materials used in the video.
PodChecker is a research prototype with two usage modes:
Web application — a React + Flask stack that accepts an audio file or RSS feed URL, runs the full pipeline, and renders a results table in the browser. This is the easiest way to verify that the system is working.
Python analysis client — the PodCheckerClient class in analysis/podchecker_client.py allows you to run the pipeline programmatically across many episodes, with built-in audio and results caching. This is what you should use for your corpus analysis.
RSS feed / audio file
↓
Whisper (small.en) ← OpenAI API
↓
transcript (text)
↓
Claim extraction ← OpenAI API (GPT-4o)
↓
Fact-checking loop ← Perplexity Sonar (web search)
↓
claim-level verdicts
true / false / misleading / unverifiable
↓
credibility score (true=100%, false=0%, misleading=50%, unverifiable=excluded from score)
Source reliability ratings (from site/backend/filtered_attrs.csv) assign a 1–6 quality score to fact-check sources; sources rated ≥ 5 are marked "trusted" with a star prefix in results.
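The scoring rule in the diagram can be sketched in a few lines. This is a minimal illustration that assumes the episode score is the mean of the per-claim weights after unverifiable claims are dropped; the function names credibility_score and is_trusted are hypothetical and are not taken from the PodChecker code base, whose own compute_credibility_percentage may differ in detail.

```python
from collections import Counter

# Verdict weights as listed in the pipeline summary; "unverifiable" claims
# are left out of the score entirely.
VERDICT_WEIGHTS = {"true": 1.0, "misleading": 0.5, "false": 0.0}


def credibility_score(verdicts):
    """Return a 0-100 score, or None when no claim is scorable.

    Assumption: the score is the mean weight over scorable claims.
    """
    counts = Counter(v.lower() for v in verdicts)
    scorable = sum(counts[v] for v in VERDICT_WEIGHTS)
    if scorable == 0:
        return None
    weighted = sum(counts[v] * w for v, w in VERDICT_WEIGHTS.items())
    return 100.0 * weighted / scorable


def is_trusted(reliability_rating):
    """Sources rated 5 or 6 on the 1-6 scale are marked as trusted."""
    return reliability_rating >= 5


example = ["true", "true", "false", "misleading", "unverifiable"]
print(credibility_score(example))  # 62.5
```

Returning None for an episode with no scorable claims keeps it from being plotted as a misleading 0% or 100%.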
Python 3.10+
Node.js 18+ and npm
ffmpeg (required for Whisper audio processing)
OpenAI API key (for transcription and claim extraction)
Perplexity API key (for web-search fact-checking)
git clone https://github.com/annatastic/PodChecker.git
cd PodChecker
macOS (Homebrew):
brew install ffmpeg
<br>Windows: Download from <a data-tooltip-position="top" aria-label="https://ffmpeg.org/download.html" rel="noopener nofollow" class="external-link is-unresolved" href="https://ffmpeg.org/download.html" target="_self">ffmpeg.org/download.html</a> and add to PATH.
Ubuntu/Debian:
sudo apt install ffmpeg
Verify:
ffmpeg -version
cd site/backend
pip3 install --upgrade pip
pip3 install pandas openai openai-whisper perplexityai feedparser requests flask flask-cors
For the analysis notebook only (no backend server needed):
pip3 install pandas openai openai-whisper perplexityai feedparser requests matplotlib
Set your keys as environment variables (do not hard-code them in notebooks you submit):
export OPENAI_API_KEY="sk-..."
export PERPLEXITY_API_KEY="pplx-..."
Or use a .env file (add .env to .gitignore before committing anything):
OPENAI_API_KEY=sk-...
PERPLEXITY_API_KEY=pplx-...
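Inside a notebook, the keys can then be read from the environment instead of being pasted into a cell. The sketch below uses only the standard library; load_env_file and require_key are hypothetical helper names, and the parser handles only simple KEY=value lines like the ones shown above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=value pairs from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables already exported in the shell take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"'))


def require_key(name):
    """Fail early with a clear message if a key is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return value
```

A notebook would call load_env_file() once and then assign OPENAI_API_KEY = require_key("OPENAI_API_KEY"), and likewise for PERPLEXITY_API_KEY, so the submitted notebook never contains a key.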
Run the web application to confirm the full pipeline works:
# Terminal 1: start the backend
cd site/backend
python3 app.py # runs on port 8000
# Terminal 2: start the frontend
cd site/frontend
npm install
npm run dev # runs on port 5173
<br>Open <a rel="noopener nofollow" class="external-link is-unresolved" href="http://localhost:5173" target="_self">http://localhost:5173</a> in a browser. Use the sample report dropdown to verify that a pre-processed result loads correctly. You should not need to call any APIs to view sample reports.
Due: end of Week 14 (submit as a brief written memo, ≤ 500 words + sample output)
Select a podcast that is relevant to a trust and safety harm of your choice and verify that PodChecker can process it. Your podcast choice should be motivated by a specific T&amp;S concern — not simply by personal interest or convenience.
Good corpus choices share these properties:
T&amp;S relevance — the podcast regularly discusses topics where false or misleading claims could cause real-world harm (health misinformation, political disinformation, financial fraud, extremist rhetoric, etc.)
Public RSS feed — the podcast is accessible via a public RSS feed or individual episode audio URLs; this is what PodChecker uses to ingest content
Sufficient volume — the podcast has at least 15 recent episodes you can analyze; older archives are fine if they cover a coherent time period
English audio — Whisper's small.en model is English-only; non-English podcasts require a multilingual Whisper model (you may switch to one, but note this in your writeup)
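The volume requirement above can be checked before any API call is made. The following sketch counts episodes with an audio enclosure in a locally saved copy of the feed. It uses the standard library rather than feedparser or the course's get_recent_episodes helper, and count_audio_episodes is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def count_audio_episodes(rss_path):
    """Count RSS items that carry an audio enclosure."""
    root = ET.parse(rss_path).getroot()
    count = 0
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue
        # Some feeds omit the type attribute; such items are not counted here.
        if enclosure.get("type", "").startswith("audio/"):
            count += 1
    return count
```

A feed that returns fewer than 15 from count_audio_episodes("my_podcast_rss.xml") does not meet the volume requirement as written.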
Here are illustrative examples — you are not limited to these:
Submit a short memo covering:
Podcast name, RSS URL, and episode count in the period you plan to analyze.
T&amp;S rationale — which harm type are you investigating, and why is this podcast a good data source for it?
Sample output — run PodChecker on 1–2 episodes (use the web interface or the analysis client), and include a screenshot or table of the claim-level results.
API cost estimate — based on your sample run, estimate the total OpenAI and Perplexity API cost for your full corpus (use the usage stats printed by the API calls). Propose a MAX_AUDIO_SIZE_MB cap if needed.
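One way to produce the cost estimate is to scale the sample run linearly to the full corpus. The sketch below assumes episodes in the corpus are roughly as long and as claim-dense as the sample. The Whisper rate is the per-minute figure quoted in the course resources, while the GPT-4o and Perplexity amounts are whatever your sample run actually cost. estimate_corpus_cost is a hypothetical helper.

```python
WHISPER_USD_PER_MINUTE = 0.006  # per-minute figure quoted in the resources list


def estimate_corpus_cost(sample_minutes, sample_llm_usd, sample_search_usd,
                         sample_episodes, corpus_episodes):
    """Scale the cost of a small sample run up to the full corpus."""
    sample_total = (
        sample_minutes * WHISPER_USD_PER_MINUTE
        + sample_llm_usd
        + sample_search_usd
    )
    per_episode = sample_total / sample_episodes
    return per_episode * corpus_episodes


# Placeholder inputs, not measured values: 2 sample episodes totalling
# 100 audio minutes, 0.40 USD of GPT-4o usage, 1.00 USD of Perplexity usage.
estimate = estimate_corpus_cost(100, 0.40, 1.00, 2, 15)
print(f"Estimated corpus cost: ${estimate:.2f}")
```

If the estimate exceeds your budget, lowering MAX_AUDIO_SIZE_MB reduces the transcription term, and fewer transcribed minutes usually means fewer extracted claims to fact-check.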
Due: end of Week 15 (with Milestone 3)
Run PodChecker across a corpus of at least 15 episodes and produce a rigorous quantitative analysis of claim-level credibility.
Use analysis/episode_credibility_analysis.ipynb as your starting point. Adapt it for your corpus by:
Pointing it at your RSS feed (or a list of episode audio URLs if no RSS is available):
from analysis import get_recent_episodes, compute_credibility_percentage, PodCheckerClient

RSS_PATH = "my_podcast_rss.xml" # local copy of the RSS feed
NUM_EPISODES = 15
episodes = get_recent_episodes(RSS_PATH, NUM_EPISODES)
Initializing the client (always use mode="local" for corpus analysis — it is faster and cheaper than the HTTP mode):
client = PodCheckerClient(
    openai_api_key=OPENAI_API_KEY,
    perplexity_api_key=PERPLEXITY_API_KEY,
    mode="local",
    max_audio_size_mb=60 # adjust based on your API budget
)
Running the analysis loop — results are cached to data/ automatically so you can re-run the notebook without incurring API costs for already-processed episodes:
episode_results = []
for episode in episodes:
    result = client.analyze_episode(episode, podcast_name="MyPodcast")
    episode_results.append({'episode': episode, 'result': result})
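Because individual episodes can fail (rate limits, audio access errors), it may help to wrap the loop so that one failure does not abort the run and every failure is recorded for your writeup. This is a sketch: run_corpus is a hypothetical helper, and the exceptions that analyze_episode can raise are an assumption, so the code catches Exception broadly.

```python
def run_corpus(client, episodes, podcast_name):
    """Analyze each episode, recording failures instead of aborting the run."""
    results, failures = [], []
    for episode in episodes:
        try:
            result = client.analyze_episode(episode, podcast_name=podcast_name)
        except Exception as exc:  # e.g. rate limits or audio access errors
            failures.append({"episode": episode, "error": repr(exc)})
            continue
        results.append({"episode": episode, "result": result})
    return results, failures
```

The usable fraction of the corpus is then len(results) / len(episodes), and the failures list gives you the per-episode error messages to document.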
Your notebook must include the following:
A. Episode credibility over time — a line plot of credibility score (0–100%) across episode dates, following the template in the starter notebook. Annotate any notable outliers.
B. Claim-level verdict distribution — a bar chart or table showing the proportion of claims labeled true, false, misleading, and unverifiable across the full corpus. Compute this both per-episode and aggregated.
C. Error analysis — for at least 10 claims that received a "false" or "misleading" verdict, manually verify the verdict by checking the supporting sources PodChecker provides. Report:
How many did you agree with? Disagree? Find ambiguous?
What patterns explain errors (hallucinated sources, out-of-date information, opinion framed as fact)?
D. Failure modes — document any episodes that failed to process (rate limits, audio access errors, truncation) and how you handled them. What fraction of your corpus is usable?
E. Cost accounting — report the actual API usage and costs incurred (Whisper audio minutes, GPT-4o token counts, Perplexity call count). Compare to your Milestone 1 estimate.
Due: end of Week 15 (submitted together with Milestone 2)
Choose one of the three tracks below. All tracks have equivalent weight in the grading rubric. Your choice should reflect what you find most interesting and what is most tractable given your corpus.
Extend PodChecker with a new capability that addresses a gap in the current system. Examples:
Multi-podcast comparison — run PodChecker on two or more podcasts covering the same topic (e.g., two podcasts on the same health topic with different credibility reputations) and compare their credibility profiles quantitatively.
Claim-type taxonomy — add a claim-type classifier that categorizes claims before fact-checking (e.g., statistical claims, causal claims, identity claims, predictions) and analyze how credibility varies by claim type.
<br>Alternative fact-checking source — replace or supplement Perplexity with a structured fact-check database (e.g., <a data-tooltip-position="top" aria-label="https://newsinitiative.withgoogle.com/resources/trainings/google-fact-check-tools/" rel="noopener nofollow" class="external-link is-unresolved" href="https://newsinitiative.withgoogle.com/resources/trainings/google-fact-check-tools/" target="_self">Google Fact Check Tools API</a>, <a data-tooltip-position="top" aria-label="https://idir.uta.edu/claimbuster/" rel="noopener nofollow" class="external-link is-unresolved" href="https://idir.uta.edu/claimbuster/" target="_self">ClaimBuster</a>, or <a data-tooltip-position="top" aria-label="https://communitynotes.x.com/guide/en/under-the-hood/download-data" rel="noopener nofollow" class="external-link is-unresolved" href="https://communitynotes.x.com/guide/en/under-the-hood/download-data" target="_self">Community Notes data</a>) and compare the verdicts produced by each source on the same claims.
Temporal trend detection — identify claims that recur across episodes and analyze how their veracity changes over time (e.g., does a podcast host's credibility improve after a public correction?).
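As one illustration of the last example, recurring claims can be found by grouping near-identical claim texts across episodes. The sketch below uses a simple string-similarity threshold from the standard library. The input format (pairs of episode id and claim text), the 0.8 threshold, and the name group_recurring_claims are all assumptions, and paraphrased claims would need a semantic method instead.

```python
from difflib import SequenceMatcher


def group_recurring_claims(claims, threshold=0.8):
    """Greedily group (episode_id, claim_text) pairs with near-identical text."""
    groups = []
    for episode_id, text in claims:
        normalized = " ".join(text.lower().split())
        for group in groups:
            similarity = SequenceMatcher(None, group["key"], normalized).ratio()
            if similarity >= threshold:
                group["members"].append((episode_id, text))
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append({"key": normalized, "members": [(episode_id, text)]})
    # Keep only claims that appear in more than one episode.
    return [g for g in groups if len({ep for ep, _ in g["members"]}) > 1]
```

Each returned group lists the episodes in which the claim appeared, which you can join against the per-claim verdicts to see whether the verdict changes over time.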
Your extension should include working code and a brief evaluation demonstrating that it produces meaningful results on your corpus.
Replicate the core PodChecker analysis from Irmetova et al. (2026) on a new podcast corpus and apply the findings to a specific T&amp;S question. This track is appropriate if your main interest is empirical analysis rather than system development.
Your writeup should:
Describe how your corpus differs from the paper's (podcast genre, time period, harm type).
Report credibility scores, claim distributions, and source reliability breakdowns comparable to the paper's results.
Apply the findings to a specific T&amp;S question — for example: Does credibility score correlate with the podcast's media bias rating from an external source? Do episodes featuring certain guest types (politicians, scientists, activists) have systematically different verdicts?
Critically discuss what the system gets right and what it misses for your specific harm type.
Conduct a systematic evaluation of PodChecker's accuracy, limitations, and potential for harm — without building a new extension or collecting a large new corpus. This track is appropriate if you want to focus on evaluation methodology and ethical analysis.
Your evaluation should address at least three of the following:
Precision and recall — manually fact-check a stratified sample of claims (e.g., 30–50 claims) and compute precision, recall, and F1 against your ground-truth labels for each verdict category.
Hallucination analysis — examine the supporting source URLs PodChecker provides. How often do the URLs actually support the verdict? How often are they irrelevant or broken?
Claim extraction quality — evaluate whether the claims Whisper + GPT-4o extract are the important claims from the episode, or whether the system over- or under-samples certain claim types.
Domain sensitivity — test the system on a domain where automated fact-checking is particularly risky (contested political topics, rapidly evolving science) and analyze where human judgment would be required.
Bias and representation — does the system systematically produce different verdicts for claims made by speakers of different political affiliations, genders, or expertise levels? Design a small experiment to test this.
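The precision and recall item above can be computed directly from two aligned lists: PodChecker's verdicts and your manual ground-truth labels for the same claims. This is a minimal one-vs-rest sketch; per_verdict_metrics is a hypothetical name, and categories with no predictions or no ground-truth examples are reported as 0.0 rather than left undefined.

```python
VERDICTS = ("true", "false", "misleading", "unverifiable")


def per_verdict_metrics(predicted, ground_truth):
    """Compute precision, recall, and F1 for each verdict category."""
    pairs = list(zip(predicted, ground_truth))
    metrics = {}
    for verdict in VERDICTS:
        tp = sum(p == verdict and g == verdict for p, g in pairs)
        fp = sum(p == verdict and g != verdict for p, g in pairs)
        fn = sum(p != verdict and g == verdict for p, g in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[verdict] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics
```

With a sample of 30 to 50 claims, some categories will contain only a handful of examples, so report the raw counts alongside the ratios.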
Record a ~10-minute video covering all milestones. You will submit the video along with your notebook and slides. Structure your presentation as follows:
Podcast corpus and T&amp;S rationale (2 min) — introduce the podcast(s) you chose, the harm type you are investigating, and why this corpus is an interesting subject for T&amp;S research.
System overview (1 min) — briefly explain how PodChecker works (pipeline diagram from the README is fine); you can assume your audience knows what transcription and LLMs are.
Corpus analysis results (3 min) — walk through your key findings from Milestone 2: credibility trend over time, verdict distribution, error analysis highlights. Show at least one chart.
Extension / Replication / Critical Evaluation (2 min) — present your Milestone 3 track: what you built or analyzed, what you found, and what surprised you.
Ethical reflection and limitations (1 min) — what are the risks of deploying a system like PodChecker at scale? What should a T&amp;S practitioner know before using automated podcast fact-checking?
Future directions (30 sec) — one specific, actionable improvement you would make if you had more time.
Unlike the Bluesky assignment (which has an auto-grader), there is no single correctness score for this assignment. Instead, demonstrate rigor through:
Reproducibility — your notebook should run end-to-end from a clean environment using cached data. Include a requirements.txt or environment spec.
Sample size — at minimum 15 episodes with analyzable audio. Justify your sample size in the writeup.
Manual verification — the error analysis in Milestone 2C is the closest thing to ground-truth evaluation; take it seriously. Spot-checking 10 claims is the minimum; more is better.
Quantitative reporting — report credibility scores with summary statistics (mean, median, standard deviation); report claim counts and verdict breakdowns with exact numbers; plot over time when the corpus spans more than a week.
Acknowledgment of failures — episodes that failed to process, API rate limits hit, audio truncations, and claims that were too ambiguous to verify are all expected and should be documented, not hidden.
<br><a data-tooltip-position="top" aria-label="https://github.com/annatastic/PodChecker" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/annatastic/PodChecker" target="_self">GitHub repository</a> — source code, README, sample reports
analysis/episode_credibility_analysis.ipynb — starter notebook for corpus analysis
analysis/podchecker_client.py — PodCheckerClient API documentation (docstrings)
analysis/rss_utils.py — get_recent_episodes, compute_credibility_percentage functions
<br><a data-tooltip-position="top" aria-label="https://openai.com/api/pricing/" rel="noopener nofollow" class="external-link is-unresolved" href="https://openai.com/api/pricing/" target="_self">OpenAI API pricing</a> — Whisper: $0.006/min of audio; GPT-4o: varies by token count
<br><a data-tooltip-position="top" aria-label="https://docs.perplexity.ai/" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.perplexity.ai/" target="_self">Perplexity API docs</a> — Sonar model pricing per search call
<br><a data-tooltip-position="top" aria-label="https://developers.google.com/fact-check/tools/api" rel="noopener nofollow" class="external-link is-unresolved" href="https://developers.google.com/fact-check/tools/api" target="_self">Google Fact Check Tools API</a> — free structured fact-check database (useful for Track A/B)
<br><a data-tooltip-position="top" aria-label="https://www.listennotes.com/api/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.listennotes.com/api/" target="_self">Listen Notes API</a> — podcast search and RSS discovery
<br><a data-tooltip-position="top" aria-label="https://podcastindex.org/" rel="noopener nofollow" class="external-link is-unresolved" href="https://podcastindex.org/" target="_self">Podcast Index</a> — open podcast RSS directory
Most major podcast platforms (Spotify, Apple Podcasts) publish RSS feeds for public shows
Course reading list — Misinformation sections (Lectures 17–21) for background on claim detection, source credibility, and intervention design
Irmetova et al. (2026) — the PodChecker paper; available in PodChecker/ folder
]]></description><link>assignments/assignment-3-podcast-factchecking.html</link><guid isPermaLink="false">Assignments/Assignment 3 - Podcast Factchecking.md</guid><pubDate>Mon, 27 Apr 2026 21:10:04 GMT</pubDate></item><item><title><![CDATA[Assignment 2 - Bluesky Moderation]]></title><description><![CDATA[Credits: <a data-tooltip-position="top" aria-label="https://classes.cornell.edu/browse/roster/SP25/class/CS/5342" rel="noopener nofollow" class="external-link is-unresolved" href="https://classes.cornell.edu/browse/roster/SP25/class/CS/5342" target="_self">Cornell CS 5342</a>, original assignment: <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1oiHNC7073vZ1kWSEjwO2MO1cOErdRH15ImCsNXgidpI/edit?tab=t.0#heading=h.rsrgt54e81af" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1oiHNC7073vZ1kWSEjwO2MO1cOErdRH15ImCsNXgidpI/edit?tab=t.0#heading=h.rsrgt54e81af" target="_self">CS5342 Automated moderator for Bluesky</a>, <a data-tooltip-position="top" aria-label="https://github.com/PeterCarragher/bluesky_labeler_assignment" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/PeterCarragher/bluesky_labeler_assignment" target="_self">GitHub</a>
In this assignment, you will gain first-hand experience with Bluesky’s customizable approach to moderation. We’ll walk you through implementing a labeler, which is a service that attaches categorical labels to Bluesky posts and accounts. Users who subscribe to your labeler can configure how these labels are applied to the posts they see. For instance, your service may attach a label for spam or NSFW content (throughout, content will refer to both posts and accounts). Some users may wish to hide such content altogether, others may prefer that a badge be attached to it. Recall that content moderation is not solely about blocking harmful content. 
It can also be about organizing and displaying content in a way that is helpful to users. Similarly, labelers are not just for marking definitely objectionable posts and accounts. Here are a few examples:
<br>The <a data-tooltip-position="top" aria-label="https://bsky.app/profile/did:plc:wkoofae5uytcm7bjncmev6n6" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/did:plc:wkoofae5uytcm7bjncmev6n6" target="_self">pronouns labeler</a> allows users to display a badge on their profile indicating their pronouns that subscribers to the labeler can see. <br>The <a data-tooltip-position="top" aria-label="https://bsky.app/profile/did:plc:ylmdlijvrvgbe4md6v4dyce6" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/did:plc:ylmdlijvrvgbe4md6v4dyce6" target="_self">US Government Contributions</a> labeler will apply badges to the accounts of representatives with the organizations that fund them. After subscribing to this labeler, you can look up Alexandria Ocasio-Cortez’s <a data-tooltip-position="top" aria-label="https://bsky.app/profile/aoc.bsky.social" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/aoc.bsky.social" target="_self">account</a>, and see badges indicating that her donor list includes employees of or PACs tied to Alphabet and the City of New York. <br>This <a data-tooltip-position="top" aria-label="https://bsky.app/profile/did:plc:bpkpvmwpd3nr2ry4btt55ack" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/did:plc:bpkpvmwpd3nr2ry4btt55ack" target="_self">popular labeler</a> attempts to identify AI-generated imagery.
<br>You can find more examples of labelers at <a data-tooltip-position="top" aria-label="https://www.bluesky-labelers.io/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.bluesky-labelers.io/" target="_self">Bluesky-labelers.io</a>. We encourage you to try some of them out before starting the assignment.
In the first part of the assignment, you will build an automated labeler that will apply labels to Bluesky posts based on their text content. We will provide a test set of posts and their expected labels in a CSV file. You should not hard-code these labels in your implementation – we will test your code on some examples that do not appear in this test set, and a portion of your grade will be based on your labeler’s accuracy on these instances. Furthermore, if you do hard-code labels for particular posts, you will receive a 0 for the functionality score. If your labeler produces nothing for all inputs, it will also receive a 0.
In the second part of this assignment, you will implement your own automated moderation policy as a Bluesky labeler. This can be the policy you articulated in Assignment 2, but you are free to choose another topic. We expect you to comprehensively test your code for this component. The extent to which your code and testing are well documented will constitute a portion of your grade. This part of the assignment is more open-ended, so you’ll have to demonstrate to us that you’ve thought through how you can verify that your implementation will meet your stated moderation goal. The creativity you demonstrate in your chosen problem/solution will also constitute a portion of your grade.
<br>Throughout this class we have discussed how safety measures can in turn be abused. 
We encourage you to continuously check the work that you are doing for unintended consequences and follow our course’s <a data-tooltip-position="top" aria-label="https://docs.google.com/document/d/1YURao9YSOD9OPVi5A5Jn3u_pyWFfECSdnDfCJbcxrVQ/edit?tab=t.0#bookmark=id.gohxxm3tbr5c" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/document/d/1YURao9YSOD9OPVi5A5Jn3u_pyWFfECSdnDfCJbcxrVQ/edit?tab=t.0#bookmark=id.gohxxm3tbr5c" target="_self">policy</a> on engaging with harmful content. You should be particularly careful when completing <a data-tooltip-position="top" aria-label="#%5Bpart-ii%5D-milestone-5:-implementing-your-policy-proposal" rel="noopener nofollow" class="external-link" href="#[part-ii]-milestone-5:-implementing-your-policy-proposal" target="_self">Milestone 5</a>. This will include documenting your process carefully, clearly signposting the exercise as an academic effort, and providing a way for labeled users to express any concerns with the label.
A well-documented implementation of your labeler in Python
Your Part I implementation should be in automated_labeler.py
For Part II, create a file named policy_proposal_labeler.py – you can base that implementation off of the code we provide for you.
A 10-minute video presentation describing your implementation choices, your testing approach, and an evaluation of your solution for addressing the chosen harm
Your presentation slides and any other materials you used to create your video
<br>You can find a detailed discussion of the Bluesky moderation infrastructure <a data-tooltip-position="top" aria-label="https://docs.bsky.app/blog/blueskys-moderation-architecture" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.bsky.app/blog/blueskys-moderation-architecture" target="_self">here</a>. We provide a high-level overview below. Account data is hosted at a personal data server (PDS). This data is distributed, via a relay, to an AppView service. Labelers are services that generate labels on posts and accounts. These labels are sent to the AppView. When user client devices download posts from the AppView, they obtain labels associated with the content they received, depending on what labelers they are subscribed to.<br><img alt="Pasted image 20260415142449.png" src="assets/pasted-image-20260415142449.png" target="_self"><br>The figure above<a data-footref="1" href="#fn-1-4e5c5c1f7f459aeb" class="footnote-link" target="_self" rel="noopener nofollow">[1]</a>, from the Bluesky moderation infrastructure overview, provides a visual representation of how labels are generated and sent to the AppView. Bluesky provides its own labeler that handles platform-level moderation policies. In addition to this, users can opt into other third-party labelers for additional layers of moderation. 
You will implement and run one such labeler for this assignment.<br>You can access the starter code from the class <a data-tooltip-position="top" aria-label="https://github.com/cornelltech/cs5342-spring2025" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/cornelltech/cs5342-spring2025" target="_self">Github</a>, under the bluesky-assign3 directory.<br>In an actual production environment, your labeler would likely ingest posts from the <a data-tooltip-position="top" aria-label="https://docs.bsky.app/docs/advanced-guides/firehose" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.bsky.app/docs/advanced-guides/firehose" target="_self">firehose</a>, which provides a stream of content as it is disseminated through the network. However, for the purposes of this assignment, your labeler will be ingesting posts from a CSV file. This will allow for easier testing and debugging.<br>In this assignment, your labeler consists of two components: the first is the labeling server, which interfaces with the AppView to attach labels to content. This is a Javascript program that uses the <a data-tooltip-position="top" aria-label="https://www.npmjs.com/package/@skyware/labeler" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.npmjs.com/package/@skyware/labeler" target="_self">skyware/labeler</a> library. The second component is your labeler bot, which you will implement as a python program that will interface with the labeler server to produce labels.<br>In order to create a labeler, you need a public domain to host the labeling server and a Bluesky account associated with the labeler. We will handle the hosting of the labeling server for you. 
If you’re interested in an additional challenge and want to own/operate your own labeling infrastructure, you can consult <a data-tooltip-position="top" aria-label="https://skyware.js.org/guides/labeler/introduction/getting-started/" rel="noopener nofollow" class="external-link is-unresolved" href="https://skyware.js.org/guides/labeler/introduction/getting-started/" target="_self">this guide</a>. For the purposes of completing this assignment, you are not required to make your labeler live (i.e., emit public labels). In fact, for Part II, you should be careful to check with us before you start attaching labels to public posts. However, we give you the option to emit labels so you can see your hard work in action on the Bluesky network.
Start by creating your own GitHub repository with a clone of the starter code.
Make sure you have Node.js and Python installed. Install the skyware labeler package with the following command:
npm install @skyware/labeler
Make sure you have Python 3 installed, along with the ATProto, Dotenv, Requests, and Perception modules:
pip install atproto dotenv requests perception
Run the following command in the starter code directory to ensure that you can access posts:
python get_post_test.py
If this runs successfully, you are ready to begin the assignment. Testing that you can emit public labels (the following section) is optional.
From your browser, visit https://&lt;your-domain&gt;/xrpc/com.atproto.label.queryLabels
This will display all the labels that have been issued by your labeler. Initially, this list will be empty. Let’s change that. Run the following command to apply a label to a post from the bsky.app account:<br>python label.py post https://bsky.app/profile/bsky.app/post/3l6oveex3ii2l great<br>This applies a “great” label to the post at the specified <a data-tooltip-position="top" aria-label="https://bsky.app/profile/bsky.app/post/3l6oveex3ii2l" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/bsky.app/post/3l6oveex3ii2l" target="_self">URL</a>. You can use the skyware/labeler command line utility to modify the labels that your labeler supports.<br>Subscribe to your labeler from a different account (perhaps your personal Bluesky account) and visit the post in the URL to observe that the label has been applied:<br><img alt="Pasted image 20260415142538.png" src="assets/pasted-image-20260415142538.png" target="_self"><br>You can also use the <a data-tooltip-position="top" aria-label="https://blue.mackuba.eu/scanner/" rel="noopener nofollow" class="external-link is-unresolved" href="https://blue.mackuba.eu/scanner/" target="_self">Label Scanner</a> tool to verify that your label was applied to the post.<br>At this point, you have confirmed that you can emit labels for particular posts and accounts via the command line tool. That’s a great achievement already: you have the essential infrastructure for running your own third-party moderation service on Bluesky. Congratulations!<br>Now, you’ll automatically apply labels based on posts that meet certain criteria. Open up automated-labeler.py. You’ll notice that we provide a constructor for your labeler and a moderate_post function. This function takes as input a URL to a Bluesky post and returns a List[str]: a list of strings containing the label(s) to add to the post, or an empty list ([]) if there are no labels to add. 
You must implement your labeling logic in this function as this will interface with the auto-grader for the assignment. When you run your labeler via test-labeler.py, the output of moderate_post will be used to emit a label via the label_post function defined in label.py. You can configure whether to actually emit labels to the Bluesky network via a command-line argument for test-labeler.py – while you’re testing your code, you shouldn’t be emitting labels.<br>A portion of your grade will consist of your coding style – your code should be legible and well-organized. You should decompose the logic in moderate_post across different functions that you’ll define for your AutomatedLabeler.<br>For Part I, we will provide an isolated testing script for you to test how your code generates labels. This will be the same script that our auto-grader will use in determining the functionality score for Part I. You may find this script helpful for your testing set-up in Part II as well.<br>A common moderation technique that platforms employ is text matching against a list of known harmful text. In this part of the assignment, you will implement this technique to label posts containing Trust-and-Safety-related words/domains. We sourced the words from the TSPA <a data-tooltip-position="top" aria-label="https://www.tspa.org/curriculum/ts-curriculum/glossary/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.tspa.org/curriculum/ts-curriculum/glossary/" target="_self">glossary</a>.
You will find this list in t-and-s-words.csv and a list of T&amp;S domains in t-and-s-domains.csv. For each post in input-posts-t-and-s.csv, apply a “t-and-s” label to those that match either list. Matching should be case-insensitive: if the word “moderate” is on the list, then a post containing the word “mOderAtE” should also be labeled.<br>Label posts that link to news articles with the news publications with which they are affiliated. You will have to create labels for the following publications: CNN, BBC, NYT, Washington Post, Fox News, Reuters, NPR, AP. The file news-domains.csv will contain a list of the domains you should scan for, along with the label to apply. For each post in input-posts-cite.csv, apply the appropriate label(s). Note that if a post contains news links from several different sources, then one label should be generated for each source. If there are multiple links from the same source, then only one label should be generated.<br>Many platforms employ a technique called perceptual hash matching in order to detect harmful or illegal images. An image is passed through a perceptual hash function, which outputs a bitstring (a sequence of 0’s and 1’s), such that two similar images should hash to similar bitstrings. In this part of the assignment, you will use this technique to identify pictures of dogs that match a known list (the dog-list).<br>For posts in input-posts-dogs.csv containing an image matching an image in the dog-list (the images in the dog-list-images directory – sourced from <a data-tooltip-position="top" aria-label="https://bsky.app/profile/weratedogs.com" rel="noopener nofollow" class="external-link is-unresolved" href="https://bsky.app/profile/weratedogs.com" target="_self">WeRateDogs</a>), apply the “dog” label. 
A match is defined as the image’s perceptual hash being within a Hamming distance of THRESH<a data-footref="2" href="#fn-2-4e5c5c1f7f459aeb" class="footnote-link" target="_self" rel="noopener nofollow">[2]</a> of the target image’s perceptual hash. You can use the PHash implementation provided <a data-tooltip-position="top" aria-label="https://github.com/thorn-oss/perception" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/thorn-oss/perception" target="_self">here</a> to perform perceptual hashing. We leave it to you to figure out how to extract the image(s) contained within a post. You’ll find it helpful to consult the atproto documentation along with the PIL and requests Python modules. See if you can notice a pattern in the URLs associated with post images.<br>In Part I of this assignment, you gained familiarity with the AT protocol and implemented automated moderation routines. Using those skills, you’ll extend your labeler to handle a harm type of your choice. You are encouraged to implement the policy proposal you outlined in Assignment 2 because you’ll have spent significant time grappling with it, but you are also free to choose a different problem to tackle if you don’t think you can implement your solutions from Assignment 2. Your choice to build on Assignment 2 or start anew will not affect your grade. Recall that your implementation for Part II should be in a file named policy_proposal_labeler.py.<br>We expect your problem selection and solution to demonstrate a reasonable level of creativity, sophistication, and involvement. For instance, tackling toxicity by making a call to the Perspective API for each post and attaching a “toxic” label if it exceeds some threshold would not suffice. You will likely have to iterate on your policy proposal and implementation to achieve something reasonable. Document this process and discuss it in your presentation.<br>You should begin by gathering data on the harm you plan to tackle. 
This will inform your testing approach and solution design. You can use the ATProto SDK to crawl Bluesky and filter for content that may be relevant to the harm you address. You can also leverage research done for Assignment 2. Part of your grade will be based on the description, execution, and efficacy of your testing setup. Depending on your approach, you may have to manually label some of the data you collect. We do not want you to deal with illegal or severely harmful content (e.g. sale/solicitation of illegal substances, CSAM, etc.).<br>Remember that precision in labeling at scale is difficult – and you only have a few weeks. For that reason, we encourage you to choose a labeling implementation that recognizes it is detecting potentially sensitive content rather than one that is categorical about finding the harmful material – unless you can be highly confident about the accuracy of your endeavors. You can help yourself by being precise about what you call the labeler; you can then explain why that labeler might help fight the harm you care about in your presentation / video. Here are some illustrative examples:
“Potentially soliciting financial information” is a better labeler than “Fraud posts”. You might use a combination of text-matching and LLM reasoning to label posts that include certain brand names tied to money exchange (e.g. Venmo, CashApp) and a call to action (e.g. “Send me your” or “Give me”). Recognize that people often provide this information during emergencies, for fundraising, or as tips for their online work (e.g. on Patreon).<br>“Addresses content that has been fact-checked before” is a better labeler than “Fake news” both because it is more precise and because you are not making a definitive judgment about the veracity of the Bluesky post, which will be very hard. To build such a labeler you may choose to lean on the Fact Check Explorer API or the open source Community Notes data.
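<br>As a rough illustration of the text-matching half of the first example, the sketch below flags a post only when a payment brand and a call to action both appear, ignoring case and extra whitespace. The term lists, the label name, and the function names are hypothetical placeholders rather than anything from the starter code, and the LLM-reasoning half (for instance, to rule out fundraising or tipping contexts) is left out.

```python
import re
from typing import List

# Hypothetical term lists for illustration only; a real labeler would tune these.
PAYMENT_BRANDS = ("venmo", "cashapp", "cash app")
CALLS_TO_ACTION = ("send me your", "give me")
LABEL = "potentially-soliciting-financial-information"  # hypothetical label name


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def candidate_labels(post_text: str) -> List[str]:
    """Return [LABEL] when a payment brand and a call to action both appear."""
    text = normalize(post_text)
    has_brand = any(brand in text for brand in PAYMENT_BRANDS)
    has_call = any(phrase in text for phrase in CALLS_TO_ACTION)
    return [LABEL] if has_brand and has_call else []
```

Requiring both signals is a deliberately conservative choice: a brand mention alone or a bare “give me” produces no label, which fits the cautious wording of the label itself.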
For inspiration, we provide below a non-exhaustive list of inputs, signals, and tools you may consider using in your labeler:
<br><a data-tooltip-position="top" aria-label="https://perspectiveapi.com/" rel="noopener nofollow" class="external-link is-unresolved" href="https://perspectiveapi.com/" target="_self">Perspective API for toxicity scoring</a> <br><a data-tooltip-position="top" aria-label="https://newsinitiative.withgoogle.com/resources/trainings/google-fact-check-tools/" rel="noopener nofollow" class="external-link is-unresolved" href="https://newsinitiative.withgoogle.com/resources/trainings/google-fact-check-tools/" target="_self">Google fact checking tools</a>, other fact checking databases/APIs <br>Analysis of the Bluesky network – looking at follower lists, number of posts/replies, and other metadata. This could be helpful for analyzing particular communities. <br>User input – users can message your labeler/react to posts. This can inform a collaborative voting/labeling approach. <br>Non-profit, human rights, and/or legal groups that have categorized organizations in ways that may be useful for labeling purposes (e.g. <a data-tooltip-position="top" aria-label="https://rsf.org/en/index" rel="noopener nofollow" class="external-link is-unresolved" href="https://rsf.org/en/index" target="_self">RSF Freedom of the Press index</a>) <br>LLMs, computer vision models
For full transparency, you should make it clear in the description of your labeler account that your labeler is part of an educational exercise and that it should not be trusted for complete accuracy. You should also collect and respond to any serious criticism.<br>Take care to consider the ethical implications of deploying your labeler and ensure that it does not lead to or provide a vector for abuse/harm. For instance, you could see how labeling non-notable individuals for their perceived political position on a hot-button topic could lead to doxxing campaigns or worse. Please feel free to reach out to us if you want to gut-check your proposed labeler for possible harmful consequences.
~10 minute recorded video <br>Introduce, motivate, and explain the harm you aim to mitigate, along with your proposed policy <br>Discuss the various approaches you tried out, and explain what the hurdles were and what you needed to revise in your policy to arrive at an implementation that reflects it <br>Give a high-level technical overview of your implementation <br>Provide a demo of your labeler in action <br>Discuss your approach to testing and evaluation <br>Analyze the ethical implications of deploying your labeler <br>Talk about future areas for improvement
We expect you to test on a reasonably large number of posts (e.g. somewhere in the ballpark of ~100), and evaluate the accuracy, precision, and recall of your labeler (for labelers highly dependent on user input, this analysis may look slightly different). You should also discuss the efficiency and performance of your labeler in terms of the amount of computation and memory it requires. For example, you may consider measuring: How long does it take your labeler to make a decision on a particular post? How much memory does it consume? How much network communication does it require?<br>As you build an application that interfaces with Bluesky and the AT protocol, you’ll likely have conceptual questions about how the protocol and Python SDK work. Additionally, you may wonder about useful APIs for implementing your Part II solution. We list relevant resources below:
<br><a data-tooltip-position="top" aria-label="https://atproto.com/specs/atp" rel="noopener nofollow" class="external-link is-unresolved" href="https://atproto.com/specs/atp" target="_self">AT Protocol spec</a> <br><a data-tooltip-position="top" aria-label="https://atproto.blue/en/latest/" rel="noopener nofollow" class="external-link is-unresolved" href="https://atproto.blue/en/latest/" target="_self">AT Protocol Python SDK documentation</a> <br><a data-tooltip-position="top" aria-label="https://discord.gg/PCyVJXU9jN" rel="noopener nofollow" class="external-link is-unresolved" href="https://discord.gg/PCyVJXU9jN" target="_self">Bluesky developer discord</a> <br><a data-tooltip-position="top" aria-label="https://www.bluesky-labelers.io/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.bluesky-labelers.io/" target="_self">List of Bluesky labelers</a> <br><a data-tooltip-position="top" aria-label="https://docs.bsky.app/" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.bsky.app/" target="_self">Broader developer docs for Bluesky</a>
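<br>For the accuracy, precision, and recall evaluation mentioned earlier in this section, a minimal sketch is shown here. It assumes a single label and two parallel lists of booleans, one holding your labeler’s decisions and one holding your manual ground-truth judgments for the same posts. The function name and the returned dictionary keys are illustrative choices, not part of the starter code or the grading script.

```python
from typing import Dict, List


def evaluate(predicted: List[bool], actual: List[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for one binary label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # labeled, and should be
    fp = sum(1 for p, a in pairs if p and not a)      # labeled, but should not be
    fn = sum(1 for p, a in pairs if not p and a)      # missed
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly left alone

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Returning 0.0 when a denominator is zero is one convention among several. With a test set of around 100 posts, it is worth reporting the raw counts alongside the ratios, since a handful of posts can move precision or recall noticeably.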
Your grade in the assignment will be made up of the following components: <br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.bsky.app/blog/blueskys-moderation-architecture" target="_self">https://docs.bsky.app/blog/blueskys-moderation-architecture</a> <a href="#fnref-1-4e5c5c1f7f459aeb" class="footnote-backref footnote-link" target="_self" rel="noopener nofollow">↩︎</a>
<br>This is a constant that will be provided in the starter code.<a href="#fnref-2-4e5c5c1f7f459aeb" class="footnote-backref footnote-link" target="_self" rel="noopener nofollow">↩︎</a>
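<br>To illustrate how a threshold such as THRESH is applied, here is a minimal sketch of a Hamming-distance comparison. It assumes the two perceptual hashes are available as equal-length hexadecimal strings, which may not be the format your hashing library returns by default, and it treats “within” as less than or equal to the threshold. The function names are hypothetical and this is not the perception library’s API.

```python
from typing import Iterable


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def matches_any(candidate: str, known_hashes: Iterable[str], thresh: int) -> bool:
    """Return True if the candidate is within thresh bits of any known hash."""
    return any(hamming_distance(candidate, known) <= thresh for known in known_hashes)
```

A larger threshold tolerates more image alteration (resizing, recompression) at the cost of more false matches.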
]]></description><link>assignments/assignment-2-bluesky-moderation.html</link><guid isPermaLink="false">Assignments/Assignment 2 - Bluesky Moderation.md</guid><pubDate>Mon, 27 Apr 2026 21:09:56 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Assignment 1 - Discord Bot]]></title><description><![CDATA[Source code: <a data-tooltip-position="top" aria-label="https://github.com/stanfordio/cs152bots" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/stanfordio/cs152bots" target="_self">GitHub</a><br>
Credits: Trust &amp; Safety Teaching Consortium and <a data-tooltip-position="top" aria-label="https://online.stanford.edu/courses/cs152-trust-and-safety" rel="noopener nofollow" class="external-link is-unresolved" href="https://online.stanford.edu/courses/cs152-trust-and-safety" target="_self">Cornell Tech</a><br>Alex Stamos, Stanford Internet Observatory<br>Shelby Grossman, Stanford Internet Observatory<br>Jeffrey Hancock, Stanford Internet Observatory<br>For the course project, you and your group will be the Trust and Safety team at a major social media or consumer cloud platform. You will be assigned to a group at the end of April. The team will focus on a particular type of abuse, proposing policies to the executives at your company as well as researching and implementing relevant technological solutions within a content moderation bot in Discord. The project is split into three milestones, which will be completed over the course of the quarter. The first milestone will be completed individually. The second and third milestones will be completed with your group. The final milestone will culminate in a presentation that your group will give to the teaching team and guest judges from industry, the evening of Wednesday, June 5th.<br>Please do not generate the text you submit using ChatGPT or any other LLM. You will have plenty of opportunities to use these systems to generate test data or perform abuse detection in future milestones, but anything submitted by students in this assignment should be written by humans.<br>Percent of Final Grade: 20%<br>Deliverables: A PDF containing two major sections: 1) Abuse Research Report (2000-4000 words) 2) Policy Comparison Table<br>Submission: This first milestone will be completed independently. CS152 students should upload the document to their Canvas site and POLISCI143 students should upload the document to their Canvas site. 
Note: The abuse type you choose to focus on for this milestone may or may not be the abuse type your group focuses on for milestones 2 and 3.<br>Description: The execs recently came to your team and tasked you with looking into a significant type of online abuse, researching the current best options available for dealing with such abuse, and making specific recommendations to the company of how to detect and mitigate it. <br>In writing this paper, please use citations (<a data-tooltip-position="top" aria-label="https://undergrad.stanford.edu/tutoring-support/hume-center/resources/student-resources/documentation-and-citation-resources-writers" rel="noopener nofollow" class="external-link is-unresolved" href="https://undergrad.stanford.edu/tutoring-support/hume-center/resources/student-resources/documentation-and-citation-resources-writers" target="_self">choose your preferred style</a>) for factual information and feel free to add your own original interpretations and suggestions. Please follow the structure below for the paper overall, but you are welcome to add additional information in whatever sections you see fit. You are also welcome to use graphics, charts, or diagrams as long as you cite the original source.<br>Example Abuse Types: You are welcome to write about any of these abuse types or one of equivalent importance. If you want to move away from topics that are covered in the syllabus, please check in with the teaching team first.<br>Suicides driven by bullying on Instagram<br>Murder-Suicides on Facebook Live<br>Live streaming of terrorist attacks<br>Government propaganda against domestic minorities<br>Disinformation in online ads<br>Coordinated harassment of journalists on Twitter<br>Sextortion (on many platforms)<br>Trading of Child Sexual Abuse Materials (on many platforms)<br>Online cryptocurrency scams on Twitter<br>Hate speech on a streaming platform<br>Terrorist recruitment on Twitter<br>Fraudulent identification on Airbnb<br>Grooming on Snapchat or Instagram<br>Catfishing on dating apps
A note on choosing abuse type: You may end up working with this abuse type for the entire quarter, so we encourage you to pick a topic that you care about. Note that when you later work to implement technological solutions we will use stand-ins for illegal and/or very harmful material (e.g. pictures of kittens instead of CSAM). Don’t let the potential technical challenge of a topic scare you away from tackling it; your milestones will be graded on effort and thoughtfulness, not whether or not you’re able to effectively solve these problems. They’re still problems in the real world because they’re quite hard, after all!<br>Required Sections:<br>Description of the Abuse Type - Provide a summary of the kind of abuse, including citations to known examples. This can include linking to reporting in the media, academic research, talks by professionals, or links to legal documents like indictments.<br>Actor and Victim - What do you know about the people behind this kind of abuse? How about the victims? Is this something anybody can experience, or is the abuse tied to a specific part of the victim's identity? Are there forums or other platforms where these kinds of abusers congregate and can we learn more about them?<br>In-Depth Profile Piece on an Actor or Victim - Research someone who has personal experience (as an actor, victim, bystander, or content moderator) with the abuse type you are analyzing. What are their experiences with this type of abuse? How has the abuse shaped or influenced their post-abuse experience (if at all)? What do they have to say about the way content moderation on major tech platforms should handle this type of abuse? Include these findings in your final paper. You will receive extra credit on this section if you are able to do an “in-person” (remote) interview with a real person who experienced (or perpetrated) this kind of abuse.<br>Details of the Abuse - Describe the immutable aspects of this kind of abuse and how they could be detected. 
What might differ between attackers and victims? Dive into at least one real-world example and pull out specific moments at which the abuse could be detected or mitigated.<br>Relevant Technologies - What technologies currently exist that are/can be used to combat this kind of abuse? What are their strengths and weaknesses? On what platforms are they used now and to what levels of success?<br>Specific Recommendations - What policy, product, engineering, or operational changes do you recommend to deal with this type of abuse?<br>Length: 2000-4000 words. We will penalize papers that exceed this word limit.<br>Create a policy table outlining what platform policies are currently in effect that relate to this kind of abuse. Compile language, pulled from the policies of other platforms, that you think is relevant or appropriate. Please do this for three platforms. An example table for an investigation into coordinated harassment of journalists on Twitter is below, but you may adjust as appropriate for different abuse types. Think critically about what types of columns you think are theoretically important. Please make sure all table cells that reference policy hyperlink to the exact website. Note that there may be cases where you need to write “unclear.” That’s fine, just justify the reasoning. 
You can see other examples in Figure 1 <a data-tooltip-position="top" aria-label="https://www.eipartnership.net/policy-analysis/platform-policies" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.eipartnership.net/policy-analysis/platform-policies" target="_self">here,</a> Table 1 <a data-tooltip-position="top" aria-label="https://www.eipartnership.net/policy-analysis/evaluating-transparency-in-platform-political-advertising-policies" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.eipartnership.net/policy-analysis/evaluating-transparency-in-platform-political-advertising-policies" target="_self">here,</a> and <a data-tooltip-position="top" aria-label="https://stacks.stanford.edu/file/druid:tm443wf7913/20210408-Self-Harm-Policies-And-Internet-Platforms.pdf" rel="noopener nofollow" class="external-link is-unresolved" href="https://stacks.stanford.edu/file/druid:tm443wf7913/20210408-Self-Harm-Policies-And-Internet-Platforms.pdf" target="_self">here</a>. In addition to the table, in two or three paragraphs, please explore any research on the effects of these policies, including whether they help mitigate the abuse or enable it.<br>Coordinated harassment of journalists on Twitter (Note: this table includes made-up data)<br>Note to instructors: Including this example table will lead many students to focus on this exact abuse type.<br>Percent of Final Grade: 20%<br>This milestone has 3 components and 4 deliverables:
Design a user reporting flow and a behind-the-scenes moderator flow<br>Deliverable: A user reporting flow and a behind-the-scenes moderator flow, in PDF format<br>Implement these flows into your Discord bot<br>Deliverable: All code files from your backend implementation checked into a forked GitHub repository (submit the link to your repo)<br>Deliverable: Short video (around 6 minutes) demoing your bot’s functionality + discussing examples<br>Writeup<br>Deliverable: Writeup about the work you’ve done so far to handle your specific abuse type (~500 words)<br>Submission: Please have one group member from CS152 upload all documents to the Canvas CS152 site before the deadline. Make sure to include your team number in the name of each document.<br>For milestone 3 you will be asked to make your bot “smart,” e.g. train a classifier on your abuse type. For this milestone, milestone 2, it is fine if the back end moderator flow is manual. Select an abuse type to focus on for the group project. This should probably be an abuse type investigated by a member of your group for milestone 1, so that you are starting with a good understanding of the abuse and potential mitigations. While choosing an abuse type to focus on, please keep in mind that milestone 3 will ask you to create some kind of automated detection mechanism for that abuse type, so choose something for which automated detection is at least tractable. You are allowed to choose an abuse type where realistic testing is illegal or would be extremely difficult for the team, such as the trading of child sexual abuse material. In these cases, you will use a type of proxy content to train and test your detection system. For example, we have used “naked” photos of kittens (versus adult cats) as stand-ins for CSAM detection in the past. If you want to scan for extremely violent videos, you can use cartoon violence instead of content such as beheading videos. 
For topics where testing might require the use of upsetting content, such as racist language or misogynistic threats, make sure that the entire team is ok with the topic before proceeding with milestone 2.<br>Your group will design two reporting flows: one based on user reports and the other a behind-the-scenes flow for content moderators. You will implement both these flows (within Discord’s constraints) in your bot for this milestone. The best way to represent these flows is using a flowchart that includes both the user action and the system’s response. The reporting flow should have a variety of abuse types at the highest level, but the more detailed flow (after the first or second prompt) only needs to be built out for your specific abuse type. An example of a User Reporting Flow focused on hate and harassment that received a high score last year is included here; however, as noted above, you only need to show a variety of abuse types at the highest level. <br><img alt="Assets/Pasted image 20260419103604.png" src="assets/pasted-image-20260419103604.png" target="_self"><br>Some additional examples from industry:
<br>Facebook: <a rel="noopener nofollow" class="external-link is-unresolved" href="https://blog.heyo.com/wp-content/uploads/2012/06/FB-Reporting-Guide.png" target="_self">https://blog.heyo.com/wp-content/uploads/2012/06/FB-Reporting-Guide.png</a> <br>Twitter: <a rel="noopener nofollow" class="external-link is-unresolved" href="https://help.twitter.com/en/safety-and-security/report-abusive-behavior" target="_self">https://help.twitter.com/en/safety-and-security/report-abusive-behavior</a> <br>Your user reporting flow should outline the process that a user is taken through when they attempt to report an instance of your abuse type on your platform. It should do the following:
Offer users the ability to specify the detailed type of abuse <br>Note steps of the process that require review (automated or manual) <br>Clearly identify potential outcomes of a report (nothing, post is removed, shadow block, etc.)
Your manual review flow should outline the process that a content reviewer goes through when they review a piece of content submitted by a user as abusive using the flow you just created. It should do the following:
Handle reports coming both from users and automated flagging (though automated flagging need not be complete until milestone 3) <br>Outline the manual review process of flagged messages - what options are given to reviewers? What information do they have access to? <br>Make sure to clearly identify potential outcomes of a report (nothing, post is removed, user is banned, etc.) <br>Are there multiple levels of reviewers? Are there situations where a first-tier content reviewer can engage their management or a specialized investigations team?
Some questions you should think about when designing your flows:
How many steps should there be? How does this balance warding off malicious reporters/spammy reporters while still encouraging real reports? <br>How specific should the options be? What’s the tradeoff between offering many different options and only a few? How will this affect user experience? <br>What characteristics make content able to be moderated automatically, and what content should go through human review? <br>In a perfect world, what outcomes might exist to help keep users safe? (E.g. shadow blocking, user rehabilitation programs, etc.) How can you work those ideas into the flows?
Your company’s T&amp;S-minded CEO has approved your reporting flows and asked that you implement them for this abuse type, commending you on the thought that went into them. Success! She gives you the green light to build out a skeleton of the system for some A/B testing with real users.<br>You will design and implement a reporting flow within the context of Discord. We are using Discord for the class project because their bot framework is very powerful; many communities build the functionality they want on top of Discord by using bots to greet users as they join servers, auto-respond to messages, moderate chat/ban swear words, and much more. Discord bots are written in asynchronous Python - don’t worry if you haven’t worked with it before, it shouldn’t be too hard to pick up (and the TAs are here to help!). Please consult the Discord Bot Setup Guide at the bottom of this doc. Your group will be given two pre-configured channels: group-# for general chatting and group-#-mod to serve as a back-end for human moderators (from now on we’ll call them the “main” channel and the “mod” channel). User generated content will go in the main channel, and that content will either be manually reported or automatically flagged by your bot and potentially be sent to the mod channel for human review.<br>Although we have given you the setup of these two channels, you are not necessarily restricted to the context of a group chat! It is up to you to specify what context your moderation tools exist in, and style your main group-# channel accordingly. For example, you could say it is a feed of content that one specific user is scrolling through (e.g. Instagram/Twitter style), or say the channel is actually a DM between two users. There’s a lot of flexibility here, and as long as you’re clear about what you imagine and how you are simulating that within Discord, you’re welcome to adapt in whatever way makes sense for the abuse type you have chosen. 
Please reach out to the TAs if you have ideas that you aren’t sure how to adapt; they can also potentially give you additional channels if it would be helpful. For this part, your bot should be able to do the following:
Allow users to report content (and/or other users) and follow your reporting flows to completion, including outcomes. Note: if your outcomes include banning users, you can just simulate that with messages; we won’t be banning from the 152 server in the interest of keeping it functional. <br>Allow moderators to moderate reported content using your moderation flow and implement the outcomes to their completion (with the exception of banning users, which you can simulate with a message). <br>A note on sensitive content here: the TAs can see all messages sent in your group channels, but we won’t be actively monitoring them. In order to properly do this milestone, you may have to engage with perverse and hateful content; that is the nature of this kind of abuse-fighting. It is important that you test your bots, but this is not an excuse to behave inappropriately to other students, so make sure that it is clear to your team when you are testing and don’t actually target individuals in a way that might be emotionally harmful. Please take care of yourselves and each other, and reach out to the teaching team if you’re finding that this work places any undue emotional burden.<br>For this milestone we have given you the skeleton of a moderation bot with the following capabilities:
Automatically forward every message to the mod channel<br>Allow users to report messages from the main channel (reports are initiated via DMs)
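One way to organize a DM-based reporting flow like the one described above is as a small state machine that advances one step per user reply. The sketch below is only an illustration under stated assumptions: the class names, states, prompts, and categories are all hypothetical and are not taken from the starter code, and the Discord message handling itself is left out.

```python
from enum import Enum, auto


class ReportState(Enum):
    START = auto()
    AWAITING_MESSAGE_LINK = auto()
    AWAITING_CATEGORY = auto()
    COMPLETE = auto()


# Hypothetical categories; replace these with the ones from your own flow.
CATEGORIES = ("spam", "harassment", "other")


class ReportFlow:
    """Tracks one user's progress through a (hypothetical) reporting flow."""

    def __init__(self):
        self.state = ReportState.START
        self.message_link = None
        self.category = None

    def handle(self, text):
        """Advance the flow by one user reply and return the bot's next prompt."""
        text = text.strip()
        if self.state is ReportState.START:
            self.state = ReportState.AWAITING_MESSAGE_LINK
            return "Please paste the link to the message you want to report."
        if self.state is ReportState.AWAITING_MESSAGE_LINK:
            self.message_link = text
            self.state = ReportState.AWAITING_CATEGORY
            return "Which category fits best? " + ", ".join(CATEGORIES)
        if self.state is ReportState.AWAITING_CATEGORY:
            if text.lower() not in CATEGORIES:
                # Stay in the same state until the reply is a known category.
                return "Sorry, please choose one of: " + ", ".join(CATEGORIES)
            self.category = text.lower()
            self.state = ReportState.COMPLETE
            return "Thank you. Your report has been sent to the moderators."
        return "This report is already complete."
```

Keeping the flow logic separate from the Discord event handlers makes it possible to test each branch, including edge cases such as invalid replies, without running the bot.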
Please set up your bot within the first week of the assignment going out so we can catch any potential problems. The TAs will be ready to answer any questions and help debug during section!<br>For this part of the milestone you will be submitting a short (up to 6-minute) video demoing all of the functionality of your Discord bot as well as talking through your edge cases. Be sure to begin with a clear description of the context you’ve chosen for your channels.<br>Below are some resources we think might be useful to you for this part of the milestone. <br><a data-tooltip-position="top" aria-label="https://discordpy.readthedocs.io/en/latest/" rel="noopener nofollow" class="external-link is-unresolved" href="https://discordpy.readthedocs.io/en/latest/" target="_self">Here is the documentation for discord.py</a>, a Python package for writing Discord bots. It’s very thorough and fairly readable; this plus Google (in addition to the TAs) should be able to answer all of your functionality questions!<br>Discord bots frequently use emoji reactions as a quick way to offer users a few choices - this is especially convenient in a setting like moderation, where mods may have to make many consecutive choices. Check out <a data-tooltip-position="top" aria-label="https://discordpy.readthedocs.io/en/latest/api.html?highlight=on_reaction_add#discord.on_raw_reaction_add" rel="noopener nofollow" class="external-link is-unresolved" href="https://discordpy.readthedocs.io/en/latest/api.html?highlight=on_reaction_add#discord.on_raw_reaction_add" target="_self">on_raw_reaction_add()</a> for documentation about how to do this with your bot. 
You also might want to look into <a data-tooltip-position="top" aria-label="https://discordpy.readthedocs.io/en/latest/api.html?highlight=edit#discord.on_raw_message_edit" rel="noopener nofollow" class="external-link is-unresolved" href="https://discordpy.readthedocs.io/en/latest/api.html?highlight=edit#discord.on_raw_message_edit" target="_self">on_raw_message_edit()</a> to notice users editing old messages.<br>Discord offers “embeds” as a way of getting a little more control over message formatting. Read more about them in <a data-tooltip-position="top" aria-label="https://python.plainenglish.io/send-an-embed-with-a-discord-bot-in-python-61d34c711046" rel="noopener nofollow" class="external-link is-unresolved" href="https://python.plainenglish.io/send-an-embed-with-a-discord-bot-in-python-61d34c711046" target="_self">this article</a> or in the <a data-tooltip-position="top" aria-label="https://discordpy.readthedocs.io/en/latest/api.html?highlight=embeds#discord.Embed" rel="noopener nofollow" class="external-link is-unresolved" href="https://discordpy.readthedocs.io/en/latest/api.html?highlight=embeds#discord.Embed" target="_self">docs</a>.<br><a data-tooltip-position="top" aria-label="https://pypi.org/project/Unidecode/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pypi.org/project/Unidecode/" target="_self">unidecode</a> and <a data-tooltip-position="top" aria-label="https://pypi.org/project/uni2ascii-janin/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pypi.org/project/uni2ascii-janin/" target="_self">uni2ascii-janin</a> are two packages which can help with translating Unicode characters to their ASCII equivalents.<br>In approximately 500 words, summarize:
Your abuse type<br>An explanation of the user reporting flow and behind-the-scenes moderator flow, and the rationale for the decisions you made
For this milestone your group will be making your very own Discord bot. Discord bots are implemented in Python (or JavaScript) - don’t stress if you haven’t written Python before! It’s a pretty readable language, so you should be able to pick it up as you go, and the TAs are always here to help.<br>If you’re not familiar with Discord, that’s totally okay! <a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=rnYGrq95ezA" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=rnYGrq95ezA" target="_self">Check out this short video</a> which overviews Discord’s features and quirks.<br>First, every member of the team (both CS and POLISCI) should join the Discord server using this invite link: [insert link here]<br>Discord can be used in your web browser, although most people prefer the <a data-tooltip-position="top" aria-label="https://discord.com/download" rel="noopener nofollow" class="external-link is-unresolved" href="https://discord.com/download" target="_self">thick client apps</a>.<br>For the next two milestones, you and your group will have two channels to test and develop your bot in: group-# and group-#-mod, where # is your group’s number. We will give you and your bot a special role such that only you and the staff can see those channels; that way, everyone will have their own small workspace. To get the role for your group, click on the TA Bot user to bring up this window.
Type in: .join # where # is replaced by your group number. <br><img alt="Assets/Pasted image 20260419103820.png" src="assets/pasted-image-20260419103820.png" target="_self"><br>If all goes according to plan, you should receive a message back saying that you have been given a role corresponding to your group number and you should see a new role on your user in the server. Additionally, you should be able to see two new channels under one of the “Group Channels” categories:<br><img alt="Assets/Pasted image 20260419103845.png" src="assets/pasted-image-20260419103845.png" target="_self" style="width: 300px; max-width: 100%;"><br>If you accidentally join the wrong group, just message the TA Bot .leave # to have the role removed and leave those channels. Please let [TA] know if something goes awry in this process! Note: only ONE student per group should follow the rest of these steps.<br>Fork and clone the GitHub repository <a data-tooltip-position="top" aria-label="https://github.com/stanfordio/cs152bots" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/stanfordio/cs152bots" target="_self">here</a>. For instructions on how to fork a GitHub repo, <a data-tooltip-position="top" aria-label="https://docs.github.com/en/get-started/quickstart/fork-a-repo" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.github.com/en/get-started/quickstart/fork-a-repo" target="_self">see this article</a>. In order for your group to be able to collaborate effectively on this project, we recommend you create a shared GitHub repository; when you do, make sure you use the .gitignore file included in the starter code so that you don’t accidentally upload your tokens to GitHub. Our GitHub repository already has tokens.json in its .gitignore file. When you clone your project from there, you will have to create your own tokens.json file in the same folder as your bot.py file. 
The tokens.json file should look like this, replacing the “your key here” with your key. In the below sections, we explain how to obtain Discord keys.<br>The first thing you’ll want to do is make the bot. To do that, log in to <a rel="noopener nofollow" class="external-link is-unresolved" href="https://discord.com/developers" target="_self">https://discord.com/developers</a> and click “New Application” in the top right corner. <br><img alt="Assets/Pasted image 20260419103918.png" src="assets/pasted-image-20260419103918.png" target="_self">Name your application Group # Bot, where # is replaced with your group number. So, for instance, Group 0 would name their bot like so: <br><img alt="Assets/Pasted image 20260419103952.png" src="assets/pasted-image-20260419103952.png" target="_self" style="width: 500px; max-width: 100%;">It is very important that you name your bot exactly following this scheme; some parts of the bot’s code rely on this format.
Next, you’ll want to click on the tab labeled “Bot” under “Settings.” Click “Copy” to copy the bot’s token. If you don’t see “Copy”, hit “Reset Token” and copy the token that appears (make sure you’re the first team member to go through these steps!). Open tokens.json and paste the token between the quotes on the line labeled “discord”. Scroll down to a region called “Privileged Gateway Intents”. Tick the options for “Presence Intent”, “Server Members Intent”, and “Message Content Intent”, and save your changes. See the image for what it should look like.<br>
<img alt="Assets/Pasted image 20260419104034.png" src="assets/pasted-image-20260419104034.png" target="_self">
An aside: It’s unsafe to embed API keys in your code directly. If you put that code on GitHub, then anyone could find and use that key! (GitHub actually tries to detect code like this and forbids programmers from uploading it.) That’s why we’re storing them in a separate file which can be ignored by version control software.
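To illustrate the idea of keeping the key outside the code, here is a minimal sketch of reading the token from tokens.json at startup. It assumes the file is a JSON object whose top-level "discord" key holds the token (matching the line labeled “discord” mentioned above); the function name load_discord_token is hypothetical and the starter code may load the token differently.

```python
import json
from pathlib import Path


def load_discord_token(path="tokens.json"):
    """Read the bot token from a JSON file that is kept out of version control."""
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"{path} not found; create it next to bot.py")
    tokens = json.loads(token_file.read_text(encoding="utf-8"))
    # Assumption: the token is stored under a top-level "discord" key.
    return tokens["discord"]
```

Because the token only ever lives in tokens.json, which .gitignore excludes, the code itself can be pushed to a shared repository without exposing the key.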
Next, we’ll add the bot to the 152 Discord server! You’ll need to generate a link that the teaching team can use to invite your bot.
Click on the tab labeled “OAuth2” under “Settings”. Click the tab labeled “URL Generator” under “OAuth2”. Check the box labeled “bot”. Once you do that, another area with a bunch of options should appear lower down on the page. Check these permissions, then copy the link that’s generated.<br>
<img alt="Assets/Pasted image 20260419104103.png" src="assets/pasted-image-20260419104103.png" target="_self"> Send that link to any of the TAs via Discord (or by email) - they will use it to add your bot to the server. Once they do, your bot will appear in the #general channel and will be a part of the server!
Note that these permissions are just a starting point for your bot. We think they’ll cover most cases, but it’s entirely possible you’ll run into cases where you want to be able to do more. If you do, you’re welcome to send updated links to the teaching team to re-invite your bot with new permissions.<br>First things first, the starter code is written in Python. You’ll want to make sure that you have Python 3 installed on your machine. You can edit the code in a text editor of your choice.<br>Once you’ve done that, open a terminal in the same folder as your bot.py file. (If you haven’t used your terminal before, check out <a data-tooltip-position="top" aria-label="https://www.macworld.com/article/2042378/master-the-command-line-navigating-files-and-folders.html" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.macworld.com/article/2042378/master-the-command-line-navigating-files-and-folders.html" target="_self">this guide</a>!) You’ll need to install some libraries if you don’t have them already, namely:<br>python3 -m pip install requests<br>python3 -m pip install discord.py
Next up, let’s take a look at what bot.py already does. To do this, run bot.py and leave it running in your terminal. Next, go into your team’s private group-# channel and try typing any message. You should see something like this pop up in the group-#-mod channel:<br><img alt="Assets/Pasted image 20260419104128.png" src="assets/pasted-image-20260419104128.png" target="_self"><br>The default behavior of the bot is that any time it sees a message (from a user), it sends that message to the moderator channel with no possible actions. This is obviously not the final behavior you’ll want for your bot - you should update this to match your report flow. However, the infrastructure is there for your bot to automatically flag messages and (potentially) moderate them somehow.<br>Next up, click on your app in the right sidebar under “Online” to begin direct messaging it (or click on its name). First of all, try sending “help”. Try following its instructions from there by reporting a message from one of the channels to get a sense for the reporting flow that’s already built out for you. (Make sure to only report messages from channels that the bot is also in.)<br>If you look through the starter code, you’ll see the beginnings of the reporting flow that are already there. It will be up to you to build that out in whatever way your group decides is best. You’re welcome to edit any part of the starter code you’d like if you want to change what’s already there - we encourage it! This is just meant to be a starting point that you can pattern match off of.<br>If you’re not familiar with Python and asynchronous programming, please come to a section for an introduction. The TAs are happy to walk you through the starter code and explain anything that’s unclear.<br>If you’re seeing this error, it probably means that your terminal is not open in the right folder. Make sure that it is open inside the folder that contains bot.py and tokens.json. 
You can check this by typing in ls and verifying that the output looks something like this:<br>bot.py tokens.json
Discord has a slight incompatibility with Python 3 on Mac. To solve this, navigate to your /Applications/Python 3.6/ folder and double click the Install Certificates.command. Try running the bot again; it should be able to connect now. If you’re still having trouble, try running a different version of Python (e.g. use the command python3.7 or python3.8) instead. If that doesn’t work, come to section and we’ll be happy to help!<br>This is an issue with the version of the Discord API package that is installed. Try the following steps: first, run pip install --upgrade discord.py in a terminal opened in the folder of your project that contains this file. If that does not work, try changing the line in bot.py that says intents.message_content = True to intents.messages = True.
Percent of Final Grade: 30%<br>Deliverables and due date: Poster, presented in person, on Wednesday, June 7 from 6-8pm. Your poster should be completely set up by 5:30pm. You can arrive as early as 5:00pm. You should have a video showing your bot’s functionality that you can play for judges. We recommend having this video on a tablet, as the space lacks tables and outlets. A PDF of your final poster and all code files are due on Tuesday, June 6 at 11:59pm PT.<br>Submission: Please have one group member upload the poster PDF and video to the CS 152 Canvas site. Your code files should be in the GitHub repository submitted for Milestone 2.<br>Please fill out the work distribution survey to assess how equally different members of each group contributed to the project. We reserve the right to change grades based on the results.<br>Your bot will be responsible for handling both manual reports from users (which you implemented in Milestone 2) as well as automatically detecting and flagging abusive content (the primary goal for this Milestone). This will include finding or collecting a dataset of examples to use to evaluate the efficacy of your solutions. Here are some examples of what we envision you accomplishing for this milestone. Please note that this is not a checklist of things you must accomplish, just ideas.
Training a classifier<br>Using your collected dataset to test a few publicly available packages or APIs and noting their pros/cons (we have provided instructions on how to use a few different APIs at the bottom of this document)<br>Designing and building a robust backend for logging and maintaining per-user statistics<br>Building a framework by which communities can specify their own regex-like rules for content they don’t want to see<br>Creating a tool that automatically detects and visits all links in a piece of text to see if they host undesirable content<br>Building a system by which users can outsource unwanted content to their friends for review
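As a rough illustration of the “regex-like rules” idea from the list above, the following sketch lets a community register named patterns and reports which ones a message triggers. The RuleSet class, the rule names, and the example patterns are all hypothetical; a real system would also need to decide who may add rules and what happens when one fires.

```python
import re


class RuleSet:
    """Community-defined patterns that are checked against each message."""

    def __init__(self):
        self._rules = []  # list of (rule name, compiled pattern) pairs

    def add_rule(self, name, pattern):
        # re.compile raises re.error here if the submitted pattern is invalid.
        self._rules.append((name, re.compile(pattern, re.IGNORECASE)))

    def matches(self, text):
        """Return the names of all rules that match the text, in insertion order."""
        return [name for name, rx in self._rules if rx.search(text)]


# Made-up example rules for demonstration only.
rules = RuleSet()
rules.add_rule("crypto-giveaway", r"free\s+(bitcoin|crypto)")
rules.add_rule("invite-link", r"discord\.gg/\w+")

print(rules.matches("Claim your FREE bitcoin at discord.gg/abc123"))
# prints ['crypto-giveaway', 'invite-link']
```

Returning rule names rather than a single yes/no answer makes it easier to show moderators why a message was flagged and to count how often each rule fires.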
<br>To handle multiple languages, you can use packages like <a data-tooltip-position="top" aria-label="https://pypi.org/project/google-trans-new/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pypi.org/project/google-trans-new/" target="_self">google_trans_new</a> to automatically translate everything to English or <a data-tooltip-position="top" aria-label="https://pypi.org/project/langdetect/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pypi.org/project/langdetect/" target="_self">langdetect</a> for language detection (make sure to rate limit yourself where necessary).<br>You should develop and implement a strategy for evaluating the efficacy of your back-end solution. If you trained a classifier or utilized external APIs, this might look like utilizing your dataset to generate a <a data-tooltip-position="top" aria-label="https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/" target="_self">confusion matrix</a> and figuring out whether your model is over or under sensitive and what kinds of problems this might cause at scale. If your solution is more user or design focused, you could conduct further user studies; for instance, you could invite friends to interact with the bot to assess the additional functionality and identify cases in which your design is clunky or might not scale well. What scenarios does your bot handle effectively? What scenarios does your bot not handle as effectively? What explains your bot’s strengths and weaknesses? With more time and resources, what would be some of your next steps? A key final deliverable for this milestone will be a poster that you’ll display and discuss with your platform’s “executives” (the teaching team as well as guest industry judges who will stop by your poster during the poster session). 
You will likely want to explain how your platform currently handles your specific type of abuse (your back-end solution), and address its strengths and shortcomings, leading to clear and specific recommendations for how the platform should move forward. You will also want to answer questions from the “executives.” We encourage you all to think creatively about how to communicate your work! We also encourage your group to prepare a 5-minute pitch and to make sure that all group members who are present have the chance to participate in it. Your poster should include:
Problem Description<br>Policy Language<br>Technical Back-end<br>Evaluation<br>Looking Forward
More details on these components are below.<br>Give a short description of your group’s abuse type and victim profile. You can assume that people viewing your poster have a general awareness of your abuse type.<br>Create a written policy, specifically targeted at your abuse type, in the kind of language you have seen in community standards and terms of service. The policy should be less than 400 words and understandable by a normal user.<br>Discuss the original goals and final state of your back-end technology in more detail, explaining the work you did to build it and what its current capabilities are. If there are things you tried which didn’t make it into the final product, be sure to mention them here, along with the reasoning behind not including them.<br>Provide a clear analysis of how well your group’s back-end technology accomplishes what it originally set out to do. Make sure to address both the successes and the shortcomings of your current solution. Discuss any negative unintended consequences you foresee and which users may be more affected by them. Try to think about what critics/stakeholders would say about your technology.<br>Discuss what impact you believe implementation would have on platform safety. Discuss other engineering approaches that your group didn’t pursue but that you’d want to propose going forward.<br>You can print your poster however you’d like. The poster session will have corkboards where you can pin up your poster without a rigid backing. We recommend that your poster be 24”x36”. It should not be larger than that, as we have limited space. <br>Stanford Undergraduate Research has tips for creating good posters <a data-tooltip-position="top" aria-label="https://undergradresearch.stanford.edu/share/surps-asurps/make-good-poster" rel="noopener nofollow" class="external-link is-unresolved" href="https://undergradresearch.stanford.edu/share/surps-asurps/make-good-poster" target="_self">here</a>. 
We encourage you to avoid having a lot of text on your poster; hit the points you need to make, but keep the poster readable. You are going to want to show off your excellent reporting system to our judges, friends and family! Please record a short demo video that you can show to judges and narrate. We encourage the political science student to provide strong support for the evaluation component of this milestone, for example by leading a rigorous qualitative evaluation of the bot to supplement a quantitative evaluation. The political science student could generate a typology of abuse manifestations, and evaluate how the bot performs on them. The political science student could recruit testers and do a deep dive into why the bot did or did not respond appropriately to the testers. These are just ideas, think creatively about this! We also encourage the political science student to lead the “policy language” section. The political science student could also assist with dataset creation.]]></description><link>assignments/assignment-1-discord-bot.html</link><guid isPermaLink="false">Assignments/Assignment 1 - Discord Bot.md</guid><pubDate>Mon, 27 Apr 2026 21:09:36 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Social Network Analysis Syllabus (Advanced Topic)]]></title><description><![CDATA[This course develops social network analysis (SNA) as both a set of quantitative methods and a lens for studying adversarial behavior, information spread, and influence in complex systems. 
Lectures progress from foundational concepts illustrated through animal behavior, through core computational methods, into applied analysis of information operations, and finally into simulation-based modeling of social dynamics.<br>For a broader overview of the Trust and Safety space, see the <a data-tooltip-position="top" aria-label="Course Overview" data-href="Course Overview" href="course-overview.html" class="internal-link" target="_self" rel="noopener nofollow">Trust &amp; Safety class</a>.<br>Lecture 1: Animal Sociality and SNA Fundamentals
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/01. sna_animal_networks.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/01. sna_animal_networks.pdf" href="advanced-topics/social-network-analysis/slides/01.-sna_animal_networks.html" class="internal-link" target="_self" rel="noopener nofollow">sna_animal_networks</a>
Uses animal social networks as a politically neutral entry point to introduce core SNA vocabulary: nodes, edges, weighted networks, directed vs. undirected graphs, assortative mixing, and homophily ("birds of a feather flock together"). Case studies draw from marmot fieldwork and broader ethology literature. Establishes foundational intuitions before applying methods to human social media networks.
Lecture 2: Animal Network Robustness and Node Removal
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/02. sna_animal_network_robustness.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/02. sna_animal_network_robustness.pdf" href="advanced-topics/social-network-analysis/slides/02.-sna_animal_network_robustness.html" class="internal-link" target="_self" rel="noopener nofollow">sna_animal_network_robustness</a>
Explores what happens when individuals are removed from a network — strategically or randomly — using both animal and information network examples. Introduces node-level metrics (degree, betweenness centrality) and network-level metrics (connectedness, fragmentation). Applies these concepts to misinformation source rankings, illustrating how targeted interventions affect information flow. Lays groundwork for intervention analysis in later lectures.
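As a small illustration of node removal (not taken from the lecture slides), the sketch below counts connected components before and after deleting a node from a made-up toy network. The graph, node names, and helper functions are assumptions for demonstration only.

```python
from collections import deque


def components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps


def remove_node(adj, target):
    """Return a copy of the graph without the target node or its edges."""
    return {n: nbrs - {target} for n, nbrs in adj.items() if n != target}


# Toy network (made-up data): two triangles joined through a bridge node "d".
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
degree = {n: len(nbrs) for n, nbrs in graph.items()}

print(degree["d"], len(components(remove_node(graph, "d"))))  # prints 2 2
print(degree["a"], len(components(remove_node(graph, "a"))))  # prints 2 1
```

Nodes "a" and "d" have the same degree, yet only removing "d" fragments the network, which is the kind of difference that a path-based metric such as betweenness centrality is meant to capture.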
Lecture 3: Community Detection
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/03. sna_community_detection.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/03. sna_community_detection.pdf" href="advanced-topics/social-network-analysis/slides/03.-sna_community_detection.html" class="internal-link" target="_self" rel="noopener nofollow">sna_community_detection</a>
Core methodology lecture. Covers clustering coefficient, modularity maximization, the Louvain algorithm, and CONCOR (convergence of iterated correlations). Framed around a "Locate Groups" report assignment. Students learn to identify cohesive subgroups and evaluate the quality of detected communities using modularity scores.
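For readers who want a concrete feel for modularity scores, here is a minimal sketch that computes Newman's modularity Q for a given partition of a small undirected, unweighted graph. It uses the per-community form Q = sum over communities of (internal edges / m) minus (degree sum / 2m) squared, and assumes no self-loops. The edge list and partitions are made-up data, and this is not the Louvain algorithm itself, only the score that Louvain tries to maximize.

```python
def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of the degrees of its members
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph (made-up data): two triangles connected by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(round(modularity(edges, two_groups), 3))  # prints 0.357
print(round(modularity(edges, one_group), 3))   # prints 0.0
```

The partition that follows the two triangles scores higher than lumping every node together, which matches the intuition that modularity rewards groups with more internal edges than expected by chance.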
Lecture 4: Stance Detection via Label Propagation
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/04. sna_stance_detection.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/04. sna_stance_detection.pdf" href="advanced-topics/social-network-analysis/slides/04.-sna_stance_detection.html" class="internal-link" target="_self" rel="noopener nofollow">sna_stance_detection</a>
Applied method that builds directly on community structure from Lecture 3. Introduces stance detection using hashtag-seeded label propagation over retweet networks. Covers how stance labels spread from users to hashtags and back, general label propagation algorithms, confidence calibration, and the choice between propagation strategies. Optional extension covers text-based stance detection.
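The general idea of seeded label propagation can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not the lecture's hashtag-based method: it works on an undirected toy graph, keeps seed labels fixed, updates all other nodes synchronously by strict majority vote among labeled neighbors, and leaves a node unchanged on ties. The node names and stance labels are made up.

```python
from collections import Counter


def propagate_labels(adj, seeds, rounds=10):
    """Spread seed labels over an undirected graph by neighbor majority vote."""
    labels = dict(seeds)
    for _ in range(rounds):
        updated = dict(labels)
        for node, nbrs in adj.items():
            if node in seeds:
                continue  # seed labels never change
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if not votes:
                continue  # no labeled neighbors yet
            ranked = votes.most_common(2)
            # Adopt the top label only when it is a strict winner (no tie).
            if len(ranked) == 1 or ranked[0][1] != ranked[1][1]:
                updated[node] = ranked[0][0]
        if updated == labels:
            break  # converged
        labels = updated
    return labels


# Toy chain of six accounts (made-up data) with one seed at each end.
chain = {
    "u1": ["u2"], "u2": ["u1", "u3"], "u3": ["u2", "u4"],
    "u4": ["u3", "u5"], "u5": ["u4", "u6"], "u6": ["u5"],
}
result = propagate_labels(chain, {"u1": "pro", "u6": "anti"})
print(result["u3"], result["u4"])  # prints pro anti
```

The tie rule is one possible design choice; other propagation strategies break ties randomly or weight votes by edge strength, which changes how confident the final labels are near community boundaries.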
Lecture 5: Information Operations
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/05. sna_information_operations.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/05. sna_information_operations.pdf" href="advanced-topics/social-network-analysis/slides/05.-sna_information_operations.html" class="internal-link" target="_self" rel="noopener nofollow">sna_information_operations</a>
Conceptual overview of adversarial social behavior using the BEND framework (Boost, Engage, Neutralize, Distort). Connects community structure and stance to coordinated inauthentic behavior. Discusses dynamic multi-agent scenarios in which adversarial actors attempt to shift population-level stance. Sets up Lecture 6's detection approach.
Lecture 6: Information Operations Detection
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/06. sna_information_operations_detection.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/06. sna_information_operations_detection.pdf" href="advanced-topics/social-network-analysis/slides/06.-sna_information_operations_detection.html" class="internal-link" target="_self" rel="noopener nofollow">sna_information_operations_detection</a>
Technical case study of detection using paid link schemes and SEO manipulation as the adversarial domain. Covers how to identify coordinated link schemes, distinguish paid from organic linking, and use LLMs to label the political bias of news sites at scale (case study: Iranian news network). Draws on SEO network construction and classification methods introduced in Lectures 2 and 4.
Lecture 7: Social Influence Modeling
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/07. sna_social_influence_modeling.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/07. sna_social_influence_modeling.pdf" href="advanced-topics/social-network-analysis/slides/07.-sna_social_influence_modeling.html" class="internal-link" target="_self" rel="noopener nofollow">sna_social_influence_modeling</a>
Introduces agent-based modeling (ABM) as a complement to network analysis. Develops a co-evolutionary stance-influence model where network structure and agent opinions update simultaneously. Key findings: minority stances exhibit tipping points around 25% adoption; optimal confederates target local ego-networks rather than global hubs. Discusses validation against real data and recovery from polarized states.
Lecture 8: Information Diffusion and Population Modeling
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Social Network Analysis/Slides/08. sna_population_modeling.pdf" data-href="Advanced Topics/Social Network Analysis/Slides/08. sna_population_modeling.pdf" href="advanced-topics/social-network-analysis/slides/08.-sna_population_modeling.html" class="internal-link" target="_self" rel="noopener nofollow">sna_population_modeling</a>
Broadest lens in the course. Applies epidemiological-style population models (SEIRM, Friedkin social influence model) to information spread and opinion dynamics. Multi-agent scenarios allow virtual experiments: given an observed information environment, what policies lead a population toward a desired trajectory? LLMs are introduced as tools for constructing action distributions and translating between the agent-level model and the real information environment.
<br><a data-tooltip-position="top" aria-label="https://docs.google.com/presentation/d/19qc59Lw66TlykabIZBQVpmHvbeATkJ1o7iHynjMI454/edit?usp=sharing" rel="noopener nofollow" class="external-link is-unresolved" href="https://docs.google.com/presentation/d/19qc59Lw66TlykabIZBQVpmHvbeATkJ1o7iHynjMI454/edit?usp=sharing" target="_self">Slides</a><br>
<a data-tooltip-position="top" aria-label="https://www.zotero.org/groups/5743793/social_llms/library" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.zotero.org/groups/5743793/social_llms/library" target="_self">Zotero library</a>]]></description><link>advanced-topics/social-network-analysis/social-network-analysis-syllabus-(advanced-topic).html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Social Network Analysis Syllabus (Advanced Topic).md</guid><pubDate>Mon, 27 Apr 2026 20:46:15 GMT</pubDate></item><item><title><![CDATA[Adversarial Retrieval and LLMs Syllabus (Advanced Topic)]]></title><description><![CDATA[This course examines how large language models handle — and fail to handle — factual knowledge, and how adversaries exploit these failure modes in information retrieval and generation systems. Lectures are organized in four modules that move from internal model mechanics outward to ecosystem-level attacks.For a broader overview of the Trust and Safety space, see the <a data-tooltip-position="top" aria-label="Course Overview" data-href="Course Overview" href="course-overview.html" class="internal-link" target="_self" rel="noopener nofollow">Trust &amp; Safety class</a>.<br>Before taking the class, read through the <a data-href="LLM background literature" href="advanced-topics/adversarial-retrieval-and-llms/llm-background-literature.html" class="internal-link" target="_self" rel="noopener nofollow">LLM background literature</a> (considered pre-requisite)Lecture 1: Memorization, Generalization, and Specialization in LLMs
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Adversarial Retrieval and LLMs/Slides/01. Memorization, Generalization, and Specialization in LLMs.pdf" data-href="Advanced Topics/Adversarial Retrieval and LLMs/Slides/01. Memorization, Generalization, and Specialization in LLMs.pdf" href="advanced-topics/adversarial-retrieval-and-llms/slides/01.-memorization,-generalization,-and-specialization-in-llms.html" class="internal-link" target="_self" rel="noopener nofollow">Memorization, Generalization, and Specialization in LLMs</a>
Introduces the core tension at the heart of the course: LLMs memorize training data (enabling recall but risking privacy leakage and stale knowledge) while also generalizing (enabling zero-shot tasks but introducing hallucinations). Covers finetuning vs. zero-shot prompting on QA tasks, the VoLTA vision-language model as an extended case, and why retrieval-augmented generation (RAG) reduces memorization-driven errors. Establishes the vocabulary for lectures 2–4.
Lecture 2: LLM Hallucinations and Knowledge Conflicts
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Adversarial Retrieval and LLMs/Slides/02. LLM Hallucinations and Knowledge Conflicts.pdf" data-href="Advanced Topics/Adversarial Retrieval and LLMs/Slides/02. LLM Hallucinations and Knowledge Conflicts.pdf" href="advanced-topics/adversarial-retrieval-and-llms/slides/02.-llm-hallucinations-and-knowledge-conflicts.html" class="internal-link" target="_self" rel="noopener nofollow">LLM Hallucinations and Knowledge Conflicts</a>
Deepens the hallucination picture by distinguishing faithfulness hallucinations (model contradicts its context) from factuality hallucinations (model contradicts the world). Introduces knowledge conflicts — situations where parametric knowledge, retrieved knowledge, and real-world facts diverge — and discusses how RLHF safety tuning interacts with faithfulness. Covers entity substitution frameworks and conflict-inducing dataset construction.
Lecture 3: Adversarial Adaptation in Information Systems
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Adversarial Retrieval and LLMs/Slides/03. Adversarial Adaptation In Information Systems.pdf" data-href="Advanced Topics/Adversarial Retrieval and LLMs/Slides/03. Adversarial Adaptation In Information Systems.pdf" href="advanced-topics/adversarial-retrieval-and-llms/slides/03.-adversarial-adaptation-in-information-systems.html" class="internal-link" target="_self" rel="noopener nofollow">Adversarial Adaptation In Information Systems</a>
Broadens scope from model internals to the adversarial information ecosystem. Uses a "means, motives, and opportunities" framework to analyze how actors adapt content to manipulate search rankings, social platforms, and recommendation systems. Covers SEO manipulation, social bot adaptation, memorialization hacking, and the trustworthiness/pluralism tradeoff that constrains platform interventions. This lecture is the conceptual bridge between the model-focused material (Lectures 1–2) and the attack-focused material (Lecture 4).
Lecture 4: Adversarial Attacks on IR Systems
<br>Source: <a data-tooltip-position="top" aria-label="Advanced Topics/Adversarial Retrieval and LLMs/Slides/04. Adversarial Attacks on IR Systems.pdf" data-href="Advanced Topics/Adversarial Retrieval and LLMs/Slides/04. Adversarial Attacks on IR Systems.pdf" href="advanced-topics/adversarial-retrieval-and-llms/slides/04.-adversarial-attacks-on-ir-systems.html" class="internal-link" target="_self" rel="noopener nofollow">Adversarial Attacks on IR Systems</a>
Catalogues specific technical attacks against information retrieval systems: malicious text and image encoding, gradient-based multi-view topic attacks, poisoned corpus attacks, and RAG-specific poisoning. Applies the means/motives/opportunities framework to SEO attack vectors, including evidence that unreliable news sites are disproportionately linked by paid schemes. Covers the AREA (Adversarial REtrieval Attack) literature.
<br>Tutorial: <a data-href="How Do I Make a Good Classifier" href="advanced-topics/adversarial-retrieval-and-llms/tutorials/how-do-i-make-a-good-classifier.html" class="internal-link" target="_self" rel="noopener nofollow">How Do I Make a Good Classifier</a>?
A Python-focused practical guide to binary classification: data collection, annotation and inter-rater reliability (Cohen's Kappa, Fleiss' Kappa, Krippendorff's Alpha), preprocessing, class imbalance handling, model selection, hyperparameter tuning, and evaluation (precision, recall, F1, ROC-AUC). Best delivered as a lab session after Lecture 2, when students have encountered hallucination/conflict classification tasks in context.<br><a data-href="AI Worksheets" href="advanced-topics/adversarial-retrieval-and-llms/tutorials/ai-worksheets.html" class="internal-link" target="_self" rel="noopener nofollow">AI Worksheets</a>
Extended worksheet collection supporting the readings and lectures. Covers alignment, reasoning, RAG, vision-language models, and adversarial scenarios.<br><a data-tooltip-position="top" aria-label="https://www.zotero.org/groups/5176885/multimodal/library" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.zotero.org/groups/5176885/multimodal/library" target="_self">Zotero</a>
Hallucinations + misinformation (SegSub)
Typologies
Generality vs specialization
Need for RAG to stay up to date...
Adversarial information retrieval
Jailbreaks of agentic AI (benchmark papers, sudo rm -rf agent security)
Ethics
<br>See the <a data-href="Advanced Topics/Adversarial Retrieval and LLMs/LLM background literature" href="advanced-topics/adversarial-retrieval-and-llms/llm-background-literature.html" class="internal-link" target="_self" rel="noopener nofollow">LLM background literature</a> for primers on model design and training processes.
Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2203.02155" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2203.02155" target="_self">Training language models to follow instructions with human feedback</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2305.18290" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2305.18290" target="_self">Direct Preference Optimization: Your Language Model is Secretly a Reward Model</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2402.01306" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2402.01306" target="_self"><em>KTO: Model Alignment as Prospect Theoretic Optimization</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2212.08073" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2212.08073" target="_self"><em>Constitutional AI: Harmlessness from AI Feedback</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2309.00267" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2309.00267" target="_self"><em>RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2310.06452" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2310.06452" target="_self"><em>Understanding the Effects of RLHF on LLM Generalisation and Diversity</em></a> Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2310.01798" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2310.01798" target="_self">Large Language Models Cannot Self-Correct Reasoning Yet</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.10200" 
rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.10200" target="_self">Chain-of-Thought Reasoning Without Prompting</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2201.11903" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2201.11903" target="_self"><em>Chain-of-Thought Prompting Elicits Reasoning in Large Language Models</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2205.11916" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2205.11916" target="_self"><em>Large Language Models are Zero-Shot Reasoners</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2203.11171" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2203.11171" target="_self"><em>Self-Consistency Improves Chain of Thought Reasoning in Language Models</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2212.09561" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2212.09561" target="_self"><em>Large Language Models are Better Reasoners with Self-Verification</em></a> Readings: <br><a data-tooltip-position="top" aria-label="https://proceedings.neurips.cc/paper_files/paper/2023/hash/efb2072a358cefb75886a315a6fcf880-Abstract-Conference.html" rel="noopener nofollow" class="external-link is-unresolved" href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/efb2072a358cefb75886a315a6fcf880-Abstract-Conference.html" target="_self">On the Planning Abilities of Large Language Models - A Critical Investigation</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2405.04776" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2405.04776" target="_self">Chain of Thoughtlessness? 
An Analysis of CoT in Planning</a> Recent Trends and Developments after O1, e.g., <br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.arxiv.org/abs/2409.13373" target="_self">https://www.arxiv.org/abs/2409.13373</a> <br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://cdn.openai.com/o1-system-card.pdf" target="_self">https://cdn.openai.com/o1-system-card.pdf</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.08115" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.08115" target="_self"><em>On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2405.13966" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2405.13966" target="_self"><em>On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2002.08909" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2002.08909" target="_self">REALM: Retrieval-Augmented Language Model Pre-Training</a> (v) <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2005.11401" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2005.11401" target="_self">Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks</a> (v) <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2310.11511" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2310.11511" target="_self">Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection</a> (v) <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.07630" rel="noopener nofollow" class="external-link is-unresolved" 
href="https://arxiv.org/abs/2402.07630" target="_self">G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2404.19705" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2404.19705" target="_self">When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2304.13734" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2304.13734" target="_self">Internal State of an LLM knows when it’s Lying</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2311.05232" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2311.05232" target="_self">A survey on Hallucinations in LLMs</a> Optional: <br><a data-tooltip-position="top" aria-label="https://www.sciencedirect.com/science/article/pii/S266729522400014X" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.sciencedirect.com/science/article/pii/S266729522400014X" target="_self">A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly</a> Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2103.00020" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2103.00020" target="_self">CLIP: Learning Transferable Visual Models From Natural Language Supervision</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2401.02460" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2401.02460" target="_self">Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2408.11039v1" rel="noopener nofollow" class="external-link 
is-unresolved" href="https://arxiv.org/pdf/2408.11039v1" target="_self"><em>Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2304.08485" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2304.08485" target="_self"><em>Visual Instruction Tuning (LLaVA)</em></a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2408.12528v2" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2408.12528v2" target="_self"><em>Show-o: One Single Transformer to Unify Multimodal Understanding and Generation</em></a> ]]></description><link>advanced-topics/adversarial-retrieval-and-llms/adversarial-retrieval-and-llms-syllabus-(advanced-topic).html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Adversarial Retrieval and LLMs Syllabus (Advanced Topic).md</guid><pubDate>Mon, 27 Apr 2026 20:41:13 GMT</pubDate></item><item><title><![CDATA[Assessing T&S Policies]]></title><description><![CDATA[Consider the following controversial political ad examples that expose tensions, inconsistencies, and edge cases in platform policies. These cases are designed to:
Show where Policy A and Policy B give different results
Highlight situations where policies produce seemingly inappropriate or controversial outcomes
Reveal ambiguities and enforcement challenges
Policy definitions: <a data-href="Political Ad Policies" href="quizzes/political-ad-policies.html" class="internal-link" target="_self" rel="noopener nofollow">Political Ad Policies</a>
Case 1 (Devil Horns): A political action committee creates an ad featuring a stylized image of the opposing candidate with:
Devil horns superimposed on their head
An American flag burning in the background
Text overlay: "Don't let 'Candidate X' destroy America"
Dramatic apocalyptic imagery (crumbling buildings, dark skies)
The ad is targeted to voters in swing states who have previously engaged with content about patriotism and national security.
--Policy A (Meta-based) - Likely Outcome: ✅ APPROVED
Reasoning: Stylized imagery and negative campaigning are explicitly allowed
Not a direct call for violence
Not voter suppression
Custom audience targeting is permitted
Dramatic imagery is allowed under free political speech
Note: Controversy: Many would argue this ad demonizes the candidate and could contribute to a hostile climate, but the policy prioritizes political speech.
--Policy B (Google-based) - Likely Outcome: ✅ APPROVED
Reasoning: False or misleading claims about opponents are allowed
Not direct violence incitement
Not voter suppression
Would be subject to basic targeting only (can't microtarget)
Controversy: Same concerns as Policy A; the ad is inflammatory but doesn't violate technical policy provisions
Note: Both policies approve this controversial ad. The main difference is targeting precision.
Case 2 (Mental Illness): A campaign creates an ad stating:
"Transgender activists are mentally ill and have no place teaching our children"
Features images of a transgender school teacher
Calls for legislation to ban transgender individuals from teaching positions
Text: "Vote 'Candidate X' to protect our kids"
The ad uses custom audience lists of parents with school-age children and targets users who have engaged with content about education policy.
--Policy A (Meta-based) - Likely Outcome: ✅ APPROVED (as of January 2025)
Reasoning: Post-January 2025 policy explicitly allows "allegations of mental illness or abnormality when based on gender or sexual orientation, given political and religious discourse about transgenderism"
Policy allows "content arguing for gender-based limitations of military, law enforcement and teaching jobs"
Falls under "religious beliefs" exception
Custom audience targeting permitted
Note: Controversy: This represents a major policy shift. Previously, this ad would have been removed as hate speech. It is now explicitly permitted despite concerns about:
Real-world violence against LGBTQ+ individuals
Creating hostile environment
Spreading misinformation about transgender identity
--Policy B (Google-based) - Likely Outcome: ⚠️ UNCLEAR
Reasoning: Google's policy doesn't explicitly address attacks on protected characteristics in political ads
Not technically voter suppression or violence incitement
Would likely be approved under "political speech" but enforcement is uncertain
Controversy: Lack of clear policy guidance on hate speech in political context creates enforcement ambiguity
Note: Policy A explicitly permits this after January 2025 changes; Policy B is ambiguous. This case highlights how recent policy changes normalized anti-LGBTQ+ content.
Case 3 (AI Deepfake): An ad features a highly realistic AI-generated video showing:
The opposing candidate appearing to say: "I don't care about working-class families"
The candidate never actually said this
The ad includes a small disclaimer at the bottom: "This ad contains AI-generated content"
Targets working-class voters in specific zip codes using custom audience data
--Policy A (Meta-based) - Likely Outcome: ✅ APPROVED (with disclosure)
Reasoning: AI-generated content depicting false statements by candidates IS allowed
Advertiser checked the disclosure box for synthetic content
Small disclaimer present (even if hard to notice)
Custom audience targeting permitted
False claims about opponents are allowed
Note: Controversy: Even with disclosure, many viewers may believe the video is real, especially if the disclaimer is small or easy to miss
--Policy B (Google-based) - Likely Outcome: ✅ APPROVED (with disclosure)
Reasoning: Synthetic content must be disclosed - advertiser complied
False claims about opponents are allowed
Would use basic targeting only (less precise than Policy A)
Note: Controversy: Same as Policy A - disclosure may be insufficient to prevent deception
Key Insight: Both policies allow deceptive AI content with minimal disclosure requirements. The disclosure is technically compliant but may be functionally inadequate.
Case 4 (Immigration): An ad shows:
Images of migrants at the border
Text: "Illegal immigrants are criminals poisoning our communities"
Dramatic music and red alert-style graphics
Call to action: "Vote to stop the invasion"
Uses lookalike audiences based on people who donated to immigration restriction groups
--Policy A (Meta-based) - Likely Outcome: ✅ APPROVED (as of January 2025)
Reasoning: Post-January 2025 policy allows attacks on immigrants
"Claims about immigrants' character or fitness for participation in society" now permitted
Not direct violence incitement
Lookalike audience targeting allowed
Falls under political discourse about immigration
Note: Controversy: Uses dehumanizing language ("invasion", "poisoning")
Research shows this rhetoric correlates with hate crimes against immigrants
Previously might have been restricted; now explicitly allowed
--Policy B (Google-based) - Likely Outcome: ✅ LIKELY APPROVED
Reasoning: Doesn't explicitly call for violence
Political speech about immigration policy
False/misleading claims allowed
Can only use basic targeting (not lookalike audiences)
Note: Controversy: Same concerns about dehumanizing rhetoric
Key Insight: Both approve, but Policy A allows more precise targeting of receptive audiences. The language is inflammatory but doesn't cross the line to direct violence incitement.
Case 5: An ad from a political activist group states:
"Patriots: Show up at polling places on Election Day"
"Document suspicious voters and ask them for their papers"
"Don't let them steal our election"
Features images of confrontations at polling places
Targets users who have engaged with election fraud conspiracy content
--Policy A (Meta-based) - Likely Outcome: ❌ REJECTED
Reasoning: Violates prohibition on voter intimidation
"Coordinated calls to interfere with voting or election processes" is prohibited
Could be interpreted as encouraging confrontation at polls
Likely flagged as attempting to suppress voting through intimidation
Notes: This should be removed, but enforcement may be inconsistent
--Policy B (Google-based) - Likely Outcome: ❌ REJECTED
Reasoning: Explicitly violates prohibition on "content encouraging others to interfere with democratic processes"
Telling people to confront voters at polling locations is specifically called out as prohibited
"Calls to incite physical conflict... at polling locations to deter voting" is banned
Notes: Clear policy violation
Note: Key Insight: Both policies reject this. However, subtle variations (e.g., "observe" instead of "confront") might create grey areas.
Case 6 (False Victory): On election night, before votes are fully counted, a candidate runs ads stating:
"Victory! I've won the election!"
"Despite fake news media, the real numbers show I won"
"Don't believe the lying media when they say I lost"
Uses all available targeting to reach maximum audience
--Policy A (Meta-based) - Likely Outcome: ⚠️ LIKELY REJECTED (but historically inconsistent)
Reasoning: "Premature victory claims" before official certification are prohibited
However, claiming election fraud/stolen election is now allowed (fact-checking removed)
Ambiguous whether "real numbers show I won" crosses the line
Note: Controversy: Enforcement has been highly inconsistent
Similar content appeared in 2020 and 2024 elections
Policy was supposed to prevent this but failed in practice
--Policy B (Google-based) - Likely Outcome: ⚠️ UNCLEAR
Reasoning: Previously would have been removed
As of June 2023, false claims about election outcomes are no longer prohibited
However, "premature" claims might still violate voter suppression rules
Policy now prioritizes "free expression" over accuracy
Controversy: The 2023 policy rollback makes enforcement unclear
Note: Key Insight: Both policies have been weakened on election misinformation. What was once clearly prohibited is now in a grey area.
Case 7 (Ethnic Attack): In a local election with a large immigrant population, an ad states:
"[Ethnic Group] voters are destroying our neighborhood"
"They don't share our values and shouldn't have a say in our community"
Shows unflattering images of people from that ethnic group
Uses custom audience targeting to reach residents of specific neighborhoods
--Policy A (Meta-based) - Likely Outcome: ⚠️ AMBIGUOUS - Likely Rejected but Uncertain
Reasoning: Race and ethnicity are still protected characteristics under base policy
However, immigration status-based attacks are now allowed
Ambiguous whether this is "ethnic" discrimination (prohibited) or "immigration" discourse (allowed)
If framed as "immigrants" rather than ethnic group, might be approved
Enforcement would depend on exact wording
Note: Controversy: The line between ethnic discrimination and immigration discourse is blurry
Ads can be reworded to exploit this ambiguity
Custom targeting makes this especially harmful to specific communities
--Policy B (Google-based) - Likely Outcome: ⚠️ UNCLEAR
Reasoning: No explicit policy on ethnic/racial attacks in political ads
Not technically voter suppression
Not direct violence incitement
Likely would depend on human review
Controversy: Lack of clear policy creates inconsistent enforcement
Note: Key Insight: Both policies have ambiguities around ethnic attacks that aren't direct violence incitement. Clever wording can exploit these gaps.
Case 8 (Groomers): An ad attacking LGBTQ+ school board candidates:
"Stop the groomers from getting near our children"
Features photos of LGBTQ+ candidates with ominous music
Text: "They want to indoctrinate your kids"
Links LGBTQ+ identity with child predation
Targets parents using custom audience lists from school districts
--Policy A (Meta-based) - Likely Outcome: ⚠️ AMBIGUOUS (Likely Approved post-2025 changes)
Reasoning: Post-January 2025, allows content about LGBTQ+ individuals and their "fitness" for roles involving children
"Mental illness or abnormality" allegations allowed when based on sexual orientation
Might be approved as "political discourse"
However, "groomer" rhetoric has been linked to violence against LGBTQ+ people
Enforcement highly uncertain
Note: Controversy: "Groomer" is a slur falsely associating LGBTQ+ people with pedophilia
This rhetoric has preceded real-world attacks on LGBTQ+ individuals
Policy changes may have emboldened this type of content
--Policy B (Google-based) - Likely Outcome: ⚠️ UNCLEAR
Reasoning: No explicit policy on LGBTQ+ attacks in political context
Not technically violence incitement (though may inspire violence)
Would likely depend on individual review
Controversy: Dangerous rhetoric in policy grey area
Note: Key Insight: Policy A's 2025 changes may have opened the door to this type of content. Both policies struggle with indirect incitement to violence.
Case 9 (Microtargeting): An advertiser creates dozens of variations of an ad, each tailored to specific audiences:
To Latino voters: "Your opponent wants to deport your family"
To white rural voters: "Your opponent is giving your jobs to immigrants"
To Black voters: "Your opponent supports policies that target your community"
Each version uses custom audiences, voter file data, and interest targeting
No single message seen broadly; each group sees different (contradictory) claims
--Policy A (Meta-based) - Likely Outcome: ✅ APPROVED
Reasoning: Custom audience targeting explicitly allowed
False/misleading claims about opponents permitted
Each individual ad doesn't violate policy
Ability to show different messages to different groups is a feature, not a bug
Note: Controversy: Prevents public accountability (no one sees all messages)
Allows contradictory claims to different audiences
Makes fact-checking nearly impossible
Enables targeted manipulation
This is exactly what Cambridge Analytica did
--Policy B (Google-based) - Likely Outcome: ⚠️ PARTIALLY PREVENTED
Reasoning: Microtargeting is prohibited, limiting precision
Can still target different geographic regions or basic demographics
False claims allowed but less precisely targeted
More likely that contradictory messages would be noticed
Note: Controversy: Basic targeting still allows some audience segmentation with different messages
Key Insight: This case highlights the danger of microtargeting + lack of truthfulness requirements. Policy A enables this; Policy B partially prevents it.
Case 10 (Fraud Hotline): An ad encourages viewers to:
"Report suspected voter fraud to our hotline"
"If you see something suspicious at polls, document it"
Shows examples of "suspicious behavior" that are actually normal (people helping elderly voters, non-English speakers, etc.)
Provides a phone number
Doesn't explicitly call for confrontation
--Policy A (Meta-based) - Likely Outcome: ⚠️ AMBIGUOUS
Reasoning: Doesn't explicitly call for confrontation or intimidation
Could be framed as "election integrity" efforts
However, showing normal behavior as "suspicious" could suppress voting
"Coordinated calls to interfere" might apply
Enforcement would be inconsistent
Note: Controversy: Chilling effect on legitimate voters without explicit intimidation
--Policy B (Google-based) - Likely Outcome: ⚠️ AMBIGUOUS
Reasoning: Not technically "instructing" interference
Could be argued as voter education (though misleading)
Doesn't explicitly tell people to create long lines or confront voters
Grey area between observation and intimidation
Note: Controversy: Same as Policy A - subtle intimidation that doesn't explicitly violate policy
Key Insight: Both policies struggle with subtle forms of voter intimidation that don't explicitly call for confrontation.
Case 2 (Mental Illness): Policy A explicitly permits (post-2025); Policy B unclear
Case 9 (Microtargeting): Policy A enables; Policy B restricts
Case 4 (Immigration): Both approve but Policy A allows more precise targeting
--
Case 1 (Devil Horns): Both approve inflammatory demonization
Case 3 (AI Deepfake): Both allow with minimal disclosure
Case 6 (False Victory): Both weakened enforcement on election misinformation
Case 8 (Groomers): Both struggle with indirect violence incitement
--
Case 7 (Ethnic Attack): Depends on exact wording
Case 10 (Fraud Hotline): Subtle intimidation in grey area
Policy Effectiveness: Which cases show where truthfulness requirements would matter most?
Targeting vs. Content: Is it worse to have precise targeting of harmful content (Policy A) or broad distribution of harmful content (Policy B)?
Recent Changes: How do Policy A's January 2025 changes affect marginalized communities? What's the trade-off between "free speech" and safety?
Enforcement Gaps: Which cases reveal that written policies don't match actual enforcement?
Indirect Harm: How do policies handle content that doesn't directly incite violence but creates conditions for violence?
Microtargeting: Why is Case 9 particularly problematic? How does it undermine democratic discourse?
AI Content: Is disclosure sufficient for AI-generated content, or do we need stronger restrictions?
Protected Characteristics: Should political speech exceptions exist for attacks on protected groups? Why or why not?
✅ Classify cases according to platform policies
✅ Identify borderline cases with ambiguous determinations
✅ Diagnose how and why policies break down
✅ Critique automated content moderation on edge cases
✅ Propose measurement strategies for tracking failures
✅ Propose alterations to address borderline cases
Activity 1: Set It Up (LO1 - Classification)
Use Cases 1, 3, 5 (clear outcomes)
Students classify: Approve/Reject under each policy
Use clickers for real-time feedback
Activity 2: Think-Pair-Share (LO2 - Borderline Cases)
Use Cases 2, 7, 8, 10 (ambiguous)
Pairs discuss and justify their determinations
Compare reasoning
Activity 3: Contrasting Cases (LO3 - Policy Diagnosis)
Compare how Policies A and B handle Cases 2, 4, 9
Identify: ambiguity, binary classification issues, false positives/negatives
Small group discussion on different policy failures
These cases are inspired by actual ads that ran on Meta and Google platforms:
Case 1: Based on actual anti-Harris ads (2024)
Case 2: Based on anti-trans political ads (2023-2025)
Case 3: Common deepfake concern across platforms
Case 4: Common immigration ad rhetoric
Case 8: "Groomer" rhetoric seen in 2022-2024 school board races
Case 9: Cambridge Analytica-style tactics
Policy ≠ Enforcement: Written policies often fail in practice
Grey Areas: Most controversial content lives in ambiguous spaces
Harm Beyond Violence: Indirect harm is real but harder to regulate
Recent Backsliding: Both platforms weakened protections (2023-2025)
Targeting Amplifies Harm: Same content is worse with precise targeting
Course: CSPedagogy / Trust &amp; Safety<br>
Related: <a data-href="old/Trust &amp; Safety Class Old/Quizzes/Political Ad Policies" href=".html" class="internal-link" target="_self" rel="noopener nofollow">old/Trust &amp; Safety Class Old/Quizzes/Political Ad Policies</a>, <a data-href="Active Learning Resources" href=".html" class="internal-link" target="_self" rel="noopener nofollow">Active Learning Resources</a>]]></description><link>quizzes/assessing-t&amp;s-policies.html</link><guid isPermaLink="false">Quizzes/Assessing T&amp;S Policies.md</guid><pubDate>Mon, 27 Apr 2026 19:47:04 GMT</pubDate></item><item><title><![CDATA[Political Ad Policies]]></title><description><![CDATA[This document defines two platform policies for political advertising, based on real-world approaches from major social media platforms. These policies will be used for active learning exercises on content moderation and trust &amp; safety.
Advertiser Verification: All political advertisers must complete identity verification, providing government-issued ID and proof of location
Disclosure Requirements: All political ads must include a "Paid for by __ " disclaimer
Ad Library: All political ads stored in publicly accessible archive showing:
Ad content
Who paid for the ad
Amount spent
Targeting parameters used
Custom Audiences: Advertisers may upload their own customer lists to target specific individuals
Lookalike Audiences: Allowed - can target users similar to existing supporters
Geographic Targeting: Full geographic targeting available (city, state, region)
Demographic Targeting: Age, gender, and basic demographic targeting permitted
Interest-Based Targeting: Limited - advertisers may target based on general interests but NOT based on specific political, religious, or health-related content users have accessed on the platform
Exclusions: NOT allowed - advertisers cannot exclude specific groups or audiences with opposing interests (as of January 2025)
Negative campaign ads criticizing opponents' policies or record
False or misleading claims about political opponents
Dramatic or stylized imagery (e.g., apocalyptic scenes, unflattering photo manipulation)
AI-generated content IF DISCLOSED - must check box indicating synthetic/digitally altered content that depicts:
A person saying/doing something they didn't do
Realistic-looking people or events that don't exist
Altered footage of real events
Voter Suppression: False information about where, when, or how to vote
False Eligibility Claims: Misleading information about who can vote
Premature Victory Claims: Calling election results before official certification
Direct Violence Incitement: Content that encourages violence against:
Election workers
Candidates
Voters
Any individuals at polling locations
Dangerous Organizations: Glorification or support of designated terrorist organizations or hate groups
Voter Intimidation: Coordinated calls to interfere with voting or election processes
Protected Characteristic Exceptions: While general attacks on protected characteristics are prohibited, the following ARE ALLOWED when based on religious or political beliefs:
Allegations that LGBTQ+ individuals are "mentally ill" or "abnormal"
Arguments for gender-based limitations in military, law enforcement, and teaching positions
Arguments for sexual orientation-based limitations in the same professions when based on religious beliefs
Claims about immigrants' character or fitness for participation in society
Automated systems screen ads before publication
Community reporting available
Human review for flagged content
Non-compliance results in ad disapproval and potential account suspension
Advertiser Verification: All political advertisers must complete Election Ads verification process
Disclosure Requirements: All political ads must include in-ad "Paid for by [Name]" disclosure
Visual ads: Disclosure must be visible at all times and sufficiently large for average viewer
Audio ads: Disclosure must be similar in pitch, tone, and speed to rest of ad
Transparency Report: All election ads published in Political Advertising Transparency Report with:
Ad content
Who paid for the ad
Amount spent
Targeting parameters (limited)
Custom Audiences: NOT allowed for granular political targeting
Microtargeting: Explicitly PROHIBITED - never allowed
Basic Political Targeting: Only the following permitted:
Public voter records
General political affiliations (left-leaning, right-leaning, independent)
Geographic Targeting: Allowed (but limited in precision)
Search-Based Targeting: Ads may appear in response to user search queries
Interest-Based Targeting: Very limited - only broad categories, no granular interests
Negative campaign ads criticizing opponents' policies or record
False or misleading claims about political opponents' positions or record
Search ads responding to political queries
Display ads on partner websites
Video ads on platform
Voter Suppression:
False information about voting methods (e.g., "text your vote to this number")
Made-up voter eligibility requirements
Misleading information about where, when, or how to vote
False Candidate Eligibility Claims:
False claims that candidates are deceased
False claims about age or citizenship eligibility
Interference with Democratic Processes:
Instructions to create long voting lines to deter others
Instructions to hack government websites
Calls to incite physical conflict at polling locations
Manipulated Content Creating Serious Risk of Harm:
Technically manipulated content making government officials appear to say/do things they didn't
Old footage falsely presented as current events
Fabricated events creating serious risk of egregious harm
Direct Violence Incitement: Content encouraging violent acts against:
Election workers
Candidates
Voters
Synthetic Content (must be disclosed):
AI-generated or digitally altered content depicting people saying/doing things they didn't do
Synthetic content creating realistic portrayals of events that didn't happen
False claims about election outcomes (e.g., "the 2020 election was stolen")
General election misinformation that doesn't directly suppress votes
Automated screening before ad approval
Human review for verification process
Ads must comply with all policies to run
Violations result in ad disapproval
Repeated violations may result in loss of verification status
Which policy provides more protection against targeted manipulation of voters?
Which policy is more permissive regarding hate speech in political contexts?
How do the targeting restrictions in Policy B affect the ability of smaller campaigns to reach specific audiences?
What are the trade-offs between free political speech and preventing harm under each policy?
How might enforcement challenges differ between these two policies?
Which policy better addresses the risk of violence incitement, and why?
These policies are simplified versions based on:
Policy A: Meta's U.S. political ads policy (as of 2025, post-January policy changes)
Policy B: Google/YouTube's political ads policy (as of 2025)
Key sources:
Meta Transparency Center: Political Advertising policies
Google Ads Policy Help: Political Content policy
Documented enforcement challenges and policy changes (2023-2025)
Both platforms banned political ads in the EU (October 2025) in response to the TTPA regulation.
Created: 2025-10-22
Last Updated: 2025-10-22
Course: CSPedagogy / Trust &amp; Safety]]></description><link>quizzes/political-ad-policies.html</link><guid isPermaLink="false">Quizzes/Political Ad Policies.md</guid><pubDate>Mon, 27 Apr 2026 19:44:54 GMT</pubDate></item><item><title><![CDATA[Pitfalls of Binary Classification]]></title><description><![CDATA[Policies: <a data-href="Trust &amp; Safety Class/Quizzes/Political Ad Policies" href=".html" class="internal-link" target="_self" rel="noopener nofollow">Trust &amp; Safety Class/Quizzes/Political Ad Policies</a><br>
Cases: <a data-href="Trust &amp; Safety Class/Quizzes/Assessing T&amp;S Policies" href=".html" class="internal-link" target="_self" rel="noopener nofollow">Trust &amp; Safety Class/Quizzes/Assessing T&amp;S Policies</a>
Classify easy examples of online harms according to a platform policy
Identify borderline cases where the determination of harm is ambiguous
Diagnose how and why policies begin to break down on these examples
(System) Criticize how automated content moderation mechanisms handle such cases
(System) Propose measurement strategies to track such failures
(System) Propose alterations to content moderation mechanisms to account for the identified borderline cases
Learning Objective #1: Enforcing platform policies
Strategy Name: "Set It Up" (problem solving process)
Description: consider a list of 3 example cases against a reference policy (given) to determine which is an obvious violation and which is not a violation
Expected Outcomes: Platform policies are understood as a process.
Assessment Method: Clickers
Learning Objective #2: Identifying edge cases
Strategy Name: think-pair-share
Description: discuss and action a series of borderline cases using the same policy
Expected Outcomes: Determinations on borderline cases improve through discussion
Assessment Method: Clickers
Learning Objective #3: Diagnosing policy issues
Strategy Name: contrasting cases
Description: given 2 example policies for the same issue, determine where and why each policy is problematic for the previous examples (ambiguity in policy, binary classification where multiple thresholds are needed, false positives, false negatives,...)
Expected Outcomes: each small group may discuss a different set of issues
Assessment Method: Discussion
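Sketch for the "binary classification where multiple thresholds are needed" issue named in Learning Objective #3. This is a minimal illustration, not part of either policy: the scores, labels, and cutoffs below are hypothetical, and the function names are made up for this example. It counts false positives and false negatives for a single remove/allow cutoff, then for a two-threshold scheme that routes the ambiguous middle band to human review.

```python
# Hypothetical (score, is_violation) pairs. score is an assumed classifier
# estimate that an ad violates policy; is_violation is an assumed human label.
CASES = [
    (0.95, True), (0.80, True), (0.60, False), (0.55, True),
    (0.45, False), (0.40, True), (0.20, False), (0.05, False),
]


def binary_errors(cases, threshold):
    """Count (false positives, false negatives) for one remove/allow cutoff."""
    fp = sum(1 for score, violation in cases if score >= threshold and not violation)
    fn = sum(1 for score, violation in cases if score < threshold and violation)
    return fp, fn


def tiered_outcome(cases, allow_below, remove_at):
    """Two thresholds: auto-allow below one, auto-remove at or above the other,
    and send everything in between to human review.
    Returns (false positives, false negatives, items sent to review)."""
    fp = sum(1 for score, violation in cases if score >= remove_at and not violation)
    fn = sum(1 for score, violation in cases if score < allow_below and violation)
    review = sum(1 for score, _ in cases if allow_below <= score < remove_at)
    return fp, fn, review


if __name__ == "__main__":
    # Moving a single cutoff only trades one error type for the other.
    for threshold in (0.3, 0.5, 0.7):
        print(threshold, binary_errors(CASES, threshold))
    # Two cutoffs remove the automated errors on this toy data,
    # at the cost of a human-review queue.
    print(tiered_outcome(CASES, allow_below=0.3, remove_at=0.7))
```

On this toy data no single cutoff avoids both error types, while the tiered scheme does so only by sending half of the cases to review, which is the trade-off small groups can weigh when contrasting the two policies.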
Active learning for system-based questions is left to the assignment.
Online Trust &amp; Safety tackles a range of online harms. To decide which to cover, use Dotmocracy.
Trust &amp; Safety is about people! Cases presented in class could simply be shown as screenshots in a slide deck, but there is an opportunity for more interaction.
Fishbowl method: like charades, have a small number of students act out certain cases while the rest of the class makes the determination based on a platform policy. Extremely risky: the topics, cases, and platform policies chosen for this must be selected very carefully. Students must be able to choose another case or decline to participate.
The lines drawn by platform policies are subject to intense decision-making processes (e.g. decisions to take down President Trump's social media accounts during the Jan 6th Capitol riot).
Structured debates on safety (public harm) vs censorship (freedom of speech)
Question: how do the decisions made by different platforms reflect the tradeoff between safety and censorship?
What active learning strategies should be used for homework?
How much time to set aside for this in practice?]]></description><link>quizzes/pitfalls-of-binary-classification.html</link><guid isPermaLink="false">Quizzes/Pitfalls of Binary Classification.md</guid><pubDate>Mon, 27 Apr 2026 19:44:03 GMT</pubDate></item><item><title><![CDATA[02. sna_animal_network_robustness]]></title><link>advanced-topics/social-network-analysis/slides/02.-sna_animal_network_robustness.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/02. sna_animal_network_robustness.pdf</guid><pubDate>Mon, 27 Apr 2026 18:41:28 GMT</pubDate></item><item><title><![CDATA[03. sna_community_detection]]></title><link>advanced-topics/social-network-analysis/slides/03.-sna_community_detection.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/03. sna_community_detection.pdf</guid><pubDate>Mon, 27 Apr 2026 18:39:35 GMT</pubDate></item><item><title><![CDATA[04. sna_stance_detection]]></title><link>advanced-topics/social-network-analysis/slides/04.-sna_stance_detection.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/04. sna_stance_detection.pdf</guid><pubDate>Mon, 27 Apr 2026 18:38:45 GMT</pubDate></item><item><title><![CDATA[05. sna_information_operations]]></title><link>advanced-topics/social-network-analysis/slides/05.-sna_information_operations.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/05. sna_information_operations.pdf</guid><pubDate>Mon, 27 Apr 2026 18:38:28 GMT</pubDate></item><item><title><![CDATA[06. sna_information_operations_detection]]></title><link>advanced-topics/social-network-analysis/slides/06.-sna_information_operations_detection.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/06. sna_information_operations_detection.pdf</guid><pubDate>Mon, 27 Apr 2026 18:37:42 GMT</pubDate></item><item><title><![CDATA[07. 
sna_social_influence_modeling]]></title><link>advanced-topics/social-network-analysis/slides/07.-sna_social_influence_modeling.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/07. sna_social_influence_modeling.pdf</guid><pubDate>Mon, 27 Apr 2026 18:36:43 GMT</pubDate></item><item><title><![CDATA[08. sna_population_modeling]]></title><link>advanced-topics/social-network-analysis/slides/08.-sna_population_modeling.html</link><guid isPermaLink="false">Advanced Topics/Social Network Analysis/Slides/08. sna_population_modeling.pdf</guid><pubDate>Mon, 27 Apr 2026 18:35:56 GMT</pubDate></item><item><title><![CDATA[07. Media Influences]]></title><link>advanced-topics/misinformation/slides/07.-media-influences.html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/07. Media Influences.pdf</guid><pubDate>Mon, 27 Apr 2026 18:34:45 GMT</pubDate></item><item><title><![CDATA[04. Detection and Discovery of Misinformation Sources]]></title><link>advanced-topics/misinformation/slides/04.-detection-and-discovery-of-misinformation-sources.html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/04. Detection and Discovery of Misinformation Sources.pdf</guid><pubDate>Mon, 27 Apr 2026 18:33:54 GMT</pubDate></item><item><title><![CDATA[06. Credibility Pluralism Tradeoff]]></title><link>advanced-topics/misinformation/slides/06.-credibility-pluralism-tradeoff.html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/06. Credibility Pluralism Tradeoff.pdf</guid><pubDate>Mon, 27 Apr 2026 18:29:53 GMT</pubDate></item><item><title><![CDATA[05. Misinformation Resilient Search Rankings]]></title><link>advanced-topics/misinformation/slides/05.-misinformation-resilient-search-rankings.html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/05. Misinformation Resilient Search Rankings.pdf</guid><pubDate>Mon, 27 Apr 2026 18:29:42 GMT</pubDate></item><item><title><![CDATA[04. 
Adversarial Attacks on IR Systems]]></title><link>advanced-topics/adversarial-retrieval-and-llms/slides/04.-adversarial-attacks-on-ir-systems.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Slides/04. Adversarial Attacks on IR Systems.pdf</guid><pubDate>Mon, 27 Apr 2026 18:27:53 GMT</pubDate></item><item><title><![CDATA[01. Memorization, Generalization, and Specialization in LLMs]]></title><link>advanced-topics/adversarial-retrieval-and-llms/slides/01.-memorization,-generalization,-and-specialization-in-llms.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Slides/01. Memorization, Generalization, and Specialization in LLMs.pdf</guid><pubDate>Mon, 27 Apr 2026 18:27:42 GMT</pubDate></item><item><title><![CDATA[02. LLM Hallucinations and Knowledge Conflicts]]></title><link>advanced-topics/adversarial-retrieval-and-llms/slides/02.-llm-hallucinations-and-knowledge-conflicts.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Slides/02. LLM Hallucinations and Knowledge Conflicts.pdf</guid><pubDate>Mon, 27 Apr 2026 18:27:08 GMT</pubDate></item><item><title><![CDATA[03. Adversarial Adaptation In Information Systems]]></title><link>advanced-topics/adversarial-retrieval-and-llms/slides/03.-adversarial-adaptation-in-information-systems.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Slides/03. Adversarial Adaptation In Information Systems.pdf</guid><pubDate>Mon, 27 Apr 2026 18:26:40 GMT</pubDate></item><item><title><![CDATA[06. Authentication, Identity, and Platform Manipulation]]></title><link>lessons/slides/06.-authentication,-identity,-and-platform-manipulation.html</link><guid isPermaLink="false">Lessons/Slides/06. Authentication, Identity, and Platform Manipulation.pdf</guid><pubDate>Mon, 20 Apr 2026 01:26:12 GMT</pubDate></item><item><title><![CDATA[12. 
Types of Attack Surfaces]]></title><link>lessons/slides/12.-types-of-attack-surfaces.html</link><guid isPermaLink="false">Lessons/Slides/12. Types of Attack Surfaces.pdf</guid><pubDate>Mon, 20 Apr 2026 01:20:44 GMT</pubDate></item><item><title><![CDATA[15. Emerging_Topics 3 - LLM Hallucinations and Knowledge Conflicts]]></title><link>lessons/slides/15.-emerging_topics-3-llm-hallucinations-and-knowledge-conflicts.html</link><guid isPermaLink="false">Lessons/Slides/15. Emerging_Topics 3 - LLM Hallucinations and Knowledge Conflicts.pdf</guid><pubDate>Sun, 19 Apr 2026 15:33:00 GMT</pubDate></item><item><title><![CDATA[14. Emerging_Topics 2 - Adversarial Retrieval]]></title><link>lessons/slides/14.-emerging_topics-2-adversarial-retrieval.html</link><guid isPermaLink="false">Lessons/Slides/14. Emerging_Topics 2 - Adversarial Retrieval.pdf</guid><pubDate>Sun, 19 Apr 2026 15:31:52 GMT</pubDate></item><item><title><![CDATA[12. Adversarial Adaptation and the Limitations of Interventions]]></title><link>lessons/slides/12.-adversarial-adaptation-and-the-limitations-of-interventions.html</link><guid isPermaLink="false">Lessons/Slides/12. Adversarial Adaptation and the Limitations of Interventions.pdf</guid><pubDate>Sun, 19 Apr 2026 15:30:50 GMT</pubDate></item><item><title><![CDATA[10. Source Credibility]]></title><link>lessons/slides/10.-source-credibility.html</link><guid isPermaLink="false">Lessons/Slides/10. Source Credibility.pdf</guid><pubDate>Sun, 19 Apr 2026 15:28:51 GMT</pubDate></item><item><title><![CDATA[11. Intervention Effectiveness Case Study - Misinformation and Search Rankings]]></title><link>lessons/slides/11.-intervention-effectiveness-case-study-misinformation-and-search-rankings.html</link><guid isPermaLink="false">Lessons/Slides/11. 
Intervention Effectiveness Case Study - Misinformation and Search Rankings.pdf</guid><pubDate>Sun, 19 Apr 2026 15:20:05 GMT</pubDate></item><item><title><![CDATA[Pasted image 20260419104128]]></title><description><![CDATA[<img src="assets/pasted-image-20260419104128.png" target="_self">]]></description><link>assets/pasted-image-20260419104128.html</link><guid isPermaLink="false">Assets/Pasted image 20260419104128.png</guid><pubDate>Sun, 19 Apr 2026 14:41:28 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419104103]]></title><description><![CDATA[<img src="assets/pasted-image-20260419104103.png" target="_self">]]></description><link>assets/pasted-image-20260419104103.html</link><guid isPermaLink="false">Assets/Pasted image 20260419104103.png</guid><pubDate>Sun, 19 Apr 2026 14:41:03 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419104034]]></title><description><![CDATA[<img src="assets/pasted-image-20260419104034.png" target="_self">]]></description><link>assets/pasted-image-20260419104034.html</link><guid isPermaLink="false">Assets/Pasted image 20260419104034.png</guid><pubDate>Sun, 19 Apr 2026 14:40:34 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419103952]]></title><description><![CDATA[<img src="assets/pasted-image-20260419103952.png" target="_self">]]></description><link>assets/pasted-image-20260419103952.html</link><guid isPermaLink="false">Assets/Pasted image 20260419103952.png</guid><pubDate>Sun, 19 Apr 2026 14:39:52 GMT</pubDate><enclosure url="." 
length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419103918]]></title><description><![CDATA[<img src="assets/pasted-image-20260419103918.png" target="_self">]]></description><link>assets/pasted-image-20260419103918.html</link><guid isPermaLink="false">Assets/Pasted image 20260419103918.png</guid><pubDate>Sun, 19 Apr 2026 14:39:18 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419103845]]></title><description><![CDATA[<img src="assets/pasted-image-20260419103845.png" target="_self">]]></description><link>assets/pasted-image-20260419103845.html</link><guid isPermaLink="false">Assets/Pasted image 20260419103845.png</guid><pubDate>Sun, 19 Apr 2026 14:38:45 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419103820]]></title><description><![CDATA[<img src="assets/pasted-image-20260419103820.png" target="_self">]]></description><link>assets/pasted-image-20260419103820.html</link><guid isPermaLink="false">Assets/Pasted image 20260419103820.png</guid><pubDate>Sun, 19 Apr 2026 14:38:20 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260419103604]]></title><description><![CDATA[<img src="assets/pasted-image-20260419103604.png" target="_self">]]></description><link>assets/pasted-image-20260419103604.html</link><guid isPermaLink="false">Assets/Pasted image 20260419103604.png</guid><pubDate>Sun, 19 Apr 2026 14:36:04 GMT</pubDate><enclosure url="." 
length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[LLM background literature]]></title><description><![CDATA[CSCE 689 LLMs:: Course Readings as shared by Maria Teleki
Parameter-Efficient Tuning, Compression
Readings: <a data-tooltip-position="top" aria-label="https://openreview.net/forum?id=nZeVKeeFYf9" rel="noopener nofollow" class="external-link is-unresolved" href="https://openreview.net/forum?id=nZeVKeeFYf9" target="_self">LoRA: Low-Rank Adaptation of Large Language Models</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.14314" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.14314" target="_self">QLoRA: Efficient Finetuning of Quantized LLMs</a> Optional: <br><a data-tooltip-position="top" aria-label="https://openreview.net/forum?id=K30wTdIIYc" rel="noopener nofollow" class="external-link is-unresolved" href="https://openreview.net/forum?id=K30wTdIIYc" target="_self"><em></em></a>Controlling Text-to-Image Diffusion by Orthogonal Finetuning <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.17764" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.17764" target="_self"><em></em></a>The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.09353" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.09353" target="_self"><em></em></a>DoRA: Weight-Decomposed Low-Rank Adaptation Efficient inference
Readings: <br><a data-tooltip-position="top" aria-label="https://lmsys.org/blog/2023-11-21-lookahead-decoding/" rel="noopener nofollow" class="external-link is-unresolved" href="https://lmsys.org/blog/2023-11-21-lookahead-decoding/" target="_self">Break the Sequential Dependency of LLM Inference Using Lookahead Decoding</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2211.17192" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2211.17192" target="_self">Fast Inference from Transformers via Speculative Decoding</a> Optional: <br><a data-tooltip-position="top" aria-label="https://pytorch.org/blog/flash-decoding/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pytorch.org/blog/flash-decoding/" target="_self"><em></em></a>Flash-Decoding for long-context inference Some of Andrej Karpathy’s github repos <br><a data-tooltip-position="top" aria-label="https://github.com/karpathy/nanoGPT" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/karpathy/nanoGPT" target="_self"><em></em></a>https://github.com/karpathy/nanoGPT <br><a data-tooltip-position="top" aria-label="https://github.com/karpathy/llm.c" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/karpathy/llm.c" target="_self"><em></em></a>https://github.com/karpathy/llm.c Some of Georgi Gerganov’s github repos <br><a data-tooltip-position="top" aria-label="https://github.com/ggerganov/llama.cpp" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/ggerganov/llama.cpp" target="_self"><em></em></a>https://github.com/ggerganov/llama.cpp <br><a data-tooltip-position="top" aria-label="https://github.com/ggerganov/ggml" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/ggerganov/ggml" target="_self"><em></em></a>https://github.com/ggerganov/ggml Model distillation
Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.02301" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.02301" target="_self">Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2407.06023" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2407.06023" target="_self">Distilling System 2 into System 1</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.16635" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.16635" target="_self"><em></em></a>Impossible Distillation: from Low-Quality Model to High-Quality Dataset &amp; Model for Summarization and Paraphrasing <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.17888" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.17888" target="_self"><em></em></a>LLM-QAT: Data-Free Quantization Aware Training for Large Language Models <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.13116" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.13116" target="_self"><em></em></a>A Survey on Knowledge Distillation of Large Language Models <br><a data-tooltip-position="top" aria-label="https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs" target="_self"><em></em></a>https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs Data Efficiency in the Age of LLMs
References:
Sorscher, Ben, et al. "Beyond neural scaling laws: beating power law scaling via data pruning." NeurIPS (2022).
Abbas, Amro, et al. "Semdedup: Data-efficient learning at web-scale through semantic deduplication." arXiv preprint arXiv:2303.09540 (2023).
Sachdeva, Noveen, et al. "How to Train Data-Efficient LLMs." arXiv preprint arXiv:2402.09668 (2024).
Marion, Max, et al. "When less is more: Investigating data pruning for pretraining llms at scale." arXiv preprint arXiv:2309.04564 (2023).
Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
Xie, Sang Michael, et al. "Data selection for language models via importance resampling." NeurIPS (2023).
Engstrom, Logan, Axel Feldmann, and Aleksander Madry. "Dsdm: Model-aware dataset selection with datamodels." arXiv preprint arXiv:2401.12926 (2024).
Fadhel, et al. "Data pruning and neural scaling laws: fundamental limitations of score-based algorithms." TMLR (2023).
Guo, et al. "Deepcore: A comprehensive library for coreset selection in deep learning." arXiv preprint arXiv:2204.08499 (2022).
Tools, Agents, and MoE
Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2403.15452" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2403.15452" target="_self">What Are Tools Anyway? A Survey from the Language Model Perspective</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2302.04761" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2302.04761" target="_self">Toolformer: Language Models Can Teach Themselves to Use Tools</a> <br>ReAct: Synergizing Reasoning and Acting in Language Models <a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2210.03629" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2210.03629" target="_self">https://arxiv.org/abs/2210.03629</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2406.04692" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2406.04692" target="_self">Mixture-of-Agents Enhances Large Language Model Capabilities</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2101.03961" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2101.03961" target="_self">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a> Optional: <br>Lilian’s Blog on <a data-tooltip-position="top" aria-label="https://lilianweng.github.io/posts/2023-06-23-agent/" rel="noopener nofollow" class="external-link is-unresolved" href="https://lilianweng.github.io/posts/2023-06-23-agent/" target="_self">LLM Powered Autonomous Agents</a> <br><a data-tooltip-position="top" aria-label="https://openaccess.thecvf.com/content/CVPR2023/html/Gupta_Visual_Programming_Compositional_Visual_Reasoning_Without_Training_CVPR_2023_paper.html" rel="noopener nofollow" class="external-link is-unresolved" 
href="https://openaccess.thecvf.com/content/CVPR2023/html/Gupta_Visual_Programming_Compositional_Visual_Reasoning_Without_Training_CVPR_2023_paper.html" target="_self"><em></em></a>Visual Programming: Compositional Visual Reasoning Without Training <br>Great collection of papers on tools: <a rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/zorazrw/awesome-tool-llm" target="_self">https://github.com/zorazrw/awesome-tool-llm</a> <br><a data-tooltip-position="top" aria-label="https://openreview.net/forum?id=dHng2O0Jjr" rel="noopener nofollow" class="external-link is-unresolved" href="https://openreview.net/forum?id=dHng2O0Jjr" target="_self"><em></em></a>ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Long context, extending context
Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/1901.02860" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/1901.02860" target="_self">Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2404.07143" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2404.07143" target="_self">Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2306.15595" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2306.15595" target="_self"><em></em></a>Extending Context Window of Large Language Models via Positional Interpolation <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2401.01325" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2401.01325" target="_self"><em></em></a>LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2404.09173" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2404.09173" target="_self"><em></em></a>TransformerFAM: Feedback attention is working memory Scaling Laws
Readings: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2206.07682" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2206.07682" target="_self">Emergent Abilities of Large Language Models</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2211.02011" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2211.02011" target="_self">Inverse scaling can become U-shaped</a> Optional: <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2001.08361" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2001.08361" target="_self"><em></em></a>Scaling Laws for Neural Language Models <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2102.01293" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2102.01293" target="_self"><em></em></a>Scaling Laws for Transfer <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2203.15556" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2203.15556" target="_self"><em></em></a>Training Compute-Optimal Large Language Models <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2304.15004" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2304.15004" target="_self"><em></em></a>Are Emergent Abilities of Large Language Models a Mirage? <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.16264" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.16264" target="_self"><em></em></a>Scaling Data-Constrained Language Models Self-play
<br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2401.01335" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2401.01335" target="_self">Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2401.10020" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2401.10020" target="_self">Self-Rewarding Language Models</a>
LLM Applications: Text mining, user modeling, …
<br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2403.12173" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2403.12173" target="_self">TnT-LLM: Text Mining at Scale with Large Language Models</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2405.16363" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2405.16363" target="_self">LLMs for User Interest Exploration in Large-scale Recommendation Systems</a> Optional: <br><a data-tooltip-position="top" aria-label="https://github.com/tencent-ailab/persona-hub" rel="noopener nofollow" class="external-link is-unresolved" href="https://github.com/tencent-ailab/persona-hub" target="_self">Scaling Synthetic Data Creation with 1,000,000,000 Personas</a>
VLM Part 2
<br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2408.11039" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2408.11039" target="_self">Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2408.12528" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2408.12528" target="_self">Show-o: One Single Transformer to Unify Multimodal Understanding and Generation</a>
Model development
Transformers and New Directions (Linear Attention, Linear RNNs, State Space Models)
<br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2006.16236" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2006.16236" target="_self">Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2004.05150" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2004.05150" target="_self">Longformer: The Long-Document Transformer</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/1904.10509" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/1904.10509" target="_self">Generating Long Sequences with Sparse Transformers</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2006.04768" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2006.04768" target="_self">Linformer: Self-Attention with Linear Complexity</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2111.00396" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2111.00396" target="_self">Efficiently Modeling Long Sequences with Structured State Spaces</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2312.00752" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2312.00752" target="_self">Mamba: Linear-Time Sequence Modeling with Selective State Spaces</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.13048" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.13048" target="_self">RWKV: Reinventing RNNs for the Transformer Era</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2402.19427" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2402.19427" 
target="_self">Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models</a>
Bias
<br><a data-tooltip-position="top" aria-label="https://aclanthology.org/2020.acl-main.485.pdf" rel="noopener nofollow" class="external-link is-unresolved" href="https://aclanthology.org/2020.acl-main.485.pdf" target="_self">Language (Technology) is Power: A Critical Survey of “Bias” in NLP</a> <br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/1608.07187" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/1608.07187" target="_self">Semantics derived automatically from language corpora contain human-like biases</a> <br><a data-tooltip-position="top" aria-label="https://aclanthology.org/2021.acl-long.416/" rel="noopener nofollow" class="external-link is-unresolved" href="https://aclanthology.org/2021.acl-long.416/" target="_self">StereoSet: Measuring stereotypical bias in pretrained language models</a> <br><a data-tooltip-position="top" aria-label="https://aclanthology.org/2020.emnlp-main.154/" rel="noopener nofollow" class="external-link is-unresolved" href="https://aclanthology.org/2020.emnlp-main.154/" target="_self">CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models</a>
Diffusion Models
<br><a data-tooltip-position="top" aria-label="https://arxiv.org/abs/2305.13655" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/abs/2305.13655" target="_self">LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models</a>
Miscellaneous
<br><a data-tooltip-position="top" aria-label="https://ojs.aaai.org/index.php/AAAI/article/view/21390" rel="noopener nofollow" class="external-link is-unresolved" href="https://ojs.aaai.org/index.php/AAAI/article/view/21390" target="_self">Chess as a Testbed for Language Model State Tracking | Proceedings of the AAAI Conference on Artificial Intelligence</a> Topic: How well can an LLM learn Conway’s Game of Life and be prompted to solve different tasks? For example: given some N×N space, how well can it maximize still life, or, given the opportunity to modify the state of the system, minimize entropy while maximizing stable life? This could test how well LLMs can reason with very simple rules and how far ahead they can predict in a highly chaotic system. These kinds of questions are typically left to mathematicians and very fast computers with lots of RAM.
]]></description><link>advanced-topics/adversarial-retrieval-and-llms/llm-background-literature.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/LLM background literature.md</guid><pubDate>Sat, 18 Apr 2026 19:33:44 GMT</pubDate></item><item><title><![CDATA[13. Emerging_Topics 1 - AI in Trust and Safety]]></title><link>lessons/slides/13.-emerging_topics-1-ai-in-trust-and-safety.html</link><guid isPermaLink="false">Lessons/Slides/13. Emerging_Topics 1 - AI in Trust and Safety.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:32 GMT</pubDate></item><item><title><![CDATA[06. Harassment_and_Hate_Speech]]></title><link>lessons/slides/06.-harassment_and_hate_speech.html</link><guid isPermaLink="false">Lessons/Slides/06. Harassment_and_Hate_Speech.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:28 GMT</pubDate></item><item><title><![CDATA[05. Terrorism_Radicalization_and_Extremism]]></title><link>lessons/slides/05.-terrorism_radicalization_and_extremism.html</link><guid isPermaLink="false">Lessons/Slides/05. Terrorism_Radicalization_and_Extremism.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:25 GMT</pubDate></item><item><title><![CDATA[01. Define Misinfo (Consortium Information Environment)]]></title><link>advanced-topics/misinformation/slides/01.-define-misinfo-(consortium-information-environment).html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/01. Define Misinfo (Consortium Information Environment).pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:21 GMT</pubDate></item><item><title><![CDATA[08. Misinformation (information environment)]]></title><link>lessons/slides/08.-misinformation-(information-environment).html</link><guid isPermaLink="false">Lessons/Slides/08. Misinformation (information environment).pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:21 GMT</pubDate></item><item><title><![CDATA[03. Metrics_and_Measurement]]></title><link>lessons/slides/03.-metrics_and_measurement.html</link><guid isPermaLink="false">Lessons/Slides/03. 
Metrics_and_Measurement.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:17 GMT</pubDate></item><item><title><![CDATA[02. Content Moderation Overview (Consortium)]]></title><link>advanced-topics/misinformation/slides/02.-content-moderation-overview-(consortium).html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/02. Content Moderation Overview (Consortium).pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:14 GMT</pubDate></item><item><title><![CDATA[04. Content_Moderation]]></title><link>lessons/slides/04.-content_moderation.html</link><guid isPermaLink="false">Lessons/Slides/04. Content_Moderation.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:14 GMT</pubDate></item><item><title><![CDATA[07. Government_Regulation]]></title><link>lessons/slides/07.-government_regulation.html</link><guid isPermaLink="false">Lessons/Slides/07. Government_Regulation.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:10 GMT</pubDate></item><item><title><![CDATA[01. Introduction_to_Trust_and_Safety]]></title><link>lessons/slides/01.-introduction_to_trust_and_safety.html</link><guid isPermaLink="false">Lessons/Slides/01. Introduction_to_Trust_and_Safety.pdf</guid><pubDate>Sat, 18 Apr 2026 16:02:06 GMT</pubDate></item><item><title><![CDATA[02. Large Scale Trust & Safety Systems]]></title><link>lessons/slides/02.-large-scale-trust-&amp;-safety-systems.html</link><guid isPermaLink="false">Lessons/Slides/02. Large Scale Trust &amp; Safety Systems.pdf</guid><pubDate>Wed, 15 Apr 2026 18:37:31 GMT</pubDate></item><item><title><![CDATA[Pasted image 20260415142538]]></title><description><![CDATA[<img src="assets/pasted-image-20260415142538.png" target="_self">]]></description><link>assets/pasted-image-20260415142538.html</link><guid isPermaLink="false">Assets/Pasted image 20260415142538.png</guid><pubDate>Wed, 15 Apr 2026 18:25:38 GMT</pubDate><enclosure url="." 
length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20260415142449]]></title><description><![CDATA[<img src="assets/pasted-image-20260415142449.png" target="_self">]]></description><link>assets/pasted-image-20260415142449.html</link><guid isPermaLink="false">Assets/Pasted image 20260415142449.png</guid><pubDate>Wed, 15 Apr 2026 18:24:49 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[How do I make a good classifier]]></title><description><![CDATA[Let’s do an example with binary classification.
Collect data (raw samples).
  Tabular: pandas
Annotate data (multiple annotators → compute IRR to check reliability). Annotators are the humans labeling your data (e.g., deciding whether an instance is positive or negative). Since humans can disagree, IRR (inter-rater reliability) measures how consistently annotators label the same items. Common metrics:
  Cohen’s Kappa → for two annotators, adjusts for agreement by chance. → sklearn.metrics.cohen_kappa_score
  Fleiss’ Kappa → for multiple annotators. → statsmodels.stats.inter_rater
  Krippendorff’s Alpha → general, supports missing labels and different data types. → <a data-tooltip-position="top" aria-label="https://pypi.org/project/krippendorff/" rel="noopener nofollow" class="external-link is-unresolved" href="https://pypi.org/project/krippendorff/" target="_self"><code>krippendorff</code></a>
  High IRR means your labels are reliable and can be trusted for training a classifier.
Preprocess (tokenization, normalization, feature engineering, embeddings).
  Text processing: nltk, spaCy
  Basic vectorization: sklearn.feature_extraction.text (CountVectorizer, TfidfVectorizer).
  Deep embeddings: transformers.
Split into train/validation/test sets.
  Tabular: pandas
Handle class imbalance (only on the training data; use validation to tune hyperparams):
  Downsampling → randomly reduce majority-class samples. Pros: balances quickly, smaller dataset. Cons: throws away information.
  Upsampling → duplicate or synthetically generate minority-class samples (e.g., SMOTE). Pros: keeps all data, improves minority-class signal. Cons: may overfit (duplicates) or add artifacts (synthetic).
  imbalanced-learn (imblearn): RandomUnderSampler, RandomOverSampler, SMOTE, ADASYN.
Train a classifier on the training set (e.g., logistic regression, random forest, neural net, whatever).
  Classic ML: scikit-learn (LogisticRegression, RandomForestClassifier).
  Boosting: xgboost, lightgbm
  Deep learning: pytorch, tensorflow
Tune the hyperparameters using the validation set: evaluate using metrics robust to imbalance (precision, recall, F1, ROC-AUC) on a validation set that is left in its original class distribution.
  sklearn.metrics (precision, recall, F1, ROC-AUC; accuracy is also available but can mislead on imbalanced data)
Run an ablation study on important hyperparameters. Here that means sweeping the decision threshold over a set of values to study its effect (strictly, this is a threshold sweep or sensitivity analysis; a classic ablation removes a component instead).
  Classifiers output probabilities (e.g., P(y=1|x)). You pick a threshold to turn probabilities into binary predictions. Default = 0.5, but this may not be optimal.
  For example, this could happen: Lower threshold → ↑ recall, ↓ precision. Higher threshold → ↑ precision, ↓ recall.
  You can… Plot Precision-Recall curves. Plot ROC curves (TPR vs FPR). Compare metrics at multiple thresholds to select the right trade-off (maximize F1, enforce recall, minimize false positives, etc.). The best value depends entirely on your problem.
AFTER all of this, evaluate on the test set using metrics robust to imbalance (precision, recall, F1, ROC-AUC). Leave the test set in its original class distribution (no up/downsampling the test set, leave it as is).
  sklearn.metrics (precision, recall, F1, ROC-AUC; accuracy is also available but can mislead on imbalanced data).
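The threshold sweep described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the tutorial's own code: the labels and probabilities are made-up validation data, and the helper names precision_recall_f1 and sweep_thresholds are hypothetical. In practice, sklearn.metrics.precision_recall_curve computes the same trade-off across all thresholds at once.

```python
# Hypothetical validation-set labels and predicted probabilities P(y=1|x).
# The class distribution is left imbalanced on purpose (3 positives, 7 negatives).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]


def precision_recall_f1(y_true, y_prob, threshold):
    """Binarize probabilities at `threshold`, then compute precision, recall, F1."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def sweep_thresholds(y_true, y_prob, thresholds):
    """Return one (threshold, precision, recall, f1) row per threshold."""
    return [(t, *precision_recall_f1(y_true, y_prob, t)) for t in thresholds]


results = sweep_thresholds(y_true, y_prob, [0.3, 0.5, 0.7])
for t, precision, recall, f1 in results:
    print(f"threshold={t:.1f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Picking the threshold that maximizes F1 is only one possible criterion;
# enforcing a minimum recall or capping false positives are equally valid.
best_threshold = max(results, key=lambda row: row[3])[0]
print("best threshold by F1:", best_threshold)
```

On this toy data the lower threshold catches every positive at the cost of precision, and the higher threshold does the opposite, which is the trade-off the step describes. The sweep runs on validation data so the test set stays untouched until the final evaluation.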
]]></description><link>advanced-topics/adversarial-retrieval-and-llms/tutorials/how-do-i-make-a-good-classifier.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Tutorials/How do I make a good classifier.md</guid><pubDate>Wed, 15 Apr 2026 18:12:52 GMT</pubDate></item><item><title><![CDATA[AI Worksheets]]></title><description><![CDATA[<a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=ncTHBi8a9uA" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=ncTHBi8a9uA" target="_self">The Fundamental Problem with Neural Networks - Vanishing Gradients</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=l4JUChqJkv4" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=l4JUChqJkv4" target="_self">This is how to take your ML models from great to GOAT</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=JvFrJacbt6U" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=JvFrJacbt6U" target="_self">This is why you should care about unbalanced data .. 
as a data scientist</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=g_icnDoB1ns" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=g_icnDoB1ns" target="_self">What does it mean to subtract one distribution from another?</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=xOB10eTjoQ8" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=xOB10eTjoQ8" target="_self">Gradient Descent : Data Science Concepts</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=eKIX8F6RP-g&amp;t=283s" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=eKIX8F6RP-g&amp;t=283s" target="_self">Loss Functions : Data Science Basics</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=DyxQUHz4jWg" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=DyxQUHz4jWg" target="_self">Curse of Dimensionality : Data Science Basics</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=8ps_JEW42xs" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=8ps_JEW42xs" target="_self">The Softmax : Data Science Basics</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=Aj7O9qRNJPY" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=Aj7O9qRNJPY" target="_self">The Sigmoid : Data Science Basics</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=LPZh9BOjkQs" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=LPZh9BOjkQs" target="_self">Large Language Models explained briefly</a> So LLMs are a mathematical function that are really good at predicting what word comes next for any piece of 
text. How does probability come into this? [answer goes here] What is backpropagation? Why do we need it? [answer goes here] LLMs are trained with “the goal of autocompleting a random passage of text from the internet” during pretraining. How is this different from RLHF? [answer goes here] Why are GPUs helpful? [answer goes here] Explain the first step shown above: [answer goes here] Write a list of questions you still have during/after watching this video: [answer goes here] <br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://mariateleki.github.io/pdf/CAFE-Talk.pdf" target="_self">https://mariateleki.github.io/pdf/CAFE-Talk.pdf</a> (these are slides from one of my talks and it was to a vet school so forgive my AI example pls, also skip slides 73 to the end)
How do we represent words with numbers? [answer goes here] Why do we have multiple dimensions in neural networks? [answer goes here] Why are LLMs biased? [answer goes here, hint, see slides 66-67]
<br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/shorts/FJtFZwbvkI4" target="_self">https://www.youtube.com/shorts/FJtFZwbvkI4</a> How do we represent words? [answer goes here] What do the directions mean? [answer goes here] Can we visualize 4D, 5D, 6D? [answer goes here]
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/shorts/9Ejh8pPZu_A" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/shorts/9Ejh8pPZu_A" target="_self">https://www.youtube.com/shorts/9Ejh8pPZu_A</a> How many dimensions does GPT3 have for its word embeddings? [answer goes here] How many dimensions do we usually use to “draw” embeddings when we talk about them? [answer goes here]
<br><a rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/shorts/qzRyCEapjFE" target="_self">https://www.youtube.com/shorts/qzRyCEapjFE</a> What data structure do we use in AI stuff? [answer goes here] What is an embedding space? Like, what is it for? [answer goes here] If text has similar meaning, is it closer together or farther apart in the embedding space? [answer goes here] What data types (e.g. text) can we use embeddings for? [answer goes here]
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/shorts/h__DQ3LplK0" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/shorts/h__DQ3LplK0" target="_self">https://www.youtube.com/shorts/h__DQ3LplK0</a> What are embeddings? [answer goes here] Are similar words closer together or farther apart? [answer goes here] When someone says “embedding space” what are they talking about? [answer goes here] How do you train a word embedding model? [answer goes here] What is the input and what is the output? [answer goes here] What is this model supposed to learn? [answer goes here] List a few word embedding models: [answer goes here] <br><a data-tooltip-position="top" aria-label="https://youtu.be/viZrOnJclY0?si=ExK6MhOzQvHQmnsF" rel="noopener nofollow" class="external-link is-unresolved" href="https://youtu.be/viZrOnJclY0?si=ExK6MhOzQvHQmnsF" target="_self">Word Embedding and Word2Vec, Clearly Explained!!!</a>
I like this video but it’s REALLY LONG – so totally up to you if you’re curious and want to watch it, but no questions from me on this one
Ok what questions do u have after watching all of this about embeddings? [answer goes here]<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=EiMPQsI2__Y" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=EiMPQsI2__Y" target="_self">Taking Control of LLM Outputs: An Introductory Journey into Logits</a> (watch until 12:00)
^ So this is the whole model, and then we zoom in on the logits to look at how the model selects the next token once it has the logits: What are logits? [answer goes here] Why do we do this softmax thing? (Might need to Google/find other videos to answer this) [answer goes here] How does the model pick the next token using the logits? [answer goes here] So it seems like there are LOTS of ways that the model can pick the next token once it has these logit values… what are some of the ways he talked about in this video? [answer goes here] <br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=jnikMver_CE" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=jnikMver_CE" target="_self">What is Temperature in LLM</a>
How do LLMs generate text? [answer goes here] What are the 3 sampling techniques discussed in this video? [answer goes here] Why doesn’t the LLM give back the same response every time you give it the same input prompt? [answer goes here] How do we use the probability distribution? [answer goes here] Explain greedy sampling – how does it pick tokens? [answer goes here] Ok so this is our overall flow: <br><img alt="Pasted image 20260415141102.png" src="assets/pasted-image-20260415141102.png" target="_self"> What is temperature? [answer goes here] What does a high temperature do to the probability distribution? [answer goes here] What does a low temperature do to the probability distribution? [answer goes here] If I want super stable outputs (like same prompt gives me back same output each time), should I use a high temp or a low temp? [answer goes here] Explain Top-P sampling – how does it pick tokens? [answer goes here] Explain Top-K sampling – how does it pick tokens? [answer goes here] Tbh top-p and top-k seem like overkill?? Why do you think we would use them/what are some situations where it might make sense? [answer goes here]
<br><a data-tooltip-position="top" aria-label="https://dylancastillo.co/posts/seed-temperature-llms.html#seed" rel="noopener nofollow" class="external-link is-unresolved" href="https://dylancastillo.co/posts/seed-temperature-llms.html#seed" target="_self">https://dylancastillo.co/posts/seed-temperature-llms.html#seed</a> So from all the previous stuff, now we know that the sampling-based decoding strategies (temperature sampling, top-p/top-k sampling, etc.) need to pull a random number, while greedy decoding just takes the top token every time. So question: why do we need to set the seed? [answer goes here] What does it mean for an LLM to give deterministic outputs (in contrast to more creative/random outputs)? [answer goes here] What values do I need to fix/set/freeze/choose to get deterministic outputs? List here: [answer goes here]
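If it helps to poke at this yourself, here is a minimal sketch of temperature scaling plus seeded sampling, using only the Python standard library. The logits are made up for a toy 4-token vocabulary, and the function names (softmax_with_temperature, greedy_pick, sample_tokens) are hypothetical; real LLM libraries expose these knobs differently.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-logit token, no randomness involved."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_tokens(logits, temperature, seed, n=5):
    """Draw n token ids from the temperature-scaled distribution with a fixed seed."""
    rng = random.Random(seed)  # private generator so the seed fully controls the draws
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical logits for a 4-token vocabulary

print("greedy pick:", greedy_pick(logits))
print("probs at T=0.1:", softmax_with_temperature(logits, 0.1))
print("probs at T=2.0:", softmax_with_temperature(logits, 2.0))
print("seed=42, run 1:", sample_tokens(logits, 1.0, seed=42))
print("seed=42, run 2:", sample_tokens(logits, 1.0, seed=42))
```

Try changing the seed, or the temperature, between the last two calls and see what happens to the sampled token ids.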
Ok what questions do u have after watching all of this about temperature/logits/decoding? [answer goes here]<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=6vThlsJ_ASE" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=6vThlsJ_ASE" target="_self">Transformer Architecture Explained</a>
tbd
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=nZrZOI0oRuw" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=nZrZOI0oRuw" target="_self">Transformer Explained</a>
Why do we need to look at every token compared to every other token? [answer goes here] Why does this make LLMs so expensive? [answer goes here] Why don’t we use some of the cheaper options for this comparison (e.g. Linformer, Reformer, sparse attention, etc.)? [answer goes here] Why does a long context window ( = long input text) make it more expensive? [answer goes here] So, why are transformers bad at reasoning and symbolic data stuff? [answer goes here]
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=eMlx5fFNoYc&amp;t=1s" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=eMlx5fFNoYc&amp;t=1s" target="_self">Attention in transformers, step-by-step | Deep Learning Chapter 6</a>
tbd
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=8jtAzxUwDj0" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=8jtAzxUwDj0" target="_self">Proximal Policy Optimization (PPO) for LLMs Explained Intuitively</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=xT4jxQUl0X8" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=xT4jxQUl0X8" target="_self">DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=-cRedoYETzQ" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=-cRedoYETzQ" target="_self">Training models with only 4 bits | Fully-Quantized Training</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=1GqYpmLjTRQ" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=1GqYpmLjTRQ" target="_self">Why do we use "e" in the Sigmoid?</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=Eeg1DEeWUjA&amp;t=2s" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=Eeg1DEeWUjA&amp;t=2s" target="_self">Recommender Systems</a> – this one also goes over collaborative filtering a little bit <br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=Fmtorg_dmM0" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=Fmtorg_dmM0" target="_self">Collaborative Filtering : Data Science Concepts</a>
What is the BIG IDEA behind collaborative filtering? Like what idea are we trying to model? [answer goes here] What is the data structure setup for collaborative filtering? [answer goes here] What are the rows? [answer goes here] What are the columns? [answer goes here] What does a rating mean? [answer goes here] What does a blank cell mean? [answer goes here] How do we figure out if U1 is more similar to U2 or U3? [answer goes here] What does cosine similarity tell us about the user relationships? [answer goes here] What does high cosine similarity mean? [answer goes here] What does low cosine similarity mean? [answer goes here] What does the hat on r^ mean? [answer goes here] What do you think about the equation we use for getting the estimated rating (r^)? Do you like it/not like it + why? [answer goes here] What are the 3 big barriers to running collaborative filtering (CF) in the real world? Like, why is it hard/when does CF suck? [answer goes here]
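To play with the user-similarity questions above, here is a minimal sketch of cosine similarity on a toy user-item matrix. The ratings are made up, the names (ratings, cosine_similarity) are hypothetical, and comparing users only on items they both rated is an assumption: it is one of several conventions and may not match the one used in the video.

```python
import math

# Hypothetical user-item rating matrix: rows = users, columns = items.
# None marks a blank cell (the user has not rated that item).
ratings = {
    "U1": [5, 4, None, 1],
    "U2": [4, 5, 1, None],
    "U3": [1, None, 5, 4],
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0  # no overlap, so there is nothing to compare
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sim_12 = cosine_similarity(ratings["U1"], ratings["U2"])
sim_13 = cosine_similarity(ratings["U1"], ratings["U3"])
print(f"sim(U1, U2) = {sim_12:.3f}")
print(f"sim(U1, U3) = {sim_13:.3f}")
```

Notice how few co-rated items each comparison rests on here; that is worth keeping in mind for the last question about when CF struggles.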
<br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=ZspR5PZemcs" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=ZspR5PZemcs" target="_self">How does Netflix recommend movies? Matrix Factorization</a> <br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=2XegvMul_mE" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=2XegvMul_mE" target="_self">Every Ranking Metric : MRR, MAP, NDCG</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=GCPDWXWN55U" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=GCPDWXWN55U" target="_self">Why is the Formula for F1-Score Unnecessarily Complicated?</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=YroewVVp7SM" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=YroewVVp7SM" target="_self">Learning to Rank - The ML Problem You've Probably Never Heard Of</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=yKwTAcsV8K8" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=yKwTAcsV8K8" target="_self">Ranking Methods : Data Science Concepts</a><br><a data-tooltip-position="top" aria-label="https://www.youtube.com/watch?v=UMHQPStCk2w" rel="noopener nofollow" class="external-link is-unresolved" href="https://www.youtube.com/watch?v=UMHQPStCk2w" target="_self">Can You Solve the Ratings Problem?</a><br><a data-tooltip-position="top" aria-label="https://arxiv.org/pdf/2305.19860" rel="noopener nofollow" class="external-link is-unresolved" href="https://arxiv.org/pdf/2305.19860" target="_self">A survey on large language models for recommendation</a> TODO: feature Maria Teleki's 
work]]></description><link>advanced-topics/adversarial-retrieval-and-llms/tutorials/ai-worksheets.html</link><guid isPermaLink="false">Advanced Topics/Adversarial Retrieval and LLMs/Tutorials/AI Worksheets.md</guid><pubDate>Wed, 15 Apr 2026 18:11:59 GMT</pubDate></item><item><title><![CDATA[Pasted image 20260415141102]]></title><description><![CDATA[<img src="assets/pasted-image-20260415141102.png" target="_self">]]></description><link>assets/pasted-image-20260415141102.html</link><guid isPermaLink="false">Assets/Pasted image 20260415141102.png</guid><pubDate>Wed, 15 Apr 2026 18:11:02 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20251022162657]]></title><description><![CDATA[<img src="assets/pasted-image-20251022162657.png" target="_self">]]></description><link>assets/pasted-image-20251022162657.html</link><guid isPermaLink="false">Assets/Pasted image 20251022162657.png</guid><pubDate>Wed, 22 Oct 2025 20:26:57 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250919122359]]></title><description><![CDATA[<img src="assets/pasted-image-20250919122359.png" target="_self">]]></description><link>assets/pasted-image-20250919122359.html</link><guid isPermaLink="false">Assets/Pasted image 20250919122359.png</guid><pubDate>Fri, 19 Sep 2025 16:23:59 GMT</pubDate><enclosure url="." 
length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250919122344]]></title><description><![CDATA[<img src="assets/pasted-image-20250919122344.png" target="_self">]]></description><link>assets/pasted-image-20250919122344.html</link><guid isPermaLink="false">Assets/Pasted image 20250919122344.png</guid><pubDate>Fri, 19 Sep 2025 16:23:44 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250919122308]]></title><description><![CDATA[<img src="assets/pasted-image-20250919122308.png" target="_self">]]></description><link>assets/pasted-image-20250919122308.html</link><guid isPermaLink="false">Assets/Pasted image 20250919122308.png</guid><pubDate>Fri, 19 Sep 2025 16:23:08 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250919122247]]></title><description><![CDATA[<img src="assets/pasted-image-20250919122247.png" target="_self">]]></description><link>assets/pasted-image-20250919122247.html</link><guid isPermaLink="false">Assets/Pasted image 20250919122247.png</guid><pubDate>Fri, 19 Sep 2025 16:22:47 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250912135828]]></title><description><![CDATA[<img src="assets/pasted-image-20250912135828.png" target="_self">]]></description><link>assets/pasted-image-20250912135828.html</link><guid isPermaLink="false">Assets/Pasted image 20250912135828.png</guid><pubDate>Fri, 12 Sep 2025 17:58:28 GMT</pubDate><enclosure url="." 
length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[Pasted image 20250912110052]]></title><description><![CDATA[<img src="assets/pasted-image-20250912110052.png" target="_self">]]></description><link>assets/pasted-image-20250912110052.html</link><guid isPermaLink="false">Assets/Pasted image 20250912110052.png</guid><pubDate>Fri, 12 Sep 2025 15:00:52 GMT</pubDate><enclosure url="." length="0" type="false"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;.&quot;&gt;&lt;/figure&gt;</content:encoded></item><item><title><![CDATA[03. Misinformation Detection Tutorial  (IC2S2_25)]]></title><link>advanced-topics/misinformation/slides/03.-misinformation-detection-tutorial-(ic2s2_25).html</link><guid isPermaLink="false">Advanced Topics/Misinformation/Slides/03. Misinformation Detection Tutorial  (IC2S2_25).pdf</guid><pubDate>Sun, 20 Jul 2025 06:32:34 GMT</pubDate></item><item><title><![CDATA[09. Misinformation Detection Tutorial  (IC2S2_25)]]></title><link>lessons/slides/09.-misinformation-detection-tutorial-(ic2s2_25).html</link><guid isPermaLink="false">Lessons/Slides/09. Misinformation Detection Tutorial  (IC2S2_25).pdf</guid><pubDate>Sun, 20 Jul 2025 06:32:34 GMT</pubDate></item></channel></rss>