Assignment 3 - Podcast Factchecking
Credits: Irmetova, A., Liu, H., Teleki, M., Carragher, P., Zhang, J., & Caverlee, J. (2026). PodChecker: An Interpretable Fact-Checking Companion for Podcasts. GitHub
Podcasts are one of the fastest-growing media formats worldwide, yet they receive almost none of the editorial oversight applied to broadcast journalism. Hosts and guests regularly make factual claims — about science, politics, health, history, economics — without correction, rebuttal, or verification. This makes podcasts a significant and underexplored surface for trust and safety concerns: misinformation, misleading framing, unverifiable assertions, and coordinated narrative-pushing can all enter the information ecosystem through podcast audio without triggering any of the automated moderation systems that operate on text.
In this assignment, you will use PodChecker — an automated fact-checking pipeline for podcasts — to collect, analyze, and critically evaluate the factual claims made across a corpus of podcast episodes. PodChecker ingests podcast audio (via file upload or RSS feed), transcribes it using OpenAI Whisper, extracts atomic factual claims using an LLM, and fact-checks each claim using Perplexity's web-search API. The result is a claim-level credibility report — verdict (true / false / misleading / unverifiable) with supporting source URLs — and an episode-level credibility score.
Recall that the Bluesky assignment focused on proactive, real-time moderation of individual posts. PodChecker asks a different question: what does applied computational research look like when the medium is audio, the content is long-form, and the platform has no built-in moderation infrastructure? By the end of this assignment you will have hands-on experience with the full pipeline from data collection to analysis to critical evaluation — the same research cycle used in real trust and safety science.
Your Task
You will select a podcast corpus relevant to a trust and safety harm of your choice, run the PodChecker pipeline across a set of episodes, and produce a written analysis of your findings. In the final milestone, you will either extend the system with a new capability, replicate and apply it to a new domain, or critically evaluate its limitations — documenting your process and findings in a short video presentation.
There is no auto-grader for this assignment. Your grade depends on the quality of your corpus selection rationale, the rigor of your analysis, and the depth of your critical reflection — not on whether PodChecker produces a particular score.
Ethical Guidelines
Podcasts are public media, but analysis of named hosts and guests carries ethical responsibilities. Follow the course's policy on engaging with harmful content throughout this assignment:
- Do not analyze content depicting child exploitation, solicitation of illegal activity, or other severely harmful material. If your chosen podcast unexpectedly contains such content, stop and consult an instructor before proceeding.
- Be precise in your claims about speakers. Reporting that PodChecker labeled a claim as "false" is different from asserting that the host deliberately lied. Automated fact-checking has error rates; your writeup should reflect this.
- Do not publish or publicly share individual claim-level verdicts about named people without explicit instructor approval. The analysis is for academic purposes.
- API costs are real. PodChecker consumes OpenAI Whisper (transcription) and Perplexity Sonar (fact-checking) API calls. Budget your usage — run on a small sample first, cache results aggressively, and use the MAX_AUDIO_SIZE_MB cap to control costs. Discuss API cost management in your writeup.
Deliverables
- Analysis notebook (analysis/my_corpus_analysis.ipynb): a documented Jupyter notebook containing your corpus collection, credibility analysis, and visualizations.
- Code for any extensions (Milestone 3, Track A): well-commented Python, placed in analysis/ with a short README explaining how to run it.
- A 10-minute recorded video presentation covering all three milestones (see Presentation Guidelines).
- Your presentation slides or any other materials used in the video.
PodChecker Infrastructure
PodChecker is a research prototype with two usage modes:
Web application — a React + Flask stack that accepts an audio file or RSS feed URL, runs the full pipeline, and renders a results table in the browser. This is the easiest way to verify that the system is working.
Python analysis client — the PodCheckerClient class in analysis/podchecker_client.py allows you to run the pipeline programmatically across many episodes, with built-in audio and results caching. This is what you should use for your corpus analysis.
Pipeline
RSS feed / audio file
↓
Whisper (small.en) ← OpenAI API
↓
transcript (text)
↓
Claim extraction ← OpenAI API (GPT-4o)
↓
Fact-checking loop ← Perplexity Sonar (web search)
↓
claim-level verdicts true / false / misleading / unverifiable
↓
credibility score (true=100%, false=0%, misleading=50%,
unverifiable=excluded from score)
Source reliability ratings (from site/backend/filtered_attrs.csv) assign a 1–6 quality score to fact-check sources; sources rated ≥ 5 are marked "trusted" with a star prefix in results.
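To make the scoring rule concrete, here is a minimal sketch of how an episode score can be computed from a list of claim verdicts. It mirrors the weighting described above but is not taken from PodChecker's source; the function name and input format are illustrative.

```python
def credibility_score(verdicts):
    """Aggregate claim-level verdicts into a 0-100 episode score.

    Weighting mirrors the rule above: true=100, misleading=50, false=0,
    and unverifiable claims are excluded from the denominator.
    (Illustrative only -- PodChecker's own implementation may differ.)
    """
    weights = {"true": 100.0, "misleading": 50.0, "false": 0.0}
    scored = [weights[v] for v in verdicts if v in weights]
    return sum(scored) / len(scored) if scored else None

# Example: 3 true, 1 misleading, 1 false, 1 unverifiable -> 70.0
print(credibility_score(["true", "true", "true", "misleading", "false", "unverifiable"]))
```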
Installation and Setup
Requirements
- Python 3.10+
- Node.js 18+ and npm
- ffmpeg (required for Whisper audio processing)
- OpenAI API key (for transcription and claim extraction)
- Perplexity API key (for web-search fact-checking)
Step 1 — Clone the repository
git clone https://github.com/annatastic/PodChecker.git
cd PodChecker
Step 2 — Install ffmpeg
macOS (Homebrew):
brew install ffmpeg
Windows: Download from ffmpeg.org/download.html and add to PATH.
Ubuntu/Debian:
sudo apt install ffmpeg
Verify: ffmpeg -version
Step 3 — Install Python dependencies
cd site/backend
pip3 install --upgrade pip
pip3 install pandas openai openai-whisper perplexityai feedparser requests flask flask-cors
For the analysis notebook only (no backend server needed):
pip3 install pandas openai openai-whisper perplexityai feedparser requests matplotlib
Step 4 — Set API keys
Set your keys as environment variables (do not hard-code them in notebooks you submit):
export OPENAI_API_KEY="sk-..."
export PERPLEXITY_API_KEY="pplx-..."
Or use a .env file (add .env to .gitignore before committing anything):
OPENAI_API_KEY=sk-...
PERPLEXITY_API_KEY=pplx-...
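However you provide the keys, a small snippet like the following (a sketch, not PodChecker code) reads them inside a notebook and fails early if one is missing; if you prefer the .env route, the optional python-dotenv package can load that file first. The variable names match the ones passed to PodCheckerClient in Milestone 2.

```python
import os

# Optional: read a local .env file first if python-dotenv is installed.
# from dotenv import load_dotenv; load_dotenv()

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
PERPLEXITY_API_KEY = os.environ.get("PERPLEXITY_API_KEY")

# Fail early with a clear message rather than deep inside the pipeline.
missing = [name for name, value in [("OPENAI_API_KEY", OPENAI_API_KEY),
                                    ("PERPLEXITY_API_KEY", PERPLEXITY_API_KEY)] if not value]
if missing:
    raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
```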
Step 5 — Verify the installation
Run the web application to confirm the full pipeline works:
# Terminal 1: start the backend
cd site/backend
python3 app.py # runs on port 8000
# Terminal 2: start the frontend
cd site/frontend
npm install
npm run dev # runs on port 5173
Open http://localhost:5173 in a browser. Use the sample report dropdown to verify that a pre-processed result loads correctly. You should not need to call any APIs to view sample reports.
Milestone 1: Setup and Corpus Selection
Due: end of Week 14 (submit as a brief written memo, ≤ 500 words + sample output)
Your goal
Select a podcast that is relevant to a trust and safety harm of your choice and verify that PodChecker can process it. Your podcast choice should be motivated by a specific T&S concern — not simply by personal interest or convenience.
Choosing a podcast
Good corpus choices share these properties:
- T&S relevance — the podcast regularly discusses topics where false or misleading claims could cause real-world harm (health misinformation, political disinformation, financial fraud, extremist rhetoric, etc.)
- Public RSS feed — the podcast is accessible via a public RSS feed or individual episode audio URLs; this is what PodChecker uses to ingest content
- Sufficient volume — the podcast has at least 15 recent episodes you can analyze; older archives are fine if they cover a coherent time period
- English audio — Whisper's small.en model performs best on English; non-English podcasts require the multilingual model (you may switch to it, but note this in your writeup)
Here are illustrative examples — you are not limited to these:
| Domain | Example podcasts |
|---|---|
| Health & medicine | Podcasts featuring alternative medicine, supplement promotion, or anti-vaccine content |
| Political commentary | Podcasts with strong partisan framing on contested empirical topics (e.g., crime statistics, immigration, election integrity) |
| Financial advice | Crypto, real estate, or investment podcasts making specific performance claims |
| Science popularization | Podcasts that translate academic research — a useful control case since many claims will be verifiable |
| Conspiracy / extremism | Podcasts associated with known information operations or radicalization pipelines |
Deliverable
Submit a short memo covering:
- Podcast name, RSS URL, and episode count in the period you plan to analyze.
- T&S rationale — which harm type are you investigating, and why is this podcast a good data source for it?
- Sample output — run PodChecker on 1–2 episodes (use the web interface or the analysis client), and include a screenshot or table of the claim-level results.
- API cost estimate — based on your sample run, estimate the total OpenAI and Perplexity API cost for your full corpus (use the usage stats printed by the API calls). Propose a MAX_AUDIO_SIZE_MB cap if needed; a cost-estimation sketch follows this list.
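For the episode count and the cost estimate, a rough sketch like the one below can help. It assumes the feed exposes itunes:duration for each entry (many feeds do; if yours does not, time a sample episode instead) and uses the Whisper price listed under Resources; Perplexity costs depend on how many claims each episode yields, so extrapolate those from your 1–2 sample runs.

```python
import feedparser

WHISPER_PRICE_PER_MIN = 0.006  # USD per minute of audio, see OpenAI pricing under Resources

def duration_minutes(entry):
    """Parse itunes:duration, which may be 'HH:MM:SS', 'MM:SS', or plain seconds."""
    raw = entry.get("itunes_duration", "")
    if not raw:
        return None
    parts = [int(p) for p in str(raw).split(":")]
    seconds = 0
    for p in parts:
        seconds = seconds * 60 + p
    return seconds / 60

feed = feedparser.parse("my_podcast_rss.xml")  # local copy of your feed
durations = [duration_minutes(e) for e in feed.entries]
known = [d for d in durations if d is not None]

print(f"Episodes in feed: {len(feed.entries)}")
if known:
    total_minutes = sum(known)
    print(f"Total audio with known duration: {total_minutes:.0f} min")
    print(f"Estimated Whisper cost: ${total_minutes * WHISPER_PRICE_PER_MIN:.2f}")
```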
Milestone 2: Corpus Analysis
Due: end of Week 15 (with Milestone 3)
Your goal
Run PodChecker across a corpus of at least 15 episodes and produce a rigorous quantitative analysis of claim-level credibility across your corpus.
Running the analysis client
Use analysis/episode_credibility_analysis.ipynb as your starting point. Adapt it for your corpus by:
- Pointing it at your RSS feed (or a list of episode audio URLs if no RSS is available):
from analysis import get_recent_episodes, compute_credibility_percentage, PodCheckerClient
RSS_PATH = "my_podcast_rss.xml" # local copy of the RSS feed
NUM_EPISODES = 15
episodes = get_recent_episodes(RSS_PATH, NUM_EPISODES)
- Initializing the client (always use mode="local" for corpus analysis — it is faster and cheaper than the HTTP mode):
client = PodCheckerClient(
    openai_api_key=OPENAI_API_KEY,
    perplexity_api_key=PERPLEXITY_API_KEY,
    mode="local",
    max_audio_size_mb=60  # adjust based on your API budget
)
- Running the analysis loop — results are cached to data/ automatically, so you can re-run the notebook without incurring API costs for already-processed episodes:
episode_results = []
for episode in episodes:
    result = client.analyze_episode(episode, podcast_name="MyPodcast")
    episode_results.append({'episode': episode, 'result': result})
Required analyses
Your notebook must include the following:
A. Episode credibility over time — a line plot of credibility score (0–100%) across episode dates, following the template in the starter notebook. Annotate any notable outliers.
B. Claim-level verdict distribution — a bar chart or table showing the proportion of claims labeled true, false, misleading, and unverifiable across the full corpus. Compute this both per-episode and aggregated (a starting-point sketch for A and B follows this list).
C. Error analysis — for at least 10 claims that received a "false" or "misleading" verdict, manually verify the verdict by checking the supporting sources PodChecker provides. Report:
- How many did you agree with? Disagree? Find ambiguous?
- What patterns explain errors (hallucinated sources, out-of-date information, opinion framed as fact)?
D. Failure modes — document any episodes that failed to process (rate limits, audio access errors, truncation) and how you handled them. What fraction of your corpus is usable?
E. Cost accounting — report the actual API costs incurred (OpenAI token counts for Whisper + GPT-4o, Perplexity call count). Compare to your Milestone 1 estimate.
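As a starting point for analyses A and B, the sketch below assumes you have flattened episode_results into per-episode rows; the column names (date, credibility, plus one count column per verdict) are placeholders for whatever fields you extract from PodChecker's results, not a documented schema, and the two example rows are purely illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows: replace with values extracted from episode_results.
rows = [
    {"date": "2025-01-07", "credibility": 74.0, "true": 11, "false": 2, "misleading": 3, "unverifiable": 4},
    {"date": "2025-01-14", "credibility": 68.0, "true": 9, "false": 4, "misleading": 2, "unverifiable": 5},
]
df = pd.DataFrame(rows)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")

# A. Credibility over time
df.plot(x="date", y="credibility", marker="o", ylabel="Credibility (%)",
        title="Episode credibility over time")
plt.show()

# B. Verdict distribution, aggregated over the corpus
verdict_cols = ["true", "false", "misleading", "unverifiable"]
totals = df[verdict_cols].sum()
(totals / totals.sum()).plot(kind="bar", ylabel="Share of claims",
                             title="Claim-level verdict distribution")
plt.show()

# Summary statistics for the writeup
print(df["credibility"].describe()[["mean", "50%", "std"]])
```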
Milestone 3: Extension, Replication, or Critical Evaluation
Due: end of Week 15 (submitted together with Milestone 2)
Choose one of the three tracks below. All tracks have equivalent weight in the grading rubric. Your choice should reflect what you find most interesting and what is most tractable given your corpus.
Track A: Extension
Extend PodChecker with a new capability that addresses a gap in the current system. Examples:
- Multi-podcast comparison — run PodChecker on two or more podcasts covering the same topic (e.g., two podcasts on the same health topic with different credibility reputations) and compare their credibility profiles quantitatively.
- Claim-type taxonomy — add a claim-type classifier that categorizes claims before fact-checking (e.g., statistical claims, causal claims, identity claims, predictions) and analyze how credibility varies by claim type; a minimal classifier sketch follows this list.
- Alternative fact-checking source — replace or supplement Perplexity with a structured fact-check database (e.g., Google Fact Check Tools API, ClaimBuster, or Community Notes data) and compare the verdicts produced by each source on the same claims.
- Temporal trend detection — identify claims that recur across episodes and analyze how their veracity changes over time (e.g., does a podcast host's credibility improve after a public correction?).
Your extension should include working code and a brief evaluation demonstrating that it produces meaningful results on your corpus.
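For the claim-type taxonomy idea above, one possible shape for the classifier is a single LLM call per claim. The categories, prompt, and helper function here are illustrative assumptions, not part of PodChecker.

```python
from openai import OpenAI

CLAIM_TYPES = ["statistical", "causal", "identity", "prediction", "other"]  # illustrative taxonomy

oai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_claim(claim: str) -> str:
    """Label a single extracted claim with one of CLAIM_TYPES (sketch only)."""
    prompt = (
        "Classify the following factual claim into exactly one category from "
        f"{CLAIM_TYPES}. Respond with the category name only.\n\nClaim: {claim}"
    )
    resp = oai_client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in CLAIM_TYPES else "other"

# Example usage on claims collected in Milestone 2:
# claim_types = [classify_claim(c) for c in all_claims]
```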
Track B: Replication and Application
Replicate the core PodChecker analysis from Irmetova et al. (2026) on a new podcast corpus and apply the findings to a specific T&S question. This track is appropriate if your main interest is empirical analysis rather than system development.
Your writeup should:
- Describe how your corpus differs from the paper's (podcast genre, time period, harm type).
- Report credibility scores, claim distributions, and source reliability breakdowns comparable to the paper's results.
- Apply the findings to a specific T&S question — for example: Does credibility score correlate with the podcast's media bias rating from an external source? Do episodes featuring certain guest types (politicians, scientists, activists) have systematically different verdicts?
- Critically discuss what the system gets right and what it misses for your specific harm type.
Track C: Critical Evaluation
Conduct a systematic evaluation of PodChecker's accuracy, limitations, and potential for harm — without building a new extension or collecting a large new corpus. This track is appropriate if you want to focus on evaluation methodology and ethical analysis.
Your evaluation should address at least three of the following:
- Precision and recall — manually fact-check a stratified sample of claims (e.g., 30–50 claims) and compute precision, recall, and F1 against your ground-truth labels for each verdict category (see the sketch after this list).
- Hallucination analysis — examine the supporting source URLs PodChecker provides. How often do the URLs actually support the verdict? How often are they irrelevant or broken?
- Claim extraction quality — evaluate whether the claims Whisper + GPT-4o extract are the important claims from the episode, or whether the system over- or under-samples certain claim types.
- Domain sensitivity — test the system on a domain where automated fact-checking is particularly risky (contested political topics, rapidly evolving science) and analyze where human judgment would be required.
- Bias and representation — does the system systematically produce different verdicts for claims made by speakers of different political affiliations, genders, or expertise levels? Design a small experiment to test this.
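For the precision-and-recall option, once you have a parallel list of your own labels and PodChecker's verdicts for the same claims, per-category metrics take only a few lines of Python. The two label lists at the bottom are placeholders you fill in from your manual fact-checking.

```python
VERDICTS = ["true", "false", "misleading", "unverifiable"]

def per_class_metrics(gold, predicted):
    """Precision, recall, and F1 per verdict category, treated one-vs-rest."""
    metrics = {}
    for label in VERDICTS:
        tp = sum(1 for g, p in zip(gold, predicted) if p == label and g == label)
        fp = sum(1 for g, p in zip(gold, predicted) if p == label and g != label)
        fn = sum(1 for g, p in zip(gold, predicted) if p != label and g == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1,
                          "support": sum(1 for g in gold if g == label)}
    return metrics

# gold = [...]       # your manual labels for the sampled claims
# predicted = [...]  # PodChecker's verdicts for the same claims
# print(per_class_metrics(gold, predicted))
```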
Presentation Guidelines
Record a ~10-minute video covering all milestones. You will submit the video along with your notebook and slides. Structure your presentation as follows:
- Podcast corpus and T&S rationale (2 min) — introduce the podcast(s) you chose, the harm type you are investigating, and why this corpus is an interesting subject for T&S research.
- System overview (1 min) — briefly explain how PodChecker works (pipeline diagram from the README is fine); you can assume your audience knows what transcription and LLMs are.
- Corpus analysis results (3 min) — walk through your key findings from Milestone 2: credibility trend over time, verdict distribution, error analysis highlights. Show at least one chart.
- Extension / Replication / Critical Evaluation (2 min) — present your Milestone 3 track: what you built or analyzed, what you found, and what surprised you.
- Ethical reflection and limitations (1 min) — what are the risks of deploying a system like PodChecker at scale? What should a T&S practitioner know before using automated podcast fact-checking?
- Future directions (30 sec) — one specific, actionable improvement you would make if you had more time.
Testing and Evaluation
Unlike the Bluesky assignment (which has an auto-grader), there is no single correctness score for this assignment. Instead, demonstrate rigor through:
- Reproducibility — your notebook should run end-to-end from a clean environment using cached data. Include a requirements.txt or environment spec.
- Sample size — at minimum 15 episodes with analyzable audio. Justify your sample size in the writeup.
- Manual verification — the error analysis in Milestone 2C is the closest thing to ground-truth evaluation; take it seriously. Spot-checking 10 claims is the minimum; more is better.
- Quantitative reporting — report credibility scores with summary statistics (mean, median, standard deviation); report claim counts and verdict breakdowns with exact numbers; plot over time when the corpus spans more than a week.
- Acknowledgment of failures — episodes that failed to process, API rate limits hit, audio truncations, and claims that were too ambiguous to verify are all expected and should be documented, not hidden.
Resources
PodChecker
- GitHub repository — source code, README, sample reports
- analysis/episode_credibility_analysis.ipynb — starter notebook for corpus analysis
- analysis/podchecker_client.py — PodCheckerClient API documentation (docstrings)
- analysis/rss_utils.py — get_recent_episodes, compute_credibility_percentage functions
APIs
- OpenAI API pricing — Whisper: $0.006/min of audio; GPT-4o: varies by token count
- Perplexity API docs — Sonar model pricing per search call
- Google Fact Check Tools API — free structured fact-check database (useful for Track A/B)
Podcast data sources
- Listen Notes API — podcast search and RSS discovery
- Podcast Index — open podcast RSS directory
- Most major podcast platforms (Spotify, Apple Podcasts) publish RSS feeds for public shows
T&S research context
- Course reading list — Misinformation sections (Lectures 17–21) for background on claim detection, source credibility, and intervention design
- Irmetova et al. (2026) — the PodChecker paper; available in the PodChecker/ folder
Grading
| Component | Weight | An excellent submission will… |
|---|---|---|
| Milestone 1: Corpus selection and setup | 15% | Clearly motivate the podcast choice with a specific T&S harm type; demonstrate that PodChecker runs on a sample episode; provide a realistic API cost estimate with a clear management plan |
| Milestone 2: Corpus analysis | 40% | Analyze ≥ 15 episodes; produce all required visualizations; perform a careful manual error analysis on ≥ 10 false/misleading verdicts; document failures and costs honestly; interpret findings in terms of the T&S harm being studied rather than just reporting numbers |
| Milestone 3: Extension / Replication / Evaluation | 35% | For Track A: deliver working code with a meaningful evaluation showing the extension produces new insight. For Track B: make a specific empirical argument about the T&S question using the replication results. For Track C: design a rigorous evaluation, report quantitative results, and draw conclusions that go beyond "the system makes mistakes" |
| Presentation quality | 10% | Well-paced, clearly narrated video; results are shown rather than described; ethical reflection is substantive rather than perfunctory; the narrative arc from corpus → analysis → conclusion is coherent |