Fighting AI Hallucinations One Citation at a Time: Introducing the LLM Citation Verifier


As AI writing assistants become more prevalent in academic and professional settings, we face a growing challenge: how do we maintain the integrity of the scholarly record when AI systems can generate convincing but fabricated citations? This problem has been on my mind lately, so I decided to build a simple tool to address it—and use the project as an opportunity to explore some modern Python practices that have emerged in recent years.

The Problem We’re Solving

Large language models have an unfortunate tendency to hallucinate academic citations. They’ll generate perfectly formatted DOIs that look legitimate but reference papers that don’t exist. This isn’t just an annoyance—it’s a threat to research integrity that affects publishers, researchers, and the broader academic community.

We saw this problem play out dramatically in May 2025, when the White House’s “Make America Healthy Again” report was found to contain multiple citations to nonexistent papers, with experts identifying these as hallmarks of artificial intelligence generation. The Washington Post found that some references included “oaicite” markers attached to URLs—a definitive sign that the research was collected using artificial intelligence. The initial report contained over 500 citations, but at least seven of the cited sources didn’t appear to exist at all.

The problem is particularly acute when AI tools generate content that includes citations, as these fake references can easily slip through editorial review processes if not properly verified. Traditional fact-checking approaches don’t scale when dealing with AI-generated content that might include dozens of citations.

A Learning Project with Real-World Impact

The LLM Citation Verifier started as a personal hobby project to solve a problem I kept encountering, but it also became a playground for exploring modern Python development practices. As someone who’s been coding for years but wanted to catch up on recent ecosystem changes, this project let me experiment with tools like uv for package management and GitHub Copilot as a pair programming partner.

The tool itself is a plugin for Simon Willison’s excellent LLM command-line tool that automatically verifies academic citations against the Crossref database in real-time. It’s designed to catch hallucinated references before they make it into published content.

Key Features

Real-time verification - The tool integrates directly into the LLM generation process, checking citations as they’re created rather than after the fact.

Comprehensive validation - Beyond just checking if a DOI exists, the tool returns full metadata including title, authors, journal, and publication year to help verify context.

Simple integration - Works with any model that supports tool calling through the LLM command-line tool, making it easy to add to existing workflows.

Hallucination detection - Clearly flags non-existent DOIs and provides detailed error messages to help identify fabricated citations.

How It Works

The plugin taps into the Crossref API to verify Digital Object Identifiers (DOIs) in real-time. Here’s a typical workflow:

llm -T verify_citation "What's new in dye sensitized solar cells? Check all references." --td

When the LLM generates content with citations, the plugin automatically:

  1. Extracts DOI references from the generated text
  2. Queries the Crossref database to verify each DOI exists
  3. Returns full metadata for valid citations
  4. Flags invalid DOIs with clear error messages

Here’s an example of the plugin in action with a single DOI verification:

❯ llm -T verify_citation "Verify this DOI: 10.1038/nature12373" --td
I'll verify that DOI for you using the Crossref database.
Tool call: verify_citation({'doi': '10.1038/nature12373'})
  {
    "verified": true,
    "doi": "10.1038/nature12373",
    "title": "Nanometre-scale thermometry in a living cell",
    "authors": "G. Kucsko, P. C. Maurer, N. Y. Yao, et al.",
    "journal": "Nature",
    "publisher": "Springer Science and Business Media LLC",
    "year": "2013",
    "url": "https://doi.org/10.1038/nature12373"
  }
The DOI 10.1038/nature12373 has been verified successfully! Here are the details:
**Verified Citation:**
- **Title:** Nanometre-scale thermometry in a living cell
- **Authors:** G. Kucsko, P. C. Maurer, N. Y. Yao, et al.
- **Journal:** Nature
- **Publisher:** Springer Science and Business Media LLC
- **Year:** 2013
- **DOI URL:** https://doi.org/10.1038/nature12373
The DOI is valid and resolves to a legitimate scientific paper published in Nature about nanoscale temperature measurement in living cells.

The --td flag provides transparency by showing the verification process in real-time, so you can see exactly which citations are being checked.
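If you're curious what that check looks like in practice, here's a rough sketch of a DOI lookup against Crossref's public REST API. This isn't the plugin's actual source, just the general idea; the verify_doi name and the mailto address are placeholders you'd swap for your own.

import requests

def verify_doi(doi: str) -> dict:
    """Check a DOI against Crossref and return basic metadata."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}",
        # Crossref asks politely for a contact address in the User-Agent
        headers={"User-Agent": "doi-check-sketch (mailto:you@example.com)"},
        timeout=10,
    )
    if resp.status_code == 404:
        # Crossref has no record of this DOI -- a strong hallucination signal
        return {"verified": False, "doi": doi, "error": "DOI not found in Crossref"}
    resp.raise_for_status()
    work = resp.json()["message"]
    authors = ", ".join(
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in work.get("author", [])
    )
    return {
        "verified": True,
        "doi": doi,
        "title": (work.get("title") or [""])[0],
        "authors": authors,
        "journal": (work.get("container-title") or [""])[0],
        "publisher": work.get("publisher", ""),
        "year": str(work.get("issued", {}).get("date-parts", [[None]])[0][0] or ""),
        "url": f"https://doi.org/{doi}",
    }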

Learning Modern Python Along the Way

Building this tool gave me a chance to explore several Python ecosystem improvements that have emerged in recent years:

Modern package management with uv - I used uv for dependency management and publishing, which is significantly faster than traditional pip-based workflows. The experience was smooth enough that I wrote about it as my first “Today I Learned” post.

GitHub Copilot as a pair programmer - This project was my first serious experiment with Copilot for both writing and testing code. The AI assistance was particularly helpful for generating comprehensive test cases and handling edge cases in the Crossref API integration.

Plugin architecture patterns - Working with the LLM tool’s plugin system taught me about modern Python plugin patterns and how to build tools that integrate cleanly with existing workflows.
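For what it's worth, the plugin surface itself is small. Here's a stripped-down sketch of the general shape rather than the plugin's actual code, assuming the register_tools hook that recent LLM releases provide for tool plugins:

import llm

def verify_citation(doi: str) -> dict:
    """Look up a DOI in Crossref and report whether it exists."""
    # ...call the Crossref API here, as in the sketch above...
    return {"verified": False, "doi": doi, "error": "not implemented in this sketch"}

@llm.hookimpl
def register_tools(register):
    # Expose the function to tool-capable models under its own name
    register(verify_citation)

LLM's plugin system is built on pluggy, so once the package declares the usual llm entry point and is installed with llm install, the tool shows up in llm tools list.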

Real-World Applications

While this started as a personal project, it addresses several practical use cases I’ve observed in academic and professional contexts:

Editorial workflows - Verify citations in AI-assisted manuscript preparation before publication. No more embarrassing retractions due to fake references.

Research integrity - Audit AI-generated literature reviews and research summaries to ensure all citations are legitimate.

Content quality control - Implement systematic verification for AI writing assistants used in academic contexts.

Fact-checking - Validate suspicious citation claims in submitted manuscripts or peer review processes.

Why This Matters for Publishing

The academic publishing industry is at an inflection point with AI. While these tools can dramatically improve productivity and accessibility, they also introduce new risks to the scholarly record. Research has shown that current AI systems “have considerable room for improvement” when it comes to citation quality, with even the best models lacking complete citation support 50% of the time.

The recent MAHA report controversy demonstrates how citation errors can undermine the credibility of even high-profile documents. As Georges Benjamin from the American Public Health Association noted, “This is not an evidence-based report, and for all practical purposes, it should be junked at this point… It cannot be used for any policymaking. It cannot even be used for any serious discussion, because you can’t believe what’s in it.”

As those of us working in publishing navigate this transition, we need practical tools that help manage AI-related risks without stifling innovation. Tools like this citation verifier represent a grassroots approach to maintaining quality standards while embracing the productivity benefits of AI—though obviously, any production implementation would require much more robust architecture and testing.

Getting Started

Installation is straightforward, assuming you have llm installed:

llm install llm-citation-verifier

The tool is topic-agnostic and integrates seamlessly into existing LLM workflows with any model that supports tool use. Simply add something like “verify citations” to your prompt and the plugin handles the rest.
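For instance, a prompt along these lines (the topic is just an illustration) asks the model to check its own references as it writes:

llm -T verify_citation "Write a brief overview of recent perovskite solar cell research and verify every citation you include." --td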

For development teams, the plugin’s architecture provides a foundation for building more sophisticated verification systems. The source code is available on GitHub for customization and extension.

Looking Forward

This tool represents one small step toward more trustworthy AI-assisted research—and a fun way to explore modern Python development practices. As the academic community continues to integrate AI tools into research workflows, we need practical solutions that balance innovation with integrity.

The LLM Citation Verifier won’t solve every challenge related to AI and academic publishing, but it addresses a specific, high-impact problem that affects anyone working with AI-generated research content. Plus, building it gave me hands-on experience with tools like uv and GitHub Copilot that I’ll definitely use in future projects.

In a world where AI can fabricate convincing citations in seconds, having automated verification tools isn’t just convenient—it’s essential. Give it a try the next time you’re working with AI-generated research content, and feel free to contribute to the project if you find it useful.

Enabling cookie consent on a Jekyll Minimal Mistakes site 🔗


Paul’s post got me about 80% of the way there, but I was still having two issues:

  • A persistent banner on iOS (but not on desktop)
  • Google Analytics cookie still being set if the user clicked decline

I don’t know JavaScript, so in the old days I would have been stuck. Instead, I opened up GitHub Copilot in Agent Mode and described the symptoms. It took a couple of tries, but it created a test page with buttons so I could check whether cookies were being set correctly, and modified the JavaScript file until it got it right.

I probably could have figured it out given enough time and patience, but frankly I wouldn’t have followed through. Now the site has GDPR-compliant terms and Google Analytics.

TIL: Modern Python Package CI/CD with uv, Trusted Publishing, and GitHub Actions


Today I learned how to set up a complete CI/CD pipeline for Python packages using modern tooling. As a first-time package publisher, I wanted to make sure I was using the current best practices rather than outdated approaches from Stack Overflow posts. Here’s what I discovered about the modern workflow that’s taken over from the old “generate API keys and hope” approach.

The Two-Workflow Pattern

The key insight is separating continuous integration from releases using two different GitHub Actions workflows. This prevents accidental releases while ensuring every release is tested.

CI Workflow (.github/workflows/ci.yml)

This runs on every push to catch issues early during development:

name: CI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
    - uses: actions/checkout@v4
    
    - name: Set up uv
      uses: astral-sh/setup-uv@v3
    
    - name: Set up Python ${{ matrix.python-version }}
      run: uv python install ${{ matrix.python-version }}
    
    - name: Install dependencies
      run: uv sync --dev
    
    - name: Run linting
      run: uv run ruff check .
    
    - name: Run type checking
      run: uv run mypy src/
    
    - name: Run tests
      run: uv run pytest tests/ -v
    
    - name: Test package build
      run: uv build

Release Workflow (.github/workflows/release.yml)

This runs only on version tags for controlled publishing:

name: Release

on:
  push:
    tags:
      - v*

jobs:
  # First run all the tests
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
    - uses: actions/checkout@v4
    
    - name: Set up uv
      uses: astral-sh/setup-uv@v3
    
    - name: Set up Python ${{ matrix.python-version }}
      run: uv python install ${{ matrix.python-version }}
    
    - name: Install dependencies
      run: uv sync --dev
    
    - name: Run linting
      run: uv run ruff check .
    
    - name: Run type checking
      run: uv run mypy src/
    
    - name: Run tests
      run: uv run pytest tests/ -v
    
    - name: Test package build
      run: uv build

  # Only publish if tests pass
  pypi:
    name: Publish to PyPI
    runs-on: ubuntu-latest
    needs: test  # This makes it wait for tests to pass
    environment:
      name: release
    permissions:
      id-token: write
    
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: uv build
      - run: uv publish --trusted-publishing always

The Magic of Trusted Publishing

The biggest game-changer is PyPI’s support for OpenID Connect tokens from GitHub Actions. No more API keys to manage or leak!

Setup Process

  1. Go to PyPI → Account settings → “Publishing” → “Add a new pending publisher” (for a project that already exists on PyPI, use its “Manage” → “Publishing” page instead)
  2. Fill in details:
    • PyPI project name: your-package-name
    • Owner: your-github-username
    • Repository: your-repo-name
    • Workflow: release.yml
    • Environment: release
  3. Create GitHub environment: Settings → Environments → New environment → release

The Security Model

PyPI only accepts packages from the exact combination of:

  • ✅ Specific GitHub repository
  • ✅ Specific workflow file
  • ✅ Specific environment name
  • ✅ Valid OpenID Connect token

No long-lived secrets, no manual token rotation, no security headaches.

Tags vs Commits: The Release Trigger

This was my biggest “aha” moment. The workflow design uses git tags to control releases:

# This triggers CI workflow (tests only)
git add .
git commit -m "Fix citation parser bug"
git push

# This triggers release workflow (tests + publish)
git tag v1.0.0
git push origin v1.0.0

Why This Works

  • Prevents accidents: You can’t accidentally publish by pushing code
  • Ensures testing: Every release runs the full test suite
  • Version control: Tags create clear release points
  • Rollback friendly: Easy to see what was released when

uv Makes Everything Fast

Using uv throughout the pipeline eliminates the traditional Python packaging pain:

Traditional pip approach

pip install -e ".[dev]"  # Slow dependency resolution
python -m pytest      # Hope the environment is right
python -m build       # Fingers crossed
twine upload dist/*   # Manual token management

Modern uv approach

uv sync --dev         # Lightning-fast dependency installation
uv run pytest         # Isolated, reproducible environment
uv build              # Fast, reliable builds
uv publish            # Secure, automatic publishing

The entire CI/CD pipeline runs in under 2 minutes across multiple Python versions.

Project Structure That Works

Your pyproject.toml needs the right configuration:

[project]
name = "your-package-name"
version = "1.0.0"
description = "Your package description"
authors = [{name = "Your Name"}]
license = {text = "MIT"} # or whatever you want
readme = "README.md"
requires-python = ">=3.9"
dependencies = [
    "requests>=2.25.0",
]

[dependency-groups]
dev = [
    "mypy>=1.16.1",
    "pytest>=8.4.0",
    "pytest-cov>=6.1.1",
    "ruff>=0.11.12",  # fast Python linting in Rust
    "types-requests>=2.25.0",  # For mypy type checking
]

[build-system]
requires = ["uv_build>=0.7.19,<0.8.0"]
build-backend = "uv_build"

The Complete Development Flow

Daily Development

# Make changes
git add .
git commit -m "Add new feature"
git push
# → CI runs: linting, type checking, tests across Python versions

Release Process

# Update version in pyproject.toml
git add pyproject.toml
git commit -m "Bump version to 1.0.0"
git push

# Create release
git tag v1.0.0
git push origin v1.0.0
# → Release runs: all tests + publish to PyPI

User Installation

pip install your-package-name

Testing with test.pypi.org

Before going live, test with PyPI’s staging environment:

# In release.yml, temporarily add:
- run: uv publish --trusted-publishing always --publish-url https://test.pypi.org/legacy/

Set up trusted publishing on test.pypi.org first, then users can test install:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ your-package-name

The LLM Plugin Installation Gotcha

If you’re building an LLM plugin specifically, there’s a dependency resolution issue with test.pypi.org that I discovered the hard way. The llm install command tries to resolve dependencies from test.pypi.org, but most packages (like requests and llm itself) don’t exist there.

First, make sure you have the LLM tool and the necessary plugins installed:

# Install the LLM tool if you haven't already
pip install llm

# Install the llm-python plugin to get the 'llm python' command
llm install llm-python

Then the workaround is to use LLM’s internal pip with both package indexes:

# This fails - can't find dependencies
llm install --index-url https://test.pypi.org/simple/ your-package-name

# This works - pulls your package from test.pypi.org and its dependencies from real PyPI
llm python -m pip install your-package-name --extra-index-url https://test.pypi.org/simple/

The llm python command runs pip in LLM’s isolated virtual environment, which is especially useful if you installed LLM via Homebrew or pipx. The --extra-index-url flag tells pip to check test.pypi.org for your package but use real PyPI for everything else. This mirrors what your users will experience when installing from real PyPI.

After installation, verify it worked:

llm tools list
# Should show your tool

llm -T your_tool_name "test command" --td
# Should work normally

Why This Matters

What I love about this setup is how it creates a reusable template for all future Python packages. The two-workflow pattern (CI on pushes, releases on tags) combined with trusted publishing gives you automated testing, security without API keys, and fast builds with uv. Once you understand this pattern, setting up the next package takes minutes instead of hours. Publishing Python packages in 2025 is dramatically different from what outdated tutorials describe—the modern approach prioritizes security, reliability, and developer experience. After going through this process once, you have a production-ready pipeline that just works.

Stop Building AI Agents 🔗


Hugo Bowne-Anderson¹ argues that agentic workflows shouldn’t be your first choice because of their increased complexity and instability. Remember that GenAI is like a superhuman intern prone to glitchiness and hallucination — now imagine managing a team of them.

…most agent systems break down from too much complexity, not too little. In my demo, I had three agents working together:

  • A researcher agent that could browse web pages
  • A summarizer agent with access to citation tools
  • A coordinator agent that managed task delegation

Pretty standard stuff, right? Except in practice:

  • The researcher ignored the web scraper 70% of the time
  • The summarizer completely forgot to use citations when processing long documents
  • The coordinator threw up its hands when tasks weren’t clearly defined

He recommends trying a simpler pattern first, like prompt chaining or orchestrator-workers.

  1. I learned Python, pandas, and data science from Hugo’s DataCamp classes.