How I'd automate invoice reconciliation for a small business

Invoice reconciliation is one of those problems that sounds simple but eats entire days. An accounts team receives delivery reports from couriers and invoices from suppliers, then has to match them line by line. Copy-paste from PDFs into spreadsheets, cross-reference numbers, flag mismatches. It's the kind of work that's tedious, error-prone, and ripe for automation.
I built a pipeline to show how this problem can be solved. Not a toy demo, but a proper end-to-end system that handles the real-world messiness: inconsistent PDF formats, multiple data sources, and the inevitable edge cases that need human judgement. Here's how it works and what I learned building it.
The problem in detail
In a typical logistics or wholesale business, the reconciliation process looks like this:
- Delivery reports arrive as CSVs from multiple courier APIs (different formats, different column names)
- Invoices arrive as PDF attachments from suppliers (no standard format across suppliers)
- Someone spends hours manually matching these against each other
- Errors creep in because the work is repetitive and mind-numbing
- Disputes with suppliers are slow because digging up the original data takes ages
The key insight: this isn't really an AI problem. It's a data pipeline problem with a small AI layer on top.
What I built
The system has four stages, and only one of them uses anything you'd call "AI."
Stage 1: Ingest. A Python script pulls CSVs from courier APIs on a daily cron schedule. Normalises the column names, timestamps, and currency formats into a single schema, then loads them into a Postgres table.
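A minimal sketch of that normalisation step: mapping each courier's CSV columns onto one canonical schema. The courier IDs, column names, and date formats here are illustrative, not the real feeds.

```python
import csv
import io
from datetime import datetime

# Per-courier column mappings (hypothetical: real feeds will differ)
COLUMN_MAPS = {
    "courier_a": {"ref": "ConsignmentNo", "delivered_at": "DeliveryDate", "amount": "Charge"},
    "courier_b": {"ref": "tracking_id", "delivered_at": "pod_timestamp", "amount": "total_gbp"},
}

DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y %H:%M", "%d-%b-%Y"]

def parse_date(raw: str) -> str:
    """Try each known date format; store ISO 8601 in the canonical schema."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

def normalise(courier: str, csv_text: str) -> list[dict]:
    """Return rows in the single schema: courier, ref, delivered_at, amount_pence."""
    colmap = COLUMN_MAPS[courier]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "courier": courier,
            "ref": raw[colmap["ref"]].strip().upper(),
            "delivered_at": parse_date(raw[colmap["delivered_at"]]),
            # store money as integer pence to avoid float drift
            "amount_pence": round(float(raw[colmap["amount"]]) * 100),
        })
    return rows
```

Storing amounts as integer pence is a deliberate choice: once everything downstream compares integers, a whole class of floating-point rounding disputes disappears.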
Stage 2: Extract. PDF invoices get dropped into a watched folder (or forwarded to a dedicated email address). I used a combination of pdfplumber for well-structured PDFs and a small vision model for the messy ones. Each invoice gets parsed into structured line items.
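To give a flavour of the structured-parsing half, here's a sketch of turning a table pulled out of a well-behaved PDF into line items. It assumes an upstream extractor (such as pdfplumber's `page.extract_table()`) has already produced rows of strings; the header names are illustrative.

```python
import re

def parse_line_items(table: list[list[str]]) -> list[dict]:
    """Shape a header-plus-rows table into structured invoice line items."""
    header = [h.strip().lower() for h in table[0]]
    idx = {name: header.index(name) for name in ("description", "qty", "amount")}
    items = []
    for row in table[1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank separator rows
        # strip currency symbols and thousands separators before converting
        amount = re.sub(r"[£$,\s]", "", row[idx["amount"]])
        items.append({
            "description": row[idx["description"]].strip(),
            "qty": int(row[idx["qty"]]),
            "amount_pence": round(float(amount) * 100),
        })
    return items
```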
Stage 3: Match. This is the core logic and it's mostly deterministic. Fuzzy matching on reference numbers, date-range overlap checks, and amount tolerance thresholds. For the 5-10% of invoices that don't match cleanly, a lightweight classifier flags them for human review with a confidence score and a reason.
Stage 4: Report. A daily summary email with matched items, flagged items, and anything that failed to parse. A human reviews the flagged ones, approves or rejects, and the system learns from corrections over time.
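The summary itself is simple aggregation. A sketch, with statuses and field names assumed rather than taken from the real system:

```python
from collections import Counter

def build_summary(results: list[dict]) -> str:
    """Render the daily plain-text email body from the day's results."""
    counts = Counter(r["status"] for r in results)
    lines = [
        f"Matched: {counts.get('matched', 0)}",
        f"Flagged for review: {counts.get('flagged', 0)}",
        f"Failed to parse: {counts.get('failed', 0)}",
        "",
    ]
    # list only the items that need a human's attention
    for r in results:
        if r["status"] != "matched":
            lines.append(f"- {r['ref']}: {r['status']} ({r.get('reason', 'no reason')})")
    return "\n".join(lines)
```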
What surprised me
Most of the value comes from stage 1, not stage 3. Just normalising data from multiple sources into one table eliminates the bulk of manual work. In a real business, people often don't realise how much time they spend simply finding and formatting data before they can even start matching.
PDF parsing is still painful in 2026. Every supplier formats invoices differently. Some use proper table structures, others are basically images with text on top. I ended up maintaining a small library of parser profiles, one per supplier type. Not elegant, but reliable.
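The "library of parser profiles" amounts to a small registry keyed by something recognisable in each supplier's layout. A sketch of one way to do it, with supplier names and markers invented for illustration:

```python
from typing import Callable

# marker string found on the first page -> parser for that supplier's layout
PROFILES: dict[str, Callable[[str], list[dict]]] = {}

def profile(marker: str):
    """Decorator: register a parser for PDFs whose first page contains `marker`."""
    def register(fn):
        PROFILES[marker] = fn
        return fn
    return register

@profile("Acme Logistics Ltd")
def parse_acme(text: str) -> list[dict]:
    ...  # supplier-specific table parsing goes here

def pick_parser(first_page_text: str):
    for marker, fn in PROFILES.items():
        if marker in first_page_text:
            return fn
    return None  # unknown supplier -> route to the vision-model fallback
```

Returning `None` for unrecognised layouts is the important bit: anything that doesn't match a known profile falls through to the slower, more forgiving path instead of failing silently.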
The AI classifier was the easy part. Once you have clean, structured data on both sides, training a model to flag mismatches is straightforward. The hard engineering was everything around it: the ingestion, the normalisation, the error handling, the retry logic for failed API calls.
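That retry logic is unglamorous but load-bearing. A minimal sketch of the pattern, exponential backoff with a capped number of attempts, where the delay schedule is illustrative:

```python
import time

def with_retries(fn, attempts: int = 4, base_delay: float = 1.0):
    """Call fn(); on failure sleep base_delay * 2**n and retry, then re-raise."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the real error
            time.sleep(base_delay * (2 ** n))
```

In practice you'd narrow the `except` to the transient errors your HTTP client actually raises, so a genuine bug isn't retried into oblivion.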
Key takeaways
- Start with the data pipeline, not the model. Clean data solves more problems than clever AI.
- Automate the 80% that's deterministic. Use AI only for the ambiguous remainder.
- Build for human review, not full autonomy. People trust systems they can override.
- The ROI on these projects is in time saved, not in replacing people.
Where the numbers would land
Based on the pipeline's throughput and accuracy in testing, here's what a business processing a moderate volume of invoices could realistically expect:
- Reconciliation time: a 20+ hour weekly task could drop to roughly 2-3 hours of review
- Error rate: manual reconciliation typically runs 3-4% errors; automated matching with human review can push that below 0.5%
- Dispute resolution: from days of digging through files to same-day, because all the data is searchable
- Payback period: for a business spending 20+ hours a week on this, the build cost pays for itself within a couple of months
Would this work for your business?
If your team spends significant time on repetitive data work (matching, checking, copying between systems, running the same reports), then some version of this approach probably applies.
The specifics will be different. Maybe it's purchase orders instead of invoices. Maybe it's Xero instead of Postgres. But the pattern is the same: normalise the data, automate the obvious matches, and flag the rest for a human.
Tip
The best automation projects aren't the ones that replace people. They're the ones that give people back the time to do work that actually matters.
If this sounds like the kind of problem you're dealing with, get in touch. I'm happy to have a quick chat about whether automation is the right fit.