Invoice automation case study: property management
A real case study on cutting 20 hours of weekly invoice processing to 2 hours using Python, pdfplumber, and an LLM fallback pipeline. Full architecture included.

A property management company in the Midlands was spending roughly 20 hours a week on invoice processing. Two people, manually keying data from PDF invoices into their accounting system, cross-referencing against purchase orders, and flagging discrepancies for review.
The invoices came from over 40 different contractors: plumbers, electricians, cleaning companies, maintenance firms. Every supplier used a different format. Some sent structured PDFs, some sent scanned images, and a few still sent paper copies that got scanned at the front desk.
This is the kind of problem where automation makes obvious sense. The work is repetitive, rule-based (mostly), and high-volume enough that even a modest speed improvement saves real money.
What I built
A three-stage extraction and reconciliation pipeline:
Stage 1: PDF processing. Each invoice gets processed through pdfplumber for text extraction. A layout analyser identifies common patterns: vendor name, invoice number, date, line items, subtotal, tax, total. This handles about 60% of invoices with no further processing needed.
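To make stage 1 concrete, here is a minimal sketch of the idea: pull raw text with pdfplumber, then match field patterns against it. The regexes and field names below are invented for illustration; the real layout analyser covers more fields, including line items.

```python
import re

# Hypothetical regexes for fields on a clean, digitally generated invoice.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)?\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "invoice_date": re.compile(r"Date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.I),
    "total": re.compile(r"Total\s*[:\-]?\s*£?\s*([\d,]+\.\d{2})", re.I),
}

def extract_from_text(text):
    """Match the known field patterns against raw invoice text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields

def extract_fields(pdf_path):
    """Pull text with pdfplumber, then run the pattern matcher over it."""
    import pdfplumber  # third-party; imported here so the matcher is testable alone
    with pdfplumber.open(pdf_path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    return extract_from_text(text)
```

Splitting the text extraction from the pattern matching also makes the matcher easy to unit-test against plain strings, without needing sample PDFs.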
Stage 2: Pattern matching. For invoices that don't match clean layout patterns, a library of 40+ regex templates covers the most common supplier formats. These templates were built from historical invoice samples over the first two weeks. This catches another 28%.
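The template library can be sketched as an ordered list of supplier signatures plus field regexes, tried in turn. The supplier names and patterns below are made up; the production library has 40+ entries built from real samples.

```python
import re

# Two hypothetical supplier templates; the real library has 40+.
SUPPLIER_TEMPLATES = [
    {
        "supplier": "AcmePlumbing",
        "signature": re.compile(r"Acme\s+Plumbing", re.I),
        "total": re.compile(r"Amount\s+Due\s*:\s*£([\d,]+\.\d{2})"),
    },
    {
        "supplier": "BrightSpark",
        "signature": re.compile(r"BrightSpark\s+Electrical", re.I),
        "total": re.compile(r"TOTAL\s+GBP\s+([\d,]+\.\d{2})"),
    },
]

def match_template(text):
    """Try each supplier template; extract fields on the first signature hit."""
    for tpl in SUPPLIER_TEMPLATES:
        if tpl["signature"].search(text):
            m = tpl["total"].search(text)
            return {
                "supplier": tpl["supplier"],
                "total": m.group(1) if m else None,
            }
    return None  # no template matched: fall through to the LLM stage
```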
Stage 3: LLM fallback. The remaining 12% (usually scanned documents or unusual formats) get processed by an LLM with a constrained output schema. The model extracts the same fields, but the output goes through strict validation before being accepted.
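The strict validation is what makes the LLM stage safe, so here is a hedged sketch of that step with the model call itself omitted. The field names and formats are assumptions for illustration, not the production schema.

```python
import json
import re
from datetime import datetime

# Hypothetical schema: the real one also covers line items and tax fields.
REQUIRED_FIELDS = {"vendor", "invoice_number", "date", "total"}
MONEY = re.compile(r"^\d+(\.\d{2})?$")

def validate_llm_output(raw):
    """Strictly validate the model's JSON before it enters the pipeline.

    Returns None (i.e. route to the human review queue) on any violation.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_FIELDS.issubset(data):
        return None
    if not MONEY.match(str(data["total"])):
        return None  # rejects hallucinated or malformed amounts
    try:
        datetime.strptime(data["date"], "%Y-%m-%d")
    except ValueError:
        return None
    return data
```

The point of returning None instead of raising is that a validation failure is an expected outcome, not an error: the invoice simply joins the review queue.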
After extraction, every invoice is automatically matched against the purchase order database. The matching logic handles fuzzy date matching (invoices often come a few days after the PO date), partial reference matching, and configurable tolerance thresholds for amounts.
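A simplified version of that matching logic, with made-up tolerance values (the production thresholds are configurable per client):

```python
from datetime import date, timedelta

# Hypothetical tolerances; the real thresholds are configurable.
DATE_WINDOW = timedelta(days=14)   # invoices often lag the PO by days
AMOUNT_TOLERANCE = 0.02            # accept up to 2% variance on amounts

def match_po(invoice, purchase_orders):
    """Find a PO whose date, amount, and reference fall within tolerance."""
    for po in purchase_orders:
        date_ok = abs(invoice["date"] - po["date"]) <= DATE_WINDOW
        amount_ok = abs(invoice["total"] - po["amount"]) <= po["amount"] * AMOUNT_TOLERANCE
        ref_ok = po["reference"] in invoice.get("reference", "")  # partial match
        if date_ok and amount_ok and ref_ok:
            return po
    return None  # no match: flag for human review
```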
The numbers
| Metric | Before | After |
|---|---|---|
| Hours/week on invoice processing | 20 | 2 (review only) |
| Average processing time per invoice | 8 minutes | 12 seconds |
| Error rate (wrong data keyed) | ~4% | 0.3% |
| Invoices requiring human review | 100% | 14% |
The 2 hours that remain are spent reviewing the 14% of invoices that the system flags as low-confidence. These are usually the scanned copies, invoices with handwritten amendments, or new suppliers whose format hasn't been seen before.
How it was built
Weeks 1-2: Data collection and analysis. I collected 200 sample invoices from the previous 6 months, categorised them by format, and identified the extraction patterns. This phase revealed that the scanned paper invoices were the biggest challenge, not the digital PDFs.
Weeks 3-4: Core pipeline development. Built the three-stage extraction engine, the purchase order matching logic, and a review queue interface. Used Python, pdfplumber, pandas, and PostgreSQL. The LLM integration uses Gemini with a structured output schema.
Week 5: Integration and testing. Connected the pipeline to their file drop (new invoices arrive in a shared folder) and accounting system API. Ran 500 historical invoices through the system and compared results against the manually entered data.
Week 6: Go-live and tuning. Soft launch with all invoices still manually verified. After one week of parallel running with zero critical errors, the team switched to review-only mode.
Note
The total build time was 6 weeks. Not because the code took 6 weeks to write, but because the first two weeks of data analysis and pattern building were essential. Skipping that phase would have meant building on assumptions instead of data.
What went wrong
Two things didn't work as planned:
Scanned invoices were worse than expected. The initial plan assumed OCR would produce usable text from scans. In practice, about 30% of scanned invoices had poor enough quality that OCR output was unreliable. The solution was adding a confidence threshold: if the OCR confidence for any key field drops below 0.75, the invoice goes straight to the human review queue rather than attempting extraction.
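That routing rule is only a few lines. The 0.75 threshold is the real figure; the field names and the shape of the OCR result below are assumptions for illustration:

```python
REVIEW_THRESHOLD = 0.75
KEY_FIELDS = ("invoice_number", "date", "total")

def route_invoice(ocr_result):
    """Route to extraction, or straight to human review on shaky OCR.

    `ocr_result` is assumed to carry a per-field confidence dict; any key
    field below the threshold sends the whole invoice to review rather
    than risking extraction from unreliable text.
    """
    confidences = ocr_result.get("confidence", {})
    for field in KEY_FIELDS:
        if confidences.get(field, 0.0) < REVIEW_THRESHOLD:
            return "review"
    return "extract"
```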
One supplier's invoices broke everything. A large electrical contractor used a custom invoicing system that generated PDFs with overlapping text layers. pdfplumber extracted garbled text, regex couldn't match, and the LLM hallucinated amounts. The fix was a supplier-specific handler that used a different text extraction method (poppler instead of pdfplumber) for that one supplier. Edge cases like this are why I budget an extra 20% of project time for post-launch tuning.
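A supplier-specific handler is ultimately just a dispatch table in front of the extraction stage. The supplier ID and extractor bodies below are stand-ins, not the real extraction code:

```python
# Stand-in extractors; the real ones call pdfplumber and poppler respectively.
def default_extractor(pdf_path):
    return f"pdfplumber:{pdf_path}"

def poppler_extractor(pdf_path):
    return f"poppler:{pdf_path}"

# Hypothetical registry mapping known-awkward suppliers to overrides.
SUPPLIER_OVERRIDES = {"bigspark-electrical": poppler_extractor}

def extract_text(pdf_path, supplier_id):
    """Use the supplier-specific handler when one exists, else the default."""
    handler = SUPPLIER_OVERRIDES.get(supplier_id, default_extractor)
    return handler(pdf_path)
```

The advantage of a registry over if/else branches is that adding the next awkward supplier is a one-line change rather than a modification to the pipeline logic.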
What it cost
The total project cost was £8,500, broken down roughly as:
- Data analysis and pattern building: £1,500
- Core pipeline development: £4,000
- Integration, testing, and go-live: £2,000
- Post-launch tuning (first month): £1,000
Monthly running costs are approximately £60: £40 for LLM API calls (processing ~300 invoices/month), £15 for VPS hosting, and £5 for monitoring.
At 20 hours/week saved (across two part-time roles), the system paid for itself within 8 weeks of going live.
Lessons for similar projects
Start with the data, not the model. Two weeks of analysing invoice formats sounds slow. But it meant the extraction pipeline handled 88% of invoices on day one, rather than shipping a system that needed constant manual intervention.
Build for human review, not full autonomy. The 14% review rate is a feature, not a bug. A system that tries to handle 100% of cases autonomously will make expensive mistakes. A system that flags uncertain cases for humans is trustworthy enough that people actually use it.
Measure the boring stuff. The 0.3% error rate matters more than the 12-second processing time. For a finance team, accuracy is everything. Speed is nice. Accuracy is non-negotiable.
Budget for edge cases. The electrical contractor's weird PDFs consumed 3 days of debugging. Every project has at least one supplier, format, or dataset that breaks assumptions. Plan for it.
Key Takeaways
- 20 hours/week of manual invoice processing reduced to 2 hours of review.
- Three-stage pipeline (layout analysis, regex patterns, LLM fallback) handles 88% of invoices without human input.
- Total build cost: £8,500. Monthly running cost: £60. Paid for itself in 8 weeks.
- Data analysis in weeks 1-2 was the most valuable phase of the project.
- 14% human review rate is intentional: accuracy matters more than full automation.
Dealing with a similar problem?
Invoice processing is one of the most common automation projects I build. If your team is spending hours on manual data entry from PDFs, there's almost certainly a way to cut that down. Get in touch and I'll take a look at your invoices and give you a realistic assessment.
Related reading:
- How much does AI automation cost for a small business?
- PDF parsing in 2026: what actually works
- My AI automation service -- building extraction and reconciliation pipelines