How I'd build a predictive maintenance model for property management

Property management companies sit on years of maintenance data that nobody uses for anything except filing. Thousands of tickets with dates, property IDs, issue categories, contractor notes, costs, and resolution times. The interesting question: can a model predict which properties are about to have problems before tenants complain?
I built a predictive maintenance model to work through this end-to-end. I wanted to see how messy the data challenges really are, what kind of accuracy is achievable, and whether the approach is practical for a small ops team. Here's what I found.
The data problem
This is the kind of data you'd typically find in a property management system. It looks clean at first glance: a Postgres database with structured fields, timestamps, categories, and status codes. But once you start exploring, the cracks show up fast.
Inconsistent categorisation. "Boiler repair", "Heating issue", "Boiler not working", "No hot water", and "Central heating broken" are all separate categories for essentially the same problem. When staff have been free-typing categories for years, you end up with hundreds of variants.
Missing context. Tickets record what was reported and what was done, but not the root cause. A damp complaint might be caused by a leaking roof, poor ventilation, or a burst pipe. The ticket just says "damp."
Survivorship bias. Properties where the landlord is proactive generate fewer tickets, not because the properties are better, but because problems get fixed before tenants report them. The model needs to account for this, or it learns "fewer tickets = good property" when the reality is more nuanced.
Data cleaning and feature engineering took the largest share of the project time. This isn't the glamorous part, but it's where ML projects succeed or fail.
Feature engineering
Raw ticket data isn't useful for prediction. A model can't learn patterns from "boiler not working on 14 January." It needs structured features that capture the underlying signal.
Here's what I built:
Rolling maintenance frequency. For each property, the number of tickets in the last 30, 60, and 90 days. Properties with accelerating ticket rates are the strongest signal.
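Rolling window counts like these are straightforward with pandas. A minimal sketch, assuming a hypothetical `tickets` table with `property_id` and `created_at` columns (the real schema will differ):

```python
import pandas as pd

# Hypothetical ticket log: one row per maintenance ticket
tickets = pd.DataFrame({
    "property_id": ["P1", "P1", "P1", "P2", "P1", "P2"],
    "created_at": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-10",
        "2024-01-15", "2024-03-01", "2024-03-20",
    ]),
})

def rolling_ticket_counts(df, as_of, windows=(30, 60, 90)):
    """Count tickets per property in trailing N-day windows ending at `as_of`."""
    as_of = pd.Timestamp(as_of)
    out = {}
    for days in windows:
        start = as_of - pd.Timedelta(days=days)
        mask = (df["created_at"] > start) & (df["created_at"] <= as_of)
        out[f"tickets_{days}d"] = df.loc[mask].groupby("property_id").size()
    # Properties with no tickets in a window get 0, not NaN
    return pd.DataFrame(out).fillna(0).astype(int)

features = rolling_ticket_counts(tickets, "2024-03-31")
```

Comparing the 30-day count against the 90-day count is what captures "accelerating ticket rates": a property with most of its 90-day tickets concentrated in the last 30 days is trending the wrong way.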
Category clustering. Grouping the 200+ messy categories into 15 clean ones using a combination of keyword matching and a small text classifier. "Boiler repair", "Heating issue", and "No hot water" all became "Heating."
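The keyword-matching half of that clustering can be as simple as a lookup table that the text classifier then handles the leftovers for. A sketch with a hypothetical (and deliberately tiny) keyword map:

```python
# Hypothetical keyword map; the real project pairs this with a small
# text classifier for categories no keyword catches
CATEGORY_KEYWORDS = {
    "Heating": ["boiler", "heating", "hot water", "radiator"],
    "Damp": ["damp", "mould", "mold", "condensation"],
    "Plumbing": ["leak", "pipe", "tap", "drain"],
}

def normalise_category(raw: str) -> str:
    """Map a free-typed category to a clean one via keyword matching."""
    text = raw.lower()
    for clean, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return clean
    return "Other"  # hand off to the classifier in the full pipeline
```

Naive substring matching will misfire occasionally ("tap" matches "tape"), which is exactly why the residual "Other" bucket goes to a classifier rather than growing more regexes.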
Seasonal baselines. Some issues are seasonal. Boiler complaints spike in October. Damp reports peak in January. The model needs to know that a boiler ticket in October is normal, but one in July is unusual.
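One way to encode that is a per-month baseline and a ratio feature: how does this month's count compare to what's historically normal for the month? A sketch with invented baseline numbers:

```python
# Hypothetical historical monthly counts of "Heating" tickets across the portfolio
heating_baseline = {1: 40, 2: 35, 3: 20, 4: 12, 5: 8, 6: 5,
                    7: 4, 8: 5, 9: 15, 10: 45, 11: 42, 12: 38}

def seasonal_ratio(baseline_by_month, month, observed):
    """Observed ticket count relative to the historical baseline for that month.
    Around 1.0 means normal for the season; well above 1.0 means unusual."""
    return observed / baseline_by_month[month]

# 45 heating tickets in October is seasonally normal:
october = seasonal_ratio(heating_baseline, 10, 45)  # 1.0
# 8 in July is twice the July baseline and worth a look:
july = seasonal_ratio(heating_baseline, 7, 8)       # 2.0
```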
Property age and type. Older properties and different construction types have different maintenance profiles. A Victorian terrace and a 2015 new-build have fundamentally different failure modes.
Contractor resolution patterns. If the same issue at the same property gets "resolved" three times in six months by the same contractor, the fix probably isn't sticking. I created a "repeat resolution" flag for this.
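The flag itself is a windowed count over (property, category, contractor) triples. A self-contained sketch, with hypothetical column names and a made-up resolved-ticket log:

```python
import pandas as pd

# Hypothetical resolved-ticket log
resolved = pd.DataFrame({
    "property_id": ["P1", "P1", "P1", "P2"],
    "category":    ["Heating", "Heating", "Heating", "Damp"],
    "contractor":  ["C9", "C9", "C9", "C3"],
    "resolved_at": pd.to_datetime(
        ["2024-01-10", "2024-03-02", "2024-05-20", "2024-02-01"]),
})

def repeat_resolution_flags(df, window_days=180, min_repeats=3):
    """Flag (property, category) pairs 'resolved' >= min_repeats times by the
    same contractor within window_days: a sign the fix isn't sticking."""
    flags = set()
    grouped = df.sort_values("resolved_at").groupby(
        ["property_id", "category", "contractor"])
    for (prop, cat, _contractor), g in grouped:
        times = g["resolved_at"].tolist()
        # Slide a window of min_repeats consecutive resolutions
        for i in range(len(times) - min_repeats + 1):
            if (times[i + min_repeats - 1] - times[i]).days <= window_days:
                flags.add((prop, cat))
                break
    return flags

flags = repeat_resolution_flags(resolved)
```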
The model
With clean features, the modelling part was relatively straightforward. I tried three approaches:
- Gradient boosted trees (XGBoost): best overall performance
- Logistic regression: good baseline, but too simple for the interaction effects
- Random forest: close to XGBoost but slightly less accurate
XGBoost won. The target variable was binary: "will this property generate a maintenance ticket in the next 14 days?" I trained on the first four years of data and validated on the most recent year, a time-based split that keeps future information out of training.
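The training setup compresses to a few lines. This sketch uses synthetic data and scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost's `XGBClassifier` (so it runs without extra dependencies); the feature names, sizes, and label rule are all invented for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)

# Synthetic stand-in for the feature table
n = 5000
X = np.column_stack([
    rng.poisson(2, n),           # tickets_30d
    rng.integers(0, 365, n),     # days_since_last_same_category
    rng.integers(0, 2, n),       # repeat_resolution_flag
    rng.integers(5, 150, n),     # property_age_years
    rng.normal(1.0, 0.4, n),     # seasonal_ratio
])
# Synthetic label, loosely driven by the rolling count (as in the real data)
y = (X[:, 0] + rng.normal(0, 1, n) > 3).astype(int)

# Time-ordered split: earlier rows to train, most recent block to validate
cut = int(n * 0.8)
X_train, X_val = X[:cut], X[cut:]
y_train, y_val = y[:cut], y[cut:]

# Stand-in for xgboost.XGBClassifier; hyperparameters are illustrative
model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)
pred = model.predict(X_val)
precision = precision_score(y_val, pred)
recall = recall_score(y_val, pred)
```

With real data the split should be on calendar date, not row order; the synthetic rows above are simply treated as already time-ordered.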
The top predictive features, in order:
- Rolling 30-day ticket count (most important by a wide margin)
- Days since last ticket in the same category
- Repeat resolution flag
- Property age
- Seasonal deviation from baseline
On my test set, the model achieved 78% precision and 72% recall. In plain terms: when it flags a property, it's right about 4 out of 5 times. And it catches about 7 out of 10 actual issues before they'd be reported.
Not perfect. But far better than "wait for the phone to ring."
How this would run in production
The model would run once daily as a scheduled Python job: pull the latest ticket data, generate features, run predictions, and output a prioritised list of properties likely to need attention in the next two weeks.
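The "prioritised list" step is just thresholding and sorting the model's risk scores. A minimal sketch, with a hypothetical `prioritise` helper and made-up scores:

```python
def prioritise(scores, threshold=0.5, top_n=20):
    """Turn per-property risk scores into a morning worklist:
    keep properties at or above the threshold, highest risk first."""
    flagged = [(pid, s) for pid, s in scores.items() if s >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged[:top_n]

# Hypothetical scores straight out of model.predict_proba
worklist = prioritise({"P1": 0.82, "P2": 0.31, "P3": 0.67})
```

Capping the list at `top_n` matters in practice: a small ops team can act on twenty flags a day, not two hundred.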
That list goes into a simple dashboard (built with Streamlit for speed, not beauty) that an operations team would check each morning. Each flagged property shows:
- The predicted issue category
- The confidence score
- Recent maintenance history
- Landlord contact details
The team then does one of three things: schedule a proactive inspection, contact the landlord about preventive maintenance, or add it to their watch list for the next routine visit.
Note
The model doesn't decide what to do. It surfaces risk. The operations team makes the call. This is a deliberate design choice that makes buy-in much easier.
What this kind of system can deliver
Based on the model's accuracy and published case studies of similar proactive maintenance approaches:
- Proactive interventions: catching 50-60% of issues before tenants report them is realistic with this accuracy level
- Repair costs: early intervention prevents cascading failures, which typically reduces average repair cost by 30-40%
- Tenant satisfaction: fewer emergency complaints means fewer escalations
- Landlord retention: proactive maintenance is consistently cited as a top factor in landlords staying with an agency
The cultural shift matters too. Operations teams go from reactive firefighting to proactive property care. That changes the job in a meaningful way.
Lessons for anyone building predictive models on business data
Your data is dirtier than you think. Budget at least 40% of the project time for data cleaning and feature engineering. I've never seen business data that was ready to model on day one.
Simple models with good features beat complex models with raw data. I could have thrown a deep learning model at raw ticket text. The XGBoost model with hand-crafted features was faster to build, easier to explain, and performed better.
Explain the model or nobody will use it. An operations team won't trust a system until they can see why it flagged something. "This property is flagged because it's had 3 heating tickets in 40 days and the last repair was done by the same contractor" is actionable. A black-box score isn't.
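A lightweight way to get those explanations is a template that renders the feature values behind a flag into a sentence, rather than reaching for a full attribution library. A sketch, with hypothetical feature names and thresholds:

```python
def explain_flag(features: dict) -> str:
    """Render a plain-English reason string from a property's feature row.
    Feature names and thresholds here are illustrative."""
    reasons = []
    if features.get("tickets_30d", 0) >= 3:
        reasons.append(f"{features['tickets_30d']} tickets in the last 30 days")
    if features.get("repeat_resolution_flag"):
        reasons.append("the same issue was re-resolved by the same contractor")
    if features.get("seasonal_ratio", 1.0) >= 2.0:
        reasons.append("ticket volume is well above the seasonal baseline")
    if not reasons:
        return "No single strong driver; review history"
    return "Flagged because " + " and ".join(reasons)
```

A feature-attribution method like SHAP can pick which features to mention, but the sentence the ops team reads should still come from templates they helped write.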
Key Takeaways
- Data cleaning and feature engineering are the real work. The model is the easy part.
- Gradient boosted trees (XGBoost) are still hard to beat for structured business data.
- Build for human decision-making, not automation. Surface risk, let people decide.
- Explain every prediction. Adoption depends on trust, and trust depends on transparency.
- Proactive beats reactive: catching issues early reduces costs and improves satisfaction.
Could this work for your data?
If your business sits on years of structured operational data, some version of this approach probably applies. It doesn't have to be property maintenance. The same pattern works for equipment servicing, fleet management, or customer churn prediction.
The common thread: you have historical records of problems, and you want to predict the next one before it happens.
If you've got the data and you're curious whether a predictive model makes sense, let's talk. I'll take an honest look at what you've got and tell you whether it's worth building.