What Happens After the AI MVP Ships
The MVP is live. Now what? Monitoring, retraining, user feedback loops, and the maintenance work that determines whether your AI project survives past month three.

The hardest part of an AI project is not building the first version. It is keeping it running well six months later.
I have shipped dozens of AI systems for small businesses. The ones that deliver long-term value all share the same post-launch discipline. The ones that get abandoned share the same gaps. Here is what happens after the MVP goes live, and what you need to get right.
The first two weeks: the honeymoon period
Everything looks great. The model is accurate because the test data is fresh. Users are excited because it is new. Stakeholders are happy because it works.
This is the most dangerous period. The temptation is to move on to the next project. Do not. The first two weeks are when you need to watch the system most closely.
What I monitor:
- Prediction confidence distribution. Are most predictions high-confidence (above 0.85), or is the model frequently uncertain? A shift toward lower confidence scores is an early warning.
- Error rates by category. The model might be 95% accurate overall but 60% accurate for a specific category that was underrepresented in training data.
- User override rates. If users are correcting the model's output more than 10-15% of the time, something is off. If they are never correcting it, they might not be checking.
- Latency and throughput. Real production traffic behaves differently from test data. Response times that were 200ms in staging might hit 800ms when 50 users are hitting the system simultaneously.
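The first two checks on this list can come straight out of your prediction logs. Here is a minimal sketch, assuming a hypothetical log of `(confidence, user_overrode)` pairs and using the thresholds from the checklist above (0.85 for high confidence, 15% for override rate):

```python
# Hypothetical prediction log entries: (model confidence, did the user override?)
log = [
    (0.97, False), (0.91, False), (0.88, False), (0.62, True),
    (0.95, False), (0.55, True), (0.93, False), (0.89, False),
]

confidences = [conf for conf, _ in log]

# Share of predictions the model made with high confidence (>= 0.85)
high_conf_share = sum(c >= 0.85 for c in confidences) / len(confidences)

# How often users corrected the model's output
override_rate = sum(overrode for _, overrode in log) / len(log)

print(f"high-confidence share: {high_conf_share:.0%}")
print(f"override rate: {override_rate:.0%}")

# Alert thresholds; tune these to your own baseline
if high_conf_share < 0.70:
    print("WARNING: confidence distribution is shifting low")
if override_rate > 0.15:
    print("WARNING: users are correcting the model too often")
```

In practice you would run this over a rolling window (daily or weekly) and wire the warnings into whatever alerting channel the business already watches.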
Month one to three: the drift begins
This is where most AI projects start to quietly degrade. The model was trained on historical data, but the real world moves on.
Why models drift
Three things cause accuracy to drop after launch:
Data distribution shifts. The data your users send in does not match the data you trained on. A support ticket classifier trained on last year's tickets will struggle when a new product launches and generates entirely new complaint patterns.
Seasonal patterns. A demand forecaster built on summer data will underperform in December. An email classifier trained during normal operations will fail during a marketing campaign that doubles inbound volume.
Upstream changes. Someone changes a form field, a third-party API returns data in a different format, or a new data source gets added. The model does not know about these changes. It just starts getting inputs it has never seen.
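One cheap defence against upstream changes is a validation guard at the model's input boundary, so shape changes get flagged instead of silently degrading predictions. A minimal sketch, assuming a hypothetical ticket-classifier payload with made-up field names:

```python
# Fields and types the model was trained on (illustrative, not a real schema)
EXPECTED_FIELDS = {"subject": str, "body": str, "channel": str}

def validate_input(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks safe."""
    problems = []
    for field, ftype in EXPECTED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], ftype):
            problems.append(f"wrong type for {field}: {type(payload[field]).__name__}")
    # Extra fields often mean someone changed a form or an upstream API
    for field in payload.keys() - EXPECTED_FIELDS.keys():
        problems.append(f"unexpected field: {field}")
    return problems

problems = validate_input({"subject": "refund", "body": "...", "priority": 1})
print(problems)  # flags the missing "channel" and the unexpected "priority"
```

Log these problems rather than hard-rejecting the request; a spike in the problem count is exactly the early warning you want.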
How to detect drift
I set up automated monitoring that tracks three metrics weekly:
- Accuracy on labelled samples. Reserve 5-10% of production data for manual labelling. Compare model predictions against human labels. If accuracy drops more than 5 percentage points from baseline, it is time to investigate.
- Input distribution statistics. Track the statistical properties of incoming data (mean values, category frequencies, text length distributions). Major shifts in these metrics signal that the data has changed.
- Prediction distribution. If the model suddenly starts predicting one category 40% more often than it did last month, either the real world changed or the model is drifting.
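For the distribution checks, one simple technique is the Population Stability Index (PSI), which scores how far this week's category frequencies have moved from the training baseline. A sketch with made-up ticket-classifier counts; the 0.2 threshold is a common rule of thumb, not a universal constant:

```python
import math

def psi(baseline: dict, current: dict, eps: float = 1e-6) -> float:
    """Population Stability Index between two category-frequency dicts.
    Roughly: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 significant shift."""
    categories = set(baseline) | set(current)
    total_b = sum(baseline.values())
    total_c = sum(current.values())
    score = 0.0
    for cat in categories:
        b = baseline.get(cat, 0) / total_b + eps  # eps avoids log(0)
        c = current.get(cat, 0) / total_c + eps
        score += (c - b) * math.log(c / b)
    return score

# Hypothetical weekly category counts for a support-ticket classifier
baseline = {"billing": 400, "shipping": 350, "returns": 250}
current  = {"billing": 300, "shipping": 200, "returns": 500}

drift = psi(baseline, current)
print(f"PSI: {drift:.3f}")
if drift > 0.2:
    print("WARNING: input distribution has shifted; investigate before accuracy drops")
```

The same calculation works on binned numeric features (text length, order value) as well as categories, so one function covers most of the weekly distribution checks.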
The feedback loop: the most important system you will build
The single highest-value post-launch investment is a structured feedback mechanism. Here is how it works:
- Users flag bad predictions. When the model gets something wrong, the user marks it. This needs to be trivially easy: a thumbs down button, a "wrong category" dropdown, or a simple "not helpful" flag.
- Flagged items get reviewed. A human (often the same user) provides the correct answer. This becomes a labelled training example.
- Labels accumulate. Over weeks, you build a dataset of real-world corrections. This data is gold because it represents exactly the cases where the model fails.
- Periodic retraining. Every 4-8 weeks (or when accuracy drops below threshold), retrain the model using the original training data plus the new correction data. Test the new model against the held-out sample before deploying.
- Deploy and monitor. Swap the model, monitor the metrics, and start the cycle again.
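The retrain-and-deploy steps above amount to a gated cycle: train a candidate on the combined data, evaluate it on the held-out sample, and only swap it in if it does not regress. A sketch of that control flow, where `train` and `evaluate` are placeholders for whatever stack you actually use (scikit-learn, an LLM fine-tune, etc.):

```python
def retrain_cycle(original_data, corrections, holdout, current_accuracy,
                  train, evaluate, max_regression=0.02):
    """Retrain on original + correction data; deploy the candidate only if
    held-out accuracy does not drop more than max_regression."""
    candidate = train(original_data + corrections)
    new_accuracy = evaluate(candidate, holdout)
    if new_accuracy >= current_accuracy - max_regression:
        return candidate, new_accuracy   # swap in, start monitoring again
    return None, current_accuracy        # keep the old model, investigate

# Toy stand-ins purely to show the control flow
train = lambda data: {"n_examples": len(data)}
evaluate = lambda model, holdout: 0.93 if model["n_examples"] > 100 else 0.88

model, accuracy = retrain_cycle(
    original_data=list(range(100)),   # original training set
    corrections=list(range(20)),      # user-flagged corrections since launch
    holdout=[],                       # held-out labelled sample
    current_accuracy=0.90,
    train=train, evaluate=evaluate,
)
print(model, accuracy)
```

The gate is the important part: it turns retraining from a leap of faith into a routine, reversible operation.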
The businesses that do this well see their models improve over time, not degrade. Each retraining cycle patches the weaknesses that users surfaced. After 6 months, the model is significantly better than the V1 that shipped. After a year, it is handling edge cases that would have been impossible to predict at launch.
The businesses that skip the feedback loop get a model that slowly becomes less useful, users who stop trusting it, and a system that gets turned off within 6-12 months.
What does maintenance actually cost?
Here is a realistic annual maintenance budget for a small-business AI system:
| Activity | Frequency | Annual cost |
|----------|-----------|-------------|
| Monitoring and alerting | Continuous | £500-£1,000 |
| Model retraining | Every 4-8 weeks | £1,000-£2,000 |
| Bug fixes and edge cases | As needed | £500-£1,500 |
| Infrastructure (hosting, APIs) | Monthly | £600-£2,400 |
| Quarterly review and improvements | 4x/year | £1,000-£2,000 |
| Total | | £3,600-£8,900 |
For context, this is typically 15-25% of the original build cost. It is a fraction of what the system saves, but it is not zero. Budget for it from day one. The clients who build AI systems and assume zero ongoing cost are the ones who end up with dead projects.
What "done" looks like
An AI system is never really "done" in the traditional software sense. But it reaches a steady state where the maintenance workload is predictable and the value is consistent.
Signs your AI system has reached maturity:
- Accuracy is stable. Retraining cycles produce small improvements (1-2%), not large swings. The model has seen enough real-world data to handle most cases.
- Users trust it. Override rates are low (under 5%). Users rely on the model's output rather than double-checking everything.
- Edge cases are documented. You know what the model is bad at, and those cases are routed to humans. No surprises.
- Maintenance is routine. Retraining is a scheduled task, not an emergency. Monitoring runs in the background. The system does not demand constant attention.
Most systems reach this state within 6-12 months of active maintenance. After that, the annual cost drops because retraining cycles become less frequent and edge cases are mostly handled.
The three mistakes that kill AI projects post-launch
1. No monitoring. The model ships, nobody watches the metrics, accuracy degrades, users lose trust, the system gets abandoned. This is the most common failure mode.
2. No feedback mechanism. Without a way for users to flag errors, you have no data to improve the model. And users have no way to communicate that the system is failing. They just stop using it.
3. No maintenance budget. The project gets funded as a one-off. When maintenance needs arise (and they always do), there is no budget and no plan. The system limps along until someone switches it off.
All three are avoidable with upfront planning. I always include monitoring, feedback, and a maintenance plan in the initial project scope. It adds 10-15% to the proposal but prevents the project from failing after launch.
Key Takeaways
- The MVP is the starting line, not the finish line. Plan for 6-12 months of active maintenance.
- Model drift is inevitable. Automated monitoring catches it early, before users lose trust.
- Build a feedback loop: users flag errors, you retrain, accuracy improves over time.
- Budget 15-25% of build cost annually for maintenance. This is not optional.
- A mature AI system reaches steady state in 6-12 months with consistent maintenance.
If you have an AI system that launched well but is starting to drift, or if you are planning a build and want to make sure the post-launch plan is solid, reach out. I help businesses with the full lifecycle, not just the initial build.