Customer Support · NLP · AI · Technical

How I'd build an AI triage system for e-commerce support

5 min read
An AI triage layer classifies and auto-resolves e-commerce support tickets, reducing first-response times and improving CSAT.

E-commerce support teams run into the same bottleneck everywhere. Three or four agents handling hundreds of tickets a week, all hitting a single queue. Half the tickets are the same five questions: "Where's my order?", "How do I return this?", "Is this product suitable for [specific thing]?", "My discount code doesn't work", and "Can I change my delivery address?"

The agents triage manually, reading each ticket, deciding who should handle it, then responding. First-response times stretch to hours. Customer satisfaction drops. Returns climb because people can't get answers fast enough.

This is a well-understood problem with a clear technical solution. I built a working triage system to demonstrate the architecture and work through the real design decisions involved. Here's what that looks like in practice.

The architecture

The system has three layers: classification, auto-resolution, and smart routing. I kept it deliberately simple because production support systems need to be reliable, not clever.

Layer 1: Intent classification. Every incoming ticket gets classified into one of 12 intent categories. You'd fine-tune a small model on historical tickets (a few thousand labelled examples is enough). The classifier should run in under 200ms per ticket and hit around 93-95% accuracy on well-labelled data.

The 12 categories cover everything from order tracking and returns to product questions and billing issues. Anything the classifier isn't confident about (below a 0.85 threshold) gets flagged as "needs human" and goes straight to the agent queue.
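The confidence gate is the simplest but most important part of this layer. Here's a minimal sketch of it: `classify` is a stand-in for the fine-tuned model (the intent names and example outputs are illustrative, not from a real system), but the gating logic around it is exactly the pattern described above.

```python
# Confidence-gated triage: anything below the threshold goes to a human.
CONFIDENCE_THRESHOLD = 0.85

def classify(ticket_text: str) -> tuple[str, float]:
    """Stand-in for the fine-tuned classifier: returns (intent, confidence)."""
    # A real implementation would call the model here.
    if "where is my order" in ticket_text.lower():
        return "order_tracking", 0.97
    return "other", 0.40

def triage(ticket_text: str) -> str:
    intent, confidence = classify(ticket_text)
    if confidence < CONFIDENCE_THRESHOLD:
        return "needs_human"  # below threshold: straight to the agent queue
    return intent

print(triage("Where is my order #1234?"))        # order_tracking
print(triage("The thing broke and I'm annoyed"))  # needs_human
```

Note that "needs human" is a routing outcome, not a 13th category: the classifier still produces one of the 12 intents, and the gate simply refuses to act on a low-confidence prediction.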

Layer 2: Auto-resolution. For four of those 12 categories, the system can respond without a human. Order tracking pulls the latest status from the Shopify API and sends a formatted response. Return requests generate a pre-filled return label and instructions. Discount code issues get checked against the promo database. Delivery address changes get processed if the order hasn't shipped yet.

These auto-responses use templates with dynamic data, not free-form LLM generation. That's an important design choice. You don't want a language model improvising your returns policy. The templates should be written by the support lead with the data lookups wired in programmatically.
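A sketch of what that looks like for order tracking, using Python's standard `string.Template`. The template text and the `fetch_order` lookup are hypothetical (a real version would call the Shopify API); the point is the shape: human-written copy with data slots, nothing generated.

```python
from string import Template

# Written by the support lead; only the $-slots are dynamic.
ORDER_TRACKING_TEMPLATE = Template(
    "Hi $first_name, your order $order_id is currently $status. "
    "Latest update: $last_event. Track it here: $tracking_url"
)

def fetch_order(order_id: str) -> dict:
    """Stand-in for a Shopify API lookup (illustrative data)."""
    return {
        "first_name": "Sam",
        "order_id": order_id,
        "status": "in transit",
        "last_event": "departed sorting facility",
        "tracking_url": "https://example.com/track/" + order_id,
    }

def auto_respond_order_tracking(order_id: str) -> str:
    # Template.substitute raises KeyError if a slot is missing,
    # which is exactly what you want: fail loudly, never send a broken reply.
    return ORDER_TRACKING_TEMPLATE.substitute(fetch_order(order_id))

print(auto_respond_order_tracking("1042"))
```

Using `substitute` rather than `safe_substitute` is deliberate: a missing data field should block the auto-response and escalate, not ship a reply with a blank in it.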

Layer 3: Smart routing. The remaining tickets get scored by urgency (based on sentiment analysis and order value) and routed to the right agent based on their specialisation. One agent handles product knowledge questions, another focuses on billing and refunds, a third deals with shipping and logistics.

Without this system, a billing question might sit in the general queue for hours before the billing-specialist agent even sees it.
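The routing itself can be as plain as a lookup table; the agent names and intent keys here are hypothetical, and a real system would also factor in agent load and availability.

```python
# Routing table: intent category -> agent specialisation.
ROUTING = {
    "product_question": "agent_product",
    "billing": "agent_billing",
    "refund": "agent_billing",
    "shipping": "agent_logistics",
}

def route(intent: str) -> str:
    # Anything without a specialist falls back to the general queue.
    return ROUTING.get(intent, "general_queue")

print(route("billing"))   # agent_billing
print(route("feedback"))  # general_queue
```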

The hard parts

Getting training data right takes longer than building the model. Existing ticket labels in most helpdesks are inconsistent. "Return" and "Refund" get used interchangeably. "Complaint" covers everything from a broken product to a late delivery. You'll spend time defining clear categories and relabelling edge cases before training is even possible.
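Most of that cleanup can be captured in an explicit mapping from the helpdesk's historical labels to the canonical categories, with genuinely ambiguous labels flagged for manual relabelling rather than guessed. The specific labels and decisions below are illustrative:

```python
# One-off label cleanup before training: collapse inconsistent
# historical labels into the canonical category set.
LABEL_MAP = {
    "Return": "returns",
    "Refund": "returns",  # a decision you make once, not per ticket
    "Complaint": None,    # too vague -- queue these for manual relabelling
}

def normalise(raw_label: str):
    """Map a historical label to a canonical one; None means 'relabel by hand'."""
    return LABEL_MAP.get(raw_label, raw_label.lower())

print(normalise("Refund"))     # returns
print(normalise("Shipping"))   # shipping
print(normalise("Complaint"))  # None
```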

Sentiment analysis is harder than intent classification. Sarcasm, passive aggression, and cultural tone differences make urgency scoring difficult. In my testing, weighting order value more heavily than raw sentiment turned out to be a better proxy for "this customer needs help fast."
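That weighting can be expressed as a simple blended score. The weights and the £500 saturation point below are illustrative, not tuned values; the structure is what matters: order value dominates, and only negative sentiment contributes.

```python
def urgency_score(sentiment: float, order_value: float) -> float:
    """Blend sentiment (-1 = very negative .. +1 = very positive) with order value.

    Order value is weighted more heavily than raw sentiment; weights are
    illustrative, not tuned.
    """
    value_component = min(order_value / 500.0, 1.0)  # saturate at 500
    sentiment_component = max(-sentiment, 0.0)       # only negativity adds urgency
    return 0.7 * value_component + 0.3 * sentiment_component

# A high-value order with mildly negative sentiment outranks a cheap
# order with very negative sentiment.
print(urgency_score(-0.3, 400))  # 0.65
print(urgency_score(-0.9, 20))   # ~0.30
```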

Auto-responses need careful guardrails. The tricky case: a customer asks about a return but also mentions a damaged product. The return flow kicks in, but the damage claim needs a different process. You need a "compound intent" detector that flags tickets containing multiple intents and routes those to a human regardless.
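A minimal sketch of that guardrail, assuming the classifier can return a score per category rather than a single label (`classify_all` is a stub standing in for that):

```python
# Compound-intent guardrail: if more than one intent scores strongly,
# skip auto-resolution entirely and route to a human.
COMPOUND_THRESHOLD = 0.5

def classify_all(ticket_text: str) -> dict[str, float]:
    """Stand-in for per-intent scores from the classifier."""
    text = ticket_text.lower()
    return {
        "returns": 0.9 if "return" in text else 0.0,
        "damage_claim": 0.8 if "damaged" in text or "broken" in text else 0.0,
    }

def is_compound(ticket_text: str) -> bool:
    scores = classify_all(ticket_text)
    strong = [i for i, s in scores.items() if s >= COMPOUND_THRESHOLD]
    return len(strong) > 1

print(is_compound("I want to return this, it arrived damaged"))  # True
print(is_compound("How do I return this?"))                      # False
```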

Tip

If you're building auto-responses for support, start with the simplest, most clear-cut cases only. "Where is my order?" is perfect. "I want to return this but also it arrived damaged and I used a discount code" is not.

What these systems typically achieve

Based on published benchmarks and the architecture I've described, here's what a well-tuned triage system can realistically deliver:

  • First-response time: from hours down to minutes for auto-resolved tickets (under 30 seconds), with routed tickets reaching the right agent much faster
  • Auto-resolve rate: 40-50% of incoming tickets handled without a human, depending on how repetitive the ticket mix is
  • Agent productivity: each agent handles fewer tickets but spends more time on the complex cases that actually need human judgement
  • CSAT improvement: faster responses and correct routing consistently push satisfaction scores up

The biggest win isn't the speed, though. It's that agents stop burning out on "where's my order?" and have energy left for the genuinely difficult conversations.

Design decisions I'd make differently in hindsight

The intent classifier works well initially, but it's brittle to product changes. When a new product category launches, tickets about it get misclassified. Building a monthly retraining step into the pipeline from day one is worth the upfront effort.

I'd also plan for knowledge base integration from the start. Template-based auto-responses are easier to ship and maintain, but a retrieval-augmented setup pulling from help docs would handle more edge cases over time.

Key Takeaways

  • Classify first, then automate. Don't try to auto-resolve everything on day one.
  • Template-based responses beat LLM-generated ones for support. Predictability matters more than flexibility.
  • The real ROI is in agent quality of life, not just speed metrics.
  • Build the retraining loop early. Support patterns shift with every product launch and seasonal spike.
  • Compound intents are the edge case that will bite you. Always detect and route those to a human.

Is this relevant to your business?

If your support team spends most of their time on the same handful of questions, this pattern works. The specific tools will vary. Maybe you're on Freshdesk instead of Zendesk, or Intercom instead of email. The architecture stays the same: classify, automate the obvious stuff, route the rest intelligently.

The key question isn't "should I add AI to support?" It's "what percentage of my tickets are repetitive enough to automate?" If the answer is over 30%, it's probably worth building. If you're curious about what that could look like for your setup, get in touch and I'll give you an honest take.