The situation.
A consumer app put AI at the centre of the product: conversational answers, recommendations, search that understands what people mean. The features worked in the demo and wobbled in production. When usage spiked in the evening, answers slowed, costs climbed, and the experience that sold the product became the reason people closed it.
AI features have a property that their owners learn the hard way: the busiest moments are the slowest and the most expensive, which is backwards from what an audience expects.
What got built.
The system under the features was rebuilt for the load pattern the product actually has. Requests route across more than one model by task: the fast, inexpensive model where it is good enough, the heavyweight one only where it earns its place. Answers that repeat get cached and served instantly. Heavy work moves off the request path, so the person waiting sees a response while the slower steps finish behind it.
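As a rough illustration of that request path, here is a minimal sketch, not the production system. The task names, the model names, and the `Gateway`, `call_model`, and `follow_up` helpers are all hypothetical, and the model call is a stand-in. It shows the three moves together: route by task, serve repeats from a cache, and push slow follow-up work behind the response.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical routing table: which model handles which kind of request.
ROUTES = {
    "autocomplete": "small-fast",
    "recommendation": "small-fast",
    "long_answer": "large",
}
DEFAULT_MODEL = "small-fast"


@dataclass
class Gateway:
    cache: dict = field(default_factory=dict)
    calls: list = field(default_factory=list)      # models actually invoked
    background: set = field(default_factory=set)   # follow-up work in flight

    async def call_model(self, model: str, prompt: str) -> str:
        # Stand-in for a real model call (assumption: no real API here).
        self.calls.append(model)
        await asyncio.sleep(0)
        return f"[{model}] {prompt}"

    async def follow_up(self, prompt: str) -> None:
        # Stand-in for heavy work that need not block the reply.
        await asyncio.sleep(0)

    async def answer(self, task: str, prompt: str) -> str:
        key = (task, prompt.strip().lower())
        if key in self.cache:
            return self.cache[key]                 # repeat: served from cache
        model = ROUTES.get(task, DEFAULT_MODEL)    # route by task
        reply = await self.call_model(model, prompt)
        self.cache[key] = reply
        # Schedule the slow steps; the caller gets the reply without waiting.
        job = asyncio.create_task(self.follow_up(prompt))
        self.background.add(job)
        job.add_done_callback(self.background.discard)
        return reply
```

In this sketch a repeated question never reaches a model, and only requests tagged `long_answer` pay for the heavyweight one. A real cache would also need an expiry policy and a more careful notion of which answers count as repeats.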
Model choices are treated as versioned infrastructure: evaluated against real outcomes, documented, and rolled back when an update underperforms. Cost and speed per feature sit on a dashboard, not in an invoice surprise.
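One way to picture model choices as versioned infrastructure is the sketch below. It assumes a hypothetical `ModelSlot` per feature and an `evaluate` function that scores a model version against real outcomes; neither name comes from the project itself.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelSlot:
    """One feature's model choice, kept as a version history."""
    history: list[str] = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.history[-1]

    def promote(self, candidate: str, evaluate: Callable[[str], float]) -> bool:
        """Adopt the candidate only if it scores at least as well as the current version."""
        if self.history and evaluate(candidate) < evaluate(self.current):
            return False                 # evaluation gate: the update is rejected
        self.history.append(candidate)
        return True

    def rollback(self) -> str:
        """Return to the previous version when an update underperforms in production."""
        if len(self.history) < 2:
            raise ValueError("no earlier version to roll back to")
        self.history.pop()
        return self.current
```

The point of the design is that an update is an ordinary, reversible change: it passes the evaluation or it does not ship, and the previous version is always one step away.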
What changed.
The product now holds its speed at three times normal traffic, with no slowdown at the evening peak, and the rebuild brought the cost per user down. Spikes stopped being incidents.
Model updates ship as routine changes with an evaluation in front of them, instead of as gambles.
What this would look like for you.
An AI feature that works in a demo is the easy half. If yours slows down when real usage arrives, this is the shape of the fix.
See the Build service