Mhimasree

About

Mon, 01 Jan 0001 00:00:00 +0000

I’m a software engineer working on AI systems and distributed infrastructure.

This blog is where I think out loud — about building agents that behave, systems that scale, and code that lasts. No newsletter, no algorithm. Just writing.

If something here was useful, I’d love to hear about it.

Building a multi-agent harness that doesn't fall apart at 3am

Mon, 01 Jan 0001 00:00:00 +0000

I’ve now built three different multi-agent systems in production. Each time, I’ve made roughly the same mistakes in a different order. This post is an attempt to write down the things I wish I’d known before starting the third one.

The loop that eats itself

The core of most agentic systems is a loop: give the model a context, let it call tools, observe the results, feed them back in. Simple enough on a whiteboard. In production, this loop has two failure modes that sound obvious in retrospect but cost me days each.

P99 is a lie — and what to measure instead

Mon, 01 Jan 0001 00:00:00 +0000

Everyone reports P99. Almost nobody questions what it actually tells them.

What P99 hides

P99 measures the 99th percentile of your request latency over a time window. If your P99 is 200ms, that means 99% of requests completed within 200ms. Sounds good. But consider what it doesn’t tell you:

Which requests are in that 1%? If they’re all from one user, one region, or one query type, that matters.
What’s the P99.9? For a service handling 10,000 requests/minute, 0.1% is still 10 requests per minute hitting extreme tail latency.
Is it stable? A P99 of 200ms with high variance is a very different beast than a steady 200ms.

What I measure instead

After enough production incidents, I’ve settled on a layered approach.