<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mhimasree</title>
    <link>https://himasree.dev/</link>
    <description>Recent content on Mhimasree</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://himasree.dev/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>About</title>
      <link>https://himasree.dev/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://himasree.dev/about/</guid>
      <description>&lt;p&gt;I&amp;rsquo;m a software engineer working on AI systems and distributed infrastructure.&lt;/p&gt;&#xA;&lt;p&gt;This blog is where I think out loud — about building agents that behave, systems that scale, and code that lasts. No newsletter, no algorithm. Just writing.&lt;/p&gt;&#xA;&lt;p&gt;If something here was useful, I&amp;rsquo;d love to hear about it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building a multi-agent harness that doesn&#39;t fall apart at 3am</title>
      <link>https://himasree.dev/ai/building-a-multi-agent-harness/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://himasree.dev/ai/building-a-multi-agent-harness/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve now built three different multi-agent systems in production. Each time, I&amp;rsquo;ve made roughly the same mistakes in a different order. This post is an attempt to write down the things I wish I&amp;rsquo;d known before starting the third one.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-loop-that-eats-itself&#34;&gt;The loop that eats itself&lt;/h2&gt;&#xA;&lt;p&gt;The core of most agentic systems is a loop: give the model a context, let it call tools, observe the results, feed them back in. Simple enough on a whiteboard. In production, this loop has two failure modes that sound obvious in retrospect but cost me days each.&lt;/p&gt;</description>
    </item>
    <item>
      <title>P99 is a lie — and what to measure instead</title>
      <link>https://himasree.dev/systems/p99-is-a-lie/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://himasree.dev/systems/p99-is-a-lie/</guid>
      <description>&lt;p&gt;Everyone reports P99. Almost nobody questions what it actually tells them.&lt;/p&gt;&#xA;&lt;h2 id=&#34;what-p99-hides&#34;&gt;What P99 hides&lt;/h2&gt;&#xA;&lt;p&gt;P99 is the 99th percentile of your request latency over a time window. If your P99 is 200ms, that means 99% of requests in that window completed within 200ms and the remaining 1% took longer. Sounds good. But consider what it doesn&amp;rsquo;t tell you:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Which requests are in that 1%?&lt;/strong&gt; If they&amp;rsquo;re all from one user, one region, or one query type, that matters.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;What&amp;rsquo;s the P99.9?&lt;/strong&gt; For a service handling 10,000 requests/minute, the 0.1% beyond P99.9 is still 10 requests per minute hitting extreme tail latency.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Is it stable?&lt;/strong&gt; A P99 that averages 200ms but swings from window to window is a very different beast than a steady 200ms.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;what-i-measure-instead&#34;&gt;What I measure instead&lt;/h2&gt;&#xA;&lt;p&gt;After enough production incidents, I&amp;rsquo;ve settled on a layered approach.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
