Designing MCP servers real teams actually deploy

What we've learned shipping production-grade Model Context Protocol servers — the patterns that hold up under real agent traffic.

Amara Kessler

Principal Engineer

Apr 12, 2026 · 9 min read

After a year of shipping MCP servers for B2B operations, fintech, and content teams, a clear shape has emerged for what production looks like. It rarely matches the demos.

The temptation is to wrap every API endpoint as a tool and ship. Real deployments need a layer of design discipline first — otherwise the agent calls the wrong tool, hallucinates arguments, or quietly burns through your rate limits.

Tool design is the product

Before any code, we list every tool we plan to expose, write a one-sentence description in plain English, and stress-test the names against likely agent queries. If two tools could plausibly answer the same prompt, we collapse them.

One job per tool — never overload arguments to do two things
Names should be verbs from the agent's point of view
Argument shapes should be the smallest viable input
Errors should be readable strings, not stack traces
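
A minimal sketch of how those four rules could be checked mechanically before a tool ships. Everything here is an assumption for illustration: the `ToolSpec` shape, the verb allow-list, and the three-argument ceiling are hypothetical conventions, not part of any MCP SDK.

```typescript
// Hypothetical tool description; field names are illustrative, not an MCP SDK type.
interface ToolSpec {
  name: string;
  description: string;
  argNames: string[];
}

// Assumed conventions: tune the verb list and the argument ceiling for your own server.
const ALLOWED_VERBS = ["list", "get", "create", "update", "delete", "search", "send"];
const MAX_ARGS = 3;

// Returns one human-readable problem per broken rule; an empty array means the spec passes.
function lintTool(spec: ToolSpec): string[] {
  const problems: string[] = [];
  const verb = spec.name.split("_")[0] ?? "";

  if (!/^[a-z]+(_[a-z]+)+$/.test(spec.name)) {
    problems.push(`${spec.name}: use snake_case verb_noun naming`);
  } else if (!ALLOWED_VERBS.includes(verb)) {
    problems.push(`${spec.name}: name should start with a verb`);
  }
  // "and"/"or" inside a name usually signals two jobs in one tool.
  if (/_(and|or)_/.test(spec.name)) {
    problems.push(`${spec.name}: looks like two jobs, consider splitting`);
  }
  // More than one sentence in the description suggests the tool is hard to explain.
  if (spec.description.trim().split(/[.!?]\s+/).length > 1) {
    problems.push(`${spec.name}: description should be one sentence`);
  }
  if (spec.argNames.length > MAX_ARGS) {
    problems.push(`${spec.name}: ${spec.argNames.length} arguments, expected at most ${MAX_ARGS}`);
  }
  return problems;
}

// Turn any thrown value into a single readable line, dropping stack traces.
function toReadableError(err: unknown): string {
  const message = err instanceof Error ? err.message : String(err);
  const firstLine = message.split("\n")[0] ?? "";
  return `Tool failed: ${firstLine}`;
}
```

A check like this will not catch two tools that answer the same prompt; that still takes a person reading the list side by side.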

Auth is rarely optional

Even internal MCP servers should support a per-user identity. The pattern that's worked best for us is short-lived signed tokens minted by the host application, validated at the MCP boundary, and translated into the downstream credentials each tool needs.

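import { z } from "zod";

// `server`, `requireCustomer`, and `billing` are assumed to be defined elsewhere in the server.
server.tool("list_invoices", {
  description: "List invoices for the signed-in customer",
  // Smallest viable input: one optional status filter.
  input: z.object({ status: z.enum(["open", "paid"]).optional() }),
  handler: async ({ input, ctx }) => {
    // Resolve the customer from the validated request context, never from agent-supplied arguments.
    const customer = await requireCustomer(ctx);
    return billing.invoices.list({ customer: customer.id, ...input });
  },
});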
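
The mint-and-validate half of that pattern can be sketched as follows, assuming an HMAC-SHA256 signed token carrying a subject and an expiry. The token format, the `Claims` shape, and the `mintToken` and `verifyToken` helpers are all illustrative assumptions rather than the format used in the deployments described above; in production a maintained JWT library is likely the safer choice.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim shape; the article does not specify one.
interface Claims {
  sub: string; // user or customer id
  exp: number; // expiry, in seconds since the Unix epoch
}

// Host application side: token is base64url(payload) + "." + base64url(hmac).
function mintToken(claims: Claims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  const sig = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${sig}`;
}

// MCP boundary: check signature and expiry before any tool handler runs.
function verifyToken(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): Claims {
  const parts = token.split(".");
  const payload = parts[0];
  const sig = parts[1];
  if (parts.length !== 2 || !payload || !sig) throw new Error("Malformed token");

  const expected = createHmac("sha256", secret).update(payload).digest();
  const given = Buffer.from(sig, "base64url");
  // Compare lengths first: timingSafeEqual throws on unequal lengths.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    throw new Error("Invalid token signature");
  }

  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Claims;
  if (typeof claims.sub !== "string" || typeof claims.exp !== "number") {
    throw new Error("Malformed token claims");
  }
  if (claims.exp <= nowSec) throw new Error("Token expired");
  return claims;
}
```

A helper like `requireCustomer` could call something like `verifyToken` and then look up the downstream credentials for `claims.sub`; how that lookup works depends on the systems behind each tool.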

Observability you'll actually open

Every tool call should be logged with: tool name, hashed user identity, input shape (field names and types, not raw values), latency, result kind, and a request id you can grep. We pipe these to a single dashboard with three charts (calls per tool, error rate, p95 latency) and almost never need more.
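
One way to capture those six fields is a wrapper around each handler. This is a sketch under assumptions: `withLogging`, `shapeOf`, `hashUser`, and the `ToolCallLog` shape are hypothetical names, "input shape" is taken to mean key names and value types, and the default sink simply prints JSON lines.

```typescript
import { createHash, randomUUID } from "node:crypto";

// Hypothetical log record covering the six fields listed above.
interface ToolCallLog {
  requestId: string;
  tool: string;
  userHash: string;
  inputShape: Record<string, string>;
  latencyMs: number;
  resultKind: "ok" | "error";
}

// Record key names and value types only, so raw arguments never reach the logs.
function shapeOf(input: Record<string, unknown>): Record<string, string> {
  const shape: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    shape[key] = Array.isArray(value) ? "array" : value === null ? "null" : typeof value;
  }
  return shape;
}

// Truncated SHA-256 of the user id. A keyed hash may be preferable if ids are guessable.
function hashUser(userId: string): string {
  return createHash("sha256").update(userId).digest("hex").slice(0, 16);
}

// Runs the handler, then emits exactly one log entry whether it succeeded or threw.
async function withLogging<T>(
  tool: string,
  userId: string,
  input: Record<string, unknown>,
  run: () => Promise<T>,
  sink: (entry: ToolCallLog) => void = (entry) => console.log(JSON.stringify(entry)),
): Promise<T> {
  const requestId = randomUUID();
  const started = Date.now();
  let resultKind: ToolCallLog["resultKind"] = "ok";
  try {
    return await run();
  } catch (err) {
    resultKind = "error";
    throw err;
  } finally {
    sink({
      requestId,
      tool,
      userHash: hashUser(userId),
      inputShape: shapeOf(input),
      latencyMs: Date.now() - started,
      resultKind,
    });
  }
}
```

Emitting the entry in `finally` keeps failed calls in the same stream as successful ones, which is what makes an error-rate chart possible.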

If you can't answer 'which tool does the agent call most often, and how fast' in under ten seconds, the server isn't done.
— Internal Growrix OS Ops playbook
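
To make that ten-second question concrete, here is a sketch of the aggregation behind the three charts. The `CallRecord` and `ToolStats` shapes and the `summarize` helper are assumptions for illustration, and p95 is computed with the nearest-rank method, which is one of several common definitions.

```typescript
// Hypothetical minimal record; a real log entry would carry more fields.
interface CallRecord {
  tool: string;
  latencyMs: number;
  resultKind: "ok" | "error";
}

interface ToolStats {
  tool: string;
  calls: number;
  errorRate: number;
  p95LatencyMs: number;
}

// Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
function p95(latencies: number[]): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1] ?? 0;
}

// Groups records by tool and returns stats sorted by call count, busiest first.
function summarize(records: CallRecord[]): ToolStats[] {
  const byTool = new Map<string, CallRecord[]>();
  for (const record of records) {
    const bucket = byTool.get(record.tool) ?? [];
    bucket.push(record);
    byTool.set(record.tool, bucket);
  }

  const stats: ToolStats[] = [];
  byTool.forEach((calls, tool) => {
    const errors = calls.filter((c) => c.resultKind === "error").length;
    stats.push({
      tool,
      calls: calls.length,
      errorRate: errors / calls.length,
      p95LatencyMs: p95(calls.map((c) => c.latencyMs)),
    });
  });
  return stats.sort((a, b) => b.calls - a.calls);
}
```

The first element of `summarize(records)` answers "which tool does the agent call most often", and its `p95LatencyMs` answers "how fast".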

What we're working on next

We're prototyping a small library that lets a single MCP server expose different toolsets per audience — staff, customer, anonymous — without forking the server. More on that soon.
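
As a rough illustration of the idea only, and not the library itself, per-audience exposure can be reduced to tagging each tool with the audiences allowed to see it. The `AudienceTool` shape, the `toolsFor` and `assertAllowed` helpers, and the `search_docs` and `refund_invoice` tool names are hypothetical.

```typescript
type Audience = "staff" | "customer" | "anonymous";

// Hypothetical registry entry: a tool name plus the audiences that may see it.
interface AudienceTool {
  name: string;
  audiences: readonly Audience[];
}

const TOOLS: readonly AudienceTool[] = [
  { name: "search_docs", audiences: ["staff", "customer", "anonymous"] },
  { name: "list_invoices", audiences: ["staff", "customer"] },
  { name: "refund_invoice", audiences: ["staff"] },
];

// Names of the tools to advertise to a given audience, in registry order.
function toolsFor(audience: Audience, tools: readonly AudienceTool[] = TOOLS): string[] {
  return tools.filter((t) => t.audiences.includes(audience)).map((t) => t.name);
}

// Enforce the same rule at call time, since hiding a tool from the listing is not access control.
function assertAllowed(audience: Audience, toolName: string, tools: readonly AudienceTool[] = TOOLS): void {
  if (!toolsFor(audience, tools).includes(toolName)) {
    throw new Error(`Tool ${toolName} is not available to ${audience} callers`);
  }
}
```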

Tags: #MCP #Architecture #Observability #Agents

Amara designs MCP servers and platform infrastructure. Loves observability, hates flaky tests.

Discussion (3)

Lena P. · Apr 13, 2026
The 'one job per tool' rule changed our agent's accuracy overnight. Confirmed.
  Amara Kessler · Apr 13, 2026
  Glad to hear it — the temptation to overload is real.
Marcus T. · Apr 14, 2026
How do you handle long-running tools? Do you stream progress back to the agent?
Jin H. · Apr 15, 2026
Would love a follow-up on the per-audience toolsets pattern.
