After a year of shipping MCP servers for B2B operations, fintech, and content teams, a clear shape has emerged for what production looks like. It rarely matches the demos.
The temptation is to wrap every API endpoint as a tool and ship. Real deployments need a layer of design discipline first — otherwise the agent calls the wrong tool, hallucinates arguments, or quietly burns through your rate limits.
Tool design is the product
Before any code, we list every tool we plan to expose, write a one-sentence description in plain English, and stress-test the names against likely agent queries. If two tools could plausibly answer the same prompt, we collapse them.
- One job per tool — never overload arguments to do two things
- Names should be verbs from the agent's point of view
- Argument shapes should be the smallest viable input
- Errors should be readable strings, not stack traces
Auth is rarely optional
Even internal MCP servers should support a per-user identity. The pattern that's worked best for us is short-lived signed tokens minted by the host application, validated at the MCP boundary, and translated into the downstream credentials each tool needs.
server.tool("list_invoices", {
description: "List invoices for the signed-in customer",
input: z.object({ status: z.enum(["open", "paid"]).optional() }),
handler: async ({ input, ctx }) => {
const customer = await requireCustomer(ctx);
return billing.invoices.list({ customer: customer.id, ...input });
},
});Observability you'll actually open
Every tool call should be logged with: tool name, hashed user identity, input shape, latency, result kind, and a request id you can grep. We pipe these to a single dashboard with three charts — calls per tool, error rate, p95 latency — and almost never need more.
If you can't answer 'which tool does the agent call most often, and how fast' in under ten seconds, the server isn't done.
What we're working on next
We're prototyping a small library that lets a single MCP server expose different toolsets per audience — staff, customer, anonymous — without forking the server. More on that soon.
Amara designs MCP servers and platform infrastructure. Loves observability, hates flaky tests.
Discussion (3)
Be kind. Be specific.- LPLena P.Apr 13, 2026
The 'one job per tool' rule changed our agent's accuracy overnight. Confirmed.
- AKAmara KesslerApr 13, 2026
Glad to hear it — the temptation to overload is real.
- AK
- MTMarcus T.Apr 14, 2026
How do you handle long-running tools? Do you stream progress back to the agent?
- JHJin H.Apr 15, 2026
Would love a follow-up on the per-audience toolsets pattern.
