The MCP server production checklist

An MCP server that runs on your laptop and one that runs in production are different programs. The protocol is young, the examples are toys, and the gap is filled with operational concerns nobody writes a quickstart for. This is the list to run through before you let a real agent — and real traffic — anywhere near it.

1. Logs go to stderr, always

On the stdio transport, stdout is the protocol. Any byte you write there that isn't part of a valid JSON-RPC message corrupts the stream, and the client will typically fail to parse it and drop the connection. Route every log line, every library banner, every debug print to stderr. Make it structured JSON so you can actually grep it later. This is one of the most common ways stdio servers fail in the wild.
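A minimal sketch of that setup using only Python's standard `logging` module. The names `JsonFormatter` and `configure_logging` and the field names in the JSON payload are illustrative choices, not part of any MCP SDK.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all root-logger output to stderr, leaving stdout to the protocol."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # drop any handler that might point at stdout
    root.setLevel(level)
    return root
```

Call `configure_logging()` once at startup, before importing anything that might print. Stray `print()` calls in third-party code still go to stdout, so this covers loggers only.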

2. Auth on privileged tools

By default, any client that can reach the transport can call any tool. If your tools touch a database, spend money, or hit an internal API, gate them behind a token or API key checked before the handler runs. Treat the MCP boundary like any other public API surface.
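One way to put the check in front of the handler is a decorator. This is a sketch under assumptions: the `token` keyword argument, the `Unauthorized` error, and the `delete_rows` tool are all hypothetical, and how the token actually reaches your server (an HTTP header, an environment variable on stdio) depends on your transport and SDK.

```python
import functools
import hmac
from typing import Any, Callable, Optional


class Unauthorized(Exception):
    """Raised when a call arrives without a valid token."""


def require_token(expected: str) -> Callable:
    """Decorator: check the caller's token before the handler body runs."""

    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(handler)
        def wrapper(*args: Any, token: Optional[str] = None, **kwargs: Any) -> Any:
            # compare_digest avoids leaking the match position through timing.
            if token is None or not hmac.compare_digest(
                token.encode(), expected.encode()
            ):
                raise Unauthorized("missing or invalid token")
            return handler(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical privileged tool. In practice, read the secret from the
# environment rather than hard-coding it.
@require_token("s3cret-from-env")
def delete_rows(table: str) -> str:
    return f"deleted rows from {table}"
```

The handler body never runs for an unauthenticated call, so there is no partially executed side effect to clean up.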

3. Rate limiting per client

Agents loop. A planning loop or a retry loop on the model's side can call one tool hundreds of times a minute. Without a limiter you'll exhaust a downstream quota, eat a surprise bill, or get the upstream API to ban your key. Throttle per client and return a clean, typed error when the limit trips.
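A token bucket is one common way to throttle per client. The sketch below is an in-memory version for a single process; `PerClientLimiter`, `RateLimited`, and the `client_id` key are illustrative names, and how you identify a client is left as an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


class RateLimited(Exception):
    """Typed error returned to the client when its budget is exhausted."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limit exceeded, retry in {retry_after:.1f}s")
        self.retry_after = retry_after


@dataclass
class _Bucket:
    tokens: float
    updated: float


class PerClientLimiter:
    """Token bucket: `capacity` calls per client, refilled at `rate` per second."""

    def __init__(
        self,
        capacity: int,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so tests do not need to sleep
        self._buckets: Dict[str, _Bucket] = {}

    def check(self, client_id: str) -> None:
        """Consume one token for this client or raise RateLimited."""
        now = self.clock()
        bucket = self._buckets.setdefault(client_id, _Bucket(self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens < 1:
            raise RateLimited(retry_after=(1 - bucket.tokens) / self.rate)
        bucket.tokens -= 1
```

Call `limiter.check(client_id)` at the top of each tool handler and translate `RateLimited` into whatever error shape your SDK returns. Running several server replicas would need shared state instead of a local dict.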

4. Retries with backoff and jitter

Every outbound network call will eventually return a transient error. Without retries, one 503 fails the whole tool invocation and the model gives up. Wrap outbound calls in exponential backoff with jitter, cap the attempts, and respect Retry-After when the upstream sends it.
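A dependency-free sketch of that wrapper. `TransientError` is a hypothetical exception your HTTP layer would raise for retryable failures, with `retry_after` already parsed into seconds; the default attempt count and delays are placeholders, not recommendations.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for retryable failures (e.g. a 503 or a timeout)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds, from Retry-After if present


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError as exc:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            if exc.retry_after is not None:
                delay = exc.retry_after  # the upstream said how long to wait
            else:
                # "Full jitter": uniform in [0, min(cap, base * 2^(attempt-1))].
                ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
                delay = random.uniform(0, ceiling)
            sleep(delay)
    raise RuntimeError("unreachable")
```

Only `TransientError` is retried; anything else propagates immediately, which keeps permanent failures such as a 400 or a 401 from being hammered.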

5. Input validation at the boundary

The model will send arguments that don't match your assumptions — wrong types, missing fields, absurd values. Validate against a schema at the edge (Pydantic, Zod) and reject bad input with a clear error, instead of letting it crash three calls deep.
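To show the shape of the check without adding a dependency, here is a hand-rolled validator standing in for a schema library such as Pydantic. The `SearchArgs` tool arguments, their limits, and the `InvalidArguments` error are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class InvalidArguments(ValueError):
    """Raised at the boundary, before any tool logic runs."""


@dataclass(frozen=True)
class SearchArgs:
    """Validated arguments for a hypothetical `search` tool."""

    query: str
    limit: int = 10

    @classmethod
    def parse(cls, raw: Mapping[str, Any]) -> "SearchArgs":
        unknown = set(raw) - {"query", "limit"}
        if unknown:
            raise InvalidArguments(f"unknown fields: {sorted(unknown)}")

        query = raw.get("query")
        if not isinstance(query, str) or not query.strip():
            raise InvalidArguments("query must be a non-empty string")

        limit = raw.get("limit", 10)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int):
            raise InvalidArguments("limit must be an integer")
        if not 1 <= limit <= 100:
            raise InvalidArguments("limit must be between 1 and 100")

        return cls(query=query, limit=limit)
```

The handler then takes a `SearchArgs`, not a raw dict, so everything past the boundary can trust its types. With Pydantic or Zod the same rules become a declarative schema.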

6. Graceful shutdown

When your orchestrator sends SIGTERM during a deploy, a naive server dies mid-response and the client sees a truncated or dropped reply. Install signal handlers that stop accepting new work, let in-flight calls finish (with a timeout), close the transport, and exit cleanly.
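The sequence above can be sketched with `asyncio`. `Drainer` is an illustrative helper, not an SDK class, and the sketch assumes a Unix host, since `loop.add_signal_handler` is not available on Windows. Closing the transport is left to your server code.

```python
import asyncio
import signal
from typing import Any, Coroutine, Set


class Drainer:
    """Tracks in-flight calls so shutdown can wait for them."""

    def __init__(self) -> None:
        self.stopping = asyncio.Event()
        self._in_flight: Set[asyncio.Task] = set()

    def submit(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        """Start a call, or refuse it once shutdown has begun."""
        if self.stopping.is_set():
            coro.close()  # avoid a "never awaited" warning
            raise RuntimeError("server is shutting down")
        task = asyncio.ensure_future(coro)
        self._in_flight.add(task)
        task.add_done_callback(self._in_flight.discard)
        return task

    async def drain(self, timeout: float) -> bool:
        """Stop accepting work and wait for in-flight calls.

        Returns True if everything finished within the timeout.
        """
        self.stopping.set()
        if not self._in_flight:
            return True
        _, pending = await asyncio.wait(set(self._in_flight), timeout=timeout)
        for task in pending:
            task.cancel()  # out of time: cancel whatever is left
        return not pending


def install_signal_handlers(drainer: Drainer) -> None:
    """Flip the stopping flag on SIGTERM or SIGINT (Unix only)."""
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, drainer.stopping.set)


async def main() -> None:
    drainer = Drainer()
    install_signal_handlers(drainer)
    await drainer.stopping.wait()      # block until a signal arrives
    await drainer.drain(timeout=10.0)  # then close the transport and return
```

Keep the drain timeout shorter than your orchestrator's kill grace period, or the process gets SIGKILLed before it finishes draining.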

7. Health checks

If you're running over HTTP or behind an orchestrator, expose liveness and readiness endpoints. Liveness answers "is the process alive"; readiness answers "can it serve right now" (dependencies reachable, warm-up done). Without them your platform can't restart a wedged server or hold traffic during startup.
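A standard-library sketch of the two probes. The `/healthz` and `/readyz` paths are a common convention, not a standard, and `is_ready` is a hypothetical callback you would wire to your own dependency checks.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Tuple


def health_response(path: str, is_ready: Callable[[], bool]) -> Tuple[int, dict]:
    """Map a probe path to (status code, JSON body)."""
    if path == "/healthz":  # liveness: the process is up and can answer
        return 200, {"status": "alive"}
    if path == "/readyz":  # readiness: dependencies reachable, warm-up done
        ready = is_ready()
        return (200 if ready else 503), {"status": "ready" if ready else "not ready"}
    return 404, {"error": "not found"}


def make_handler(is_ready: Callable[[], bool]) -> type:
    """Build a request handler class bound to a readiness callback."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = health_response(self.path, is_ready)
            data = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep probe noise out of the logs

    return HealthHandler


def serve_health(port: int, is_ready: Callable[[], bool]) -> ThreadingHTTPServer:
    """Create the probe server; call serve_forever() on it in a background thread."""
    return ThreadingHTTPServer(("0.0.0.0", port), make_handler(is_ready))
```

Keep liveness trivial. If it checks the database too, a database outage makes the platform restart healthy processes that only needed to be taken out of rotation.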

Verify it

For every item on this checklist, write a test that drives the real protocol, not mocked internals: "Auth rejects an unauthenticated call," "rate limiter trips at N," "SIGTERM drains cleanly." If it isn't tested, assume it's broken.
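For a stdio server, driving the real protocol can be as small as spawning the process and exchanging newline-delimited JSON-RPC. The helper below is a simplified sketch: `call_over_stdio` and `FAKE_SERVER` are illustrative, the fake server stands in for yours, and a real MCP session begins with an `initialize` handshake that is omitted here.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def call_over_stdio(
    cmd: List[str], request: Dict[str, Any], timeout: float = 5.0
) -> Dict[str, Any]:
    """Spawn a server, send one JSON-RPC request on stdin, return its reply."""
    proc = subprocess.run(
        cmd,
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    # Every stdout line must parse as JSON; a stray print fails right here.
    replies = [json.loads(line) for line in lines]
    return next(r for r in replies if r.get("id") == request["id"])


# Stand-in server for the demo: answers any request with an empty result
# and exits when stdin closes.
FAKE_SERVER = r'''
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    reply = {"jsonrpc": "2.0", "id": req["id"], "result": {}}
    print(json.dumps(reply), flush=True)
'''

if __name__ == "__main__":
    reply = call_over_stdio(
        [sys.executable, "-c", FAKE_SERVER],
        {"jsonrpc": "2.0", "id": 1, "method": "ping"},
    )
    print(reply, file=sys.stderr)
```

Because the helper parses every line the server writes to stdout, the same test that checks a tool's reply also catches a log line leaking into the protocol stream.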

8. Hardened container

Use a multi-stage build on a slim base image, run as a non-root user, keep secrets out of image layers, and define a HEALTHCHECK. These are table stakes for deploying to any VPS or platform.

The short version

☐ Logs to stderr only, structured JSON
☐ Auth gate on privileged tools
☐ Per-client rate limiting
☐ Retries with backoff + jitter on outbound calls
☐ Schema validation on all tool inputs
☐ Graceful shutdown on SIGTERM
☐ Liveness + readiness health checks
☐ Hardened, non-root container
☐ Tests over the real protocol for each of the above

Tick every box on day one

The Plinth MCP Server Starter Kit ships Python and TypeScript templates with every item on this checklist already implemented and tested — 23 Python + 19 TypeScript tests driving the server over the real MCP protocol. Clone, add your tools, ship.
