You can write a working MCP server in an afternoon — the build guide covers that part, and the protocol does not fight you. Deploying it is where people lose a week, and almost always to the same mistake: they treat hosting as a packaging problem (which container, which platform, which CLI) when it is actually an architecture decision they made by accident the moment they stored something in a variable.

So let's make the decision on purpose.

Two transports, and only two

The Model Context Protocol defines exactly two transports, and the first choice is which one you need.

One historical note that still bites people: Streamable HTTP replaced the old HTTP+SSE transport in spec revision 2025-03-26. The spec text is blunt about it — "This replaces the HTTP+SSE transport from protocol version 2024-11-05." If you are porting a server written against the SSE transport, you are not adding HTTP next to SSE; you are collapsing onto a single endpoint that does both and only opens an SSE stream when it needs one. Hosts like Cloudflare's McpAgent still mount a legacy /sse route for old clients, but treat that as a compatibility shim, not the road.

The fork everything hangs on: stateless or stateful

Here is the decision that quietly determines your platform, your bill, and your deploy strategy — and the one most tutorials skip.

Under Streamable HTTP, your server MAY assign a session at initialization by returning an Mcp-Session-Id header in the InitializeResult. If it does, the client MUST echo that header on every subsequent request. Read that again with your ops hat on: the moment you issue a session ID and keep that session's state in process memory, you have told your load balancer that this client must always reach the instance that issued the ID. That is a sticky session. And sticky sessions are precisely the constraint that made the old HTTP+SSE transport miserable to scale — long-lived, instance-pinned connections that fight every autoscaler and complicate every rolling deploy.

The alternative is to issue no session ID and hold no per-session memory. Then any replica can serve any request. Your server maps cleanly onto serverless (Vercel Fluid Compute, AWS Lambda behind API Gateway) and onto Kubernetes horizontal pod autoscaling, because the platform's core assumption — that requests are fungible across instances — is true.

Decide stateless-versus-stateful before you choose a platform, not after. It is not a tuning knob you reach for under load; it is the shape of the thing.

The migration trap is specific and common: a team ports an SSE server to Streamable HTTP, keeps its in-memory sessions out of pure reflex, deploys a second replica for redundancy — and only discovers the affinity requirement when that second replica starts dropping requests from clients pinned to the first. Nothing in a single-instance test ever reveals it.

If you genuinely need session continuity — a long multi-turn workflow, a server-side cursor — you have two clean options that don't sacrifice scaling: externalize the session to a shared store like Redis, or give each session its own actor. Cloudflare's Agents SDK takes the actor route literally: McpAgent allocates one Durable Object per session, with the session's state living in that object's zero-latency SQLite and surviving hibernation. That's a stateful server that still scales, because the "stickiness" is a first-class addressable object rather than a coincidence of which box answered first.

Where people actually host it

The market has sorted into a few shapes, and they line up almost exactly with the stateless/stateful axis:

Don't skip the lock

One non-negotiable, regardless of platform: as of the 2025-06-18 revision, any internet-reachable MCP server MUST implement OAuth 2.1, acting as a resource server — not an identity provider — and MUST publish Protected Resource Metadata at /.well-known/oauth-protected-resource per RFC 9728, returning a WWW-Authenticate header on a 401 that points clients to it. The "ship a Bearer sk-..." pattern is fine on localhost and non-compliant the instant your server has a public URL. (The audience-validation details — the part that actually stops token misuse — are their own subject.) Several of the platforms above exist largely to spare you this plumbing, which is a perfectly good reason to use them.

And if you're running a local HTTP server during development, the spec asks one more thing that's easy to forget: validate the Origin header and bind to 127.0.0.1, not 0.0.0.0. Otherwise a webpage in the user's browser can quietly DNS-rebind its way into your tool server. A laptop is not a private network.


The protocol gives you two transports and one genuinely consequential decision. Make the stateless-versus-stateful call first, in the cold light of how you intend to scale, and the platform almost picks itself. Make it by accident — one Mcp-Session-Id, one map in memory — and you'll meet it again at 2 a.m., the day you add the second replica.