You can write a working MCP server in an afternoon — the build guide covers that part, and the protocol does not fight you. Deploying it is where people lose a week, and almost always to the same mistake: they treat hosting as a packaging problem (which container, which platform, which CLI) when it is actually an architecture decision they made by accident the moment they stored something in a variable.
So let's make the decision on purpose.
Two transports, and only two
The Model Context Protocol defines exactly two standard transports, and the first choice is which one you need.
- stdio runs your server as a subprocess of a single client. The client writes JSON-RPC to your stdin and reads it from your stdout. This is the default for local tools — a desktop assistant, an IDE plugin, a CLI agent — and it is not "deployed" in any cloud sense. The client is the deployment. If that's your use case, you are done; ship the binary and stop reading.
- Streamable HTTP runs your server as an independent process serving many clients over the network. One endpoint path handles both POST (client → server messages) and GET (server-initiated streams), and the server may answer a POST with either a single JSON response or a Server-Sent Events stream. This is what "deploy an MCP server" almost always means, and the rest of this piece is about it.
One historical note that still bites people: Streamable HTTP replaced the old HTTP+SSE transport in spec revision 2025-03-26. The spec text is blunt about it — "This replaces the HTTP+SSE transport from protocol version 2024-11-05." If you are porting a server written against the SSE transport, you are not adding HTTP next to SSE; you are collapsing onto a single endpoint that does both and only opens an SSE stream when it needs one. Hosts like Cloudflare's McpAgent still mount a legacy /sse route for old clients, but treat that as a compatibility shim, not the road.
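To make the single-endpoint shape concrete, here is a minimal sketch of the dispatch logic as a pure function, with no web framework involved. The function name `handle_mcp_request`, the `offers_get_stream` flag, and the echo "handler" are all hypothetical; a real server would use an SDK transport rather than hand-rolling this, and would also deal with session termination via DELETE, which is left out here.

```python
import json


def handle_mcp_request(method, body=None, offers_get_stream=False):
    """Return (status, headers, body) for one request to the single MCP endpoint."""
    if method == "POST":
        message = json.loads(body)
        # Only a JSON-RPC request (it has both "method" and "id") expects a reply.
        # Notifications and responses are acknowledged with 202 and no body.
        if "method" not in message or "id" not in message:
            return 202, {}, b""
        # Hypothetical handler: echo the method name back as the result.
        reply = {
            "jsonrpc": "2.0",
            "id": message["id"],
            "result": {"echo": message["method"]},
        }
        return 200, {"Content-Type": "application/json"}, json.dumps(reply).encode()

    if method == "GET" and offers_get_stream:
        # A real server would now keep the connection open and write SSE events.
        return 200, {"Content-Type": "text/event-stream"}, b""

    # A GET when no stream is offered, or any other verb in this sketch.
    allow = "POST, GET" if offers_get_stream else "POST"
    return 405, {"Allow": allow}, b""
```

The point of the shape is that there is no separate SSE route to maintain: one path, and the stream is something the server chooses per request.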
The fork everything hangs on: stateless or stateful
Here is the decision that quietly determines your platform, your bill, and your deploy strategy — and the one most tutorials skip.
Under Streamable HTTP, your server MAY assign a session at initialization by returning an Mcp-Session-Id header on the HTTP response that carries the InitializeResult. If it does, the client MUST echo that header on every subsequent request. Read that again with your ops hat on: the moment you issue a session ID and keep that session's state in process memory, you have told your load balancer that this client must always reach the instance that issued the ID. That is a sticky session. And sticky sessions are precisely the constraint that made the old HTTP+SSE transport miserable to scale: long-lived, instance-pinned connections that fight every autoscaler and complicate every rolling deploy.
The alternative is to issue no session ID and hold no per-session memory. Then any replica can serve any request. Your server maps cleanly onto serverless (Vercel Fluid Compute, AWS Lambda behind API Gateway) and onto Kubernetes horizontal pod autoscaling, because the platform's core assumption — that requests are fungible across instances — is true.
Decide stateless-versus-stateful before you choose a platform, not after. It is not a tuning knob you reach for under load; it is the shape of the thing.
The migration trap is specific and common: a team ports an SSE server to Streamable HTTP, keeps its in-memory sessions out of pure reflex, deploys a second replica for redundancy — and only discovers the affinity requirement when that second replica starts dropping requests from clients pinned to the first. Nothing in a single-instance test ever reveals it.
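The trap is easy to reproduce on paper. The sketch below is a toy simulation, not MCP code: `Replica` is a hypothetical stand-in for one server process holding sessions in a local dict, and `itertools.cycle` plays the part of a round-robin load balancer with no affinity. An unknown session ID is modeled as a 404.

```python
import itertools
import uuid


class Replica:
    """One server process whose sessions live only in its own memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # per-process state: invisible to other replicas

    def initialize(self):
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"calls": 0}
        return session_id

    def handle(self, session_id):
        session = self.sessions.get(session_id)
        if session is None:
            return 404  # this replica never issued that ID
        session["calls"] += 1
        return 200


replicas = [Replica("a"), Replica("b")]
round_robin = itertools.cycle(replicas)  # load balancer with no session affinity

session_id = next(round_robin).initialize()  # lands on replica a
statuses = [next(round_robin).handle(session_id) for _ in range(4)]  # b, a, b, a
print(statuses)  # [404, 200, 404, 200]
```

With a single replica in the list, every call returns 200, which is exactly why the single-instance test never catches it.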
If you genuinely need session continuity — a long multi-turn workflow, a server-side cursor — you have two clean options that don't sacrifice scaling: externalize the session to a shared store like Redis, or give each session its own actor. Cloudflare's Agents SDK takes the actor route literally: McpAgent allocates one Durable Object per session, with the session's state living in that object's zero-latency SQLite and surviving hibernation. That's a stateful server that still scales, because the "stickiness" is a first-class addressable object rather than a coincidence of which box answered first.
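Here is a minimal sketch of the first option, externalizing the session. `SharedSessionStore` is a hypothetical in-memory stand-in for something like Redis, and `Replica` is a toy server process; nothing here is a real client library. What matters is only where the state lives: outside any one replica.

```python
import itertools
import uuid


class SharedSessionStore:
    """In-memory stand-in for an external store such as Redis (an assumption)."""

    def __init__(self):
        self._data = {}

    def create(self):
        session_id = uuid.uuid4().hex
        self._data[session_id] = {"calls": 0}
        return session_id

    def increment_calls(self, session_id):
        session = self._data.get(session_id)
        if session is None:
            return None
        session["calls"] += 1
        return session["calls"]


class Replica:
    """A server process that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def initialize(self):
        return self.store.create()

    def handle(self, session_id):
        calls = self.store.increment_calls(session_id)
        return (404, None) if calls is None else (200, calls)


store = SharedSessionStore()
replicas = [Replica("a", store), Replica("b", store)]
round_robin = itertools.cycle(replicas)  # no affinity needed

session_id = next(round_robin).initialize()
results = [next(round_robin).handle(session_id) for _ in range(4)]
print(results)  # [(200, 1), (200, 2), (200, 3), (200, 4)]
```

A real store adds concerns this sketch ignores, such as atomic updates, expiry, and a network hop on every request, but the replicas themselves stay fungible.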
Where people actually host it
The market has sorted into a few shapes, and they line up almost exactly with the stateless/stateful axis:
- Serverless, stateless. Vercel deploys MCP servers via its mcp-handler adapter on Fluid Compute with built-in OAuth, used in production by Zapier and others. The gotchas are honest ones: it's stateless by design (bring Redis/Upstash if you need sessions), there's a request body-size limit, and you must use the Node.js runtime, because the Edge runtime can't run the SDK's StreamableHTTPServerTransport, which needs Node APIs.
- Edge actors, stateful. Cloudflare gives you McpAgent (a Durable Object per session, SQLite state, hibernation) plus an OAuth Provider library so the same Worker is both authorization and resource server. Running npm create cloudflare@latest -- my-server --template=cloudflare/ai/demos/remote-mcp-authless gets you a running endpoint in minutes.
- Push-a-repo, managed. FastMCP Cloud deploys a Python FastMCP server straight from a GitHub repo to a live URL in about a minute, auto-detecting dependencies and handling the OAuth wiring for you, with a free personal tier and PR preview deploys. Smithery hosts on its own auto-scaling infra (https://server.smithery.ai/<your-server>) via a smithery.yaml and doubles as a 7,000-server public registry for discovery.
- Persistent and cheap-at-idle. Fly.io is the pragmatic choice for wrapping an existing stdio server: running fly mcp launch "uvx mcp-server-time" --claude --server time provisions a Machine, installs the stdio server, and wires up the client, with bearer-token auth on by default and auto-stop/auto-suspend so an idle server costs nearly nothing.
Don't skip the lock
One non-negotiable, regardless of platform: as of the 2025-06-18 revision, authorization for HTTP transports is built on OAuth 2.1. The spec technically leaves authorization optional, but for an internet-reachable server that is not a real choice, and once you implement it the requirements are firm. Your server acts as a resource server, not an identity provider; it MUST publish Protected Resource Metadata at /.well-known/oauth-protected-resource per RFC 9728; and it MUST return a WWW-Authenticate header on a 401 that points clients to that metadata. The "ship a Bearer sk-..." pattern is fine on localhost and non-compliant the instant your server has a public URL. (The audience-validation details, the part that actually stops token misuse, are their own subject.) Several of the platforms above exist largely to spare you this plumbing, which is a perfectly good reason to use them.
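The discovery half of that plumbing is small enough to sketch. The example below assumes a hypothetical resource at `https://mcp.example.com` and a hypothetical authorization server at `https://auth.example.com`; `respond` and `validate_token` are illustrative names, not part of any SDK. The `resource` and `authorization_servers` fields and the `resource_metadata` challenge parameter come from RFC 9728. Token validation is deliberately a pluggable callable that denies by default, since this sketch covers discovery only.

```python
import json

RESOURCE = "https://mcp.example.com"      # hypothetical public URL of this server
AUTH_SERVER = "https://auth.example.com"  # hypothetical authorization server
METADATA_PATH = "/.well-known/oauth-protected-resource"


def protected_resource_metadata():
    """Minimal RFC 9728 document: who we are and who issues tokens for us."""
    return {"resource": RESOURCE, "authorization_servers": [AUTH_SERVER]}


def respond(path, authorization=None, validate_token=lambda token: False):
    """Return (status, headers, body) for a request to this resource server."""
    if path == METADATA_PATH:
        body = json.dumps(protected_resource_metadata())
        return 200, {"Content-Type": "application/json"}, body

    token = None
    if authorization and authorization.startswith("Bearer "):
        token = authorization[len("Bearer "):]

    # validate_token is a placeholder for real checks (signature, expiry, audience).
    if token is None or not validate_token(token):
        challenge = f'Bearer resource_metadata="{RESOURCE}{METADATA_PATH}"'
        return 401, {"WWW-Authenticate": challenge}, ""

    return 200, {}, "ok"
```

A client that hits the 401 follows the `resource_metadata` URL, reads `authorization_servers`, and runs the OAuth flow there. Your server never issues tokens itself.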
And if you're running a local HTTP server during development, the spec asks one more thing that's easy to forget: validate the Origin header and bind to 127.0.0.1, not 0.0.0.0. Otherwise a webpage in the user's browser can quietly DNS-rebind its way into your tool server. A laptop is not a private network.
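A minimal sketch of that check for a development server follows. The allow-list, the port, and the function name `origin_allowed` are assumptions for illustration. So is the choice to accept requests with no Origin header at all, on the theory that non-browser clients usually omit it; tighten that if your threat model calls for it.

```python
from urllib.parse import urlsplit

# Bind to loopback only, never 0.0.0.0. The port is arbitrary.
BIND_ADDRESS = ("127.0.0.1", 8000)

# Hosts a browser page may legitimately be served from during development.
ALLOWED_ORIGIN_HOSTS = {"localhost", "127.0.0.1"}


def origin_allowed(origin):
    """Decide whether a request's Origin header value is acceptable."""
    if origin is None:
        # Assumption: no Origin header means a non-browser client.
        return True
    parts = urlsplit(origin)
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_ORIGIN_HOSTS


print(origin_allowed("http://localhost:8000"))      # True
print(origin_allowed("https://evil.example"))       # False
print(origin_allowed("http://localhost.evil.example"))  # False
```

Requests that fail the check should be rejected before any tool runs. `BIND_ADDRESS` is the tuple you would hand to whatever server you start, for example the standard library's `http.server.HTTPServer`.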
The protocol gives you two transports and one genuinely consequential decision. Make the stateless-versus-stateful call first, in the cold light of how you intend to scale, and the platform almost picks itself. Make it by accident — one Mcp-Session-Id, one map in memory — and you'll meet it again at 2 a.m., the day you add the second replica.



