Every prompt-management tool on the market sells the same headline feature, and it is a genuinely good one: you can change your agent's prompt without shipping a new build. LangSmith versions each prompt as a commit hash you can tag staging or prod. Langfuse gives every edit an immutable version number and lets a production label point at whichever one you choose. Braintrust, PromptLayer, Latitude: same core promise. Stop redeploying code to fix wording. Edit in a UI, click promote, done.

Now say that feature back in the language a reliability engineer would use. It lets a person change what your agent does in production with no pull request, no code review, no CI run, no eval gate, and no guarantee the model underneath is the one the prompt was written against. That is the identical feature. It's just described by what it removes instead of what it adds.

The vendor Jozu gave this its correct name: prompt drift is the new shadow deploy. Your agent's outputs change, but none of your normal release signals fire — no version bump, no image digest change, no PR in the history. When something regresses next Tuesday, nothing in your deploy log points at the 11-word edit that caused it.
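One way to give a prompt edit a release signal of its own is to fingerprint the prompt and attach that fingerprint to every logged response. The following is a minimal sketch under assumptions, not a feature of any tool named here: `prompt_fingerprint`, `log_record`, the record fields, and the model identifier are all hypothetical names chosen for illustration.

```python
import hashlib


def prompt_fingerprint(prompt_text: str, model_id: str) -> str:
    """Return a short, stable digest of the model id plus the prompt text."""
    payload = f"{model_id}\n{prompt_text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def log_record(request_id: str, prompt_text: str, model_id: str) -> dict:
    """Build a hypothetical log entry that carries the prompt fingerprint."""
    return {
        "request_id": request_id,
        "model_id": model_id,
        "prompt_fingerprint": prompt_fingerprint(prompt_text, model_id),
    }


# Hypothetical prompts and model id, for illustration only.
before = log_record("req-1", "You are a helpful support agent.", "example-model-2024-01-01")
after = log_record("req-2", "You are a concise support agent.", "example-model-2024-01-01")

# A fingerprint that differs between two requests marks the point where the prompt changed.
print(before["prompt_fingerprint"], after["prompt_fingerprint"])
```

With something like this in place, a regression can be lined up against the first request whose fingerprint changed, even when no build was shipped.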

The tell: they had to add the governance back

If the decoupled model were simply safe, you'd expect the tools to leave it alone. Instead, watch what they shipped next.

Langfuse added protected prompt labels — a feature that lets an admin lock the production label so it can't be casually edited or deleted. Braintrust's pitch now centers on a GitHub Action that runs evaluations whenever a prompt changes in a pull request, so prompt updates follow "the same review and validation process as code changes." Both companies sell the decoupling and sell you the controls to re-couple it. That's not a contradiction; it's an admission. The raw "edit and promote" primitive was too sharp to hand out unguarded, so they bolted the guardrails back on.

The feature every prompt tool advertises as the benefit — ship a prompt without shipping code — is, stated precisely, the bug: change production without the checks a deploy carries.

The claim: a prompt CMS is not automatically safer than git

This is the part that runs against the marketing. Moving prompts out of your codebase and into a dedicated store does not, by itself, reduce deploy risk. It relocates it, and often hides it. A prompt in git is at least protected by the machinery around git: a diff, a reviewer, a test run, an atomic deploy alongside a known model. Strip a prompt out of that and drop it in an ungoverned UI, and you've removed all four and replaced them with a "Promote" button.

There are smart people on the pro-CMS side, and they're not wrong about the pain. Giorgos Myrianthous argues prompts are content, not code — that coupling them to code deploys makes every wording change a slow engineering ritual and locks non-engineers out. On the other side, Hamel Husain argues prompts are software artifacts that belong in git, "versioned, reviewed, and deployed atomically with the application code," and warns that dedicated tools "risk creating additional layers of indirection."

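The way to end that argument is to notice it's the wrong axis. Where the prompt lives, repo or CMS, is not what determines safety. Four controls do: an immutable version for every edit, a diff that someone (or something) reviews, a pin to the exact model the prompt was validated against, and an eval gate that can say no.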

A prompt CMS that carries all four beats prompts-in-code, because it adds speed and non-engineer access on top of the safety. A prompt CMS missing them is strictly worse than a hardcoded string, because it has all of the string's rigidity problems solved and all of production's safety problems reintroduced. The store is neutral. The controls are everything.
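To make "the controls are everything" concrete, here is a rough sketch of a promotion check that is indifferent to where the prompt is stored. It takes the four controls to be the ones the closing section lists: an immutable version, a reviewed diff, a model pin, and an eval gate. `PromptCandidate`, its fields, and the 0.9 threshold are hypothetical, not any vendor's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PromptCandidate:
    """Hypothetical record of a prompt version awaiting promotion."""
    version_id: Optional[str]      # immutable version identifier
    reviewed_by: Optional[str]     # who (or what) approved the diff
    model_snapshot: Optional[str]  # exact model the prompt was validated against
    eval_score: Optional[float]    # score from the eval run, if one ran


def missing_controls(c: PromptCandidate, min_score: float) -> list[str]:
    """Return the names of the controls this candidate does not satisfy."""
    problems = []
    if not c.version_id:
        problems.append("immutable version")
    if not c.reviewed_by:
        problems.append("reviewed diff")
    if not c.model_snapshot:
        problems.append("model pin")
    if c.eval_score is None or c.eval_score < min_score:
        problems.append("eval gate")
    return problems


def can_promote(c: PromptCandidate, min_score: float = 0.9) -> bool:
    """Allow promotion only when all four controls are present."""
    return not missing_controls(c, min_score)


candidate = PromptCandidate("v8", None, None, 0.5)
print(can_promote(candidate), missing_controls(candidate, 0.9))
```

The same function could run as a CI check on a git repo or as a hook in front of a prompt tool's promote action; nothing in it depends on the store.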

Why the model pin is not optional

The control teams skip most often is the third one, and it's the one that quietly ruins the other three. You cannot version a prompt in isolation, because an agent's behavior is a joint function of the prompt and the model and the tool definitions. Freeze a perfect prompt against today's weights and you've frozen one leg of a tripod.

This bites hardest through provider aliases. A string like gpt-4o or a -latest tag is a moving pointer; the weights, and sometimes the safety filters, can change under it without a version bump on your side. Anthropic's own model-migration guidance is explicit that a new model is not behavior-neutral — it tells you to re-run your prompts and evals before the old one retires, which is precisely the promise a naked alias can't make. So a prompt "version" that doesn't record the exact model snapshot it was validated against isn't a version of the thing that produces behavior. It's a version of one input to it. Pin the snapshot, store it inside the prompt version, and treat the unit you promote as the (prompt, model, eval-baseline) triple — because that triple, not the wording alone, is what your users actually experience. It's the same reason a model migration is a project and not a find-and-replace.
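The (prompt, model, eval-baseline) triple can be sketched as a single immutable record that refuses a moving alias. This is an illustration under a stated assumption: it treats a model id as pinned only if it ends in a date stamp, a heuristic that will not match every provider's naming scheme. `Release` and `is_pinned` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumption: a pinned snapshot id ends in a date stamp,
# e.g. "-2024-08-06" or "-20240806". Providers differ; adjust as needed.
_DATED_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_pinned(model_id: str) -> bool:
    """Heuristic check: True if the id looks like a dated snapshot, not an alias."""
    return bool(_DATED_SUFFIX.search(model_id))


@dataclass(frozen=True)
class Release:
    """The unit that gets promoted: prompt version, model snapshot, eval baseline."""
    prompt_version: str
    model_snapshot: str
    eval_baseline: float

    def __post_init__(self) -> None:
        # Reject moving pointers such as "gpt-4o" or anything ending in "-latest".
        if not is_pinned(self.model_snapshot):
            raise ValueError(
                f"{self.model_snapshot!r} looks like an alias; pin a dated snapshot"
            )


pinned = Release("v12", "gpt-4o-2024-08-06", eval_baseline=0.93)

try:
    Release("v12", "gpt-4o", eval_baseline=0.93)
except ValueError as err:
    print(err)
```

Because the record is frozen, changing any one of the three fields means creating a new release, which is the point: a model swap becomes as visible as a wording change.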

The short version

Keep the convenience; refuse the shadow deploy. However you store prompts, make a prompt change carry what a code change carries: an immutable version, a diff someone (or something) reviews, the model it was validated against, and an eval gate that can say no. Do that in git with a CI check, or do it in a governed prompt tool with protected labels and an eval action — the store doesn't matter. What matters is that "promote to production" stops being a button anyone can press blind, and goes back to being what it always was: a deploy.