Pick any LLM observability tool shipped in 2026 (Langfuse, Arize Phoenix, Honeycomb, Datadog, New Relic) and read the marketing for ten seconds. You will hit the same three words: open, portable, vendor-neutral. The pitch is that your traces aren't hostage to one dashboard: export them, take them anywhere, no lock-in. It's a good pitch. It's also a claim about something most of those pages never name: a shared vocabulary for what the attributes on a span emitted by an LLM call are actually called.
That vocabulary is the OpenTelemetry GenAI semantic conventions, and the uncomfortable thing about it — the thing the marketing skips — is that it is still stamped Development.
The portability is in the attribute names, not the SDK
When people argue about agent observability they argue about SDKs: OpenLLMetry versus OpenInference versus whatever your framework emits natively. That's the wrong layer to fight on. An SDK is just a thing that produces spans. What makes a span from LangChain interchangeable with a span from CrewAI — what lets one backend chart both without per-framework glue — is that they agree on the keys: that the model is gen_ai.request.model, the prompt tokens are gen_ai.usage.input_tokens, the reason it stopped is gen_ai.response.finish_reasons. Agree on the names and any backend can read any emitter. Disagree, and "portable" means "portable to a tool that wrote a custom adapter for your stack."
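A minimal sketch of that point, using plain Python dictionaries in place of real spans. The two emitters, the model names, and the `input_tokens_by_model` helper are hypothetical; only the gen_ai.* keys are the ones named above. Because both emitters agree on the keys, one function can aggregate both without knowing which framework produced which span.

```python
# Hypothetical span attributes from two different emitters. Only the
# gen_ai.* keys are taken from the conventions; the values are made up.
span_from_framework_a = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "model-a",
    "gen_ai.usage.input_tokens": 120,
    "gen_ai.response.finish_reasons": ["stop"],
}
span_from_framework_b = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "model-b",
    "gen_ai.usage.input_tokens": 80,
    "gen_ai.response.finish_reasons": ["length"],
}


def input_tokens_by_model(spans):
    """Sum prompt tokens per model using only the shared attribute names."""
    totals = {}
    for attrs in spans:
        model = attrs.get("gen_ai.request.model", "unknown")
        tokens = attrs.get("gen_ai.usage.input_tokens", 0)
        totals[model] = totals.get(model, 0) + tokens
    return totals


totals = input_tokens_by_model([span_from_framework_a, span_from_framework_b])
print(totals)  # {'model-a': 120, 'model-b': 80}
```

If one emitter used a private key for the token count instead, its total would silently come out as zero, which is the "custom adapter" problem in miniature.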
So the semantic convention is the product everyone is actually selling. It's the layer that turns observability from a proprietary feature into a commodity. And there is a structural reason a commodity standard is hard to ship: the moment the names are frozen, every vendor's "open and portable" claim becomes true and checkable, which is precisely the differentiation a vendor would rather keep fuzzy. Standardizing the schema is how the incumbents lose their moat. That tension is why this has taken years, not months.
What's actually in the spec
The current convention defines a small required core and a long recommended tail. Every GenAI span must carry gen_ai.operation.name and gen_ai.provider.name. The model is conditionally required as gen_ai.request.model; token counts, gen_ai.response.id, finish reasons, and the response model are recommended. The actual prompt and completion text — gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions — are opt-in, the spec's polite acknowledgment that your message bodies are usually PII you don't want defaulting into a trace.
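The requirement levels above can be sketched as a small check over a span's attributes. This is an illustration only: the key lists are hand-copied from the paragraph above rather than generated from the spec, and `missing_required`, `redact_content`, and the `capture_content` switch are hypothetical names, not part of any OpenTelemetry SDK.

```python
# Partial, hand-copied lists for illustration; not a complete copy of the spec.
REQUIRED = ("gen_ai.operation.name", "gen_ai.provider.name")
OPT_IN_CONTENT = (
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.system_instructions",
)


def missing_required(attrs):
    """Return the required keys that are absent from a span's attributes."""
    return [key for key in REQUIRED if key not in attrs]


def redact_content(attrs, capture_content=False):
    """Drop opt-in message bodies unless capture was explicitly enabled."""
    if capture_content:
        return dict(attrs)
    return {k: v for k, v in attrs.items() if k not in OPT_IN_CONTENT}


# A made-up span; the message value is a placeholder, not the real schema.
span = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "example-model",
    "gen_ai.input.messages": "<user prompt text>",
}

print(missing_required(span))        # ['gen_ai.provider.name']
print(sorted(redact_content(span)))  # ['gen_ai.operation.name', 'gen_ai.request.model']
```

Defaulting `capture_content` to False mirrors the opt-in stance: message bodies only reach the trace when someone deliberately turns them on.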
The operation names are the tell about ambition. They aren't just chat and embeddings and text_completion. They now include execute_tool, invoke_agent, invoke_workflow, plan, retrieval, and a whole memory family — search_memory, upsert_memory, create_memory_store. OpenTelemetry isn't trying to standardize the LLM call anymore. It's trying to standardize the agent loop: the tool calls, the planning step, the retrieval, the reads and writes to whatever you're using as memory. Sometime in 2026 the GenAI conventions were also split into their own repository, semantic-conventions-genai, separate from core OTel and explicitly scoped to cover GenAI clients, MCP, and provider-specific shapes. A standard gets its own repo when it's about to move fast.
The env var that admits the truth
Here is the detail that should calibrate your expectations. To adopt the conventions without breaking the people already depending on the old shape, instrumentations honor an environment variable: OTEL_SEMCONV_STABILITY_OPT_IN. Include gen_ai_latest_experimental in its comma-separated list of values and the library emits the newest experimental version of the conventions it supports and stops emitting the old one (v1.36.0 or earlier).
Read that again. There is a supported, documented flag whose entire job is to let you choose which incompatible version of the "standard" your spans speak. That is not a footnote; that is the spec telling you, in config, that the names can still move under you — and they have. The provider attribute that's required today, gen_ai.provider.name, is itself a rename of what older instrumentation called gen_ai.system. If you built a dashboard on the old key, the standard's progress is your outage.
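A rough sketch of both sides of that rename, in stdlib Python. It assumes the variable holds a comma-separated list of values, and it assumes an emitter switches from gen_ai.system to gen_ai.provider.name when the opt-in is present; real instrumentations may differ in the details. The function names are hypothetical, and `normalize_provider` is a reader-side shim of my own, not something the spec defines.

```python
import os

OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"
LATEST_GENAI = "gen_ai_latest_experimental"


def genai_latest_enabled(environ=None):
    """True if the opt-in variable lists gen_ai_latest_experimental."""
    environ = os.environ if environ is None else environ
    raw = environ.get(OPT_IN_VAR, "")
    # Assumption: values are comma-separated and may carry stray whitespace.
    values = {item.strip() for item in raw.split(",") if item.strip()}
    return LATEST_GENAI in values


def provider_key(environ=None):
    """Pick the provider attribute an emitter would write (assumed behavior)."""
    if genai_latest_enabled(environ):
        return "gen_ai.provider.name"
    return "gen_ai.system"


def normalize_provider(attrs):
    """Reader-side shim: move the old key to the new one if only the old exists."""
    out = dict(attrs)
    if "gen_ai.provider.name" not in out and "gen_ai.system" in out:
        out["gen_ai.provider.name"] = out.pop("gen_ai.system")
    return out


print(provider_key({}))                          # gen_ai.system
print(provider_key({OPT_IN_VAR: LATEST_GENAI}))  # gen_ai.provider.name
print(normalize_provider({"gen_ai.system": "example-provider"}))
```

A shim like `normalize_provider` buys time for old dashboards, but it is a patch over the rename, not a substitute for knowing which version your emitters speak.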
What to actually do about it
None of this is an argument against the conventions. It's the opposite. A shared schema, even a moving one, is worth more than every vendor's bespoke field names, and the adoption is real: the major backends ingest gen_ai.* today, and the popular frameworks emit it. The argument is against reading "open and portable" as "stable" before the spec says so.
Concretely: instrument against the conventions, not against your SDK's private fields, so you can change tools later. Pin the stability opt-in explicitly instead of inheriting whatever default your library ships this quarter. Treat your own dashboards and alerts as code that breaks when an attribute is renamed, because it will be. And when a vendor tells you their tracing is "open and portable," ask the only question that matters: which version of the GenAI conventions do you emit, and do you honor the opt-in? If they can answer that, the portability is real. If the room goes quiet, you've found the lock-in.
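One way to treat dashboards as code is a check that runs in CI and fails when a query still references a renamed attribute. The sketch below is hypothetical throughout: the query syntax is invented, `stale_attributes` is an illustrative helper, and the rename map contains only the one rename mentioned in this article.

```python
import re

# Renames this check knows about. Only gen_ai.system -> gen_ai.provider.name
# comes from the text; extend the map as the conventions change.
RENAMED = {"gen_ai.system": "gen_ai.provider.name"}

# Matches dotted gen_ai.* attribute names inside a query string.
ATTR_PATTERN = re.compile(r"\bgen_ai(?:\.[a-z_]+)+")


def stale_attributes(queries):
    """Return (query_name, old_key, new_key) for each renamed key still in use."""
    findings = []
    for name, text in queries.items():
        for key in ATTR_PATTERN.findall(text):
            if key in RENAMED:
                findings.append((name, key, RENAMED[key]))
    return findings


# Made-up dashboard definitions in an invented query language.
dashboard_queries = {
    "cost_by_provider": "sum(gen_ai.usage.input_tokens) by gen_ai.system",
    "stops": "count() by gen_ai.response.finish_reasons",
}

findings = stale_attributes(dashboard_queries)
for query_name, old_key, new_key in findings:
    print(f"{query_name}: replace {old_key} with {new_key}")
```

Run against the two sample queries, this flags only `cost_by_provider`, which is the dashboard that would have gone blank after the rename.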



