There's a moment every team building agents hits, usually around 2am, usually after the third "it worked in the demo." You open your logs to find out why the agent did the dumb thing. And the logs are useless. Not empty — useless. They tell you tool_call: search and tool_call: search and tool_call: search, fourteen times, and nothing about why the model decided to search a fourteenth time. The line printed. The line told you nothing.
This is not a logging bug. It's a category error. And the fix that the entire LLM observability industry landed on is more interesting than the tools themselves.
The surrender
Here's the non-obvious thing. When AI got weird and nondeterministic and everyone needed new ways to watch it, the LLM-native observability vendors did not invent a new standard. They didn't roll a bespoke "AgentTrace v1" format and try to make it stick. They surrendered — to OpenTelemetry, the deeply unglamorous, vendor-neutral telemetry spec that backend engineers have been wiring into microservices since 2019.
OTel is the kind of infrastructure nobody writes a launch post about. It's the plumbing under your plumbing. And yet the GenAI tooling world, which loves nothing more than a novel abstraction, looked at the problem and concluded the boring 2019 answer was already the right shape. They didn't replace it. They extended it, with a set of GenAI semantic conventions — agreed-upon attribute names for things like the model, the prompt, the token counts, the tool call.
Six hundred stars. That's the count on the spec repo that holds those conventions, the dependency at the bottom of the stack everyone else is building on, and almost nobody stars a spec repo. But the gen-ai directory in there (the agent spans, the tool spans, the gen_ai.operation.name attribute) is the Rosetta Stone. It's why a trace produced by one tool can be read by another. The convergence is real and it is documented in plain Markdown.
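To make "agreed-upon attribute names" concrete, here is a minimal sketch of what two spans from one agent run might carry. It uses plain dictionaries rather than a real SDK. The model name and token counts are invented, the describe helper is hypothetical, and the keys reflect the GenAI semantic conventions as I understand them at the time of writing. The conventions are still evolving, so check the spec before depending on any key.

```python
# Hypothetical attribute sets for two spans in one agent run.
# Keys follow the OpenTelemetry GenAI semantic conventions as understood
# at the time of writing; all values are invented for illustration.

chat_span = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "example-model",  # placeholder model name
    "gen_ai.usage.input_tokens": 812,
    "gen_ai.usage.output_tokens": 64,
}

tool_span = {
    "gen_ai.operation.name": "execute_tool",
    "gen_ai.tool.name": "search",
}


def describe(attributes: dict) -> str:
    """Summarize a span using only the shared attribute names."""
    operation = attributes.get("gen_ai.operation.name", "unknown")
    target = (
        attributes.get("gen_ai.tool.name")
        or attributes.get("gen_ai.request.model")
        or "?"
    )
    return f"{operation} {target}"


for span in (chat_span, tool_span):
    print(describe(span))
```

The point of the sketch is that describe knows nothing about which vendor produced the span. It only knows the shared names, which is the whole reason one tool's trace is legible to another.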
Why a trace, not a log
Now the part that actually matters, the reason the surrender was correct and not just convenient.
A log assumes you already know what line will execute. You write logger.info("processing order") because you wrote the code that processes the order, in that order, every time. The log is a confession from deterministic code: here is where I was.
An agent has no such code path. The control flow — search, then read, then call this tool, then give up and ask the user — is decided at runtime, by a model, based on inputs you never saw. There's no line you could have pre-placed a logger.info on, because you didn't write the branch. The branch was generated.
A trace is the right shape for that. A trace is a tree of spans — a parent decision, its children, their children — and crucially it records a structure you couldn't have predicted. It doesn't say "I was at line 40." It says "I chose to do these four things in this nesting, and here's what each cost and returned."
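If the tree idea feels abstract, a toy version fits in a few dozen lines. This is a sketch, not how any real tracer is implemented: the Tracer class, the render helper, and the plan argument (a stand-in for decisions a model would make at runtime) are all hypothetical. What it shows is that the nesting is recorded as it happens, not declared in advance.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


class Tracer:
    """Toy tracer: nesting of `with` blocks becomes nesting of spans."""

    def __init__(self, root_name: str):
        self.root = Span(root_name)
        self._stack = [self.root]

    @contextmanager
    def span(self, name: str, **attributes):
        child = Span(name, attributes)
        self._stack[-1].children.append(child)  # attach to the current parent
        self._stack.append(child)
        try:
            yield child
        finally:
            self._stack.pop()  # restore the parent even if the body raised


def render(span: Span, depth: int = 0) -> list:
    """Return the tree as indented lines, one per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def run_agent(tracer: Tracer, plan: list) -> None:
    # `plan` stands in for choices a model would make at runtime.
    for step in plan:
        with tracer.span(f"chat {step['model']}"):
            for tool in step["tools"]:
                with tracer.span(f"execute_tool {tool}"):
                    pass  # a real tool call would happen here


tracer = Tracer("invoke_agent demo")
run_agent(tracer, [
    {"model": "example-model", "tools": ["search", "read"]},
    {"model": "example-model", "tools": []},
])
print("\n".join(render(tracer.root)))
```

Change the plan and the tree changes shape with no edits to run_agent. Nothing in the code says how many spans there will be or how deep they go, which is exactly the property a pre-placed log line can't have.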
Observability for agents is less about catching errors and more about replaying a decision you didn't write.
That's the whole shift. You are not debugging your logic. You are reconstructing the model's logic, after the fact, from the tree it left behind. The trace is the new log because the log assumed an author the agent doesn't have.
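Here is what "reconstructing" can look like in practice, as a hedged sketch. It assumes a trace already exported and reduced to nested dictionaries. That shape, the query attribute, and both helper functions are hypothetical, not any backend's real export format. The idea is to walk the tree in recorded order and ask which tool inputs the agent sent more than once.

```python
from collections import Counter

# A hypothetical exported trace, reduced to nested dictionaries.
# The "query" attribute is an assumption made for this example.
trace = {
    "name": "invoke_agent demo",
    "children": [
        {"name": "execute_tool search",
         "attributes": {"query": "refund policy"}, "children": []},
        {"name": "execute_tool search",
         "attributes": {"query": "refund policy 2024"}, "children": []},
        {"name": "execute_tool search",
         "attributes": {"query": "refund policy"}, "children": []},
    ],
}


def walk(span: dict, path: tuple = ()):
    """Yield (path, span) for every span, depth first, in recorded order."""
    here = path + (span["name"],)
    yield here, span
    for child in span.get("children", []):
        yield from walk(child, here)


def repeated_tool_inputs(root: dict) -> dict:
    """Return the tool inputs that appear more than once in the trace."""
    seen = Counter()
    for _, span in walk(root):
        if span["name"].startswith("execute_tool "):
            query = span.get("attributes", {}).get("query")
            seen[(span["name"], query)] += 1
    return {key: count for key, count in seen.items() if count > 1}


print(repeated_tool_inputs(trace))
```

A flat log would show three identical search lines. The tree, with inputs attached, shows that two of the three searches were the same question asked twice, which is where the investigation should start.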
The tools that embody it
So who built the good versions of this? These are the repos worth your time, framed by the thesis: every one of them speaks OTel, and the best of them treat the trace, not the line, as the unit of truth.
Langfuse is the heavyweight — twenty thousand stars, ClickHouse under the hood for the columnar firehose that agent traces become at scale, and it ingests via OpenTelemetry alongside its own SDKs. (ClickHouse liked it enough to acquire the company in January.) It's the one to reach for when "a few traces" has become "millions of spans a day and we need to query them."
Phoenix is the one I'd hand a skeptic. It runs on your laptop, it's built directly on OpenTelemetry, and it carries OpenInference — Arize's complementary convention layer that slots into OTel rather than fighting it. You can have a trace tree on screen in the time it takes to read this paragraph, which is the right way to learn why traces beat logs: by seeing the tree.
OpenLLMetry is the purest expression of the surrender. It is, essentially, OTel instrumentation packaged for the GenAI stack — auto-instrument your OpenAI, Anthropic, Pinecone, Chroma calls and emit standard spans to any OTel backend, Datadog included. No lock-in, no new format. It's the proof that the standard is the product.
And MLflow, the old guard of ML lifecycle tooling, grew a tracing layer that — as of recent releases — offers full OpenTelemetry support, one-line auto-tracing for the major frameworks, and traces that merge cleanly with whatever OTel instrumentation you already had on your HTTP and database calls. Even the incumbent didn't invent its own thing.
The tell
Notice what isn't here: a single proprietary trace format that won. Four serious tools, four organizations, and they all converged on a spec a backend team wrote for microservices before anyone shipped a production agent.
That's the tell. When an industry that prizes novelty quietly adopts the boring standard, it's because the boring standard was already correct about the underlying problem — that the truth about a nondeterministic system lives in its tree of choices, not in the lines someone hoped it would execute. The log assumed an author. The agent doesn't have one. The trace is what's left.



