You need to put a face on an LLM app — a prototype for stakeholders, an internal tool, a demo for the model you just fine-tuned. In Python, three names come up: Streamlit, Gradio, Chainlit. They get lumped together as interchangeable "quick UI" libraries, and that framing will lead you to the wrong one.

They are not three flavors of the same thing. Each is built around a different execution model — the invisible decision about when your code runs and what survives between runs — and that model is exactly what makes one job trivial in a given framework and a running battle in the other two. Pick by the model, not the screenshots.

Streamlit: the whole script, every time

Streamlit (owned by Snowflake since 2022) has the simplest mental model in the category, and it's also the source of every frustration people have with it. On every interaction — a button click, a slider drag, a typed message — Streamlit reruns your entire script from top to bottom in a blank slate. No variable survives unless you explicitly stash it in st.session_state.

Turn Python scripts into data apps; rerun-the-whole-script execution model
★ 45kPythonstreamlit/streamlit

For a dashboard — charts, filters, a table, maybe an LLM summarization panel — this is wonderful. You write top-down procedural code and it just paints. For a stateful, streaming, multi-turn agent, you're swimming upstream: token streaming needs st.write_stream, conversation state needs session_state, and partial updates need st.fragment to avoid redrawing the world on every keystroke. It's all possible. It's just friction you inherited from a model designed for data apps, not conversations.

Gradio: one function, in and out

Gradio (owned by Hugging Face since 2021) is functional. You define inputs, a function, and outputs, and Gradio wires up the event loop. It was born to do one thing supremely well: wrap a single model into a shareable demo in a dozen lines, and it's native to Hugging Face Spaces, so "deploy" is a git push.

Build and share ML model demos in Python; HF Spaces-native, auto REST + MCP
★ 43kPythongradio-app/gradio

The underrated part: a Gradio app isn't just a frontend. It auto-generates a REST API for every function, and recent versions can expose those functions as an MCP server with launch(mcp_server=True) — so the same code that demos your model to a human can serve it as a tool to an agent. For chat specifically, gr.ChatInterface gives you a competent window fast. What you don't get for free is rich rendering of a multi-step agent's reasoning.

Chainlit: the conversation is the framework

Chainlit is the only one of the three that started from the conversation. Its execution model is a chat-message async event loop, and it ships the things the other two make you build by hand: native streaming, message threading, persisted history, user feedback collection, and authentication.

Chat-first Python framework for conversational AI; native agent-step rendering
★ 12kPythonChainlit/chainlit

The real differentiator is Steps: Chainlit automatically renders an agent's intermediate reasoning — tool calls, retrieved context, sub-chain output — as collapsible cards inside the chat. If you've ever wanted users (or yourself, debugging) to see what the agent did between question and answer, that UI is a config flag in Chainlit and a weekend project in the others.

The honest footnote: Chainlit's founding team stepped back from active development on May 1, 2025, and the project is now community-maintained while the original team builds a new venture, Summon. It's still actively released and widely used — but a governance change is a real input for a long-term bet.

How to actually choose

Notice what's not a differentiator: the license. All three are Apache-2.0, so unlike the graph-database and database tiers where licensing is the trap, here you choose purely on fit:

And if your target is a production React app rather than a Python prototype, none of these is the answer — that's the CopilotKit / Vercel AI SDK layer, a different decision entirely. The Python three are for getting from a working agent to a usable interface in an afternoon. Match the execution model to the job and the afternoon stays an afternoon.