The agent the timeline can't stop calling "self-improving" is Hermes, shipped by Nous Research in February 2026 and given a second wind by an NVIDIA launch positioning it as the thing to run around the clock on a DGX Spark or an RTX PC. The pitch is genuinely novel and the demo is genuinely good. But the word doing all the work — self-improving — points at the wrong layer. Hermes does improve. What improves is not the model.
What the agent actually does#
Strip the marketing and the loop is concrete. Hermes runs as a terminal UI, or as a gateway you talk to from Telegram, Discord, Slack, or email. When it solves a task and notices the shape repeating, it writes a reusable skill — a plain Markdown file dropped into ~/.hermes/skills/. It keeps a memory: outcomes go into a local SQLite store that it searches across sessions with FTS5 full-text queries and condenses with LLM summarization. It keeps a running model of the user, deepened each session. And it can schedule itself, running briefings or backups unattended through a natural-language cron.
Every one of those is a real, useful capability. None of them is the model getting smarter. The weights of whatever LLM you've pointed Hermes at — Qwen 3.6, or anything else, since it's model-agnostic — are byte-for-byte identical before and after Hermes "learns" something. The improvement is entirely in the files, the database, and the schedule that surround the frozen model.
Hermes doesn't learn in the training sense. It writes down what worked and reads it back.
Self-improvement is just self-authored context engineering#
This is not a knock — it's the interesting part, and it's worth naming precisely. The field spent the last year converging on the idea that an agent's competence lives in its harness, not only in its model: the loop, the tools, the retrieved context, the skills you load in. We learned to do context engineering by hand — curating what the model sees so a fixed set of weights performs better on your task.
Hermes automates that. It is an agent doing its own context engineering: deciding what's worth turning into a skill, writing the skill, storing the memory, and pulling both back at the right moment. Seen that way, "self-improving" is an honest description of an unusual thing — but the unusual thing is a harness that edits itself, not a model that trains itself. The distinction matters because it tells you where the capability ceiling is (wherever the base model's ceiling is) and where the risk is (in the self-written wrapper).
It also clarifies how Hermes differs from the skills ecosystem it resembles. Claude's Agent Skills and MCP are authored by people and then loaded; Hermes's skills are authored by the agent from your real workflows and stored in a portable, open format. The file is not the novelty. The author is.
The question the demo doesn't answer#
Here's the part the launch glosses. When a human writes an automation, it gets reviewed — by the author, by a teammate, by the first bug it causes. When Hermes distills a skill from experience, nothing checks that the skill is correct. A procedure that worked once because of luck, a stale assumption, or a one-off environment can get written down as a reusable skill and then invoked with full confidence on the next task that looks similar. Memory has the same exposure: a wrong conclusion, summarized and recalled, becomes a premise.
This is the agent-memory problem the field already keeps running into, now with the agent itself as the writer. Persistence without verification doesn't compound learning — it compounds whatever was in the first draft. The same trait that makes Hermes feel alive (it acts on what it wrote yesterday) is the trait that makes a bad skill durable.
None of which makes Hermes less worth running. An always-on local agent that accretes a library of your actual workflows is a real shift from the stateless chat session, and doing it on hardware you own rather than a metered API is the genuinely new affordance NVIDIA is selling. Just price the claim correctly. The model isn't improving itself. The harness is writing itself — and the next hard problem isn't getting it to write more skills. It's deciding which of the skills it writes you're willing to trust.



