SnackOnAI Engineering | Senior AI Systems Researcher | Technical Deep Dive | May 12, 2026

OpenClaw has 371k stars and a community that describes it as "magical." It also has unbounded memory files, a security architecture with documented prompt injection vulnerabilities, and a skills system where "natural language markdown instruction" is the only API contract. These are not minor limitations. They are architectural choices that reflect OpenClaw's origin as a weekend project that grew faster than its design.

Hermes Agent (NousResearch, MIT, 143k stars, 22.3k forks) was built by the team behind the Hermes, Nomos, and Psyche model series. The design philosophy reflects a different origin: production deployment discipline applied to personal AI infrastructure. Bounded memory with explicit character limits. Pluggable ContextEngine as an abstract base class. Skills with progressive disclosure (three levels, not one). A pluggable MemoryProvider ABC. Integrated RL training pipeline via Atropos. A Curator system that creates and improves skills autonomously.

This newsletter dissects Hermes Agent as a systems engineering document: the bounded memory architecture and why character limits are a feature not a limitation, the three-level progressive skill disclosure system, the ContextEngine ABC and its compression variants, the seven terminal backends and their deployment tradeoffs, and the closed learning loop that makes Hermes the only agent in this class that creates its own skills from experience.

Scope: Hermes Agent core architecture (run_agent.py, prompt_builder.py, context_engine.py, hermes_state.py), the bounded memory model, progressive skill disclosure, terminal backends, Curator, and the RL training integration. Not covered: OpenClaw direct comparison beyond architectural contrasts, or the agentskills.io open standard beyond Hermes compatibility.

What It Actually Does

Hermes Agent is a self-hosted autonomous agent runtime built by NousResearch, MIT-licensed, with 143k stars and 8,014 commits. Its tagline: "The agent that grows with you." The documentation's precise framing: "The only agent with a built-in learning loop, it creates skills from experience, improves them during use, nudges itself to persist knowledge, and builds a deepening model of who you are across sessions."

Key stats from the official docs and repository:

  • 70+ built-in tools across 28 configurable toolsets

  • 7 terminal backends, including local, Docker, SSH, Daytona, Singularity, and Modal

  • 20+ messaging platform integrations

  • Python (primary runtime), with TypeScript gateway components

  • 3 API modes: chat_completions, codex_responses, anthropic

  • Works with Nous Portal, OpenRouter, OpenAI, Anthropic, or any compatible endpoint

  • Integrated RL training pipeline: batch trajectory export for Atropos

The architectural distinction relative to the broader personal agent ecosystem: Hermes was designed by model trainers building infrastructure for training pipelines. The research-grade discipline shows in the design choices.

The Architecture, Unpacked

Focus on the ContextEngine ABC. This is the abstraction that makes Hermes architecturally superior to a fixed-pipeline agent: context compression is a pluggable interface, not a hardcoded implementation. The default ships with lossy LLM summarization. The interface allows any replacement.

The Code, Annotated

Snippet One: Bounded Memory Architecture (the correct design)

# ~/.hermes/memories/MEMORY.md — the agent's personal notes
# From: hermes-agent.nousresearch.com/docs/user-guide/features/memory

# What the system prompt sees at session start:
MEMORY_PROMPT_INJECTION = """
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations
"""

# Design decisions encoded in this format:
# 1. Hard character limits: MEMORY.md = 2,200 chars (~800 tokens)
#                           USER.md   = 1,375 chars (~500 tokens)
# ← THIS is the trick: bounded memory is a FEATURE, not a limitation.
#   Unbounded memory (OpenClaw's pattern) accumulates stale facts that
#   corrupt context. Bounded memory forces the agent to prioritize:
#   only the most relevant facts survive consolidation.

# 2. Usage percentage displayed to the agent:
#    "67% — 1,474/2,200 chars" tells the agent how much room remains
#    before it must start consolidating. The agent manages its own
#    memory capacity without human intervention.

# 3. Frozen snapshot pattern:
#    Memory is injected ONCE at session start, never updated mid-session.
#    Why? LLM prefix caching. If memory changed mid-session, every turn
#    would produce a different prefix, invalidating the cache and
#    multiplying API costs. The frozen snapshot keeps the prefix stable.

# 4. Changes during a session:
#    When the agent runs `memory(action="add", content="...")`,
#    the change is written to disk immediately (tool response shows live state)
#    but does NOT appear in the current session's system prompt.
#    Next session start: the file is re-read, the new entry appears.

# Memory tool interface. The helper functions below are illustrative
# stand-ins so the snippet runs standalone; the real module layout is
# not shown in the docs:
from pathlib import Path

_MEM = Path.home() / ".hermes" / "memories" / "MEMORY.md"

def _read_memory() -> str:
    return _MEM.read_text(encoding="utf-8") if _MEM.exists() else ""

def append_to_memory_file(content: str) -> None:
    prior = _read_memory().rstrip()
    _MEM.write_text((prior + "\n§\n" if prior else "") + content, encoding="utf-8")

def replace_in_memory_file(old_text: str, new_content: str) -> None:
    # Substring match: rewrite the entry containing old_text
    entries = [new_content if old_text in e else e
               for e in _read_memory().split("\n§\n")]
    _MEM.write_text("\n§\n".join(entries), encoding="utf-8")

def remove_from_memory_file(old_text: str) -> None:
    entries = [e for e in _read_memory().split("\n§\n") if old_text not in e]
    _MEM.write_text("\n§\n".join(entries), encoding="utf-8")

def memory_tool(action: str, content: str = "", old_text: str = "") -> str:
    """
    action: "add" | "replace" | "remove"
    ← No "read" action: memory content is automatically injected.
      The agent doesn't need to "check" memory — it's always there.
    ← "replace" and "remove" use substring matching via old_text
      to find and modify specific entries without line numbers or IDs.
    """
    if action == "add":
        append_to_memory_file(content)
    elif action == "replace":
        # Finds the entry containing old_text, replaces with content
        replace_in_memory_file(old_text=old_text, new_content=content)
    elif action == "remove":
        remove_from_memory_file(old_text=old_text)
    return _read_memory()  # Tool response shows the live on-disk state

The frozen snapshot pattern combined with bounded character limits is the architectural choice that distinguishes a production memory system from a prototype. Most agent frameworks add memories indefinitely and re-read everything on every turn. Hermes bounds and freezes, enabling prefix caching while forcing the agent to maintain a prioritized, coherent knowledge base.
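
For concreteness, here is a minimal sketch of how the session-start injection could be rendered from the on-disk file. The § delimiter, the 2,200-character cap, and the percentage header match the documented format; the function name and rendering details are assumptions for illustration.

from pathlib import Path

MEMORY_FILE = Path.home() / ".hermes" / "memories" / "MEMORY.md"
MEMORY_LIMIT = 2_200  # documented hard cap (~800 tokens)

def render_memory_block(path: Path = MEMORY_FILE, limit: int = MEMORY_LIMIT) -> str:
    """Render the frozen memory snapshot injected once at session start."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    used = len(text)
    bar = "═" * 46
    header = f"MEMORY (your personal notes) [{round(100 * used / limit)}% — {used:,}/{limit:,} chars]"
    return f"{bar}\n{header}\n{bar}\n{text}"

# Rendered once per session; the string never changes mid-session, so every
# turn shares the same prompt prefix and stays prefix-cache friendly.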

Snippet Two: Progressive Skill Disclosure and the ContextEngine ABC

# Two architectural decisions working together:
# 1. Progressive skill disclosure: Level 0 → 1 → 2 on demand
# 2. ContextEngine as a pluggable ABC

# ── Progressive Skill Disclosure ──────────────────────────────────────────────
# From: hermes-agent.nousresearch.com/docs/user-guide/features/skills

# What the agent sees at startup (Level 0 only):
LEVEL_0_SKILLS_IN_PROMPT = """
Available skills: [{name, description, category}, ...]   # ~3k tokens
"""
# ← NOT the full SKILL.md content. Just metadata.
# The agent reads the skill listing and decides which skills apply.

def skills_list() -> list[dict]:
    """Level 0: names, descriptions, categories. Always loaded. ~3k tokens."""
    return [{"name": s.name, "description": s.description, "category": s.category}
            for s in installed_skills]

def skill_view(name: str, path: str | None = None) -> str:
    """
    Level 1: full SKILL.md content — loaded only when the agent decides to use a skill.
    Level 2: specific reference file within the skill — loaded for deep lookup.

    ← THIS is the trick: progressive disclosure matches token cost to actual need.
      If the agent never needs the 'axolotl' fine-tuning skill in a session,
      it never loads its 4,000-token SKILL.md.
      Compare with a system that loads all skill docs into the system prompt:
      wasted context on every turn, regardless of task.
    """
    if path is None:
        return read_skill_markdown(name)        # Level 1: full SKILL.md
    else:
        return read_skill_reference(name, path) # Level 2: specific file

# Conditional activation: skills that appear/disappear based on available tools
# (from SKILL.md frontmatter)
SKILL_METADATA_EXAMPLE = """
metadata:
  hermes:
    fallback_for_toolsets: [web]    # Hide when web toolset available
    requires_toolsets: [terminal]   # Show only when terminal toolset available
"""
# ← duckduckgo-search skill uses fallback_for_toolsets: [web]
#   When FIRECRAWL_API_KEY is set → web toolset active → duckduckgo hidden
#   When key is missing → web toolset absent → duckduckgo appears automatically
#   Zero configuration. The right tool surfaces for the environment.

# ── ContextEngine ABC ──────────────────────────────────────────────────────────
# From: agent/context_engine.py (architecture docs)

from abc import ABC, abstractmethod
from typing import Any

class ContextEngine(ABC):
    """
    Pluggable context management interface.

    ← Why an ABC instead of a fixed implementation?
    Context compression is not a solved problem. Lossy summarization
    (the default) trades faithfulness for brevity. Lossless compression
    (a community alternative) trades speed for faithfulness. Future work
    may produce better methods. Making this an ABC means the agent runtime
    can benefit from improved compression without changing run_agent.py.

    ← This is the architectural pattern that makes Hermes extensible at
    the right layer. Skills extend tool capabilities. ContextEngine extends
    how the agent manages its own cognitive constraints.
    """

    @abstractmethod
    def assemble(self, conversation: list[dict], max_tokens: int) -> list[dict]:
        """Assemble a conversation list that fits within max_tokens."""
        pass

    @abstractmethod
    def after_turn(self, conversation: list[dict]) -> list[dict]:
        """Post-turn hook: update internal state after each agent turn."""
        pass

    @abstractmethod
    def ingest(self, external_content: str) -> str:
        """Pre-process content before injection into context."""
        pass


def estimate_tokens(conversation: list[dict]) -> int:
    # Rough stand-in heuristic: ~4 characters per token across message contents
    return sum(len(str(m.get("content", ""))) for m in conversation) // 4


class DefaultContextCompressor(ContextEngine):
    """
    Default implementation: lossy LLM summarization.
    When context approaches the model's limit, earlier turns are summarized
    using an auxiliary LLM (auxiliary_client.py) to free up token budget.

    Tradeoff: some information is permanently lost.
    Alternative: lossless compression plugins (community-contributed).
    """
    def __init__(self, auxiliary_llm):
        self.auxiliary_llm = auxiliary_llm  # client from auxiliary_client.py

    def assemble(self, conversation, max_tokens):
        if estimate_tokens(conversation) < max_tokens * 0.85:
            return conversation  # No compression needed
        # ← Summarize older turns using auxiliary LLM
        summarized = self.auxiliary_llm.summarize(conversation[:len(conversation)//2])
        return [{"role": "system", "content": f"[Summary of prior context]: {summarized}"}] \
               + conversation[len(conversation)//2:]

    def after_turn(self, conversation):
        return conversation  # Default: no per-turn state updates

    def ingest(self, external_content):
        return external_content  # Default: inject external content as-is

The ContextEngine ABC is the most forward-looking design decision in Hermes. It represents explicit acknowledgment that context management is an unsolved engineering problem, and that any production agent needs the ability to upgrade its compression strategy without rewriting its core loop.
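
Because the interface is an ABC, swapping strategies is a subclass away. Below is a minimal sketch of a faithfulness-preserving variant that evicts whole turns instead of summarizing them; the constructor wiring at the end is an assumption about the runtime's API, not documented usage.

class TruncatingContextEngine(ContextEngine):
    """Lossless-by-omission: drop whole early turns rather than summarize.
    Nothing that survives in context was ever paraphrased by another model."""

    def assemble(self, conversation, max_tokens):
        kept = list(conversation)
        # Evict the oldest non-system turn until the estimate fits the budget.
        while len(kept) > 1 and estimate_tokens(kept) > max_tokens:
            kept.pop(1)  # index 0 holds the system message; keep it
        return kept

    def after_turn(self, conversation):
        return conversation  # No per-turn bookkeeping needed

    def ingest(self, external_content):
        return external_content  # Inject external content verbatim

# Hypothetical wiring; the actual constructor parameter may differ:
# agent = HermesAgent(context_engine=TruncatingContextEngine())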

Hermes in Action: End-to-End Worked Example

Scenario: Hermes on a Modal serverless backend, running a research pipeline via Telegram while the user is on a walk. Task: analyze a dataset of customer support tickets, generate a summary report, and save it to the project directory.

Setup:

# ~/.hermes/config.yaml (representative)
default_provider: nous_portal
default_model: nous-hermes-3-pro
terminal_backend: modal          # serverless: near-zero cost when idle
messaging:
  telegram:
    bot_token: ${TELEGRAM_BOT_TOKEN}
    user_id: ${MY_TELEGRAM_USER_ID}
toolsets_enabled:
  - terminal
  - file
  - web
  - skills
memory_provider: default         # bounded MEMORY.md + USER.md

User (Telegram, while walking):

"analyze the tickets in ~/data/support_q1.csv and write a report with top issues and sentiment"

Step 1: Session start, context assembly

SOUL.md injected:    ~800 tokens (persona, tone rules)
MEMORY.md injected:  ~500 tokens (frozen snapshot, 67% capacity)
  - "User's primary project: ~/data/ contains Q1 analysis datasets"
  - "User prefers reports as markdown in ~/reports/"
  - "Hermes deployed on Modal, file access via SSH tunnel to home server"
USER.md injected:    ~350 tokens
  - "Prefers executive summary first, technical detail after"
  - "Working timezone: UTC-5, language: English"
Skills list injected: ~800 tokens (metadata only, 28 skills available)
Total system prompt: ~2,450 tokens
Model: Nous Hermes 3 Pro (via Nous Portal)
Backend: Modal (spins up on first tool call, ~3s cold start)

Step 2: Agent ReAct loop

Turn 1 [Reason]: "Need to read the CSV. Check if it's large before loading."
  Action: execute_bash("wc -l ~/data/support_q1.csv")
  Result: "14,847 lines"

Turn 2 [Reason]: "14k tickets. Load first 50 to understand schema."
  Action: execute_bash("head -50 ~/data/support_q1.csv")
  Result: columns: [ticket_id, date, category, priority, text, resolved]

Turn 3 [Reason]: "Load the 'data-analysis' skill via skill_view before proceeding."
  Action: skill_view("data-analysis")          # Level 1 load: 2,100 tokens
  Result: [Full SKILL.md with analysis procedure, pandas patterns, charting]
  ← Progressive disclosure: this skill was NOT in context until needed

Turn 4 [Reason]: "Run pandas analysis on full dataset."
  Action: execute_code("""
import pandas as pd
df = pd.read_csv('/home/user/data/support_q1.csv')
summary = {
    'total_tickets': len(df),
    'categories': df['category'].value_counts().to_dict(),
    'priority_dist': df['priority'].value_counts().to_dict(),
    'resolution_rate': df['resolved'].mean(),
    'avg_text_length': df['text'].str.len().mean()
}
print(summary)
""")
  Result: {total: 14847, categories: {billing: 4231, technical: 3892, ...},
           priority: {high: 2847, ...}, resolution_rate: 0.73, avg_len: 187}

Turn 5 [Reason]: "User wants sentiment. Run quick sentiment analysis."
  Action: execute_code([sentiment analysis over sample of 1000 tickets])
  Result: {positive: 12%, neutral: 41%, negative: 47%}

Turn 6 [Reason]: "Generate the report. User's memory says ~/reports/."
  Action: write_file(
    "~/reports/q1_support_analysis.md",
    [executive summary + full analysis + charts as ASCII tables]
  )
  Result: File written, 2,847 bytes

Turn 7 [Memory]: "Update memory with this analysis result."
  Action: memory(action="add",
    content="Q1 2026 support analysis complete: 14,847 tickets, 73% resolution rate,
             47% negative sentiment, top issue: billing (28%). Report at ~/reports/")

Response to Telegram: "Done. Q1 support report saved to ~/reports/q1_support_analysis.md.
  14,847 tickets analyzed. Key findings: 73% resolution rate, 47% negative sentiment
  (mostly billing complaints). Report includes breakdown by category and priority.
  Full details and ASCII charts in the file."

Total agent turns: 7
Total time: ~45 seconds (Modal cold start: ~3s, analysis: ~35s, LLM calls: ~7s)
Modal compute cost for this task: ~$0.003 (serverless billing by compute-second)
Memory written to disk: 1 new entry added to MEMORY.md

The Modal backend's serverless billing is the right economic model for a personal agent. When idle, Hermes costs essentially nothing. When active, it bills by compute-second. The alternative (an always-on VPS) costs $5-20/month regardless of usage. For workloads with burst-and-idle patterns, serverless dominates.
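
The break-even is easy to estimate. A back-of-envelope comparison, using the per-task cost from the trace above and the midpoint of the VPS range quoted; both numbers are rough assumptions.

COST_PER_TASK = 0.003   # Modal compute for the worked example above, in dollars
VPS_MONTHLY = 10.00     # midpoint of the $5-20/month always-on range

# Tasks per month before the always-on VPS becomes the cheaper option:
print(f"{VPS_MONTHLY / COST_PER_TASK:,.0f} tasks/month")  # ~3,333, i.e. >100/day

# At a realistic 10 tasks/day (300/month), serverless costs pennies:
print(f"${300 * COST_PER_TASK:.2f}/month")                # $0.90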

Why This Design Works, and What It Trades Away

The bounded memory architecture is the most defensible design decision in Hermes relative to the unbounded alternatives. A MEMORY.md with 2,200 characters and a USER.md with 1,375 characters force a specific agent behavior: when memory is full, the agent must prioritize. It must decide which facts are most worth keeping. This pressure produces a curated, coherent knowledge base over time rather than an ever-growing pile of stale context. The usage percentage display ("67% — 1,474/2,200 chars") gives the agent explicit capacity information at every session start.
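
What that prioritization looks like in practice: as capacity approaches the cap, the agent merges or evicts entries using the same memory tool it writes with. The calls below are illustrative, not transcripts.

# At 94% capacity, two related entries are consolidated into one:
memory(action="replace",
       old_text="This machine runs Ubuntu 22.04, has Docker and Podman installed",
       content="Host: Ubuntu 22.04, Docker/Podman installed, Hermes on Modal backend")

# A stale fact is evicted outright to reclaim characters:
memory(action="remove",
       old_text="User's project is a Rust web service at ~/code/myapi")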

The frozen snapshot pattern (memory injected once at session start, not updated mid-session) is the correct performance tradeoff. Any change to the system prompt mid-session invalidates the LLM's prefix cache, forcing recomputation of the attention patterns for all prior tokens. On long sessions with frequent memory updates, this multiplies API costs significantly. The frozen snapshot keeps the prefix stable across all turns in a session, maximizing cache hit rates. The tradeoff (changes made during a session don't appear until the next session) is acceptable because most memory writes are not time-critical.
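
The savings are easy to bound. A rough model, assuming a provider that discounts cached prefix tokens 10x (a common ratio, though the exact discount varies by provider):

PREFIX_TOKENS = 2_450              # system prompt size from the worked example
TURNS = 7
FULL, CACHED = 1.0, 0.1            # relative per-token cost, 10x cache discount

# Frozen snapshot: the prefix is computed once, then served from cache.
frozen = PREFIX_TOKENS * (FULL + (TURNS - 1) * CACHED)

# Mid-session memory edits: the prefix changes every turn, so no cache hits.
mutable = PREFIX_TOKENS * TURNS * FULL

print(f"prefix cost ratio: {mutable / frozen:.1f}x")  # ~4.4x more expensive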

The ContextEngine ABC enables something that most frameworks do not: upgrading context management without touching the core agent loop. The default lossy compressor works for most cases. Teams with strict faithfulness requirements can swap in lossless alternatives. Teams building research pipelines can implement specialized compressors that preserve specific types of information.

What Hermes trades away:

Community ecosystem scale. OpenClaw has 371k stars, 5,400+ ClawHub skills, and a self-sustaining community that produces new integrations daily. Hermes has 143k stars and the agentskills.io standard, but the raw ecosystem scale is smaller. For users who want maximum breadth of community skills immediately, OpenClaw wins on availability. For users building production workflows who need architectural soundness, Hermes wins.

Installation simplicity for non-engineers. The curl-one-liner install works, but Hermes's architecture (pluggable backends, provider configuration, terminal backend selection) has more moving parts than OpenClaw's simpler setup. The Modal and Daytona backend configuration requires understanding of serverless infrastructure concepts.

Single-language coherence. Hermes is primarily Python (with TypeScript gateway components). OpenClaw is TypeScript throughout. For JavaScript/TypeScript-heavy shops, OpenClaw is easier to extend.

Technical Moats

The Curator: autonomous skill creation from experience. Hermes's most distinctive architectural feature is the ability to create skills from its own task trajectories. When Hermes completes a novel task successfully, the Curator can synthesize a SKILL.md from the experience, making that capability persistently available in future sessions. This is Voyager's skill library pattern (arXiv:2305.16291) implemented in production personal agent infrastructure. No other framework in this class ships this by default.
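
The synthesis mechanics are not spelled out in the material excerpted here, but the shape of the loop follows from the SKILL.md format. A minimal sketch of what trajectory-to-skill distillation could look like; every name in it is an assumption.

def curate_skill(trajectory: list[dict], llm) -> str:
    """Hypothetical Curator step: distill a successful trajectory into a SKILL.md."""
    steps = "\n".join(
        f"- {t.get('reasoning', '')} -> {t['action']}"
        for t in trajectory if t.get("action")
    )
    prompt = (
        "Write a SKILL.md (name/description/category frontmatter, then a "
        "reusable procedure) that generalizes these successful steps:\n" + steps
    )
    # Saved under ~/.hermes/skills/, the result surfaces in Level 0 next session.
    return llm.complete(prompt)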

The RL training pipeline integration. batch_runner.py exports trajectories in formats compatible with Atropos (NousResearch's RL training infrastructure). This means Hermes is not just an inference runtime: it is a data collection tool for improving the Hermes model series. Sessions become training data. This is the correct long-term architecture for a research lab's agent infrastructure, and it is a moat that other agent frameworks cannot replicate without also being in the model training business.

The Memory Provider ABC. Like the ContextEngine ABC, the MemoryProvider interface allows replacing the default MEMORY.md system with any persistence backend. Honcho (plastic-labs) dialectic user modeling is the production-grade alternative shipped with Hermes. This gives teams the ability to upgrade their user modeling independently of the rest of the agent stack.
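
The docs describe the MemoryProvider's role rather than its exact signature, so the sketch below is a plausible shape with method names assumed:

from abc import ABC, abstractmethod

class MemoryProvider(ABC):
    """Pluggable persistence for agent memory (default: bounded MEMORY.md)."""

    @abstractmethod
    def snapshot(self) -> str:
        """Return the frozen memory block injected at session start."""

    @abstractmethod
    def write(self, action: str, content: str = "", old_text: str = "") -> str:
        """Apply an add/replace/remove and return the live state."""

# The default provider wraps MEMORY.md + USER.md; a Honcho-backed provider
# would implement the same two methods against its dialectic user model.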

Insights

Insight One: Hermes Agent is architecturally the research lab's version of OpenClaw, and the differences reveal exactly which design choices were made under growth pressure versus engineering discipline.

OpenClaw achieved 371k stars because it packaged the right idea with the right UX at the right moment. Its architecture reflects that origin: markdown files, no type system for skills, unbounded memory, fixed context compression. These choices made setup fast and onboarding easy. They also created documented security vulnerabilities, unpredictable memory accumulation, and compression that cannot be upgraded. Hermes makes the opposite choices in every case: bounded memory, typed SKILL.md frontmatter, pluggable ContextEngine, MemoryProvider ABC, progressive skill disclosure. The architectural quality is higher. The adoption curve is steeper. Whether the higher quality is worth the steeper curve depends entirely on what you are building.

Insight Two: The agentskills.io open standard is the most strategically important decision NousResearch made in this project, and it receives the least attention.

Hermes skills are compatible with agentskills.io, an open standard for portable, shareable agent skills. This means skills written for Hermes can theoretically be used by any agent that implements the standard. OpenClaw's ClawHub skills are OpenClaw-specific. If agentskills.io achieves adoption across multiple agent frameworks, Hermes gains access to a cross-framework skill ecosystem without building a 5,400-skill marketplace from scratch. The standard is early and adoption is limited. The strategic intent is clear.

Takeaway

Hermes ships an integrated RL training pipeline (batch_runner.py → trajectory export → Atropos) that makes it simultaneously a user-facing agent AND a data collection tool for the Hermes model training pipeline. Most users treat it as the former. NousResearch treats it as both.

The batch_runner.py component exports agent trajectories in formats compatible with Atropos, NousResearch's RL training infrastructure. This means that every successful multi-step Hermes session can become training data for the next generation of Hermes models. The agent's good decisions become demonstrations. Its corrections become preference data. This is not a side feature: it is the reason NousResearch built the agent in the first place. A research lab's agent infrastructure should double as its training data collection system. The fact that this architecture also works as a personal AI assistant for end users is a beneficial side effect of the design, not its primary motivation.
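
The export schema is not shown in the material above, so the record below is a guess at the shape such a trajectory file could take, not Atropos's actual format:

import json

# One hypothetical JSONL record per completed session:
record = {
    "session_id": "2026-05-12T14:03:11Z",
    "model": "nous-hermes-3-pro",
    "turns": [
        {"role": "assistant", "reasoning": "...", "tool": "execute_bash", "args": "..."},
        {"role": "tool", "output": "..."},
    ],
    "outcome": {"success": True, "user_correction": None},  # reward-signal inputs
}
print(json.dumps(record))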

TL;DR For Engineers

  • Hermes Agent (NousResearch, MIT, 143k stars, Python, TypeScript gateway) is a production-grade personal AI agent runtime with 70+ tools, 7 terminal backends (including serverless via Modal and Daytona), 20+ messaging platforms, and a closed learning loop via the Curator (autonomous skill creation from experience).

  • Bounded memory is the key architectural differentiator: MEMORY.md (2,200 chars, ~800 tokens) + USER.md (1,375 chars, ~500 tokens), both frozen at session start for prefix cache stability. Character limits force agent-managed prioritization rather than unbounded accumulation.

  • Progressive skill disclosure at three levels: L0 (metadata list, ~3k tokens, always present), L1 (full SKILL.md, loaded on demand), L2 (specific reference file, loaded for deep lookup). Conditional activation via frontmatter (fallback_for_toolsets, requires_toolsets) means the right skills surface automatically based on available tools.

  • ContextEngine ABC (pluggable): default is lossy LLM summarization via auxiliary client. Replace with any implementation. MemoryProvider ABC: default is MEMORY.md + USER.md, alternative is Honcho dialectic user modeling.

  • RL training integration: batch_runner.py exports agent trajectories for Atropos. Hermes is simultaneously a personal agent and NousResearch's data collection infrastructure for model improvement. The learning loop is real and extends beyond the agent itself.

The Research Lab Built the Agent Its Models Deserve

Hermes Agent is what happens when the team training the models also builds the agent runtime. The bounded memory forces prioritization. The ContextEngine ABC enables upgrading. The Curator creates skills from experience. The RL pipeline makes the agent self-improving at the model level, not just the session level. Every architectural decision reflects production deployment discipline applied to personal AI infrastructure.

OpenClaw proved the architecture was correct. Hermes proves the architecture can be implemented correctly. Both are true, and both matter.

Summary

Hermes Agent (NousResearch, MIT, 143k stars) is a production-grade personal AI agent runtime distinguished from peer frameworks by three core architectural choices: bounded memory (MEMORY.md: 2,200 chars, USER.md: 1,375 chars, both frozen at session start for prefix cache stability), a pluggable ContextEngine ABC with default lossy LLM summarization and a MemoryProvider ABC with Honcho as an alternative, and a three-level progressive skill disclosure system compatible with the agentskills.io open standard. The Curator subsystem creates skills autonomously from task experience (Voyager-style skill accumulation in production). The batch_runner.py RL training integration exports agent trajectories to Atropos, making Hermes simultaneously a user-facing agent runtime and NousResearch's data collection infrastructure for model improvement.

Sponsored Ad

If you enjoy practical AI insights, check out SnackOnAI and support the newsletter by subscribing, sharing, and exploring our sponsored ad — it helps us keep building and delivering value 🚀

You paid $5,000 for that website. You can't even update it

Agencies charge thousands. Take weeks. Hand you something that needs a developer every time you want to make a change.

Readdy builds you a professional, mobile-ready website in minutes, with SEO, hosting, booking, and payment integrations included. Describe your business and it's built; when you need to update something, just ask the AI. No developer call. No extra invoice.

You get the same polished result at a fraction of the price. And it’s all done before your agency would have sent the first draft.
