SnackOnAI Engineering · Senior AI Systems Researcher · March 2026
Source: https://github.com/mohnishbasha/snackonai · License: Apache 2.0
The Problem: Stateless Models in a Stateful World
The dominant failure mode of current AI tooling is architectural, not intellectual. LLMs reason exceptionally well about what to do. They cannot, in any meaningful sense, do it. The result is a degraded human-in-the-loop cycle: paste code, receive suggestion, copy back, execute, observe failure, paste error, repeat. This is not an assistant. This is an expensive autocomplete with a conversation UI bolted on.
The root cause is state isolation. Transformers are stateless sequence mappers. They emit tokens conditioned on input context, with no persistent handle on your runtime, filesystem, process table, or external services. The gap between "knowing" and "acting" is not a prompt engineering problem. It is a systems architecture problem, and it requires a systems architecture solution.
Goose is that solution: an agent runtime that closes the perception-action loop locally, without cloud mediation, by coupling an LLM to a standard tool-invocation protocol.
Why Existing Approaches Are Structurally Insufficient
Copilot-class tools are next-token predictors embedded in an editor. They have no execution substrate. Calling them "AI agents" is a category error.
ChatGPT Code Interpreter executes sandboxed Python in OpenAI's infrastructure. Useful for analytics. Fundamentally incompatible with private codebases, internal APIs, or any data that cannot leave your network perimeter.
LangChain and its descendants let you assemble agents programmatically, but at the cost of hardcoded tool schemas, brittle prompt templates, and zero standardization across team boundaries. The framework becomes the product. Every team maintains their own agent glue. This does not scale.
The systemic gap: no open, runtime-discoverable protocol existed for an LLM agent to connect to and invoke arbitrary external capabilities without rebuilding the integration layer from scratch. Model Context Protocol (MCP) fills that gap, and Goose is the most complete local-first runtime built around it.
System Architecture
Goose decomposes into three layers: the Interface, the Agent, and the Extension mesh. The full crate structure is documented in AGENTS.md.
┌────────────────────────────────────────────┐
│ USER INTERFACE │
│ (Desktop App or CLI) │
└──────────────────┬─────────────────────────┘
│ session / prompt
▼
┌────────────────────────────────────────────┐
│ AGENT (Rust core) │
│ Manages interactive loop │
│ Handles context revision │
│ Routes tool calls │
│ Error recovery │
└────────┬──────────────────┬────────────────┘
│ LLM API calls │ MCP tool calls
▼ ▼
┌──────────────┐ ┌────────────────────────┐
│ LLM Provider│ │ EXTENSIONS (MCP) │
│ (Anthropic, │ │ Developer tools │
│ OpenAI, │ │ File system │
│ Gemini...) │ │ Browser control │
└──────────────┘ │ Custom MCP servers │
└────────────────────────┘
Interface manages session lifecycle and can instantiate parallel agent processes for concurrent workloads. It is deliberately thin, containing no reasoning logic.
Agent is the orchestration core, written in Rust. It runs the interactive loop, manages conversation state, routes MCP tool invocations, and implements context revision. The Rust choice matters: long-running agent sessions with high tool-call volume need predictable memory behavior, not a GC pause.
Extensions are MCP servers exposing typed tool schemas. The LLM selects tools based on schema semantics. The agent executes them. This clean separation of concerns is what makes Goose extensible without requiring framework changes.
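Concretely, a tool surfaces to the model as little more than a name, a description, and a JSON Schema for its arguments. A minimal sketch of that shape, following the MCP `tools/list` result layout but using a hypothetical `run_command` tool rather than any specific Goose extension:

```python
# A minimal MCP-style tool definition: the name, description, and JSON Schema
# for arguments are what the LLM sees when deciding which tool to invoke.
# The tool itself ("run_command") is illustrative, not a Goose extension.
tool_definition = {
    "name": "run_command",
    "description": "Execute a shell command in the project root and return "
                   "stdout, stderr, and the exit code.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Shell command to run."},
            "timeout_secs": {"type": "integer", "description": "Kill after N seconds."},
        },
        "required": ["command"],
    },
}
```

Everything the model knows about when and how to call the tool lives in this object, which is why schema quality matters as much as the tool's implementation.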
The Interactive Loop
The interactive loop is the fundamental execution primitive:
User Prompt
│
▼
Agent sends [prompt + tool schemas] to LLM Provider
│
▼
LLM responds: [text | tool_call(name, args)]
│
├── text only → render to user, loop ends
│
└── tool_call → Agent executes via MCP
│
▼
Tool result returned
│
▼
Result injected into context, sent back to LLM
│
▼
LLM responds (may chain further tool calls)
│
▼
Final response → user
A single prompt can trigger dozens of chained tool invocations. The loop terminates when the model produces a text-only response, signaling task completion or a request for human input.
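The loop above can be sketched in a few lines. Here `call_llm` and `execute_tool` are hypothetical stand-ins for the provider API and MCP dispatch, not Goose's actual Rust internals:

```python
# Sketch of the interactive loop: send prompt + tool schemas, execute any
# requested tool calls, feed results back, and stop on a text-only reply.
def interactive_loop(prompt, tool_schemas, call_llm, execute_tool, max_steps=50):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = call_llm(messages, tool_schemas)
        messages.append(reply)
        if not reply.get("tool_calls"):          # text-only → done or needs human input
            return reply["content"]
        for call in reply["tool_calls"]:         # execute every requested tool call
            result = execute_tool(call["name"], call["args"])
            messages.append({"role": "tool", "id": call["id"], "content": result})
    raise RuntimeError("loop did not terminate within max_steps")
```

The `max_steps` guard is an assumption on my part; the point is that some bound on chained invocations must exist, since the model alone decides when to stop calling tools.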
Context Revision: The Underappreciated Core Problem
Token management is where most agent frameworks quietly fail. Every message, tool result, file content, and system instruction competes for a finite context window. Naive frameworks let context accumulate until truncation, which silently corrupts agent behavior.
Goose implements active context revision: verbose command outputs are summarized by faster auxiliary models, file edits use find-and-replace instead of full rewrites, ripgrep skips irrelevant paths, and deletion heuristics prune stale context. This is not a minor optimization. It is what makes multi-step tasks across large codebases tractable.
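One of those heuristics can be sketched as follows. The head-and-tail truncation stands in for the auxiliary-model summarization described above (a common cheap fallback when no summarizer is available), and the character budget is an arbitrary assumption:

```python
# Context revision sketch: verbose tool output over a budget is collapsed
# before it enters the context window, with an explicit marker so the model
# knows material was elided rather than silently missing.
def revise_tool_output(output: str, budget_chars: int = 2000) -> str:
    if len(output) <= budget_chars:
        return output
    head = output[: budget_chars // 2]
    tail = output[-(budget_chars // 2):]
    omitted = len(output) - budget_chars
    return f"{head}\n…[{omitted} chars elided by context revision]…\n{tail}"
```

The explicit elision marker is the important design detail: silent truncation is exactly the failure mode the surrounding paragraph attributes to naive frameworks.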
ACP: The Protocol Boundary Evolving in Real Time
Goose is actively transitioning its client-server protocol to Agent Client Protocol (ACP), a JSON-RPC 2.0 standard built on top of MCP types. The migration (tracked in Issue #6642) replaces a bespoke SSE-based API with a standardized interface that any ACP-compatible client can target. This matters for teams building custom integrations: ACP gives you a stable, documented contract rather than an internal implementation detail.
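Because ACP is plain JSON-RPC 2.0, a client request is just a JSON object on the wire. The sketch below uses ACP's `session/prompt` method name; treat the exact `params` fields as illustrative rather than the authoritative schema:

```python
import json

# Shape of an ACP client request: standard JSON-RPC 2.0 envelope around a
# session-scoped method. Payload field names here are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "session/prompt",
    "params": {
        "sessionId": "sess-123",
        "prompt": [{"type": "text", "text": "Run the test suite and fix failures."}],
    },
}
wire = json.dumps(request)
```

The practical consequence is that any client able to speak JSON-RPC over the transport can drive a Goose agent, with no Goose-specific SDK required.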
Failure Modes Worth Understanding
Tool invocation errors are fed back to the LLM as structured tool responses rather than propagated as exceptions, per the error handling design. This enables model-driven recovery: retry with modified arguments, switch tools, or surface the failure to the user.
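The pattern can be sketched as a wrapper that converts exceptions into structured tool responses. Field names here are illustrative, not Goose's internal types:

```python
# Error-as-data sketch: a tool failure becomes a structured response the
# model can reason over, instead of an exception that halts the loop.
def safe_execute(tool_fn, args: dict) -> dict:
    try:
        return {"status": "ok", "result": tool_fn(**args)}
    except FileNotFoundError as e:
        return {"status": "error", "kind": "not_found", "detail": str(e),
                "hint": "Check the path; the file may not exist yet."}
    except Exception as e:                       # last-resort structured fallback
        return {"status": "error", "kind": type(e).__name__, "detail": str(e)}
```

The specific exception branches carry the recovery value: `kind` and `hint` give the model something to condition its retry on, where a bare traceback would not.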
The failure modes that are not handled: provider-level outages (delegated to router layers like Tetrate Agent Router), context window exhaustion on pathologically large tasks, and non-idempotent side effects from retried tool calls. That last one is the most operationally dangerous and gets insufficient attention in the documentation.
Implementation Details
Install and Configure
# CLI
curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash
goose configure
Full installation options (Desktop, DEB, RPM, Flatpak) are covered in the official installation guide.
Session Management
goose session # new session
goose session -r # resume last session
goose acp # expose as ACP server for editor integration
Context Engineering via .goosehints
This is the highest-leverage configuration surface in Goose and the most underused. Documented in the goosehints guide:
# .goosehints
TypeScript monorepo, pnpm workspaces.
Run `pnpm typecheck` before every commit.
Never touch /generated, these are auto-generated artifacts.
Use Zod for all runtime validation, no alternatives.
The .goosehints file is injected into every session. It is your system prompt for the agent. Teams that treat it as an afterthought produce inconsistent results. Teams that engineer it carefully reduce exploratory token burn by 40-60% on large repos.
Adding Extensions
Extensions are added via the extensions marketplace or the configure menu:
goose configure
# Add Extension > Built-in Extension > Computer Controller
# Set timeout: 300s for long-running tasks
The extensions design guide covers how to build custom MCP servers compatible with Goose.
Tradeoffs
Local execution means real consequences. There is no sandbox. A misguided shell command executes against your actual filesystem. This is a deliberate design choice that enables genuine capability, but it requires engineers to treat Goose as a peer process with write access, not a suggestion engine. The MCP security specification explicitly flags this: tool execution must be treated with appropriate caution and user consent flows are the implementor's responsibility.
MCP flexibility is also MCP variance. The open protocol means tool quality is entirely a function of implementation quality. A vague tool description produces inconsistent invocation behavior. The LLM selects tools based on schema semantics; poorly specified schemas produce selection errors that compound across long task chains.
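To make that concrete, here are two descriptions for the same hypothetical code-search tool (neither is a real Goose extension). The argument schema could be identical in both; the description text alone changes when the model calls the tool and how it builds the query:

```python
# Two descriptions for the same hypothetical code-search tool. The first
# invites misuse; the second constrains tool selection and argument shape.
vague_description = "Searches stuff."

precise_description = (
    "Full-text search over tracked source files only (respects .gitignore). "
    "Accepts a literal string or an RE2 regex in `query`; returns at most "
    "20 matches as path:line:snippet. Not for web search or command output."
)
```

Stating what the tool is *not* for ("not for web search") is as important as stating what it does, since the model chooses among many schemas at once.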
Provider agnosticism creates optimization blind spots. Claude 3.5 Sonnet and GPT-4o have meaningfully different tool-use calibration. An agent loop designed around one model's tool-selection behavior will not transfer cleanly to another. Goose has no model-specific optimization layer. This is an acceptable tradeoff for flexibility, but engineers should validate behavior when switching between supported providers.
Context revision trades completeness for tractability. Aggressive trimming is what makes long sessions viable. It also creates a class of subtle bugs where constraints stated early in a session are pruned before they become relevant. There is no clean resolution to this tension; it is fundamental to finite-window architectures.
Working Example
mkdir api-demo && cd api-demo
goose session
Prompt:
Create a Node.js Express POST /users endpoint with name and email
validation. Return 201 on success, 400 with error details on failure.
Write Jest tests covering both cases and run them.
Goose's execution trace:
create_file(index.js) → Express server with validation logic
create_file(users.test.js) → Jest test suite
run_command(npm init -y && npm install express jest)
run_command(npx jest) → observe output
IF tests fail:
read_file(users.test.js) → diagnose failure
str_replace(index.js) → targeted fix
run_command(npx jest) → verify
LOOP until green
The distinction worth emphasizing: Goose is not generating code for you to run. It is running the code, observing real output, and closing the feedback loop autonomously. The epistemological difference between generating and verifying is enormous, and most AI tooling only does the former.
More examples are available in the official tutorials and recipe cookbook.
Production Lessons
Context blowup scales with repo surface area. Large monorepos trigger extensive exploratory reads. Mitigate with scoped prompts and a well-engineered .goosehints that eliminates ambiguity about which modules are in scope.
Reliability degrades superlinearly with task length. Tasks requiring 20+ sequential tool calls fail at rates far exceeding linear extrapolation from 5-step tasks. Attention degrades, context gets trimmed, and early constraints get "forgotten." Decompose long tasks into bounded sessions with explicit state handoffs. Anthropic's own research on code execution with MCP found that loading tool definitions on demand rather than upfront reduced token usage by 98.7% in high-tool-count environments — the same principle applies to session design.
Idempotency is your responsibility, not Goose's. Failed tool calls get retried. If your MCP tools have side effects, you need idempotency at the tool implementation level. Design defensively.
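One common implementation pattern is a request-key cache inside the tool itself; the key scheme below is an assumption for illustration, not part of MCP or Goose:

```python
# Idempotency at the tool-implementation level: a retried call with the same
# request key replays the recorded result instead of re-running the side
# effect. A real implementation would persist the cache and expire entries.
_completed: dict[str, object] = {}

def idempotent_tool(request_key: str, side_effect):
    if request_key in _completed:                # retry → replay recorded result
        return _completed[request_key]
    result = side_effect()                       # first attempt → run for real
    _completed[request_key] = result
    return result
```

The key must be derived from the operation's semantic identity (e.g. `"create-user:alice"`), not from the raw tool-call arguments, since a retry may arrive with slightly modified arguments.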
LLM round-trip latency dominates loop performance. Each tool invocation incurs at least one LLM round-trip. A 10-step task at 2 seconds per call adds 20 seconds of irreducible overhead. Tool execution speed is rarely the bottleneck. Provider TTFT is.
Contrarian Insights
The LLM is the least important component. The dominant discourse around AI agents focuses on which model to use. This is largely irrelevant. Tool schema quality, context engineering, and MCP server design determine agent reliability far more than model choice. A well-designed extension layer with a mid-tier model outperforms a poorly designed one with a frontier model. The model is a reasoning substrate. The tooling is the product. Block's own CTO stated at MCP's launch that open protocols like MCP are the bridges that connect AI to real-world applications — notably not the model itself.
Sandboxing is overrated for expert users and dangerous for everyone else. The security community will push for sandboxed execution environments for AI agents. This is correct for general-purpose consumer tools and completely wrong for staff-level engineers who need full environment access to accomplish anything interesting. Sandboxing strips the capability that makes agentic systems worth using. The correct solution is access controls, audit logging, and the permission flows Goose already supports, not containerization of the agent's action space.
Surprising Takeaway
Error messages are more valuable than tool functionality. The architectural decision to feed errors back to the LLM as structured tool responses rather than halting execution transforms the quality ceiling of agent behavior. A tool that returns a rich, semantically meaningful error enables the model to recover, adapt, and retry intelligently. A tool that returns a generic exception destroys the recovery path entirely. In practice, investing engineering time in error response quality yields higher reliability improvements than adding new tool capabilities. Most teams get this backwards.
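A concrete contrast, with invented contents: both payloads describe the same failed file edit, but only the second gives the model anything to act on:

```python
# Generic vs rich error responses for the same failed edit. Field names and
# contents are illustrative, not Goose's actual tool-response types.
generic = {"status": "error", "detail": "Exception: operation failed"}

rich = {
    "status": "error",
    "kind": "match_not_found",
    "detail": "str_replace: old_str matched 0 times in src/index.js",
    "context": "nearest similar line: `const app = express();`",
    "hint": "Re-read the file before retrying; it may have changed since "
            "the last read.",
}
```

Faced with `generic`, the model can only retry blindly; faced with `rich`, it can re-read the file and construct a correct replacement, which is exactly the recovery path the paragraph above describes.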
What Engineers Must Internalize About Goose
Goose is not an AI assistant with shell access. It is a local-first agent runtime built around an open tool-invocation protocol, designed to own tasks end-to-end without cloud mediation. The architectural primitives — the interactive loop, context revision, MCP-native extensibility — are the right primitives for the class of problems it addresses.
The MCP layer is where the real leverage lives. Invest in well-specified tool schemas, rich error responses, and idempotent implementations. Treat .goosehints as a first-class engineering artifact. Decompose long tasks. And treat Goose as a peer process with production-level write access to your environment, because that is precisely what it is.
References
Goose Core
block/goose GitHub Repository — Source code, crate structure, and active development
Goose Official Documentation — Guides, tutorials, and architecture overview
Goose Architecture Overview — Interactive loop, context revision, and component design
Goose Extensions Design Guide — How MCP tools are exposed and invoked
Goose Error Handling Guide — Error recovery architecture
Goose Installation Guide — CLI, Desktop, platform-specific setup
Goose Quickstart — Five-minute getting-started walkthrough
Goosehints Context Engineering — Session-level context injection
Goose Extensions Marketplace — Available built-in and community extensions
Goose Recipe Cookbook — Reusable task templates
AGENTS.md — Crate Structure — Full repository layout and development conventions
Protocol References
Model Context Protocol Specification — Authoritative MCP spec including security considerations
modelcontextprotocol/modelcontextprotocol GitHub — Spec and SDK source
Anthropic: Introducing MCP — Original announcement with Block CTO commentary
ACP Migration Discussion (Issue #7309) — Goose's transition to Agent Client Protocol
ACP Implementation Issue #6642 — Technical detail of ACP-over-HTTP rollout
Goose ACP Protocol Reference — JSON-RPC method signatures and session lifecycle
Engineering Context
Anthropic: Code Execution with MCP — On-demand tool loading and 98.7% token reduction findings
Goose Supported LLM Providers — Provider configuration and compatibility matrix
Goose Access Controls — Permission flows and audit configuration
Tetrate Agent Router — Multi-provider routing with failover used in Goose deployments