SnackOnAI Engineering | Senior AI Systems Researcher | Technical Deep Dive | April 21, 2026
The world model research community has spent years chasing two dead ends: rigid databases that can't generate anything, and fully generative hallucination engines that can't remember anything. The Ha and Schmidhuber "World Models" paper (2018) trained agents to dream inside their own latent space. Beautiful idea. Practically unusable for persistent, consistent, large-scale environments. Seven years later, Princeton's AI2 Lab answered with something more pragmatic and more powerful: stop trying to make neural networks simulate physics. Let code do it. Let the LLM do what it's actually good at.
That is the core insight of Web World Models (WWM), and it has significant engineering consequences that the community is underreacting to.
What It Actually Does
Web World Models from Princeton's AI2 Lab is not a new model architecture. It is a design pattern for building persistent, explorable, open-ended environments for language agents, grounded on one deliberate constraint: all world state and "physics" live in ordinary web code (TypeScript, JavaScript), while LLMs handle only narrative, context, and high-level content generation.
The paper introduces seven working systems built on this pattern:
Infinite Travel Atlas: globe-scale exploration grounded in real geography, no database
Galaxy Travel Atlas: synthetic sci-fi cosmos with procedural layout and LLM narrative
AI Spire: a roguelike deck-builder where users "Wish" for cards generated in real time
AI Alchemy: a falling-sand simulator where LLMs decide unknown element reactions
Cosmic Voyager: a 3D solar system with view-dependent AI narration
WWMPedia: Wikipedia-style articles composed on demand from live web retrieval
Bookshelf: infinite long-form fiction with user-controlled interface styles and literary tags
The codebase is TypeScript (61.4%) and JavaScript (24.7%), which is itself a signal: this is a web engineering project that happens to use LLMs, not an LLM project that happens to have a UI.
The Architecture
Every WWM is organized around a strict two-layer decomposition. Understanding why this split is where it is, not somewhere else, is the whole paper.

Focus on the typed interface boundary. This is where hallucination is structurally prevented: the LLM outputs JSON that must conform to a TypeScript interface, which code validates before execution. The stochastic layer cannot corrupt the deterministic layer.
Three design decisions define the architecture:
Physics vs. Imagination separation. Code owns invariant state: inventory counts, coordinates, caps, game rules, reaction tables. The LLM owns perceptual content: descriptions, dialogues, narratives, flavor. This is not a soft boundary. It is enforced at the interface layer. An LLM cannot invent a new rule for card damage in AI Spire by writing narrative. The symbolic engine validates and executes only what the schema permits.
Hash-based object permanence. Storing an infinite universe is impossible. WWM solves this with a hash trick: any coordinate or entity ID is passed through a deterministic hash function, producing a seed. That seed fixes the LLM's sampling randomness for that entity. A player who visits Planet Stormglass, leaves, and returns three sessions later gets exactly the same planet, because the same coordinate produces the same seed produces the same LLM output. No database. No storage. Infinite but consistent.
Fidelity slider and graceful degradation. If the LLM is slow or unavailable, WWM degrades to cached content, then to template-based generation. The world never crashes. It gets less vivid. This is the correct engineering decision for production agentic systems and it is almost never implemented in research prototypes.
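That degradation chain can be sketched in a few lines. This is a hypothetical illustration, not the paper's code; `llmGenerate`, `narrativeCache`, and the template function are stand-ins:

```typescript
// Sketch of the fidelity-slider pattern: LLM → cache → template.
// All names here are illustrative stand-ins, not the paper's API.
type Narrative = { description: string; fidelity: "llm" | "cache" | "template" };

const narrativeCache = new Map<string, string>();

// Stand-in for a real LLM call; rejects to simulate an outage.
async function llmGenerate(key: string): Promise<string> {
  throw new Error("LLM unavailable");
}

// Last-resort template generation: always succeeds, least vivid.
function templateNarrative(key: string): string {
  return `You arrive at ${key}. The area stretches out before you.`;
}

async function generateWithFallback(key: string): Promise<Narrative> {
  try {
    // Highest fidelity: live LLM generation, cached for next time.
    const text = await llmGenerate(key);
    narrativeCache.set(key, text);
    return { description: text, fidelity: "llm" };
  } catch {
    // Middle fidelity: previously generated content, if any.
    const cached = narrativeCache.get(key);
    if (cached !== undefined) return { description: cached, fidelity: "cache" };
    // Lowest fidelity: deterministic template. The world never crashes.
    return { description: templateNarrative(key), fidelity: "template" };
  }
}
```

With the simulated outage and an empty cache, a call resolves at "template" fidelity; once a narrative has been cached, the same outage degrades only to "cache".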
The Code
Snippet One: Hash-Based Object Permanence (TypeScript pattern)
// ← THE CORE TRICK: No database lookup. No storage. Pure determinism.
// Any coordinate deterministically maps to a seed; the seed fixes the LLM.
function getLocationSeed(lat: number, lon: number): number {
  // Quantize coordinates to prevent floating-point drift across sessions
  // ← Without quantization, 48.8567 and 48.8566 produce different seeds
  const qLat = Math.round(lat * 1000) / 1000;
  const qLon = Math.round(lon * 1000) / 1000;

  // Simple but effective: combine coordinates into a stable string key
  const key = `${qLat},${qLon}`;

  // FNV-1a hash: fast, low-collision, deterministic across all JS runtimes
  // ← THIS is the trick: same key → same seed → same LLM output, always
  let hash = 2166136261;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = (hash * 16777619) >>> 0; // unsigned 32-bit to prevent overflow
  }
  return hash;
}

// Physics layer: derive hard attributes from code (no LLM needed)
function getPhysicsAttributes(lat: number, lon: number): LocationPhysics {
  return {
    hemisphere: lat >= 0 ? 'northern' : 'southern',
    // ← Code computes climate zone deterministically from latitude bands
    climateZone: Math.abs(lat) < 23.5 ? 'tropical'
      : Math.abs(lat) < 66.5 ? 'temperate' : 'polar',
    isCoastal: isNearCoastline(lat, lon),     // spatial index lookup
    elevation: getTerrainElevation(lat, lon), // SRTM data
  };
}

// Imagination layer: LLM generates narrative ON TOP of physics attributes
async function generateLocationNarrative(
  lat: number,
  lon: number,
  physics: LocationPhysics
): Promise<LocationNarrative> {
  const seed = getLocationSeed(lat, lon);

  // ← LLM receives structured physics facts, not raw coordinates.
  // This grounds the narrative in physical reality without hallucinated geography.
  const prompt = buildGroundedPrompt(physics, seed);

  // ← Seed is passed as a sampler seed to fix the PRNG.
  // Same prompt + same seed = same output. Object permanence without storage.
  const response = await llm.generate(prompt, { seed });

  // ← Schema validation: LLM output MUST conform to the TypeScript interface.
  // Structural hallucinations (wrong keys, wrong types) are caught here.
  return validateAgainstSchema<LocationNarrative>(response, LocationNarrativeSchema);
}
The hash function is the entire persistence layer. There is no database call, no cache write, no distributed state. A coordinate is its own storage key, and the seed is its own retrieval mechanism.
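To see why a seed is enough, here is a self-contained sketch in which a seeded PRNG (mulberry32, a common 32-bit generator) stands in for the LLM sampler. The paper does not specify its PRNG or sampler plumbing, so treat the names below as illustrative:

```typescript
// Illustrative only: a seeded PRNG stands in for seeded LLM sampling.
// Same key → same hash → same PRNG stream → same "generated" content.
function fnv1a(key: string): number {
  let hash = 2166136261;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = (hash * 16777619) >>> 0;
  }
  return hash;
}

// mulberry32: deterministic stream of floats in [0, 1) from a 32-bit seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// "Sampling" a planet name deterministically from a coordinate key.
function samplePlanetName(key: string): string {
  const rng = mulberry32(fnv1a(key));
  const syllables = ["storm", "glass", "vor", "ath", "kel", "ion"];
  const pick = () => syllables[Math.floor(rng() * syllables.length)];
  return pick() + pick(); // same key → same seed → same name, every session
}
```

Calling `samplePlanetName` with the same coordinate key in any session, on any machine, yields the identical name: the coordinate is the storage, regeneration is the retrieval.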
Snippet Two: Typed Interface Enforcement (AI Spire "Wish" mechanic)
// The TypeScript interface defines what the LLM is ALLOWED to output
// ← Hard schema = no structural hallucination. LLM can't invent new fields.
interface GeneratedCard {
  name: string;             // Card display name
  cost: number;             // Energy cost (0-10, enforced by schema)
  damage: number;           // Base damage (0-999, enforced)
  effect: CardEffect;       // Enum: 'damage' | 'shield' | 'heal' | 'buff'
  description: string;      // Flavor text (free-form, LLM-controlled)
  specialMechanic?: string; // Optional: freeform mechanic description
}

// User types: "a fireball that freezes enemies"
// System translates this to a structured LLM prompt
async function wishForCard(userWish: string): Promise<GeneratedCard> {
  const prompt = `
    The player wishes for: "${userWish}"
    Generate a game card that fulfills this wish.
    Respond ONLY with valid JSON conforming to this schema:
    ${JSON.stringify(CardJSONSchema)}
    Rules the code enforces (do not violate):
    - cost must be between 0 and 10
    - damage must be between 0 and 999
    - effect must be one of: damage, shield, heal, buff
  `;

  const raw = await llm.generate(prompt);

  // ← THIS is where the symbolic engine takes control back:
  // invalid JSON or schema violations throw before reaching game state
  const card = JSON.parse(raw);
  const validated = validateSchema<GeneratedCard>(card, GeneratedCardSchema);

  // ← Clamp numeric values even after schema validation.
  // Belt-and-suspenders: never trust LLM arithmetic.
  validated.cost = Math.min(10, Math.max(0, validated.cost));
  validated.damage = Math.min(999, Math.max(0, validated.damage));
  return validated;
}
Notice the two-layer defense: schema validation rejects structural errors, then numeric clamping catches value-range violations. The LLM controls creativity (what the card does in narrative). Code controls legality (whether it can exist in the game).
In Action: An End-to-End Worked Example
Scenario: A language agent is navigating the Infinite Travel Atlas and arrives at coordinates (35.6762° N, 139.6503° E) (central Tokyo).
Step 1: Physics layer runs first (no LLM)
const physics = getPhysicsAttributes(35.6762, 139.6503);
// Output:
// {
//   hemisphere: 'northern',
//   climateZone: 'temperate',
//   isCoastal: true,
//   elevation: 40  // meters above sea level
// }
No API call. No latency. Deterministic. These facts cannot be hallucinated.
Step 2: Hash produces seed
const seed = getLocationSeed(35.6762, 139.6503);
// Output: 2847392841 (same every time for this coordinate)
Step 3: LLM generates narrative within physics constraints
const narrative = await generateLocationNarrative(35.6762, 139.6503, physics);
// Output (abbreviated):
// {
//   theme: "Urban Density and Tradition",
//   description: "A temperate coastal lowland dense with layered histories...",
//   beacons: [
//     { name: "Senso-ji Temple", type: "cultural", distanceKm: 12.4 },
//     { name: "Shibuya Crossing", type: "landmark", distanceKm: 3.1 }
//   ],
//   itinerary: "Begin at the Imperial Palace gardens at dawn..."
// }
Step 4: Agent returns to same location three sessions later
Same coordinate → same hash → same seed → LLM regenerates identical narrative output. No database. No cache. The agent sees the same Senso-ji Temple at 12.4 km. Object permanence achieved.
Step 5: Agent takes action (adds item to inventory)
// Inventory is Physics Layer. LLM cannot modify it.
worldState.inventory.add({ item: "Temple Fortune", quantity: 1 });
// Inventory is code-managed state. It persists across sessions in the
// conventional way (server-side or client-side storage). Only narrative
// is regenerated from hash. State changes are stored normally.
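Steps 4 and 5 together suggest an overlay pattern: regenerate the base entity from its seed, then apply conventionally stored mutations on top. A minimal sketch with hypothetical names (the paper stores code-managed state conventionally but does not prescribe this API):

```typescript
// Sketch of "regenerate base, overlay stored deltas". All names hypothetical.
type Location = { name: string; visitCount: number; inventory: string[] };

// Imagination-layer stand-in: fully determined by the coordinate key (seed).
function regenerateBase(key: string): Location {
  return { name: `Region-${key}`, visitCount: 0, inventory: [] };
}

// Physics layer: only mutations are persisted, keyed by coordinate.
const storedDeltas = new Map<string, Partial<Location>>();

function recordDelta(key: string, delta: Partial<Location>): void {
  storedDeltas.set(key, { ...storedDeltas.get(key), ...delta });
}

// Retrieval = regeneration + overlay. Unvisited locations cost zero storage.
function loadLocation(key: string): Location {
  return { ...regenerateBase(key), ...storedDeltas.get(key) };
}
```

Only locations the agent has actually changed occupy storage; the infinite remainder of the world stays purely generative.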
The scale claims from the project: the travel atlas covers any geographic coordinate on Earth with no database backing, and the galaxy explorer generates procedural star systems at any coordinate in a synthetic cosmos. Both achieve object permanence via hashing alone.
Why This Design Works, and What It Trades Away
The Physics/Imagination split is the correct abstraction because it aligns with what LLMs are actually good and bad at. LLMs are good at generating plausible, contextually coherent narrative text. They are bad at maintaining precise numeric state across turns, enforcing consistent rules, and remembering exact prior outputs without explicit memory. WWM assigns tasks accordingly: code handles the bad cases, LLM handles the good cases.
The hash-based permanence solves a genuine unsolved problem in generative world design. Prior fully-generative approaches (think: LLM-based RPGs) have no object permanence. The player leaves a village and returns to find it described completely differently. WWM eliminates this by turning coordinates into deterministic seeds. This is the single most practically useful idea in the paper.
The TypeScript interface enforcement is the correct production pattern for any system where an LLM must output structured data that code will execute. JSON Schema validation with fallback clamping is exactly how you prevent symbolic systems from executing malformed LLM outputs.
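As a dependency-free sketch of validate-then-clamp (in production a schema library such as zod or Ajv would typically replace the manual checks; this is not the paper's code):

```typescript
// Minimal hand-rolled version of the validate-then-clamp pattern.
// In real systems a schema library (zod, Ajv, etc.) replaces the manual checks.
type Effect = "damage" | "shield" | "heal" | "buff";
interface GeneratedCard {
  name: string;
  cost: number;
  damage: number;
  effect: Effect;
  description: string;
}

const EFFECTS: readonly Effect[] = ["damage", "shield", "heal", "buff"];
const clamp = (v: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, v));

// Layer 1: structural validation. Wrong keys/types throw before game state is touched.
function parseCard(raw: string): GeneratedCard {
  const obj = JSON.parse(raw); // throws on malformed JSON
  if (typeof obj.name !== "string" || typeof obj.description !== "string")
    throw new Error("schema violation: name/description must be strings");
  if (typeof obj.cost !== "number" || typeof obj.damage !== "number")
    throw new Error("schema violation: cost/damage must be numbers");
  if (!EFFECTS.includes(obj.effect))
    throw new Error("schema violation: unknown effect");
  // Layer 2: value clamping. Never trust LLM arithmetic.
  return {
    name: obj.name,
    description: obj.description,
    effect: obj.effect,
    cost: clamp(obj.cost, 0, 10),
    damage: clamp(obj.damage, 0, 999),
  };
}
```

A structurally invalid card throws and never reaches game state; a structurally valid card with an out-of-range cost is silently clamped into legality.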
What WWM trades away:
True global consistency. The hash-based approach guarantees local consistency (same coordinate = same output) but not global consistency (entity A knowing about event B that happened elsewhere). If an agent detonates a star in the Galaxy Atlas, distant star systems don't react. Global causality requires a real state layer, which WWM explicitly avoids for scale reasons. This is the right tradeoff for exploration-focused environments; it is the wrong tradeoff for simulation-focused environments.
Novel emergent physics. Because physics rules are code-defined, they cannot evolve. The game cannot discover new mechanics. AI Alchemy is the partial exception: unknown element reactions are LLM-generated and cached, creating a growing reaction table. But the table grows by accretion, not by emergent physical law.
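The accretion mechanism can be sketched as a memoized lookup; the names below are hypothetical, not AI Alchemy's actual API:

```typescript
// Sketch of a growing reaction table: known reactions come from code; unknown
// pairs are decided once by an LLM stand-in, then cached forever.
type Reaction = { result: string };

const reactionTable = new Map<string, Reaction>([
  ["fire+water", { result: "steam" }], // code-defined physics
]);

// Canonical key so fire+water and water+fire hit the same entry.
const pairKey = (a: string, b: string) => [a, b].sort().join("+");

// Stand-in for the LLM call that invents a reaction for an unknown pair.
async function llmDecideReaction(a: string, b: string): Promise<Reaction> {
  return { result: `${a}-${b} residue` };
}

async function react(a: string, b: string): Promise<Reaction> {
  const key = pairKey(a, b);
  const known = reactionTable.get(key);
  if (known) return known;         // deterministic path: no LLM call
  const decided = await llmDecideReaction(a, b);
  reactionTable.set(key, decided); // accretion: the table only ever grows
  return decided;
}
```

Once a pair has been decided, every future encounter is a deterministic table hit; the LLM is consulted exactly once per novel combination, which is the accretion-not-emergence property described above.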
Technical Moats
The hard part is not the architecture pattern. The hard part is what Princeton actually built on top of it.
Seven working systems across radically different domains. An Infinite Travel Atlas grounded in real geography (using terrain elevation data, coastline proximity, and climate band physics), a synthetic sci-fi galaxy with consistent procedural layout algorithms, a roguelike deck-builder with a functional symbolic combat engine, a falling-sand simulator with an LLM-managed reaction table, a 3D solar system renderer with view-dependent narration, a live-retrieval encyclopedia composer, and an infinite novel generator. Proving the pattern generalizes this broadly required actually building all seven. That is the moat: not the idea but the execution surface.
The fidelity slider. Graceful degradation from LLM generation to cached content to template fallback is engineering work that research prototypes skip and that production systems require. The fact that WWM includes it signals that the authors thought about deployment, not just publication.
Real geography integration. The Travel Atlas uses real SRTM elevation data, coastline indices, and climate zone calculations as physics ground truth. The LLM cannot hallucinate that Tokyo is a desert because the physics layer already established it as temperate and coastal. Grounding generative content in authoritative external data is a deployable pattern for any RAG-adjacent world-building system.
Insights
Insight One: WWM is not a world model in the Ha-Schmidhuber sense, and that is precisely why it works.
The original World Models paper (2018) trained a VAE to compress environment states into latent vectors and an RNN to predict future states. The agent dreamed inside its own model. This approach works for fixed-physics environments (car racing, simulated games) and fails for open-ended, semantically rich environments because the VAE/RNN latent space cannot represent novel concepts that weren't in training data. WWM abandons the learned physics model entirely and replaces it with code. This is not a limitation. It is the design. Any environment that has articulable rules benefits from WWM's approach, because articulable rules are better expressed in code than in latent space. Calling WWM a "world model" invites the wrong comparison class. It is closer to a procedural generation engine with an LLM content layer.
Insight Two: The real contribution is not the hash trick or the schema enforcement. It is proving that the web stack is a viable substrate for agentic environments, full stop.
Most agentic AI research assumes that agent environments must be either hand-built simulations (OpenAI Gym, AlfWorld, WebArena) or raw internet access. WWM shows a third path: ordinary TypeScript web applications, deployed on ordinary web infrastructure, serve as controllable, scalable, persistent environments for language agents without requiring custom simulation frameworks. The fact that the codebase is 61.4% TypeScript and 24.7% JavaScript is not incidental. It means any web engineer can build a WWM. No CUDA. No training runs. No special hardware. The barrier to entry for building agentic environments just dropped to "can you write a React app."
Takeaway
Object permanence in an infinite world is achieved with zero storage cost by exploiting the LLM as a deterministic function of its seed.
Most engineers assume that persistent world state requires persistent storage: databases, caches, snapshots. WWM shows this is false when the world content is generative. If the same input always produces the same output (enforced by fixing the seed), then "storing" an entity is equivalent to being able to regenerate it. The hash function IS the storage layer. The LLM IS the retrieval mechanism. A coordinate is simultaneously its own primary key and its own data.
This is not unique to LLMs. Procedural generation in games (Minecraft world gen, No Man's Sky planet generation) uses the same principle. What WWM adds is applying this to semantic content generation (narrative, dialogue, encyclopedic articles) at the density of meaning that LLMs enable, not just geometric noise that Perlin functions generate.
TL;DR For Engineers
WWM separates world state into Physics (deterministic TypeScript code) and Imagination (stochastic LLM), with a typed JSON schema interface as the enforcement boundary between them. Neither layer can corrupt the other.
Infinite world exploration with object permanence is achieved via hash-based seed generation: any coordinate maps to a deterministic seed, seed fixes LLM sampling, same coordinate always produces same output. Zero storage cost.
TypeScript interfaces enforce what the LLM can output before code executes it. Schema validation plus numeric clamping is the correct production pattern for LLM-to-symbolic-system interfaces.
The fidelity slider (LLM → cache → template) is included in the design. This is not a research prototype. It is a deployable pattern.
The codebase is 61.4% TypeScript, 24.7% JS. Any web engineer can build a WWM. No ML infrastructure required.
The Web Was Already a World Model. Princeton Just Made It Official.
The field has been chasing learned physics simulators for years. WWM's answer is quieter and more durable: the physics was always in the code. Every web application already has a state machine, a rule engine, typed interfaces, and a rendering layer. What it lacked was an open-ended content generation layer that could operate within those constraints without breaking them. LLMs, constrained by schemas and seeded by hash functions, are exactly that layer.
The seven systems Princeton built are demos. The pattern they demonstrate is infrastructure. Any agent that needs to act in a persistent, rule-consistent, open-ended environment now has a blueprint that requires no new model training, no custom simulation framework, and no GPU beyond what inference already costs.
The question is whether the agentic AI community will recognize this as the practical scaffolding it is, rather than dismissing it because it lacks a novel loss function.
Web World Models (Princeton AI2 Lab, arXiv:2512.23676) is a design pattern, not a new model: TypeScript code owns deterministic world state and rules (Physics Layer), while LLMs generate narrative and context on top (Imagination Layer), with hash-based seeding providing object permanence at zero storage cost across infinite explorable environments. The typed interface boundary between layers prevents structural hallucination by enforcing JSON schema compliance before any LLM output touches executable code. Seven working systems, including an infinite travel atlas grounded in real geography and a roguelike deck-builder with a "Wish" card generation mechanic, demonstrate that ordinary web infrastructure is a viable substrate for persistent, controllable agentic environments without custom simulation frameworks or training runs.
References
Web World Models, arXiv:2512.23676, Feng et al., Princeton AI2 Lab, December 2025
Web World Models GitHub Repository, 89 stars, CC-BY-4.0
Web World Models Project Page, interactive demos
World Models, arXiv:1803.10122, Ha and Schmidhuber, 2018, the foundational work this paper extends and departs from
Interactive World Models demo, original Ha/Schmidhuber interactive companion
Sponsored Ad
If you enjoy practical AI insights, check out SnackOnAI and support the newsletter by subscribing, sharing, and exploring our sponsored ad — it helps us keep building and delivering value 🚀
Stop Losing Your Money. It's time to upgrade your trading platform.
Your current trading platform is probably letting you down
Limited assets (no international stocks, no commodities, no pre-IPO companies)
Limited ability to short
Limited access to leverage
Limited trading hours
Liquid is one of the fastest growing trading platforms, allowing users to trade stocks, commodities, FX, and more 24/7/365 from their phone and computer.
Trading on Liquid is as simple as:
Pick an asset
Pick long or short
Pick your position size and leverage
Place your trade
The best part is that Liquid markets never close. So no matter what is going on in the world, you are able to keep your portfolio positioned properly.