KumoRFM-2: The Foundation Model That Made NVIDIA Pay $400M to Own the Enterprise Prediction Layer


KumoRFM-2, released April 14 2026, is a foundation model that predicts business outcomes directly from relational databases, and its architecture explains why NVIDIA paid a ~60% premium over Kumo's last known valuation.

SnackOnAI Engineering | Senior AI Systems Researcher | Technical Deep Dive | June 7, 2026

The standard enterprise ML pipeline for a single prediction task (customer churn, fraud detection, product recommendation) requires: schema understanding and data mapping, feature engineering across multiple tables, target variable definition, model training, and monthly retraining as data drifts. For a team without dedicated ML engineers, this takes months. For a team with ML engineers, it still takes weeks. And you build it again for the next task.

KumoRFM-2 (arXiv:2604.12596, Hudovernik, López, Kocijan, Nitta, Lenssen, Leskovec, Fey, April 14 2026) eliminates this pipeline. You connect your database. You specify a prediction task in natural language. The model runs in-context learning over your relational tables, preserving temporal consistency, traversing foreign key relationships, and returning a prediction. No feature engineering. No training. No target variable generation.

The architecture that enables this, the Relational Graph Transformer with hierarchical attention at three scales, is the technical contribution that made Kumo acquisition-worthy. NVIDIA did not buy a chatbot. It bought the machinery that turns Snowflake and Databricks data warehouses into prediction engines, which is exactly the software layer NVIDIA's AI Foundry needed to add above its hardware stack.

Scope: KumoRFM-2's three-scale hierarchical attention architecture, the task-early injection design decision, the four pre-training axes, benchmark results on 41 tasks, and the strategic logic of the $400M+ NVIDIA acquisition. Not covered: the full KumoRFM-1 architecture comparison beyond its key limitations, or NVIDIA's AI Foundry integration timeline.

What It Actually Does

KumoRFM-2 (github.com/kumo-ai/kumo-rfm) is a foundation model for relational data. You give it a database schema with multiple connected tables. You give it a prediction task (in natural language or structured format). It returns predictions without requiring any task-specific model training.

What distinguishes it from traditional tabular ML (XGBoost, TabNet, AutoGluon) and single-table foundation models (TabPFN, CARTE):

Capability	Traditional ML	Tabular FM	KumoRFM-2
Multi-table native	✗ (requires joins)	✗ (single table)	✓ (FK traversal)
Feature engineering	Required	Sometimes	Zero required
Zero-shot prediction	✗	✗	✓
Temporal consistency	Manual	Manual	Built-in
Scale (rows)	Any	Medium	500B rows
Cold start	Poor	Poor	Strong

Benchmark results vs baselines (41 tasks, 4 benchmark suites):

Benchmark	KumoRFM-2	Baseline	Delta
RelBenchV1 AUROC (few-shot)	79.60	78.06 (RelGNN, supervised)	+1.54
RelBenchV1 AUROC (few-shot) vs RFM-1	79.60	76.71 (KumoRFM-1)	+2.89
SAP SALT MRR (fine-tuned)	0.89	0.79 (CARTE)	+0.10 (+13%)
SAP SALT MRR (fine-tuned) vs AutoGluon	0.89	0.77 (AutoGluon)	+0.12 (+15%)
Best gain vs supervised	n/a	supervised baselines	up to +8%

The Architecture, Unpacked

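KumoRFM-2 is a Relational Graph Transformer with hierarchical attention at three scales. Scale 1 is intra-table row and column attention, run by a lightweight per-table network. Scale 2 is attention over the foreign-key graph, run by a larger graph-level model. Scale 3 is cross-sample attention from the query to labeled in-context examples. Focus on Scale 1's early task injection. In KumoRFM-1, the prediction target (e.g., "will this customer churn?") only reached the model after intra-table processing. As a result, every column contributed to the row embeddings before the model knew which columns mattered. Early task injection makes every attention computation aware of the target from the first layer. This matters most on high-dimensional, noisy tables (like the ratebeer benchmark), where most columns are irrelevant to any given prediction.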
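To make the early-injection idea concrete, here is a minimal sketch of single-head attention pooling over one row's columns. It assumes toy 2-dimensional column embeddings and a hypothetical task embedding for "churn". This is not KumoRFM-2's Scale 1 layer. It only contrasts two choices of query: one built from the columns themselves (target-unaware, as with late injection) and one supplied by the task (early injection).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def column_weights(column_embs, task_emb=None):
    """Scaled dot-product attention weights over one row's columns.

    task_emb=None  -> query is the mean column embedding (target-unaware,
                      like late injection)
    task_emb given -> query is the task embedding (early injection)
    """
    dim = len(column_embs[0])
    if task_emb is None:
        query = [sum(c[i] for c in column_embs) / len(column_embs) for i in range(dim)]
    else:
        query = task_emb
    return softmax([dot(query, c) / math.sqrt(dim) for c in column_embs])

# Toy 2-d column embeddings (hypothetical); axis 0 ~ churn-related signal
columns = {
    "plan_type": [2.0, 0.0],
    "support_tickets": [1.5, 0.5],
    "internal_uuid": [0.0, 2.0],
}
embs = list(columns.values())
late = column_weights(embs)
early = column_weights(embs, task_emb=[2.0, 0.0])  # hypothetical "churn" task embedding

for name, lw, ew in zip(columns, late, early):
    print(f"{name:16s} task-unaware={lw:.3f} task-aware={ew:.3f}")
```

With the task as the query, the irrelevant `internal_uuid` column drops from roughly a quarter of the attention weight to a few percent. Less of it therefore leaks into the row embedding that later stages consume.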

The Code, Annotated

Snippet One: KumoRFM-2 Prediction via the Kumo Platform API

# KumoRFM-2: enterprise prediction via the Kumo platform
# Source: kumo.ai/docs + kumo-ai/kumo-rfm (Apache 2.0)
# This is the zero-feature-engineering prediction pattern. The client, class,
# and method names are schematic and may not match the released SDK exactly.

import kumo

# ── CONNECT: point to your existing database ───────────────────────────────────
# KumoRFM-2 reads directly from Snowflake, Databricks, or any data warehouse
# No data export, no CSV uploads, no schema mapping by hand
# ← The database connector is what makes this "native relational":
#   It reads the actual FK structure from your warehouse's information_schema
client = kumo.connect(
    data_source="snowflake",
    connection_params={
        "account": "my-company.snowflakecomputing.com",
        "warehouse": "ANALYTICS_WH",
        "database": "PROD_DB",
    }
)

# ── SPECIFY TASK: natural language prediction goal ─────────────────────────────
# ← THIS is the key design: the task is specified as a prediction target
#   on a specific entity (user_id in users table) with a temporal label
#   KumoRFM-2 automatically:
#     1. Discovers which tables connect to the entity via FK traversal
#     2. Injects task info at Stage 1 (intra-table) to condition all processing
#     3. Traverses FK graph to pull signals from connected tables
#     4. No feature engineering, no join writing, no target variable generation

prediction_task = kumo.PredictionTask(
    entity_table="users",
    entity_key="user_id",
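    target="will_churn",          # ← named prediction target, not an engineered label column
    time_window_days=30,          # ← prediction horizon: churn within 30 days after anchor time T;
                                  #   inputs use only data before T (temporal consistency)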
    prediction_type="binary",     # binary, regression, or ranking
)

# ── PREDICT: few-shot with in-context examples ────────────────────────────────
# ICL mode: provide labeled context examples, model generalizes without training
# ← Cross-sample attention (Scale 3) is what lets a small labeled context
#   (the paper reports ICL working at 0.2% label coverage) replace task-specific training
context_examples = client.sample_labeled_examples(
    task=prediction_task,
    n_examples=100,    # ← context size (illustrative); larger contexts generally help
    strategy="recent", # temporal: use most recent labeled examples as context
)

# ← THIS is the trick: no task-specific model is trained here
#   KumoRFM-2 uses the 100 context examples as in-context "training signal"
#   across 3 scales: intra-table, inter-table FK, cross-sample attention
results = client.predict(
    task=prediction_task,
    mode="in_context",            # ← few-shot in-context inference, no training
    context=context_examples,
    target_entities=["user_001", "user_002", "user_003"],
)

for r in results:
    print(f"User {r.entity_id}: churn probability = {r.score:.3f}, confidence = {r.confidence:.2f}")
# Illustrative output (values are not from a real run):
# User user_001: churn probability = 0.847, confidence = 0.91
# User user_002: churn probability = 0.123, confidence = 0.88
# User user_003: churn probability = 0.634, confidence = 0.79

# ── FINE-TUNE: when you have sufficient labeled data ──────────────────────────
# Fine-tuning gives further improvement beyond few-shot (documented in paper)
# This is KumoRFM-2's other mode, for when you have a larger labeled set
# (e.g., 1000+ labeled examples; treat that threshold as a rule of thumb)
fine_tuned_model = client.fine_tune(
    task=prediction_task,
    training_data=client.get_labeled_data(task=prediction_task),
    # ← Fine-tuning starts from the pre-trained KumoRFM-2 weights
    # SAP SALT: few-shot → 0.84 MRR, fine-tuned → 0.89 MRR
    epochs=10,
    base_model="kumorfm-2",
)
# Deploy fine-tuned model for production queries

The mode="in_context" call with 100 context examples is where Scale 3's cross-sample attention does its work. The model performs few-shot learning at inference time, with no gradient updates. The context examples pass through the hierarchical attention layers alongside the target rows. Cross-sample attention then extracts the patterns that separate positive from negative examples and applies them to each target row.
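A stripped-down version of that mechanism follows. It assumes one attention head whose keys are the context embeddings and whose values are the context labels directly. That is a simplification: a real model would use learned, label-aware value vectors. The function name and toy data are hypothetical.

```python
import math

def cross_sample_predict(query, context, labels, temperature=1.0):
    """Attend from one query embedding to labeled context embeddings.

    Keys: context embeddings. Values: their 0/1 labels. The output is an
    attention-weighted label average in [0, 1], read as a churn probability.
    """
    scores = [sum(q * c for q, c in zip(query, ctx)) / temperature for ctx in context]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    prob = sum(w * y for w, y in zip(weights, labels))
    return prob, weights

# Hypothetical context: churned users near (1, 0), retained users near (0, 1)
context = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
labels = [1, 1, 0, 0]

p_churn, attn = cross_sample_predict([0.95, 0.05], context, labels, temperature=0.1)
p_retain, _ = cross_sample_predict([0.05, 0.95], context, labels, temperature=0.1)
print(f"churn-like query: {p_churn:.3f}  retained-like query: {p_retain:.3f}")
```

Nothing here is trained. Swapping in a different `context` or flipping `labels` changes the prediction immediately, which is the sense in which the context examples act as an in-context training signal.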

Snippet Two: Understanding the Four Pre-Training Axes

# KumoRFM-2 pre-training objective: an illustrative reconstruction
# Based on: arXiv:2604.12596 Section 3 + Kumo technical blog. Module names, sizes,
# and loss choices below are assumptions for illustration, not Kumo's code.
# Shows WHY the model generalizes across novel databases at inference time

import torch
import torch.nn as nn
import torch.nn.functional as F


class KumoRFM2PreTrainingObjective(nn.Module):
    """
    Four-axis pre-training that enables zero-shot relational prediction.

    The key insight: pre-training across four axes means the model sees
    patterns from both WITHIN tables and ACROSS tables during training.
    This is what allows it to generalize to unseen databases at inference time.
    Inputs are assumed to be embeddings from an upstream cell/row encoder.
    """

    def __init__(self, dim: int = 64, n_heads: int = 4):
        super().__init__()
        # Hypothetical lightweight heads standing in for the real model's modules
        self.row_decoder = nn.Linear(dim, dim)
        self.col_decoder = nn.Linear(dim, dim)
        self.fk_encoder = nn.Linear(dim, dim)
        self.table_encoder = nn.Linear(dim, dim)
        self.encoder = nn.Linear(dim, dim)
        self.label_embedding = nn.Embedding(2, dim)   # binary context labels
        self.cross_sample_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classification_head = nn.Linear(dim, 1)

    def row_axis_loss(self, table_embeddings, masked_row_idx):
        """
        Axis 1: Masked row reconstruction within a single table.
        table_embeddings: [n_rows, n_cols, dim] cell embeddings
        ← Learns: "given all other rows in this table, reconstruct the masked row"
        Why: Captures distributional patterns within a table type
             (what values are typical for a 'customer' row vs a 'product' row)
        """
        keep = torch.ones(table_embeddings.shape[0], dtype=torch.bool)
        keep[masked_row_idx] = False
        context_rows = table_embeddings[keep]              # [n_rows - 1, n_cols, dim]
        target_row = table_embeddings[masked_row_idx]      # [n_cols, dim]
        # Predict the masked row from the mean of the context rows
        predicted = self.row_decoder(context_rows.mean(dim=0))
        return F.mse_loss(predicted, target_row)

    def column_axis_loss(self, table_embeddings, masked_col_idx):
        """
        Axis 2: Masked column prediction within a single table.
        ← Learns: "given all other columns in this row, predict the masked column"
        Why: Captures feature co-occurrence patterns that transfer across schemas
             (e.g., 'age' and 'tenure' correlate in most customer tables)
        """
        keep = torch.ones(table_embeddings.shape[1], dtype=torch.bool)
        keep[masked_col_idx] = False
        context_cols = table_embeddings[:, keep]           # [n_rows, n_cols - 1, dim]
        target_col = table_embeddings[:, masked_col_idx]   # [n_rows, dim]
        predicted = self.col_decoder(context_cols.mean(dim=1))
        return F.mse_loss(predicted, target_col)

    def fk_axis_loss(self, anchor_row_emb, fk_connected_emb, temperature=0.1):
        """
        Axis 3: Match anchor rows to their FK-connected rows.
        ← THIS is the critical axis: learns cross-table relational patterns
        Why: Pre-training on FK traversal means the model knows how to use
             relational structure at inference time on unseen databases

        Example: customer_row → [order1, order2, order3] (FK connection)
        Loss: can the model pick out each customer's own orders from
              the orders of the other customers in the batch?
        anchor_row_emb:   [batch, dim], one anchor row per entity
        fk_connected_emb: [batch, dim], pooled FK-connected rows per entity
        """
        anchor_proj = F.normalize(self.fk_encoder(anchor_row_emb), dim=-1)
        fk_proj = F.normalize(self.table_encoder(fk_connected_emb), dim=-1)

        # ← Contrastive (InfoNCE): diagonal pairs are positives; every other
        #   entity's FK-connected rows in the batch serve as negatives
        logits = anchor_proj @ fk_proj.T / temperature     # [batch, batch]
        targets = torch.arange(anchor_row_emb.shape[0])
        return F.cross_entropy(logits, targets)

    def cross_sample_axis_loss(self, context_examples, context_labels, query_example, label):
        """
        Axis 4: Few-shot generalization across context examples.
        ← Learns the in-context learning mechanism itself:
          "given N labeled examples, predict the query"

        Pre-training this axis on diverse tasks is what enables
        KumoRFM-2 to generalize to NEW tasks at inference time with
        as few as 0.2% of available training data as context.
        context_examples: [n_ctx, dim]; context_labels: [n_ctx] (0/1 ints)
        query_example: [dim]; label: scalar float tensor (0.0 or 1.0)
        """
        # Context embeddings carry their labels; the query's label stays hidden
        context_embs = self.encoder(context_examples) + self.label_embedding(context_labels)
        query_emb = self.encoder(query_example)

        # Cross-sample attention: attend from query to context examples
        # ← This is the Scale 3 attention in inference mode
        attended, _ = self.cross_sample_attn(
            query=query_emb.view(1, 1, -1),                 # [batch=1, 1, dim]
            key=context_embs.unsqueeze(0),                  # [1, n_ctx, dim]
            value=context_embs.unsqueeze(0),
        )
        logit = self.classification_head(attended.view(-1))  # [1]
        return F.binary_cross_entropy_with_logits(logit, label.view(1))


# ── THE COMBINED PRE-TRAINING LOSS ────────────────────────────────────────────
# All four axes trained jointly on synthetic + real-world databases
# ← Synthetic data generation simulates diverse relational schemas:
#   e-commerce, social graphs, financial transactions, medical records
# This diversity is what gives KumoRFM-2 zero-shot generalization
# Below: one step on random toy tensors, only to show that the shapes line up

torch.manual_seed(0)
dim = 64
objective = KumoRFM2PreTrainingObjective(dim=dim)
cells = torch.randn(32, 8, dim)                     # toy table: 32 rows x 8 columns

row_loss = objective.row_axis_loss(cells, masked_row_idx=3)
col_loss = objective.column_axis_loss(cells, masked_col_idx=2)
fk_loss = objective.fk_axis_loss(torch.randn(16, dim), torch.randn(16, dim))
icl_loss = objective.cross_sample_axis_loss(
    torch.randn(100, dim), torch.randint(0, 2, (100,)), torch.randn(dim), torch.tensor(1.0)
)

total_loss = (
    row_loss * 0.25 +
    col_loss * 0.25 +
    fk_loss  * 0.25 +
    icl_loss * 0.25
)
total_loss.backward()
# ← Equal weighting is an assumption of this sketch; the paper's weights may differ
#   Row + column axes: within-table representation
#   FK axis: cross-table relational structure
#   ICL axis: in-context learning mechanism

The FK axis (fk_axis_loss above) is what separates KumoRFM-2 from tabular foundation models. Tabular foundation models such as TabPFN and CARTE are pre-trained on individual tables. AutoGluon, by contrast, is an AutoML framework that trains models per dataset, with no relational pre-training at all. KumoRFM-2 is pre-trained on FK-connected tables, which means it has seen patterns like "customer attributes predict order attributes" across thousands of synthetic databases. That prior is what enables zero-shot cross-table prediction on completely new schemas.

It In Action: End-to-End Worked Example

Task: Predict which customers will churn in the next 30 days for a SaaS company

Database schema (illustrative scenario):

customers table:     user_id, signup_date, plan_type, country, age
subscriptions table: sub_id, user_id (FK), start_date, amount, status
events table:        event_id, user_id (FK), event_type, timestamp
support_tickets:     ticket_id, user_id (FK), created_at, severity, resolved

Step 1: Schema discovery (automatic)

KumoRFM-2 reads information_schema → detects FK links:
  subscriptions.user_id → customers.user_id
  events.user_id → customers.user_id
  support_tickets.user_id → customers.user_id

Temporal index detected: event timestamps, subscription dates
Context window: use data from T-90 days to T (configurable)
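Suppose the warehouse exposes declared foreign keys as (child table, child column, parent table, parent column) records, as information_schema constraint views can. The discovery step then reduces to building an adjacency map and walking it from the entity table. The record tuples and helper names below are hypothetical, and KumoRFM-2's own schema discovery is not reproduced here.

```python
from collections import defaultdict

# Hypothetical FK metadata in the shape a query over a warehouse's
# information_schema constraint views could return:
# (child_table, child_column, parent_table, parent_column)
FK_RECORDS = [
    ("subscriptions", "user_id", "customers", "user_id"),
    ("events", "user_id", "customers", "user_id"),
    ("support_tickets", "user_id", "customers", "user_id"),
]

def build_fk_graph(records):
    """Map each parent table to the (child_table, fk_column) pairs referencing it."""
    graph = defaultdict(list)
    for child, child_col, parent, _parent_col in records:
        graph[parent].append((child, child_col))
    return dict(graph)

def reachable_tables(graph, entity_table):
    """Breadth-first walk over parent -> child FK edges from the entity table."""
    seen = {entity_table}
    frontier = [entity_table]
    while frontier:
        next_frontier = []
        for table in frontier:
            for child, _ in graph.get(table, []):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return seen

fk_graph = build_fk_graph(FK_RECORDS)
print(fk_graph["customers"])
print(sorted(reachable_tables(fk_graph, "customers")))
```

This walk only follows parent-to-child edges, which is all a star schema like this one needs. A deeper schema (say, events pointing at a sessions table) would require the walk to follow child-to-parent edges as well.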

Step 2: Context examples (100 recently churned/retained users)

Labeled context:
  user_A: churned=True  → high support tickets, decreasing event frequency
  user_B: churned=False → low support tickets, stable event frequency
  ... (98 more labeled examples)
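Step 2's context examples are only useful if they respect the same time cutoff as the query. Below is a minimal sketch of that labeling rule. It assumes an anchor time T, the 90-day look-back from Step 1, and the 30-day churn horizon from Snippet One. The user histories and the `context_example` helper are hypothetical.

```python
from datetime import datetime, timedelta

T = datetime(2026, 3, 1)           # anchor time for the context examples (assumed)
HORIZON = timedelta(days=30)       # label window: churn in (T, T + 30 days]
LOOKBACK = timedelta(days=90)      # feature window: [T - 90 days, T]

# Hypothetical per-user histories: event timestamps and an optional cancellation time
users = {
    "user_A": {"events": [datetime(2026, 1, 5), datetime(2026, 2, 20)],
               "cancelled_at": datetime(2026, 3, 12)},
    "user_B": {"events": [datetime(2026, 2, 10), datetime(2026, 2, 27), datetime(2026, 3, 5)],
               "cancelled_at": None},
}

def context_example(history):
    """Build one (features, label) pair without leaking post-T information."""
    # Features may only use rows inside the look-back window ending at T
    past = [t for t in history["events"] if T - LOOKBACK <= t <= T]
    features = {"n_events_90d": len(past)}
    # Label looks strictly forward: did the user cancel within the horizon?
    cancelled = history["cancelled_at"]
    label = cancelled is not None and T < cancelled <= T + HORIZON
    return features, label

for uid, hist in users.items():
    print(uid, context_example(hist))
```

user_B's March 5 event is dropped because it happens after T. Counting it would leak future behavior into the features that sit next to a label defined on that same future window.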

Step 3: Hierarchical attention execution

Scale 1 (Intra-table, task-conditioned):
  Prediction target "churned" injected into customers table
  Column attention: "plan_type" and "country" receive high weight for churn task
  Row attention: users with similar plan+age patterns cluster
  Irrelevant columns (e.g., internal UUIDs) dampened early
  ← This is the task-early injection innovation
  Cost: ~2ms per table, runs in parallel across all 4 tables

Scale 2 (FK graph traversal):
  Graph attention: customer → subscriptions, events, support_tickets (all three link directly to customers)
  Signal extracted: "frequent high-severity support tickets = churn signal"
  Signal extracted: "event frequency drop in last 14 days = churn signal"
  Cost: ~5ms for FK traversal across 4 tables

Scale 3 (Cross-sample):
  100 context examples → cross-sample attention over query user
  Model recognizes: "query user matches 78% of churned context pattern"
  Cost: ~3ms for cross-sample attention over 100 examples
  
Total inference: ~10ms per prediction batch
Processing capacity: 5 GB/s → ~50,000 users/second at full scale
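The stage costs above combine as follows. Scale 1 counts once because the four per-table passes run in parallel. The per-user byte footprint is not stated anywhere in the source. The 100 KB used here is simply the value implied by dividing 5 GB/s by ~50,000 users/second, so treat it as an assumption.

```python
# Stage latencies from the walkthrough (milliseconds)
SCALE1_PER_TABLE_MS = 2.0   # intra-table attention, per table
N_TABLES = 4
SCALE2_MS = 5.0             # FK graph traversal
SCALE3_MS = 3.0             # cross-sample attention over 100 examples

# Scale 1 runs in parallel across tables, so it contributes one table's latency
parallel_total_ms = SCALE1_PER_TABLE_MS + SCALE2_MS + SCALE3_MS
# For comparison: the same work if the four tables were processed one after another
serial_total_ms = SCALE1_PER_TABLE_MS * N_TABLES + SCALE2_MS + SCALE3_MS

# Throughput = bandwidth / bytes read per user.
# ASSUMPTION: ~100 KB of relational context per user (back-solved, not sourced)
BANDWIDTH_BYTES_PER_S = 5e9
BYTES_PER_USER = 100e3
users_per_s = BANDWIDTH_BYTES_PER_S / BYTES_PER_USER

# Time to score a 500,000-customer base (the size used in Step 4) at that rate
scoring_time_s = 500_000 / users_per_s

print(f"latency: {parallel_total_ms} ms parallel vs {serial_total_ms} ms serial")
print(f"throughput: {users_per_s:,.0f} users/s; full base in {scoring_time_s:.0f} s")
```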

Step 4: Output

Target users (500,000 customers):
  High churn risk (>0.7):     12,847 users
  Medium churn risk (0.4-0.7): 45,230 users
  Low churn risk (<0.4):      441,923 users

Top-3 signals for high-risk segment:
  1. Support tickets (severity=HIGH) in last 7 days: 3.2x average
  2. Event frequency drop: -62% vs prior 30-day baseline
  3. Subscription downgrade in last 30 days: 89% co-occurrence
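The three risk tiers in the output are a simple thresholding of each user's churn probability. The sketch below assumes the boundaries exactly as listed, so scores of exactly 0.7 and 0.4 fall in the medium tier. The sample scores reuse the illustrative outputs from Snippet One.

```python
from collections import Counter

def risk_tier(score: float) -> str:
    """Bucket a churn probability using the thresholds from the output above."""
    if score > 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"

scores = {"user_001": 0.847, "user_002": 0.123, "user_003": 0.634}
tiers = {user: risk_tier(s) for user, s in scores.items()}
print(tiers)
print(Counter(tiers.values()))

# Sanity check on the reported segment sizes: they should cover all 500,000 users
segment_sizes = {"high": 12_847, "medium": 45_230, "low": 441_923}
print(sum(segment_sizes.values()))
```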

vs. Traditional ML pipeline:
  Time to build: 8 weeks (ETL + feature engineering + model training)
  Time with KumoRFM-2: 2 hours (schema connection + context labeling)
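  Accuracy comparison: KumoRFM-2 few-shot matches or beats supervised baselines on the paper's headline results (e.g., RelBenchV1 average AUROC) across its 41 benchmark tasks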

Databricks customer outcome (documented): Conversion rates from leads to opportunities improved from 1.2x to 6x, and the volume of high-intent, quality leads entering the pipeline doubled.

Why This Design Works, and What It Trades Away

The early task injection is the architectural improvement that makes KumoRFM-2 qualitatively better than RFM-1. When task information arrives after intra-table processing, the lightweight Stage 1 network has already mixed relevant and irrelevant features into the row embeddings. These contaminated embeddings then propagate through FK graph attention, spreading noise across all connected tables. Early injection at Stage 1 stops noise at the source: the attention heads in Stage 1 learn to weight task-relevant features heavily and task-irrelevant features lightly, producing cleaner embeddings for every downstream stage.

The hierarchical architecture (lightweight Stage 1, larger Stage 2) is the correct efficiency tradeoff. Processing each table independently in Stage 1 is parallelizable: all tables run simultaneously on separate GPU workers. The larger graph-level model in Stage 2 handles the cross-table reasoning that requires seeing all tables' embeddings simultaneously. This staging avoids the quadratic attention cost of processing all tokens from all tables in one pass.
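To see why staging helps, count attention scores under a simplified cost model. The model assumes full attention over all cells of a table in Stage 1 (row and column attention would be cheaper still) and one embedding per row in Stage 2. The table sizes are hypothetical: a batch of 20 sampled rows with 10 columns from each of the four tables in the worked example.

```python
def flat_cost(tables):
    """Attention scores if all cells from all tables attend to each other in one pass."""
    n_cells = sum(rows * cols for rows, cols in tables.values())
    return n_cells ** 2

def staged_cost(tables):
    """Stage 1: full attention over cells within each table (computed independently).
    Stage 2: attention over one embedding per row across all tables."""
    stage1 = sum((rows * cols) ** 2 for rows, cols in tables.values())
    n_rows = sum(rows for rows, _ in tables.values())
    return stage1 + n_rows ** 2

# Hypothetical batch: (sampled rows, columns) per table
tables = {
    "customers": (20, 10),
    "subscriptions": (20, 10),
    "events": (20, 10),
    "support_tickets": (20, 10),
}
flat, staged = flat_cost(tables), staged_cost(tables)
print(flat, staged, round(flat / staged, 2))
```

With four equally sized tables, the staged version needs roughly a quarter of the scores. The saving approaches a factor equal to the number of tables when tables are similar in size, and shrinks when one table dominates the token count.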

The four-axis pre-training is the design decision that enables zero-shot generalization. Cross-table patterns (FK axis), within-table patterns (row/column axes), and in-context generalization (ICL axis) are learned jointly. A model trained only on single-table data can never learn FK traversal signals. KumoRFM-2's pre-training corpus includes synthetic relational databases across e-commerce, social graphs, financial transactions, medical, academic, and ERP domains. That breadth is what allows it to generalize to SAP SALT (ERP data it may not have seen at scale during pre-training), where the fine-tuned model reaches 0.89 MRR.

What KumoRFM-2 trades away:

Scale 3 context capacity. In-context learning at 0.2% label coverage is impressive, but the quality gap between 0.2% and 5% coverage is real. Very small context sets give cross-sample attention fewer labeled examples to draw on, so predictions are noisier. Performance rises monotonically as the context grows.

Fine-tuning is still better than few-shot. The paper explicitly shows fine-tuned KumoRFM-2 outperforming the few-shot version. If you have sufficient labeled data, the optimal deployment is fine-tuning, not pure in-context inference. The "zero training" claim holds for few-shot mode, but the best accuracy comes from the optional training path.

Model opacity for auditable enterprise decisions. KumoRFM-2 produces predictions and feature importance scores, but the multi-scale hierarchical attention is not trivially interpretable. For regulated industries (credit scoring, medical risk), explainability requirements may need additional tooling on top of raw prediction scores.

The NVIDIA Acquisition Logic

The acquisition price (reported $400M+ vs ~$250M pre-acquisition valuation from PitchBook) represents a ~60% strategic premium. The strategic logic is specific and defensible from NVIDIA's position.

The enterprise data warehouse position. Snowflake and Databricks are the two dominant cloud data platforms. Kumo has native integrations with both, and both are listed as Kumo partners (Databricks is also a customer). NVIDIA acquiring Kumo is NVIDIA acquiring the prediction layer sitting on top of those two warehouses. This is software real estate that no amount of GPU hardware could substitute for.

The AI Foundry completion. NVIDIA's AI Foundry helps enterprises customize AI models for their specific data. Kumo's technology is the natural completion of this story: businesses can combine their private relational data with KumoRFM-2's foundational relational patterns to build prediction models that are specific to their schemas, without dedicated ML teams. The "help businesses combine private data with domain knowledge" framing from multiple acquisition reports is precisely this.

The talent moat. The three co-founders are Vanja Josifovski (former CTO of both Airbnb and Pinterest), Hema Raghavan (former Senior Director of Engineering at LinkedIn), and Jure Leskovec (Stanford professor and GNN pioneer whose research underpins graph-based relational learning). This is one of the most technically credentialed founding teams in enterprise ML. All three transitioned to NVIDIA in the acquisition.

The $37M raised vs $400M+ acquisition. Kumo raised only $37 million (Sequoia Capital lead) across its entire history. The 10x+ exit multiple on invested capital, achieved without a large funding round, reflects a capital-efficient path to an acqui-hire-plus-technology deal. NVIDIA was not primarily buying revenue: it was buying a technical capability, a data warehouse integration position, and a founding team.
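The two headline ratios in this section can be checked directly from the reported figures. Because "$400M+" is a floor, both ratios are lower bounds given the other numbers.

```python
REPORTED_PRICE_M = 400.0      # "$400M+" (reported floor)
PRE_DEAL_VALUATION_M = 250.0  # "~$250M" PitchBook estimate
TOTAL_RAISED_M = 37.0         # total venture funding

premium = REPORTED_PRICE_M / PRE_DEAL_VALUATION_M - 1
multiple_on_capital = REPORTED_PRICE_M / TOTAL_RAISED_M

print(f"premium over last valuation: {premium:.0%}")
print(f"multiple on invested capital: {multiple_on_capital:.1f}x")
```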

Technical Moats

The relational pre-training corpus. KumoRFM-2's zero-shot generalization comes from pre-training on diverse synthetic and real relational databases across multiple domains. Building and curating this corpus is the primary technical moat. Any competitor must assemble an equivalent corpus, which requires schema generation, synthetic data simulation across realistic business patterns, and temporal ordering preservation. This is not replicable from publicly available tabular benchmark datasets.

The FK-axis pre-training objective. The contrastive pre-training on FK-connected table pairs is what gives the model its cross-table reasoning capability. This requires a training infrastructure that processes graph-structured data (not just flat tables) at scale, with temporal consistency across FK relationships. This is a different engineering challenge from single-table foundation model pre-training.

Enterprise data warehouse integrations. Native Snowflake and Databricks connectors, built and validated with paying customers (DoorDash, Reddit, J Sainsbury, Databricks itself), represent accumulated integration work. New competitors must rebuild these connectors and earn enough customer trust to test on production warehouses.

Insights

Insight One: KumoRFM-2 is, by the authors' account, the first few-shot foundation model to outperform fully supervised ML on standard benchmarks. This is the most precisely stated claim in the entire release: "To our knowledge, this is the first time a few-shot foundation model has been shown to surpass supervised approaches on common benchmark tasks." It is stronger than "competitive with" or "comparable to" supervised models. It is an empirical claim that the architecture is general enough to predict better than a custom-trained model while using only a small fraction of the labeled data that model uses. NVIDIA acquiring the company seven weeks after this paper appeared is unlikely to be a coincidence.

Insight Two: KumoRFM-2 ships two modes, and they are not equivalent. Few-shot in-context inference requires zero training and zero feature engineering: connect your database, provide a small set of labeled examples, get predictions. Fine-tuning requires labeled data collection, training runs (documented at under two minutes per task on SAP SALT), and evaluation. The paper is explicit that fine-tuning improves performance further. On SAP SALT, the fine-tuned model improves on the few-shot base model and reaches 0.89 MRR, +10 MRR points over the strongest supervised baseline (CARTE, 0.79). The "zero training" claim is precisely accurate for the few-shot mode and does not apply to the fine-tuning mode. Neither claim is misleading on its own; the risk is conflating the two. For teams evaluating KumoRFM-2, few-shot is the right entry point for exploration and for cold-start scenarios where labeled data is scarce. Fine-tuning is the documented path to the highest accuracy when labeled data exists. Both modes outperform traditional ML pipelines that require months of feature engineering, but they serve different parts of the deployment spectrum.

Surprising Takeaway

KumoRFM-2 scales to 500 billion rows at 5 GB/s with 20 million lookups per second. This is not the architecture you would design for 500-billion-row inference if you were starting from scratch. Our reading is that the scale follows from the FK-axis pre-training, which requires the model to traverse graph-structured data efficiently at training time. The same infrastructure that processes FK relationships during pre-training plausibly provides the retrieval speed at inference time. If so, billion-scale deployment was a consequence of how the model was trained rather than a separate engineering effort. KumoRFM-1 was limited to in-memory datasets; KumoRFM-2 handles 500 billion rows. That is a five-order-of-magnitude jump in a single iteration, and on this reading it traces back to the decision to train the model on FK traversal at scale from the beginning.

TL;DR For Engineers

KumoRFM-2 (arXiv:2604.12596, April 14 2026, CC BY 4.0) is a relational foundation model with hierarchical attention at three scales: intra-table (row + column attention), inter-table (FK graph attention), and cross-sample (in-context few-shot). Context targets are injected at Stage 1, making all processing task-conditioned from the start. By the authors' account, it is the first few-shot FM to outperform supervised ML, evaluated across 41 benchmark tasks.
Results: RelBenchV1 AUROC 79.60 (+1.54 vs supervised RelGNN). SAP SALT MRR 0.89 fine-tuned (+13% vs best baseline). Scales to 500B rows, 5 GB/s, 20M lookups/sec.
Four pre-training axes: row, column (intra-table), FK traversal, cross-sample ICL. Synthetic + real-world corpus. ICL effective at 0.2% labeled data coverage.
NVIDIA acquired Kumo AI for $400M+ on June 4 2026, a ~60% premium over the ~$250M pre-acquisition valuation. Strategic rationale: the prediction layer on Snowflake/Databricks, completion of AI Foundry, and a founding team that includes a former CTO of Airbnb and Pinterest and GNN pioneer Jure Leskovec. Customers: DoorDash, Reddit, Databricks, J Sainsbury; Snowflake and Databricks are also partners.
Two deployment modes: few-shot in-context (zero training, ~10ms per batch in the worked example) and fine-tuned (training required, higher accuracy). Fine-tuning outperforms few-shot; both outperform traditional ML pipelines.

The Prediction Layer Is the Prize

NVIDIA's acquisition of Kumo at $400M+ is the market confirming that the most valuable unsolved problem in enterprise AI is not "how do we generate text from private data" but "how do we predict business outcomes from existing relational databases without months of ML pipeline work." KumoRFM-2 is the most technically mature answer to that question that exists today.

The architecture, pre-training along four relational axes with task-early injection, is not a minor improvement over prior work. Outperforming supervised models in few-shot mode across 41 benchmark tasks, with 0.2% labeled data coverage, on databases ranging from in-memory to 500 billion rows, is a qualitative capability shift. NVIDIA recognized it seven weeks after the paper appeared. The acquisition terms reflect how much of a head start Kumo had built.

References

KumoRFM-2: Scaling Foundation Models for Relational Learning, arXiv:2604.12596, Hudovernik et al., April 14 2026
kumo-ai/kumo-rfm GitHub
Kumo official KumoRFM-2 blog post
NVIDIA Acquires Kumo AI for $400M+, The Information, June 4 2026
KumoRFM-2 PR Newswire announcement
RelBench: A Benchmark for Deep Learning on Relational Databases, arXiv:2407.20060 (the RelBenchV1 benchmark used in KumoRFM-2 evaluations)
Position Paper: Relational Deep Learning, ICML 2024 (the research foundation for KumoRFM-2's approach)

KumoRFM-2 (arXiv:2604.12596, April 14 2026) is, by its authors' account, the first few-shot foundation model to outperform supervised ML on common benchmarks, evaluated across 41 relational tasks. It uses a Relational Graph Transformer with hierarchical attention at three scales (intra-table row/column, inter-table FK graph, cross-sample ICL) and a critical task-early injection design that conditions all processing on the prediction target from Stage 1. It scales to 500 billion rows at 5 GB/s and requires zero feature engineering in few-shot mode (effective at 0.2% labeled data coverage). Fine-tuned, it achieves 0.89 MRR on SAP SALT (+13% vs best baseline). NVIDIA acquired Kumo AI for $400M+ on June 4 2026, seven weeks after this paper appeared, to own the prediction layer running natively on Snowflake and Databricks and integrate it into the NVIDIA AI Foundry. The founding team includes Vanja Josifovski (former CTO of Airbnb and Pinterest), Hema Raghavan (former Sr. Director Engineering at LinkedIn), and Jure Leskovec (Stanford, GNN pioneer).
