Oliver Buchannon
Mohinish S

Serverless Ventures | Cloud, Data & Distributed Systems | Angel & Advisor | Infra & Data Startups

SnackOnAI Blog

Sashiko: The AI That Reviews Linux Kernel Code Better Than Most Humans (And Everyone Knows It)

Jun 9, 2026 • 8 min read

When AI Catches the Bugs That 100% of Human Reviewers Missed, the Question Isn't Whether to Use It. It's Whether You Can Afford Not To.

Mohinish S

MOSS-TTS: Why the Audio Tokenizer Is the Entire Stack

Jun 8, 2026 • 12 min read

Every component in the MOSS-TTS family (the flagship TTS, the spoken dialogue model, the voice generator, the sound effects model, and the realtime streamer) sits on top of one shared foundation: MOSS-Audio-Tokenizer, a 1.6-billion-parameter pure Transformer audio tokenizer trained on 3 million hours of audio.

Mohinish S

KumoRFM-2: The Foundation Model That Made NVIDIA Pay $400M to Own the Enterprise Prediction Layer

Jun 7, 2026 • 13 min read

NVIDIA acquired Kumo AI for over $400 million on June 4, 2026. The acquisition was not about chips or inference hardware. It was about a specific technical bet: that the most valuable layer in enterprise AI is not the model that generates text, but the model that predicts outcomes directly from business databases, without feature engineering, without a data science team, and without months of ML pipeline work.

Mohinish S

rmux: Playwright for Terminals, Written in Rust

Jun 7, 2026 • 11 min read

Every AI agent that needs to drive a CLI or TUI application has the same problem: there is no reliable, typed API for terminal interaction.

Mohinish S

vLLM Semantic Router: The Infrastructure Layer That Decides Which Model Should Handle Your Request Before the Model Sees It

Jun 6, 2026 • 12 min read

The hard problem in multi-model LLM deployments is not having good models. It is routing every request to the right model, at inference time, under simultaneous constraints on cost, privacy, latency, and safety, without building a custom decision system for each deployment scenario. vLLM Semantic Router (arXiv:2603.04444, vllm-project/semantic-router, 4.3k stars) solves this with composable signal orchestration: extract heterogeneous signals from the request, compose them through Boolean rules into deployment-specific decisions, execute through plugin chains. The same architecture expresses a cost-optimized deployment and a privacy-regulated enterprise deployment as different signal-decision configurations, without code changes.
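As a rough illustration of the signal-then-rule pattern described above, the sketch below extracts a few signals from a request and composes them through Boolean rules to pick a model. Every name in it (`Signals`, `extract_signals`, `RULES`, `route`, the model labels, and the keyword heuristics) is hypothetical and not taken from the vLLM Semantic Router codebase; it shows only the general shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Signals:
    """Hypothetical signal set; a real system would use classifiers, not keywords."""
    has_pii: bool
    is_code: bool
    is_long: bool


def extract_signals(prompt: str) -> Signals:
    """Derive simple Boolean signals from the raw request text."""
    text = prompt.lower()
    return Signals(
        has_pii="ssn" in text or "passport" in text,
        is_code="def " in text or "traceback" in text,
        is_long=len(prompt.split()) > 200,
    )


# Each rule pairs a Boolean predicate over the signals with a target model label.
Rule = tuple[Callable[[Signals], bool], str]

# Rules are checked in order; the first match wins. Changing a deployment
# means editing this list, not the routing code below.
RULES: list[Rule] = [
    (lambda s: s.has_pii, "on-prem-model"),
    (lambda s: s.is_code and not s.is_long, "small-code-model"),
    (lambda s: s.is_code or s.is_long, "large-general-model"),
]


def route(prompt: str, default: str = "cheap-general-model") -> str:
    """Return the label of the first rule whose predicate holds, else the default."""
    signals = extract_signals(prompt)
    for predicate, model in RULES:
        if predicate(signals):
            return model
    return default
```

In this sketch a privacy-focused deployment and a cost-focused one would differ only in the contents of `RULES`, which is the property the summary attributes to the router's configuration-driven design.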

Mohinish S

Gemma 4 QAT: How Google Trained the Quantization Into the Model Instead of Bolting It On After

Jun 5, 2026 • 13 min read

Quantization-Aware Training (QAT) is not a compression technique applied after the model is done. It is a training technique that makes the model learn to be quantizable. Gemma 4's QAT models, released June 5, 2026, demonstrate why this distinction matters: the 12B QAT at Q4_0 scores 67.07% on MMLU versus the BF16 baseline's 67.15%, a gap of 0.08 percentage points. Standard post-training quantization of the same model drops 2-4 points. The difference is architectural, not cosmetic.
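To make "learning to be quantizable" a little more concrete: QAT setups commonly run weights through a quantize-then-dequantize step in the forward pass, so the training loss sees the rounding error. The sketch below shows only that round trip, using generic symmetric 4-bit quantization. It is an assumption-laden illustration: the function name is made up, it is not the Q4_0 block format, and it is not Gemma's training recipe.

```python
def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Snap weights to a symmetric signed integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1        # largest integer level, 7 for 4 bits
    qmin = -(2 ** (bits - 1))         # smallest integer level, -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)          # all zeros: nothing to quantize
    scale = max_abs / qmax            # float value of one integer step
    out = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # round, then clamp to the grid
        out.append(q * scale)         # dequantize back to a float
    return out
```

During training, gradients are typically passed through the rounding step as if it were the identity (a straight-through estimator); that part needs an autograd framework and is left out here.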

Mohinish S

Vibe Code Bench: The Benchmark That Finally Asks If AI Can Build Software, Not Just Write Code

Jun 4, 2026 • 9 min read

Vibe Code Bench (VCB) asks exactly that question.

Mohinish S

Multica: The Managed Agents Platform That Runs Code on Your Machine, Not Theirs

Jun 3, 2026 • 11 min read

Multica (MIT, 19.1k stars, multica-ai/multica) is a task collaboration platform where humans and AI agents work in the same workspace. Assign an issue to an agent, @mention it in a comment, start a chat, or schedule a recurring autopilot. The agents execute on your machine via a local daemon, not on Multica's servers. Your API keys, code directories, and toolchain never leave your infrastructure.

Mohinish S

Feynman: The AI Research Agent That Verifies Before It Summarizes

Jun 2, 2026 • 11 min read

Every AI research tool today rushes to produce a summary. Feynman (companion-inc/feynman, MIT, 7k stars, April 2026) is built on the opposite philosophy: verify first, summarize second. It dispatches four specialized sub-agents in parallel (Researcher, Reviewer, Writer, Verifier), grounds every claim to a direct URL, and produces a structured research brief with live citation verification. The architecture is grounded in the source, not in the model's training data.

Mohinish S

gstack: Why the Y Combinator CEO Turned His Claude Code Setup Into a Software Factory With 23 Specialist Roles

Jun 1, 2026 • 12 min read

gstack (MIT, 105k stars, March 2026) is Garry Tan's published Claude Code configuration: 23 opinionated slash commands that assign specialist roles (CEO, Eng Manager, Designer, QA Lead, Security Officer, Release Manager, Doc Engineer) to Claude, cycling through a fixed Think → Plan → Build → Review → Test → Ship → Reflect loop. The design thesis is that Claude performs better with role identity and process structure than with free-form prompting, and the self-reported numbers are specific enough to be interesting: 600,000 lines of production code in 60 days.

Mohinish S

Open-Generative-AI: The Free AI Studio That Is Not Actually Running Models on Your Machine

May 31, 2026 • 11 min read

Open-Generative-AI (MIT, 17.5k stars, trending April 2026) is billed as a self-hosted, uncensored alternative to Higgsfield, Freepik, and Krea. It is genuinely useful. It is also an API aggregator with a polished Next.js frontend, not a local inference stack. Understanding exactly what runs where, what "free" means, and what the MuAPI dependency implies for production use is the analysis most coverage skips.

Mohinish S

ADI Reasoning: The Symbolic Scaffold That Forces LLMs to Separate Hypothesis Generation From Verification

May 30, 2026 • 14 min read

Chain-of-thought prompting lets LLMs perform abduction, deduction, and induction simultaneously in a single autoregressive pass, with no separation and no accountability for which mode is active at any step. The ADI Protocol formalizes Peirce's tripartite inference as an explicit scaffold, enforces consistency through five algebraic invariants (the Gamma Quintet), and uses the Weakest Link bound to ensure no conclusion can exceed the reliability of its least-supported premise.
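One simple reading of the Weakest Link bound is that a conclusion's reliability is capped by the lowest reliability among its premises. The sketch below encodes that reading only. The function name and the 0-to-1 reliability scale are assumptions made for illustration, not the protocol's own definitions.

```python
def weakest_link(premise_reliabilities: list[float], claimed: float) -> float:
    """Cap a conclusion's claimed reliability at that of its weakest premise."""
    if not premise_reliabilities:
        raise ValueError("need at least one premise")
    return min(claimed, min(premise_reliabilities))
```

For example, a conclusion claimed at 0.95 that rests on premises rated 0.9, 0.6, and 0.8 would be capped at 0.6 under this reading.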

Mohinish S

JEPA: Why Predicting in Pixel Space Was the Wrong Goal All Along

May 29, 2026 • 13 min read

Self-supervised learning has been dominated by two ideas: reconstruct masked pixels (MAE), or force representations of different views to be similar (DINO, BYOL, SimCLR). JEPA (Joint-Embedding Predictive Architecture) rejects both. It predicts abstract representations of masked regions, not pixels. This single architectural choice produces richer semantic features with 10x less compute than MAE and zero hand-crafted augmentations. Yann LeCun has been arguing for this design for decades. The empirical results are now here.

Mohinish S

TurboQuant: The Quantization Algorithm That Actually Proves Its Distortion Rate Is Near-Optimal

May 28, 2026 • 13 min read

Every quantization method claims minimal quality loss. TurboQuant (Google Research, ICLR 2026) is among the first to prove it: the distortion rate is within a constant factor of the information-theoretic lower bound. The proof comes with a two-stage algorithm that works online, requires zero per-vector quantization overhead, and directly addresses the KV cache memory bottleneck that limits long-context LLM inference.

Mohinish S

MiniMax M2.7: The Model That Ran Its Own RL Experiments and Got 30% Better Without a Human Touching the Code

May 27, 2026 • 13 min read

MiniMax M2.7 is not a model that was trained by engineers. It is a model that participated in training itself. An internal version of M2.7 ran over 100 autonomous rounds of scaffold optimization, evaluated its own outputs, decided which changes to keep, and achieved a 30% performance improvement on internal benchmarks. This is not a demo. It is the production pipeline that built the model you can use today.

Mohinish S

RelBench v2: Four New Databases and What They Reveal About Where Relational Deep Learning Breaks Down

May 26, 2026 • 9 min read

RelBench v2 does not just add more databases. It adds databases specifically chosen to stress-test relational deep learning in domains where the pkey-fkey graph hypothesis is hardest to satisfy: high-cardinality sparse interactions, long-tail distributions, and temporal dynamics that defeat simple neighborhood aggregation. The leaderboard results on the four new databases tell a more honest story than the headline benchmark numbers.

Mohinish S

RelBench v1: The Benchmark That Forced Honest Evaluation on Relational Deep Learning

May 25, 2026 • 9 min read

Every published result on relational database ML before RelBench was incomparable: different temporal splits, different leakage handling, different metrics. RelBench v1 fixed all three simultaneously by making correct temporal evaluation the default behavior, not the careful choice. The benchmark is the infrastructure. The databases are the test suite. The enforced defaults are the contribution.

Mohinish S

FST: The Dual-Engine Training Method That Reaches Peak Performance With Three Times Fewer Steps

May 24, 2026 • 12 min read

Reinforcement learning trains LLMs by updating parameters. Prompt optimization adapts LLMs by updating context. Everyone picks one. Fast-Slow Training (FST) runs both simultaneously, treating the prompt as fast weights that absorb task-specific information and the parameters as slow weights that preserve general reasoning, reaching higher performance in fewer steps while maintaining the model's ability to keep learning.

Mohinish S

R2Code: Why Your LLM Knows What Code to Write But Not Which Requirement It Satisfies

May 23, 2026 • 13 min read

LLM-generated code has a traceability problem. The model produces code that works, but cannot reliably tell you which requirement each function implements, which requirement has no code at all, and which code has no requirement to justify it. R2Code is the self-reflective framework that closes this gap with an iterative generate-verify-reflect loop and outperforms prior approaches on precision, recall, and F1 across standard benchmark datasets.

Mohinish S

Smolagents: The Agent Framework That Proves JSON Tool Calling Was the Wrong Abstraction All Along

May 22, 2026 • 11 min read

Every major AI framework ships agents that describe tool calls as JSON objects. Smolagents ships agents that write Python. This is not a superficial difference. Python is a better language for expressing actions than JSON, and the research agrees. Smolagents is the framework that takes this seriously, keeps the entire implementation to roughly 1,000 lines, and benchmarks the result.

Mohinish S

Kunlun: Why Meta's Ads Models Are Wasting 83% of Their GPU, and How They Fixed It

May 19, 2026 • 13 min read

Recommendation system models at Meta achieve 3-15% Model FLOPs Utilization on the same NVIDIA B200 GPUs where LLMs achieve 40-60%. This is not a scaling problem. It is an efficiency problem. Kunlun is the architecture that fixes it, raising MFU from 17% to 37%, doubling scaling efficiency, and establishing predictable power-law scaling for one of the most economically important ML workloads on the planet.

Mohinish S

RepForge: The Tool That Watches Claude Code Build Your App and Quizzes You on the CS You Missed

May 18, 2026 • 14 min read

Every developer using AI coding agents has the same silent problem: the code ships, the PR merges, and the developer has no idea why the implementation works. RepForge is the tool that sits beside Claude Code and Codex, extracts the computer science concepts from each session, and turns them into spaced repetition review challenges before you forget them. The learning happens. You just have to show up for the review.

Mohinish S

BountyBench: The First Cybersecurity Benchmark That Measures Dollar Impact, Not Just Success Rate

May 17, 2026 • 13 min read

Bug bounties pay real money for real vulnerabilities. BountyBench is the first cybersecurity AI benchmark that inherits this economic framing: every task has a dollar value attached, every success rate maps to a bounty total, and the headline result is not "67.5% exploit rate" but "$14,422 in defended patches." The unit of measurement is the correct one.

Mohinish S

CyberGym: The Benchmark Where AI Agents Try to Break Real Software, and Mostly Fail

May 16, 2026 • 12 min read

The best AI agent on CyberGym, a benchmark of 1,507 real-world vulnerabilities from production software, achieves a 22% success rate. That number has two implications: AI agents are already capable enough to reproduce one in five real vulnerabilities autonomously, and four in five vulnerabilities remain beyond their reach. CyberGym is the first benchmark large enough and realistic enough to make both implications defensible.

Mohinish S

LTX-2: The First Open-Weights Model That Generates Video and Audio in One Pass

May 15, 2026 • 13 min read

Every text-to-video model released before LTX-2 generates silent video. The audio you see in demos is added afterward by a separate model or manually. LTX-2 (Lightricks, arXiv:2601.03233, January 6, 2026) generates synchronized video and audio jointly in a single diffusion pass. The architecture required to make that work, an asymmetric dual-stream transformer with 14B video parameters and 5B audio parameters, is the story.

Mohinish S

© 2026 Snack On AI.