vLLM Semantic Router: The Infrastructure Layer That Decides Which Model Should Handle Your Request Before the Model Sees It
The hard problem in multi-model LLM deployments is not having good models. It is routing every request to the right model, at inference time, under simultaneous constraints on cost, privacy, latency, and safety, without building a custom decision system for each deployment scenario. vLLM Semantic Router (arXiv:2603.04444, vllm-project/semantic-router, 4.3k stars) solves this with composable signal orchestration: it extracts heterogeneous signals from the request, composes them through Boolean rules into deployment-specific decisions, and executes each decision through a plugin chain. The same architecture expresses a cost-optimized deployment and a privacy-regulated enterprise deployment as two different signal-decision configurations, with no code changes between them.
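To make the extract, compose, execute pattern concrete, here is a minimal Python sketch. It is not the project's API or configuration format. Every name in it (`extract_signals`, `Decision`, `route`, the signal names, the model names, and the plugin labels) is hypothetical, and the keyword checks stand in for whatever classifiers a real deployment would use. The point it illustrates is the last claim above: two deployments differ only in the list of decisions passed to the same routing function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Signals = Dict[str, bool]
Rule = Callable[[Signals], bool]
Plugin = Callable[[dict], dict]


def extract_signals(text: str) -> Signals:
    """Toy signal extraction (assumption: keyword checks replace real classifiers)."""
    lowered = text.lower()
    return {
        "has_pii": "ssn" in lowered or "passport" in lowered,
        "is_code": "def " in text or "traceback" in lowered,
        "is_long": len(text.split()) > 200,
    }


# Boolean combinators over named signals.
def sig(name: str) -> Rule:
    return lambda s: s[name]

def all_of(*rules: Rule) -> Rule:
    return lambda s: all(r(s) for r in rules)

def any_of(*rules: Rule) -> Rule:
    return lambda s: any(r(s) for r in rules)

def not_(rule: Rule) -> Rule:
    return lambda s: not rule(s)


def tag(label: str) -> Plugin:
    """Hypothetical plugin: records that a processing step ran."""
    def plugin(request: dict) -> dict:
        return {**request, "applied": request["applied"] + [label]}
    return plugin


@dataclass
class Decision:
    name: str
    rule: Rule
    model: str
    plugins: List[Plugin] = field(default_factory=list)


def route(text: str, decisions: List[Decision], default_model: str) -> dict:
    """Return the request annotated by the first decision whose rule matches."""
    signals = extract_signals(text)
    request = {"text": text, "model": default_model,
               "decision": "default", "applied": []}
    for d in decisions:
        if d.rule(signals):
            request = {**request, "model": d.model, "decision": d.name}
            for plugin in d.plugins:  # run the plugin chain in order
                request = plugin(request)
            return request
    return request


# Two deployments expressed purely as configuration (all values illustrative).
COST_OPTIMIZED = [
    Decision("hard_query", any_of(sig("is_code"), sig("is_long")),
             "large-model", [tag("semantic_cache")]),
]
PRIVACY_REGULATED = [
    Decision("pii", sig("has_pii"),
             "on-prem-model", [tag("pii_redaction"), tag("audit_log")]),
    Decision("code", all_of(sig("is_code"), not_(sig("has_pii"))),
             "large-model"),
]

if __name__ == "__main__":
    prompt = "My SSN is 123-45-6789, can you fill in this form?"
    print(route(prompt, COST_OPTIMIZED, "small-model")["model"])
    print(route(prompt, PRIVACY_REGULATED, "small-model")["model"])
```

In this sketch the same prompt falls through to the default model under the cost-optimized list and is sent to the on-premises model, with redaction and audit plugins applied, under the privacy-regulated list. First-match-wins ordering is a simplifying assumption made here; a real router may resolve competing decisions differently.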