Introducing Superserve
Today we're excited to announce Superserve, a production hosting platform for AI agents.
We started as RayAI, focused on compute orchestration for agent workloads using Ray. Over the past couple of months, we learned a few important lessons.
- Multi-agent systems break on coordination, not inference. Past about five agents, the dominant cost is agents restating and reconciling each other's outputs. The pattern that consistently wins is a stateful orchestrator spawning stateless workers (subagents or tasks); see the sketch after this list.
- Code-executing agents are becoming the foundation for domain-specific AI applications: customer support, financial analysis, browser automation, autonomous coding. Anthropic has open-sourced quickstart templates for these use cases.
- The Claude Agent SDK gives agents built-in tools for files, commands, and code editing. Building the agent is now the easy part. Deploying it in production is not.
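To make the first lesson concrete, here is a minimal sketch of that orchestrator-plus-workers shape in plain Python. Everything in it is illustrative: the task decomposition, `run_worker`, and the artifact format are assumptions for this post, not a Superserve or SDK API.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: names and structures are assumptions,
# not a Superserve or Claude Agent SDK API.

@dataclass
class Task:
    description: str

@dataclass
class Artifact:
    task: Task
    output: str

def run_worker(task: Task) -> Artifact:
    # A stateless worker: receives one task, produces one artifact,
    # and terminates. It never sees another worker's context, so there
    # is nothing to restate or reconcile.
    return Artifact(task=task, output=f"result for: {task.description}")

@dataclass
class Orchestrator:
    # The only stateful component: owns the plan, collects artifacts,
    # and decides what to spawn next.
    artifacts: list = field(default_factory=list)

    def run(self, goal: str) -> list:
        tasks = [Task(f"{goal}: step {i}") for i in range(3)]  # decomposition stub
        for task in tasks:
            self.artifacts.append(run_worker(task))
        return self.artifacts
```

The point is the shape: state lives in exactly one place, and coordination cost stops scaling with the number of workers.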
These lessons redefined our focus. Around the same time, the Linux Foundation reached out about the Ray trademark, so we picked a new name. The old one described our infrastructure; the new one describes what we do: serve agents in production. We're now Superserve.
The state of agents in production
Building a capable agent is no longer the hard part. The Claude Agent SDK, OpenAI Agents SDK, and LangGraph give you tool use, reasoning, and autonomy out of the box. An afternoon of work produces an agent that researches topics, analyzes documents, or automates a workflow.
Deploying that agent is the hard part.
Traditional compute was built for stateless, short-lived, deterministic workloads: ship a container, route a request, return a response. The entire stack (orchestration, scheduling, scaling) optimizes for running the same code the same way every time.
Agents don't work like that, and the infrastructure they need doesn't exist yet. They run for minutes or hours, not milliseconds. They make unpredictable numbers of LLM calls that can silently drain budgets. They need filesystems to read from and write to, isolated per user. They need to be restarted without losing progress, scaled without sharing state, and debugged without reconstructing causal history across multiple processes.
We've seen what happens when teams force agents into traditional infrastructure. A fraud detection pipeline with four agents looked clean in staging: median latency under 420ms, alerts made sense, ops was happy. In production, agents started sharing context to "improve accuracy." Within weeks, nearly half the token spend went to agents summarizing and reconciling outputs for each other. The p99 blew past the SLA. GPU utilization was fine. The bottleneck was everything between the agents.
The orchestrator-plus-workers pattern from lesson one needs infrastructure purpose-built for coordination. Code-executing agents from lesson two need sandboxing so one customer's data doesn't leak to another. The built-in tools from the Claude Agent SDK need an isolated filesystem per session. Teams building agent products today spend months solving these problems before they can focus on what their agent actually does.
Superserve provides this infrastructure as a platform.
What Superserve does
Isolated execution. Each user interaction runs in its own sandbox with a dedicated filesystem. Agents can read, write, and execute code freely without affecting other users. We use Firecracker microVMs for hardware-level isolation between sessions.
Cost controls. Agents can be expensive. Superserve tracks token usage in real time and enforces budgets you set. Sessions that exceed their limit terminate gracefully rather than silently draining your API account.
Streaming. Agents produce output incrementally - reasoning steps, tool calls, intermediate results, files. Superserve streams these to your application as they happen, so you can build interfaces that show users what the agent is doing.
Batch processing. Run an agent across hundreds or thousands of inputs with managed concurrency, automatic retries, and result collection; a sketch follows this list.
Tool hosting. Deploy your own MCP servers alongside your agents. Connect agents to your databases, internal APIs, or custom tools - Superserve handles the routing and scaling.
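Taking batch processing as an example, here is a sketch of what this could look like from application code, assuming a hypothetical `superserve` Python client. The package name, `batch.run`, and its parameters are illustrative, not a published API.

```python
# Hypothetical client sketch: the superserve package, Client class, and
# batch.run signature are illustrative assumptions, not a published API.
from superserve import Client

client = Client(api_key="...")

# Fan one agent out over many inputs with managed concurrency and retries.
job = client.batch.run(
    agent="contract-reviewer",      # a deployed agent (illustrative name)
    inputs=[{"document": p} for p in ["a.pdf", "b.pdf", "c.pdf"]],
    max_concurrency=50,             # cap on simultaneous sandboxes
    max_retries=2,                  # per-input retry budget
    budget_per_input_usd=0.50,      # hard token-spend ceiling per session
)

for result in job.results():        # blocks until each input completes
    print(result.input, result.status, result.output_path)
```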
Who this is for
Superserve is built for teams turning agents into products.
Consider the use cases that Anthropic's Cowork enables for individuals: sales research and deal preparation, financial analysis and modeling, legal document review, customer support triage, marketing content generation, data analysis and visualization. These are valuable for individual productivity. They're transformative when built into products that serve thousands of customers.
Our early customers are building exactly these products. Legal tech companies offering contract review for law firms. Fintech startups providing automated financial analysis. Support platforms with AI-powered ticket resolution. Sales tools that research prospects and prepare briefings.
These teams face the same infrastructure challenges. They need to run agents for many concurrent users with strong isolation. They need to control costs per customer. They need visibility into what's happening across thousands of sessions. They need to scale without rebuilding their deployment architecture.
Superserve is the platform where these products run.
Built on Ray
Superserve runs on Ray. The API layer uses Ray Serve for autoscaling and request routing. Batch workloads fan out as Ray tasks with managed concurrency. Tool servers deploy as independent Ray Serve applications that scale based on demand.
We chose this architecture because the pattern that works at scale is a control plane that owns task decomposition, retries, and validation, with stateless workers that execute once, emit an artifact, and terminate. Ray gives us the scheduling, deployment graphs, and resource-aware placement for that orchestration layer. But agents themselves need hard isolation with dedicated filesystems, which Ray isn't designed to provide. We use Firecracker sandboxes for that layer.
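That division of labor maps directly onto stock Ray primitives. Here is a minimal sketch of the pattern using only standard Ray APIs; the agent step itself is a placeholder (in Superserve, the real work happens inside a Firecracker sandbox).

```python
import ray

ray.init()

@ray.remote(max_retries=2)  # Ray re-runs the task if its worker process dies
def run_step(payload: dict) -> dict:
    # Stateless worker: execute once, emit an artifact, terminate.
    # Placeholder logic; the real step runs inside a sandbox.
    return {"id": payload["id"], "artifact": f"processed {payload['id']}"}

@ray.remote
class ControlPlane:
    """Stateful orchestrator: owns decomposition, validation, and
    application-level retries beyond Ray's task-level ones."""

    def run(self, n: int) -> list:
        futures = [run_step.remote({"id": i}) for i in range(n)]
        return ray.get(futures)  # fan out, then collect artifacts

plane = ControlPlane.remote()
print(ray.get(plane.run.remote(4)))
```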
As workloads grow, this architecture scales where it matters. More users means more sandbox capacity. Heavy tool usage scales the relevant servers. GPU-intensive tools get Ray's batching and placement optimization. The deployment handles this automatically.
Getting started
Superserve currently supports agents built with the Claude Agent SDK; support for the OpenAI Agents SDK is coming.
The developer experience is straightforward: point Superserve at your agent code, deploy, and you get an endpoint with sandboxed execution, cost tracking, and streaming built in.
```bash
superserve deploy my-agent
```
Your application creates sessions through the API or SDK, sends input, and streams output to your users. Superserve handles the infrastructure underneath.
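In application code, that flow might look something like this. It assumes the same hypothetical Python client as above; the names and event shapes are assumptions until the SDK ships.

```python
# Hypothetical client sketch: Client, sessions.create, and the event
# stream shape are assumptions, not a published Superserve API.
from superserve import Client

client = Client(api_key="...")

# Each session gets its own sandboxed filesystem and a spend ceiling.
session = client.sessions.create(agent="my-agent", budget_usd=2.00)
session.send("Summarize the attached quarterly report.")

# Stream incremental output: reasoning steps, tool calls, files.
for event in session.stream():
    if event.type == "text":
        print(event.text, end="", flush=True)
    elif event.type == "tool_call":
        print(f"\n[tool] {event.name}")
```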
We're working on open sourcing the CLI and SDK. For now, we're opening signups for early access. If you're building agent products and want to spend your time on the agent rather than the deployment infrastructure, we'd like to work with you.
Sign up for early access at superserve.ai
We're especially interested in teams already running agents in production. What infrastructure problems are you solving yourself? What's holding you back from shipping faster? Reach out at hello@superserve.ai - we're building this with early customers, and your input shapes the platform.