Sonnet-class
164K context · 143 tok/s
$0.60 in / $0.90 out per 1M tokens
[ MWS API ]
One API. Both Anthropic and OpenAI shapes. Drop-in: change your baseURLand your existing SDK works unchanged. Powered by frontier open-weight models — so we're still cheaper than the providers we replace.
Already using Anthropic or OpenAI? Add a baseURL override and an MWS API key. Your existing tool calls, streaming, and message shapes work unchanged.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
baseURL: "https://api.mws.run/v1",
apiKey: process.env.MWS_API_KEY,
});
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello" }],
});Two profiles are live today: Sonnet-class (default) — a drop-in for claude-sonnet-*model strings at much lower cost — and Pro (Think) for extended reasoning with a 1M-token context window. Pass any Claude or OpenAI model name and we'll route to the right one.
164K context · 143 tok/s
$0.60 in / $0.90 out per 1M tokens
1048K context · 178 tok/s
$3.48 in / $6.96 out per 1M tokens
Your rate limit grows automatically as your account spends and as you upgrade plans. There's no support ticket for higher throughput, and creating extra API keys doesn't increase your capacity — every key under your account draws from the same pool. Use additional keys to separate environments and rotate credentials, not to scale.
Every response carries the six standard x-ratelimit-* headers (compatible with OpenAI SDK retry logic), so your client can self-regulate without guessing or burning a request to discover the ceiling.
Pay-as-you-go. Buy credits in dollar amounts ($25 minimum). Credits are denominated in tokens at the cheapest profile's rate, so you always get at least the implied capacity. We charge via Stripe.
Sonnet-class (default): ~143 tok/s, sub-second TTFT in normal conditions. Pro (Think): ~178 tok/s with extended reasoning. Each profile's measured TPS is shown in the Models section above.
Yes — that's the whole point. We translate Anthropic's Messages format (system, content blocks, tool_use, tool_result) to the underlying OpenAI-shape upstream. Streaming, tool round-trips, and stop reasons round-trip correctly.
Currently not supported — the underlying providers don't expose an Anthropic-equivalent caching API. cache_control blocks are silently ignored. Even without caching the cost difference vs Sonnet is large enough that most workloads come out ahead.
Credits stay valid for 12 months from purchase. After that they may be cleaned up. We'll email you before that happens.
Unused credits within 30 days of purchase: yes. Consumed tokens: no — we've already paid the upstream for them. Email support@vellora.ai if you have a billing dispute.
The same gateway has been in production powering our IDE for months. Public API access is in early access — expect occasional incidents and the odd rough edge during the first weeks. Email support@vellora.ai if anything breaks.
No. Rate limits are per-account, not per-key, and they scale automatically based on your plan and cumulative spend. Extra keys are useful for separating environments (prod / staging / sandbox) and revoking access independently — they don't add capacity.