💎 Transparent Pricing

Pay only for what you use.
Route smarter, spend less.

No hidden fees. No vendor lock-in. NexToken routes your requests to the optimal provider — automatically.

Monthly
Annual Save 20%
−96%
nex-pro vs GPT-4o
−40%
vs OpenAI Embeddings
+20%
Flat routing margin (3rd-party)
12+
Providers on one API key
Developer
$0
Free forever. Start building immediately.
Start Free
$5 one-time credit · no card needed
100 RPM rate limit
3 API keys
Standard routing
REST API access
Custom routing rules
SLA guarantee
Priority support
Pro
$29/mo
For indie developers and growing projects.
Start Pro
$10 monthly credit included
200 RPM rate limit
20 API keys
Smart routing + nex-auto
Prompt-cache savings auto-applied
Streaming + batch + vision
Usage analytics
SLA guarantee
Enterprise
Custom
Negotiated pricing for large-scale deployments.
Contact Sales
Unlimited credits (prepaid)
Custom RPM limits
Dedicated routing cluster
Volume discounts negotiable
99.99% uptime SLA (contracted, with credits)
Dedicated Slack channel
Custom invoicing / PO
Reserved throughput (custom RPM floor)
Fine-tune endpoint + DPA
On-premise option
Platform fee + pass-through model costs. Subscription fees cover your NexToken platform access, rate limits, and features. Model usage is billed separately at cost-plus-markup rates shown in the table below. All prices in USD. Singapore users are charged 9% GST on platform fees per IRAS regulations.
🚀 NEW · MAY 2026

23 upgrades shipped — same API, lower bills.

Every chat completion now passes upstream prompt-cache savings through automatically. Near-duplicate prompts bill at 5% of normal retail via our self-hosted semantic cache. Switch model: "nex-auto" and let the gateway pick the right tier per prompt. No client code changes required.

Cost
Prompt-cache pass-through · semantic cache 95% off · nex-auto smart router · batch endpoint 30% off · tokenize + estimate-cost
Compliance
Content moderation · PII redaction (PDPA / GDPR) · prompt-injection defence · context-window pre-flight
Reliability
Redis-backed circuit breaker · in-provider retry + backoff · region-aware routing · multi-key load balance · live P95 scoring
Reach
DALL-E 3 · Whisper · TTS · vision token math · Cohere · Perplexity · xAI Grok · prompt templates · fine-tune API
All additive. Zero breaking changes. View API docs →

NexToken vs. Official Provider Prices

Transparent breakdown of what you pay via NexToken versus going direct. NexToken native models beat the market — third-party access adds a flat 20% routing margin that covers infrastructure, compliance, and reliability.

⭐ NexToken Native

Market-beating prices on nex-series models

−96%
nex-pro vs GPT-4o
$0.12 vs $2.50 per 1M input tokens
−20%
nex-pro vs GPT-4o-mini
$0.12 vs $0.15 per 1M input tokens
−40%
nex-embed-zh vs text-embedding-3-small
$0.012 vs $0.020 per 1M tokens
Self-hosted Singapore GPU · OpenAI SDK drop-in · Strong Chinese & English · PDPA compliant · Zero external API dependency
⏳ 采购谈判中 · Procurement In Progress

Target: below-market prices + routing

Third-party model prices currently reflect pre-contract spot rates — no bulk procurement discounts are yet applied. Once wholesale contracts are finalised:
−10%+
NexToken target: below official prices
Buy at <80% of market → sell at <96% of market (still includes 20% routing margin)
  • Procurement at <80% of official rates currently under negotiation
  • Routing, fallback, APAC edge, PDPA compliance included in margin
  • Prompt-cache savings from providers passed through automatically
  • Unified billing — one wallet, one invoice for all providers
nex-pro & nex-embed-zh already beat the market today — self-hosted Singapore GPU, no procurement deal needed.
Model NexToken (You Pay) input / output per 1M Official Direct vs. Direct
⭐ NexToken Native Models
nex-pro 32k ctx $0.12 / $0.48 vs GPT-4o-mini: $0.15 / $0.60OpenAI.com −20% input
nex-embed-zh 512 ctx $0.012 / — text-embedding-3-small: $0.020OpenAI.com −40%
OpenAI — via NexToken vs. platform.openai.com
gpt-4o $3.00 / $12.00 $2.50 / $10.00platform.openai.com Trial Rate
gpt-4o-mini $0.18 / $0.72 $0.15 / $0.60platform.openai.com Trial Rate
gpt-4.1 $2.40 / $9.60 $2.00 / $8.00platform.openai.com Trial Rate
o3-mini / o4-mini $1.32 / $5.28 $1.10 / $4.40platform.openai.com Trial Rate
Anthropic — via NexToken vs. anthropic.com/pricing
claude-sonnet-4-6 $3.60 / $18.00 $3.00 / $15.00anthropic.com/pricing Trial Rate
claude-haiku-4-5 $0.96 / $4.80 $0.80 / $4.00anthropic.com/pricing Trial Rate
Google DeepMind — via NexToken vs. ai.google.dev
gemini-2.5-pro $1.50 / $12.00 $1.25 / $10.00ai.google.dev Trial Rate
gemini-2.5-flash $0.18 / $4.20 $0.15 / $3.50ai.google.dev Trial Rate
DeepSeek — via NexToken vs. platform.deepseek.com
deepseek-v3 $0.324 / $1.32 $0.27 / $1.10platform.deepseek.com Trial Rate
deepseek-r1 $0.66 / $2.64 $0.55 / $2.20platform.deepseek.com Trial Rate
Official prices are provider list prices as of June 2026. Third-party model prices shown are current trial rates (pre-contract spot pricing). NexToken is actively negotiating wholesale procurement contracts targeting <80% of official rates — upon finalisation, NexToken prices will be below official market prices. NexToken native models (nex-pro, nex-embed-zh) already beat market rates. Pro (−1%), Business (−2.5%), and Enterprise (up to −15%) tiers reduce the effective markup further.

Model Token Pricing

Prices shown are what you pay at Developer tier. Prices per 1,000,000 tokens (1M). Input / Output billed separately.

🔄 试用价格
价格谈判中,敬请期待更大折扣 · Price negotiation in progress 第三方模型当前价格为现货试用费率,尚未包含批量采购折扣。NexToken 正积极与各大提供商进行批量采购协议谈判,目标采购价格低于官网公开价格的 80%。协议生效后,NexToken 路由价格将低于官网直接购买价格,同时涵盖路由、故障切换、APAC 优化及合规保障。
Third-party models currently reflect pre-contract spot rates. Procurement contracts targeting <80% of market rates are under active negotiation — final prices will undercut direct provider pricing.
✓ nex-pro 和 nex-embed-zh 已低于市场价格 — 新加坡 GPU 自托管,无需等待采购协议生效。
NexToken Native Models Singapore GPU
Built for Asia-Pacific developers. Self-hosted on NexToken's Singapore GPU. OpenAI SDK drop-in, PDPA compliant, APAC-optimised latency. Single stable API endpoint regardless of upstream changes.
ModelNexTokenin/out per 1MComparable Providerin/out per 1MBest ForCapabilities
nex-pro32k ★ Default $0.12 / $0.48 vs GPT-4o-mini$0.15 / $0.60 The default Nex model. Chat, code, content, summarisation. Self-hosted Singapore GPU. Strong Chinese + English. ~96% cheaper than GPT-4o stream tools
nex-embed-zh512 $0.012 vs text-embedding-3-small$0.020 Chinese-strong embeddings, 1024-dim. BGE-M3, self-hosted Singapore. ~40% cheaper than text-embedding-3-small /v1/embeddings
Quick pick
💬 Chat, code, anything → nex-pro
🇨🇳 Chinese embeddings → nex-embed-zh
Aliases nex-smart and nex-coder transparently route to nex-pro — no migration needed.
OpenAI
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
gpt-4o128k$3.00 / $12.00$2.50 / $10.00128Kstream vision tools
gpt-4o-mini128k$0.18 / $0.72$0.15 / $0.60128Kstream vision tools
gpt-4.11M$2.40 / $9.60$2.00 / $8.001Mstream vision tools
gpt-4.1-mini1M$0.48 / $1.92$0.40 / $1.601Mstream tools
gpt-4.1-nano1M$0.12 / $0.48$0.10 / $0.401Mstream tools
o3200k$12.00 / $48.00$10.00 / $40.00200Kstream
o3-mini200k$1.32 / $5.28$1.10 / $4.40200Kstream
o4-mini200k$1.32 / $5.28$1.10 / $4.40200Kstream
Anthropic
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
claude-opus-4-6200k$18.00 / $90.00$15.00 / $75.00200Kstream vision tools
claude-sonnet-4-6200k$3.60 / $18.00$3.00 / $15.00200Kstream vision tools
claude-haiku-4-5-20251001200k$0.96 / $4.80$0.80 / $4.00200Kstream vision tools
Google DeepMind
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
gemini-2.5-pro1M$1.50 / $12.00$1.25 / $10.001Mstream vision tools
gemini-2.5-flash1M$0.18 / $4.20$0.15 / $3.501Mstream vision tools
DeepSeek
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
deepseek-v3128k$0.324 / $1.32$0.27 / $1.10128Kstream tools
deepseek-r1128k$0.66 / $2.64$0.55 / $2.20128Kstream
Meta Llama
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
llama-4-scout512k$0.18 / $0.72$0.15 / $0.60512Kstream tools
llama-4-maverick256k$0.60 / $2.40$0.50 / $2.00256Kstream tools
llama-3.3-70b131k$0.708 / $0.948$0.59 / $0.79131Kstream tools
Mistral AI
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
mistral-large-latest128k$2.40 / $7.20$2.00 / $6.00128Kstream tools
mistral-medium-latest128k$0.96 / $2.88$0.80 / $2.40128Kstream tools
mistral-small-latest128k$0.24 / $0.72$0.20 / $0.60128Kstream tools
codestral-latest256k$0.36 / $1.08$0.30 / $0.90256Kstream
mixtral-8x7b-3276832k$0.288 / $0.288$0.24 / $0.2432Kstream
Qwen / Alibaba
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
qwen-3.5-397b131k$1.08 / $1.08$0.90 / $0.90131Kstream
qwen-max32k$1.92 / $7.68$1.60 / $6.4032Kstream tools
qwen-plus128k$0.48 / $1.44$0.40 / $1.20128Kstream tools
qwen-turbo128k$0.24 / $0.72$0.20 / $0.60128Kstream tools
qwen-long10M$0.60 / $2.40$0.50 / $2.0010Mstream
ZhiPu GLM
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
glm-4-plus128k$0.84 / $0.84$0.70 / $0.70128Kstream tools
glm-4128k$0.84 / $0.84$0.70 / $0.70128Kstream
glm-4-air128k$0.168 / $0.168$0.14 / $0.14128Kstream
glm-4-flash128k$0.084 / $0.084$0.07 / $0.07128Kstream
Cohere
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
command-r-plus128k$3.00 / $12.00$2.50 / $10.00128Kstream tools
command-r128k$0.18 / $0.72$0.15 / $0.60128Kstream tools
Perplexity
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
sonar128k$0.12 / $0.12$0.10 / $0.10128Kstream
sonar-pro200k$3.60 / $18.00$3.00 / $15.00200Kstream
xAI Grok
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
grok-3131k$3.60 / $18.00$3.00 / $15.00131Kstream tools
grok-3-mini131k$0.36 / $0.60$0.30 / $0.50131Kstream tools
Google Gemma (via Groq)
ModelNexTokenTrial · in/out per 1MOfficial Directin/out per 1MContextCapabilities
gemma2-9b-it8k$0.24 / $0.24$0.20 / $0.208Kstream
Pricing note: All prices are retail at Developer tier (wholesale × 1.20). Pro, Business, and Enterprise tiers receive volume discounts of 1%, 2.5%, and up to 15% — see Volume Billing Tiers below. Third-party model prices are current trial rates pending procurement contracts. Prompt-cache discounts from OpenAI, Anthropic, DeepSeek, and Google pass through to your wallet automatically.

Volume Billing Tiers

The higher your monthly token spend, the lower your effective markup. Tiers reset on the 1st of each month.

TierMonthly Token SpendMarkup RateEffective SavingUnlocks
Developer$0 – $500Standard3 keys, 20 RPM
Pro$500 – $5,000−1%Up to $50/mo200 RPM, analytics
Business$5,000 – $50,000−2.5%Up to $1,250/moCustom routing, SLA
Enterprise$50,000+NegotiatedUp to 15%+Dedicated cluster, custom terms

* Billing tiers are independent from Loyalty tiers. Billing tiers reflect monthly spend volume; Loyalty tiers reflect cumulative top-up history.

Loyalty Tiers

Cumulative top-up milestones unlock permanent wallet bonuses. Tiers do not reset — they track your total lifetime top-up.

🥉
Bronze
≥ $500 cumulative top-up
Bonus credits on every top-up
+3%
🥈
Silver
≥ $2,000 cumulative top-up
Bonus credits + priority routing queue
+5%
🥇
Gold
≥ $10,000 cumulative top-up
Bonus credits + dedicated routing + SLA
+8%
💎
Platinum
≥ $50,000 cumulative top-up
Maximum bonus + enterprise SLA + custom terms
+12%

Loyalty bonus credits are applied to your wallet at time of top-up. Example: Gold user tops up $1,000 → receives $1,080 wallet balance (+8% bonus). Bonuses do not stack with promotional codes.

Add-Ons

Optional extras available on Pro and Business plans.

🔒
Extended Audit Logs
$19 / month
Retain full request/response logs for 365 days. Export to S3 or GCS. Required for SOC 2 audits.
📊
Advanced Analytics
$29 / month
Cost attribution by team, project, or custom labels. CSV export, Grafana-compatible metrics endpoint.
🚨
Spend Alerts & PagerDuty
$9 / month
Multi-channel alerts (Slack, email, SMS, PagerDuty) with configurable thresholds and escalation policies.
🌐
Dedicated IP Egress
$49 / month
Route all traffic through a static IP pool for firewall whitelisting. Required for some financial and healthcare orgs.
🤝
Shared Slack Support
$99 / month
Join a shared Slack Connect channel with the Nex engineering team. <4h response time, Mon–Fri 9–6 SGT.
Higher Rate Limits
From $49 / month
Burst capacity packs: 5k, 20k, or 50k additional RPM. Stacks on top of your base plan limit.

Full Feature Comparison

Detailed breakdown of what's included in each plan.

FeatureDeveloperPro BusinessEnterprise
Limits & Access
Monthly free credits$5 one-time$10 incl.Custom
Requests per minute (RPM)20200Custom
API keys320Unlimited
Sub-keys per key3Unlimited
Context window supportUp to 128kUp to 200kUp to 1M+
Routing & Intelligence
Smart auto-routing
Provider fallback
Custom routing rules
Dedicated routing cluster
Streaming (SSE)
Budget & Controls
Per-key budget limits
Auto top-up
Spend alertsEmail only
Multi-wallet (teams)
Observability
Request logs retention7 days30 days365 days
Usage analytics dashboardBasicStandardAdvanced + export
Cost attribution labels
SLA & Support
Uptime target / SLA99.99% SLA (contracted)
Support channelCommunityEmailDedicated Slack
Response timeBest effort<48h<2h
Custom invoicing / PO
🇸🇬

Singapore GST Notice (9%)

Cete Ventures Pte. Ltd. (UEN: 202421160G) is GST-registered in Singapore. Platform subscription fees are subject to 9% GST for Singapore-based customers. Model usage credits for SG GST-registered businesses with a valid GST number may qualify for input tax claim — contact us to provide your registration number. Non-SG customers are invoiced without GST. A valid GST invoice is issued for every transaction.

Frequently Asked Questions

Do I pay provider API costs separately?
No. NexToken handles all provider API relationships on your behalf. You top up your NexToken wallet and we pay the providers. Your wallet is debited at our cost-plus-markup rates shown in the pricing table above. You never need to sign up with OpenAI, Anthropic, or Google directly.
What happens if my wallet balance hits zero?
API calls will return HTTP 402 (Payment Required) immediately. No debt is accumulated — NexToken operates on a strict prepaid model. You can enable auto top-up on Pro and Business plans to avoid interruptions. Existing in-flight streaming requests will complete (up to 120 seconds) before being terminated.
How does smart routing decide which provider to use?
NexToken's routing engine evaluates real-time provider health scores, current latency, your requested model, and your configured routing preferences. On Business plans, you can pin specific providers per API key or set custom fallback chains. The router runs sub-5ms decisions before proxying your request.
Can I get a refund on unused wallet credits?
Yes. Unused top-up credits (excluding free promotional credits) are refundable within 30 days of the top-up transaction. Refunds are processed back to your original payment method within 5–10 business days. Loyalty bonus credits are non-refundable. See our full Refund & Cancellation Policy.
What's the difference between Billing Tiers and Loyalty Tiers?
Billing Tiers reflect your monthly token spend volume and reduce your per-token markup — they reset on the 1st of each month. Loyalty Tiers reflect your cumulative total top-up history and add bonus credits to your wallet when you top up — they never reset. The two systems are completely independent.
Is there a free trial for paid plans?
Every new account gets $5 in one-time free credits — no credit card required. Simply sign up and start making API calls immediately. Pro and Business plans offer a 14-day money-back guarantee on the subscription fee. Enterprise plans are negotiated individually.
Do prices include streaming responses?
Yes. Streaming (SSE) is supported at no additional cost. Token pricing is identical for streaming and non-streaming requests. Token counts are calculated using the provider's reported usage field where available; otherwise NexToken uses tiktoken-based estimation.

Start routing smarter today

Join hundreds of developers and teams using NexToken to reduce LLM costs and improve reliability.