Question Intent Page · Updated 2026-05-11

What is the cheapest LLM API right now?

Short answer

For most text and coding workloads, start with DeepSeek or Qwen for low unit cost, SiliconFlow for China-hosted open models, and OpenRouter only when routing convenience beats the markup. Self-hosting is usually cheaper only at sustained high utilization.

cheapest LLM API · low cost AI API · DeepSeek API pricing · Qwen API pricing

Conclusion

  • Lowest practical paid path: DeepSeek or Qwen when their model quality is enough for the task.
  • Cheapest China-direct open-model path: SiliconFlow free/small hosted models.
  • Cheapest multi-provider testing: OpenRouter free or low-cost models, but expect aggregator markup.
  • Self-hosting wins only if GPU utilization is high and ops time is not your bottleneck.

What to do next

  1. Estimate monthly input and output tokens; output tokens usually dominate cost (a rough estimator sketch follows this list).
  2. Classify workload: chat, coding, summarization, agents, embeddings, or long context.
  3. Test two cheap models and one premium fallback on the same evaluation prompts.
  4. Use caching, shorter prompts, and smaller models before switching providers.
  5. Set a monthly budget alert and log cost per successful task, not only cost per token.
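
A rough way to turn step 1 into numbers is a back-of-the-envelope estimator. The prices, request volume, and token counts below are placeholders, not current list prices; plug in the per-million-token rates from each provider's pricing page.

```python
# Back-of-the-envelope monthly cost estimator.
# All prices and volumes below are PLACEHOLDERS, not real list prices --
# replace them with the current rates from each provider's pricing page.

def monthly_cost(requests_per_month, input_tokens, output_tokens,
                 price_in_per_m, price_out_per_m):
    """Return estimated USD per month for one model."""
    total_in = requests_per_month * input_tokens
    total_out = requests_per_month * output_tokens
    return (total_in / 1e6) * price_in_per_m + (total_out / 1e6) * price_out_per_m

# Hypothetical workload: 100k requests/month, 800 input and 400 output tokens each.
workload = dict(requests_per_month=100_000, input_tokens=800, output_tokens=400)

# Hypothetical price points for a "cheap" and a "premium" model.
print("cheap model:   $%.2f" % monthly_cost(**workload, price_in_per_m=0.20, price_out_per_m=0.60))
print("premium model: $%.2f" % monthly_cost(**workload, price_in_per_m=3.00, price_out_per_m=12.00))
```

Because output tokens are usually priced several times higher than input tokens, the output side of this estimate tends to dominate, which is why capping max output tokens (step 4) often moves the bill more than trimming prompts alone.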

Recommended paths

Provider    | Free / credits                           | Best for
DeepSeek    | $5 signup + low per-token pricing        | Coding, agents, general text
Qwen        | 70M signup tokens                        | Chinese, coding, long context
SiliconFlow | Free small models + ¥14 credit           | China-direct open models
OpenRouter  | Free models + many paid routes           | Model shopping and fallback routing
Groq        | Free developer limits                    | Low-latency open models

Global developer checklist

  • Confirm whether signup, billing, and API keys work from your country before writing production code.
  • Prefer OpenAI-compatible endpoints when you may need to switch models, regions, or providers later.
  • Test free credits with a real smoke prompt and record latency, error shape, streaming behavior, and quota burn (see the smoke-test sketch after this list).
  • Keep at least one fallback route for provider outages, model deprecations, and regional access changes.
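
A minimal smoke test along the lines of this checklist, assuming an OpenAI-compatible endpoint and the official openai Python SDK. The base URL, model name, and API-key environment variable are placeholders; swap in the values from the provider you are evaluating.

```python
# Minimal smoke test for an OpenAI-compatible endpoint.
# BASE_URL, MODEL, and PROVIDER_API_KEY are placeholders for whichever
# provider you are evaluating; take the real values from its docs.
import os
import time
from openai import OpenAI

BASE_URL = "https://api.example-provider.com/v1"   # placeholder
MODEL = "example-small-model"                       # placeholder

client = OpenAI(base_url=BASE_URL, api_key=os.environ["PROVIDER_API_KEY"])

start = time.time()
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    max_tokens=16,
    stream=True,
)

first_token_at = None
chunks = []
for chunk in stream:
    if not chunk.choices:          # some providers send bookkeeping chunks
        continue
    if first_token_at is None:
        first_token_at = time.time()
    chunks.append(chunk.choices[0].delta.content or "")

print("time to first token: %.2fs" % (first_token_at - start))
print("total latency:       %.2fs" % (time.time() - start))
print("response:            %r" % "".join(chunks))
```

Run the same script against each candidate and keep the numbers alongside any error bodies you hit; error shape and quota behavior differ between providers far more than the happy path does.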

Production handoff

Need one bill and one compatible endpoint?

Use OpenLLMAPI when the engineering cost of juggling provider keys is higher than a small routing layer.

Compare with OpenLLMAPI →

FAQ

Is the cheapest model always the best choice?

No. Measure cost per successful task. A model that needs retries, longer prompts, or human correction can be more expensive than a stronger model.
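
A minimal way to put numbers on "cost per successful task", using purely hypothetical per-call costs and success rates:

```python
# Hypothetical illustration: the cheaper model loses once retries are counted.
def cost_per_success(cost_per_call, success_rate):
    # Expected calls until success is 1 / success_rate (retry until it works).
    return cost_per_call / success_rate

print(cost_per_success(0.002, 0.50))  # cheap model:     $0.0040 per successful task
print(cost_per_success(0.003, 0.95))  # stronger model: ~$0.0032 per successful task
```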

When is local LLM cheaper than an API?

Usually only when you keep the GPU busy for many hours per day or already own the hardware. For sporadic workloads, hosted APIs are usually cheaper.
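
A rough break-even sketch for that question; the GPU rental rate, throughput, and blended API price below are illustrative assumptions, not quotes.

```python
# Illustrative break-even sketch: rented GPU vs. hosted API.
# All numbers are ASSUMPTIONS, included only to show the shape of the calculation.
gpu_cost_per_hour = 2.00        # assumed GPU rental, USD/hour
tokens_per_second = 2_000       # assumed sustained throughput of your deployment
api_price_per_m_tokens = 0.50   # assumed blended API price, USD per 1M tokens

# Cost per 1M tokens if the GPU is busy 100% of the time.
self_host_cost_per_m = gpu_cost_per_hour / (tokens_per_second * 3600 / 1e6)
print("self-host cost per 1M tokens at full utilization: $%.2f" % self_host_cost_per_m)

# Minimum utilization at which self-hosting matches the API price.
break_even_utilization = self_host_cost_per_m / api_price_per_m_tokens
print("break-even utilization: %.0f%%" % (break_even_utilization * 100))
```

This ignores ops time, redundancy, and idle periods, all of which push the real break-even point higher.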

How do I reduce LLM API cost without changing providers?

Trim prompts, cache repeated context, use batch jobs where available, route easy tasks to small models, and cap max output tokens.
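
A sketch of three of those levers together: a naive cache for repeated prompts, routing easy tasks to a smaller model, and a hard cap on output tokens, against an OpenAI-compatible endpoint. The model names, the is_easy heuristic, and the environment variable are placeholders to adapt.

```python
# Sketch: cache repeats, route easy tasks to a small model, cap output tokens.
# Model names, the routing heuristic, and PROVIDER_API_KEY are placeholders.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["PROVIDER_API_KEY"])

SMALL_MODEL = "example-small-model"   # placeholder
LARGE_MODEL = "example-large-model"   # placeholder

_cache = {}  # naive exact-match cache for repeated prompts

def is_easy(prompt: str) -> bool:
    # Placeholder heuristic: short prompts go to the small model.
    return len(prompt) < 500

def cheap_complete(prompt: str, max_output_tokens: int = 256) -> str:
    if prompt in _cache:                      # avoid paying twice for repeats
        return _cache[prompt]
    model = SMALL_MODEL if is_easy(prompt) else LARGE_MODEL
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_output_tokens,         # hard cap on billable output
    )
    answer = resp.choices[0].message.content
    _cache[prompt] = answer
    return answer
```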

Should I use an aggregator for cheapest pricing?

Aggregators are great for testing and fallback, but often add markup. For stable high-volume traffic, compare direct provider pricing.

🎁 Free Resource Pack

Get the Free AI Startup Toolkit

Free API credits list, AI business case studies, payment stack, risk checklist, and a monetization roadmap.

Get it free →