## Conclusion
- Lowest practical paid path: DeepSeek or Qwen when their model quality is enough for the task.
- Cheapest China-direct open-model path: SiliconFlow free/small hosted models.
- Cheapest multi-provider testing: OpenRouter free or low-cost models, but expect aggregator markup.
- Self-hosting wins only if GPU utilization is high and ops time is not your bottleneck.
## What to do next
- Estimate monthly input and output tokens; output tokens usually dominate cost (see the sketch after this list).
- Classify workload: chat, coding, summarization, agents, embeddings, or long context.
- Test two cheap models and one premium fallback on the same evaluation prompts.
- Use caching, shorter prompts, and smaller models before switching providers.
- Set a monthly budget alert and log cost per successful task, not only cost per token.
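As a quick aid, here is a minimal Python sketch of that estimate; every number in it is a placeholder assumption, not a quoted provider price.

```python
# Back-of-the-envelope monthly cost estimate. Every number below is a
# hypothetical placeholder -- substitute your measured traffic and the
# provider's published rates.
MONTHLY_INPUT_TOKENS = 50_000_000   # assumption: measured input volume
MONTHLY_OUTPUT_TOKENS = 10_000_000  # assumption: measured output volume
PRICE_IN_PER_M = 0.30               # assumption: USD per 1M input tokens
PRICE_OUT_PER_M = 2.50              # assumption: USD per 1M output tokens

monthly_cost = (
    MONTHLY_INPUT_TOKENS / 1_000_000 * PRICE_IN_PER_M
    + MONTHLY_OUTPUT_TOKENS / 1_000_000 * PRICE_OUT_PER_M
)

# Cost per successful task, not per token: retries and failures count too.
tasks_attempted = 120_000   # assumption
tasks_succeeded = 102_000   # assumption
cost_per_success = monthly_cost / tasks_succeeded

print(f"monthly cost: ${monthly_cost:,.2f}")
print(f"success rate: {tasks_succeeded / tasks_attempted:.0%}")
print(f"cost per successful task: ${cost_per_success:.4f}")
```

Compare candidate providers on the cost-per-success line, not on raw per-token prices.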
## Recommended paths
| Provider | Free / credits | Best for |
|---|---|---|
| DeepSeek | $5 signup + low per-token pricing | Coding, agents, general text |
| Qwen | 70M signup tokens | Chinese, coding, long context |
| SiliconFlow | Free small models + ¥14 credit | China-direct open models |
| OpenRouter | Free models + many paid routes | Model shopping and fallback routing |
| Groq | Free developer limits | Low-latency open models |
## Global developer checklist
- Confirm whether signup, billing, and API keys work from your country before writing production code.
- Prefer OpenAI-compatible endpoints when you may need to switch models, regions, or providers later.
- Test free credits with a real smoke prompt and record latency, error shape, streaming behavior, and quota burn; a minimal smoke-test sketch follows this list.
- Keep at least one fallback route for provider outages, model deprecations, and regional access changes; a routing sketch appears under Production handoff below.
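A minimal smoke-test sketch, assuming an OpenAI-compatible endpoint; the base URL, key name, and model are placeholders for whichever provider you are evaluating.

```python
import os
import time

from openai import OpenAI  # works against any OpenAI-compatible endpoint

# Placeholder endpoint, key name, and model -- point at the provider under test.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # assumption
    api_key=os.environ["PROVIDER_API_KEY"],          # assumption
)

start = time.monotonic()
first_token_at = None
parts = []

# Stream so we can record time-to-first-token as well as total latency.
stream = client.chat.completions.create(
    model="example-small-model",  # assumption
    messages=[{"role": "user", "content": "Reply with exactly one word: pong"}],
    max_tokens=16,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # some providers send usage-only chunks
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.monotonic()
    parts.append(delta)

total = time.monotonic() - start
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total latency: {total:.2f}s")
print(f"response: {''.join(parts)!r}")
```

Run it a few times, keep the numbers, and note how the provider reports errors and quota usage.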
## Production handoff
Need one bill and one compatible endpoint?
Use OpenLLMAPI when the engineering cost of juggling provider keys outweighs the cost of running a small routing layer.
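If you roll your own instead, a minimal fallback router over two OpenAI-compatible routes might look like this sketch (it also covers the checklist's fallback-route item above); all endpoints, models, and key names are hypothetical.

```python
import os

from openai import OpenAI

# Ordered OpenAI-compatible routes: primary first, fallback second.
# All base URLs, model names, and key names are hypothetical placeholders.
ROUTES = [
    {"base_url": "https://api.primary-provider.com/v1",
     "api_key": os.environ["PRIMARY_KEY"], "model": "primary-model"},
    {"base_url": "https://api.fallback-provider.com/v1",
     "api_key": os.environ["FALLBACK_KEY"], "model": "fallback-model"},
]

def complete_with_fallback(messages: list[dict]) -> str:
    """Try each route in order; fail over on any API error."""
    last_error = None
    for route in ROUTES:
        client = OpenAI(base_url=route["base_url"], api_key=route["api_key"])
        try:
            resp = client.chat.completions.create(
                model=route["model"], messages=messages, max_tokens=512,
            )
            return resp.choices[0].message.content
        except Exception as err:  # outages, deprecations, regional blocks
            last_error = err
    raise RuntimeError("all routes failed") from last_error

print(complete_with_fallback([{"role": "user", "content": "ping"}]))
```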
## FAQ
### Is the cheapest model always the best choice?
No. Measure cost per successful task. A model that needs retries, longer prompts, or human correction can be more expensive than a stronger model.
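For example, with hypothetical prices: a $0.002 call that succeeds 60% of the time costs roughly $0.002 / 0.60 ≈ $0.0033 per success, while a $0.003 call that succeeds 95% of the time costs roughly $0.003 / 0.95 ≈ $0.0032, so the pricier model is actually cheaper per completed task.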
### When is a local LLM cheaper than an API?
Usually only when you keep the GPU busy for many hours per day or already own the hardware. For sporadic workloads, hosted APIs tend to be cheaper.
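A rough breakeven sketch under stated assumptions; the rental rate, throughput, and API price below are placeholders, not benchmarks.

```python
# Breakeven between a rented GPU and a hosted API. Placeholder numbers only --
# plug in your actual rental rate, throughput, and API pricing.
GPU_COST_PER_HOUR = 2.00      # assumption: rented GPU, USD per hour
TOKENS_PER_HOUR = 1_500_000   # assumption: sustained throughput at full load
API_PRICE_PER_M = 1.00        # assumption: blended hosted-API price per 1M tokens

self_host_per_m = GPU_COST_PER_HOUR / (TOKENS_PER_HOUR / 1_000_000)
print(f"self-hosted at full load: ${self_host_per_m:.2f}/M tokens")
print(f"hosted API:               ${API_PRICE_PER_M:.2f}/M tokens")

# At low utilization the effective self-hosted price scales up linearly,
# which is why sporadic workloads favor hosted APIs.
utilization = 0.10            # assumption: GPU busy ~2.4 hours per day
print(f"self-hosted at {utilization:.0%} load: "
      f"${self_host_per_m / utilization:.2f}/M tokens")
```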
### How do I reduce LLM API cost without changing providers?
Trim prompts, cache repeated context, use batch jobs where available, route easy tasks to small models, and cap max output tokens.
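Two of those tactics in a minimal sketch, assuming an OpenAI-compatible endpoint; the model names and the crude prompt-length heuristic are placeholder assumptions.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # assumption
    api_key=os.environ["PROVIDER_API_KEY"],          # assumption
)

def answer(prompt: str) -> str:
    # Route easy tasks to the small model. A real router would use task
    # labels or a classifier; this prompt-length check is a placeholder.
    model = "example-small-model" if len(prompt) < 500 else "example-large-model"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # cap output tokens: output is usually the costly side
    )
    return resp.choices[0].message.content
```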
### Should I use an aggregator for cheapest pricing?
Aggregators are great for testing and fallback, but often add markup. For stable high-volume traffic, compare direct provider pricing.