Groq
Groq is known for its custom LPU inference chip, offering the fastest AI inference in the industry. Free API supports Llama 3.3 70B, Llama 4 Scout/Maverick, Mixtral, Gemma 2, DeepSeek R1 distilled, and more. Llama 3.3 70B at 6000 tokens/min completely free, several times faster than GPU solutions. API keys start with gsk_, OpenAI-compatible format, switch with one line of code. Ideal for ultra-fast inference: real-time chat, code completion, streaming output.
🎁 Free Tier
Daily Limit: 6000 tokens/min (Llama 3.3 70B)
| Model | Context | Limit | Notes |
|---|---|---|---|
| Llama 3.3 70B Versatile | 128k | 30 RPM / 6000 TPM | World's fastest inference, 6000 tokens/min free, LPU chip accelerated |
| Llama 4 Scout 17B | 128k | 30 RPM / 6000 TPM | Meta Llama 4 Scout, MoE architecture, free to use |
| Llama 4 Maverick 17B | 128k | 30 RPM / 6000 TPM | Meta Llama 4 Maverick, MoE architecture, free to use |
| Mixtral 8x7B | 32k | 30 RPM / 5000 TPM | MoE architecture, cost-effective |
| Gemma 2 9B | 8k | 30 RPM / 15000 TPM | Google Gemma 2, ultra-fast small model |
| DeepSeek R1 Distill Llama 70B | 128k | 30 RPM / 6000 TPM | DeepSeek R1 distilled, strong reasoning |
🔑 Free API
Free Credits: Free tier(永久免费)
Rate Limit: 30 RPM / 6000 TPM
Free API powered by custom LPU (Language Processing Unit) chip, 10x+ faster than GPU. API keys start with gsk_. OpenAI-compatible format. Free tier has rate limits but no total cap, very generous for personal development.