Local AI models at a glance
All relevant open-source models you can run locally with Ollama, LM Studio, or llama.cpp – with memory requirements, context window, license, and German quality. The memory filter shows which models fit your hardware. Whether running locally is worth it at all is covered in the in-depth guide.
Whether a model then runs fast enough depends on the memory bandwidth of your device, not just on whether it fits into memory. The hardware calculator works that out for you.
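A rough way to see why bandwidth matters: while generating text, a model has to stream its weights from memory for every token. Memory bandwidth divided by the bytes read per token therefore gives a theoretical ceiling on tokens per second. This is a minimal sketch, assuming the 0.55 GB per billion parameters rule of thumb from the memory section below, ignoring KV-cache traffic, compute limits and software overhead, and using a hypothetical device with 100 GB/s. `max_tokens_per_second` is an illustrative helper, not the site's hardware calculator.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound model.
# Assumption: each generated token reads all (active) weights once from memory;
# KV-cache reads, compute limits and runtime overhead are ignored.

GB_PER_BILLION_PARAMS_Q4 = 0.55  # rule of thumb for Q4_K_M weights


def max_tokens_per_second(bandwidth_gb_s: float, params_read_per_token_b: float) -> float:
    """Theoretical ceiling in tokens/s: bandwidth divided by GB read per token."""
    gb_per_token = params_read_per_token_b * GB_PER_BILLION_PARAMS_Q4
    return bandwidth_gb_s / gb_per_token


# Hypothetical device with ~100 GB/s memory bandwidth running a dense 8B model:
print(round(max_tokens_per_second(100, 8), 1))
```

Real-world speeds land below this ceiling, but the ratio holds: doubling the bandwidth roughly doubles the achievable tokens per second for the same model.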
| Strength | Parameters | Memory (Q4_K_M) | Context | German quality | License | LMArena | Cloud price per 1M tokens (input / output) |
|---|---|---|---|---|---|---|---|
| 96/100 | 1,000B · 32B active (MoE) | ~560 GB | 256K | good | Modified MIT | 1462 | $0.67 / $3.50 |
| 94/100 | 744B · 40B active (MoE) | ~410 GB | 200K | good | MIT | 1457 | $0.60 / $1.92 |
| 93/100 | 284B · 13B active (MoE) | ~155 GB | 1M | good | MIT | 1434 | $0.09 / $0.18 |
| 92/100 | 235B · 22B active (MoE) | ~140 GB | 32K | strong | Apache 2.0 | 1375 | $0.45 / $1.82 |
| 80/100 | 80B · 3B active (MoE) | ~52 GB | 256K | good | Apache 2.0 | — | $0.11 / $0.80 |
| 78/100 | 32B | ~21 GB | 32K | strong | Apache 2.0 | 1347 | $0.08 / $0.28 |
| 77/100 | 31B | ~20 GB | 256K | strong | Apache 2.0 | 1451 | — |
| 72/100 | 30B · 3B active (MoE) | ~20 GB | 32K | strong | Apache 2.0 | 1327 | $0.12 / $0.50 |
| 70/100 | 24B | ~16 GB | 128K | strong | Apache 2.0 | 1303 | $0.35 / $0.55 |
| 64/100 | 14B | ~11 GB | 32K | strong | Apache 2.0 | — | $0.10 / $0.24 |
| 62/100 | 12B | ~9.5 GB | 256K | strong | Apache 2.0 | — | — |
| 60/100 | 14B | ~11 GB | 16K | solid | MIT | 1256 | $0.07 / $0.14 |
| 54/100 | 8B | ~7.5 GB | 256K | good | Apache 2.0 | — | $0.08 / $0.50 |
| 52/100 | 8B | ~7 GB | 32K | strong | Apache 2.0 | — | $0.05 / $0.40 |
| 38/100 | 4B | ~4 GB | 32K | good | Apache 2.0 | — | — |
| 24/100 | 1.7B | ~2.5 GB | 32K | good | Apache 2.0 | — | — |
LMArena shows the current Elo rating from the public LMArena chat leaderboard (as of 06/10), not the agent, code, or image arenas. Cloud price is the cheapest provider price per 1M tokens (input / output) according to OpenRouter. Both values are live and refreshed daily; “—” means the model is not (yet) listed there. All other columns are curated reference values.
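To compare cloud costs against running a model locally, the per-1M-token prices translate into the cost of a concrete workload. A minimal sketch: the token counts are made-up examples, the prices ($0.08 input / $0.28 output) are those of the 32B row, and `cloud_cost_usd` is a hypothetical helper.

```python
# Turn per-1M-token prices (input / output, USD) into the cost of a workload.
# Token counts below are illustrative assumptions, not measured usage.


def cloud_cost_usd(input_tokens: int, output_tokens: int,
                   price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in USD for a workload billed at per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# Example: 2M input tokens and 500K output tokens at $0.08 / $0.28 per 1M
print(f"${cloud_cost_usd(2_000_000, 500_000, 0.08, 0.28):.2f}")
```

Because output tokens are usually priced higher, workloads that generate long answers shift the cost toward the second price.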
How the numbers come together
Transparency first: these figures are carefully estimated reference values for real-world use, not lab benchmarks, as of mid-2026.
Memory requirement
The memory value applies to the Q4_K_M quantization (the default in Ollama and LM Studio) including a small context window. Rule of thumb: around 0.55 GB per billion parameters plus some overhead.
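The rule of thumb can be turned into a quick fit check. This is a minimal sketch: the 0.55 GB factor comes from the text, while the 3 GB default overhead is an assumption chosen to land close to the table's mid-sized entries (the real overhead varies with runtime and context length). `estimate_memory_gb` and `fits` are hypothetical helpers.

```python
# Rough memory estimate for a Q4_K_M model, following the rule of thumb above.
# overhead_gb (runtime buffers plus a small context window) is an assumption;
# the text only says "some overhead", so it is exposed as a parameter.


def estimate_memory_gb(total_params_b: float, overhead_gb: float = 3.0,
                       gb_per_billion: float = 0.55) -> float:
    """Approximate memory in GB needed to load a model at ~4-bit quantization."""
    return total_params_b * gb_per_billion + overhead_gb


def fits(total_params_b: float, available_gb: float, overhead_gb: float = 3.0) -> bool:
    """True if the estimated requirement fits into the available memory."""
    return estimate_memory_gb(total_params_b, overhead_gb) <= available_gb


print(round(estimate_memory_gb(32), 1))  # dense 32B model
print(fits(8, 16))                       # 8B model on a 16 GB machine
```

With these assumptions, a 32B model comes out at about 20.6 GB, close to the ~21 GB in the table; for very small models the fixed overhead overestimates slightly.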
Mixture-of-Experts
For MoE models, the total parameter count determines the memory requirement (all experts sit in RAM), while only the active parameters determine the speed. This is why an MoE model generates text much faster than a dense model that needs about the same amount of memory.
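The two table rows at ~20 to 21 GB illustrate this: a dense 32B model and a 30B MoE model with 3B active parameters. A minimal sketch, assuming decode speed is purely bandwidth-bound (so it scales with the parameters read per token) and memory scales with the parameters stored; `Model`, `relative_speed` and `relative_memory` are hypothetical names.

```python
# Why an MoE model can be much faster than a dense model of similar size.
# Assumption: decoding is memory-bandwidth-bound, so speed scales with the
# parameters *read* per token, while memory scales with parameters *stored*.
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params_b: float   # all parameters; must fit into memory
    active_params_b: float  # parameters used per token (equals total for dense)


def relative_speed(a: Model, b: Model) -> float:
    """How many times faster model a decodes than model b on the same hardware."""
    return b.active_params_b / a.active_params_b


def relative_memory(a: Model, b: Model) -> float:
    """Ratio of memory footprints, driven by the total parameter counts."""
    return a.total_params_b / b.total_params_b


dense = Model("dense 32B", 32, 32)
moe = Model("MoE 30B, 3B active", 30, 3)

print(f"memory: {relative_memory(moe, dense):.2f}x, speed: {relative_speed(moe, dense):.1f}x")
```

The roughly 10x speed figure is a theoretical ceiling; expert routing, attention and KV-cache reads reduce the gain in practice.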
Strength score
The strength is a rounded reference value (0–100) that increases with model size. It helps when comparing within this list, but it is not an official benchmark score.
Note on the context window: the Qwen3 text models from 4B upwards natively support 32K tokens and can be extended to 128K via YaRN (the 1.7B model stays at 32K). The table shows the native value.
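For reference, the Qwen3 model cards describe the YaRN extension as a `rope_scaling` entry in the model's `config.json`, with a scaling factor relative to the native 32K window. A minimal sketch, assuming those key names (`rope_type`, `factor`, `original_max_position_embeddings`); `yarn_rope_scaling` is a hypothetical helper, so check the card of the exact model before relying on it.

```python
# Build YaRN rope-scaling settings that stretch a 32K-native model.
# Key names follow the rope_scaling fragment in the Qwen3 model cards
# (Hugging Face config.json); the helper itself is hypothetical.

NATIVE_CONTEXT = 32_768  # native window of the Qwen3 text models from 4B upwards


def yarn_rope_scaling(target_context: int, native_context: int = NATIVE_CONTEXT) -> dict:
    """Return a rope_scaling dict whose factor stretches native_context to target_context."""
    if target_context <= native_context:
        raise ValueError("YaRN is only needed above the native context window")
    return {
        "rope_type": "yarn",
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


print(yarn_rope_scaling(131_072))
# {'rope_type': 'yarn', 'factor': 4.0, 'original_max_position_embeddings': 32768}
```

A factor of 4 takes 32,768 tokens to 131,072 (128K). llama.cpp exposes the same idea through its `--rope-scaling yarn`, `--rope-scale` and `--yarn-orig-ctx` options. The Qwen3 cards advise enabling static YaRN only when long inputs are actually needed, since it can affect quality on short texts.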
Frequently asked questions
- Which local AI model is best for German?
- For German-language tasks, the Gemma models are considered particularly strong, followed by Qwen3 and Mistral. The table rates the German quality of each model as an assessment, not a benchmark result.
- What does the context window mean?
- The context window is the amount of text, in tokens, that a model can process at once. 32K is enough for most tasks; long documents benefit from 128K or more. More context costs additional memory.
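The extra memory comes mainly from the KV cache: for every layer, the model stores one key and one value vector per KV head and token. A minimal sketch, assuming an FP16 cache (2 bytes per value) and illustrative architecture numbers for an 8B-class model with grouped-query attention (36 layers, 8 KV heads, head dimension 128); read the real values from the model's `config.json`. `kv_cache_gib` is a hypothetical helper.

```python
# Estimate the KV-cache memory that a given context length costs.
# Architecture numbers below are illustrative assumptions, not a specific model's.


def kv_cache_gib(context_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: float = 2.0) -> float:
    """KV-cache size in GiB: 2 (K and V) x layers x KV heads x head_dim x tokens x bytes."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1024**3


# Hypothetical 8B-class architecture with an FP16 cache:
for ctx in (8_192, 32_768, 131_072):
    print(ctx, kv_cache_gib(ctx, n_layers=36, n_kv_heads=8, head_dim=128))
```

Under these assumptions, 32K of context adds about 4.5 GiB on top of the weights and 128K about 18 GiB; the cost grows linearly with the context length.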
- Can these models be used commercially?
- That depends on the license (see the “License” column). Apache 2.0 and MIT allow free commercial use; licenses such as the Llama Community License or the Gemma Terms come with restrictions you should verify against the original license when in doubt.
- What is an MoE model?
- In a Mixture-of-Experts model, all parameters sit in memory, but only a fraction of them (the active parameters) are used to compute each token. This makes it roughly as fast as a much smaller dense model, while it still needs the memory of the large one.
From model to production system
Run your model locally and Corporate LLM turns it into a production system: RAG, agent system, skills, and connectors. 100% GDPR-compliant.