Which local AI model is best for German?

For German-language tasks, the Gemma models are considered particularly strong, followed by Qwen3 and Mistral. The table lists a German-quality rating for each model; it is an assessment, not a measured benchmark.

What does the context window mean?

The context window is the amount of text, in tokens, that a model can process at once. 32K is enough for most tasks; long documents benefit from 128K or more. More context costs additional memory.

Can these models be used commercially?

That depends on the license (see the “License” column). Apache 2.0 and MIT allow free commercial use; licenses such as the Llama Community License or the Gemma Terms come with restrictions you should verify against the original license when in doubt.

What is an MoE model?

In a Mixture-of-Experts (MoE) model, all parameters sit in memory, but only a fraction of them (the active parameters) are used to compute each token. This makes it roughly as fast as a much smaller model while requiring the memory of the large one.

Local AI · Model Database

Local AI models at a glance

All relevant open-source models you can run locally with Ollama, LM Studio, or llama.cpp – with memory requirements, context window, license, and German quality. The memory filter shows which models fit your hardware. Whether running locally is worth it at all is covered in the in-depth guide.

Whether a model then runs fast enough depends on the memory bandwidth of your device, not just on whether it fits into memory. The hardware calculator works that out for you.

16 of 25 models

Model	Strength	Parameters	Memory	Context	German quality	License	LMArena	Cloud price
Kimi K2.6	96/100	1,000B · 32B active (MoE)	~560 GB	256K	good	Modified MIT	1462	$0.67 / $3.50/1M
GLM-5	94/100	744B · 40B active (MoE)	~410 GB	200K	good	MIT	1457	$0.60 / $1.92/1M
DeepSeek V4-Flash	93/100	284B · 13B active (MoE)	~155 GB	1M	good	MIT	1434	$0.09 / $0.18/1M
Qwen3 235B-A22B	92/100	235B · 22B active (MoE)	~140 GB	32K	strong	Apache 2.0	1375	$0.45 / $1.82/1M
Qwen3-Coder-Next	80/100	80B · 3B active (MoE)	~52 GB	256K	good	Apache 2.0	—	$0.11 / $0.80/1M
Qwen3 32B	78/100	32B	~21 GB	32K	strong	Apache 2.0	1347	$0.08 / $0.28/1M
Gemma 4 31B	77/100	31B	~20 GB	256K	strong	Apache 2.0	1451	—
Qwen3 30B-A3B	72/100	30B · 3B active (MoE)	~20 GB	32K	strong	Apache 2.0	1327	$0.12 / $0.50/1M
Mistral Small 3.1 24B	70/100	24B	~16 GB	128K	strong	Apache 2.0	1303	$0.35 / $0.55/1M
Qwen3 14B	64/100	14B	~11 GB	32K	strong	Apache 2.0	—	$0.10 / $0.24/1M
Gemma 4 12B	62/100	12B	~9.5 GB	256K	strong	Apache 2.0	—	—
Phi-4 14B	60/100	14B	~11 GB	16K	solid	MIT	1256	$0.07 / $0.14/1M
Qwen3-VL 8B	54/100	8B	~7.5 GB	256K	good	Apache 2.0	—	$0.08 / $0.50/1M
Qwen3 8B	52/100	8B	~7 GB	32K	strong	Apache 2.0	—	$0.05 / $0.40/1M
Qwen3 4B	38/100	4B	~4 GB	32K	good	Apache 2.0	—	—
Qwen3 1.7B	24/100	1.7B	~2.5 GB	32K	good	Apache 2.0	—	—

LMArena shows the current Elo rating from the public LMArena chat leaderboard (as of 06/10), not the agent, code, or image arena. Cloud price is the cheapest provider price per 1M tokens (input / output) according to OpenRouter. Both values are pulled automatically and refreshed daily; “—” means the model is not (yet) listed there. All other columns are curated reference values.

How the numbers come together

Transparency first: these figures are carefully estimated reference values for real-world use, not lab benchmarks. As of mid-2026.

Memory requirement

The memory value applies to the Q4_K_M quantization (the default in Ollama and LM Studio) including a small context window. Rule of thumb: around 0.55 GB per billion parameters plus some overhead.
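
As a rough illustration of the rule of thumb, the following Python sketch multiplies the parameter count by 0.55 GB and adds a fixed overhead. The function name estimate_memory_gb and the 1.5 GB default overhead are assumptions made for this example; the text only says "some overhead", so the results will not match the curated table values exactly.

```python
def estimate_memory_gb(total_params_b: float,
                       gb_per_billion: float = 0.55,
                       overhead_gb: float = 1.5) -> float:
    """Rough Q4_K_M memory estimate in GB.

    total_params_b: total parameter count in billions (for MoE: all experts).
    gb_per_billion: rule-of-thumb figure from the text.
    overhead_gb:    hypothetical fixed overhead (runtime, small context window).
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return total_params_b * gb_per_billion + overhead_gb


# Parameter counts taken from the table above.
for name, params_b in [("Qwen3 8B", 8), ("Qwen3 32B", 32), ("Qwen3 235B-A22B", 235)]:
    print(f"{name}: ~{estimate_memory_gb(params_b):.1f} GB")
```

For an MoE model, pass the total parameter count, not the active one, since all experts have to be loaded.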

Mixture-of-Experts

For MoE models, the total parameter count determines the memory requirement (all experts sit in RAM), while only the active parameters determine the speed. That is why an MoE model generates tokens faster than a dense model of the same total size.
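
The split between total and active parameters can be sketched in Python as follows. This is a simplified upper-bound estimate that assumes token generation is limited purely by memory bandwidth and that each token reads all active weights once. The function name, the reuse of the 0.55 GB per billion parameters figure, and the 400 GB/s bandwidth are assumptions for illustration, not measured values.

```python
def estimate_tokens_per_second(active_params_b: float,
                               bandwidth_gb_s: float,
                               gb_per_billion: float = 0.55) -> float:
    """Upper-bound decode speed for a memory-bandwidth-bound model.

    active_params_b: parameters used per token, in billions
                     (equal to the total for dense models).
    bandwidth_gb_s:  memory bandwidth of the device in GB/s.
    """
    if active_params_b <= 0 or bandwidth_gb_s <= 0:
        raise ValueError("inputs must be positive")
    gb_read_per_token = active_params_b * gb_per_billion
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH = 400.0  # hypothetical device bandwidth in GB/s

dense = estimate_tokens_per_second(32, BANDWIDTH)  # Qwen3 32B, dense
moe = estimate_tokens_per_second(3, BANDWIDTH)     # Qwen3 30B-A3B, 3B active

print(f"dense 32B:  up to ~{dense:.0f} tokens/s")
print(f"MoE 30B-A3B: up to ~{moe:.0f} tokens/s")
```

Real throughput is lower, since the estimate ignores the context cache, compute limits, and expert routing overhead. The ratio between the two models is the useful part.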

Strength score

The strength is a rounded reference value (0–100) that increases with model size. It helps when comparing within this list, but it is not an official benchmark score.

Note on the context window: the Qwen3 text models from 4B upwards run natively with 32K tokens and can be extended to up to 128K via YaRN (the 1.7B model stays at 32K). The table shows the native value.
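
The extension from 32K to 128K corresponds to a YaRN scaling factor of 4. The Python sketch below derives that factor and builds a settings dictionary. The key names follow the rope_scaling layout commonly seen in Hugging Face-style config.json files and are an assumption here, so check the model card of the specific model before using them.

```python
def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a YaRN scaling entry for extending the context window.

    native_ctx: context length the model was trained with, in tokens.
    target_ctx: desired extended context length, in tokens.
    The key names are assumed, not taken from the source text.
    """
    if native_ctx <= 0 or target_ctx <= native_ctx:
        raise ValueError("target_ctx must be larger than native_ctx")
    return {
        "rope_type": "yarn",
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }


# 32K native (32 * 1024 tokens) extended to 128K (128 * 1024 tokens).
settings = yarn_rope_scaling(32 * 1024, 128 * 1024)
print(settings)
```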

From model to production system

Run your model locally and Corporate LLM turns it into a production system: RAG, agent system, skills, and connectors. 100% GDPR-compliant.
