Which Model Should You Actually Run?
Two years ago the answer to "which local model should I run" was Llama 2. Then it was Mistral. Then Llama 3. Then Qwen2.5. Now it's Qwen3 - and half the old benchmarks are useless because the models they tested don't belong on anyone's shortlist anymore.
The good news: the options in 2026 are genuinely excellent. The bad news: there are too many of them and people are still recommending models that got lapped six months ago. This page cuts through it. No benchmark soup, no 47-model comparison matrix. Just what to pull based on what you have and what you're trying to do.
Five Models Worth Knowing
You could spend three days reading benchmarks. Or you could pull one of these five and actually get something done. These are the ones that matter right now - tested, opinionated, and ranked.
ollama pull qwen3:8b
ollama pull llama3.2
ollama pull deepseek-r1:7b
ollama pull qwen2.5-coder:7b
ollama pull qwen3:14b
The Full Picture
Everything side by side. Speed numbers were measured on an 8-core CPU with 16 GB RAM, running the default quantization (q4_K_M). Your results will vary, but the proportions hold. A GPU will be 3-10x faster, and Apple Silicon M-series machines land closer to the top end of each range.
Full Specs Table
| Model | Disk | RAM Used | Speed (CPU) | Context | Best For |
|---|---|---|---|---|---|
| llama3.2:1b | 1.3 GB | 1.5 GB | 35-50 t/s | 128K | |
| llama3.2:3b | 2 GB | 3 GB | 18-28 t/s | 128K | |
| qwen3:4b | 2.6 GB | 4 GB | 16-24 t/s | 128K | |
| qwen3:8b (Top Pick) | 5.2 GB | 7 GB | 8-14 t/s | 128K | |
| deepseek-r1:7b (Reasoning) | 4.7 GB | 6 GB | 8-14 t/s | 128K | |
| qwen2.5-coder:7b (Code) | 4.7 GB | 6 GB | 8-14 t/s | 128K | |
| gemma3:12b | 8.1 GB | 10 GB | 5-9 t/s | 128K | |
| qwen3:14b | 9.3 GB | 12 GB | 4-7 t/s | 128K | |
| phi4 | 9.1 GB | 12 GB | 4-7 t/s | 16K | |
| deepseek-r1:14b (Reasoning) | 9.0 GB | 12 GB | 4-7 t/s | 128K | |
| qwen3:32b | 20 GB | 24 GB | 2-4 t/s | 128K | |
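The rows from llama3.2:1b through qwen2.5-coder:7b (green) run on 8 GB RAM, though it is tight for qwen3:8b and 16 GB is more comfortable. The rows from gemma3:12b through deepseek-r1:14b (amber) want 16 GB RAM, and qwen3:32b (purple) wants 32 GB. Context is the maximum number of tokens the model can hold in a single conversation.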
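To make the Speed (CPU) column more tangible, here is a rough sketch that divides a reply length by tokens per second. The helper name `seconds_for` and the 400-token reply length are illustrative assumptions, not measurements.

```shell
# Hypothetical helper: estimate how long a reply takes to generate.
# $1 = reply length in tokens, $2 = generation speed in tokens per second.
seconds_for() {
  awk -v tokens="$1" -v tps="$2" 'BEGIN { printf "%.0f\n", tokens / tps }'
}

# A 400-token reply at both ends of the 8-14 t/s range listed for qwen3:8b
seconds_for 400 8
seconds_for 400 14
```

At those speeds a 400-token reply takes somewhere between about half a minute and just under a minute on CPU, which is why the smaller models feel so much snappier for quick questions.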
What Fits What You Have
RAM is the main constraint for CPU-only inference. Here's exactly what runs comfortably at each tier - not what technically loads, but what you'd actually want to use.
Run ollama ps to see what's currently loaded and how much VRAM/RAM it's using.
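As a sketch of the RAM tiers, the function below maps installed RAM (in whole GB) to the largest Qwen3 or Llama tag from the specs table that fits its tier. The name `suggest_model` and the exact cutoffs are assumptions based on the 8 / 16 / 32 GB tiers described on this page, not a rule Ollama enforces.

```shell
# Hypothetical helper: suggest a model tag for a given amount of system RAM.
# $1 = installed RAM in whole GB. Cutoffs follow the 8 / 16 / 32 GB tiers.
suggest_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 32 ]; then
    echo "qwen3:32b"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "qwen3:14b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "qwen3:8b"      # tight on 8 GB, per the table notes
  else
    echo "llama3.2:3b"
  fi
}

suggest_model 16
```

Pass the amount of RAM your machine is sold with rather than what the OS reports as free, since reported totals usually come in slightly under the nominal figure.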
Don't Have Enough RAM?
Two options: upgrade or go smaller on the model. Laptop RAM is often upgradeable - jumping from 16GB to 32GB is the single highest-impact change you can make for local LLM performance, and it's usually not expensive. If you're on a mini PC, check if your SODIMM slots are accessible before buying new hardware.
Match the Model to the Job
The "best" model depends on what you're actually doing. A reasoning model is overkill for casual chat. A general model is the wrong tool for math problems. Here's the quick guide to matching them up.
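One way to encode that matching is a small lookup, sketched here. The function name `model_for` and the task keywords are made up for illustration; the model tags are the ones recommended on this page.

```shell
# Hypothetical helper: map a kind of task to one of the models on this page.
model_for() {
  case "$1" in
    reasoning|math|logic) echo "deepseek-r1:7b" ;;    # math, logic, complex problems
    code|coding)          echo "qwen2.5-coder:7b" ;;  # programming
    fast)                 echo "llama3.2" ;;          # low-spec hardware, quick responses
    *)                    echo "qwen3:8b" ;;          # general default
  esac
}

model_for code
```

With Ollama installed, something like `ollama run "$(model_for reasoning)"` would then start the matching model directly.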
Pull Commands - Copy and Go
If you have Ollama installed, these are the commands. Pick your tier, paste the ones you want, and let them download. Models go to ~/.ollama/models/ by default.
The Essentials (start here)
# The one everyone should have
ollama pull qwen3:8b
# Fast model for low-spec hardware or quick responses
ollama pull llama3.2
# Reasoning - use for math, logic, complex problems
ollama pull deepseek-r1:7b
# Coding - better than general models for programming
ollama pull qwen2.5-coder:7b
If You Have 16 GB+ RAM
# Quality step-up from 8b - noticeable improvement
ollama pull qwen3:14b
# Best reasoning model at this tier
ollama pull deepseek-r1:14b
# Microsoft's model - great at structured output
ollama pull phi4
Useful Management Commands
# See what you have downloaded
ollama list
# See what's currently loaded in memory
ollama ps
# Remove a model you're not using (frees disk space)
ollama rm llama3.2:1b
# Update a model to the latest version
ollama pull qwen3:8b # just pull again - it updates automatically
# Run a model interactively (type /bye to exit)
ollama run qwen3:8b
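Since updating is just pulling again, the commands above can be combined to refresh everything at once. This is a minimal sketch: `model_names` and `update_all_models` are hypothetical helper names, and the parsing assumes `ollama list` prints a header row followed by one model per line with the name in the first column.

```shell
# Print only the model names from `ollama list` output read on stdin.
# Assumes line 1 is a header and the name is the first column.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Re-pull every downloaded model. Defined here but not called automatically.
update_all_models() {
  ollama list | model_names | while read -r name; do
    ollama pull "$name"
  done
}
```

Run `update_all_models` when you want to refresh your library. If the column layout of `ollama list` differs on your version, adjust the `awk` filter accordingly.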
Gear That Helps
Models take up real disk space. qwen3:8b is 5.2GB, deepseek-r1:7b is 4.7GB - pull a handful of models and you're looking at 25-40GB easily. A dedicated external SSD keeps your boot drive clean and gives you a portable model library.
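To see how quickly that adds up, this sketch sums the Disk column for a set of models. The helper name `total_gb` is an assumption; the sizes are the ones from the specs table.

```shell
# Hypothetical helper: add up model sizes given in GB, one per argument.
total_gb() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# The four essentials: qwen3:8b, llama3.2 (3b), deepseek-r1:7b, qwen2.5-coder:7b
total_gb 5.2 2 4.7 4.7
# The essentials plus qwen3:14b
total_gb 5.2 2 4.7 4.7 9.3
```

The four essentials already come to 16.6 GB, and adding qwen3:14b brings it to 25.9 GB. To check what is actually on disk, `du -sh ~/.ollama/models` reports the real total for the default model directory.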
Run ollama list to keep track, and remove anything you're not using to keep disk space manageable.
Ready to put these models to work? Ollama Advanced covers performance tuning, running multiple models, and connecting them to OpenClaw.