Qwen3's Thinking Mode: When to Use It and How to Turn It Off
Pull qwen3:8b and start using it for everything, and at some point you notice it feels slow on questions that should be simple. You ask it what 47 times 13 is and instead of answering in a second, it pauses for five. That's not a problem with the model - that's thinking mode doing exactly what it's supposed to do. The question is whether you actually want it to.
What thinking mode is
Qwen3 includes a hybrid reasoning capability. When thinking mode is active, the model works
through the problem internally before producing its final answer - similar in concept to how
DeepSeek-R1 shows its chain of thought. In Ollama's API, this shows up as text inside
<think>...</think> tags that precedes the actual response content.
When you run ollama run qwen3:8b interactively, Ollama collapses these thinking
tokens so you see only the final answer - but they're still being generated and still taking
time. Hit the Ollama API directly and you'll see the full output including the reasoning chain.
For hard problems this is genuinely useful. A model that reasons through a math problem step by step before answering gets the right answer more often than one that answers immediately. But for "what's the capital of France" the reasoning chain is wasted effort: the model still generates it, and it doesn't improve the answer.
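If you read the raw API output and only want the final answer, one option is to split the reasoning block off yourself. The following is a minimal sketch, assuming the response text carries the reasoning in a single <think>...</think> block as described above. The helper name split_thinking is hypothetical and not part of Ollama.

```python
import re

# Matches one <think>...</think> block, including newlines inside it,
# plus any whitespace that follows the closing tag.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Assumes at most one <think> block; if none is present, reasoning is "".
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Illustrative input, not captured from a real model run.
raw = "<think>\nFrance is a country; its capital is Paris.\n</think>\n\nParis."
reasoning, answer = split_thinking(raw)
print(answer)
```

Keeping the reasoning as a separate return value, rather than discarding it, makes it easy to log the chain when you want to debug a wrong answer.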
How to toggle it
Qwen3 supports per-message control with two soft switches that you append to your prompt:
# Disable thinking for this message
ollama run qwen3:8b "What is the capital of France? /no_think"
# Force thinking on for this message
ollama run qwen3:8b "Prove that sqrt(2) is irrational. /think"
/no_think skips the reasoning chain, so the model answers directly
from its training knowledge, which is fast and usually right for factual or conversational
queries. /think turns reasoning back on for that message when an earlier message or a system
prompt has switched it off. In a multi-turn conversation, the model follows the most recent switch.
In the Ollama API you pass these the same way - just append to the message content string. In Open WebUI, type them at the end of your message in the chat box.
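As a rough illustration of the API route, the sketch below builds a request body with the switch appended to the message content. The endpoint and request fields mirror the curl example later in this article. The function names are hypothetical, and reading the reply from message.content is an assumption about the non-streaming response shape.

```python
import json
import urllib.request

# Local endpoint used by the curl example in this article.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(prompt: str, think: bool, model: str = "qwen3:8b") -> dict:
    """Build a chat request body with the soft switch appended to the prompt."""
    switch = "/think" if think else "/no_think"
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{prompt} {switch}"}],
        "stream": False,
    }


def chat(prompt: str, think: bool) -> str:
    """Send the request to a locally running Ollama and return the reply text.

    Assumes the non-streaming response holds the text in message.content.
    """
    body = json.dumps(build_chat_payload(prompt, think)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Only the payload is printed here; call chat(...) once Ollama is running.
payload = build_chat_payload("What is the capital of France?", think=False)
print(json.dumps(payload, indent=2))
```

Separating payload construction from the network call keeps the switch logic easy to check without a running server.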
Thinking tokens eat your context window
This is the part most people overlook. The <think> block is made up of
generated tokens, and those tokens count against your num_ctx setting just like
everything else in the conversation.
The Ollama Advanced guide recommends setting
num_ctx to 4096 for speed on most setups. On a complex question, qwen3's
reasoning chain can run 500-1500 tokens on its own. That's a meaningful chunk of a 4096
context budget, leaving less room for conversation history and the actual answer.
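To put numbers on that, here is a small back-of-the-envelope calculation using the 500-1500 token range quoted above. It is only arithmetic on those figures, not a measurement, and the function name remaining_context is made up for this sketch.

```python
def remaining_context(num_ctx: int, thinking_tokens: int) -> tuple[int, float]:
    """Return (tokens left, share of the window used by the reasoning chain)."""
    return num_ctx - thinking_tokens, thinking_tokens / num_ctx


# Compare the two context sizes against the low and high end of the range.
for num_ctx in (4096, 8192):
    for thinking_tokens in (500, 1500):
        left, share = remaining_context(num_ctx, thinking_tokens)
        print(f"num_ctx={num_ctx} thinking={thinking_tokens} left={left} share={share:.0%}")
```

At the upper end, a 1500-token chain takes roughly 37% of a 4096-token window, leaving 2596 tokens, but only about 18% of an 8192-token window.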
If you're using thinking mode regularly for hard tasks, bump num_ctx to 8192
or higher so the reasoning chain doesn't crowd out the rest of the context. Note that
8192 context with thinking enabled uses noticeably more RAM than the default. If
your machine feels sluggish at this setting, the mini PC guide covers
hardware that handles qwen3:8b at full context without thermal throttling.
ollama run has no num_ctx flag, so set the value inside the interactive session or pass it
in the API request options:
ollama run qwen3:8b
>>> /set parameter num_ctx 8192
curl http://localhost:11434/api/chat -d '{
"model": "qwen3:8b",
"messages": [{"role": "user", "content": "Review this algorithm for efficiency... /think"}],
"options": {"num_ctx": 8192},
"stream": false
}'
When to use each mode
Use thinking mode for:
- Math problems and proofs
- Multi-step logic and reasoning
- Complex code review and debugging
- Planning tasks with multiple constraints
- Questions where being wrong has a cost
Skip thinking mode for:
- Simple factual questions
- Casual chat and brainstorming
- Creative writing and summarization
- Translation
- Anything where a 5-second delay is annoying and accuracy isn't critical
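If you script your prompts, the two lists above can be turned into a simple lookup. This is a hedged sketch: the category labels are invented for illustration, and how you classify an incoming task is left to you.

```python
# Task categories paraphrased from the two lists above; labels are illustrative.
THINK_TASKS = {"math", "logic", "code_review", "planning", "high_stakes"}
NO_THINK_TASKS = {"factual", "chat", "creative", "summarization", "translation"}


def switch_for(task: str, default_think: bool = False) -> str:
    """Pick the soft switch for a task category, falling back to a default."""
    if task in THINK_TASKS:
        return "/think"
    if task in NO_THINK_TASKS:
        return "/no_think"
    return "/think" if default_think else "/no_think"


for task in ("math", "translation", "unknown"):
    print(task, switch_for(task))
```

Defaulting unknown categories to /no_think matches the everyday-use preference in this article; pass default_think=True if wrong answers are costly in your setting.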
Setting a default in Open WebUI
If you want thinking off by default for everyday use, set a system prompt in Open WebUI that includes the preference. Go to Settings > System Prompt and add something like:
You are a helpful assistant. Answer directly and concisely. /no_think
This applies /no_think to every conversation by default. You can still
override it on specific messages by appending /think when you need deeper
reasoning.
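Outside Open WebUI, the same default-plus-override pattern can be approximated by sending the system prompt yourself. The sketch below only builds the message list. It assumes the model honors the switch in a system message the same way it does in Open WebUI's system prompt, and build_messages is a hypothetical helper.

```python
# System prompt from the example above, with thinking disabled by default.
SYSTEM_PROMPT = "You are a helpful assistant. Answer directly and concisely. /no_think"


def build_messages(user_text: str, force_think: bool = False) -> list[dict]:
    """Return chat messages with /no_think as the default and /think as an override."""
    content = f"{user_text} /think" if force_think else user_text
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": content},
    ]


# Everyday question: relies on the system-prompt default.
print(build_messages("What is the capital of France?"))
# Hard question: overrides the default for this message only.
print(build_messages("Prove that sqrt(2) is irrational.", force_think=True))
```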
The same pattern works for OpenClaw skills - add /no_think
to the user message string before passing it to Ollama when the task doesn't need reasoning
depth. Reserve /think for skills doing complex analysis or multi-step planning.