How do I turn off thinking mode in qwen3?

Append /no_think to the end of your prompt. For example: "What is the capital of France? /no_think" - the model skips the reasoning chain and answers directly. Use /think to force it back on for a specific message.

Why is qwen3:8b responding slowly in Ollama?

Thinking mode is likely on. qwen3:8b generates a full reasoning chain inside <think>...</think> tags before giving its final answer. On complex questions this can take 30-60 seconds. Add /no_think to your prompt to skip it, or disable it by default via a system prompt.

Does qwen3 thinking mode use more context window?

Yes. The thinking tokens are generated and counted against your context window (num_ctx) before the final answer begins. On a complex question, the reasoning chain can consume 500-1500 tokens. If you have num_ctx set to 4096 for speed, a long thinking chain leaves less room for the actual conversation history.

When should I use qwen3 thinking mode?

Use thinking mode for math problems, multi-step logic, complex code review, debugging, and any task where working through it step by step would help. Turn it off for simple questions, casual chat, creative writing, translation, and anything where speed matters more than depth.


Ollama · June 26, 2026

Qwen3's Thinking Mode: When to Use It and How to Turn It Off

OpenClaw Sanctuary · 5 min read

Pull qwen3:8b and start using it for everything, and at some point you notice it feels slow on questions that should be simple. You ask it what 47 times 13 is and instead of answering in a second, it pauses for five. That's not a problem with the model - that's thinking mode doing exactly what it's supposed to do. The question is whether you actually want it to.

What thinking mode is

Qwen3 includes a hybrid reasoning capability. When thinking mode is active, the model works through the problem internally before producing its final answer - similar in concept to how DeepSeek-R1 shows its chain of thought. In Ollama's API, this shows up as text inside <think>...</think> tags that precedes the actual response content.

When you run ollama run qwen3:8b interactively, Ollama may hide or de-emphasize these thinking tokens depending on its version and settings, so the final answer is what stands out - but they're still being generated and still taking time. Hit the Ollama API directly and you'll see the full output including the reasoning chain.

For hard problems this is genuinely useful. A model that reasons through a math problem step by step before answering gets the right answer more often than one that answers immediately. But for "what's the capital of France" that reasoning chain is waste - the model still generates it, it just isn't helping.
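If you're reading raw API output and only want the answer, you can split the two parts yourself. Here's a minimal sketch, assuming the reasoning arrives inline in the content string inside <think>...</think> tags as described above. The function name split_thinking is made up for this example, and some Ollama versions may deliver the reasoning differently.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block, including any newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(raw: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Assumes the reasoning is wrapped in <think> tags inside the text.
    If no tags are present, reasoning comes back as an empty string.
    """
    reasoning = "\n".join(block.strip() for block in THINK_BLOCK.findall(raw))
    answer = THINK_BLOCK.sub("", raw).strip()
    return reasoning, answer


raw_output = (
    "<think>\n47 * 13 = 47 * 10 + 47 * 3 = 470 + 141 = 611\n</think>\n\n"
    "47 times 13 is 611."
)
reasoning, answer = split_thinking(raw_output)
print(answer)  # prints: 47 times 13 is 611.
```

Keeping the reasoning around rather than discarding it is handy when you want to check how the model got to a wrong answer.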

How to toggle it

Qwen3 supports per-message control with two switches you append to your prompt:

Terminal / API

# Disable thinking for this message
ollama run qwen3:8b "What is the capital of France? /no_think"

# Force thinking on for this message
ollama run qwen3:8b "Prove that sqrt(2) is irrational. /think"

/no_think skips the reasoning chain entirely. The model answers directly from its training knowledge, which is fast and usually right for factual or conversational queries. /think forces it on even if thinking is currently disabled by default.

In the Ollama API you pass these the same way - just append to the message content string. In Open WebUI, type them at the end of your message in the chat box.
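If you're calling the API from a script, appending the switch by hand gets tedious. The sketch below shows one way to do it, assuming a local Ollama server at the default address used elsewhere in this post. The helper names (with_switch, build_chat_request, send) are invented for illustration and aren't part of any Ollama library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def with_switch(prompt: str, thinking: bool) -> str:
    """Append /think or /no_think, replacing a switch already at the end."""
    text = prompt.rstrip()
    for switch in ("/no_think", "/think"):
        if text.endswith(switch):
            text = text[: -len(switch)].rstrip()
    return f"{text} {'/think' if thinking else '/no_think'}"


def build_chat_request(prompt: str, thinking: bool, model: str = "qwen3:8b") -> dict:
    """Build a non-streaming /api/chat payload with the switch in the content."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": with_switch(prompt, thinking)}],
        "stream": False,
    }


def send(payload: dict) -> str:
    """POST the payload to the local server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["message"]["content"]


payload = build_chat_request("What is the capital of France?", thinking=False)
print(json.dumps(payload, indent=2))
# With a server running, send(payload) would return the model's reply.
```

Building the payload separately from sending it means you can inspect exactly what goes over the wire before involving the model.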

Thinking tokens eat your context window

This is the part most people don't think about. The <think> block is generated tokens, and they count against your num_ctx setting just like everything else in the conversation.

The Ollama Advanced guide recommends setting num_ctx to 4096 for speed on most setups. On a complex question, qwen3's reasoning chain can run 500-1500 tokens on its own. That's a meaningful chunk of a 4096 context budget, leaving less room for conversation history and the actual answer.
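To put numbers on that, here's a quick back-of-the-envelope calculation. It's only a sketch: the 500 and 1500 token figures are the rough range quoted above, not measurements, and remaining_context is a throwaway helper name.

```python
def remaining_context(num_ctx: int, thinking_tokens: int) -> int:
    """Tokens left for history, prompt, and answer after the reasoning chain."""
    return max(num_ctx - thinking_tokens, 0)


for num_ctx in (4096, 8192):
    for thinking_tokens in (500, 1500):
        left = remaining_context(num_ctx, thinking_tokens)
        share = thinking_tokens / num_ctx
        print(
            f"num_ctx={num_ctx}: {thinking_tokens} thinking tokens "
            f"use {share:.0%} of the window, leaving {left}"
        )
```

At 4096, a 1500-token chain takes roughly 37% of the window and leaves 2596 tokens for everything else. At 8192 the same chain takes about 18%.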

If you're using thinking mode regularly for hard tasks, bump num_ctx to 8192 or higher so the reasoning chain doesn't crowd out the rest of the context. Note that an 8192 context is noticeably more RAM-hungry than the default. If your machine feels sluggish at this setting, the mini PC guide covers hardware that handles qwen3:8b at full context without thermal throttling. You can raise num_ctx from the terminal or per request through the API:

Terminal

ollama run qwen3:8b
>>> /set parameter num_ctx 8192

API

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "Review this algorithm for efficiency... /think"}],
  "options": {"num_ctx": 8192},
  "stream": false
}'

When to use each mode

Use thinking mode for:

Math problems and proofs
Multi-step logic and reasoning
Complex code review and debugging
Planning tasks with multiple constraints
Questions where being wrong has a cost

Skip thinking mode for:

Simple factual questions
Casual chat and brainstorming
Creative writing and summarization
Translation
Anything where a 5-second delay is annoying and accuracy isn't critical
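If your prompts come from a script rather than a chat box, the two lists above can be turned into a simple lookup. This is a rough sketch: the category labels are hypothetical names chosen for this example, and anything unlisted is left untagged so the model's configured default applies.

```python
# Hypothetical category labels based on the two lists above.
THINKING_TASKS = {"math", "logic", "code_review", "debugging", "planning"}
DIRECT_TASKS = {
    "factual",
    "chat",
    "brainstorming",
    "creative_writing",
    "summarization",
    "translation",
}


def switch_for(task: str) -> str:
    """Return the switch for a task category, or "" to keep the default."""
    if task in THINKING_TASKS:
        return "/think"
    if task in DIRECT_TASKS:
        return "/no_think"
    return ""


def tag_prompt(prompt: str, task: str) -> str:
    """Append the switch for this task category to the prompt."""
    return f"{prompt} {switch_for(task)}".rstrip()


print(tag_prompt("Translate 'hello' to French.", "translation"))
print(tag_prompt("Find the bug in this loop.", "debugging"))
```

A fixed lookup like this is crude, but it keeps the decision in one place instead of scattering switches through your prompts.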

Setting a default in Open WebUI

If you want thinking off by default for everyday use, set a system prompt in Open WebUI that includes the preference. Go to Settings > System Prompt and add something like:

System Prompt

You are a helpful assistant. Answer directly and concisely. /no_think

This applies /no_think to every conversation by default. You can still override it on specific messages by appending /think when you need deeper reasoning.
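Outside Open WebUI, the same default-plus-override pattern can be reproduced in the messages list you send to the API. A minimal sketch, assuming the system prompt shown above. build_messages and its deep flag are illustrative names, not part of Open WebUI or Ollama.

```python
from typing import Dict, List

# Same system prompt as above: thinking is off unless a message overrides it.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer directly and concisely. /no_think"
)


def build_messages(user_text: str, deep: bool = False) -> List[Dict[str, str]]:
    """Build a messages list with thinking off by default.

    Pass deep=True to append /think to this one user message.
    """
    content = f"{user_text} /think" if deep else user_text
    return [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        {"role": "user", "content": content},
    ]


quick = build_messages("What is the capital of France?")
careful = build_messages("Prove that sqrt(2) is irrational.", deep=True)
print(quick[1]["content"])
print(careful[1]["content"])
```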

The same pattern works for OpenClaw skills - add /no_think to the user message string before passing it to Ollama when the task doesn't need reasoning depth. Reserve /think for skills doing complex analysis or multi-step planning.