

A concise introduction to reasoning models as the next frontier in AI, highlighting their structured thinking capabilities beyond text prediction.
A new paradigm is now mainstream: reasoning models. These advanced language models represent a significant leap forward in AI capabilities, offering more than just text generation: they provide structured, logical thinking that mimics human reasoning processes. Let's explore what makes these models special and how they're changing the AI paradigm today.
Reasoning is the cognitive process through which we draw conclusions from available information, encompassing logical thinking and problem-solving. In AI terms, reasoning means working from the available information through explicit intermediate steps to a justified conclusion, rather than jumping straight to an answer.
The significance of reasoning in AI is profound: it enables machines to simulate human decision-making and problem-solving abilities.

Large language models have already achieved impressive capabilities in generating and transforming text.
However, they still struggle with complex reasoning challenges. When faced with multi-step problems requiring deep logical analysis, traditional LLMs often falter, producing plausible-sounding but incorrect results.

One of the earliest approaches to improve reasoning was Chain-of-Thought (CoT) prompting: a technique that encourages models to generate intermediate reasoning steps before providing a final answer.
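To make the idea concrete, here is a minimal, illustrative Python sketch (with a made-up example question) of the difference between a direct prompt and a Chain-of-Thought prompt; the exact wording of the CoT instruction varies, but the principle is simply to ask for intermediate steps:

```python
# Minimal illustration of Chain-of-Thought (CoT) prompting: the only change
# versus a direct prompt is an explicit request for intermediate steps.

question = "A store sells pens at 3 for $2. How much do 12 pens cost?"

# Direct prompt: the model is asked for the answer straight away.
direct_prompt = f"Question: {question}\nAnswer:"

# CoT prompt: the model is nudged to lay out its reasoning first
# (12 pens = 4 groups of 3, 4 * $2 = $8) before the final answer.
cot_prompt = (
    f"Question: {question}\n"
    "Work through the problem step by step, then give the final answer "
    "on its own line prefixed with 'Answer:'."
)

# Either string can be sent to any standard chat or completion endpoint.
print(cot_prompt)
```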

Reasoning models take this concept to the next level. Unlike standard LLMs with Chain-of-Thought prompting bolted on, these are models specifically trained to work through an extended, explicit reasoning phase before committing to a final answer.

As DeepSeek researchers observed in their groundbreaking paper on the R1 model:
"These models first think about the reasoning process in their mind and then provide the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively."

The key insight is that increasing inference compute, specifically by extending the thinking phase, improves accuracy dramatically.
This is a surprising result. According to the traditional understanding of autoregressive models (as highlighted by researchers like Yann LeCun), the error probability in large language models should increase with output length, a phenomenon known as "compounding error": every additional token is another opportunity to go wrong, and those errors accumulate over a long sequence.

Yet reasoning models demonstrate the opposite effect. When given more "thinking time" (more tokens dedicated to reasoning), these models become significantly more accurate. This challenges fundamental assumptions about how large language models function and learn.
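Hosted reasoning models typically expose thinking time as a request parameter. The sketch below assumes the OpenAI Python SDK and its reasoning_effort parameter for o-series models such as o3-mini; if your SDK version or provider differs, substitute the equivalent knob:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "How many positive integers below 1000 are divisible by 7 but not by 11?"

# Raising the reasoning effort lets the model spend more tokens "thinking"
# before it answers, which is exactly the accuracy-for-compute trade-off
# discussed above.
for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=effort,
        messages=[{"role": "user", "content": question}],
    )
    print(effort, "->", response.choices[0].message.content)
```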
Creating a reasoning model involves several key stages that differ from traditional LLM development.
Unlike traditional supervised fine-tuning (which teaches models to imitate specific reasoning patterns), reasoning models employ reinforcement learning in easily verifiable domains such as mathematics and competitive programming, where the correctness of an answer can be checked automatically.

This approach is particularly effective because the reward signal is objective: a simple rule can verify the final answer, with no need for human-labeled reasoning traces or a learned reward model, as sketched below.
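The following is an illustrative Python sketch (not the actual DeepSeek-R1 reward implementation) of a rule-based reward for a verifiable math domain, assuming the model wraps its final result in <answer> tags as described above:

```python
import re

def math_reward(model_output: str, ground_truth: str) -> float:
    """Rule-based reward for RL in a verifiable domain: 1.0 when the final
    answer inside <answer> tags matches the known solution, else 0.0.
    No human preference labels or learned reward model are required."""
    match = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    if not match:
        return 0.0  # malformed output earns no reward
    prediction = match.group(1).strip().replace(",", "")
    return 1.0 if prediction == ground_truth.strip() else 0.0

# Example: a correct completion earns reward 1.0, anything else earns 0.0.
print(math_reward("<think>10! = 3628800</think><answer>3628800</answer>", "3628800"))
```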
Perhaps the most fascinating aspect of reasoning models is what DeepSeek researchers call the "aha moment": the model, trained with reinforcement learning, spontaneously develops self-correction mechanisms, pausing mid-solution to re-evaluate its earlier steps and revise its approach.

These capabilities emerged without explicit training, demonstrating how reinforcement learning can yield behaviors beyond what was directly programmed.
The million-dollar question for reasoning models is generalization: can reasoning skills learned in closed, reward-rich domains (like mathematics) transfer to open-ended problems?
Research shows a stark contrast between supervised fine-tuning and reinforcement learning approaches: models fine-tuned on demonstrations tend to memorize the patterns they were shown, while models trained with reinforcement learning generalize better to problems outside the training distribution.

This transfer learning capability is critical for the real-world utility of reasoning models. As Andrej Karpathy noted, the success of reasoning models depends heavily on whether "knowledge acquired in closed reward-rich domains can transfer to open-ended problems."
Several leading reasoning models are now available, each with different capabilities:
| Model Provider | Model | Public Access | Math Performance (AIME) | Science (GPQA) | General Reasoning (ARC) |
| --- | --- | --- | --- | --- | --- |
| OpenAI | o1-mini | Chat only | 63.6% | 60% | 9.5% |
| OpenAI | o1 | Chat only | 83.3% | 78.8% | 25-32% |
| OpenAI | o3-mini | Chat & API | 60-87.3% | 70.6-79.7% | 11-35% |
| OpenAI | o3 | Through products | 96.7% | 87.7% | 76-87% |
| DeepSeek | R1 | Chat & API/Weights | 79.8% | 71.5% | 15.8% |
| Google | Gemini 2.0 Flash Thinking | Chat & API | 73.3% | 74.2% | - |
Notably, these models also show impressive performance on coding tasks, with most achieving high scores on Codeforces and on the SWE-bench Verified benchmark.
Prompting reasoning models differs significantly from working with traditional LLMs:
| Technique | Traditional LLMs | Reasoning Models |
| --- | --- | --- |
| Zero-shot prompting | Weak performance | Good performance |
| Few-shot prompting | Significantly improves accuracy | Can actually harm reasoning performance |
| RAG (Retrieval-Augmented Generation) | Recommended | Minimally helpful |
| Structured prompts (XML tags) | Helpful | Essential |
| Chain-of-Thought prompting | Very beneficial | Not recommended, may interfere with built-in reasoning |
| Ensemble methods | Improves performance | Minimal improvement |
As the team at PromptHub notes: "Reasoning models work best when given clean, direct instructions without examples that might constrain their thinking process."
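To illustrate the difference, here is a hypothetical side-by-side of the two prompting styles in Python: the traditional-LLM prompt leans on few-shot examples and an explicit step-by-step nudge, while the reasoning-model prompt is a single direct instruction structured with XML tags and nothing that could constrain the model's own thinking:

```python
task = "Summarize the key risk factors in the quarterly report below."
report = "(report text goes here)"

# Traditional-LLM style: few-shot examples plus an explicit CoT nudge.
traditional_prompt = (
    "Example 1: <input>...</input> -> <summary>...</summary>\n"
    "Example 2: <input>...</input> -> <summary>...</summary>\n"
    f"{task} Let's think step by step.\n{report}"
)

# Reasoning-model style: one clean, direct instruction; structure comes from
# XML tags, with no examples and no step-by-step instruction, since the model
# already reasons internally before answering.
reasoning_prompt = (
    f"<task>{task}</task>\n"
    f"<report>{report}</report>\n"
    "<output_format>Bullet list of risk factors, each with a one-line justification.</output_format>"
)

print(reasoning_prompt)
```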
Looking ahead, several key developments seem imminent.

Despite their impressive capabilities, reasoning models still have significant limitations.
You can read more about solving these shortcomings in our previous blog: Path to AGI.
The practical applications of reasoning models are vast.
The rise of reasoning models signals several important shifts in AI development.

Reasoning models represent a significant evolutionary step for artificial intelligence. By incorporating explicit reasoning processes and leveraging the power of reinforcement learning, these models are pushing the boundaries of what AI can accomplish.
While we're still in the early days of this technology, the trajectory is clear: AI systems that can think through problems step by step, evaluate their own reasoning, and arrive at solutions through logical deduction are becoming reality. The implications for industries, knowledge work, and software development are profound and far-reaching.
As these technologies continue to develop, we're likely to see an increasing shift toward agentic systems that can operate with greater autonomy and tackle increasingly complex reasoning challenges.

Sources
Language Models are Few-Shot Learners
Instruction Tuning for Large Language Models: A Survey
Training language models to follow instructions with human feedback
Deep reinforcement learning from human preferences
Constitutional AI: Harmlessness from AI Feedback
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Model distillation – Improve smaller models with distillation techniques
DeepSeek R1 Distill Now Available in Private LLM for iOS and macOS
Reward Hacking in Reinforcement Learning
OpenAI o3 and o3-mini—12 Days of OpenAI: Day 12
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
Introducing SWE-bench Verified
OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI
Prompt Engineering with Reasoning Models
Reasoning models – Explore advanced reasoning and problem-solving models
OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving