Published: January 22, 2025
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning
By DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruibin Yuan, W. L. Zhao, Y. Wu
Research TL;DR
"Examines specialized reinforcement learning to incentivize reasoning processes in LLMs. Delivers top-tier results on coding and math benchmarks with open weights."
Abstract
We introduce DeepSeek-R1-Zero and DeepSeek-R1, reasoning models trained through large-scale reinforcement learning (RL). R1-Zero displays emergent behaviors such as self-correction and structured reasoning, while R1 incorporates cold-start data to align its output behavior and excels at math and code.