Published: January 22, 2025

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning

By DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruibin Yuan, W. L. Zhao, Y. Wu

Research TL;DR

"Examines how large-scale reinforcement learning can incentivize reasoning in LLMs. The open-weight models achieve top-tier results on coding and math benchmarks."

Abstract

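We introduce DeepSeek-R1-Zero and DeepSeek-R1, reasoning models trained through large-scale reinforcement learning (RL). R1-Zero is trained with RL alone, without a preliminary supervised fine-tuning stage. It displays emergent behaviors such as self-verification and reflection, but suffers from poor readability and language mixing. R1 adds cold-start data and multi-stage training before RL to address these issues, and it excels at math and coding tasks.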
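A small sketch can make the RL step more concrete. The paper trains R1-Zero with Group Relative Policy Optimization (GRPO) and rule-based rewards (an accuracy reward and a format reward) instead of a learned neural reward model. The code below covers only two pieces: scoring sampled completions and computing group-relative advantages. It does not implement the policy update. Several details are illustrative assumptions rather than the paper's exact setup: the regex for the `<think>`/`<answer>` template, the exact-string answer check, the equal weighting of the two rewards, and the use of population standard deviation.

```python
import re
import statistics

# Assumed template: <think> reasoning </think> followed by <answer> answer </answer>.
TEMPLATE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def rule_based_reward(completion: str, reference: str) -> float:
    """Illustrative reward: +1 for correct format, +1 more for an exact-match answer."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns neither reward
    format_reward = 1.0
    # Assumption: plain string comparison stands in for a real answer checker.
    accuracy_reward = 1.0 if match.group(1).strip() == reference.strip() else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    if std == 0.0:
        return [0.0 for _ in rewards]  # identical rewards give no learning signal
    return [(r - mean) / std for r in rewards]


if __name__ == "__main__":
    # A group of sampled completions for one prompt whose reference answer is "4".
    group = [
        "<think>2+2 is 4</think> <answer>4</answer>",
        "<think>a guess</think> <answer>5</answer>",
        "4",
        "<think>add them</think><answer> 4 </answer>",
    ]
    rewards = [rule_based_reward(c, "4") for c in group]
    print(rewards)  # [2.0, 1.0, 0.0, 2.0]
    print(group_relative_advantages(rewards))
```

Because each advantage is measured relative to other samples for the same prompt, this scheme needs no separate value network. Completions that beat their group average are reinforced, and those below it are discouraged.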

The full paper is available on arXiv.
