AI Research Paper Feed

Curated index of breakthrough publications in artificial intelligence. Access synthesized TL;DRs, key technical abstracts, and direct citations.

alignment · Feb 4, 2026

Reconciling safety and utility in reinforcement learning alignment

By Sarah Meade, Alex Johnson, Liam Patel

Proposes an optimization framework to mitigate over-refusal in aligned LLMs. Balances safety bounds against instruction utility.

llm · Jan 22, 2025

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning

By DeepSeek-AI, Daya Guo, Dejian Yang et al.

Examines reinforcement learning as a way to incentivize reasoning processes in LLMs. Reports top-tier results on coding and math benchmarks and releases open weights.

multimodal · Jan 3, 2024

MAMMOTH: Massive multimodal helper for multi-discipline reasoning

By Robert Kim, Meera Nair, Sofia Rodriguez

Presents a multimodal assistant trained on complex scientific datasets. Shows significant gains in graphical reasoning and visual instruction following.

agents · Oct 27, 2023

AgentTuning: Enabling generalized agent abilities for LLMs

By Aohan Zeng, Mingdao Liu, Rui Lu et al.

Examines instruction-tuning methodologies optimized for agentic operations. Builds a customized instruction set to teach LLMs specialized planning and tool-use behaviors.

alignment · May 29, 2023

Direct preference optimization: Your language model is secretly a reward model

By Rafael Rafailov, Archit Sharma, Eric Mitchell et al.

Proposes Direct Preference Optimization (DPO) as an alternative to PPO-based RLHF. Simplifies alignment by optimizing the policy directly from human preference data.
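
As a rough illustration of what "optimizing the policy directly from human preference data" means, the per-pair DPO objective can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each full response under the trainable policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All names and the default beta are illustrative assumptions.
    """
    # How much more (in log space) the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between the chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

No separate reward model or PPO rollout appears in the sketch: the loss depends only on log-probabilities from the policy and the reference model, and it shrinks as the policy raises the chosen response's likelihood relative to the rejected one.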

multimodal · Apr 17, 2023

Visual instruction tuning

By Haotian Liu, Chunyuan Li, Qingyang Wu et al.

Pioneers multimodal instruction tuning by connecting a CLIP vision encoder to a LLaMA-based language model. Introduces LLaVA, laying the groundwork for open-source visual assistants.
