Study: Large Reasoning Models Demonstrate Limitations With Complex Tasks
Apple Machine Learning | Contributed by: Drex DeFord
Summary
Recent generations of Large Reasoning Models (LRMs) can generate detailed reasoning traces before answering, yet their fundamental capabilities remain poorly understood, in part because current evaluations focus on final-answer accuracy and neglect the reasoning process itself. This study instead uses controllable puzzle environments, where task complexity can be varied precisely, to analyze both final answers and internal reasoning traces. The findings reveal a counterintuitive scaling pattern: while LRM performance initially holds up as complexity increases, accuracy ultimately drops off sharply. On low-complexity tasks, standard models outperform LRMs; on medium-complexity tasks, LRMs excel; and at high complexity, both model types struggle, culminating in a significant performance collapse.
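The appeal of "controllable puzzle environments" is that difficulty can be dialed with a single knob while the solution remains exactly checkable. A minimal sketch of the idea, using Tower of Hanoi as an illustrative puzzle (the function names and scoring here are assumptions for illustration, not the study's actual harness):

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move sequence for n disks (length 2^n - 1)."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest, then stack the rest.
    return (hanoi_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, src, dst))

def score_response(n, predicted_moves):
    """Exact-match check of a model's proposed move list (hypothetical scorer)."""
    return predicted_moves == hanoi_moves(n)

# Complexity is controlled by one parameter: the number of disks.
for n in range(1, 6):
    print(n, len(hanoi_moves(n)))  # optimal solution length grows as 2^n - 1
```

Because the optimal solution is known in closed form, evaluators can grade both the final answer and each intermediate step of a reasoning trace at any chosen complexity level.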