When do LRMs outpace standard LLMs?

Figure 3: Illustration of the four puzzle environments. Columns show the progression from initial state (top) through intermediate state (middle) to target state (bottom) for each puzzle: Tower of Hanoi (disk transfer across pegs), Checkers Jumping (position swapping of colored tokens), River Crossing (transporting entities across a river), and Blocks World (stack reconfiguration).

Large Reasoning Models (LRMs) outpace standard Large Language Models (LLMs) on medium-complexity tasks. In this regime, their additional reasoning lets them outperform their non-thinking counterparts: LRMs begin to show their strength once problem complexity rises beyond the low-complexity range, where standard LLMs often outperform them.
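To make the idea of a complexity sweep concrete, here is a minimal sketch of how one might score model outputs on the Tower of Hanoi environment, where complexity is controlled by the number of disks. The instance format, the `validate_hanoi` checker, and the `model_moves` input are hypothetical stand-ins for illustration, not the paper's actual evaluation harness:

```python
# Hypothetical sketch: score a model's Tower of Hanoi solutions at each
# complexity level (number of disks). Not the original paper's code.

def validate_hanoi(num_disks: int, moves: list[tuple[int, int]]) -> bool:
    """Return True if `moves` legally transfers all disks from peg 0 to peg 2.

    Each move is (source_peg, target_peg); pegs are indexed 0..2 and
    disks are numbered 1 (smallest) to num_disks (largest).
    """
    pegs = [list(range(num_disks, 0, -1)), [], []]  # peg 0 holds all disks
    for src, dst in moves:
        if not pegs[src]:
            return False  # illegal: moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # illegal: larger disk placed on smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(num_disks, 0, -1))  # all disks on peg 2


def accuracy_by_complexity(
    model_moves: dict[int, list[list[tuple[int, int]]]],
) -> dict[int, float]:
    """Map each disk count to the fraction of sampled move sequences
    that solve the puzzle; `model_moves` maps num_disks -> samples."""
    return {
        n: sum(validate_hanoi(n, seq) for seq in seqs) / len(seqs)
        for n, seqs in model_moves.items()
    }
```

Plotting the output of a function like `accuracy_by_complexity` against disk count is what makes the low/medium/high regimes visible as accuracy curves.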

However, both model types eventually suffer a collapse in accuracy on high-complexity tasks, exposing a fundamental limitation of LRMs despite their advanced reasoning mechanisms. Taken together, this pattern reveals three distinct reasoning regimes based on problem complexity[1].
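The three regimes can be summarized by comparing the two accuracy curves at each complexity level. The sketch below is purely illustrative: the `collapse_threshold` value is an arbitrary assumption for demonstration, not a figure from the paper.

```python
# Illustrative classifier for the three reasoning regimes, given the
# accuracy of a standard LLM and an LRM at one complexity level.
# The 0.1 collapse threshold is an assumed value, not from the paper.

def regime(llm_acc: float, lrm_acc: float, collapse_threshold: float = 0.1) -> str:
    if llm_acc < collapse_threshold and lrm_acc < collapse_threshold:
        return "high complexity: both models collapse"
    if lrm_acc > llm_acc:
        return "medium complexity: LRM advantage"
    return "low complexity: standard LLM matches or outperforms"


print(regime(0.95, 0.90))  # low complexity: standard LLM matches or outperforms
print(regime(0.40, 0.75))  # medium complexity: LRM advantage
print(regime(0.02, 0.05))  # high complexity: both models collapse
```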