
Name the four puzzle environments.

Figure 1: Top: Our setup enables verification of both final answers and intermediate reasoning traces, allowing detailed analysis of model thinking behavior. Bottom left & middle: At low complexity, non-thinking models are more accurate and token-efficient. As complexity increases, reasoning models outperform but require more tokens, until both collapse beyond a critical threshold, with shorter traces. Bottom right: For correctly solved cases, Claude 3.7 Thinking tends to find answers early at low complexity and later at higher complexity. In failed cases, it often fixates on an early wrong answer, wasting the remaining token budget. Both cases reveal inefficiencies in the reasoning process.

The four puzzle environments mentioned in the document are:

  1. Tower of Hanoi - A classic puzzle involving the transfer of disks between pegs while following specific movement rules.

  2. Checker Jumping - A one-dimensional puzzle that requires the positions of red and blue checkers to be swapped with specific movement constraints.

  3. River Crossing - A planning puzzle where actors and their corresponding agents must cross a river using a boat under safety constraints.

  4. Blocks World - A puzzle involving the rearrangement of stacks of blocks from an initial configuration to a target configuration.

These environments support the analysis of how Large Reasoning Models (LRMs) reason, because puzzle complexity can be varied systematically while the underlying logical structure stays the same[1].
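
To make the idea of verifiable, complexity-controlled puzzles concrete, here is a minimal Python sketch for Tower of Hanoi, where the number of disks `n` acts as the complexity knob. It is not the simulator used in the cited work; the function names `solve_hanoi` and `is_valid_solution`, the 0-based peg numbering, and the `(from_peg, to_peg)` move format are all assumptions made for illustration. The checker applies the standard rules: one disk moves at a time, only the top disk of a peg may move, and a larger disk may never sit on a smaller one.

```python
def solve_hanoi(n, source=0, target=2, spare=1):
    """Return the optimal move list for n disks as (from_peg, to_peg) pairs.

    Hypothetical helper: peg numbering and move format are assumptions.
    """
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, source, spare, target)   # clear the n-1 smaller disks
        + [(source, target)]                        # move the largest disk
        + solve_hanoi(n - 1, spare, target, source) # restack the smaller disks
    )


def is_valid_solution(n, moves):
    """Simulate every move and check legality plus the final goal state."""
    # Disk n (largest) starts at the bottom of peg 0; the list end is the top.
    pegs = [list(range(n, 0, -1)), [], []]
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if all disks ended up on peg 2 in the original order.
    return pegs[2] == list(range(n, 0, -1))
```

Because the checker replays each move, it can flag the first illegal step in a proposed solution as well as a wrong final state, and raising `n` increases difficulty without changing the rules.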

