Trust but Verify! A Survey on Verification Design for Test-time Scaling

Abstract

Test-time scaling (TTS) has emerged as a new frontier for scaling the performance of Large Language Models (LLMs). By spending more computational resources during inference, LLMs can improve their reasoning process and task performance. Several approaches to TTS have emerged, such as distilling reasoning traces from another model or exploring the vast decoding search space with a verifier. External verifiers or self-verification are crucial for test-time scaling, as they help guide the search over a large reasoning space. Verification for test-time scaling entails mechanisms or scoring functions that evaluate the quality or plausibility of different reasoning paths or solutions produced by the language model during inference, enabling efficient search or selection among them without access to ground-truth labels. This paradigm has emerged as a superior approach owing to parameter-free scaling at inference time and high performance gains. Verifiers can be prompt-based or fine-tuned as discriminative or generative models, and can verify process paths, outcomes, or both. Despite their widespread adoption, there is no detailed collection, clear categorization, or discussion of the diverse verification approaches and their training mechanisms. In this survey, we cover the diverse approaches in the literature and present a unified view of verifier training, verifier types, and their utility in test-time scaling.

🧬 Test-Time Scaling and Verifier Design

1. Scaling Paradigms at Inference

In parallel scaling, the model generates multiple independent outputs simultaneously, often by varying sampling temperature or prompt exemplars to induce diversity (Levy et al., 2023; Brown et al., 2024). These outputs form a candidate set $S = \{s_1, \ldots, s_k\}$, from which a selection mechanism $V$ identifies the final answer $s^* = V(S)$.
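
As a rough illustration of the selection step $s^* = V(S)$, the sketch below picks the highest-scoring candidate from a set (best-of-N selection). The names `best_of_n` and `toy_score` are hypothetical, and the scoring function is a toy stand-in for a real verifier, not a method taken from the survey.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return s* = V(S): the candidate with the highest verifier score."""
    if not candidates:
        raise ValueError("candidate set S must not be empty")
    # max() keeps the first candidate when several share the top score.
    return max(candidates, key=score)


def toy_score(solution: str) -> float:
    """Hypothetical verifier: rewards solutions that end with the right sum."""
    return 1.0 if solution.strip().endswith("= 4") else 0.0


# Candidate set S, as it might come from k independent samples.
S = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
s_star = best_of_n(S, toy_score)
print(s_star)
```

In practice the scoring function would be a prompt-based or fine-tuned verifier; only the `score` argument would need to change.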

Sequential scaling, in contrast, decomposes a problem into intermediate steps or sub-questions. Each step builds on the previous one, producing a sequence $\{sq_1, \ldots, sq_T\}$ where each $sq_t = \mathrm{LLM}(sq_{t-1}, c_t)$ depends on the prior reasoning step and contextual information $c_t$.
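
The recurrence for sequential scaling can be sketched as a simple loop in which each step is computed from the previous step and the current context. This is a minimal sketch under stated assumptions: `sequential_scale` and `toy_llm` are hypothetical names, and `toy_llm` merely concatenates strings in place of a real model call.

```python
from typing import Callable, List


def sequential_scale(
    step_fn: Callable[[str, str], str],
    first_step: str,
    contexts: List[str],
) -> List[str]:
    """Build the chain sq_1..sq_T, where sq_t = step_fn(sq_{t-1}, c_t)."""
    steps = [first_step]
    for c_t in contexts:
        # Each new step depends only on the previous step and its context.
        steps.append(step_fn(steps[-1], c_t))
    return steps


def toy_llm(prev_step: str, context: str) -> str:
    """Hypothetical stand-in for the LLM call; it only joins two strings."""
    return f"{prev_step} -> {context}"


chain = sequential_scale(toy_llm, "parse question", ["set up equation", "solve"])
print(chain[-1])
```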

2. Verification for Guiding Test-time Scaling

Tuning
- Supervised Fine-Tuning (SFT): by training on synthetic or distilled long CoT examples, SFT allows a model to imitate extended reasoning patterns.
- Reinforcement Learning (RL): RL can guide a model’s policy to generate longer or more accurate solutions.
Inference
- Verification: The verification process plays an important role in TTS, and can be adapted to:
  - Directly select the best output solution among the multiple hypotheses generated in parallel (Parallel Scaling).
  - Guide the step-by-step reasoning process and determine when to stop to give the final solution (Sequential Scaling).
  - Serve as the criteria or reward signal in the search process.
  - Determine which solutions to aggregate, or which single solution to select, and how to aggregate them (e.g., as weights for the reasoning paths).
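
One way to read the last point is verifier-weighted voting: each reasoning path votes for its final answer with a weight equal to its verifier score. The sketch below assumes this interpretation; `weighted_vote` and the example scores are hypothetical and serve only to show how weighting can change the outcome relative to a plain majority vote.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def weighted_vote(paths: Iterable[Tuple[str, float]]) -> str:
    """Aggregate (answer, verifier_score) pairs and return the top answer."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, weight in paths:
        totals[answer] += weight  # sum the scores of paths that agree
    if not totals:
        raise ValueError("at least one reasoning path is required")
    return max(totals, key=lambda answer: totals[answer])


# Two low-confidence paths agree on "41"; one high-confidence path says "42".
paths = [("41", 0.3), ("41", 0.3), ("42", 0.9)]
final_answer = weighted_vote(paths)
print(final_answer)
```

Here an unweighted majority vote would return "41", while the weighted vote favors the single path that the verifier scores highly.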

3. Where to Scale

Reasoning: Math, Code, Science, Game & Strategy, Medical, etc.
General-Purpose: Basics, Agents, Knowledge, Open-Ended, Multi-Modal, etc.

4. How Well to Scale

Performance: measures correctness and robustness of outputs.
Efficiency: captures the cost-benefit tradeoffs of TTS methods.
Controllability: assesses whether TTS methods adhere to resource or output constraints, such as compute budgets or output lengths.
Scalability: quantifies how well models improve with more test-time compute (e.g., tokens or steps).
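
To make two of these dimensions concrete, the sketch below computes a simple controllability measure (the share of outputs that stay within a token budget) and a simple scalability measure (the accuracy gain between consecutive compute budgets). Both function names, both definitions, and all numbers are illustrative assumptions rather than metrics defined by the survey.

```python
from typing import Dict, List, Sequence, Tuple


def budget_adherence(token_counts: Sequence[int], budget: int) -> float:
    """Fraction of outputs whose length stays within the token budget."""
    if not token_counts:
        raise ValueError("token_counts must not be empty")
    return sum(n <= budget for n in token_counts) / len(token_counts)


def accuracy_gains(accuracy_by_budget: Dict[int, float]) -> List[Tuple[int, float]]:
    """Accuracy gain at each budget relative to the next smaller budget."""
    budgets = sorted(accuracy_by_budget)
    return [
        (high, accuracy_by_budget[high] - accuracy_by_budget[low])
        for low, high in zip(budgets, budgets[1:])
    ]


# Made-up measurements, used only to exercise the two functions.
adherence = budget_adherence([120, 480, 512, 900], budget=512)
gains = accuracy_gains({256: 0.5, 512: 0.625, 1024: 0.75})
print(adherence, gains)
```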

🔍 Paper Tables

Test-time Scaling Paper Summary

Open Issues

Please raise Issues in the repo to add new papers or insights.

BibTeX

@misc{venkteshvverifiers,
      title={Trust but Verify! A Survey on Verification Design for Test-time Scaling}, 
      author={Venktesh V and Mandeep Rathee and Avishek Anand},
      year={2025},
      eprint={2503.24235},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={}, 
}