Deep Think with Confidence

Meta AI, UCSD

Main Results

Deep Think with Confidence (DeepConf) is a parallel thinking method that improves both LLM reasoning performance and efficiency at test time. It leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation, requires no additional model training or hyperparameter tuning, and can be seamlessly integrated into existing serving frameworks (see the vLLM example provided; the full source code will be released soon). DeepConf achieves up to 99.9% accuracy on AIME 2025 while reducing the number of generated tokens by up to 84.7% compared to standard parallel thinking.
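To make the confidence signal concrete, here is a minimal sketch (not the authors' released implementation) of one way to read per-token confidence out of vLLM: each generated token is scored by the negative mean log-probability of the top-k candidates at that position, so flatter, less certain next-token distributions yield lower scores. The model, the prompt, and k=20 are illustrative choices.

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")
params = SamplingParams(
    temperature=0.6,
    max_tokens=4096,
    logprobs=20,  # return the top-20 candidate logprobs for every generated token
)

def token_confidences(output):
    # Negative mean log-prob over the top-k candidates at each position;
    # higher values indicate a more peaked (more confident) distribution.
    confs = []
    for step in output.outputs[0].logprobs:  # one dict of candidates per token
        lps = [cand.logprob for cand in step.values()]
        confs.append(-sum(lps) / len(lps))
    return confs

outputs = llm.generate(["Solve: ..."], params)  # placeholder prompt
confs = token_confidences(outputs[0])
print(f"mean token confidence: {sum(confs) / len(confs):.2f}")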

Below is a real-time demo of DeepConf applied to the HMMT'25 dataset using the Qwen3-8B model with parallel thinking.

DeepConf with parallel thinking rejects low-confidence reasoning traces during generation, achieving higher reasoning performance while using significantly fewer generated tokens.
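A hedged sketch of that online rejection rule: group confidence is a sliding-window average over the most recent per-token confidences, and a trace is aborted as soon as it falls below a stopping threshold. The window size and threshold below are illustrative placeholders; in the paper the threshold is calibrated from a small set of warmup traces rather than fixed by hand.

from collections import deque
from typing import Iterable, Optional

def first_low_confidence_step(confs: Iterable[float],
                              window_size: int = 2048,
                              threshold: float = 17.0) -> Optional[int]:
    # Return the token index at which this trace should be aborted,
    # or None if its group confidence never drops below the threshold.
    window: deque = deque(maxlen=window_size)
    running = 0.0
    for i, c in enumerate(confs):
        if len(window) == window_size:
            running -= window[0]  # evict the oldest confidence from the running sum
        window.append(c)
        running += c
        # Judge only once the window is full, then check group confidence.
        if len(window) == window_size and running / window_size < threshold:
            return i
    return None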

Figure: Confidence measurements and offline thinking with confidence (DeepConf offline filtering diagram).
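And a minimal sketch of the offline variant, assuming each completed trace has already been reduced to an (answer, confidence) pair, where the confidence score could be, for example, the trace's lowest group confidence. Keeping only the most confident traces and weighting votes by confidence are both described in the paper; the 10% keep ratio here is one illustrative setting.

from collections import defaultdict

def deepconf_offline(traces, keep_ratio=0.1):
    # traces: list of (answer, confidence) pairs from parallel thinking.
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf  # confidence-weighted majority voting
    return max(votes, key=votes.get)

print(deepconf_offline([("42", 18.3), ("42", 17.9), ("41", 9.2)]))  # -> "42"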

BibTeX

@misc{fu2025deepthinkconfidence,
      title={Deep Think with Confidence}, 
      author={Yichao Fu and Xuewei Wang and Yuandong Tian and Jiawei Zhao},
      year={2025},
      eprint={2508.15260},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.15260}, 
}