Discover RL-PLUS, a hybrid-policy optimization method that overcomes capability boundary collapse in LLMs, boosting reasoning and problem-solving skills.
Boost large language models' reasoning with training-free test-time contrastive learning, improving performance without heavy computation...
Discover C-voting, a confidence-based test-time voting method that improves neural model accuracy on reasoning tasks without requiring an explicit energy function.