Large language models (LLMs) can improve their own performance without human supervision by fine-tuning on outputs they generate themselves. The authors' ablation studies show that fine-tuning on the model's self-generated reasoning, rather than on final answers alone, is critical for this self-improvement.
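To make the loop concrete, below is a minimal Python sketch of one plausible self-improvement cycle: sample several reasoning paths per question, keep the paths whose final answer agrees with the majority vote, and fine-tune on those reasoning-augmented examples. The functions `generate_with_reasoning`, `majority_answer`, and `fine_tune` are hypothetical placeholders standing in for a real model's sampling and training APIs; this is an illustration of the general idea, not the paper's exact implementation.

```python
from collections import Counter
from typing import List, Tuple


def generate_with_reasoning(question: str, n_samples: int = 8) -> List[Tuple[str, str]]:
    """Placeholder: sample (reasoning, answer) pairs from the model for one question."""
    return [("reasoning path ...", "40 km/h") for _ in range(n_samples)]


def majority_answer(samples: List[Tuple[str, str]]) -> str:
    """Pick the most frequent final answer across the sampled reasoning paths."""
    return Counter(answer for _, answer in samples).most_common(1)[0][0]


def fine_tune(examples: List[str]) -> None:
    """Placeholder: fine-tune the model on the selected reasoning-augmented examples."""
    print(f"fine-tuning on {len(examples)} self-generated examples")


def self_improve(questions: List[str]) -> None:
    training_examples = []
    for q in questions:
        samples = generate_with_reasoning(q)
        consensus = majority_answer(samples)
        # Keep only reasoning paths whose answer matches the consensus, so the
        # fine-tuning data contains the model's own (likely correct) reasoning.
        for reasoning, answer in samples:
            if answer == consensus:
                training_examples.append(f"Q: {q}\nReasoning: {reasoning}\nA: {answer}")
    fine_tune(training_examples)


if __name__ == "__main__":
    self_improve(["A train travels 60 km in 1.5 hours. What is its average speed?"])
```

The key design choice this sketch highlights is that the retained training examples include the reasoning text, not just the answers, which is what the ablation identifies as critical.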