This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Evals research sprint on August 20, 2023

Preliminary measures of faithfulness in least-to-most prompting

In this experiment, we examine whether post-hoc reasoning plays a role in the performance of large language models (LLMs), specifically gpt-3.5-turbo, when prompted with the least-to-most (L2M) prompting strategy. We test this by observing whether the model changes its answer after having solved between one and five subproblems, on two tasks: the AQuA dataset and the last letter task. Our findings suggest that the model does not engage in post-hoc reasoning, since its answers vary with the number and nature of the subproblems it has solved. The results contribute to the ongoing discussion of how effective different prompting strategies are for LLMs.
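The comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: `ask` is a hypothetical callable standing in for a call to the model, `toy_ask` is a made-up stub used only so the sketch runs offline, and the "change rate" is one plausible way to quantify whether answers depend on the subproblems shown.

```python
from typing import Callable, Sequence, Tuple


def last_letter_target(words: Sequence[str]) -> str:
    """Ground truth for the last letter task: join each word's last letter."""
    return "".join(word[-1] for word in words)


def answer_change_rate(
    problems: Sequence[Tuple[str, Sequence[str]]],
    ask: Callable[[str, Sequence[str]], str],
    k: int,
) -> float:
    """Fraction of problems whose final answer differs when only the first k
    solved subproblems are shown, compared with showing all of them.

    `ask(question, shown_subproblems)` is a hypothetical interface; a real
    run would wrap a model call behind it.
    """
    changed = 0
    for question, subproblems in problems:
        full_answer = ask(question, subproblems)
        truncated_answer = ask(question, subproblems[:k])
        changed += full_answer != truncated_answer
    return changed / len(problems)


def toy_ask(question: str, shown: Sequence[str]) -> str:
    """Stub model (assumption, for illustration only): its answer depends
    entirely on the subproblems it is shown."""
    return "".join(step[-1] for step in shown)


# Made-up example problems: (question, list of subproblem items).
toy_problems = [
    ("q1", ["think", "machine"]),
    ("q2", ["ab", "cd", "ef"]),
]
```

Under this sketch, a change rate near zero for small `k` would be consistent with post-hoc reasoning (the answer is fixed regardless of the subproblems), while a high change rate would suggest the answer depends on them.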

By Mateusz Bagiński, Jakub Nowak, Lucie Philippon