We present a comparative analysis of the DPO (Direct Preference Optimization) and PPO (Proximal Policy Optimization) algorithms, using interpretability techniques to understand how the two differ.
Rauno Arike, Luke Marks, Amir Abdullah, Luna Mendez
DPOvsPPO