This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability research sprint on July 17, 2023

Interpreting Planning in Transformers

We trained some simple models that figure out how to traverse a graph from a list of edges, which is kind of "planning" in some sense if you squint, and got some traction on interpreting one of them.

By Victor Levoso Fernandez, Abhay Sheshadri