This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Mechanistic Interpretability Hackathon research sprint on January 25, 2023.

Interactive Layerscope

We extend Neel Nanda's Interactive Neuroscope to display an entire layer at once. When using the Neuroscope, we were stymied by the question of which neuron to look at. It seemed useful to be able to quickly map the activations of every neuron in a layer, particularly for smaller models with manageable numbers of neurons. To that end, we built a new version of the Neuroscope that generates a graphical representation of the entire layer. In the figure below, we show layer 7, generated using the default text in Nanda's Neuroscope: "The following is a list of powers of 10: 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000". We analyze further prompts with this tool and identify some interesting patterns and possible avenues for further research.
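As a rough illustration of the layer-wide view described above, the following is a minimal sketch that assumes the activations have already been extracted into an array of shape (tokens, neurons). The function names `layer_to_grid` and `top_neurons`, the grid width, and the random stand-in data are hypothetical and are not taken from the Layerscope code itself.

```python
import numpy as np


def layer_to_grid(neuron_acts, n_cols):
    """Arrange one token's per-neuron activations into a 2D grid for plotting.

    Cells past the last neuron are filled with NaN so a heatmap can leave them blank.
    """
    neuron_acts = np.asarray(neuron_acts, dtype=float)
    n_neurons = neuron_acts.shape[0]
    n_rows = -(-n_neurons // n_cols)  # ceiling division
    grid = np.full(n_rows * n_cols, np.nan)
    grid[:n_neurons] = neuron_acts
    return grid.reshape(n_rows, n_cols)


def top_neurons(acts, k):
    """Return indices of the k neurons with the largest peak activation over all tokens."""
    peak = np.asarray(acts, dtype=float).max(axis=0)
    return np.argsort(peak)[::-1][:k]


# Stand-in data (assumption): 5 tokens, 10 neurons, with one neuron made to fire strongly.
rng = np.random.default_rng(0)
acts = rng.normal(size=(5, 10))
acts[3, 7] = 9.0

grid = layer_to_grid(acts[3], n_cols=4)  # layer view for the token at position 3
strongest = top_neurons(acts, k=1)       # candidate neuron to inspect more closely
```

The grid can be passed to any heatmap routine to view the whole layer for one token, and the ranking gives one possible answer to the question of which neuron to inspect first.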

By Víctor Levoso, Alejandro González, Chris Lonsberry