This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Mechanistic Interpretability Hackathon research sprint on January 25, 2023

We Discovered An Neuron

How does a transformer know when to use ‘an’ rather than ‘a’? Because the transformer is autoregressive, it can only predict one token at a time, yet the choice between the two words depends on the word that follows. We use an IOI-inspired prompt with activation patching to isolate a single MLP neuron in GPT-2 Large that is largely responsible for predicting the token ‘ an’. By patching the activation of each neuron in turn, we can identify which one is responsible for increasing the logit of this token. We present our method and further evidence in this paper.
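To make the neuron-level patching concrete, below is a minimal sketch in the style of TransformerLens, a library commonly used for this kind of analysis. The prompt pair, the layer scanned, and all names are illustrative assumptions rather than the authors' actual setup: activations from a corrupted run (which should continue with ‘ a’) are patched into the clean run (which should continue with ‘ an’), one MLP neuron at a time, and the resulting drop in the ‘ an’ vs ‘ a’ logit difference scores each neuron.

```python
# Minimal sketch of neuron-level activation patching, assuming TransformerLens.
# Prompts and the layer index are hypothetical, not the authors' exact setup.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-large")

# Hypothetical prompt pair: the clean prompt should continue with " an",
# the corrupted prompt with " a".
clean_prompt = "I climbed up the apple tree and picked a pear. I climbed up the apple tree and picked"
corrupt_prompt = "I climbed up the pear tree and picked an apple. I climbed up the pear tree and picked"

an_token = model.to_single_token(" an")
a_token = model.to_single_token(" a")

_, corrupt_cache = model.run_with_cache(corrupt_prompt)

def logit_diff(logits: torch.Tensor) -> float:
    # How strongly the model prefers " an" over " a" at the final position.
    return (logits[0, -1, an_token] - logits[0, -1, a_token]).item()

layer = 31  # one layer scanned as an example; the full search repeats this over every layer
hook_name = f"blocks.{layer}.mlp.hook_post"
effects = torch.zeros(model.cfg.d_mlp)

for neuron in range(model.cfg.d_mlp):
    def patch_neuron(post, hook, neuron=neuron):
        # Overwrite one neuron's activation at the final position
        # with its value from the corrupted run.
        post[0, -1, neuron] = corrupt_cache[hook_name][0, -1, neuron]
        return post

    patched_logits = model.run_with_hooks(
        clean_prompt, fwd_hooks=[(hook_name, patch_neuron)]
    )
    effects[neuron] = logit_diff(patched_logits)

# Neurons whose corruption most reduces the " an" preference
# are the strongest candidates for an 'an' neuron.
top_candidates = torch.argsort(effects)[:5]
print(top_candidates)
```

Running one forward pass per neuron is slow but keeps the sketch simple; patching only the final position isolates each neuron's direct contribution to the article prediction.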

By Joseph Miller, Clement Neo
🏆 1st place by peer review