This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability Hackathon research sprint on November 15, 2022

Top-Down Interpretability Through Eigenspectra

Random matrix theory (RMT) offers a host of tools for making sense of neural networks. In this paper, we look at the heavy-tailed random matrix theory developed by Martin and Mahoney (2021). From the spectrum of eigenvalues, it’s possible to derive generalization metrics that are independent of data and to decompose the training process into five distinct phases. Additionally, the theory predicts, and lets us test for, a key form of learning bias known as “self-regularization.” We extend these results from computer vision to language models, finding many similarities and a few potentially meaningful differences. This provides a glimpse of what more “top-down” interpretability approaches might accomplish: from a deeper understanding of the training process and path-dependence to inductive bias and generalization.
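As a concrete illustration of the data-independent metrics mentioned above, the minimal sketch below (not the authors' code) takes a single weight matrix, computes the eigenvalue spectrum of its correlation matrix, and estimates the power-law tail exponent that heavy-tailed self-regularization theory treats as a per-layer generalization metric. The Hill estimator here is a standard stand-in for the power-law fit used by Martin and Mahoney (2021), and all names and parameters are illustrative assumptions.

```python
import numpy as np

def tail_exponent(weight: np.ndarray, tail_frac: float = 0.1) -> float:
    """Estimate the power-law exponent alpha of the eigenvalue tail of W^T W.

    A sketch of the HT-SR layer metric: smaller alpha means a heavier
    tail and, per the theory, stronger implicit self-regularization.
    """
    n = max(weight.shape)
    # Eigenvalues of the normalized correlation matrix X = W^T W / n,
    # obtained from the singular values of W.
    eigs = np.linalg.svd(weight, compute_uv=False) ** 2 / n
    eigs = np.sort(eigs)[::-1]
    k = max(2, int(tail_frac * len(eigs)))  # number of eigenvalues in the tail
    tail = eigs[:k]
    x_min = tail[-1]
    # Hill (maximum-likelihood) estimator for a Pareto tail:
    # alpha = 1 + k / sum(log(x_i / x_min))
    return 1.0 + k / np.sum(np.log(tail / x_min))

# Usage on a stand-in matrix; in practice this would be a trained layer's weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 3072))
print(f"alpha ~ {tail_exponent(W):.2f}")
```

Because the metric depends only on the weights, it can be computed without any access to training or test data, which is what makes it usable as a top-down diagnostic across architectures.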

By Jan Wehner, Rauno Arike, Jesse Hoogland, Simon Marshall