This work was completed over one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability 2.0 research sprint on May 10, 2023

Algorithmic Explanation: A method for measuring interpretations of neural networks

How do you construct good explanations of what a neural network does? We provide a framework for evaluating explanations of neural network behaviour by testing the predictions a hypothesis makes about how the network would act on a given set of inputs. By attempting to model a neural network with known logic (or as much white-box logic as possible), this framework is a starting point for tackling neural network interpretability as networks grow more complex.
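As a rough illustration (not the authors' actual implementation), the core idea amounts to scoring a candidate explanation by how well its predictions match the network's outputs on a chosen input set. A minimal sketch in Python follows, where the function names, the agreement metric, and the toy model are all illustrative assumptions:

    import numpy as np

    def explanation_score(model_fn, hypothesis_fn, inputs):
        """Score how well a white-box hypothesis predicts the network's outputs.

        model_fn: the neural network, mapping an input to an output array.
        hypothesis_fn: the proposed algorithmic explanation of the network.
        inputs: an iterable of test inputs on which to compare the two.

        Returns the mean agreement rate (1.0 means the hypothesis matches
        the network on every tested input).
        """
        agreements = []
        for x in inputs:
            predicted = np.asarray(hypothesis_fn(x))
            actual = np.asarray(model_fn(x))
            agreements.append(float(np.allclose(predicted, actual, atol=1e-3)))
        return float(np.mean(agreements))

    # Toy usage: a "network" that doubles its input, and a correct hypothesis.
    model = lambda x: 2 * x
    hypothesis = lambda x: x + x
    print(explanation_score(model, hypothesis, [np.arange(5), np.ones(3)]))  # 1.0

In this framing, a better explanation is one whose predictions agree with the network on a wider range of inputs; the choice of input set and agreement metric are left open by the framework.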

By Joseph Miller and Clement Neo
🏆 Awarded a placement by peer review