This hackathon ran from July 14th to July 16th, 2023. You can now judge entries.
Machine learning is becoming an increasingly important part of our lives, and researchers are still working to understand how neural networks represent the world.
Mechanistic interpretability is a field focused on reverse-engineering neural networks: both how Transformers perform a very specific task and why models suddenly improve during training. Check out our speaker Neel Nanda's 200+ research ideas in mechanistic interpretability.
Sign up below to be notified before the kickoff!
Join us in this iteration of the Alignment Jam research hackathons to spend 48 hours with fellow researchers and engineers engaging with this exciting and fast-moving field of machine learning!
Join the Discord where all communication will happen. Check out research project ideas for inspiration and the in-depth starter resources under the "Resources" tab.
You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project consists of three parts: 1) a PDF report, 2) a video overview of at most 10 minutes, and 3) a title, summary, and description.
You are allowed to think about your project and engage with the starter resources before the hackathon starts, but your core research work should happen during the hackathon itself.
Beyond these guidelines, the hackathons are mainly a chance for you to engage meaningfully with real research work in state-of-the-art interpretability!
Neel Nanda's quickstart guide to creating research within the Jam's topic, mechanistic interpretability. Get an intro to the mech-int mindset, what a Transformer is, and which problems to work on.
This notebook enables you to write GPT-2 from scratch with the help of the in-depth tutorial by Neel Nanda below.
If you'd like a longer series of tutorials that builds up Transformers and language modeling from the basics, watch this playlist from Andrej Karpathy, former AI lead at Tesla.
In this video and Colab demo, Neel shows a live research process using the TransformerLens library. It is split into four chapters: 1) experiment design, 2) model training, 3) surface-level interpretability, and 4) reverse engineering.
This code notebook goes through the process of reverse engineering a very specific task. Here we get an overview of very useful techniques in mechanistic Transformer interpretability:
See an interview with the authors of the original paper and one of the authors' Twitter thread about the research.
This demo notebook goes into depth on how to use the TransformerLens library. It contains code explanations of the following core features of TransformerLens:
It is designed to be easy to work with and to help researchers enter a flow state. Read more on the GitHub page and see the Python package on PyPI.
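The core workflow the library supports, running a model while caching every intermediate activation under a readable name, can be sketched generically in plain NumPy. The class and key names below are illustrative only, not TransformerLens's actual API:

```python
import numpy as np

class TinyMLP:
    """A toy two-layer network that records its activations,
    loosely mimicking a run-with-cache pattern (illustrative only)."""
    def __init__(self, rng, d_in=4, d_hidden=8, d_out=3):
        self.W1 = rng.normal(size=(d_in, d_hidden))
        self.W2 = rng.normal(size=(d_hidden, d_out))

    def run_with_cache(self, x):
        cache = {}
        cache["pre"] = x @ self.W1                    # pre-activation
        cache["post"] = np.maximum(cache["pre"], 0)   # ReLU
        out = cache["post"] @ self.W2
        return out, cache

rng = np.random.default_rng(0)
model = TinyMLP(rng)
out, cache = model.run_with_cache(rng.normal(size=(2, 4)))
print(sorted(cache))  # ['post', 'pre']
```

Having every activation addressable by name is what makes it quick to inspect, ablate, or patch specific components during a research sprint.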
Also check out Stefan Heimersheim's "How to: Transformer Mechanistic Interpretability —with 40 lines of code or less!!", a more-code, fewer-words version of the demo notebook.
Open the visualizer and read the documentation to work with the Transformer Visualizer tool.
This paper introduced the causal tracing method in the context of editing a model's associations between tokens. It is a very useful method for understanding which areas of a neural network contribute the most to a specific output.
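The idea behind causal tracing can be illustrated on a toy network: run a clean input and a corrupted input, then restore one hidden state from the clean run into the corrupted run and measure how much of the clean output it recovers. Everything below is a hypothetical toy model, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 6))
W2 = rng.normal(size=(6, 1))

def forward(x, patch=None):
    """Two-layer toy model; optionally overwrite one hidden unit."""
    h = np.maximum(x @ W1, 0)
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value
    return (h @ W2).item(), h

x_clean = rng.normal(size=4)
x_corrupt = rng.normal(size=4)

y_clean, h_clean = forward(x_clean)
y_corrupt, _ = forward(x_corrupt)

# Restore each clean hidden unit into the corrupted run and
# measure how far the output moves back toward the clean run.
effects = []
for i in range(6):
    y_patched, _ = forward(x_corrupt, patch=(i, h_clean[i]))
    effects.append(abs(y_patched - y_corrupt))
print(int(np.argmax(effects)))  # index of the most causally important unit
```

In the paper this patching is done over layers and token positions of a language model, producing a heatmap of where a fact is stored.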
See the website for the work and the article detailing it, along with the Twitter thread by Neel Nanda. See also the updated (but less intelligible) notebook on progress measures for grokking (from the article's GitHub).
Join us when we investigate what happens within the brains of language models!
DeepMind researcher Neel Nanda joins us to explore the field of LLM neuroscience during this weekend. Get ready to create impactful research with people across the world!
Don't miss this opportunity to dive deeper into machine learning, network, and challenge yourself!
Register now: https://alignmentjam.com/jam/interpretability
Use this template for the report submission. As you create your project presentations, upload your slides here too. Make a recording of your slideshow or project with the recording capability of e.g. Keynote, PowerPoint, or Google Slides, and share it via Vimeo.
Big thanks to everyone who submitted their work. Your efforts have made this event a success and set a new bar for what we can expect in future editions of the hackathon!
We want to extend our appreciation to our judges Fazl Barez, Alex Foote, Esben Kran, and Bart Bussman, and to our keynote speaker Neel Nanda. Rewatch the lightning talks from the top 4 winning projects below.