Below is an FAQ-style summary of what you can expect (navigate it with the table of contents on the left).
What is it?
The Interpretability Hackathon is a weekend-long event where you participate in teams of 1-6 people to create interesting and fun research. You submit a PDF report that summarizes and discusses your findings in the context of AI safety. These reports will be judged by our panel, and you can win up to $1,000!
It runs from 11th Nov to 13th Nov (in two weeks), and you're welcome to join for just part of it (see further down). We'll also get an interesting talk from an expert in the field and hear more about the topic.
Everyone can participate, and we especially encourage you to join if you're considering AI safety from another career. We provide templates to help you start your projects, and you'll be surprised at what you can accomplish in just a weekend – especially with your new-found friends!
Read more about how to join, what you can expect, the schedule, and what previous participants have said about being part of the hackathon below.
Where can I join?
You can join the event either in person or online, but everyone needs to make an account and join the jam on the itch.io page.
The in-person locations include the LEAH offices in London right by UCL, Imperial, King's College, and the London School of Economics (link); Aarhus University in Aarhus, Denmark (link); and Tallinn, Estonia (link). The virtual event space is on GatherTown (link).
Everyone should join the Discord to ask questions, see updates and announcements, find team members, and more. Join here.
What are some examples of interpretability projects I could make?
You can check out a bunch of interesting, smaller interpretability project ideas on AI Safety Ideas such as reconstructing the input from neural activations, evaluating the alignment tax of interpretable models, or making models’ uncertainty interpretable.
Other examples of practical projects include finding new ways to visualize features in language models, as Anthropic has been working on; distilling mechanistic interpretability research; creating a demo for a much more interpretable language model; or mapping out how a possible interpretable AGI might look through our current lens.
You can also do projects in explainability: how well humans understand why the outputs of language models look the way they do, how humans read attention visualizations, or maybe even the interpretability of humans themselves, taking inspiration from the brain and neuroscience.
Also check out the results from the last hackathon to see what you might accomplish during just one weekend. The judges were really quite impressed with the full reports given the time constraint! You can also read the complete projects here.
- Redwood Research's interpretability tools: http://interp-tools.redwoodresearch.org/
- The activation atlas: https://distill.pub/2019/activation-atlas/
- The Tensorflow playground: https://playground.tensorflow.org/
- The Neural Network Playground (train simple neural networks in the browser): https://nnplayground.com/
- Visualize different neural network architectures: http://alexlenail.me/NN-SVG/index.html
- Distill publication on visualizing neural network weights
- Andrej Karpathy's "Understanding what convnets learn"
- Looking inside a neural net
You can also see more on the resources page.
Why should I join?
There are loads of reasons to join! Here are just a few:
- See how fun and interesting AI safety can be
- Free food the whole weekend!
- Get to know new people who are into ML safety and interpretability
- Win up to $1,000, helping you towards your next Armani handbag
- Get practical experience with ML safety research
- Show AI safety labs what you can do and increase your chances at some amazing jobs
- Get a cool certificate that you can show your friends and family
- Acquire yourself a neat sticker
- Figure out what all these San Franciscan EAs are up to
- Get some proof that you’re actually really cool so you can get that grant to pursue AI safety research that you always wanted
- And many many more… Come along!
What if I don’t have any experience in AI safety?
Please join! This can be your first foray into AI and ML safety, and maybe you'll realize that it's not that hard. Even if you haven't found it particularly interesting before, you might see it in a new light this time!
There's a lot of pressure in AI safety to perform at a top level, and this seems to drive some people out of the field. We'd love it if you joined with a mindset of fun exploration and got a positive experience out of the weekend.
What is the agenda for the weekend?
The schedule runs from 6 PM CET / 9 AM PST Friday to 7 PM CET / 10 AM PST Sunday. We start with an introductory talk and end with an awards ceremony. Join the public iCal here.
- Fri 6 PM CET / 9 AM PST: Introduction to the hackathon, what to expect, and a talk from an expert. Splitting into teams.
- Fri 7:30 PM CET / 10:30 AM PST: Hacking begins! Free dinner.
- Sun 2 PM CET / 5 AM PST: Final submissions must be finished. Judging begins.
- Sun 6 PM CET / 9 AM PST: Award ceremony starts and the winning projects are presented.
- Sun 7 PM CET / 10 AM PST: Socializing and dinner.
I’m busy, can I join for a short time?
As a matter of fact, we encourage you to join even if you only have a short while available during the weekend!
For the last hackathon, participants spent an average of 17 hours on their projects, and a couple of participants spent only a few hours since they joined on Saturday. Another participant was at an EA retreat at the same time and even won a prize!
So yes, you can join without coming to the beginning or end of the event, and you can submit research even if you've only spent a few hours on it. We of course still encourage you to come for the intro ceremony and to join for the whole weekend.
Wow this sounds fun, can I also host an in-person event with my local AI safety group?
Definitely! It might be hard to make it for this hackathon, but we encourage you to join our team of in-person organizers around the world for the next hackathon, which we expect to run in mid-December as a Christmas special!
You can read more about what we require here and about the possible benefits for your local AI safety group here. Sign up as a host via the button on this page.
What have previous participants said about this hackathon?
- “You always learn a lot just going all into a project and it feels like an achievement when you come out the other side and have something cool to show. Definitely worth participating in.”
- “The hackathon was a lot of fun to participate in, everyone was great and it was really interesting to spend the time figuring out what kind of risks are associated with language models and when they might occur.”
- “Intro texts from Buck abt being an AI experimental psychologist really sparked my curiosity and made me more confident in my ability to actually generate good work (I wasn't really into AI safety before the hackathon).”
- “I enjoyed the hackathon very much. The task was open-ended, and interesting to manage. I felt like I learned a lot about the field of AI safety by exploring the language models during the hackathon.”
- “It was a fun way to spend the weekend! I think one of the best parts about it is working in teams - you learn some cooperation skills, as well as get to consider things that you maybe never even thought of before”
Where can I read more about this?
- The information page
- Hackathon page
- The Alignment Jam website
- Inspiration list for the interpretability hackathon
- The Discord server
- In-person and online events
- The GatherTown hacking space
- London event page
- Tallinn event page
- Aarhus event page
- The previous hackathon
- Results from the previous hackathon
- The previous hackathon’s announcement post
- Starter code templates and inspiration list for the previous hackathon
Again, sign up here by clicking “Join jam” and read more about the hackathons more generally here.
Godspeed, research jammers!