
Distillation Write-a-thon 2.0!

Hosted by AI Safety Info in partnership with Apart Research
August 25th to August 27th 2023

This hackathon ran from August 25th to August 27th 2023.

Join us for the Distillation Hackathon, where you are tasked with writing more digestible versions of the existing literature in AI safety research and theory!

You will be working on articles for aisafety.info. You can write new answers to questions people have, or edit existing drafts of answers.

See the information document for up-to-date information on the event, e.g. how the top projects are selected.

Schedule

  • Friday, August 25th, 7am UTC: Participants may start!
  • Sunday, August 27th, 11pm UTC: Wrap-up Discord/GatherTown group call

See the full details

Other information

If you write articles in the course of the write-a-thon, or contribute to Stampy in general, your contributions are released under a permissive license.

Welcome to Stampede!

The hackathon will run from August 25th, 7am UTC, to August 27th, 7am UTC. You are welcome to do any combination of writing articles, editing articles, and helping each other - all contributions will be considered part of the prize evaluations. See ‘how to participate’.

Collaboration on this event will happen on Discord and Gather. Feel free to join the Discord ahead of time and introduce yourself!

The AIsafety.info website. The information on this page is also available in document format here.

There will be prizes for the best four entrants, who will win $1000, $600, $300, and $100. A random entrant (out of other serious entries) will win $200.

Here are the key links: the Schedule of events, a list of Suggested Questions to work on during the hackathon, the Article registration doc to register as the main person working on a question, and the Submission form for submitting your articles.

Notes

  • Your contributions during the hackathon, and to Stampy in general, are released under a permissive license.
  • “Stampede”, “Write-a-thon”, “distillation hackathon”, and “hackathon” all refer to the same event in this context.

Schedule

Schedule for the event. Visit the linked version.

How to Participate

Broadly, there are three ways to contribute: writing, editing, and making suggestions. You can do any combination of these three throughout the event.

If there is a particular subject related to AI safety that you know a lot about, we encourage you to work on articles about it; you can ask on Discord whether articles on it already exist.

Writing articles

To see articles that need to be written, choose one from this list of suggested articles or this searchable list of questions from Coda. Alternatively, if you want to write an answer to a question that isn’t already in the system, you can ask on Discord for a doc to be created.

You can start writing in suggest mode on these articles at any time. If you would like to be the main author of an article, go to the Article registration doc and add the article along with your name and Discord handle. We will probably give you edit access to the doc (or you can ask for access on the Discord if we seem to be taking a while).

Editing articles

Some articles already have drafts, and need editing. If you would like to work on these, you can choose one from this list of suggested articles (scroll down to ‘Editing’) or this searchable list of questions from Coda. 

Articles being edited should have a main editor in charge of the article. If you would like to be the main editor of an article, go to the Article registration doc and add it along with your name and Discord handle. We will probably give you edit access to the doc (or you can ask for access on the Discord if we seem to be taking a while).

Helping out

To see the questions others are already working on answering this weekend, go to the Article registration doc. Use Google Docs’ suggest and comment features to make suggestions and comments on the documents.

2. Write an answer or edit your chosen question

An answer for aisafety.info is usually a few paragraphs with links to outside resources (and internal links to other Stampy docs, if you want, though we can add them later instead). Take a look at existing articles for an idea of what to aim for.

  • See the style guidelines for more details if you’d like them.
  • There’s also an editing guide we use for the usual Stampy workflow; some of it doesn’t apply to the hackathon, but feel free to optionally look at it for more context.

We encourage people to collaborate on their articles. Hang out in the #distillation-write-a-thon channel on Discord and in gather.town (the Schelling point is the “alignment ecosystem development” room, but people can spread out into smaller working groups) to see which articles are in progress, get on calls about them, or just make comments and suggestions on Discord or in the doc. The submission form asks who was helpful, and we will consider this when awarding prizes.

It’s probably a good idea to create a thread on the article you’re writing in the #editing channel, to serve as a central point for discussion on the article (along with the Google doc). You can tag @feedback in the Discord thread if you’re looking for feedback.

You’re allowed to use LLMs to help you (we suggest this one; be aware that it logs questions where some of us can see them), as long as you mark this with a comment and take responsibility for any hallucinations the LLM generates (don’t do this unless you know enough about the subject to judge whether it’s hallucinating).

3. Repeat the above for as many questions as you like.

As long as you have participated in the hackathon, you can fill out the submission form to be considered for prizes - even if you haven’t been the main author on any articles. The form will ask which other participants were helpful to you, so please try to keep track of who contributed. 

4. Collect the articles you’ve been involved in and submit them

  • Collect the articles you’ve been the main author on and submit them.
  1. The form asks for links to docs you were the main author/editor on, your name and contact information, and an ordered list of who was most helpful editing your articles. 
  2. You are encouraged to submit by the end of the hackathon, but the form will stay open until Thursday, August 31st, 7am UTC.
  3. If you can’t make it to the hackathon, you can submit articles you wrote during one contiguous three-day period between the announcement and the hackathon. If you’re taking this option, probably talk to Siao (@monstrologies) on Discord for details.
  • Your entry may win a prize. The best four entries will win $1000, $600, $300, and $100. A random entry (out of other serious entries) will win $200.

Questions

If you have any questions, don’t hesitate to ask in the #distillation-write-a-thon channel, or to message Siao (@monstrologies) on Discord.

Jan Brauner

Research scholar in AI safety at OATML (Oxford)
Keynote speaker and judge

Esben Kran

Co-director at Apart Research
Judge & Organizer

Fazl Barez

Co-director and research lead at Apart Research
Judge

Editor's hub

The Editor's hub contains answers to most of your questions! Check it out here.

What questions do we want?

We are focused specifically on AI existential safety (both introductory and technical questions); we do not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future. More technical questions are also in our scope, though replying to all possible proposals is not feasible, and this is not a place to submit detailed ideas for evaluation.

We are interested in:

  • Introductory questions closely related to the field, e.g.
      • "How long will it be until transformative AI arrives?"
      • "Why might advanced AI harm humans?"
  • Technical questions related to the field, e.g.
      • "What is Cooperative Inverse Reinforcement Learning?"
      • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field, e.g.
      • "Should I get a PhD?"
      • "Where can I find relevant job opportunities?"

More good examples can be found on the answers table.

We do not aim to cover:

  • Aspects of AI safety or fairness which are not strongly relevant to existential safety, e.g.
      • "How should self-driving cars weigh up moral dilemmas?"
      • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions whose answers are unlikely to be of value to more than a single person, e.g.
      • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be treated as low priority to answer, will not be marked as an accepted question, and will not be served to readers in our UI.

How can I collect questions?

As well as simply adding your own questions over at add question, you could also message your friends with something like:

Hi, I'm working on a project to create a comprehensive FAQ about AI alignment (you can read about it here https://aisafety.info?state=6436_ if interested). We're looking for questions and I thought you may have some good ones. If you'd be willing to write up a Google doc with your top 5-10ish questions, we'd be happy to write a personalized FAQ for you. https://coda.io/d/_dfau7sl2hmG/_susRF#_lubMU explains the kinds of questions we're looking for. Thanks!

and maybe bring the Google doc to a Stampy editing session so we can collaborate on answering them or improving your answers to them.

See much more information and guidelines in the Editor's hub!

Research project ideas

See the updated document. Don’t forget to register in this table if you intend to submit a doc as main author. Articles that are already claimed are marked in [brackets].

Writing

Editing

Registered jam sites

No events registered at the moment

Register your own site

The in-person hubs for the Alignment Jams are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research and engineering community. Read more about organizing.

Submit your project

Submit your project via the Google Form below:

Accepted submissions to the hackathon

Big thanks to everyone who submitted their work. Your efforts have made this event a success and set a new bar for what we can expect in future editions of the hackathon!

We want to extend our appreciation to our judges Fazl Barez, Alex Foote, Esben Kran, and Bart Bussman and to our keynote speaker Neel Nanda. Rewatch the winning top 4 project lightning talks below.

Give us your feedback

Relating induction heads in Transformers to temporal context model in human free recall
This study explores the parallels between the mechanisms of induction heads in Transformer models and the process of sequential memory recall in humans, finding surprising similarities that could potentially enhance our understanding of both artificial intelligence and human cognition.
Ji-An Li
Solo Moonhowl

Who cares about brackets?
Investigating how GPT2-small is able to accurately predict closing brackets
Theo Clark, Alex Roman, Hannes Thurnherr
Team Brackets
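
A quick way to see the behavior this project studies is to ask GPT-2 Small how strongly it predicts a closing bracket. The sketch below is our illustration, not the team's code; it assumes the Hugging Face transformers library, and the prompt is an arbitrary example.

```python
# Toy probe (illustrative only): probability GPT-2 Small assigns to ")"
# as the next token after a prompt with unbalanced open brackets.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "def f(x): return (x + (y"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
probs = torch.softmax(logits[0, -1], dim=-1)  # next-token distribution
close_id = tok.encode(")")[0]
print(f"P(')' next) = {probs[close_id]:.3f}")
```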

Embedding and Transformer Synthesis
I programmatically created a set of embeddings that can be used to perfectly reconstruct a binary classification function (“embedding synthesis”). I used these embeddings to programmatically set weights for a 1-layer transformer that can also perfectly reconstruct the classification function (“transformer synthesis”). With one change, this reconstruction matches my original hypothesis of how a pre-existing transformer works. I ran several experiments on my synthesized transformer to evaluate my synthetic model.
Rick Goldstein
Rick Goldstein

Interpreting Planning in Transformers
We trained some simple models that figure out how to traverse a graph from a list of edges, which is a kind of "planning" in some sense if you squint, and we got some traction on interpreting one of them.
Victor Levoso Fernandez , Abhay Sheshadri
Shoggoth Neurosurgeons

Towards Interpretability of 5 digit addition
This paper details a hypothesis for the internal structure of the 5-digit addition model that may explain the observed variability, and proposes specific tests to confirm (or refute) the hypothesis.
Philip Quirke
Philip Quirke

Factual recall rarely happens in attention layer
In this work, I investigated whether factual information is stored only in the FF layers or also in the attention layers, and found that with a large enough FF hidden dimension, factual information is rarely stored in the attention layers.
Bary Levy
mentaleap

Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
SoLU activation functions have been shown to make large language models more interpretable, incentivizing alignment of a fraction of features with the standard basis. However, this happens at the cost of suppressing other features. We investigate this problem using experiments suggested in Nanda’s 2023 work “200 Concrete Open Problems in Mechanistic Interpretability”. We conduct three main experiments: (1) we investigate how the layernorm scale factor changes on a variety of input prompts; (2) we investigate the logit effects of ablating neurons with relatively low activations; (3) also using ablations, we attempt to find tokens where “the direct logit attribution (DLA) of the MLP layer is high, but no single neuron is high”.
Mateusz Bagiński, Kunvar Thaman, Rohan Gupta, Alana Xiang, j1ng3r
SoLUbility
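
For readers new to this kind of work, here is a minimal sketch of a neuron-ablation experiment like (2) above. It is our illustration rather than the team's code, and it assumes the TransformerLens library and its "solu-1l" checkpoint; the neuron index and prompt are arbitrary.

```python
# Hedged sketch: zero one MLP neuron in a 1-layer SoLU model and measure
# how the final-token logits change. The hook name follows TransformerLens
# conventions; NEURON and the prompt are arbitrary examples.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("solu-1l")
tokens = model.to_tokens("The Eiffel Tower is in")
NEURON = 42  # arbitrary example neuron

def ablate(acts, hook):
    acts[:, :, NEURON] = 0.0  # zero the neuron's post-activation value
    return acts

with torch.no_grad():
    clean_logits = model(tokens)
    ablated_logits = model.run_with_hooks(
        tokens, fwd_hooks=[("blocks.0.mlp.hook_post", ablate)]
    )

# Largest logit shift at the final position caused by the ablation.
diff = (clean_logits - ablated_logits)[0, -1]
print("max |delta logit|:", diff.abs().max().item())
```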

One is 1 - Analyzing Activations of Numerical Words vs Digits
Extensive research in mechanistic interpretability has showcased the effectiveness of a multitude of techniques for uncovering intriguing circuit patterns. We utilize these techniques to compare similarities and differences among analogous numerical sequences, such as the digits “1, 2, 3, 4”, the words “one, two, three, four”, and the months “January, February, March, April”. Our findings demonstrate preliminary evidence suggesting that these semantically related sequences share common activation patterns in GPT-2 Small.
Mikhail L
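
If you want to poke at this yourself, here is a minimal sketch in the same spirit (our illustration, not the author's code): load GPT-2 Small through the Hugging Face transformers library and compare the last-token hidden states of a digit sequence and a number-word sequence, layer by layer.

```python
# Illustrative sketch: cosine similarity between GPT-2 Small hidden states
# for "1, 2, 3, 4" vs. "one, two, three, four" at every layer.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def last_token_states(text):
    """Hidden state of the final token at each of the 13 layers."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return torch.stack([h[0, -1] for h in out.hidden_states])

digits = last_token_states("1, 2, 3, 4")
words = last_token_states("one, two, three, four")

for layer, (d, w) in enumerate(zip(digits, words)):
    sim = torch.cosine_similarity(d, w, dim=0).item()
    print(f"layer {layer:2d}: cos sim = {sim:.3f}")
```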

DPO vs PPO comparative analysis
We perform a comparative analysis of the DPO and PPO algorithms, using techniques from interpretability to attempt to understand the differences between the two.
Rauno Arike, Luke Marks, Amir Abdullah, Luna Mendez
DPOvsPPO

Experiments in Superposition
In this project we run a variety of experiments on superposition. We try to understand superposition in attention heads, superposition in MLP layers, and nonlinear computation in superposition.
Kunvar Thaman, Alice Rigg, Narmeen Oozeer, Joshua David
Team Super Position 1

Multimodal Similarity Detection in Transformer Models
[hidden]
Tereza Okalova, Toyosi Abu, James Thomson
End Black Box Syndrome

Toward a Working Deep Dream for LLM's
This project aims to enhance language model interpretability by generating sentences that maximally activate a specific neuron, inspired by the DeepDream technique in image models. We introduce a novel regularization technique that optimizes over a lower-dimensional latent space rather than the full 768-dimensional embedding space, resulting in more coherent and interpretable sentences. Our approach uses an autoencoder and a separate GPT-2 model as an encoder, and a six-layer transformer as a decoder. Despite the current limitation of our autoencoder not fully reconstructing sentences, our work opens up new directions for future research in improving language model interpretability.
Scott Viteri and Peter Chatain
PeterAndScott
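
For intuition, here is a stripped-down sketch of the baseline this project improves on: gradient ascent directly in the full 768-dimensional embedding space to excite a single neuron. This is our illustration under assumed layer/neuron indices, not the authors' autoencoder-based method.

```python
# Hedged baseline sketch: optimize a continuous input embedding so that one
# GPT-2 MLP neuron fires strongly. LAYER/NEURON/SEQ_LEN are arbitrary.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # only the input embedding is optimized
LAYER, NEURON, SEQ_LEN = 5, 123, 8

acts = {}
def grab(module, inp, out):
    acts["mlp"] = out  # post-GELU activations, shape (1, seq, 3072)

handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(grab)

emb = torch.randn(1, SEQ_LEN, model.config.n_embd, requires_grad=True)
opt = torch.optim.Adam([emb], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    model(inputs_embeds=emb)                  # forward with soft inputs
    loss = -acts["mlp"][0, :, NEURON].mean()  # maximize the neuron
    loss.backward()
    opt.step()
handle.remove()
print("mean activation reached:", -loss.item())
```

Optimizing raw embeddings like this tends to produce off-distribution, unreadable inputs; the project's contribution is to constrain the search to an autoencoder's latent space so the resulting sentences stay coherent.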

Residual Stream Verification via California Housing Prices Experiment
In this data science project, I conducted an experiment to test the “residual stream as shared bandwidth” hypothesis, using the California Housing Prices dataset to support the investigation.
Jonathan Batista Ferreira
Condor camp team

Problem 9.60 - Dimensionality reduction
The idea is to separate positive (1) and negative (0) comments in the vector space; the better the model, the better the separation. We can visualize the separation by reducing the vectors to 2 dimensions with PCA.
Juliana Carvalho de Souza
Juliana's team
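
As an illustration of this kind of plot, here is a minimal sketch with synthetic stand-in vectors (not the project's data or model).

```python
# Illustrative sketch: project labeled comment vectors to 2-D with PCA and
# plot them; clearer separation suggests a better embedding model.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
pos = rng.normal(loc=+0.5, size=(100, 768))  # stand-in positive comments
neg = rng.normal(loc=-0.5, size=(100, 768))  # stand-in negative comments
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

coords = PCA(n_components=2).fit_transform(X)
plt.scatter(coords[y == 1, 0], coords[y == 1, 1], label="positive (1)")
plt.scatter(coords[y == 0, 0], coords[y == 0, 1], label="negative (0)")
plt.legend()
plt.title("PCA of comment embeddings")
plt.show()
```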

Goal Misgeneralization
The main argument put forward in the papers is that we have to be careful about the inner alignment problem. If we continue developing more powerful AIs, this problem could scale to terrible outcomes, assuming the use of Reinforcement Learning from Human Feedback (RLHF).
João Lucas Duim
João Lucas Duim

Send in pictures of you having fun hacking away!

We love to see the community flourish and it's always great to see any pictures you're willing to share uploaded here.

Q&A with Neel Nanda

Discussing how Transformer models traverse graphs!