Alignment Jams
Check out previous hackathons, their locations, and the idea behind them.
Previous Research
See earlier participants' projects by hackathon and location.
About & Contact
Check out who is behind the Alignment Jams and contact us.
Blog
Read our blog on what the hackathons are about and what some of the results are.
Quick Links
Next hackathon
Getting Started
For local organizers
Running a hackathon
Frequently asked questions
Media kit & marketing
Why run a hackathon?
Links
For participants & teams
Information
How to submit your project
Starter resources
Next steps
Become a mentor
AI Safety Ideas
Join the next hackathon
All Jams, Events & Projects
See what participants have created
See the upcoming hackathon
Thinkathon
New AI organization brainstorm
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Risk Defense Initiative
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety unionization for bottom-up governance
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety Talent Pool Identification
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Analysis of upcoming AGI companies
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Diversity in AI safety
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Critique of OpenAI's alignment plan
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Simon's Time-Off Newsletter
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
ChatGPT Alignment Talent Search
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety Subproblems for Software Engineering Researchers
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Catalogue of AI safety
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Authority bias to ChatGPT
EAG Bay Area Thinkathon
Mar 2023
Oversight
Reverse Word Wizards: Pitting Language Models Against the Art of Reversal
Aarhus Scale Oversight Hackathon
Feb 2023
Oversight
Player Of Games
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Automated Sandwiching: Efficient Self-Evaluations of Conversation-Based Scalable Oversight Techniques
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Automated Model Oversight Using CoTP
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Physics Guided Deep Learning Interpretation
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Can you keep a secret?
Aarhus Scale Oversight Hackathon
Feb 2023
Oversight
Sustainable Fashion Brand Language Learning Model 1
Virtual Scale Oversight Hackathon
Feb 2023
Mechanistic
Soft Prompts are a Convex Set
Mentaleap Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Automated Identification of Potential Feature Neurons
Online & Global Hackathon
Jan 2023
Mechanistic
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
LEAH Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
We Discovered An Neuron
LEAH Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
TraCR-Supported Mechanistic Interpretability
Copenhagen Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
$B$ Confident Bro: Discovering Latent Knowledge In Language Models Without Supervision
OxAI Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Distillation by duplication: The importance of layer selection
CompSoc Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Attention Phrenology: A spatial classification of attention heads
Online & Global Hackathon
Jan 2023
Mechanistic
Iterative summarization interpretability
Online & Global Hackathon
Jan 2023
Mechanistic
Investigating Agent Behavior In different RL methods
Mentaleap Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
The Start of Investigating a 1-Layer SoLU Model
Online & Global Hackathon
Jan 2023
Mechanistic
Trafo Mech Int on the web!
Online & Global Hackathon
Jan 2023
Mechanistic
One Attention Head Is All You Need for Sorting Fixed-Length Lists
Online & Global Hackathon
Jan 2023
Mechanistic
In search of linguistic concepts: investigating BERT's context vectors
CompSoc Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Interactive Layerscope
Online & Global Hackathon
Jan 2023
AI Testing
Trojan detection and implementation on transformers
Responsible Machine Learning AI Testing hackathon
Dec 2022
AI Testing
Counting Letters, Chaining Premises & Solving Equations: Exploring Inverse Scaling Problems with GPT-3
Testing the AI on the internet
Dec 2022
AI Testing
Investigating Training Dynamics via Token Loss Trajectories
Testing the AI on the internet
Dec 2022
AI Testing
Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing
Responsible Machine Learning AI Testing hackathon
Dec 2022
AI Testing
Evaluating Critical Level Of Perturbations Required To Achieve Certain Fail Rate
Delft Alignment Jam
Dec 2022
AI Testing
Formal Verification for Paren-balance checking
CDMX AI Testing Hackathon
Dec 2022
AI Testing
Model Hubris: On the Presumptuousness of Large Language Models
Testing the AI on the internet
Dec 2022
AI Testing
This Is Fine(-tuning): A benchmark testing LLMs robustness against bad fine-tuning data
Delft Alignment Jam
Dec 2022
AI Testing
LLM benchmarking through specifically-aligned feedback
Delft Alignment Jam
Dec 2022
Interpretability
Probing Conceptual Knowledge on Solved Games
Israel Interpretability Hackathon
Nov 2022
Interpretability
Model editing hazards at the example of ROME
Global event
Nov 2022
Interpretability
Backup Transformer Heads are Robust to Ablation Distribution
Global event
Nov 2022
Interpretability
Investigating Neuron Behaviour via Dataset Example Pruning and Local Search
Global event
Nov 2022
Interpretability
An Intuitive Logic for Understanding Autoregressive Language Models
ENS Interpretability Hackathon
Nov 2022
Interpretability
Top-Down Interpretability Through Eigenspectra
Global event
Nov 2022
Interpretability
An Informal Investigation of Indirect Object Identification in Mistral GPT2-Small Battlestar
Global event
Nov 2022
Interpretability
Mechanisms of Causal Reasoning
Global event
Nov 2022
Interpretability
Caught Red-Bandit
Global event
Nov 2022
Interpretability
Natural language descriptions for natural language directions
LEAH Hackathon Site
Nov 2022
Interpretability
Trying to make GPT2 dream
Tallinn EA jam site
Nov 2022
Interpretability
Visualizing the effect prompt design has on text-davinci-002 mode collapse and social biases
Aarhus Interpretability Hackathon
Nov 2022
Interpretability
Optimising image patches to change RL-agent behaviour
ENS Interpretability Hackathon
Nov 2022
Interpretability
Finding unusual neuron sets by activation vector distance
LEAH Hackathon Site
Nov 2022
Interpretability
How to find the minimum of a list - Transformer Edition
LEAH Hackathon Site
Nov 2022
Interpretability
Alignment Jam : Gradient-based Interpretability of Quantum-inspired neural networks
Global event
Nov 2022
Interpretability
War is 15% conflic, 15% DragonMagazine
Global event
Nov 2022
Interpretability
Interpreting Catastrophic Failure Modes in OpenAI's Whisper
LEAH Hackathon Site
Nov 2022
Interpretability
Algorithmic bit-wise boolean task on a transformer
Global event
Nov 2022
Interpretability
Interpretability at a glance
LEAH Hackathon Site
Nov 2022
Interpretability
Neurons and Attention Heads that Look for Sentence Structure in GPT2
LEAH Hackathon Site
Nov 2022
Interpretability
Sparsity Lens
ENS Interpretability Hackathon
Nov 2022
Interpretability
Observing and Validating Induction heads in SOLU-8l-old
Global event
Nov 2022
Interpretability
Regularly Oversimplifying Neural Networks
Online & Global Hackathon
Nov 2022
LLM Hackathon
Simulating an Alien
Apart's Aarhus event
Oct 2022
LLM Hackathon
Wording influences truthfulness
Apart's Aarhus event
Oct 2022
LLM Hackathon
Reasoning with Chain of Thought
Global event
Oct 2022
LLM Hackathon
Reducing hindsight neglect with "Let's think step by step"
Global event
Oct 2022
LLM Hackathon
All Fish are Trees
Global event
Oct 2022
LLM Hackathon
Soliciting criminal advice from LLMs
Apart's Aarhus event
Oct 2022
LLM Hackathon
Agreeableness vs. Truthfulness
Apart's Aarhus event
Oct 2022