Alignment Jams
All Jams, Events & Projects
See what participants have created
Fuzzing Large Language Models
Towards Formally Describing Program Traces from Chains of Language Model Calls with Causal Influence Diagrams: A Sketch
It Ain't Much but it's ONNX Work
OthelloScope
CPH-INT V2
Apr 2023
Improving TransformerLens Head Detector
Global Interpretability hackathon 2.0
Apr 2023
Dropout Incentivizes Privileged Bases
Global Interpretability hackathon 2.0
Apr 2023
Solving the CNN Mech Int Challenge
Global Interpretability hackathon 2.0
Apr 2023
AutoAdminsteredAntidotes: Circuit detection in a poisoned model for MNIST classification
Global Interpretability hackathon 2.0
Apr 2023
Othello Mechint playground
Global Interpretability hackathon 2.0
Apr 2023
Detecting Phase Transitions
Global Interpretability hackathon 2.0
Apr 2023
Exploring OthelloGPT
Global Interpretability hackathon 2.0
Apr 2023
Algorithmic Explanation: A method for measuring interpretations of neural networks
Global Interpretability hackathon 2.0
Apr 2023
AI Governance
AI and Democracy: Balancing Risks and Opportunities to Maintain Meaningful Human Control
The Delft AI Governance Challenge
Mar 2023
AI Governance
AI Impact Assessments
CPH AI Governance Hackathon
Mar 2023
Interpretability
Investigating Neuron Behaviour via Dataset Example Pruning and Local Search
Global event
Nov 2022
Thinkathon
New AI organization brainstorm
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Risk Defense Initiative
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety unionization for bottom-up governance
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety Talent Pool Identification
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Analysis of upcoming AGI companies
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Diversity in AI safety
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Critique of OpenAI's alignment plan
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Simon's Time-Off Newsletter
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
ChatGPT Alignment Talent Search
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
AI Safety Subproblems for Software Engineering Researchers
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Catalogue of AI safety
EAG Bay Area Thinkathon
Mar 2023
Thinkathon
Authority bias to ChatGPT
EAG Bay Area Thinkathon
Mar 2023
Oversight
Reverse Word Wizards: Pitting Language Models Against the Art of Reversal
Aarhus Scale Oversight Hackathon
Feb 2023
Oversight
Player Of Games
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Automated Sandwiching: Efficient Self-Evaluations of Conversation-Based Scalable Oversight Techniques
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Automated Model Oversight Using CoTP
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Physics Guided Deep Learning Interpretation
Virtual Scale Oversight Hackathon
Feb 2023
Oversight
Can you keep a secret?
Aarhus Scale Oversight Hackathon
Feb 2023
Oversight
Sustainable Fashion Brand Language Learning Model 1
Virtual Scale Oversight Hackathon
Feb 2023
Mechanistic
Soft Prompts are a Convex Set
Mentaleap Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Automated Identification of Potential Feature Neurons
Online & Global Hackathon
Jan 2023
Mechanistic
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
LEAH Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
We Discovered An Neuron
LEAH Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
TraCR-Supported Mechanistic Interpretability
Copenhagen Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
$B$ Confident Bro: Discovering Latent Knowledge In Language Models Without Supervision
OxAI Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Distillation by duplication: The importance of layer selection
CompSoc Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Attention Phrenology: A spatial classification of attention heads
Online & Global Hackathon
Jan 2023
Mechanistic
Iterative summarization interpretability
Online & Global Hackathon
Jan 2023
Mechanistic
Investigating Agent Behavior In different RL methods
Mentaleap Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
The Start of Investigating a 1-Layer SoLU Model
Online & Global Hackathon
Jan 2023
Mechanistic
Trafo Mech Int on the web!
Online & Global Hackathon
Jan 2023
Mechanistic
One Attention Head Is All You Need for Sorting Fixed-Length Lists
Online & Global Hackathon
Jan 2023
Mechanistic
In search of linguistic concepts: investigating BERT's context vectors
CompSoc Mechanistic Interpretability Hackathon
Jan 2023
Mechanistic
Interactive Layerscope
Online & Global Hackathon
Jan 2023
AI Testing
Trojan detection and implementation on transformers
Responsible Machine Learning AI Testing hackathon
Dec 2022
AI Testing
Counting Letters, Chaining Premises & Solving Equations: Exploring Inverse Scaling Problems with GPT-3
Testing the AI on the internet
Dec 2022
AI Testing
Investigating Training Dynamics via Token Loss Trajectories
Testing the AI on the internet
Dec 2022
AI Testing
Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing
Responsible Machine Learning AI Testing hackathon
Dec 2022
AI Testing
Evaluating Critical Level Of Perturbations Required To Achieve Certain Fail Rate
Delft Alignment Jam
Dec 2022
AI Testing
Formal Verification for Paren-balance checking
CDMX AI Testing Hackathon
Dec 2022
AI Testing
Model Hubris: On the Presumptuousness of Large Language Models
Testing the AI on the internet
Dec 2022
AI Testing
This Is Fine(-tuning): A benchmark testing LLMs robustness against bad fine-tuning data
Delft Alignment Jam
Dec 2022
AI Testing
LLM benchmarking through specifically-aligned feedback
Delft Alignment Jam
Dec 2022
Interpretability
Probing Conceptual Knowledge on Solved Games
Israel Interpretability Hackathon
Nov 2022
Interpretability
Model editing hazards at the example of ROME
Global event
Nov 2022
Interpretability
Backup Transformer Heads are Robust to Ablation Distribution
Global event
Nov 2022
Interpretability
An Intuitive Logic for Understanding Autoregressive Language Models
ENS Interpretability Hackathon
Nov 2022
Interpretability
Top-Down Interpretability Through Eigenspectra
Global event
Nov 2022
Interpretability
An Informal Investigation of Indirect Object Identification in Mistral GPT2-Small Battlestar
Global event
Nov 2022
Interpretability
Mechanisms of Causal Reasoning
Global event
Nov 2022
Interpretability
Caught Red-Bandit
Global event
Nov 2022
Interpretability
Natural language descriptions for natural language directions
LEAH Hackathon Site
Nov 2022
Interpretability
Trying to make GPT2 dream
Tallinn EA jam site
Nov 2022
Interpretability
Visualizing the effect prompt design has on text-davinci-002 mode collapse and social biases
Aarhus Interpretability Hackathon
Nov 2022
Interpretability
Optimising image patches to change RL-agent behaviour
ENS Interpretability Hackathon
Nov 2022
Interpretability
Finding unusual neuron sets by activation vector distance
LEAH Hackathon Site
Nov 2022
Interpretability
How to find the minimum of a list - Transformer Edition
LEAH Hackathon Site
Nov 2022
Interpretability
Alignment Jam : Gradient-based Interpretability of Quantum-inspired neural networks
Global event
Nov 2022
Interpretability
War is 15% conflic, 15% DragonMagazine
Global event
Nov 2022
Interpretability
Interpreting Catastrophic Failure Modes in OpenAI's Whisper
LEAH Hackathon Site
Nov 2022
Interpretability
Algorithmic bit-wise boolean task on a transformer
Global event
Nov 2022
Interpretability
Interpretability at a glance
LEAH Hackathon Site
Nov 2022
Interpretability
Neurons and Attention Heads that Look for Sentence Structure in GPT2
LEAH Hackathon Site
Nov 2022
Interpretability
Sparsity Lens
ENS Interpretability Hackathon
Nov 2022
Interpretability
Observing and Validating Induction heads in SOLU-8l-old
Global event
Nov 2022
Interpretability
Regularly Oversimplifying Neural Networks
Online & Global Hackathon
Nov 2022
LLM Hackathon
Simulating an Alien
Apart's Aarhus event
Oct 2022
LLM Hackathon
Wording influences truthfulness
Apart's Aarhus event
Oct 2022
LLM Hackathon
Reasoning with Chain of Thought
Global event
Oct 2022
LLM Hackathon
Reducing hindsight neglect with "Let's think step by step"
Global event
Oct 2022
LLM Hackathon
All Fish are Trees
Global event
Oct 2022
LLM Hackathon
Soliciting criminal advice from LLMs
Apart's Aarhus event
Oct 2022
LLM Hackathon
Agreeableness vs. Truthfulness
Apart's Aarhus event
Oct 2022