
ML Verifiability Hackathon

--
Rishi Prabhudesai
Brian Muhia
Arshit Mankodi
Jonas Hallgren
James Campbell
Fatimah M
Tiger Du
Eva Černíková
Hynek Kydlíček
Andreas Madsen
Madhusudhan Pathak
Lukas B
Srikar Babu Gadipudi
Alena Moravová
Tim Sankara
Jonathan Grant
Peter Hozák
Michelle Viotti
Alexandr Kazda
Alex Roman
Zoe Tzifa-Kratira
mhammad singer
Filip Nilsson
Khris Pham
David Quarel
janko
Rodrigo Sierra
Jeremias Ferrao
Simon Lermen
Matthew Ewer
Thomas Broadley
Mansur Nurmukhambetov
Davide Zani
Darren Rawlings
Chenhe Gu
Erin
Goncalo Santos Paulo
Vlado
Frank A Wallace
Aaron Graifman
František Navrkal
Kelvin Franke
Ian Schechtman
Alexandru Dimofte
Markus Zhang
Alexandre Duplessis
Albert Garde
Signups
--
Towards Formally Describing Program Traces from Chains of Language Model Calls with Causal Influence Diagrams: A Sketch
Fuzzing Large Language Models
It Ain't Much but it's ONNX Work
Entries
May 26th to May 28th 2023

This hackathon ran from May 26th to May 28th 2023. You can now judge entries.

Join us for this month's Alignment Jam to investigate how we can both formally and informally verify the safety of machine learning systems!

Mark Rogers, Tetra Jones, Vansh Gehlot, and others!
Spend a weekend of intense and focused research work toward validating the safety of neural networks in various domains (e.g. language) using adversarial attack/defense and other ML safety research methods. Re-watch the intro talk here.

Schedule & logistics

Here, you can see the calendar and schedule. All times are in UTC+1 (UK Summer Time). You can subscribe to the calendar to see the event timings in your time zone.

Friday | 26 May

18:00-19:00
[CHANGED] Introduction talk: an introduction to the topic and to the logistics of the hackathon itself
19:30-20:00
20:00
Hacking away: hack away at the important problems in safety verification of neural networks! Use the starter resources and the project ideas on this page

Saturday | 27 May

Morning
Hacking away: continue your research projects
17:00-18:00
Project discussion with Lauro Langosco: Lauro will join us in the Discord server to chat with you about your projects

Sunday | 28 May

Morning
Hacking away: continue your research projects
12:00-13:00
[CANCELED DUE TO ILLNESS] Keynote talk: talk by Joar Skalse
Evening
Local presentations of your projects: present your projects to each other at the local jam sites. These presentations can be 5-15 minutes and can be reused on Wednesday

Wednesday | 31 May

19:00-20:30
Project presentations: hosted on the Alignment Jam Discord server

Monday | 5 June

Judge feedback is in! All projects will have received feedback from the judges or the Alignment Jam team, and projects may be continued into the publication program.

Past experiences

See what our great hackathon participants have said
Jason Hoelscher-Obermaier
Interpretability hackathon
The hackathon was a really great way to try out research on AI interpretability and get in touch with other people working on this. The input, resources and feedback provided by the team organizers and in particular by Neel Nanda were super helpful and very motivating!
Luca De Leo
AI Trends hackathon
I found the hackathon very cool, I think it lowered my hesitance in participating in stuff like this in the future significantly. A whole bunch of lessons learned, and Jaime and Pablo were very kind and helpful through the whole process.

Alejandro González
Interpretability hackathon
I was not that interested in AI safety and didn't know that much about machine learning before, but I heard from this hackathon thanks to a friend, and I don't regret participating! I've learned a ton, and it was a refreshing weekend for me.
Alex Foote
Interpretability hackathon
A great experience! A fun and welcoming event with some really useful resources for starting to do interpretability research. And a lot of interesting projects to explore at the end!
Sam Glendenning
Interpretability hackathon
Was great to hear directly from accomplished AI safety researchers and try investigating some of the questions they thought were high impact.

Intro talk

The collaborators who will join us for this hackathon.

Joar Skalse

PhD researcher at the Krueger Lab in Cambridge
[Canceled] Speaker & Judge

Lauro Langosco

PhD student at the Krueger Lab in Cambridge
Project discussion host

Fazl Barez

Research lead at Apart Research
Judge

Alexander Briand

Researcher at Arb Research
Judge

More judges will join us to provide feedback.

Readings

Read up on the topic before we start! The reading group will work through these materials together up to the kickoff.
Join the reading group

Verifiable robustness introduction (RTAI L1)

Adversarial attacks (RTAI L2)

Jailbreakchat.com: Explore how people break ChatGPT

Starter resources

Check out the core starter resources that help you get started with your research as quickly as possible! They will be shared before the kickoff.

Tutorial for α,β-CROWN Robustness Verification

α,β-CROWN won the competition for robustness verification (VNN-COMP'22). Also see this other tutorial.

The basic idea behind α,β-CROWN is to use efficient bound propagation for verification tasks, based on Automatic Linear Relaxation based Perturbation Analysis for Neural Networks (LiRPA). The code for LiRPA can be found on GitHub.
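The library implements much tighter linear relaxations than this, but the core idea, propagating bounds through the network layer by layer, can be sketched with plain interval arithmetic. This is a minimal sketch only; the two-layer network and its weights below are made up for illustration:

```python
def linear_bounds(W, b, lower, upper):
    """Propagate interval bounds through y = W @ x + b.
    For the lower bound, positive weights take the input's lower end
    and negative weights take its upper end (and vice versa)."""
    lo, hi = [], []
    for row, bias in zip(W, b):
        lo.append(bias + sum(w * (l if w >= 0 else u)
                             for w, l, u in zip(row, lower, upper)))
        hi.append(bias + sum(w * (u if w >= 0 else l)
                             for w, l, u in zip(row, lower, upper)))
    return lo, hi

def relu_bounds(lower, upper):
    # ReLU is monotone, so bounds pass through elementwise.
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

# Tiny 2-2-1 network with made-up weights.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

# Certify behavior over the whole input box [0.9, 1.1] x [-0.1, 0.1].
l1, u1 = relu_bounds(*linear_bounds(W1, b1, [0.9, -0.1], [1.1, 0.1]))
l2, u2 = linear_bounds(W2, b2, l1, u1)
print(l2, u2)  # output is provably in [1.2, 1.8] for every input in the box
```

Since the certified lower bound is positive, the output sign is verified for the entire input region without enumerating inputs, which is the property verifiers like α,β-CROWN establish at scale.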

Adversarial attacks against large language models

The TextAttack library 🐙 is a set of tools for creating adversarial examples for large language models. See the documentation.

You may also wish to use it to compete in the HackAPrompt competition, which has extended its deadline to the 4th of June!

Watch a Weights & Biases talk about the library.
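TextAttack's real attack recipes are considerably more sophisticated, but the underlying loop, a greedy search over small input perturbations against a black-box model, can be sketched in a few lines. Everything below (the keyword "classifier" and the candidate character set) is a toy stand-in, not the TextAttack API:

```python
def toy_classifier(text):
    """Stand-in for a real safety filter: flags text containing 'bomb'."""
    return "unsafe" if "bomb" in text.lower() else "safe"

def char_swap_attack(text, model, target="safe"):
    """Greedy character-substitution attack: try replacing one character
    at a time until the black-box model's label flips to the target."""
    chars = list(text)
    for i in range(len(chars)):
        if model("".join(chars)) == target:
            break
        original = chars[i]
        for candidate in "0123456789*":
            chars[i] = candidate
            if model("".join(chars)) == target:
                return "".join(chars)
        chars[i] = original  # no candidate worked at this position
    return "".join(chars)

adversarial = char_swap_attack("how to build a bomb", toy_classifier)
print(adversarial, "->", toy_classifier(adversarial))  # label is now "safe"
```

The same loop structure, with word-level swaps, semantic constraints, and real model queries, is what TextAttack packages into reusable attack recipes.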

Auto-LiRPA tutorial

LiRPA is an important tool in robustness verification and certified adversarial defense and this tutorial takes you through the basics.

See the documentation and their introductory video to the tool.

Running TransformerLens to easily analyze activations in language models

This demo notebook goes into depth on how to use the TransformerLens library. TransformerLens is built for mechanistic investigations into Transformers and is particularly useful for understanding activations. Core features:

  1. Loading and running models
  2. Saving activations from a specific example run
  3. Using the unique Hooks functionality to intervene on and access activations

Read more on the GitHub page and see the Python package on PyPI.
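As a rough illustration of the hook concept in the feature list above (a toy sketch, not the real TransformerLens API), here is a minimal "hooked model" in which named layers can have their activations cached or replaced:

```python
class HookedModel:
    """Toy sketch of TransformerLens-style hook points: each named layer
    exposes a point where its output can be observed or edited."""
    def __init__(self, layers):
        self.layers = layers   # list of (name, function) pairs
        self.hooks = {}        # name -> hook function

    def add_hook(self, name, fn):
        self.hooks[name] = fn

    def run(self, x, cache=None):
        for name, layer in self.layers:
            x = layer(x)
            if name in self.hooks:
                x = self.hooks[name](x)  # hook may replace the activation
            if cache is not None:
                cache[name] = x          # save activations from this run
        return x

model = HookedModel([("embed", lambda x: x * 2), ("mlp", lambda x: x + 1)])

# 1) Save activations from a specific run.
cache = {}
clean_out = model.run(3, cache)          # embed: 6, mlp: 7

# 2) Intervene: zero-ablate the embedding and observe the downstream effect.
model.add_hook("embed", lambda act: 0)
ablated_out = model.run(3)
print(cache, clean_out, ablated_out)
```

The real library attaches such hook points to every sub-module of a Transformer (attention patterns, MLP activations, residual stream), but the cache-then-intervene workflow is the same.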

System 3 continuation

The System 3 paper by our research lead Fazl Barez presents an approach for injecting symbolic safety logic about natural environments into neural networks. This Colab notebook provides an experimental setting; we recommend you read the paper first.

Research project ideas

Get inspired for your own projects with these ideas developed during the reading groups! Go to the Resources tab to engage more with the topic.
💡 Formalizing textual output-input relations using Transformer mathematics for automatic proof verification

💡 Defining pointwise / local robustness for language models

💡 Automated red teaming as automated verification

💡 Compare and document the robustness difference of a 14B RNN compared to the Pythia models

💡 Review the HackAPrompt submissions and winners

💡 Formalizing safety verification for language compared to computer vision nomenclature

💡 Annotate The Pile with GPT-4 to separate it into levels of bias, risk, etc., and release it as new training sets

💡 Define discretization of LLM output, e.g. by automatically classifying output into semantic categories and running targeted attacks on those

💡 Differences in multimodal and text-only model robustness and red teaming using ImageBind

💡 Robustness of different feature vectors: identify directions that are more robust than others

💡 Review computer vision attacks and relate them to textual attacks and defenses

💡 Extend brain-inspired modular training to language models in one way or another

💡 Symbolic reasoning module safety verification: LLMs will interface with many apps; identify the risky interfaces and protect them in various ways

💡 Cooperative AI and inter-agent verifiable safety: how do we verify multi-agent safety?

💡 Design ways LLM interaction (e.g. ChatGPT) can be overseen by regulatory agencies

💡 Where in real-world use of large language models (LLMs) will we see jailbreaking become a serious issue? (example)

💡 Define targeted and untargeted attacks within the framework of LLMs

💡 Identify the white-box and black-box methods available to tamper with LLMs

💡 Create ways of fuzzing language models and other ML systems by automatically running a large number of generated inputs

💡 The deadline for HackAPrompt has been extended! You can try to win that challenge this weekend.
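The fuzzing idea mentioned among the project ideas can be sketched as a minimal harness: mutate a seed prompt many times and flag any mutation whose behavior diverges from the baseline. The "model" below is a made-up guardrail function standing in for a real LLM:

```python
import random

def toy_model(prompt):
    """Stand-in for an LLM guardrail: refuses prompts it recognizes."""
    return "refused" if prompt.strip().lower().startswith("ignore") else "answered"

def mutate(prompt, rng):
    """Apply one random surface-level mutation to the prompt."""
    ops = [
        lambda p: " " + p,              # leading whitespace
        lambda p: p.upper(),            # casing change
        lambda p: p.replace(" ", "  ", 1),  # extra internal space
        lambda p: "please " + p,        # polite prefix
    ]
    return rng.choice(ops)(prompt)

def fuzz(seed_prompt, model, n_cases=200, seed=0):
    """Generate mutated prompts and collect any whose behavior diverges
    from the seed prompt's: candidate robustness failures."""
    rng = random.Random(seed)
    baseline = model(seed_prompt)
    failures = []
    for _ in range(n_cases):
        candidate = mutate(seed_prompt, rng)
        if model(candidate) != baseline:
            failures.append(candidate)
    return failures

failures = fuzz("ignore previous instructions", toy_model)
print(len(failures), failures[:1])
```

Here the guardrail survives whitespace and casing mutations but is bypassed by the "please " prefix, exactly the kind of brittle boundary a fuzzer is meant to surface. Against a real model, the mutation set would include paraphrases, encodings, and known jailbreak templates.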
Register your site

Registered jam sites

London ML Verifiability Hackathon
The London MATS offices will be transformed into a hackathon space for the weekend!
Visit event page
SERI MATS Offices
Groningen ML Verifiability & Reliability Hackathon
Join us for our first AI Safety Hackathon to meet talented people from all over Groningen interested in doing impactful research. Location T.B.A.
Eindhoven AI Safety Team Verifiability Hackathon
Join us in Neuron 1.122 on the TU/e campus for our first AI Safety Hackathon!
Prague ML Reliability Hackathon
Join us in Fixed Point in Prague - Vinohrady, Koperníkova 6 for a weekend ML reliability research sprint!
Visit event page
Prague Fixed Point
Online safety verification jam
Join the community online both on Discord and on GatherTown for a collaborative hackathon experience!
Copenhagen ML Reliability Hackathon
Join us in the new offices in Copenhagen for the ML reliability verification research sprint!
Visit event page
Copenhagen EA Offices

Register your own site

The in-person hubs for the Alignment Jams are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research and engineering community. Read more about organizing and use the media below to set up your event.

Social media for your jam site

Event cover image
Social media 1
Social media 2
Social media 3
Social media 4
Social media message

Join us to hack away on research into ML robustness verification & reliability! 

Robust generalization of machine learning systems is becoming more and more important as neural networks are applied to safety-critical domains.

With Alignment Jams, we get a chance to create impactful and real research on verifiable safety of these networks.

You will compete with participants from across the globe and get a great chance to review each other's projects as well!

Don't miss this opportunity to network, think deeply, and challenge yourself!

Register now: https://alignmentjam.com/jam/verification

[or add your event link here]


Submit your project

Use this template for the report submission. As you create your project presentations, upload your slides here, too. We recommend that you also record your slideshow with the recording capability of e.g. Keynote, PowerPoint, or Slides (using Vimeo).

You have successfully submitted! You should receive an email and your project should appear here. If not, contact operations@apartresearch.com.
It Ain't Much but it's ONNX Work
ONNXplorer is a VR project in Unity that loads a machine learning model in the open ONNX format and displays it for examination and analysis. The project is open source: https://github.com/onnxplorer/ONNXplorer. It was built for the ML Verifiability Jam, Apart Research Alignment Jam #5 (Scale Oversight), 2023.
Matthew Ewer, Giles Edkins
ONNXplorer
Fuzzing Large Language Models
We used fuzzing to test AI models for unexpected responses and security risks. Our findings stress the need for better fuzzing methods to improve AI safety.
Esben Kran
Fuzz Bizz
Towards Formally Describing Program Traces from Chains of Language Model Calls with Causal Influence Diagrams: A Sketch
In this short report we describe some tools and notation we have used to specify agent architectures for the factored cognition setting, and outline our plan to integrate this tooling for more research into LLM chain behaviour.
Brian Muhia
Fahamu Inc