
Multi-Agent Safety Hackathon

Signups

Mark Jimenez
Aishwarya Gurung
Abhay Sheshadri
Thorbjørn Wolf
Jakub Smékal
Magnus Jungersen
Shen Zhou Hong
Liza Karmannaya / Tennant
Emily Han
JP Rivera
Jonathan Segefjord
Helios
Gurjeet Jagwani
Ram Bharadwaj
Kaspar Senft
Jonathan Cook
Ben Greenberg
Gabe Mukobi
Maksim Kukushkin
Amritanshu Prasad
Andy Liu
Miranda Zhang
Vadim
Родион Константинович Иванов
Marcel Hedman
Patricia Paskov
Nina Rimsky
Luc Chartier
Carolyn Qian
Max David Gupta
Brandon Sayler
Riccardo Conci
Aashiq Muhamed
Xiaohu Zhu
Nancirose Piazza
Jack Foxabbott
Léo Dana
Kaleb Crans
Romain Graux
Seth Karten
Aishwarya
Huibin Wang
Igor
Rashmi Jha
Aron Malmborg
Roland Pihlakas
Nikita Menon
Chandler
Kanad Chakrabarti
B Caravaggio
Jobst Heitzig
Ariel Kwiatkowski
Jesse Clifton
Esben Kran
Cenny Wenner
Agatha Duzan
Ravi
Michael Sipes
Valerio Targon
Abdulazizbek Mamanazarovich Gainazarov
Yerdaulet
Anjali Ragupathi
Sanatbek Zhamurzaev
Anushka Deshpande
Mark
Zican Wang
Shilong Deng
James Zhou
Aditri Bhagirath
Paolo Bova
Filip Sondej
Narmeen
Alice Rigg
Matthew Lutz
Blake Elias
Gabin
Shubhorup Biswas
Florian
Xuhui Liu
Ziyan Wang
Quentin Feuillade-Montixi
Finn Metz
Daniel Vidal
Yudi Zhang
Arathy
Robert Amanfu
Mattéo Papin
Jannik Brinkmann
Peter Potaptchik
Tomas Dulka
Rakshit
Mahan Tourkaman
Ishan Gaur
Dhruv Kaul
Shama Rahman
Abdur Raheem Ali
Eshani
Aditya Bansal
Arden
Mary Osuka
Corey Morris
Sahil
Yue
Ted
Sai Konkimalla
Vansh
nev
Rujikorn Charakorn
Anka Reuel
JM Mommessin
Bahrudin Trbalic
Timothy Chan
Ben Xu
Shivam Agarwal
Kia Ashouritaklimi
Lakshmi Chockalingam
Maximilien Dufau
Anand Advani
Tom Shlomi
Kabir Kumar
Matéo Petel
Tasha Kim
Kaspar
Komal Vij
Kevin Kim
Rossella Sblendido
Davide Locatelli
Domguia
Christopher Tan
Saanvi Nair
Luyao Zhang
Yoo Eun Kim
Nisan Stiennon
Dr Chris Cook
Ankur Agarwal
Richard Annilo
Ashwin
Danny Fernando Bravo Lopez
Alexander Meinke
Meng
Yali Du
Siavash Ahmadi
Thiru
Mehmet Ismail
Aidan O’Gara
Lucia Cipolina-Kun
Pavel Czempin
Kyle
Anna Wang

Entries

EscalAtion: Assessing Multi-Agent Risks in Military Contexts
Second-order Jailbreaks
Do many interacting LLMs perform well in the N-Player Prisoner’s Dilemma Game?
Emergent Deception from Semi-Cooperative Negotiations
Missing Social Instincts in LLMs
LLMs With Knowledge of Jailbreaks Will Use Them
Exploring Failures: Assessing Large Language Model in General Sum Games with Imperfect Information Against Human Norms
Risk assessment through a small-scale simulation of a chemical laboratory
Cooperative AI is a Double Edged Sword
LLM Collectives in Multi-Round Interactions: Truth or Deception?
Can collusion between advanced AI Agents remain perfectly undetectable?
Balancing Objectives: Ethical Dilemmas and AI's Temptation for Immediate Gains in Team Environments
The Firemaker
Exploring multi-agent interactions in the dollar auction
Jailbreaking is Incentivized in LLM-LLM Interactions
Can Malicious Agents Corrupt the System?
LLM agent topic of conversation can be manipulated by external LLM agent
AI Defect in Low Payoff Multi-Agent Scenarios
Jailbreaking the Overseer
Escalation and stubbornness caused by hallucination
The artificial wolves of Millers Hollow

Friday September 29th 19:00 UTC to Sunday October 1st 2023

This hackathon ran from September 29th to October 1st 2023.

Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote slideshow here, along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures include tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become a co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your submitted demonstration of a multi-agent failure is included in the final report, you will be listed as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that still lack concrete implementations (see the "Ideas" tab). Figuring out whether and how such failure modes can arise is an easy way to get started on this challenge, and it has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.


There are no requirements to join, but we recommend reading up on the topic in the Inspiration and resources section further down. The topic is quite open, but the research field is mostly computer science, so a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow below (this is also presented in the livestream).

Logistics presentation by Esben Kran. The slideshow is also available for download.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow researchers and engineers in machine learning, diving into this exciting and fast-moving field! Join the Discord, where all communication will happen.

Inspiration and resources

Overview papers

Empirical papers

Code bases

  • ChatArena. A framework for building environments for interactions between language models.
  • Welfare Diplomacy. A variant of the Diplomacy environment designed to incentivize and allow for better measurement of cooperative behavior. Includes scaffolding for language model-based agents.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each project is submitted with a PDF report along with your title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts, but your core research work should happen during the hackathon.

Evaluation criteria

Your project reports will, of course, be evaluated too! We will use multiple criteria:

  • Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. One example is the case in which researchers prompted GPT-4 to get a hired human to solve a CAPTCHA for it (see Section 2.9 of the GPT-4 system card).
  • Focused on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only, or especially, worrisome in multi-agent settings and that are relevant to frontier models. We will award bonus points for originality!
  • Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available, or is it in a Google Colab notebook that we can run through without problems?

Schedule

Subscribe to the calendar.

  • Friday 17:00 UTC: Keynote talk with Lewis Hammond and Esben Kran
  • Saturday afternoon: Project discussions with Lewis Hammond
  • Saturday evening: Two technical talks from Joel Leibo, Christian Schroeder de Witt, and Sasha Vezhnevets, with a 15-minute break in between
  • Sunday afternoon: Project discussions with Lewis Hammond
  • Sunday evening: Ending session with Lewis Hammond
  • Monday morning: Submission deadline

Keynote speakers

Lewis Hammond

Acting Executive Director of the Cooperative AI Foundation, DPhil Candidate at the University of Oxford
Keynote speaker & co-organizer

Christian Schroeder de Witt

Postdoc in the FLAIR group at the University of Oxford who helped establish the field of deep multi-agent reinforcement learning
Speaker

Sasha Vezhnevets

Researcher at DeepMind and previously at University of Edinburgh
Speaker

Joel Leibo

Senior staff scientist at DeepMind and behind Melting Pot, a multi-agent reinforcement learning evaluation environment
Speaker

Esben Kran

Founder and CEO of Apart Research and previously lead data scientist and brain-computer interface researcher
Keynote speaker & co-organizer

Judges

Lewis Hammond

Executive Director of the Cooperative AI Foundation
Judge

Jesse Clifton

Executive Director at Center on Long-Term Risk and research analyst with the Cooperative AI Foundation
Judge

Akbir Khan

Research analyst at the Cooperative AI Foundation and PhD student at UCL working on scalable oversight
Judge

Alan Chan

PhD student at MILA, Université de Montréal
Judge

Jason Hoelscher-Obermaier

Research Lead at Apart Research, PhD in experimental physics, previously ARC, Iris.ai
Judge

Fazl Barez

Research Director at Apart Research, PhD in robotics and control
Judge

More to be announced!

Registered jam sites

Multi-Agent Safety Hackathon at Center on Long-Term Risk (London)
CLR offices are hosting a hackathon site from 10am-10pm UK time on Saturday and Sunday.
[CANCELED] Center on Long-Term Risk
Multi-Agent Safety Hackathon with EffiSciences (Paris)
The French AI safety hub is hosting the hackathon in Paris at 45 rue d'Ulm (rooms will change over the weekend).
ENS Ulm, Paris
Stanford AI Alignment Multi-Agent Hackathon
Stanford AI Alignment is hosting a local site in the Gates CS building all day Saturday and Sunday! Watch the SAIA email list/Slack if you're already on it, or contact gmukobi@stanford.edu to join!
Stanford University
Multi-Agent Safety Hackathon Hub at EPFL
On campus; the exact location is TBD, but it will probably be Rolex or DLL.
Copenhagen Multi-Agent Safety Hackathon
Join us in Copenhagen to contribute to exciting multi-agent safety research. We will provide food and drinks for your research journey!
Copenhagen, the Apart Offices

Register your own site

The in-person hubs for the Alignment Jams are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research and engineering community. Read more about organizing.

Submit your project

Use the project submission template for your PDF submission. Record your slideshow or project using the built-in recording feature of, e.g., Keynote, PowerPoint, or Slides (using Vimeo).

EscalAtion: Assessing Multi-Agent Risks in Military Contexts
Gabriel Mukobi*, Anka Reuel*, Juan-Pablo Rivera*, Chandler Smith*
EscalAltion

Jailbreaking the Overseer
Alexander Meinke
AlexM

LLMs With Knowledge of Jailbreaks Will Use Them
Jack Foxabbott, Marcel Hedman, Kaspar Senft, Kianoosh Ashouritaklimi
Jailbreakers

Second-order Jailbreaks
Mikhail Terekhov, Romain Graux, Denis Rosset, Eduardo Neville, Gabin Kolly
Jailbroken

Exploring multi-agent interactions in the dollar auction
Thomas Broadley, Allison Huang
Thomas and Allison

The artificial wolves of Millers Hollow
Léo Dana, Quentin Feuillade-Montixi, Florent Tavernier
Paris-Garou

Escalation and stubbornness caused by hallucination
Filip Sondej
Team Consciousness

AI Defect in Low Payoff Multi-Agent Scenarios
Esben Kran

LLM agent topic of conversation can be manipulated by external LLM agent
Magnus Tvede Jungersen
Pico Pizza

Can Malicious Agents Corrupt the System?
Matthieu David, Maximilien Dufau, Matteo Papin
MA³chiavelli

Jailbreaking is Incentivized in LLM-LLM Interactions
Abhay Sheshadri, Jannik Brinkmann, Victor Levoso
Shoggoth Psychology

The Firemaker
Roland Pihlakas
AIntelope

Balancing Objectives: Ethical Dilemmas and AI's Temptation for Immediate Gains in Team Environments
Dhruv Kaul
Team Dhruv

Can collusion between advanced AI Agents remain perfectly undetectable?
Mikhail Baranchuk, Sumeet Motwani, Dr. Christian Schroeder de Witt
Team PerfectCollusion

LLM Collectives in Multi-Round Interactions: Truth or Deception?
Paolo Bova, Matthew J. Lutz, Mahan Tourkaman, Anushka Deshpande, Thorbjørn Wolf
Team God Bear

Cooperative AI is a Double Edged Sword
Aidan O'Gara, Ashwin Balasubramanian
USC AI Safety

Risk assessment through a small-scale simulation of a chemical laboratory
Andres M Bran, Bojana Rankovic, Theo Neukomm
CHEVAPI

Exploring Failures: Assessing Large Language Model in General Sum Games with Imperfect Information Against Human Norms
Ziyan Wang, Shilong Deng, Zijing Shi, Meng Fang, Yali Du
Cooperative AI Lab

Missing Social Instincts in LLMs
Sumeet
Team LLMs

Emergent Deception from Semi-Cooperative Negotiations
Blake Elias, Anna Wang, Andy Liu
Godless Bears

Do many interacting LLMs perform well in the N-Player Prisoner’s Dilemma Game?
Shuqing Shi, Xuhui Liu, Yudi Zhang, Meng Fang, Yali Du
PD's Team

Send in pictures of you having fun hacking away!

We love to see the community flourish, and it's always great to see any pictures you're willing to share uploaded here.

Saturday hacking away
3 AM at the Friday aftermath!
The Copenhagen office during the intro keynote talk
[admin] preliminary system test before the weekend starts