Find Dangerous Multi-Agent Failures

As AI systems proliferate and become increasingly agent-like, they will interact with each other and with humans in new ways. These new multi-agent systems will create entirely new risk surfaces. Follow along or rewatch the keynote livestream below. You can see the keynote slideshow here along with the logistics keynote here.

During this hackathon, you will search for the most concerning failures specific to systems of multiple AIs. Potential failures involve tragedies of the commons, destructive conflict, collusion, and destabilizing emergent dynamics.

As part of this hackathon, you will have the chance to become a co-author on a large report on multi-agent risk with the Cooperative AI Foundation and more than 35 co-authors from institutions including UC Berkeley, Oxford, Cambridge, Harvard, and DeepMind. If your submitted demonstration of a multi-agent failure fits into the final report, you will be included as a co-author.

Several senior co-authors have already suggested a range of ideas for possible failure mode demonstrations that are currently lacking concrete implementations (see the "Ideas" tab). Figuring out how and whether such failure modes are possible is an easy way to get started on this challenge and has the advantage of already being linked to content in the report, but we also welcome your own ideas! The Cooperative AI Foundation, Apart Research, and their colleagues will be on hand to provide guidance and collaboration where possible.

There are no requirements to join, but we recommend that you read up on the topic in the Inspiration and resources section further down. The topic is quite open, but the research field is mostly computer science, so a background in programming and machine learning definitely helps. We're excited to see you!

Get an overview of the hackathon and specific links in the slideshow below (this is also presented in the livestream).

Logistics presentation by Esben Kran. Download the slideshow instead.

Alignment Jam hackathons

Join us in this iteration of the Alignment Jam research hackathons to spend a weekend with fellow engaged machine learning researchers and engineers diving into this exciting and fast-moving field! Join the Discord, where all communication will happen.

Inspiration and resources

Overview papers

TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI (Critch & Russell, 2023). Several of the “Stories” may suggest directions for demonstrations.
Open Problems in Cooperative AI (Dafoe et al., 2020). You might consider looking at failures of each of the “cooperative capabilities” listed in the paper, or at potential downsides from Cooperative AI.
ARCHES (AI Research Considerations for Human Existential Safety) paper (Critch & Krueger, 2020). See Sections 6-9 for inspiration. Read this related paper by David Manheim as well (Manheim, 2019).

Empirical papers

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark (Pan et al., 2023). Uses LMs to evaluate the extent to which LMs behave ethically, along a number of different dimensions, in text adventure games. You could consider a similar method for evaluating properties that might undermine social welfare in interactions between several agents.
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback (Fu et al., 2023). Looks at LMs playing negotiation games and learning from experience via self-critique, and finds that LMs become harder bargainers after learning from experience. You could consider similar methods for studying whether desirable properties are stable under multi-agent learning dynamics.
Artificial Intelligence, Algorithmic Pricing, and Collusion (Calvano et al., 2020). Investigates cooperation and collusion between Q-learning agents in an economic oligopoly game setting. Consider how AI systems working together might be negative for human welfare.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas (Leibo et al., 2017). Shows how conflict can arise in mixed-motive settings. The lead author Joel Leibo is a speaker at the hackathon.
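
As a starting point for demos in the spirit of the pricing and collusion work above, here is a minimal sketch of two independent Q-learning agents repeatedly setting prices against each other. It is not the setup from Calvano et al. (2020): the price grid, the winner-takes-all demand rule, the learning parameters, and the names `PRICES`, `profits`, and `run_pricing_game` are all assumptions made for illustration. In this toy game, undercutting pays whenever both firms price above 1.5, so an average learned price that stays above 1.5 would be one hedged hint of tacit collusion worth investigating further.

```python
import random

# Hypothetical price grid and marginal cost, chosen only for illustration.
PRICES = [1.0, 1.25, 1.5, 1.75, 2.0]
COST = 1.0


def profits(i, j):
    """Per-period profits for price indices (i, j).

    Assumed demand rule: the cheaper firm serves the whole (unit) market,
    and equal prices split it evenly.
    """
    pi, pj = PRICES[i], PRICES[j]
    if pi < pj:
        return pi - COST, 0.0
    if pj < pi:
        return 0.0, pj - COST
    return (pi - COST) / 2, (pj - COST) / 2


def run_pricing_game(steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Train two independent epsilon-greedy Q-learners on the repeated game.

    Returns the two Q-tables and the average price over the last
    (up to) 1000 periods.
    """
    rng = random.Random(seed)
    n = len(PRICES)
    # One Q-table per agent; the state is last period's pair of price indices.
    q = [{(a, b): [0.0] * n for a in range(n) for b in range(n)}
         for _ in range(2)]
    state = (rng.randrange(n), rng.randrange(n))
    history = []
    for t in range(steps):
        # Exploration rate decays linearly and is floored at 0.01.
        eps = max(0.01, 1.0 - t / (0.8 * steps))
        actions = []
        for k in range(2):
            if rng.random() < eps:
                actions.append(rng.randrange(n))
            else:
                row = q[k][state]
                actions.append(row.index(max(row)))
        rewards = profits(actions[0], actions[1])
        next_state = (actions[0], actions[1])
        for k in range(2):
            # Standard one-step Q-learning update for each agent separately.
            target = rewards[k] + gamma * max(q[k][next_state])
            q[k][state][actions[k]] += alpha * (target - q[k][state][actions[k]])
        state = next_state
        history.append(next_state)
    tail = history[-1000:]
    avg_price = sum(PRICES[i] + PRICES[j] for i, j in tail) / (2 * len(tail))
    return q, avg_price


if __name__ == "__main__":
    _, avg = run_pricing_game()
    print(f"Average price over the final periods: {avg:.3f}")
```

Results from a sketch like this depend heavily on the seed, the exploration schedule, and the demand rule, so a convincing demo would report several seeds and compare against a clearly stated competitive benchmark.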

Code bases

ChatArena. A framework for building environments for interactions between language models.
Welfare Diplomacy. A variant of the Diplomacy environment designed to incentivize and allow for better measurement of cooperative behavior. Includes scaffolding for language model-based agents.

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page. Each submission consists of a PDF report together with a title, summary, and description. There will be a team-making event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts, but your core research work should happen during the hackathon.

Evaluation criteria

Your reports will of course be evaluated as well! We will use multiple criteria:

Compelling Narrative: Demos should ideally capture important themes, be accessible to policymakers and laypeople, and present a compelling narrative. One example is the demonstration in which researchers prompted GPT-4 to get a CAPTCHA solved by hiring a human (see Section 2.9 of the GPT-4 system card).
Focussed on Multiple Frontier Systems: To set these demos apart from other work, we are especially interested in concrete examples that are only/especially worrisome in multi-agent settings, and that are relevant to frontier models. We will be awarding bonus points for originality!
Easily Replicable: Demos should ideally be clear, well-documented, and easily replicable. For instance, is the code openly available or is it within a Google Colab that we can run through without problems?

Schedule

Subscribe to the calendar.

Friday 17:00 UTC: Keynote talk with Lewis Hammond and Esben Kran
Saturday afternoon: Project discussions with Lewis Hammond
Saturday evening: Two technical talks from Joel Leibo, Christian Schroeder de Witt and Sasha Vezhnevets with a 15-minute break in between
Sunday afternoon: Project discussions with Lewis Hammond
Sunday evening: Ending session with Lewis Hammond
Monday morning: Submission deadline

Keynote speakers

Lewis Hammond

Acting Executive Director of the Cooperative AI Foundation, DPhil Candidate at the University of Oxford

Keynote speaker & co-organizer

Christian Schroeder de Witt

Postdoc in the FLAIR group at the University of Oxford who helped establish the field of deep multi-agent reinforcement learning

Speaker

Sasha Vezhnevets

Researcher at DeepMind and previously at the University of Edinburgh

Speaker

Joel Leibo

Senior staff scientist at DeepMind and behind Melting Pot, a multi-agent reinforcement learning evaluation environment

Speaker

Esben Kran

Founder and CEO of Apart Research and previously lead data scientist and brain-computer interface researcher

Keynote speaker & co-organizer

Judges

Lewis Hammond

Executive Director of the Cooperative AI Foundation

Judge

Jesse Clifton

Executive Director at Center on Long-Term Risk and research analyst with the Cooperative AI Foundation

Judge

Akbir Khan

Research analyst at the Cooperative AI Foundation and UCL PhD in scalable oversight

Judge

Alan Chan

PhD student at MILA, Université de Montréal

Judge

Jason Hoelscher-Obermaier

Research Lead at Apart Research, PhD in experimental physics, previously ARC, Iris.ai

Judge

Fazl Barez

Research Director at Apart Research, PhD in robotics and control

Judge

More to be announced!

Registered jam sites

Multi-Agent Safety Hackathon at Center on Long-Term Risk (London)

CLR offices are hosting a hackathon site from 10am-10pm UK time on Saturday and Sunday.

[CANCELED] Center on Long-Term Risk

Multi-Agent Safety Hackathon with EffiSciences (Paris)

The French AIS hub hosts the hackathon in Paris at 45 rue d'Ulm (rooms will change during the weekend).

ENS Ulm, Paris

Stanford AI Alignment Multi-Agent Hackathon

Stanford AI Alignment is hosting a local site in the Gates CS building all day Saturday and Sunday! Watch the SAIA email list/Slack if you're already on it, or contact gmukobi@stanford.edu to join!

Stanford University

Multi-Agent Safety Hackathon Hub at EPFL

On campus - Exact location TBD but probably Rolex or DLL

EPFL

Copenhagen Multi-Agent Safety Hackathon

Join us in Copenhagen to contribute to exciting multi-agent safety research. We will provide food and drinks for your research journey!

Copenhagen, the Apart Offices

Submit your project

Use the project submission template for your PDF submission. Record your slideshow or project using the recording feature of, e.g., Keynote, PowerPoint, or Slides (using Vimeo).

You have successfully submitted! You should receive an email and your project should appear here. If not, contact operations@apartresearch.com.
