
The Agency Foundations Challenge

Bob Roberts
Filip Sondej
Jord
Vansh Gehlot
Piotr Zaborszczyk
Ibrahim Ola Garba
Slava Meriton
Codruta Lugoj
Catherine Brewer
Marco Bazzani
Sai Joseph
Vanshi Puri
Anton Zheltoukhov
Anastasia
Tanya Grinyuk
Mary Osuka
Aksinya
ram bharadwaj
Irina
Helios Lyons
Roman M
Aleksei
Andrey
Yury Orlovskiy
Giorgio Michele Scolozzi
Chris
Ibrahim
Uzay
Nick Osipov
Mikhail
Corey Morris
Parameswaran Kamalaruban
Oleg L.
Ilham ElAalami
aman jaiswal
Jord
Yijin Hua
Carson Ellis
Natalia
Julia Karbing
Luke Frymire
Tu Trinh
Thiago Henrique Silva
Carolina
Simon Lermen
Luigi
Nikita Menon
Mateusz Bagiński
Vladislav Bargatin
Danish Raza
Thomas James Ringstrom
Vansh
Sergej Maul
Chris
Juliette Culver
Nisan Stiennon
Logan Woods
Jonas Hallgren
Ariel Kwiatkowski
Heramb Podar
Keep name private
Mike
Kanishk Garg
Aishwarya Gurung
Stefano Schmidt
Kushal Thaman
Roman Hauksson
Indra Gesink
Tomas Dulka
Jeanne
Ashwini kumar Pal
Michael Andrzejewski
P
Ziyue Wang
Jan Provaznik
Jason Hoelscher-Obermaier
Benjamin Sturgeon
Michał Kubiak
Aishwarya Gurung
Nicole Phan
Javier Prieto
Igor Krawczuk
Harry Powell
Matija Franklin
Kerem
Danish Raza
Jobst Heitzig
Gerson foks
Marco Bazzani
Mateusz Bagiński
Konstantin Gulin
suvjectibity
ANDREW KOH
Abhimanyu
Tereza Okalova
Aishwarya Gurung
Edward
Tassilo Neubauer
Catalin Mitelut
Brady
Clem
Jonas Kgomo
Admin
Esben Kran
Signups
Uncertainty about value naturally leads to empowerment
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
Discovering Agency Features as Latent Space Directions in LLMs via SVD
ILLUSION OF CONTROL
Comparing truthful reporting, intent alignment, agency preservation and value identification
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
Agency as Shannon information. Unveiling limitations and common misconceptions
Agency, value and empowerment.
Against Agency
Evaluating Myopia in Large Language Models
Entries
Hosted by Agency Foundations and Catalin Mitelut in collaboration with Apart Research
September 8th to September 24th 2023
Sign up

This challenge ran from September 8th to September 24th 2023.

Rewatch the keynote if you weren't there for the start, and get free replicate.ai credits for your work. See the existing resources on AI safety, reinforcement learning, and interpretability for topics 1 through 3. Happy research hacking!

Explore agency foundations research for the development & alignment of AI systems

Ever wondered how human agency, i.e. the capacity to (causally) control the world, will interact with increasingly powerful AI and future AGI systems that may also want to control the world? Do you question whether AGIs trained to focus on truthfulness, or that are "intent aligned", are sufficiently safe? Us too!

Join us on four research tracks in this two-week challenge, which kicks off with a hackathon hosted with Alignment Jams. Submit your final project at the end of the two weeks on this page.

We are developing an agency foundations paradigm to start researching agency in AI-human interactions, and are kicking off our work with a hackathon. We selected a few topics to start: figuring out how to algorithmically describe agency "preservation", mechanistically interpreting how neural networks represent agents and their capacities, and describing challenges in the governance of agency-preserving AI systems. More information about our conceptual goals for this hackathon is provided here: https://www.agencyfoundations.ai/hackathon.

  • Start:  Introductory talks - September 8th: 18:00-19:30 CET.
  • End: Submission deadline - September 24th night (any timezone).
  • Location: Online/Remote
  • Topics: (1) mechanistic interpretability; (2) RL/IRL; (3) game theory; (4) conceptual/governance (see here for more details)
  • Prizes: US$10,000 ($2,500 in each category)
  • Format: online submissions.

More details about specific prizes, categories, and additional information will be posted in the first week of September.

Sign up below to be notified before the kickoff! Read up on the schedule, see instructions for how to participate, and find inspiration on the agency foundations website.

Jump on the Discord and ask your preliminary questions in the #❓| questions channel!

Join Ibrahim, Simon Lermen, Aishwarya Gurung, and others!

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission page (available when the hackathon starts). Each project consists of multiple parts: 1) the PDF report, 2) an optional video overview of at most 10 minutes, and 3) a title, summary, and description.

You are allowed to think about your project and engage with the starter resources before the hackathon starts, but your core research work should happen during the hackathon itself.

Tentative schedule

Subscribe to the calendar.

  • Friday September 8, 16:00 UTC: Keynote talk by Catalin Mitelut to inspire your projects and provide an introduction to the topic. Tim Franzmeyer will present his work on altruistic RL agents. Esben Kran will also give a short overview of the logistics.
  • Saturday and Sunday 14:00 UTC: Project discussion sessions on the Discord server.
  • Friday September 22nd 14:00 UTC: A discussion and short talk.
  • Sunday September 24th night (all time zones): Submission deadline!

Past experiences

See what our great hackathon participants have said
Jason Hoelscher-Obermaier
Interpretability hackathon
The hackathon was a really great way to try out research on AI interpretability and get in touch with other people working on this. The input, resources, and feedback provided by the team organizers, and in particular by Neel Nanda, were super helpful and very motivating!
Luca De Leo
AI Trends hackathon
I found the hackathon very cool; I think it significantly lowered my hesitance to participate in things like this in the future. A whole bunch of lessons learned, and Jaime and Pablo were very kind and helpful through the whole process.

Alejandro González
Interpretability hackathon
I was not that interested in AI safety and didn't know that much about machine learning before, but I heard about this hackathon thanks to a friend, and I don't regret participating! I've learned a ton, and it was a refreshing weekend for me.
Alex Foote
Interpretability hackathon
A great experience! A fun and welcoming event with some really useful resources for starting to do interpretability research. And a lot of interesting projects to explore at the end!
Sam Glendenning
Interpretability hackathon
Was great to hear directly from accomplished AI safety researchers and try investigating some of the questions they thought were high impact.

Keynote speakers

Catalin Mitelut

Postdoc at University of Basel and NYU studying neuroscience and human behavior manipulation by AI
Keynote speaker & organizer

Tim Franzmeyer

PhD student in cooperative AI and reinforcement learning with Philip Torr and others at Oxford University
Keynote speaker & judge

Esben Kran

Founder and CEO of Apart Research and previously lead data scientist and brain-computer interface researcher
Keynote speaker & organizer

Judges

Tushant Jha (TJ)

Director of Research at the AI Objectives Institute focused on AI strategy for agency amplification
Judge

Geoffrey Miller

Associate Professor at the University of New Mexico working on evolutionary psychology
Judge

Konrad Seifert

Co-CEO of the Simon Institute for Long-Term Governance
Judge

Erik Jenner

PhD student advised by Stuart Russell at CHAI with a research focus on alignment
Judge

Ben Smith

Researching multi-objective reinforcement learning to value-align AI
Judge

Catalin Mitelut

Postdoc at University of Basel and NYU studying neuroscience and human behavior manipulation by AI
Keynote speaker, judge & organizer

Tim Franzmeyer

PhD student in cooperative AI and reinforcement learning with Philip Torr and others at Oxford University
Keynote speaker & judge

Esben Kran

Founder and CEO of Apart Research and previously lead data scientist and brain-computer interface researcher
Judge

Secret governance judge

Has requested to remain anonymous for now.
Judge

More to be announced!

Starter resources

Check out the core starter resources that help you get started with your research as quickly as possible! The Colab links will be updated before the kickoff.

Research project ideas

Get inspired for your own projects with these ideas developed during the reading groups! Go to the Resources tab to engage more with the topic.

Registered jam sites

These jam sites hosted the kickoff hackathon over the weekend of September 8th to 10th.

AI Agency Foundations Hackathon
Research how AI agents work and how to keep them safe over a weekend! Currently only for students at the University of Texas at Dallas. goo.gl/maps/9M9r2q5wNEjYpTba8
Visit event page
UTDesign Makerspace
Agency Foundations Alignment Jam London
RSVP via Facebook or post in "uk" channel on Alignment Jam website for exact details
Visit event page
LEAH Coworking Space
AI agency hackathon
Join us at Fixed Point in Prague - Vinohrady, Koperníkova 6 for a weekend of research hacking to understand AI agency!
Visit event page
Prague Fixed Point
Global Agency Hackathon
Join everyone internationally on our Discord and get together in teams to solve the most important problems of agency preservation!
Visit event page
Alignment Jam Discord
Kraków Jam Site
Contact @matthewbaggins on Discord or bagginsmatthew@gmail.com. Hosted at ul. Celna 6/9, the office of The Optimum Pareto Foundation.
Visit event page
Kraków, Poland
Alignment Hackathons
A local space near Lomonosov Moscow State University. Location: Russia, Moscow, Lomonosovskiy Prospekt, 25k3
Visit event page
Turing Coffee Machine (EA Moscow)
Multi-Agent Safety Hackathon Moscow
A local space near Lomonosov Moscow State University. Location: Russia, Moscow, Lomonosovskiy Prospekt, 25k3
Visit event page
Turing Coffee Machine (EA Moscow)

Register your own site

The in-person hubs for the Alignment Jams are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research and engineering community. Read more about organizing and use the media below to set up your event.

Submit your project

Use this template for the report submission. As you create your project presentation, upload your slides here too. Make a recording of your slideshow or project with the recording capability of e.g. Keynote, PowerPoint, or Google Slides (uploaded e.g. via Vimeo).

For technical submissions within categories 1 through 3, you will have a maximum of 6 pages excluding limitations, appendix and references. For conceptual and governance submissions, you have a maximum of 10 pages excluding limitations, appendix and references.

Authors & links disabled for anonymous reviews
Evaluating Myopia in Large Language Models
Empirically investigating the myopia, or lack thereof, of Llama models
Interpretability of Agency
Read project
Authors & links disabled for anonymous reviews
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
Proposing a study to evaluate agency loss in human players pitted against their own memetic models fine-tuned in a feedback loop, inspired by recommender systems influencing information consumption patterns in social media.
Conceptual / Governance of Agency
Read project
Authors & links disabled for anonymous reviews
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
This paper investigates several techniques to implement altruistic RL while preserving agency. Using a two-player grid game, we train a helper agent to support a lead agent in achieving its goals. By training the helper without showing it the goal and resampling the goals to rebalance for unequal value distributions, we demonstrate that helpers can act altruistically without observing the goals of the lead. We also initiate exploration of a technique to encourage corrigibility and respect for personal agency by resampling the lead's values during training, and point toward how these techniques could translate to real-world situations through meta-learning. (An illustrative sketch of goal resampling follows below.)
Agency-Preserving Reinforcement Learning
Read project
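To make the goal-resampling idea concrete, here is a minimal illustrative sketch, not the entry's actual code: the goal set, the inverse-frequency weighting, and the loop are all hypothetical stand-ins for whatever the authors used.

```python
import random

# Hypothetical goal set for the lead agent in a two-player grid game.
GOALS = ["reach_apple", "reach_key", "reach_door"]
counts = {g: 0 for g in GOALS}  # how often each goal has appeared in training

def resample_goal():
    """Inverse-frequency sampling: goals that have appeared less often are
    drawn more often, rebalancing an unequal distribution over values."""
    weights = [1.0 / (1 + counts[g]) for g in GOALS]
    goal = random.choices(GOALS, weights=weights, k=1)[0]
    counts[goal] += 1
    return goal

for episode in range(6):
    goal = resample_goal()
    # The helper's observation deliberately excludes `goal`; it would be
    # trained only on the lead's observable behavior (training step omitted).
    print(episode, goal)
```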
Authors & links disabled for anonymous reviews
Against Agency
I argue that agency is overrated when thinking about good futures, and that longtermist AI governance should instead focus on preserving and promoting human autonomy.
Conceptual / Governance of Agency
Read project
Authors & links disabled for anonymous reviews
Agency as Shannon information. Unveiling limitations and common misconceptions
We consider similarities between Shannon information, entropy, and agency. We argue that agency is an agent-independent, observer-dependent property. We discuss agency in the context of empowerment and argue that AI safety should be concerned with both. We also draw a connection between quantifiable agency and agency as described in the social sciences. (The standard information-theoretic definition of empowerment is given below.)
Conceptual / Governance of Agency
Read project
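As background for the empowerment discussion in this entry, the standard definition of n-step empowerment (due to Klyubin, Polani, and Nehaniv) is the Shannon channel capacity from an agent's n-step action sequence to the resulting state:

```latex
\mathfrak{E}(s_t)
  = \max_{p(a^n_t)} I\!\left(A^n_t;\, S_{t+n} \mid s_t\right)
  = \max_{p(a^n_t)} \left[ H\!\left(S_{t+n} \mid s_t\right) - H\!\left(S_{t+n} \mid A^n_t, s_t\right) \right]
```

The maximization is over distributions of action sequences, so the quantity measures, in bits, how much an agent can influence its future state from $s_t$.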
Authors & links disabled for anonymous reviews
Discovering Agency Features as Latent Space Directions in LLMs via SVD
Understanding the capacity of large language models to recognize agency in other entities is an important research endeavor in AI safety. In this work, we adapt techniques from a previous study to tackle this problem on GPT-2 Medium. We utilize Singular Value Decomposition to identify interpretable feature directions, and use GPT-4 to automatically determine whether these directions correspond to agency concepts. Our experiments show evidence suggesting that GPT-2 Medium contains concepts associating actions on agents with changes in their state of being. (A minimal sketch of the SVD step follows below.)
Interpretability of Agency
Read project
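For readers who want the flavor of the SVD step, here is a minimal sketch assuming one common variant of the technique: factor an MLP output matrix and read its singular directions through the unembedding. The layer index, matrix choice, and top-k are arbitrary illustrative choices, not the entry's actual setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
tok = GPT2TokenizerFast.from_pretrained("gpt2-medium")

# MLP output matrix of an arbitrary mid/late layer; the rows of Vh from its
# SVD are orthonormal directions in the residual stream.
W_out = model.transformer.h[18].mlp.c_proj.weight.detach()  # (4*d_model, d_model)
U, S, Vh = torch.linalg.svd(W_out, full_matrices=False)

# Interpret each singular direction by projecting it through the unembedding
# and reading off the highest-logit tokens ("logit lens" style).
W_U = model.lm_head.weight  # (vocab_size, d_model)
for i in range(5):
    top = torch.topk(W_U @ Vh[i], 10).indices
    print(f"direction {i}:", tok.convert_ids_to_tokens(top.tolist()))
```

A classifier pass over these token lists (the entry uses GPT-4) could then label which directions look agency-related.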
Authors & links disabled for anonymous reviews
Comparing truthful reporting, intent alignment, agency preservation and value identification
A universal approach can be created artificially, by gathering qualities of the different approaches on this list and beyond.
Agency-Preserving Reinforcement Learning
Read project
Authors & links disabled for anonymous reviews
Uncertainty about value naturally leads to empowerment
I discuss some problems with measuring empowerment by the "number of reachable states". I then propose a more robust measure based on uncertainty about ultimate value. I hope that by the end you will find the new measure obviously natural. I also provide a Gymnasium environment well suited to experimenting with optionality and value uncertainty. (A toy version of the reachable-states baseline is sketched below.)
Agency-Preserving Reinforcement Learning
Read project
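To make the baseline being critiqued concrete, here is a self-contained toy version of "empowerment as number of reachable states" on a small deterministic gridworld. The grid size, horizon, and dynamics are hypothetical; this is not the entry's Gymnasium environment.

```python
from itertools import product

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 5  # 5x5 grid

def step(state, action):
    # Deterministic dynamics with walls: moves are clipped at the grid edges.
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    return (r, c)

def reachable_state_empowerment(state, k):
    """Count the distinct states reachable by some k-step action sequence."""
    outcomes = set()
    for seq in product(ACTIONS, repeat=k):
        s = state
        for a in seq:
            s = step(s, a)
        outcomes.add(s)
    return len(outcomes)

print(reachable_state_empowerment((0, 0), 2))  # corner: 6 reachable states
print(reachable_state_empowerment((2, 2), 2))  # center: 9 reachable states
```

One weakness such a count inherits is that it values all reachable states equally, which is exactly the gap a value-uncertainty-based measure targets.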
Authors & links disabled for anonymous reviews
Agency, value and empowerment.
Our project involves building on the paper "Learning Altruistic Behaviours in Reinforcement Learning without External Rewards" by Franzmeyer et al., first by trying to replicate the paper, and then by advancing research in this direction by including measures of the value of states for the leader agent in its empowerment calculations.
Agency-Preserving Reinforcement Learning
Read project
Authors & links disabled for anonymous reviews
ILLUSION OF CONTROL
This paper looks at the illusion of control held by individuals. AI has the capability to deceive human beings in order to evade safety nets. The covertness with which AI interferes with decision-making creates an illusion of control for human beings. The paper describes the different deceptive measures that AI incorporates, and possible measures to ensure governance of AI.
Conceptual / Governance of Agency
Read project

Send in pictures of you having fun hacking away!

We love to see the community flourish and it's always great to see any pictures you're willing to share uploaded here.

Q&A from the great first talk with Catalin!