
AI Model Evaluations Hackathon

Signups

George, Lucie Philippon, Manuel Hettich, Heramb Podar, Zachary Farndon, Michael Batavia, Matthew Lutz, Alice Rigg, Matteo Eleouet, Tassilo Neubauer, Akash Kundu, Rafael, Ariel Gil, Troels Nielsen, Paula Petcu, Naureen Nayyar, Siddhant Arora, Aishwarya Gurung, Finn Metz, Cenny Wenner, Tetra Jones, Jannes Elstner, Liam Alexander, Jaime, Henri van Soest, David Noever, Ben Xu, Constantin Weisser, Evan Harris, Tomas Dulka, Jacob Haimes, Paolo Bova, Ram Bharadwaj, Priyanka Tiwari, Jiaye Liu, Anasta, Brendan, Jord, Quentin, Corey Morris, Rebecca, Thee Ho, Jason Hoelscher-Obermaier, Johnny Lin, Mike, Esben Kran
Friday, November 24th, 19:00 UTC to Sunday, November 26th, 2023

This hackathon ran from November 24th to November 26th 2023.

Join the fun three-day research sprint!

Governments are struggling to assess the risks of AI, and many organizations want to understand where they can deploy AI safely and where they cannot. Join us this weekend to uncover those risks and design evaluations for understanding the dangerous capabilities of language models!

Watch the keynote live, or watch the recording below.

We start the weekend with a livestreamed and recorded keynote talk that introduces the topic and the schedule for the weekend. Saturday and Sunday feature mentoring sessions (office hours), and we encourage you to show up on the Discord for them. On Wednesday the 29th, we host project presentations where the top projects showcase their results.

Get an introduction to doing evaluations from Apollo Research and see the "Ideas" tab for suggestions on what to spend your weekend on. You will find more inspiration further down the page, along with the judging criteria.

Thank you to our gold sponsor Apollo Research.

Join Esben Kran, Ariel Gil, Jason Hoelscher-Obermaier, and others!

There are no requirements to join, but we recommend that you read up on the topic in the inspiration and resources section further down.

Alignment Jam hackathons

The Alignment Jam hackathons are research sprints on topics in AI safety and security where you spend a weekend with fellow engaged researchers diving into exciting and important research problems. Join the Discord, where most communication and talks will happen, and visit the front page.

How to get started with evaluations research

Read the introductory guide by Apollo Research on how to do evaluations research at the link here. You can find ideas for your evaluations research both in the "Ideas" tab and in their live-updated document here.

Methodology in the field of evaluations

We are especially interested in methodologies for model evaluation, since this is an important open question. A few of the main issues with existing AI model evaluation methods include:

  • Static benchmarks are becoming saturated, as state-of-the-art models are already performing very well on many standard tests. This makes it hard to distinguish between more and less capable systems.
  • Benchmarks are often distant from real-world uses, so high performance may not translate to usefulness.
  • Internal testing by companies can be more realistic, but results are hard to compare across organizations.
  • Measuring real-world usefulness (e.g. developer productivity) requires carefully designing proxies based on the specifics of the task, which is challenging.
  • Eliciting capabilities is currently more art than science. What works for one model may not work for another.
  • Fine-tuning models for evaluations introduces questions about how much training to provide and how to accurately evaluate performance on the fine-tuned task.
  • Testing for model alignment is extremely difficult with little consensus on good approaches so far. Potential methods like red teaming have limitations.
  • Evaluating oversight methods for more advanced models than we have today remains very challenging. Approaches like sandwiching provide some help but may not fully generalize.

In summary, existing methods tend to be static, simplistic, non-comparable, and frequently disconnected from real uses.
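
To make the saturation problem concrete, here is a minimal sketch of the static multiple-choice benchmark setup described in the list above. Everything in it is hypothetical: the toy benchmark items and the `query_model_a` / `query_model_b` functions stand in for whatever dataset and model API you actually use. The point is only that once several models score near the ceiling, the accuracy gap stops carrying information.

```python
# Minimal sketch of a static multiple-choice benchmark evaluation.
# All names here (query_model_a, query_model_b, the toy items) are hypothetical
# placeholders, not any particular API or dataset.
from typing import Callable, Dict, List


def accuracy(model: Callable[[str], str], benchmark: List[Dict[str, str]]) -> float:
    """Fraction of benchmark items the model answers exactly correctly."""
    correct = 0
    for item in benchmark:
        prediction = model(item["question"]).strip().upper()
        correct += prediction == item["answer"]
    return correct / len(benchmark)


# Toy benchmark; a real benchmark would have thousands of items.
benchmark = [
    {"question": "2 + 2 = ?  (A) 3  (B) 4  Answer with A or B.", "answer": "B"},
    {"question": "The capital of France is?  (A) Paris  (B) Rome  Answer with A or B.", "answer": "A"},
]


def query_model_a(prompt: str) -> str:  # hypothetical model call, e.g. an API request
    return "B" if "2 + 2" in prompt else "A"


def query_model_b(prompt: str) -> str:  # a second hypothetical model
    return "B" if "2 + 2" in prompt else "A"


# Both models score 1.0, so the benchmark is saturated: the accuracy gap (0.0)
# no longer tells us which system is more capable.
print(accuracy(query_model_a, benchmark), accuracy(query_model_b, benchmark))
```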

So, how can we advance model evaluation methods? That's what we want to find out with you!

Resources

Code bases

  • ChatArena. A framework for building environments for interactions between language models.
  • Welfare Diplomacy. A variant of the Diplomacy environment designed to incentivize and allow for better measurement of cooperative behavior. Includes scaffolding for language model-based agents.
  • Evalugator. A library for running LLM evals based on OpenAI's evals repository.
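
For a feel of the interaction-environment style of evaluation that ChatArena and Welfare Diplomacy support, the sketch below shows the general pattern in plain Python: agents take turns producing messages, and the evaluator scores the resulting transcript afterwards. This is a generic illustration, not the actual API of either library; `run_dialogue`, `cooperation_proxy`, and the scripted agents are hypothetical placeholders for real language-model calls and a real metric.

```python
# Generic sketch of an interaction-environment evaluation: agents take turns,
# then a metric is computed over the transcript. Not the API of ChatArena or
# Welfare Diplomacy; all names here are hypothetical.
from typing import Callable, List, Tuple

# An "agent" maps the transcript so far to its next message.
Agent = Callable[[List[Tuple[str, str]]], str]


def run_dialogue(agents: List[Tuple[str, Agent]], turns: int) -> List[Tuple[str, str]]:
    """Alternate between agents for a fixed number of turns, recording (name, message) pairs."""
    transcript: List[Tuple[str, str]] = []
    for turn in range(turns):
        name, agent = agents[turn % len(agents)]
        transcript.append((name, agent(transcript)))
    return transcript


def cooperation_proxy(transcript: List[Tuple[str, str]]) -> float:
    """Toy metric: fraction of messages containing a cooperative keyword."""
    keywords = ("share", "agree", "together")
    hits = sum(any(k in message.lower() for k in keywords) for _, message in transcript)
    return hits / max(len(transcript), 1)


# Scripted agents standing in for real language-model calls.
alice: Agent = lambda transcript: "I propose we share the resources equally."
bob: Agent = lambda transcript: "Agreed, let's work together on this."

transcript = run_dialogue([("alice", alice), ("bob", bob)], turns=4)
print(cooperation_proxy(transcript))  # 1.0 for these fully cooperative scripts
```

In a real evaluation you would replace the scripted agents with model-backed ones and the keyword proxy with a metric grounded in the environment's rules, such as the cooperative outcomes that Welfare Diplomacy is designed to measure.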

Rules

You will participate in teams of 1-5 people and submit a project on the entry submission tab. Each submission consists of a PDF report together with a title, summary, and description. There will be a team-matching event right after the keynote for anyone who is missing a team.

You are allowed to think about your project before the hackathon starts, but your core research work should happen during the hackathon itself.

Evaluation criteria

Your evaluation reports will, of course, be evaluated as well!

  • Innovative Methodology: Evaluation research is hampered by the fact that many methods have been tried within behavioral and some interpretability domains, yet we have seen few principled methodologies for evaluation. Drawing inspiration from other fields such as experimental physics, cognitive science, and governance, you might be able to invent new methodologies that more accurately capture dangerous capabilities!
  • Compelling Narrative: Demos should ideally capture important themes, be accessible to researchers, and present a compelling narrative. One example is when researchers prompted GPT-4 to get a hired human to solve a CAPTCHA for it (see Section 2.9 of the GPT-4 system card).
  • Easily Replicable: Demos should ideally be clear, well documented, and easily replicable. For instance, is the code openly available, or is it in a Google Colab that we can run through without problems?

Schedule

Subscribe to the calendar.

  • Friday 17:00 UTC: Keynote talk with Marius Hobbhahn
  • Saturday afternoon: Office hour and project discussions
  • Saturday evening: Two inspiring talks, with a 15-minute break in between, by Ella Guest on AI Governance and Rudolf Laine on Situational Awareness Benchmarking
  • Sunday afternoon: Office hour and project discussions
  • Sunday evening: Virtual mixer
  • Monday morning: Submission deadline

Keynote speakers

Marius Hobbhahn

Co-founder and director of Apollo Research.
Keynote speaker

Ella Guest

AI Policy Fellow at the RAND Corporation
Speaker

Rudolf Laine

MATS scholar and ML researcher
Speaker

Esben Kran

Founder and co-director of Apart Research
Keynote speaker & co-organizer

Mentors & Judges

Ollie Jaffe

Dangerous Capabilities Evaluations Contractor at OpenAI
Mentor & Judge

Jacob Pfau

PhD student at the NYU Alignment Research Group
Judge

Marius Hobbhahn

Co-founder and director of Apollo Research.
Judge

Rudolf Laine

MATS scholar and ML researcher
Mentor & Judge

Jason Hoelscher-Obermaier

Research Lead at Apart Research, PhD in experimental physics, previously ARC, Iris.ai
Mentor & Judge

Fazl Barez

Research Director at Apart Research, PhD in robotics and control
Judge

Esben Kran

Founder and co-director of Apart Research
Mentor & Judge

More to be announced!

Registered jam sites

  • Alignment Hackathons: Turing Coffee Machine (EA Moscow), a local space near Lomonosov Moscow State University. Location: Lomonosovskiy Prospekt 25k3, Moscow, Russia.
  • Model Evaluations Hackathon Hub at EPFL: we'll work from the CM120 during the day.
  • Evaluate LLM risk with EffiScience: ENS Ulm, Paris. Details of the location will be communicated by email after registering.
  • 👾 Weekend Hackathon to Attack AI Models: win up to 7,000 DKK as a team by finding the highest-risk cases of modern advanced AI! Compete with people in London and across the world with the AI Model Evaluations Hackathon.
  • LISA Model Evaluations Hackathon: LISA, Techspace, 38-40 Commercial Road, Whitechapel, London, E1 1LN.

Register your own site

The in-person hubs for the Alignment Jams are run by passionate individuals just like you! We organize the schedule, speakers, and starter templates, and you can focus on engaging your local research and engineering community. Read more about organizing.

Submit your project

Use the project submission template for your PDF submission. Optionally, you may record a presentation documenting your research, using the recording capability of e.g. Keynote, PowerPoint, or Slides (via Vimeo).

Entries

  • Detecting Implicit Gaming through Retrospective Evaluation Sets (team Gamers): Jacob Haimes, Lucie Philippon, Alice Rigg, Cenny Wenner
  • Visual Prompt Injection Detection: Yoann Poupart, Imene Kerboua
  • Cross-Lingual Generalizability of the SADDER Benchmark: Siddhant Arora, Jord Nguyen, Akash Kundu
  • Towards High-Quality Model-Written Evaluations: Jannes Elstner, Jaime Raldua Veuthey
  • Multifaceted Benchmarking: Eduardo Neville, George Golynskyi, Tetra Jones

Send in pictures of you having fun hacking away!

We love to see the community flourish, and it's always great when you upload any pictures you're willing to share here.

Office hours on Saturday with Ollie!