This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Evals research sprint on August 21, 2023

SADDER - Situational Awareness Dataset for Detecting Extreme Risks

We create a benchmark for detecting two types of situational awareness that we believe are important for assessing threats from advanced AI systems: the ability to distinguish training from testing, and the ability of a model to reason about how it can and cannot influence the world. We measure the performance of several LLMs on this benchmark (GPT-4, Claude, and several GPT-3.5 variants).

By Rudolf Laine, Alex Meinke