This work was done during 48 hours by research workshop participants and does not represent the work of Apart Research.

Reverse Word Wizards: Pitting Language Models Against the Art of Reversal

A benchmark that tests how well language models can reverse given strings.
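As a rough illustration of what scoring a string-reversal task could look like, here is a minimal sketch. It is not the team's benchmark code: the function names `reverse_string` and `score_reversal`, and the two metrics (exact match and position-wise character accuracy), are assumptions chosen for illustration only.

```python
def reverse_string(text: str) -> str:
    """Return the reference answer: the input reversed by code point."""
    return text[::-1]


def score_reversal(source: str, model_output: str) -> dict:
    """Compare a model's output against the true reversal of `source`.

    Hypothetical metrics, not taken from the project:
    - exact_match: the output equals the reversed string exactly
    - char_accuracy: fraction of positions that agree, normalised by
      the longer of the two strings so extra or missing characters
      are penalised
    """
    target = reverse_string(source)
    matches = sum(a == b for a, b in zip(target, model_output))
    longest = max(len(target), len(model_output))
    char_accuracy = matches / longest if longest else 1.0
    return {"exact_match": model_output == target, "char_accuracy": char_accuracy}


if __name__ == "__main__":
    # The second argument stands in for a model's answer.
    print(score_reversal("wizard", "draziw"))
    print(score_reversal("abc", "cbx"))
```

Position-wise accuracy is a deliberately simple choice; an edit-distance metric would be more forgiving of a single dropped character, and slicing reverses code points rather than user-perceived characters.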

Ingrid Backman, Asta Rassmussen, Klara Nielsen

The Circuit Wizards

Hackathon

ScaleOversight

Sunday, February 12, 2023

Jam site

Aarhus Scale Oversight Hackathon

Join in the Nobelpark at Aarhus University for 48 hours of fun research! 