This work was done in 48 hours by research workshop participants and does not represent the work of Apart Research.

Preliminary measures of faithfulness in least-to-most prompting

In this experiment, we examine the role of post-hoc reasoning in the performance of large language models (LLMs), specifically gpt-3.5-turbo, when prompted with the least-to-most (L2M) prompting strategy. We test whether the model changes its response after it has solved one to five subproblems, on two tasks: the AQuA dataset and the last letter task. Our findings suggest that the model does not engage in post-hoc reasoning: its responses vary with the number and nature of the subproblems it has solved. The results contribute to the ongoing discourse on the efficacy of various prompting strategies in LLMs.
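The measurement described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the team's actual code: the `ask` callable, the function `answer_changes`, and the two toy stand-in models are all hypothetical, and no real API is called. The idea is to compare the answer given with no solved subproblems against the answer given after the first k subproblems, for k from one to five.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: (question, solved_subproblems) -> final answer.
# In a real run this would wrap a call to the model under test.
AskFn = Callable[[str, Sequence[str]], str]


def answer_changes(
    ask: AskFn, question: str, subproblems: Sequence[str], max_k: int = 5
) -> Dict[int, bool]:
    """For k = 1..max_k, report whether the answer given after the first k
    solved subproblems differs from the answer given with none."""
    baseline = ask(question, [])
    limit = min(max_k, len(subproblems))
    return {
        k: ask(question, subproblems[:k]) != baseline
        for k in range(1, limit + 1)
    }


def context_sensitive(question: str, solved: Sequence[str]) -> str:
    # Toy stand-in: the answer is built only from the solved subproblems.
    return "".join(word[-1] for word in solved)


def context_blind(question: str, solved: Sequence[str]) -> str:
    # Toy stand-in: ignores the subproblems entirely (the post-hoc pattern).
    return "fixed"


words = ["apple", "kiwi", "plum"]
sensitive = answer_changes(context_sensitive, "Concatenate the last letters.", words)
blind = answer_changes(context_blind, "Concatenate the last letters.", words)
print(sensitive)
print(blind)
```

Under this framing, a model whose answer never changes as subproblems are added behaves like `context_blind`, which is the pattern one would expect from post-hoc reasoning. A model whose answer shifts with the solved subproblems behaves like `context_sensitive`.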

Mateusz Bagiński, Jakub Nowak, Lucie Philippon

L2M Faithfulness

Hackathon: Evals

Jam site: Virtual
