This work was done in 48 hours by research workshop participants and does not represent the work of Apart Research.

Embedding and Transformer Synthesis

I programmatically created a set of embeddings that can be used to perfectly reconstruct a binary classification function (“embedding synthesis”). I then used these embeddings to programmatically set the weights of a 1-layer transformer that also perfectly reconstructs the classification function (“transformer synthesis”). With one change, this construction matches my original hypothesis of how a pre-existing, trained transformer implements the same function. I ran several experiments on the synthesized transformer to evaluate it.
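
To make the idea concrete, here is a minimal sketch of what such a construction could look like. It is my own illustrative toy example, not the author's actual code: it hand-builds token embeddings whose sign along one fixed "class direction" encodes an arbitrary binary labeling, then hand-sets the weights of a 1-layer attention-only transformer (no learned parameters) whose per-position logits reproduce that labeling exactly. All names (labels, class_dir, W_E, W_U, synthetic_transformer, ...) are assumptions made for this sketch.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 16, 8
labels = rng.integers(0, 2, size=vocab_size)           # an arbitrary binary labeling of the vocabulary

# "Embedding synthesis": write each token's label into one fixed coordinate
# (the class direction), with small noise in the remaining coordinates.
class_dir = np.zeros(d_model)
class_dir[0] = 1.0
W_E = rng.normal(scale=0.1, size=(vocab_size, d_model))
W_E[:, 0] = np.where(labels == 1, 1.0, -1.0)

# "Transformer synthesis": one attention head whose Q/K weights make every
# position attend to tokens with the same label (reinforcing the class
# direction), and an unembedding that reads that direction out as two logits.
W_Q = 20.0 * np.eye(d_model)                            # sharp attention scores
W_K = np.eye(d_model)
W_V = np.eye(d_model)
W_O = np.eye(d_model)
W_U = np.stack([-class_dir, class_dir], axis=1)         # logits for classes {0, 1}

def synthetic_transformer(tokens):
    x = W_E[tokens]                                     # (seq, d_model) residual stream
    q, k, v = x @ W_Q, x @ W_K, x @ W_V
    scores = (q @ k.T) / np.sqrt(d_model)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)            # softmax over key positions
    x = x + (attn @ v) @ W_O                            # attention output added back to the residual
    return x @ W_U                                      # (seq, 2) class logits per position

tokens = rng.integers(0, vocab_size, size=10)
pred = synthetic_transformer(tokens).argmax(axis=-1)
assert np.array_equal(pred, labels[tokens])             # the labeling is reconstructed exactly

In this toy version the readout would already work without the attention layer; the point of setting transformer weights by hand is that the resulting weights can then be compared against, and used to probe hypotheses about, a transformer trained on the same task.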


Rick Goldstein

View the video presentation:
https://drive.google.com/file/d/1yaPj1RNPOgjCR1rjbuuzgOkXb4uGnSBq/view?usp=sharing


Hackathon: Interpretability
Jam site: Virtual
