This work was done during 48 hours by research workshop participants and does not represent the work of Apart Research.
4th 🏆
3rd 🏆
2nd 🏆
1st 🏆
AI Testing
Private
Info hazard
See web link
See the code
Visit itch.io page
Read PDF
Read PDF

Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing

Based on the paper "Discovering Latent Knowledge in Language Models without supervision" this project discusses how well the proposed method applies to the concept of ambiguity. To do that, we tested the Contrast Consistent Search method on data with clear cut (0 or 1) or ambiguous (0,5) examples.

Anonymous: Team members hidden

Agatha Duzan, Matthieu David, Jonathan Claybrough

goldtartigradfish

Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing
View the video presentation:

Download instead.

Download instead.

Hackathon

AI Testing

Sunday, December 18, 2022
Hackathon

AI Testing

Jam site

Responsible Machine Learning AI Testing hackathon

The hackathon will be used for the evaluation of the students of the Responsible Machine Learning course at Mathématiques Vision Apprentissage
Jam site

ENS Ulm

Anonymous

★★★☆☆
You have successfully rated this project!
Oops! Something went wrong while submitting the form.
You have successfully submitted your feedback. It should show up on this page.
Oops! Something went wrong while submitting the form.
This project received
4
stars from a user
Discovering Agency Features as Latent Space Directions in LLMs via SVD
This project received
3
stars from a user
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
This project received
2
stars from a user
ILLUSION OF CONTROL
This project received
4
stars from a user
Agency, value and empowerment.
This project received
2
stars from a user
Comparing truthful reporting, intent alignment, agency preservation and value identification
This project received
1
stars from a user
ILLUSION OF CONTROL
This project received
2
stars from a user
Comparing truthful reporting, intent alignment, agency preservation and value identification
This project received
4
stars from a user
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
This project received
1
stars from a user
ILLUSION OF CONTROL
This project received
2
stars from a user
Comparing truthful reporting, intent alignment, agency preservation and value identification
This project received
4
stars from a user
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
This project received
3
stars from a user
Against Agency
This project received
3
stars from a user
Against Agency
This project received
3
stars from a user
ILLUSION OF CONTROL
This project received
3
stars from a user
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
This project received
3
stars from a user
Comparing truthful reporting, intent alignment, agency preservation and value identification
This project received
3
stars from a user
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
This project received
2
stars from a user
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
This project received
3
stars from a user
Impact of “fear of shutoff” on chatbot advice regarding illegal behavior
This project received
4
stars from a user
Goal Misgeneralization
This project received
4
stars from a user
Residual Stream Verification via California Housing Prices Experiment
This project received
4
stars from a user
Problem 9.60 - Dimensionaliy reduction
This project received
3
stars from a user
Trojan detection and implementation on transformers
This project received
5
stars from a user
Turing Mirror: Evaluating the ability of LLMs to recognize LLM-generated text
This project received
5
stars from a user
Can Large Language Models Solve Security Challenges?
This project received
5
stars from a user
Can Large Language Models Solve Security Challenges?
This project received
4
stars from a user
Turing Mirror: Evaluating the ability of LLMs to recognize LLM-generated text
This project received
3
stars from a user
Preliminary measures of faithfulness in least-to-most prompting
This project received
4
stars from a user
Preliminary measures of faithfulness in least-to-most prompting
This project received
5
stars from a user
Can Large Language Models Solve Security Challenges?
This project received
5
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
5
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
5
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
3
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
5
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
5
stars from a user
SADDER - Situational Awareness Dataset for Detecting Extreme Risks
This project received
5
stars from a user
Soft Prompts are a Convex Set
This project received
5
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
2
stars from a user
Toward a Working Deep Dream for LLM's
This project received
2
stars from a user
DPO vs PPO comparative analysis
This project received
5
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
5
stars from a user
Experiments in Superposition
This project received
5
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
5
stars from a user
Experiments in Superposition
This project received
3
stars from a user
Experiments in Superposition
This project received
4
stars from a user
Embedding and Transformer Synthesis
This project received
4
stars from a user
Who cares about brackets?
This project received
4
stars from a user
One is 1- Analyzing Activations of Numerical Words vs Digits
This project received
4
stars from a user
DPO vs PPO comparative analysis
This project received
5
stars from a user
Interpreting Planning in Transformers
This project received
2
stars from a user
Multimodal Similarity Detection in Transformer Models
This project received
4
stars from a user
Factual recall rarely happens in attention layer
This project received
4
stars from a user
Toward a Working Deep Dream for LLM's
This project received
5
stars from a user
Relating induction heads in Transformers to temporal context model in human free recall
This project received
5
stars from a user
Experiments in Superposition
This project received
4
stars from a user
One is 1- Analyzing Activations of Numerical Words vs Digits
This project received
4
stars from a user
DPO vs PPO comparative analysis
This project received
3
stars from a user
Interpreting Planning in Transformers
This project received
3
stars from a user
Multimodal Similarity Detection in Transformer Models
This project received
3
stars from a user
Factual recall rarely happens in attention layer
This project received
4
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
5
stars from a user
Experiments in Superposition
This project received
3
stars from a user
Who cares about brackets?
This project received
3
stars from a user
Embedding and Transformer Synthesis
This project received
4
stars from a user
DPO vs PPO comparative analysis
This project received
3
stars from a user
Interpreting Planning in Transformers
This project received
3
stars from a user
Multimodal Similarity Detection in Transformer Models
This project received
3
stars from a user
Factual recall rarely happens in attention layer
This project received
4
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
5
stars from a user
Experiments in Superposition
This project received
3
stars from a user
Towards Interpretability of 5 digit addition
This project received
3
stars from a user
Toward a Working Deep Dream for LLM's
This project received
3
stars from a user
Relating induction heads in Transformers to temporal context model in human free recall
This project received
3
stars from a user
Interpreting Planning in Transformers
This project received
3
stars from a user
Towards Interpretability of 5 digit addition
This project received
3
stars from a user
Toward a Working Deep Dream for LLM's
This project received
3
stars from a user
Relating induction heads in Transformers to temporal context model in human free recall
This project received
3
stars from a user
DPO vs PPO comparative analysis
This project received
2
stars from a user
Factual recall rarely happens in attention layer
This project received
4
stars from a user
Experiments in Superposition
This project received
3
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
3
stars from a user
Embedding and Transformer Synthesis
This project received
3
stars from a user
Who cares about brackets?
This project received
4
stars from a user
Interpreting Planning in Transformers
This project received
4
stars from a user
DPO vs PPO comparative analysis
This project received
4
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
4
stars from a user
Towards Interpretability of 5 digit addition
This project received
3
stars from a user
Toward a Working Deep Dream for LLM's
This project received
3
stars from a user
Multimodal Similarity Detection in Transformer Models
This project received
2
stars from a user
Factual recall rarely happens in attention layer
This project received
4
stars from a user
Relating induction heads in Transformers to temporal context model in human free recall
This project received
5
stars from a user
Experiments in Superposition
This project received
4
stars from a user
Preliminary Steps Toward Investigating the “Smearing” Hypothesis for Layer Normalizing in a 1-Layer SoLU Model
This project received
5
stars from a user
Experiments in Superposition
This project received
3
stars from a user
Multimodal Similarity Detection in Transformer Models
This project received
3
stars from a user
Interpreting Planning in Transformers
This project received
3
stars from a user
DPO vs PPO comparative analysis
This project received
4
stars from a user
Who cares about brackets?
This project received
4
stars from a user
Embedding and Transformer Synthesis
This project received
4
stars from a user
Towards Interpretability of 5 digit addition