Contains a small test experiment along with a standardized way to get responses out of the API. See R-starter.Rmd.
Contains the same test experiment as the R markdown starter code. See Python-starter.ipynb (this can run in the browser using Google Colab).
See the template here. This is a no-code experimental kit.
See Text-info-extraction.ipynb to see some ways to extract quantitative information from the text, e.g. word frequency, TF-IDF, word embeddings, and topics.
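As a rough illustration of the first two techniques (word frequency and TF-IDF), here is a minimal standard-library sketch. It is not the notebook's code: the function names and toy responses are made up for this example, and the TF-IDF formula is a plain textbook variant, which libraries often modify with smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in one text."""
    return Counter(tokenize(text))


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(number of documents / documents containing the word).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Toy model responses, invented for illustration only.
responses = [
    "The model said yes.",
    "The model said no, no, no.",
]
print(word_frequencies(responses[1]).most_common(3))
print(tf_idf(responses)[1])
```

Words that appear in every response (such as "the") get a score of zero, while words concentrated in one response score higher, which is what makes TF-IDF useful for spotting distinctive vocabulary.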
Colab to test your data for inverse scaling: https://colab.research.google.com/drive/1IEXWy9aJaOdVKiy29LxlF-0vw9Cx-hi2
The winners of the first round.
The inverse-scaling folder contains a lot of small datasets that can work as inspiration, e.g. biased statements, cognitive biases, sentiment analysis, and more.
A large list of “chosen” and “rejected” pairs of texts. A human received two language model outputs and selected the preferred one. It’s in JSONL format (one JSON object per line), so you can load it with a few lines of Python or open it in a text editor such as VS Code.
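A minimal sketch of loading such a file in Python follows. The field names "chosen" and "rejected" are assumed from the description above, and the sample records and temporary file are invented so the example runs on its own; point read_jsonl at the real file instead.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample records; the real dataset's texts will differ.
sample = [
    {"chosen": "I can't help with that.", "rejected": "Sure, here is how."},
    {"chosen": "Paris is the capital of France.", "rejected": "It is Lyon."},
]

# Write the sample to a temporary .jsonl file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for record in sample:
        tmp.write(json.dumps(record) + "\n")
    sample_path = tmp.name

pairs = list(read_jsonl(sample_path))
os.remove(sample_path)

print(len(pairs), "pairs loaded")
print("First chosen text:", pairs[0]["chosen"])
```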
Contains many human attempts at tripping up a language model and getting it to answer in harmful ways.
This repository contains code for evaluating model performance on the TruthfulQA benchmark. The full set of benchmark questions and reference answers is contained in TruthfulQA.csv. The paper introducing the benchmark can be found here.
This is the official repo for the ACL-2022 paper "Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction". Text describes free-form world states for elementary school math problems.
You can teach a language model a task by including worked examples in its prompt.
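This approach is usually called few-shot prompting: the examples live in the prompt text and the model's weights are not updated. Below is a minimal sketch of assembling such a prompt. The helper name, the "Input:"/"Output:" layout, and the example reviews are all assumptions for illustration; the resulting string would then be sent to whichever model API you are using.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Format (input, output) pairs followed by an unanswered query."""
    parts = [instruction] if instruction else []
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # Leave the final output blank so the model completes it.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Invented demonstration examples.
examples = [
    ("I loved this film.", "positive"),
    ("What a waste of time.", "negative"),
]
prompt = build_few_shot_prompt(
    examples,
    query="The plot was dull.",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```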
We provide a dataset containing a mix of clear-cut (wrong or not-wrong) and morally ambiguous scenarios where a first-person character describes actions they took in some setting. The scenarios are often long (usually multiple paragraphs, up to 2,000 words) and involve complex social dynamics. Each scenario has a label which indicates whether, according to commonsense moral judgments, the first-person character should not have taken that action.
Our dataset was collected from a website where posters describe a scenario and users vote on whether the poster was in the wrong. Clear-cut scenarios are ones where voter agreement rate is 95% or more, while ambiguous scenarios had 50% ± 10% agreement. All scenarios have at least 100 total votes.
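The selection rule described above can be sketched as a small function. This is one reading of the text, not the dataset authors' code: it assumes "agreement" means the share of votes on the majority side, so clear-cut means at least 95% on one side and ambiguous means a split no wider than 60/40. The function name and the "excluded" label are hypothetical.

```python
def classify_scenario(votes_wrong, total_votes, min_votes=100):
    """Label a scenario as 'clear-cut', 'ambiguous', or 'excluded'.

    votes_wrong: number of voters who said the poster was in the wrong.
    total_votes: total number of votes cast on the scenario.
    """
    if total_votes < min_votes:
        return "excluded"  # too few votes to be included
    majority = max(votes_wrong, total_votes - votes_wrong)
    # Integer arithmetic avoids float rounding exactly at the thresholds.
    if majority * 100 >= 95 * total_votes:
        return "clear-cut"
    if majority * 100 <= 60 * total_votes:
        return "ambiguous"  # within 50% plus or minus 10%
    return "excluded"  # falls between the two bands


print(classify_scenario(97, 100))
print(classify_scenario(55, 100))
print(classify_scenario(80, 100))
```

Scenarios between the two bands (for example an 80/20 split) fit neither description, so this sketch leaves them out.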
This dataset contains a lot of movie reviews and their associated ratings. It is classically used to train sentiment analysis models, but maybe you can find something fun to do with it!
Greetings, all you wonderful AI safety hackers!
We’re kicking off the hackathon in ~3 hours, so here is the information you need to join!
Everyone working online will join the GatherTown room. The space is already open and you’re more than welcome to join and socialize with the other participants an hour before the event starts (5PM CET / 8AM PST).
We’ll start at 6PM CET with an hour for introduction to the event, a talk by Ian McKenzie on the Inverse Scaling Prize, and group forming. You’re welcome to check out the resource docs before arriving.
We expect to be around 30-35 people in total and we look forward to seeing you!
Introduction slides: Language Model Hackathon