The LMEH (EleutherAI's Language Model Evaluation Harness) is a collection of over 200 tasks that you can automatically run your models through. You can install it by running pip install lm-eval (for example, with !pip install lm-eval at the top of a Colab notebook).
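As a quick illustration, here is a minimal sketch of kicking off an evaluation from Python. It assumes the pip package is installed and uses EleutherAI/pythia-70m purely as an example checkpoint; the registered model type string (e.g. "hf" vs. "hf-causal") and the exact result keys vary between harness releases, so check the repository's README for your version.

```python
# Minimal sketch: run two built-in tasks on a small Hugging Face model.
# The model type string and result keys differ across lm-eval releases,
# so treat this as illustrative rather than exact.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",                                     # Hugging Face backend ("hf-causal" in older releases)
    model_args="pretrained=EleutherAI/pythia-70m",  # any HF checkpoint; pythia-70m is just an example
    tasks=["hellaswag", "lambada_openai"],          # any of the built-in task names
    num_fewshot=0,
    batch_size=8,
)

print(results["results"])  # per-task metrics, e.g. accuracy
```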
See a short Colab notebook introducing how to use it here.
Check out the GitHub repository and the guide to adding a new benchmark so you can test your own tasks through its simple interface.
We're hosting a hackathon to find the best benchmarks for safety in large language models!
Large models are becoming increasingly important, and we want to make sure we understand the safety of these systems.
With the Alignment Jams, we get a chance to do real, impactful research on this problem together with people from across the world.
Don't miss this opportunity to explore machine learning more deeply, network, and challenge yourself!
Register now: https://alignmentjam.com/jam/benchmarks
[or add your event link here]