Introducing LiveBench: a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
We will evaluate your model on LiveBench! Open a GitHub issue or email us at livebench.ai@gmail.com!
@inproceedings{livebench,
title={LiveBench: A Challenging, Contamination-Free {LLM} Benchmark},
author={Colin White and Samuel Dooley and Manley Roberts and Arka Pal and Benjamin Feuer and Siddhartha Jain and Ravid Shwartz-Ziv and Neel Jain and Khalid Saifullah and Sreemanti Dey and Shubh Agrawal and Sandeep Singh Sandha and Siddartha Venkat Naidu and Chinmay Hegde and Yann LeCun and Tom Goldstein and Willie Neiswanger and Micah Goldblum},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
}