LiveBench

A Challenging, Contamination-Free LLM Benchmark

LiveBench will appear as a Spotlight Paper at ICLR 2025.
This work is sponsored by Abacus.AI.

Introduction

Introducing LiveBench: a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

  • LiveBench limits potential contamination by releasing new questions regularly.
  • Each question has verifiable, objective ground-truth answers, eliminating the need for an LLM judge.
  • LiveBench currently contains 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.

We will evaluate your model on LiveBench! Open a GitHub issue or email us at livebench.ai@gmail.com!

Leaderboard

We update questions regularly so that the benchmark completely refreshes every 6 months. All questions from previous releases are available here. The most recent version is LiveBench-2024-11-25.
To further reduce contamination, we delay publicly releasing the questions from the most recent update. LiveBench-2024-11-25 added 300 new questions, so currently 30% of the questions in LiveBench are not publicly released.


BibTeX


@inproceedings{livebench,
  title={LiveBench: A Challenging, Contamination-Free {LLM} Benchmark},
  author={Colin White and Samuel Dooley and Manley Roberts and Arka Pal and Benjamin Feuer and Siddhartha Jain and Ravid Shwartz-Ziv and Neel Jain and Khalid Saifullah and Sreemanti Dey and Shubh-Agrawal and Sandeep Singh Sandha and Siddartha Venkat Naidu and Chinmay Hegde and Yann LeCun and Tom Goldstein and Willie Neiswanger and Micah Goldblum},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
}