LiveBench

A Challenging, Contamination-Free LLM Benchmark

LiveBench will appear as a Spotlight Paper in ICLR 2025.
This work is sponsored by Abacus.AI

New! Check out LiveSWEBench, our new benchmark for AI coding agents

Introduction

Introducing LiveBench: a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

LiveBench limits potential contamination by releasing new questions regularly.
Each question has verifiable, objective ground-truth answers, eliminating the need for an LLM judge.
LiveBench currently contains a set of 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.
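
As an illustration of the second property, the following is a minimal sketch of judge-free scoring by exact match against a stored ground-truth answer. The function names (`normalize`, `score_exact_match`, `mean_score`) and the normalization rule are hypothetical and chosen for illustration only; they are not taken from LiveBench's scoring code.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.strip().lower().split())


def score_exact_match(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0 (hypothetical rule)."""
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0


def mean_score(pairs: list[tuple[str, str]]) -> float:
    """Average the per-question scores over (model_answer, ground_truth) pairs."""
    if not pairs:
        return 0.0
    return sum(score_exact_match(m, g) for m, g in pairs) / len(pairs)
```

Because the score is computed by a deterministic comparison, the same model output always receives the same score, which is the point of avoiding an LLM judge.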

We will evaluate your model on LiveBench! Open a GitHub issue or email us at livebench.ai@gmail.com.

Leaderboard

We update questions regularly so that the benchmark completely refreshes every 6 months. All questions for previous releases are available here. The most recent version is LiveBench-2025-04-02. This version includes updated coding, language, math, and reasoning questions.
To further reduce contamination, we delay publicly releasing the questions from the most recent update. LiveBench-2025-04-02 had ~300 new questions, so roughly 30% of the questions in LiveBench are currently not publicly released.
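
The withheld share is simple arithmetic. The sketch below assumes a total of about 1,000 questions; that figure is only implied by the two approximate numbers above and is not stated on this page, and the helper name `withheld_fraction` is hypothetical.

```python
def withheld_fraction(new_questions: int, total_questions: int) -> float:
    """Fraction of the benchmark that is not yet public."""
    if total_questions <= 0:
        raise ValueError("total_questions must be positive")
    return new_questions / total_questions


# Assumption: ~1,000 total questions, inferred from "~300 new" being "~30%".
ASSUMED_TOTAL = 1000

print(f"{withheld_fraction(300, ASSUMED_TOTAL):.0%}")
```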

BibTeX


@inproceedings{livebench,
  title={LiveBench: A Challenging, Contamination-Free {LLM} Benchmark},
  author={Colin White and Samuel Dooley and Manley Roberts and Arka Pal and Benjamin Feuer and Siddhartha Jain and Ravid Shwartz-Ziv and Neel Jain and Khalid Saifullah and Sreemanti Dey and Shubh-Agrawal and Sandeep Singh Sandha and Siddartha Venkat Naidu and Chinmay Hegde and Yann LeCun and Tom Goldstein and Willie Neiswanger and Micah Goldblum},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
}