Introducing LiveBench: a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:
We will evaluate your model on LiveBench! Open a GitHub issue or email us at livebench.ai@gmail.com!
@article{livebench,
author = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
title = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
journal = {arXiv preprint arXiv:2406.19314},
year = {2024},
}