Datasets (214)

Active filters: llm-evaluation

GSMA/leaderboard

Viewer • Updated 3 days ago • 85 • 86 • 2

MerlinSafety/EuroAlign-1K

Viewer • Updated 17 days ago • 3.3k • 52 • 2

2a-agency/brand-semantic-integrity-registry

Updated about 11 hours ago • 79 • 1

kingofspace0wzz/AgentSocialBench

Updated 4 days ago • 459 • 1

skylenage-ai/QwenClawBench

Viewer • Updated about 2 hours ago • 100 • 1

Haakkim/Haakkim-1.0v

Viewer • Updated 1 day ago • 1.27k • 1

cmudrc/Material_Selection_Eval

Viewer • Updated Jul 24, 2024 • 221 • 20 • 1

MixEval/MixEval

Viewer • Updated Sep 27, 2024 • 5k • 320 • 24

yuyijiong/difficult_retrieval

Preview • Updated Dec 7, 2025 • 25 • 1

MixEval/MixEval-X

Viewer • Updated Feb 15, 2025 • 7.68k • 137 • 10

eliyahabba/llm-evaluation-huji

Viewer • Updated Mar 3, 2025 • 2.3M • 7

zli12321/Bills

Viewer • Updated Jul 9, 2025 • 47.9k • 84

nlphuji/DOVE

Viewer • Updated Aug 14, 2025 • 254M • 10 • 4

zli12321/Wiki

Viewer • Updated Jul 9, 2025 • 22.3k • 112

nlphuji/DOVE_Lite

Viewer • Updated Aug 14, 2025 • 254M • 46 • 3

sarahcen/llm-election-data-2024

Preview • Updated Dec 10, 2025 • 34 • 3

Galtea-AI/galtea-red-teaming-clustered-data

Viewer • Updated May 19, 2025 • 26.1k • 43 • 1

Steefano/LCB

Preview • Updated Dec 4, 2025 • 345 • 5

wasanx/gemba_annotation

Viewer • Updated May 20, 2025 • 9.4k • 11

PKU-Alignment/DeceptionBench

Viewer • Updated May 27, 2025 • 180 • 402 • 3

anvo25/b-score

Viewer • Updated Jun 2, 2025 • 167 • 50 • 2

kenhktsui/scli5

Viewer • Updated Jul 6, 2025 • 286 • 38

Jinyang23/NoiserBench

Preview • Updated May 31, 2025 • 8 • 2

luzimu/WebGen-Bench_train_data

Updated Sep 29, 2025 • 54 • 1

SylvainWei/TIME-Lite

Viewer • Updated Jul 23, 2025 • 1.55k • 180 • 2

Mahesh111000/Hanabi_dataset

Preview • Updated Jun 16, 2025 • 42

tencent/C3-BenchMark

Viewer • Updated Jul 1, 2025 • 256 • 269 • 6

onionmonster/truthful_qa

Preview • Updated Jun 30, 2025 • 10

Naholav/claude_4_math_evaluation_500

Preview • Updated Jul 7, 2025 • 30

zli12321/Scifi4TopicModel

Viewer • Updated Jul 9, 2025 • 2.08k • 12