Hugging Face
Datasets: 214 • Active filters: llm-evaluation • Sort: Trending
GSMA/leaderboard • Viewer • Updated 3 days ago • 85 • 86 • 2
MerlinSafety/EuroAlign-1K • Viewer • Updated 17 days ago • 3.3k • 52 • 2
2a-agency/brand-semantic-integrity-registry • Updated about 11 hours ago • 79 • 1
kingofspace0wzz/AgentSocialBench • Updated 4 days ago • 459 • 1
skylenage-ai/QwenClawBench • Viewer • Updated about 2 hours ago • 100 • 1
Haakkim/Haakkim-1.0v • Viewer • Updated 1 day ago • 1.27k • 1
cmudrc/Material_Selection_Eval • Viewer • Updated Jul 24, 2024 • 221 • 20 • 1
MixEval/MixEval • Viewer • Updated Sep 27, 2024 • 5k • 320 • 24
yuyijiong/difficult_retrieval • Preview • Updated Dec 7, 2025 • 25 • 1
MixEval/MixEval-X • Viewer • Updated Feb 15, 2025 • 7.68k • 137 • 10
eliyahabba/llm-evaluation-huji • Viewer • Updated Mar 3, 2025 • 2.3M • 7
zli12321/Bills • Viewer • Updated Jul 9, 2025 • 47.9k • 84
nlphuji/DOVE • Viewer • Updated Aug 14, 2025 • 254M • 10 • 4
zli12321/Wiki • Viewer • Updated Jul 9, 2025 • 22.3k • 112
nlphuji/DOVE_Lite • Viewer • Updated Aug 14, 2025 • 254M • 46 • 3
sarahcen/llm-election-data-2024 • Preview • Updated Dec 10, 2025 • 34 • 3
Galtea-AI/galtea-red-teaming-clustered-data • Viewer • Updated May 19, 2025 • 26.1k • 43 • 1
Steefano/LCB • Preview • Updated Dec 4, 2025 • 345 • 5
wasanx/gemba_annotation • Viewer • Updated May 20, 2025 • 9.4k • 11
PKU-Alignment/DeceptionBench • Viewer • Updated May 27, 2025 • 180 • 402 • 3
anvo25/b-score • Viewer • Updated Jun 2, 2025 • 167 • 50 • 2
kenhktsui/scli5 • Viewer • Updated Jul 6, 2025 • 286 • 38
Jinyang23/NoiserBench • Preview • Updated May 31, 2025 • 8 • 2
luzimu/WebGen-Bench_train_data • Updated Sep 29, 2025 • 54 • 1
SylvainWei/TIME-Lite • Viewer • Updated Jul 23, 2025 • 1.55k • 180 • 2
Mahesh111000/Hanabi_dataset • Preview • Updated Jun 16, 2025 • 42
tencent/C3-BenchMark • Viewer • Updated Jul 1, 2025 • 256 • 269 • 6
onionmonster/truthful_qa • Preview • Updated Jun 30, 2025 • 10
Naholav/claude_4_math_evaluation_500 • Preview • Updated Jul 7, 2025 • 30
zli12321/Scifi4TopicModel • Viewer • Updated Jul 9, 2025 • 2.08k • 12