Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

updated a dataset 2 days ago

TIGER-Lab/ClawBench

Viewer • Updated 2 days ago • 283 • 538

upvoted a paper 9 days ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Paper • 2605.30288 • Published 14 days ago • 22

updated a Space 18 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 4 datasets 18 days ago

TIGER-Lab/ClawBenchV2Trace

Updated 18 days ago • 9.23k

NAIL-Group/ClawBenchV2Trace

Updated 18 days ago • 4.14k

NAIL-Group/ClawBenchV1Trace

Updated 18 days ago • 7.26k

NAIL-Group/ClawBench

Viewer • Updated 18 days ago • 153 • 300 • 2

commented a paper 27 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

upvoted a paper 27 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

New activity in huggingface/HuggingDiscussions 28 days ago

[FEEDBACK] Daily Papers

#32 opened almost 2 years ago by

submitted a paper to Daily Papers 28 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated about 1 month ago

published a Space about 1 month ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a Space about 1 month ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated about 1 month ago

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12 • 1

published 2 datasets about 1 month ago

TIGER-Lab/ClawBenchV2Trace

Updated 18 days ago • 9.23k

NAIL-Group/ClawBenchV2Trace

Updated 18 days ago • 4.14k
