bluelightai-dev's Collections
Sampled Datasets
Updated Nov 11, 2025
Random samples from large datasets, for convenience.
bluelightai-dev/dclm-full-deduped-sample • Updated Nov 11, 2025 • 4.92M • 210
bluelightai-dev/the-stack-dedup-sample • Updated Nov 10, 2025 • 474k • 47
bluelightai-dev/common-corpus-sample-open-culture • Updated Nov 11, 2025 • 462k • 119
bluelightai-dev/common-corpus-sample-open-government • Updated Nov 11, 2025 • 373k • 295 • 1
bluelightai-dev/common-corpus-sample-open-science • Updated Nov 11, 2025 • 284k • 36
bluelightai-dev/common-corpus-sample-open-source • Updated Nov 11, 2025 • 2.02M • 4
bluelightai-dev/common-corpus-sample-open-web • Updated Nov 11, 2025 • 4.8M • 29
bluelightai-dev/MathPile_Commercial-formatted • Updated Nov 12, 2025 • 389k • 145