The Pile
An 800GB open-source dataset of diverse text, curated for training large language models.