
The Pile

A diverse, open-source 800GB text dataset curated for training large language models.
