RedPajama
Open dataset that reproduces the LLaMA training data recipe, with 1.2 trillion tokens drawn from sources such as CommonCrawl, C4, GitHub, arXiv, books, Wikipedia, and StackExchange.