RedPajama

Open reproduction of the LLaMA training dataset: 1.2 trillion tokens drawn from Common Crawl, C4, GitHub, books, arXiv, Wikipedia, and StackExchange.
