leo-pekelis-gradient committed (verified)
Commit 0ff7303 · 1 Parent(s): 1ecd22d

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -17,7 +17,7 @@ For more info see our [End-to-end development service for custom LLMs and AI sys

 This model extends LLama-3 70B's context length from 8k to > 524K, developed by Gradient, sponsored by compute from [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 210M tokens for this stage, and ~400M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.

-**[ADD EVAL PLOT HERE]**
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6585dc9be92bc5f258156bd6/weTc-OpmWPdpoeZ3CN6aW.png)

 **Approach:**
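
For context on the paragraph above, here is a minimal sketch of what "adjusting RoPE theta" typically looks like with the Hugging Face transformers config API. The repository ids and the theta value below are illustrative assumptions, not values confirmed by this commit.

```python
# Minimal sketch (not the authors' training code): inspecting and adjusting
# rope_theta with the transformers config API. Repo ids and the theta value
# are placeholders, not values taken from this commit.
from transformers import AutoConfig

# Inspect a long-context checkpoint's config to see the adjusted values.
cfg = AutoConfig.from_pretrained("gradientai/Llama-3-70B-Instruct-Gradient-524k")  # hypothetical repo id
print(cfg.rope_theta)               # RoPE base frequency (theta)
print(cfg.max_position_embeddings)  # advertised context window

# Starting from a base Llama-3 config, the context extension described above
# amounts to raising rope_theta and max_position_embeddings before the
# continued pre-training stage on long sequences.
base = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-70B")
base.rope_theta = 4_000_000          # illustrative larger base frequency, not the trained value
base.max_position_embeddings = 524_288
base.save_pretrained("llama3-70b-524k-config")  # write the modified config to disk
```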