LLaMA 33B fine-tuned on wikitext_document_level with a linear RoPE scaling factor of 8, for a 16k token context length. This is a merged version of llama33b-16k-qlora.
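To illustrate what linear RoPE scaling does, here is a minimal NumPy sketch (not from the model repo; `rotary_angles` is a hypothetical helper): position indices are divided by the scaling factor before computing the rotary angles, so 16k positions are compressed into roughly the 2k-position range base LLaMA was pretrained on.

```python
import numpy as np

def rotary_angles(positions, dim=128, base=10000.0, scale=8.0):
    # Standard RoPE inverse frequencies for a head dimension of `dim`.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Linear scaling: divide positions by the factor before rotation,
    # so position 16383 produces the angles position ~2048 did originally.
    scaled = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(scaled, inv_freq)  # shape: (len(positions), dim // 2)

angles = rotary_angles([0, 2047, 16383])
```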

Note that this is not an instruct model; it is base LLaMA with an extended sequence length.
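A minimal loading sketch, assuming a transformers release that supports the `rope_scaling` option for LLaMA (>= 4.31); passing the kwarg explicitly is a safeguard in case the checkpoint's config does not already carry it:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chargoddard/llama33b-16k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 8.0},  # linear scaling of 8 -> 16k context
    device_map="auto",  # requires accelerate; remove to load on CPU
)
```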
