---
base_model:
- meta-llama/Llama-3.1-8B-Instruct
datasets:
- chtmp223/CLIPPER
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Llama-3.1-8B-CLIPPER
Llama-3.1-8B-CLIPPER is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), trained with supervised fine-tuning on the [chtmp223/CLIPPER](https://huggingface.co/datasets/chtmp223/CLIPPER) dataset.
Please check [our paper](https://arxiv.org/abs/2502.14854) for more details on the method.
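Below is a minimal usage sketch with Transformers. The repo id `chtmp223/Llama-3.1-8B-CLIPPER` and the prompt are illustrative assumptions; adjust them to your setup.

```python
# Minimal usage sketch (assumed repo id and prompt), not the exact evaluation setup from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chtmp223/Llama-3.1-8B-CLIPPER"  # assumed Hugging Face repo id for this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Claim: ...\n\nBook text: ...\n\nIs the claim true or false?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```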
## Model Details
### Model Description
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
### Model Sources
- **Repository:** [GitHub repository](https://github.com/chtmp223/CLIPPER)
- **Paper:** [https://arxiv.org/abs/2502.14854](https://arxiv.org/abs/2502.14854)
## Training Details
### Training Data
[chtmp223/CLIPPER](https://huggingface.co/datasets/chtmp223/CLIPPER)
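For reference, the training data can be loaded with the `datasets` library. The `"train"` split name below is an assumption; check the dataset card for the actual splits and fields.

```python
from datasets import load_dataset

# Load the CLIPPER dataset from the Hugging Face Hub.
# The "train" split name is an assumption; inspect the dataset card for exact splits/fields.
ds = load_dataset("chtmp223/CLIPPER", split="train")
print(ds)
print(ds[0].keys())
```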
### Training Procedure
| **Configurations** | **Values** |
|----------------------------------|--------------|
| Hardware (Training and Inference)| 8xA100s |
| Tracking | wandb |
| batch size | 16 |
| gradient_checkpointing | True |
| learning_rate | 1.0e-6 |
| lr_scheduler_type | cosine |
| max_length | 131072 |
| num_train_epochs | 1 |
| optim | adamw_torch |
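The table above roughly maps onto the following Hugging Face `TrainingArguments` sketch. This is only an illustration of the listed hyperparameters, not the actual ProLong-based training script (see Software below); the per-device/accumulation split across 8 GPUs and the bf16 setting are assumptions.

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above; the real run uses the ProLong-based code linked below.
# Per-device batch size x gradient accumulation x 8 GPUs = effective batch size 16 (assumed split).
# The 131072-token max_length is handled at tokenization/packing time, not via TrainingArguments.
args = TrainingArguments(
    output_dir="llama-3.1-8b-clipper",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,   # 1 * 2 * 8 GPUs = 16
    gradient_checkpointing=True,
    learning_rate=1.0e-6,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="adamw_torch",
    bf16=True,                       # assumed mixed-precision setting
    report_to="wandb",
)
```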
#### Software
Training code is adapted from [https://github.com/princeton-nlp/ProLong](https://github.com/princeton-nlp/ProLong).
## Inference
Inference is done with [vLLM](https://github.com/vllm-project/vllm) on a single A100 80GB GPU.
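A minimal vLLM sketch is shown below; the repo id and generation settings are assumptions, and `max_model_len` may need to be lowered depending on available GPU memory.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "chtmp223/Llama-3.1-8B-CLIPPER"  # assumed Hugging Face repo id for this model

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 131072 matches the training max_length; reduce it if it does not fit on your GPU.
llm = LLM(model=model_id, max_model_len=131072, gpu_memory_utilization=0.95)

messages = [
    {"role": "user", "content": "Claim: ...\n\nBook text: ...\n\nIs the claim true or false?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=512))
print(outputs[0].outputs[0].text)
```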
## Citation
```bibtex
@misc{pham2025clippercompressionenableslongcontext,
  title={CLIPPER: Compression enables long-context synthetic data generation},
  author={Chau Minh Pham and Yapei Chang and Mohit Iyyer},
  year={2025},
  eprint={2502.14854},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.14854},
}
```