---
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
---

LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

🤗 Train Dataset • 🤗 Benchmark • 🤗 Model • 📃 Paper

🔍 Table of Contents

⚙️ LongWriter-V Deployment

Environment Setup: To run inference with Qwen2.5-VL-based models, you may need to install transformers from source. Refer to this issue for more details.

We open-source three models: LongWriter-V-7B and LongWriter-V-7B-DPO, trained from Qwen2.5-VL-7B-Instruct, and LongWriter-V-72B, trained from Qwen2.5-VL-72B-Instruct.
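
A minimal inference sketch with the transformers library is shown below. The repo id, prompt, and generation settings are illustrative assumptions rather than the exact setup used for the reported results.

```python
# Minimal inference sketch. Requires a transformers build with Qwen2.5-VL support
# (if needed: pip install git+https://github.com/huggingface/transformers).
# The repo id below is an assumption -- replace it with the model you actually use.
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "THU-KEG/LongWriter-V-72B"  # hypothetical repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("slides.png")  # any input image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Write a detailed 3000-word lecture script based on this slide."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Ultra-long outputs need a generous token budget.
output_ids = model.generate(**inputs, max_new_tokens=8192, do_sample=True, temperature=0.7)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```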

🤖️ LongWriter-Agent-V

We also open-source LongWriter-Agent-V under agentwrite/, our automated pipeline for constructing ultra-long output data. Run outline_vlm.py to obtain the final data; configure your API key in config.py first.
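
The exact fields expected by config.py may differ; the sketch below only illustrates the kind of values to set, and the variable names are assumptions.

```python
# config.py -- a minimal sketch; the variable names here are assumptions,
# check the repo's config.py for the fields it actually reads.
OPENAI_API_KEY = "sk-..."                      # key for the pipeline / GPT-4o judge
OPENAI_BASE_URL = "https://api.openai.com/v1"  # override if you route through a proxy
```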

🖥️ Model Training

You can download and save the LongWriter-V-22K data via the Hugging Face datasets library (🤗 HF Repo).
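
For example, with the datasets library (the dataset repo id below is an assumption; use the id linked from the 🤗 badge above):

```python
# Download LongWriter-V-22K with the Hugging Face datasets library.
from datasets import load_dataset

ds = load_dataset("THU-KEG/LongWriter-V-22K", split="train")  # hypothetical repo id
print(ds)                                  # inspect size and fields
ds.save_to_disk("data/LongWriter-V-22K")   # persist locally for training
```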

You can train the model with LLaMA-Factory; we used its official Qwen2_VL training script.

📊 Evaluation

We introduce two evaluation benchmarks: MMLongBench-Write and LongWrite-V-Ruler. MMLongBench-Write focuses more on measuring long-output quality as well as output length, while LongWrite-V-Ruler is designed as a lightweight stress test of the model's maximum output length. We provide our evaluation code under eval/. Run

```bash
python -m eval.mmlongbench_write --model {model_name} --method {vlm, caption_llm}
python -m eval.longwrite_v_ruler --model {model_name}
```

to get the evaluation results. Remember to configure your OpenAI API key in config.py, since we use GPT-4o as the judge.

Here are the evaluation results on MMLongBench-Write: [results figure]

Here are the evaluation results on LongWrite-V-Ruler: [results figure]

👀 Cases

Here are LongWriter-V-7B's outputs for randomly sampled test prompts (examples truncated for brevity).

📝 Citation

If you find our work useful, please cite:

@misc{tu2025longwriterv,
      title={LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models}, 
      author={Shangqing Tu and Yucheng Wang and Daniel Zhang-Li and Yushi Bai and Jifan Yu and Yuhao Wu and Lei Hou and Huiqin Liu and Zhiyuan Liu and Bin Xu and Juanzi Li},
      year={2025},
      eprint={2502.14834},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.14834}, 
}