⚡ Can Stable Diffusion's visual expertise enhance Llama-3.2?
Lavender efficiently fine-tunes advanced vision-language models by aligning their text-vision attention with Stable Diffusion's.
Paper: Diffusion Instruction Tuning (2502.06814)
Key Highlights:
✅ Significant Gains: +30% on 20 tasks, +68% on OOD WorldMedQA
✅ Data-Efficient: Needs only 0.13M samples (~2.5% of typical VLM datasets)
✅ Low Compute: Fine-tunes in ~1 day on 8 NVIDIA A10G GPUs
✅ Model-Agnostic: Works with Llama-3.2-11B, MiniCPM-Llama3-v2.5 & more
✅ Precise Alignment: Transfers strong text-vision alignment from Stable Diffusion (sketch below)
✅ Open-Source: Code, data & fine-tuned models will be available
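
For a rough sense of what aligning a VLM's text-vision attention with Stable Diffusion could look like, here is a minimal sketch, not the paper's actual implementation: it assumes you can extract per-token text-to-image attention maps from the VLM and per-token cross-attention maps from Stable Diffusion, then regresses one toward the other. Names like `attention_alignment_loss`, `vlm_attn`, `sd_attn`, and `align_weight` are illustrative only.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(vlm_attn: torch.Tensor,
                             sd_attn: torch.Tensor,
                             eps: float = 1e-8) -> torch.Tensor:
    """Illustrative sketch: pull a VLM's text-to-vision attention maps
    toward Stable Diffusion's cross-attention maps for the same tokens.

    vlm_attn: (batch, tokens, H1, W1) attention over image patches from the VLM
    sd_attn:  (batch, tokens, H2, W2) cross-attention maps from Stable Diffusion
    """
    # Resize the SD maps to the VLM's patch grid so the two are comparable.
    sd_resized = F.interpolate(sd_attn, size=vlm_attn.shape[-2:],
                               mode="bilinear", align_corners=False)

    def normalize(a: torch.Tensor) -> torch.Tensor:
        # Normalize each map into a distribution over spatial locations.
        flat = a.flatten(-2)
        return (flat / (flat.sum(-1, keepdim=True) + eps)).reshape(a.shape)

    return F.mse_loss(normalize(vlm_attn), normalize(sd_resized))

# During fine-tuning, a loss like this would be added to the usual
# language-modeling objective, e.g.:
# total_loss = lm_loss + align_weight * attention_alignment_loss(vlm_attn, sd_attn)
```

The exact alignment objective, which attention layers are used, and how the term is weighted against supervised fine-tuning are specified in the paper itself.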
🔥 Discuss live at: https://www.alphaxiv.org/abs/2502.06814
Project Page: https://astrazeneca.github.io/vlm/