HumanF-MarkrAI/Gukbap-Qwen2.5-34B-VL🍚

Model Details🍚

Model Description

  • Developed by: HumanF-MarkrAI
  • Model type: Korean-VL-Qwen2.5-34B
  • Language(s): Korean + English
  • Context Length: 2048
  • License: cc-by-nc-4.0
  • Finetuned from model: AIDC-AI/Ovis2-34B

Model Sources

For training, we used 6× NVIDIA H100 80GB GPUs.

Implications🍚

If you want to know more about our model's details, please see the 🔥Gukbap-LMM Blog🔥.
We also provide Korean-LMM training code based on Ovis: 🔥Github🔥. Please give it a star⭐⭐!!

Training Method (SFT)🧐

The following papers contain the foundational methodologies for the dataset construction and training methods we are currently using.

SFT Text-Datasets (Private)

To build our open-source-based dataset, we used microsoft/WizardLM-2-8x22B through DeepInfra.
Our datasets were generated with the evolving (Evol-Instruct) system proposed by WizardLM. For training, we used 1,849 training examples and 200 validation examples.
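
For illustration only, here is a minimal sketch of one evolution step. It assumes DeepInfra's OpenAI-compatible endpoint (the base URL and the DEEPINFRA_API_KEY variable are assumptions) and uses a hypothetical evolution prompt; our actual pipeline and prompts are private.

import os
from openai import OpenAI

# Assumption: DeepInfra exposes an OpenAI-compatible API at this base URL.
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_API_KEY"],
)

# Hypothetical depth-evolution prompt in the WizardLM Evol-Instruct style.
EVOLVE_PROMPT = (
    "Rewrite the following instruction so that it requires deeper, multi-step "
    "reasoning, while staying answerable and self-contained.\n\n"
    "Instruction: {seed}"
)

def evolve(seed: str) -> str:
    """Ask WizardLM-2-8x22B for a harder variant of a seed instruction."""
    response = client.chat.completions.create(
        model="microsoft/WizardLM-2-8x22B",
        messages=[{"role": "user", "content": EVOLVE_PROMPT.format(seed=seed)}],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(evolve("Describe the fruits shown in the image."))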

Benchmarks🤗

Global MM Benchmark Score (Zero-shot)

We evaluated internally using VLMEvalKit.
We used chatgpt-0125, gpt-4o-mini, and gpt-4-turbo as the judge models for MMBench, MathVista, and MMVet, respectively.

| Model | MMStar | MathVista | HallusionBench | AI2D | OCRBench | MMVet | MMBench_V11 | AVG |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|:-----:|:-----:|:-----:|
| Step-1o (closed model) | 69.3 | 74.7 | 55.8 | 89.1 | 92.6 | 82.8 | 87.3 | 78.8 |
| InternVL2.5-78B-MPO (Open) | 72.1 | 76.6 | 58.1 | 89.2 | 90.9 | 73.5 | 87.8 | 78.3 |
| Ovis2-34B (Open) | 69.2 | 76.1 | 58.8 | 88.3 | 89.4 | 77.1 | 86.5 | 77.9 |
| InternVL2.5-38B-MPO (Open) | 70.1 | 73.6 | 59.7 | 87.9 | 89.4 | 72.6 | 85.4 | 77.0 |
| **Gukbap-Qwen2.5-34B-VL🍚** | 69.33 | 77.40 | 55.66 | 88.31 | 84.7 | 74.13 | 86.53 | 76.58 |
| Gemini-2.0-Flash | 69.4 | 70.4 | 58.0 | 83.1 | 82.5 | 73.6 | 71.0 | 72.6 |
| GPT-4o-20241120 | 65.1 | 59.9 | 56.2 | 84.9 | 80.6 | 74.5 | 84.3 | 72.2 |
| Ovis1.6-Gemma2-9B (Open) | 62.00 | 67.10 | 51.96 | 84.42 | 82.60 | 64.68 | 82.20 | 70.71 |
| **Gukbap-Gemma2-9B-VL🍚** | 62.13 | 66.00 | 53.01 | 84.49 | 82.80 | 63.90 | 82.20 | 70.65 |
| LLaVA-OneVision-72B | 65.8 | 68.4 | 47.9 | 86.2 | 74.1 | 60.6 | 84.5 | 69.6 |
| VARCO-VISION-14B (NCSoft) | 64.1 | 67.6 | 46.8 | 83.9 | 81.5 | 53.0 | 81.2 | 68.3 |
| GPT-4o-mini-20240718 | 54.8 | 52.4 | 46.1 | 77.8 | 78.5 | 66.9 | 76.0 | 64.6 |

HallusionBench score: (aAcc + fAcc + qAcc) / 3
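
For reference, this aggregate is just the mean of the three HallusionBench accuracies reported by VLMEvalKit; a one-line sketch with purely illustrative inputs:

def hallusionbench_score(a_acc: float, f_acc: float, q_acc: float) -> float:
    """Mean of aAcc (overall), fAcc (per-figure), and qAcc (per-question-pair) accuracy."""
    return (a_acc + f_acc + q_acc) / 3

print(hallusionbench_score(60.0, 45.0, 51.0))  # illustrative values only -> 52.0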

Korean MM Benchmark Score (Zero-shot)

We evaluated internally using 🔥our code🔥.
We used gpt-4o-2024-08-06 as the judge model for the K-LLAVA-W evaluation.

| Model | K-MMBench | K-MMStar | K-DTCBench | K-LLAVA-W | AVG |
|:---------:|:-----:|:------:|:-----:|:-----:|:----:|
| GPT-4o-20241120 | NaN | NaN | NaN | 85.50 | NaN |
| **Gukbap-Qwen2.5-34B-VL🍚** | 89.10 | 68.13 | 77.08 | 69.00 | 75.83 |
| Ovis2-34B | 89.56 | 68.27 | 76.25 | 53.67 | 71.94 |
| **Gukbap-Gemma2-9B-VL🍚** | 80.16 | 54.20 | 52.92 | 63.83 | 62.78 |
| Ovis1.6-Gemma2-9B | 52.46 | 50.40 | 47.08 | 55.67 | 51.40 |
| VARCO-VISION-14B | 87.16 | 58.13 | 85.42 | 51.17 | 70.47 |
| llama-3.2-Korean-Bllossom-AICA-5B | 26.01 | 21.60 | 17.08 | 45.33 | 27.51 |
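
For illustration, here is a minimal sketch of the LLaVA-Bench-style judging call behind the K-LLAVA-W column above, assuming the OpenAI Python client and a simplified, hypothetical rubric (the exact prompts are in our evaluation code linked above):

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Hypothetical rubric; K-LLAVA-W follows the LLaVA-Bench (In-the-Wild) judging scheme.
JUDGE_PROMPT = (
    "You are grading a vision-language model.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {answer}\n"
    "Rate the model answer from 1 to 10 and reply with the number only."
)

def judge(question: str, reference: str, answer: str) -> str:
    """Score one model answer against a reference with the GPT judge."""
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",  # the judge model used for K-LLAVA-W
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer)}],
        temperature=0,
    )
    return response.choices[0].message.content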


Inference

import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# load model
if __name__ == '__main__':
    # HumanF-MarkrAI/Gukbap-Qwen2-34B-VL
    # AIDC-AI/Ovis2-34B
    model = AutoModelForCausalLM.from_pretrained("HumanF-MarkrAI/Gukbap-Qwen2-34B-VL",
                                                torch_dtype=torch.bfloat16,
                                                multimodal_max_length=2048,
                                                cache_dir="/data/cache/",  # adjust the cache path to your environment
                                                trust_remote_code=True).cuda()
    text_tokenizer = model.get_text_tokenizer()
    visual_tokenizer = model.get_visual_tokenizer()

    # single-image input (K-LLAVA-W)
    image_path = './images/ex_4.jpg'
    images = [Image.open(image_path)]
    max_partition = 9
    text = '이미지에서 잘리지 않은 과일은 몇 개인가요?'  # "How many uncut fruits are in the image?"
    query = f'<image>\n{text}'

    # format conversation
    prompt, input_ids, pixel_values = model.preprocess_inputs(query, images, max_partition=max_partition)
    attention_mask = torch.ne(input_ids, text_tokenizer.pad_token_id)
    input_ids = input_ids.unsqueeze(0).to(device=model.device)
    attention_mask = attention_mask.unsqueeze(0).to(device=model.device)
    if pixel_values is not None:
        pixel_values = pixel_values.to(dtype=visual_tokenizer.dtype, device=visual_tokenizer.device)
    pixel_values = [pixel_values]

    # generate output
    with torch.inference_mode():
        gen_kwargs = dict(
            max_new_tokens=2048,
            do_sample=False,
            top_p=None,
            top_k=None,
            temperature=None,
            repetition_penalty=None,
            eos_token_id=model.generation_config.eos_token_id,
            pad_token_id=text_tokenizer.pad_token_id,
            use_cache=True
        )
        output_ids = model.generate(input_ids, pixel_values=pixel_values, attention_mask=attention_mask, **gen_kwargs)[0]
        output = text_tokenizer.decode(output_ids, skip_special_tokens=True)
        print(f'Output:\n{output}')

Chat Prompt😶‍🌫️

<|im_start|>user
<image>
Hello! My favorite food is Gukbap🍚!<|im_end|>
<|im_start|>assistant
(model answer)
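
This is the template that model.preprocess_inputs renders from the '<image>\n{text}' query in the inference snippet above; a quick way to check it yourself (reusing model and image_path from that snippet):

# Inspect the rendered chat prompt for a query (continues the inference example).
prompt, input_ids, pixel_values = model.preprocess_inputs(
    '<image>\nHello! My favorite food is Gukbap🍚!',
    [Image.open(image_path)],
    max_partition=9,
)
print(prompt)  # should print the <|im_start|> template shown above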

Gukbap-VL Series models🍚🍚

  • HumanF-MarkrAI/Gukbap-Qwen2.5-34B-VL🍚
  • HumanF-MarkrAI/Gukbap-Gemma2-9B-VL🍚

BibTeX

@article{HumanF-MarkrAI,
  title={Gukbap-Qwen2.5-34B-VL},
  author={MarkrAI},
  year={2025},
  url={https://huggingface.co/HumanF-MarkrAI}
}