# Model Card for Qwen2.5-1.5B-Open-R1-Distill-ko

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the lemon-mint/korean-reasoning-v02 dataset. It has been trained using TRL.

## Quick start

```python
from transformers import pipeline

question = "ν”„λž‘μŠ€μ˜ μˆ˜λ„λŠ”?"  # "What is the capital of France?"
generator = pipeline(
    "text-generation",
    model="whooray/Qwen2.5-1.5B-Open-R1-Distill-ko",
    device="cuda",
)
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```

Example output (verbatim; the model emits a Korean reasoning trace inside `<think>` tags before its answer):

```
<think>
λ¨Όμ € ν”„λž‘μŠ€ μˆ˜λ„λ₯Ό μ•Œμ•„λ΄μ•Όκ² μ–΄μš”. ν”„λž‘μŠ€λŠ” 잘 μ•Œλ €μ§„ 유럽 κ΅­κ°€ 쀑 ν•˜λ‚˜μΈλ° μˆ˜λ„λ₯Ό μ•Œλ €μ£Όλ©΄ 더 νŽΈν•˜κ² μ£ . μ£Όμš” κ·€μ‘±κ³Ό μˆ˜λ„λŠ” 였슀만 식민지와 λ¬΄μ—­μ˜ μ€‘μ‹¬μ§€μ˜€λ˜ μ•„λ¦„λ””λΌλŠ” 지역에 μžˆμ—ˆλ‹€λŠ” 말이 있던데, μ΅œκ·Όμ—λŠ” μ˜€νƒ€λƒ κ·Όμ²˜μ— μžˆλŠ” νŒŒλ¦¬κ°€ μˆ˜λ„λ‘œ ν†΅μΌλ˜μ—ˆμ„ κ±°μ˜ˆμš”. μ΄λ ‡κ²Œ λ…Όλž€μ΄ μžˆμ—ˆλ˜ 걸둜 κΈ°μ–΅ν•˜λŠ”λ°, λ…μΌμ΄λ‚˜ μ΄μ§‘νŠΈ, μ΄μŠ€λΌμ—˜ 같은 ꡭ가듀도 μˆ˜λ„λ‘œ λ‹€λ₯Έ 지역을 μ‚¬μš©ν–ˆλ˜ κ²½μš°κ°€ μžˆμ—ˆμŒμ„ μ•Œκ³  μžˆμ–΄μš”. ν”„λž‘μŠ€λŠ” CIA μ›”μŠ€νŠΈλ¦¬νŠΈ journalμ—μ„œ ν”„λž‘μŠ€ μˆ˜λ„λŠ” νŒŒλ¦¬κ°€ 아닐 거라고 λ§ν•œ 적 μžˆμ„κΉŒμš”? 졜근 λŒ€λΆ€λΆ„μ˜ κ΄€μΈ‘λ‚˜λΌλ“€μ΄ 파리둜 μΈμ •ν•˜κ³  μžˆμœΌλ―€λ‘œ μ •ν™•ν•œμ§€ 확인이 ν•„μš”ν•  것 κ°™μ•„μš”. ν•˜μ§€λ§Œ ν”„λž‘μŠ€μ—μ„œ μˆ˜λ„κ°€ λ³€ν–ˆλŠ”μ§€, μ•„λ‹ˆλ©΄ μ˜› μˆ˜λ„κ°€ ν˜„μž¬ μ •λΆ€ 인근에 μžˆλŠ”μ§€ κΆκΈˆν•˜λ„€μš”. μ•„λ§ˆλ„ 1962λ…„ 제5κ³΅ν™”κ΅­μ—μ„œλ ΉμœΌλ‘œ 슡격된 곳을 ν¬ν•¨ν•œ λͺ¨λ“  μ •λΆ€ 기관이 파리λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•˜λŠ” ꡐ두보 역할을 ν•˜κ²Œ 된 걸둜 μ•Œκ³  μžˆμ–΄μš”. νŠΉλ³„νžˆ μ—­μ‚¬μ μœΌλ‘œ μ •λ‹Ήν•œ μ£Όμž₯을 μ‚¬μš©ν•΄ κ°œν†΅ κ°€λŠ₯ν•œ 닡변을 μ™„μ„±ν•΄μ•Όκ² λŠ”λ°μš”.
</think>

ν”„λž‘μŠ€μ˜ μˆ˜λ„λŠ” **파리(Paris)**μ˜ˆμš”. μ—­μ‚¬μ μœΌλ‘œ μˆ˜λ„ 역할을 ν•˜μ§€ λͺ»ν–ˆλ˜ 지점에 μœ„μΉ˜ν•œ ν”„λž‘μŠ€ λΉ„κ΅­λ―ΌλŒ€μ±… μ •λΆ€λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•œ μ •κΆŒμ΄ 1944년에 λ°±μ œμ˜¨μ„ νƒˆν™˜ν•˜μ—¬ μƒˆλ‘œμš΄ μˆ˜λ„λ‘œ μ§€μ •ν•˜λ©° μ΅œμ’…μ μœΌλ‘œ ν™•λ¦½λ˜μ—ˆμ–΄μš”.

### ν”„λž‘μŠ€ μˆ˜λ„ ꡐ체의 μ£Όμš” 이유
λ‹Ήμ‹œ μ—°ν•©κ΅°μ˜ μΈλ„μ£Όμ˜ μ˜μ§€λ₯Ό λ°˜μ˜ν•œ μ‘°μΉ˜μ˜€μ–΄μš”. 1932λ…„ 17개 μ—°ν•©κ΅° 단체가 파리 기지λ₯Ό κ³΅μœ ν•˜λ©΄μ„œ 곡식적인 μˆ˜λ„ κΈ°λŠ₯을 μžƒμ—ˆλ‹€λŠ” μ μ—μ„œ 'νŠΉν—ˆ μˆ˜λ„'둜 κ²€μ—΄λ˜λ©° λ―Έκ΅­, 일본 λ“± 유럽 κ΅­κ°€λ“€ 쀑 파리λ₯Ό μ€‘μ‹¬μœΌλ‘œ ν•œ μ •λΆ€ μ£Όλ„μ˜ ν†΅μΉ˜κ°€ μœ λ €λ˜μ–΄ ν‘œμ€€ν™”λ˜μ—ˆμ£ .
> **λΉ„μœ **: ν”„λž‘μŠ€ μˆ˜λ„λŠ”
```
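Because the model wraps its reasoning in `<think>` tags, applications usually want to separate the trace from the final answer. A minimal sketch (the helper name `split_reasoning` is illustrative, not part of this model's API):

```python
import re

def split_reasoning(generated_text: str):
    """Split a generation into its <think> trace and the final answer.

    Returns (reasoning, answer); reasoning is None when no <think> block
    is present in the text.
    """
    match = re.search(r"<think>(.*?)</think>", generated_text, flags=re.DOTALL)
    if match is None:
        return None, generated_text.strip()
    reasoning = match.group(1).strip()
    answer = generated_text[match.end():].strip()
    return reasoning, answer

# Toy example standing in for output["generated_text"]:
sample = "<think>\nStep-by-step reasoning here.\n</think>\n\nParis is the capital."
reasoning, answer = split_reasoning(sample)
print(answer)  # -> Paris is the capital.
```

Note that with a small `max_new_tokens` budget the generation may be cut off before the closing `</think>` tag, in which case the helper above falls back to returning the whole text as the answer.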

## Training procedure

Training logs for this run can be visualized in Weights & Biases.

This model was trained with supervised fine-tuning (SFT).

## Framework versions

  • TRL: 0.15.0.dev0
  • Transformers: 4.49.0.dev0
  • Pytorch: 2.5.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin GallouΓ©dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
