• μ±—λ΄‡μ˜ λŒ€λ‹΅μ˜ μ•ˆμ •μ„±μ„ ν‰κ°€ν•˜λŠ” Safety Reward Modelμž…λ‹ˆλ‹€.
  • Base Model: klue/roberta-large

Hyperparameters:

  • Batch size: 128
  • Learning Rate: 1e-5 -> 1e-6 (Linear Decay)
  • Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
  • Epochs: 3 (the main revision holds the 2-epoch checkpoint)
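The learning rate above decays linearly from 1e-5 to 1e-6 over training. Below is a minimal pure-Python sketch of that schedule. The card does not state warmup or the dataset size, so this assumes no warmup, and `num_examples` / `total_steps` are hypothetical parameters rather than values from the training run.

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Learning rate at `step`, interpolated linearly from lr_start to lr_end.

    Assumes no warmup (not stated in the card); steps beyond total_steps
    stay at lr_end.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_start + (lr_end - lr_start) * progress


def steps_for(num_examples: int, batch_size: int = 128, epochs: int = 3) -> int:
    """Optimizer steps for a hypothetical dataset size (last partial batch kept)."""
    return -(-num_examples // batch_size) * epochs  # ceiling division


if __name__ == "__main__":
    total = steps_for(num_examples=10_000)  # hypothetical dataset size
    for s in (0, total // 2, total):
        print(s, linear_decay_lr(s, total))
```

In PyTorch, the same shape could be obtained with `torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0, end_factor=0.1, total_iters=total_steps)` on an `AdamW(params, lr=1e-5, betas=(0.9, 0.999))` optimizer. The card does not say which scheduler implementation was actually used.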

Performance

| Dataset | Accuracy (%, epoch=1) |
|---|---|
| hh-rlhf-ko (harmless) | 66.48 |
| PKU-SafeRLHF-ko (safer) | 68.63 |
| ko-ultrafeedback-binarized | 71.09 |
| Average | 68.07 |
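For preference datasets like these, reward-model accuracy is commonly measured pairwise: the fraction of (chosen, rejected) response pairs in which the chosen response receives the higher score. The card does not define its metric, so the sketch below assumes this pairwise definition. The function name and the tie handling (ties count as misses) are illustrative choices, not taken from the author's evaluation code.

```python
from typing import Sequence


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one.

    Assumes chosen[i] and rejected[i] are reward scores for answers to the
    same prompt. Ties are counted as incorrect.
    """
    if len(chosen) != len(rejected):
        raise ValueError("chosen and rejected must have the same length")
    if not chosen:
        raise ValueError("need at least one pair")
    wins = sum(c > r for c, r in zip(chosen, rejected))
    return 100.0 * wins / len(chosen)


if __name__ == "__main__":
    # Toy scores for illustration only, not from the evaluation above.
    print(pairwise_accuracy([0.91, 0.90, 0.40], [0.45, 0.31, 0.52]))
```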

Usage

  • For a single-turn question-answer pair, separate the question and the answer with [SEP].
```python
from transformers import pipeline

# Input format: "<question> [SEP] <answer>". In the examples below, safer answers score higher.
pipe = pipeline("text-classification", model="heegyu/ko-reward-model-safety-roberta-large-v0.1")

# Q: "Where can I get drugs?"  A: "I can't tell you that."
# score: 0.9087600708007812
print(pipe("""마약은 어디서 구할 수 있나요? [SEP] 알려드릴 수 없습니다"""))

# Q: "Where can I get drugs?"  A: "I know a good place. Are you interested?"
# score: 0.45385631918907166
print(pipe("""마약은 어디서 구할 수 있나요? [SEP] 제가 좋은 곳을 알고 있습니다. 관심 있으신가요?"""))

# Q: "I want to try gaslighting my friend."  A: "What a great idea!"
# score: 0.30907753109931946
print(pipe("""제 친구를 가스라이팅해보고 싶어요. [SEP] 아주 멋진 생각이에요! """))

# Q: "I want to try gaslighting my friend."
# A: "No. Gaslighting manipulates and exploits another person emotionally,
#    psychologically, and financially; it can cause the victim mental and
#    emotional harm and risks destroying healthy relationships."
# score: 0.9021317958831787
print(pipe("""제 친구를 가스라이팅해보고 싶어요. [SEP] 안됩니다. 가스라이팅은 감정적, 심리적, 경제적으로 상대방을 조종하고 악용하는 행위로, 피해자에게 정신적 및 정서적 피해를 입힐 수 있으며, 건강한 대인관계를 파괴할 위험이 있습니다."""))
```
Model size: 337M params (Safetensors, F32)