Himeyuri v0.1 12B


Base Model

This model is built upon Mistral-Nemo-Instruct-2407.

Usage Notes

Low Temperature Recommendation: As noted in the Mistral-Nemo-Instruct-2407 repository, a low temperature is recommended. I don't know exactly why, but my rough guess, based on my experience, is that Mistral Nemo tends to switch scenes drastically, and a low temperature can mitigate this, somewhat akin to the "slow pace" style in AI Novelist.
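As a concrete illustration, here is a minimal sketch of conservative sampling settings using the transformers GenerationConfig. The exact values are my own illustrative choices, not official recommendations (the full example below uses temperature=0.35):

from transformers import GenerationConfig

# Illustrative low-temperature settings (values are assumptions; tune to taste).
# A lower temperature helps keep the narrative from jumping scenes abruptly.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.3,
    top_p=1.0,
    max_new_tokens=512,
)

# Pass it to generation, e.g. model.generate(input_ids, generation_config=generation_config)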

Description

This is an experimental model with significant room for improvement.

Since Japanese Mistral 7B-based models appear to have plateaued, I've been exploring other architectures such as LLaMA 3. At present, I find Mistral-Nemo-Instruct-2407 the most promising base for Japanese open LLMs, so I built this model on it.

Example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; torch_dtype="auto" picks up the BF16 weights.
tokenizer = AutoTokenizer.from_pretrained("Elizezen/Himeyuri-v0.1-12B")
model = AutoModelForCausalLM.from_pretrained(
    "Elizezen/Himeyuri-v0.1-12B",
    torch_dtype="auto",
)
model.eval()

if torch.cuda.is_available():
    model = model.to("cuda")

# Prompt: "It was when Kanae, having received her year-end bonus, was about
# to head home from the office." The model continues the story from here.
input_ids = tokenizer.encode(
    "ๅนดๆœซใฎใƒœใƒผใƒŠใ‚นใ‚’ๅ—ๅ–ใฃใฆๅŠ ๅฅˆๆฑŸใŒ็คพใ‹ใ‚‰ๅธฐใ‚ใ†ใจใ—ใŸใจใใงใ‚ใฃใŸใ€‚",
    add_special_tokens=True,
    return_tensors="pt",
)

# Sample with a low temperature, per the usage notes above.
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=512,
    temperature=0.35,
    top_p=1,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
out = tokenizer.decode(tokens[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
print(out)

"""
output example:
 ๅพŒใ‚ใ‹ใ‚‰่‚ฉใ‚’ๅฉใ‹ใ‚ŒใฆๆŒฏใ‚Šๅ‘ใใจใ€ใใ“ใซใฏ่ฆ‹็Ÿฅใ‚‰ใฌ็”ทใŒ็ซ‹ใฃใฆใ„ใŸใ€‚ใฉใ“ใ‹ใง่ฆ‹ใŸใ“ใจใŒใ‚ใ‚‹ใ‚ˆใ†ใช้ก”็ซ‹ใกใ‚’ใ—ใฆใ„ใ‚‹ใ€‚็”ทใฏใซใ“ใ‚„ใ‹ใซ็ฌ‘ใฃใฆๅŠ ๅฅˆๆฑŸใซ่ฉฑใ—ใ‹ใ‘ใฆใใŸใ€‚
 ใ€Œใ“ใ‚“ใฐใ‚“ใฏใ€‚ๅŠ ๅฅˆๆฑŸใ•ใ‚“ใงใ™ใ‚ˆใญ๏ผŸใ€€ใกใ‚‡ใฃใจใŠ่ฉฑใ—ใ—ใŸใ„ใ“ใจใŒใ‚ใ‚‹ใ‚“ใงใ™ใ‘ใฉใ€ใ„ใ„ใงใ™ใ‹๏ผŸใ€
 ใใฎ็”ทใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ้ฆ–ใ‚’ๅ‚พใ’ใชใŒใ‚‰ใ‚‚ใ€ใจใ‚Šใ‚ใˆใš่ฉฑใ‚’่žใใ“ใจใซใ—ใŸใ€‚ๅ‘จใ‚Šใซใฏ็คพๅ“กใŒใพใ ๆฎ‹ใฃใฆใ„ใ‚‹ใฎใงใ€ๅคงใใชๅ•้กŒใฏใชใ„ใ ใ‚ใ†ใจๆ€ใฃใŸใ‹ใ‚‰ใ ใ€‚
 ใ€Œไฝ•ใงใ—ใ‚‡ใ†๏ผŸใ€€็งใซไฝ•ใ‹็”จใงใ™ใ‹๏ผŸใ€
 ๅŠ ๅฅˆๆฑŸใŒใใ†่žใใจใ€็”ทใฏใซใ“ใ‚„ใ‹ใช็ฌ‘ใฟใ‚’ๆตฎใ‹ในใŸใพใพใ€ๅฝผๅฅณใฎ่€ณๅ…ƒใงๅฐๅฃฐใงๅ›ใ„ใŸใ€‚
 ใ€ŒๅฎŸใฏใ€ใ‚ใชใŸใฎๆ—ฆ้‚ฃใ•ใ‚“ใฎใ“ใจใงๅฐ‘ใ—ใŠ่ฉฑใ—ใ—ใŸใ„ใ“ใจใŒใ‚ใ‚‹ใ‚“ใงใ™ใ€‚ๅ ดๆ‰€ใ‚’ๅค‰ใˆใฆ่ฉฑใ‚’่žใ„ใฆใ„ใŸใ ใ‘ใพใ›ใ‚“ใ‹๏ผŸใ€
 ใใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ้ฉšใใฎ่กจๆƒ…ใ‚’ๆตฎใ‹ในใŸใ€‚ๅคซใฎใ“ใจใฏ็คพๅ†…ใงใฏใ‚ใพใ‚Š่ฉฑใ—ใŸใใชใ„ใ€‚ใ—ใ‹ใ—ใ€็”ทใฎๆง˜ๅญใ‹ใ‚‰ใฏใ€ๅคซใฎ่บซใซไฝ•ใ‹ใ‚ใฃใŸใฎใงใฏใชใ„ใ‹ใจใ„ใ†ไธๅฎ‰ใŒใ‚ˆใŽใฃใŸใ€‚
 ใ€Œๅˆ†ใ‹ใ‚Šใพใ—ใŸใ€‚ใ˜ใ‚ƒใ‚ใ€่ฟ‘ใใฎๅ–ซ่Œถๅบ—ใซ่กŒใใพใ—ใ‚‡ใ†ใ€
 ๅŠ ๅฅˆๆฑŸใŒใใ†่จ€ใ†ใจใ€็”ทใฏ้ ทใ„ใฆๅฝผๅฅณใฎๅพŒใซใคใ„ใฆใใŸใ€‚

ไบŒไบบใฏ่ฟ‘ใใฎๅ–ซ่Œถๅบ—ใซๅ…ฅใ‚Šใ€ๅฅฅใฎๅธญใซ่…ฐใ‚’ไธ‹ใ‚ใ—ใŸใ€‚
 ใ€Œใใ‚Œใงใ€ๅคซใฎ่บซใซไฝ•ใŒใ‚ใฃใŸใ‚“ใงใ™ใ‹๏ผŸใ€
 ๅŠ ๅฅˆๆฑŸใฏ็”ทใฎ้ก”ใ‚’่ฆ‹ใชใŒใ‚‰ใ€ไธๅฎ‰ใ’ใซ่žใ„ใŸใ€‚็”ทใฏใ‚ณใƒผใƒ’ใƒผใ‚’ไธ€ๅฃ้ฃฒใ‚€ใจใ€ๅฝผๅฅณใซๅ‘ใ‹ใฃใฆ่ฉฑใ—ๅง‹ใ‚ใŸใ€‚
 ใ€ŒๅฎŸใฏใ€ใ‚ใชใŸใฎๆ—ฆ้‚ฃใ•ใ‚“ใฏใ€ใ‚ใ‚‹็ง˜ๅฏ†ใฎ็ต„็น”ใซๆ‰€ๅฑžใ—ใฆใ„ใ‚‹ใ‚“ใงใ™ใ€‚ใใฎ็ต„็น”ใฏใ€ไธ–็•Œไธญใซๅฝฑ้ŸฟๅŠ›ใ‚’ๆŒใคใ‚ˆใ†ใชๅคง่ฆๆจกใช็ต„็น”ใงใ€ๆง˜ใ€…ใชๅ›ฝๅฎถใฎ่ฆไบบใ‚„ๆœ‰ๅŠ›่€…ใŸใกใŒๅŠ ๅ…ฅใ—ใฆใ„ใ‚‹ใจ่จ€ใ‚ใ‚Œใฆใ„ใพใ™ใ€‚ๆ—ฆ้‚ฃใ•ใ‚“ใ‚‚ใใฎไธ€ๅ“กใงใ€ใ‹ใชใ‚Š้ซ˜ใ„ๅœฐไฝใซใ„ใ‚‹ใ‚ˆใ†ใงใ™ใ€
 ใใฎ่จ€่‘‰ใซๅŠ ๅฅˆๆฑŸใฏ็›ฎใ‚’ไธธใใ—ใŸใ€‚ๅคซใŒใใ‚“ใช็ต„็น”ใซๆ‰€ๅฑžใ—ใฆใ„ใ‚‹ใ ใชใ‚“ใฆใ€ๅˆ่€ณใงใ‚ใ‚‹ใ€‚
 ใ€Œใใ‚“ใช้ฆฌ้นฟใชใ€‚ใ†ใกใฎไบบใฏใŸใ ใฎใ‚ตใƒฉใƒชใƒผใƒžใƒณใงใ™ใ‚ˆใ€‚ใใ‚“ใช
"""

Intended Use

Primarily designed for novel generation. Not optimized for:

  • Role-playing (RP) scenarios
  • Instruction-based responses