japanese-large-lm-1.7b-instruction-sft

This repository provides a 1.7B-parameter Japanese language model, trained and instruction-tuned by LINE Corporation.

For Japanese readers: for a detailed description and experimental results, see the blog post "Instruction Tuningにより対話性能を向上させた3.6B日本語言語モデルを公開します" (Releasing a 3.6B Japanese language model with improved dialogue performance through instruction tuning).

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model and its SentencePiece tokenizer (the slow tokenizer is required, hence use_fast=False).
model = AutoModelForCausalLM.from_pretrained("line-corporation/japanese-large-lm-1.7b-instruction-sft")
tokenizer = AutoTokenizer.from_pretrained("line-corporation/japanese-large-lm-1.7b-instruction-sft", use_fast=False)

# Use the first GPU if one is available, otherwise fall back to CPU.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if torch.cuda.is_available() else -1,
)

# Prompts follow the "ユーザー: ...\nシステム: " instruction format.
input_text = "四国の県名を全て列挙してください。"
text = generator(
    f"ユーザー: {input_text}\nシステム: ",
    max_length=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=0,
    repetition_penalty=1.1,
    num_beams=1,
    pad_token_id=tokenizer.pad_token_id,
    num_return_sequences=1,
)
print(text)
# [{'generated_text': 'ใƒฆใƒผใ‚ถใƒผ: ๅ››ๅ›ฝใฎ็œŒๅใ‚’ๅ…จใฆๅˆ—ๆŒ™ใ—ใฆใใ ใ•ใ„ใ€‚\nใ‚ทใ‚นใƒ†ใƒ :  ้ฆ™ๅท็œŒใ€ๅพณๅณถ็œŒใ€ๆ„›ๅช›็œŒใ€้ซ˜็Ÿฅ็œŒ'}]
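
As the example output shows, the pipeline echoes the prompt together with the completion. If only the system reply is needed, it can be split off from the generated string. The helper below is a minimal sketch that assumes the single-turn "ユーザー: ...\nシステム: " format used above; extract_reply is not part of the released model or library.

def extract_reply(generated_text: str) -> str:
    # Keep only the text after the first "システム:" marker and strip surrounding whitespace.
    return generated_text.split("システム:", 1)[-1].strip()

print(extract_reply(text[0]["generated_text"]))
# e.g. 香川県、徳島県、愛媛県、高知県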

Tokenization

We use a SentencePiece tokenizer with a unigram language model and byte-fallback. We do not apply pre-tokenization with a Japanese tokenizer, so users can feed raw sentences directly into the tokenizer.
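
For example, raw Japanese text can be passed straight to the tokenizer without any external morphological analyzer. The snippet below is a minimal sketch; the exact tokens produced depend on the learned SentencePiece vocabulary.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "line-corporation/japanese-large-lm-1.7b-instruction-sft", use_fast=False
)

# Raw sentences go straight in; byte-fallback covers characters outside the vocabulary.
encoded = tokenizer("四国の県名を全て列挙してください。")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))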

License

Apache License, Version 2.0
