DeepSeek-llama3.3-Bllossom

The DeepSeek-Bllossom Series consists of models that were further trained to address the language mixing and degraded multilingual performance of the original DeepSeek-R1-Distill Series.

DeepSeek-llama3.3-Bllossom-70B is built on DeepSeek-R1-distill-Llama-70B and was developed to improve reasoning performance in Korean-language settings.

This is the first model produced jointly by UNIVA and the Bllossom team.

| Model | Base Model | Download |
|---|---|---|
| DeepSeek-qwen-Bllossom-1.5B | DeepSeek-R1-Distill-Qwen-1.5B | Coming soon |
| DeepSeek-qwen-Bllossom-7B | DeepSeek-R1-Distill-Qwen-7B | Coming soon |
| DeepSeek-llama3.1-Bllossom-8B | DeepSeek-R1-Distill-Llama-8B | 🤗 HuggingFace |
| DeepSeek-qwen-Bllossom-14B | DeepSeek-R1-Distill-Qwen-14B | Coming soon |
| DeepSeek-qwen-Bllossom-32B | DeepSeek-R1-Distill-Qwen-32B | Coming soon |
| DeepSeek-llama3.3-Bllossom-70B | DeepSeek-R1-Distill-Llama-70B | 🤗 HuggingFace |

1. Introduction

DeepSeek-llama3.3-Bllossom-70B is built on DeepSeek-R1-distill-Llama-70B and was developed to overcome a limitation of the base model, which was trained mainly on English and Chinese data. In particular, the original DeepSeek-R1-distill-Llama-70B suffers a large drop in performance when it reasons in Korean. To address this, DeepSeek-Bllossom was further trained to carry out its internal thinking in English while producing the final user-facing response in the language of the input. This substantially improves reasoning performance in Korean-language settings.

Training used Korean and English reasoning data. In addition to the STEM data that dominated the training of the original DeepSeek-R1 models, data from a variety of other domains was included. Throughout dataset design and model training, the main goal for DeepSeek-llama3.3-Bllossom was to deliver more accurate and reliable reasoning results for Korean-language use.

A smaller 8B model from the DeepSeek-Bllossom Series is available here: DeepSeek-R1-distill-Llama-Bllossom-8B
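
The language mixing mentioned in this section can be spot-checked with a small script test. The following is only a rough sketch and not part of the model's tooling: the helper names `scripts_used` and `looks_mixed` are hypothetical, and the check assumes that Han characters appearing alongside Hangul in a final answer indicate mixing, which would also flag legitimate Hanja in Korean text.

```python
import re

# Simplified Unicode ranges (an assumption of this sketch):
# Hangul syllables and CJK Unified Ideographs only.
HANGUL = re.compile(r"[\uac00-\ud7a3]")
HAN = re.compile(r"[\u4e00-\u9fff]")


def scripts_used(text: str) -> set[str]:
    """Return which of the two scripts appear in the text."""
    found = set()
    if HANGUL.search(text):
        found.add("hangul")
    if HAN.search(text):
        found.add("han")
    return found


def looks_mixed(text: str) -> bool:
    """Flag text that contains both Hangul and Han characters."""
    return scripts_used(text) == {"hangul", "han"}
```

Because the model is trained to reason in English, a check like this would typically be applied only to the final answer, after the reasoning block has been removed.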


2. Post-training

DeepSeek-llama3.3-Bllossom was post-trained on a variety of reasoning data created in-house. In this process, a method was applied to effectively distill the strong reasoning and Korean-language capabilities of a large-scale model into DeepSeek-R1-distill-Llama-70B. This compensates for weaknesses of the original model and optimizes it to generate more accurate and reliable responses to complex reasoning problems.


3. Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B")

system='''
You are a highly capable assistant. For every user question, follow these instructions exactly:
    1.	First, think through the problem step-by-step in English. Enclose all of your internal reasoning between <think> and </think> tags. This chain-of-thought should detail your reasoning process.
    2.	After the closing </think> tag, provide your final answer.
    3.	Do not include any additional text or commentary outside of this format.
    4.	Your output should strictly follow this structure:

<think>
[Your detailed step-by-step reasoning in English]
</think>
[Your final answer]
'''

# Korean prompt. In English: "Cheolsu, Younghee, and Minsu received scores over three rounds of a game.
# Younghee's score is twice Minsu's, and Minsu's score is four times Cheolsu's.
# If Cheolsu received 10 points, calculate the average score of the three."
text="철수, 영희, 민수가 3회의 게임에서 점수를 받았습니다. 영희의 점수는 민수의 점수의 두 배이며, 민수의 점수는 철수의 4배입니다. 철수가 10점을 받았다면 이 3명의 평균 점수를 계산하세요."
chat = [
    {"role": "system", "content": system},
    {"role": "user", "content": text}
]

prompt=tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer(
    prompt,
    return_tensors="pt",
    add_special_tokens=True
)

if "token_type_ids" in model_inputs:
    del model_inputs["token_type_ids"]

model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}

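generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens (everything after the prompt) and print the response.
output_ids = generated_ids[0][model_inputs["input_ids"].shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))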
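
Since the system prompt asks for reasoning inside <think> and </think> tags followed by the final answer, the decoded text can be separated into those two parts. The helper below is a minimal sketch; `split_reasoning` is a hypothetical name, and it assumes the closing </think> tag appears at most once. It also tolerates a missing opening tag, in case the chat template already emitted it as part of the prompt.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, final_answer)."""
    head, sep, tail = response.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", response.strip()
    # Drop the opening tag if it is present in the decoded text.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>\n4 * 10 = 40\n</think>\nThe average is 130/3."
    reasoning, answer = split_reasoning(sample)
    print(reasoning)
    print(answer)
```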

4. Model Response Comparison

Each example gives the input prompt, then the response from DeepSeek-R1-distill-Llama-70B, then the response from DeepSeek-llama3.3-Bllossom-70B.

Input (Prompt), originally in Korean:
Alice, Bob, and Charlie received scores over three rounds of a game.
Alice's score is twice Bob's score, and Bob's score is four times Charlie's.
If Charlie received 10 points, calculate the average score of the three.

DeepSeek-R1-distill-Llama-70B response (originally in Korean with Chinese characters mixed in; the phrase translated below as "the three people's" was written 세人的):
Charlie's score is 10 points.
Bob's score is four times Charlie's, so Bob's score is
$$
4 \times 10 = 40
$$
points.
Alice's score is twice Bob's score, so Alice's score is
$$
2 \times 40 = 80
$$
points.
The three people's total score is
$$
10 + 40 + 80 = 130
$$
points.
The average score is
$$
\frac{130}{3} \approx 43.33
$$
points.
$$
\boxed{43.33}
$$
Charlie's score is 10 points.

DeepSeek-llama3.3-Bllossom-70B response (originally in Korean):
Charlie's score is given as 10 points. Bob's score is four times Charlie's score, which is calculated as follows:
$$
4 \times 10 = 40
$$
Alice's score is twice Bob's score, which is:
$$
2 \times 40 = 80
$$
The total score of the three is:
$$
10 + 40 + 80 = 130
$$
The average score of the three is calculated by dividing the total score by the number of people:
$$
\frac{130}{3}
$$
Therefore, the average score of the three is
$$
\boxed{\dfrac{130}{3}}
$$
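
The arithmetic in this example is easy to verify independently. The short check below is a sketch using exact fractions; the variable names are chosen here for illustration. It shows that the boxed value 130/3 is the exact average and that 43.33 is its two-decimal rounding.

```python
from fractions import Fraction

charlie = 10
bob = 4 * charlie        # Bob scored four times Charlie's score
alice = 2 * bob          # Alice scored twice Bob's score
total = charlie + bob + alice
average = Fraction(total, 3)  # exact average over three people

print(bob, alice, total)
print(average, round(float(average), 2))
```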
์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ์‚ฌ์‹ค์„ 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ฆ๋ช…ํ•˜์‹œ์˜ค.
ไปฅไธ‹ๆ˜ฏ็”จไธ‰็งๆ–นๆณ•่ฏๆ˜Ž็ด ๆ•ฐๆ— ้™็š„่ฏฆ็ป†ๆญฅ้ชค๏ผš
### ๆ–นๆณ•ไธ€๏ผšๆฌงๅ‡ ้‡Œๅพ—็š„ๅ่ฏๆณ•
1. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆ˜ฏๆœ‰้™็š„๏ผŒไธ”ๆ‰€ๆœ‰็ด ๆ•ฐไธบ $ p_1, p_2, p_3, \ldots, p_n $ใ€‚
2. ๆž„้€ ๆ•ฐ๏ผšๆž„้€ ไธ€ไธชๆ–ฐๆ•ฐ $ N = (p_1 \times p_2 \times p_3 \times \ldots \times p_n) + 1 $ใ€‚
3. ๅˆ†ๆžN็š„ๆ€ง่ดจ๏ผš
- ๅฆ‚ๆžœNๆ˜ฏ็ด ๆ•ฐ๏ผŒๅˆ™NไธๅœจๅŽŸๅ‡่ฎพ็š„็ด ๆ•ฐๅˆ—่กจไธญ๏ผŒไธŽๅ‡่ฎพ็Ÿ›็›พใ€‚
- ๅฆ‚ๆžœNๆ˜ฏๅˆๆ•ฐ๏ผŒๅˆ™ๅญ˜ๅœจๆŸไธช็ด ๆ•ฐ $ p $ ๆ•ด้™คNใ€‚ไฝ†็”ฑไบŽN้™คไปฅไปปไฝ• $ p_i $ ้ƒฝไฝ™1๏ผŒๆ•… $ p $ ไธๅœจๅŽŸๅˆ—่กจไธญ๏ผŒๅŒๆ ท็Ÿ›็›พใ€‚
4. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
### ๆ–นๆณ•ไบŒ๏ผš่ดน้ฉฌๆ•ฐ็š„ๆ€ง่ดจ
1. ๅฎšไน‰๏ผš่ดน้ฉฌๆ•ฐๅฝขๅฆ‚ $ F_k = 2^{2^k} + 1 $๏ผŒๅ…ถไธญ $ k $ ไธบ้ž่ดŸๆ•ดๆ•ฐใ€‚
2. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆ˜ฏๆœ‰้™็š„๏ผŒไธ”ๆ‰€ๆœ‰็ด ๆ•ฐไธบ $ p_1, p_2, \ldots, p_n $ใ€‚
3. ๆž„้€ ่ดน้ฉฌๆ•ฐ๏ผšๅฏนไบŽๆฏไธช $ k $๏ผŒๆž„้€  $ F_k $ใ€‚ๅฆ‚ๆžœ $ F_k $ ๆ˜ฏ็ด ๆ•ฐ๏ผŒๅˆ™ไธๅœจๅŽŸๅˆ—่กจไธญ๏ผ›ๅฆ‚ๆžœๆ˜ฏๅˆๆ•ฐ๏ผŒๅ…ถ็ด ๅ› ๆ•ฐไธๅœจๅŽŸๅˆ—่กจไธญใ€‚
4. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
### ๆ–นๆณ•ไธ‰๏ผšๆฌงๆ‹‰ๅ‡ฝๆ•ฐไธŽ็ด ๆ•ฐๆ— ้™ๆ€ง
1. ๆฌงๆ‹‰ๅ‡ฝๆ•ฐ๏ผšๆฌงๆ‹‰ๅ‡ฝๆ•ฐ $ \phi(n) $ ่กจ็คบๅฐไบŽ $ n $ ไธ”ไธŽ $ n $ ไบ’่ดจ็š„ๆ•ฐ็š„ไธชๆ•ฐใ€‚
2. ๅ‡่ฎพ๏ผšๅ‡่ฎพ็ด ๆ•ฐๆœ‰้™๏ผŒ่ฎฐไธบ $ p_1, p_2, \ldots, p_n $ใ€‚
3. ๆž„้€ ๆ•ฐ๏ผš่€ƒ่™‘ๅฝขๅฆ‚ $ N = p_1^{k_1} p_2^{k_2} \ldots p_n^{k_n} $ ็š„ๆ•ฐ๏ผŒๅ…ถๆฌงๆ‹‰ๅ‡ฝๆ•ฐไธบ $ \phi(N) = N \prod_{i=1}^{n} \left(1 - \frac{1}{p_i}\right) $ใ€‚
4. ๅˆ†ๆž๏ผšๅฆ‚ๆžœ็ด ๆ•ฐๆœ‰้™๏ผŒ$ \prod_{p \text{็ด ๆ•ฐ}} \left(1 - \frac{1}{p}\right) $ ๆ”ถๆ•›๏ผŒไฝ†ๅฎž้™…ไธŠ่ฏฅ็งฏๅ‘ๆ•ฃ๏ผŒๅฏผ่‡ด็Ÿ›็›พใ€‚
5. ็ป“่ฎบ๏ผšๅ› ๆญค๏ผŒ็ด ๆ•ฐๅฟ…้กปๆ˜ฏๆ— ้™็š„ใ€‚
่ฟ™ไธ‰็งๆ–นๆณ•ๅ„่‡ชไปŽไธๅŒ็š„่ง’ๅบฆ่ฏๆ˜Žไบ†็ด ๆ•ฐ็š„ๆ— ้™ๆ€ง๏ผŒๅฑ•็คบไบ†ๆ•ฐ่ฎบไธญ็ด ๆ•ฐๆ— ้™ๆ€ง็š„ๆทฑๅˆปๆ€งใ€‚

์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ๊ฒƒ์„ 3๊ฐ€์ง€ ๋ฐฉ๋ฒ•์œผ๋กœ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•ด, ์šฐ๋ฆฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ ‘๊ทผ ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
1. ์œ ํด๋ฆฌ๋“œ์˜ ๋ชจ์ˆœ ์ฆ๋ช…:
- ๊ฐ€์ •: ์†Œ์ˆ˜๋Š” ์œ ํ•œํ•œ ๊ฐœ์ˆ˜ $ p_1, p_2, \dots, p_n $์ด๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.
- ๊ตฌ์„ฑ: $ P = p_1 \times p_2 \times \dots \times p_n + 1 $์œผ๋กœ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
- ๊ฒฐ๋ก : $ P $๊ฐ€ ์†Œ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ ๋ชฉ๋ก์— ์—†๋Š” ์ƒˆ๋กœ์šด ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ P $๊ฐ€ ํ•ฉ์„ฑ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ ๋ชฉ๋ก์— ์—†๋Š” ์†Œ์ˆ˜๋ฅผ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘ ๊ฐ€์ •์— ๋ชจ์ˆœ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ฉ๋‹ˆ๋‹ค.
2. ํŒฉํ† ๋ฆฌ์–ผ ์ ‘๊ทผ๋ฒ•:
- ๊ฐ€์ •: ์†Œ์ˆ˜๋Š” ์œ ํ•œํ•œ ๊ฐœ์ˆ˜ $ p_1, p_2, \dots, p_n $์ด๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค.
- ๊ตฌ์„ฑ: $ n! + 1 $์„ ๊ณ ๋ คํ•˜์‹ญ์‹œ์˜ค, ์—ฌ๊ธฐ์„œ $ n $์€ ๊ฐ€์žฅ ํฐ ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค.
- ๊ฒฐ๋ก : $ n! $์€ ๋ชจ๋“  $ p_i $๋กœ ๋‚˜๋ˆ„์–ด ๋–จ์–ด์ง€๋ฏ€๋กœ, $ n! + 1 $์€ $ p_i $ ์ค‘ ์–ด๋Š ๊ฒƒ๋„ ํฌํ•จํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ n! + 1 $์ด ์†Œ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ $ n $๋ณด๋‹ค ํฐ ์ƒˆ๋กœ์šด ์†Œ์ˆ˜์ž…๋‹ˆ๋‹ค. ๋งŒ์•ฝ $ n! + 1 $์ด ํ•ฉ์„ฑ์ˆ˜์ด๋ฉด, ๊ทธ๊ฒƒ์€ $ n $๋ณด๋‹ค ํฐ ์†Œ์ˆ˜๋ฅผ ํฌํ•จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๊ฒฝ์šฐ ๋ชจ๋‘ ๊ฐ€์ •์— ๋ชจ์ˆœ๋ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ฉ๋‹ˆ๋‹ค.
3. ์˜ค์ผ๋Ÿฌ์˜ ์ œํƒ€ ํ•จ์ˆ˜ ์ฆ๋ช…:
- ๊ฐœ๋…: ์˜ค์ผ๋Ÿฌ๋Š” ์ œํƒ€ ํ•จ์ˆ˜ $ \zeta(s) = \sum_{k=1}^\infty \frac{1}{k^s} $๊ฐ€ ์†Œ์ˆ˜์— ๋Œ€ํ•œ ๊ณฑ์œผ๋กœ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค: $ \zeta(s) = \prod_{p \text{ ์†Œ์ˆ˜}} \frac{1}{1 - \frac{1}{p^s}} $.
- ๊ฒฐ๋ก : ๋งŒ์•ฝ ์†Œ์ˆ˜๊ฐ€ ์œ ํ•œํ•˜๋ฉด, ๊ณฑ์€ ์œ ํ•œํ•ด์•ผ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ $ s $๊ฐ€ 1๋ณด๋‹ค ํฐ ๊ฐ’์„ ํ–ฅํ•ด ์ ‘๊ทผํ•  ๋•Œ, $ \zeta(s) $๋Š” ๋ฌดํ•œ๋Œ€๋กœ ๋ฐœ์‚ฐํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ๊ณฑ์ด ๋ฐœ์‚ฐํ•ด์•ผ ํ•จ์„ ์š”๊ตฌํ•˜๋ฉฐ, ๋”ฐ๋ผ์„œ ์†Œ์ˆ˜๋Š” ๋ฌดํ•œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
๊ฐ ๋ฐฉ๋ฒ•์€ ์†Œ์ˆ˜๊ฐ€ ๋ฌดํ•œํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ฆํ•˜๋ฉฐ, ์„œ๋กœ ๋‹ค๋ฅธ ์ˆ˜ํ•™์  ์›์น™์— ๊ธฐ๋ฐ˜์„ ๋‘ก๋‹ˆ๋‹ค. ์œ ํด๋ฆฌ๋“œ์˜ ์ฆ๋ช…์€ ๋ชจ์ˆœ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค, ํŒฉํ† ๋ฆฌ์–ผ ์ ‘๊ทผ๋ฒ•์€ ๊ฐ„๋‹จํ•œ ์ˆ˜๋ก ์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ์˜ค์ผ๋Ÿฌ์˜ ์ฆ๋ช…์€ ํ•ด์„์  ์ˆ˜๋ก ์˜ ๋„๊ตฌ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์„ธ ๋ฐฉ๋ฒ• ๋ชจ๋‘ ์†Œ์ˆ˜ ๋ฌดํ•œ์„ฑ์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ๊ฐ•ํ™”ํ•ฉ๋‹ˆ๋‹ค.
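
Euclid's construction, which appears in both responses, can be tried numerically. The sketch below is illustrative only, and the function names `smallest_prime_factor` and `euclid_new_prime` are hypothetical. Given a finite list of primes, it multiplies them, adds 1, and returns the smallest prime factor of the result, which cannot be on the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (for n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Return a prime that is not in the given list of primes."""
    candidate = prod(primes) + 1
    return smallest_prime_factor(candidate)


if __name__ == "__main__":
    print(euclid_new_prime([2, 3, 5]))             # 31, which is itself prime
    print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # 59, since 30031 = 59 * 509
```

The second call shows the composite case: the constructed number 30031 is not prime, yet its smallest prime factor still lies outside the starting list.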

5. License

This code repository and the model weights are licensed under the MIT License. The DeepSeek-Bllossom series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:

  • DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license.
  • DeepSeek-llama3.3-Bllossom-70B is derived from DeepSeek-R1-Distill-Llama-70B and is originally licensed under llama3.3 license.

6. Contributor

7. Contact

If you have any questions, please raise an issue or contact us at [email protected].
