UNIVA-Bllossom
/

DeepSeek-llama3.1-Bllossom-8B

@@ -1,199 +1,118 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: mit
+language:
+- ko
+- en
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 library_name: transformers
 ---
+# DeepSeek-llama3.1-Bllossom-8B
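+The DeepSeek-Bllossom series consists of models that were additionally trained to address the language mixing and degraded multilingual performance of the original DeepSeek-R1-Distill series.
+DeepSeek-llama3.1-Bllossom-8B is built on DeepSeek-R1-Distill-Llama-8B and was developed to improve reasoning performance in Korean-language settings.
+It is the first model jointly produced by UNIVA and the Bllossom team.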
+<div align="center">
+| **Model** | **Base Model** | **Download** |
+| :------------: | :------------: | :------------: |
+| DeepSeek-qwen-Bllossom-1.5B  | [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | Coming soon  |
+| DeepSeek-qwen-Bllossom-7B  | [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | Coming soon  |
+| DeepSeek-llama3.1-Bllossom-8B  | [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | [🤗 HuggingFace](https://huggingface.co/UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B)   |
+| DeepSeek-qwen-Bllossom-14B   | [DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | Coming soon   |
+| DeepSeek-qwen-Bllossom-32B  | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | Coming soon   |
+| DeepSeek-llama3.3-Bllossom-70B  | [DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | [🤗 HuggingFace](https://huggingface.co/UNIVA-Bllossom/DeepSeek-llama3.3-Bllossom-70B)   |
+</div>
+## 1. Introduction
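+DeepSeek-llama3.1-Bllossom-8B is built on DeepSeek-R1-Distill-Llama-8B and was developed to overcome a limitation of the base model, which was trained mainly on English and Chinese data. In particular, the original DeepSeek-R1-Distill-Llama-8B suffered a large drop in performance when reasoning in Korean. To address this, DeepSeek-Bllossom was additionally trained so that the internal thinking process is carried out in English while the final response shown to the user is produced in the language of the input. This substantially improved reasoning performance in Korean-language settings.
+Training used Korean and English reasoning data. In addition to the STEM data that dominated the training of the original DeepSeek-R1 models, it included data from a variety of other fields. Throughout dataset design and model training, the main goal for DeepSeek-llama3.1-Bllossom-8B was to deliver more accurate and reliable reasoning results for Korean-language use.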
+---
+## 2. Post-training
+DeepSeek-llama3.1-Bllossom-8B was post-trained on a variety of in-house reasoning data. This process applied a method that distills the strong reasoning and Korean-language capabilities of larger models into DeepSeek-R1-Distill-Llama-8B. As a result, the model was optimized to compensate for the weaknesses of the original model and to generate more accurate and reliable responses to complex reasoning problems.
+---
+## 3. Inference
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model = AutoModelForCausalLM.from_pretrained(
+    "UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B",
+    torch_dtype="auto",
+    device_map="auto"
+)
+# Load the tokenizer from the same repository as the model weights.
+tokenizer = AutoTokenizer.from_pretrained("UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B")
+system='''
+You are a highly capable assistant. For every user question, follow these instructions exactly:
+	1.	First, think through the problem step-by-step in English. Enclose all of your internal reasoning between <think> and </think> tags. This chain-of-thought should detail your reasoning process.
+	2.	After the closing </think> tag, provide your final answer.
+	3.	Do not include any additional text or commentary outside of this format.
+	4.	Your output should strictly follow this structure:
+<think>
+[Your detailed step-by-step reasoning in English]
+</think>
+[Your final answer]
+'''
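+# Korean prompt. In English: "Cheolsu, Younghee, and Minsu received scores over 3 games.
+# Younghee's score is twice Minsu's, and Minsu's score is 4 times Cheolsu's.
+# If Cheolsu scored 10 points, calculate the average score of the three."
+text="철수, 영희, 민수가 3회의 게임에서 점수를 받았습니다. 영희의 점수는 민수의 점수의 두 배이며, 민수의 점수는 철수의 4배입니다. 철수가 10점을 받았다면 이 3명의 평균 점수를 계산하세요."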
+chat = [
+    {"role": "system", "content": system},
+    {"role": "user", "content": text}
+]
+prompt=tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
+model_inputs = tokenizer(
+    prompt,
+    return_tensors="pt",
+    add_special_tokens=True
+)
+if "token_type_ids" in model_inputs:
+    del model_inputs["token_type_ids"]
+model_inputs = {k: v.to(model.device) for k, v in model_inputs.items()}
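+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=8192,
+)
+# Decode only the newly generated tokens (skipping the prompt) and print the response.
+input_length = model_inputs["input_ids"].shape[1]
+print(tokenizer.decode(generated_ids[0][input_length:], skip_special_tokens=True))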
+```
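The system prompt above asks the model to place its English reasoning between `<think>` and `</think>` and to give the final answer afterwards. A minimal sketch of how that structure might be separated once the output has been decoded to a string (for example with `tokenizer.decode`) follows. The helper name `split_reasoning` and the sample string are hypothetical and not part of the model repository. The helper also tolerates a missing opening tag, on the assumption that some chat templates emit `<think>` as part of the prompt.

```python
from typing import Tuple


def split_reasoning(decoded: str) -> Tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Hypothetical helper: assumes the output follows the
    <think>...</think>[answer] structure requested in the system prompt.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in decoded:
        # No closing tag found: treat the whole text as the final answer.
        return "", decoded.strip()
    reasoning, answer = decoded.split(close_tag, 1)
    if open_tag in reasoning:
        # Drop everything up to and including the opening tag.
        reasoning = reasoning.split(open_tag, 1)[1]
    return reasoning.strip(), answer.strip()


# Illustrative string only, not real model output.
sample = "<think>\nMinsu = 4 * 10 = 40, Younghee = 2 * 40 = 80.\n</think>\n평균 점수는 130/3입니다."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the reasoning and the answer separate makes it possible to show users only the final answer in their own language while logging the English reasoning for inspection.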
+## 4. Model Response Comparison
+| Input (Prompt) | DeepSeek-R1-distill-Llama-70B | DeepSeek-llama3.3-Bllossom-70B |
+| ------------- | ----------------------------- | -------------------------- |
+| **Alice, Bob, Charlie가 3회의 게임에서 점수를 받았습니다.<br>Alice의 점수는 Bob의 점수의 두 배이며, Bob의 점수는 Charlie의 4배입니다.<br>Charlie가 10점을 받았다면 이 3명의 평균 점수를 계산하세요.** |<br>Charlie의 점수는 10점입니다.<br>Bob의 점수는 Charlie의 4배이므로, Bob의 점수는 <br>$$<br>4 \\times 10 = 40<br>$$<br>점입니다.<br>Alice의 점수는 Bob의 점수의 두 배이므로, Alice의 점수는 <br>$$<br>2 \\times 40 = 80<br>$$<br>점입니다.<br>세人的 총 점수는 <br>$$<br>10 + 40 + 80 = 130<br>$$<br>점입니다.<br>평균 점수는 <br>$$<br>\\frac{130}{3} \\approx 43.33$점입니다.<br>$$<br>\\boxed{43.33}<br>$$<br>Charlie의 점수는 10점입니다. | <br>Charlie의 점수는 10점으로 주어졌습니다. Bob의 점수는 Charlie의 점수의 4배이며, 이는 다음과 같이 계산됩니다:<br>$$<br>4 \times 10 = 40<br>$$<br>Alice의 점수는 Bob의 점수의 두 배이며, 이는 다음과 같습니다:<br>$$<br>2 \times 40 = 80<br>$$<br>세 사람의 총 점수는 다음과 같습니다:<br>$$<br>10 + 40 + 80 = 130<br>$$<br>이 세 사람의 평균 점수는 총 점수를 사람 수로 나누어 계산됩니다:<br>$$<br>\frac{130}{3}<br>$$<br>따라서, 이 세 사람의 평균 점수는 <br>$$<br>\boxed{\dfrac{130}{3}}<br>$$<br>입니다. |
+| **소수가 무한하다는 사실을 3가지 방법으로 증명하시오.** | <br>以下是用三种方法证明素数无限的详细步骤：<br>### 方法一：欧几里得的反证法<br>1. **假设**：假设素数是有限的，且所有素数为 $ p_1, p_2, p_3, \\ldots, p_n $。<br>2. **构造数**：构造一个新数 $ N = (p_1 \\times p_2 \\times p_3 \\times \\ldots \\times p_n) + 1 $。<br>3. **分析N的性质**：<br>   - 如果N是素数，则N不在原假设的素数列表中，与假设矛盾。<br>   - 如果N是合数，则存在某个素数 $ p $ 整除N。但由于N除以任何 $ p_i $ 都余1，故 $ p $ 不在原列表中，同样矛盾。<br>4. **结论**：因此，素数必须是无限的。<br>### 方法二：费马数的性质<br>1. **定义**：费马数形如 $ F_k = 2^{2^k} + 1 $，其中 $ k $ 为非负整数。<br>2. **假设**：假设素数是有限的，且所有素数为 $ p_1, p_2, \\ldots, p_n $。<br>3. **构造费马数**：对于每个 $ k $，构造 $ F_k $。如果 $ F_k $ 是素数，则不在原列表中；如果是合数，其素因数不在原列表中。<br>4. **结论**：因此，素数必须是无限的。<br>### 方法三：欧拉函数与素数无限性<br>1. **欧拉函数**：欧拉函数 $ \\phi(n) $ 表示小于 $ n $ 且与 $ n $ 互质的数的个数。<br>2. **假设**：假设素数有限，记为 $ p_1, p_2, \\ldots, p_n $。<br>3. **构造数**：考虑形如 $ N = p_1^{k_1} p_2^{k_2} \\ldots p_n^{k_n} $ 的数，其欧拉函数为 $ \\phi(N) = N \\prod_{i=1}^{n} \\left(1 - \\frac{1}{p_i}\\right) $。<br>4. **分析**：如果素数有限，$ \\prod_{p \\text{素数}} \\left(1 - \\frac{1}{p}\\right) $ 收敛，但实际上该积发散，导致矛盾。<br>5. **结论**：因此，素数必须是无限的。<br>这三种方法各自从不同的角度证明了素数的无限性，展示了数论中素数无限性的深刻性。 | <br>소수가 무한하다는 것을 3가지 방법으로 증명하기 위해, 우리는 다음과 같은 접근 방식을 사용할 수 있습니다:<br>1. **유클리드의 모순 증명**:<br>   - **가정**: 소수는 유한한 개수 $ p_1, p_2, \\dots, p_n $이라고 가정합니다.<br>   - **구성**: $ P = p_1 \\times p_2 \\times \\dots \\times p_n + 1 $으로 정의합니다.<br>   - **결론**: $ P $가 소수이면, 그것은 목록에 없는 새로운 소수입니다. 만약 $ P $가 합성수이면, 그것은 목록에 없는 소수를 포함해야 합니다. 두 경우 모두 가정에 모순됩니다. 따라서, 소수는 무한합니다.<br>2. **팩토리얼 접근법**:<br>   - **가정**: 소수는 유한한 개수 $ p_1, p_2, \\dots, p_n $이라고 가정합니다.<br>   - **구성**: $ n! + 1 $을 고려하십시오, 여기서 $ n $은 가장 큰 소수입니다.<br>   - **결론**: $ n! $은 모든 $ p_i $로 나누어 떨어지므로, $ n! + 1 $은 $ p_i $ 중 어느 것도 포함하지 않습니다. 만약 $ n! + 1 $이 소수이면, 그것은 $ n $보다 큰 새로운 소수입니다. 만약 $ n! + 1 $이 합성수이면, 그것은 $ n $보다 큰 소수를 포함해야 합니다. 두 경우 모두 가정에 모순됩니다. 따라서, 소수는 무한합니다.<br>3. **오일러의 제타 함수 증명**:<br>   - **개념**: 오일러는 제타 함수 $ \\zeta(s) = \\sum_{k=1}^\\infty \\frac{1}{k^s} $가 소수에 대한 곱으로 표현될 수 있음을 보였습니다: $ \\zeta(s) = \\prod_{p \\text{ 소수}} \\frac{1}{1 - \\frac{1}{p^s}} $.<br>   - **결론**: 만약 소수가 유한하면, 곱은 유한해야 할 것입니다. 그러나 $ s $가 1보다 큰 값을 향해 접근할 때, $ \\zeta(s) $는 무한대로 발산합니다. 이것은 곱이 발산해야 함을 요구하며, 따라서 소수는 무한해야 합니다.<br>각 방법은 소수가 무한하다는 것을 확증하며, 서로 다른 수학적 원칙에 기반을 둡니다. 유클리드의 증명은 모순을 사용합니다, 팩토리얼 접근법은 간단한 수론을 사용하며, 오일러의 증명은 해석적 수론의 도구를 사용합니다. 세 방법 모두 소수 무한성에 대한 이해를 강화합니다. |
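The first prompt in the table has a closed-form answer, so both responses can be checked by hand. A minimal sketch of that arithmetic using exact fractions follows. The variable names are illustrative, taken from the names in the prompt.

```python
from fractions import Fraction

# Scores as stated in the prompt: Charlie has 10 points,
# Bob has 4 times Charlie's score, Alice has twice Bob's score.
charlie = 10
bob = 4 * charlie
alice = 2 * bob

scores = [alice, bob, charlie]
average = Fraction(sum(scores), len(scores))

print(average)                   # 130/3
print(round(float(average), 2))  # 43.33
```

Both responses reach the same value. The baseline reports the rounded decimal 43.33, while the Bllossom response keeps the exact fraction 130/3.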
+## 5. License
+This code repository and the model weights are licensed under the MIT License.
+The DeepSeek-Bllossom series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
+- **DeepSeek-R1-Distill-Llama-70B** is derived from Llama3.3-70B-Instruct and is originally licensed under llama3.3 license.
+- **DeepSeek-llama3.3-Bllossom-70B** is derived from DeepSeek-R1-Distill-Llama-70B and is originally licensed under llama3.3 license.
+## 6. Contributor
+- **UNIVA AI Team** ([UNIVA](https://univa.co.kr), Main contributor)
+- 최창수 (Seoul National University of Science and Technology, [MLP Lab](https://sites.google.com/view/aailab), master's student)
+- **임경태** (KAIST, [MLP Lab](https://sites.google.com/view/aailab), professor)
+## 7. Contact
+If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]) or [[email protected]]([email protected]).