Update README.md
        README.md
    CHANGED
    
    | @@ -1,52 +1,53 @@ | |
| 1 | 
            -
             | 
| 2 | 
            -
             | 
| 3 | 
            -
             | 
| 4 |  | 
| 5 | 
            -
             | 
| 6 | 
            -
              <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
         | 
| 7 | 
            -
            </div>
         | 
| 8 | 
            -
            <hr>
         | 
| 9 | 
            -
            <div align="center" style="line-height: 1;">
         | 
| 10 | 
            -
              <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
         | 
| 11 | 
            -
                <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
         | 
| 12 | 
            -
              </a>
         | 
| 13 | 
            -
              <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
         | 
| 14 | 
            -
                <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
         | 
| 15 | 
            -
              </a>
         | 
| 16 | 
            -
              <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
         | 
| 17 | 
            -
                <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
         | 
| 18 | 
            -
              </a>
         | 
| 19 | 
            -
            </div>
         | 
| 20 |  | 
| 21 | 
            -
            <div align="center" style="line-height: 1;">
         | 
| 22 | 
            -
              <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
         | 
| 23 | 
            -
                <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
         | 
| 24 | 
            -
              </a>
         | 
| 25 | 
            -
              <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
         | 
| 26 | 
            -
                <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
         | 
| 27 | 
            -
              </a>
         | 
| 28 | 
            -
              <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
         | 
| 29 | 
            -
                <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
         | 
| 30 | 
            -
              </a>
         | 
| 31 | 
            -
            </div>
         | 
| 32 |  | 
| 33 | 
            -
             | 
| 34 | 
            -
             | 
| 35 | 
            -
                <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
         | 
| 36 | 
            -
              </a>
         | 
| 37 | 
            -
              <a href="https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL" style="margin: 2px;">
         | 
| 38 | 
            -
                <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
         | 
| 39 | 
            -
              </a>
         | 
| 40 | 
            -
            </div>
         | 
| 41 |  | 
| 42 |  | 
| 43 | 
            -
             | 
| 44 | 
            -
             | 
| 45 | 
            -
             | 
| 46 |  | 
|  | |
| 47 |  | 
| 48 | 
            -
             | 
| 49 |  | 
| 50 | 
             
            We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. 
         | 
| 51 | 
             
            To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. 
         | 
| 52 | 
             
            Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. 
         | 
| @@ -55,9 +56,6 @@ Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source | |
| 55 | 
             
            Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
         | 
| 56 | 
             
            In addition, its training process is remarkably stable. 
         | 
| 57 | 
             
            Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. 
         | 
| 58 | 
            -
            <p align="center">
         | 
| 59 | 
            -
              <img width="80%" src="figures/benchmark.png">
         | 
| 60 | 
            -
            </p>
         | 
| 61 |  | 
| 62 | 
             
            ## 2. Model Summary
         | 
| 63 |  | 
|  | |
| 1 | 
            +
            ---
         | 
| 2 | 
            +
            base_model: deepseek-ai/DeepSeek-V3
         | 
| 3 | 
            +
            language:
         | 
| 4 | 
            +
            - en
         | 
| 5 | 
            +
            library_name: transformers
         | 
| 6 | 
            +
            license: mit
         | 
| 7 | 
            +
            tags:
         | 
| 8 | 
            +
            - deepseek_v3
         | 
| 9 | 
            +
            - deepseek
         | 
| 10 | 
            +
            - unsloth
         | 
| 11 | 
            +
            - transformers
         | 
| 12 | 
            +
            ---
         | 
| 13 |  | 
| 14 | 
            +
            ## ***See [our collection](https://huggingface.co/collections/unsloth/deepseek-v3-all-versions-677cf5cfd7df8b7815fc723c) for versions of DeepSeek-V3, including GGUF, bf16, and original formats.***
         | 
| 15 |  | 
| 16 |  | 
| 17 | 
            +
            # Finetune Llama 3.3, Gemma 2, and Mistral 2-5x faster with 70% less memory via Unsloth!
         | 
| 18 | 
            +
            We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb
         | 
| 19 |  | 
| 20 | 
            +
            [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/unsloth)
         | 
| 21 | 
            +
            [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
         | 
| 22 |  | 
| 23 | 
            +
            # unsloth/DeepSeek-V3-GGUF
         | 
| 24 | 
            +
            For more details on the model, see DeepSeek's original [model card](https://huggingface.co/deepseek-ai/DeepSeek-V3).
         | 
| 25 | 
            +
             | 
| 26 | 
            +
            ## ✨ Finetune for Free
         | 
| 27 | 
            +
             | 
| 28 | 
            +
            All notebooks are **beginner-friendly**! Add your dataset, click "Run All", and you'll get a finetuned model, trained 2x faster, which can be exported to GGUF or vLLM, or uploaded to Hugging Face.
         | 
| 29 | 
            +
             | 
| 30 | 
            +
            | Unsloth supports          |    Free Notebooks                                                                                           | Performance | Memory use |
         | 
| 31 | 
            +
            |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
         | 
| 32 | 
            +
            | **Llama-3.2 (3B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb)               | 2.4x faster | 58% less |
         | 
| 33 | 
            +
            | **Llama-3.2 (11B vision)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)               | 2x faster | 60% less |
         | 
| 34 | 
            +
            | **Qwen2 VL (7B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb)               | 1.8x faster | 60% less |
         | 
| 35 | 
            +
            | **Qwen2.5 (7B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb)               | 2x faster | 60% less |
         | 
| 36 | 
            +
            | **Llama-3.1 (8B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb)               | 2.4x faster | 58% less |
         | 
| 37 | 
            +
            | **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb)               | 2x faster | 50% less |
         | 
| 38 | 
            +
            | **Gemma 2 (9B)**      | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb)               | 2.4x faster | 58% less |
         | 
| 39 | 
            +
            | **Mistral (7B)**    | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb)               | 2.2x faster | 62% less |
         | 
| 40 |  | 
| 41 | 
            +
            [<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="200"/>](https://docs.unsloth.ai)
         | 
| 42 |  | 
| 43 | 
            +
            - This [Llama 3.2 conversational notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) is useful for ShareGPT ChatML / Vicuna templates.
         | 
| 44 | 
            +
            - This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
         | 
| 45 | 
            +
            - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
         | 
| 46 |  | 
| 47 | 
            +
            ## Special Thanks
         | 
| 48 | 
            +
            A huge thank you to the DeepSeek team for creating and releasing these models.
         | 
| 49 | 
            +
             | 
| 50 | 
            +
            ## Model Information
         | 
| 51 | 
             
            We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. 
         | 
| 52 | 
             
            To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. 
         | 
| 53 | 
             
            Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. 
         | 
|  | |
| 56 | 
             
            Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
         | 
| 57 | 
             
            In addition, its training process is remarkably stable. 
         | 
| 58 | 
             
            Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. 
         | 
| 59 |  | 
| 60 | 
             
            ## 2. Model Summary
         | 
| 61 |  | 

