Fix typos (#1)

- Fix typos (b09eb4e62489bd6ab99efe37c5de5a191d297248)
- Update README.md (a86fb434f954626112b1d4aef73dcb3be46b77e1)

Co-authored-by: Niels Rogge <[email protected]>

README.md CHANGED
@@ -10,14 +10,15 @@ base_model:
 pipeline_tag: audio-to-audio
 ---
 
-# Model Card for 
-
+# Model Card for SLAM
+
+This is a Speech Language Model trained for generating speech continuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).
 
 
 ## Model Details
 
 ### Model Description
-This is a Speech 
+This is a Speech Language Model, introduced in "[_Slamming_: Training a Speech Language Model on One GPU in a Day](https://arxiv.org/abs/2502.15814)", focusing on efficient training.
 It was fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500 speech tokens extracted from
 the 11-th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). For a stronger version of the model trained with
 slightly more compute - 2*A100 for 2 days, see [slam_scaled](https://huggingface.co/slprl/slam_scaled).
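The 500-token vocabulary mentioned above is produced by clustering HuBERT features: each 25Hz frame from layer 11 of mhubert-25hz is mapped to the nearest of 500 k-means centroids. Below is a conceptual sketch of that assignment step; the function and tensor names are illustrative placeholders, not slamkit's actual API.

```python
# Conceptual sketch of the speech tokenization described above: each 25Hz
# frame embedding from layer 11 of mhubert-25hz is assigned to the nearest
# of 500 k-means centroids. Names are placeholders, not slamkit's API.
import torch

def frames_to_units(hidden_states: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """hidden_states: [T, D] layer-11 HuBERT features; centroids: [500, D] k-means codebook."""
    dists = torch.cdist(hidden_states, centroids)  # [T, 500] pairwise L2 distances
    return dists.argmin(dim=-1)                    # [T] unit IDs in [0, 500)
```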
@@ -35,10 +36,10 @@ The model was trained by next-token prediction over a subset of LibriSpeech, Lib
 
 - **Repository:** [https://github.com/slp-rl/slamkit](https://github.com/slp-rl/slamkit)
 - **Paper:** [https://arxiv.org/abs/2502.15814](https://arxiv.org/abs/2502.15814)
-- **Demo:** [
+- **Demo:** [https://pages.cs.huji.ac.il/adiyoss-lab/slamming/](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/)
 
 ## Uses
-This is a base SpeechLM and as such can be used to generate 
+This is a base SpeechLM and as such can be used to generate continuations for speech segments, or as base for further tuning. See the _SlamKit_
 [codebase](https://github.com/slp-rl/slamkit) for more details on usage, and checkout the [demo page](https://pages.cs.huji.ac.il/adiyoss-lab/slamming/) for some generation examples
 
 ### Out-of-Scope Use
@@ -47,7 +48,7 @@ This model was trained on curated speech datasets which contain mainly audio-boo
 
 
 ## How to Get Started with the Model
-We refer users to the official repository for full usage 
+We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slamkit).
 
 
 ## Training Details
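For a feel of what getting started looks like before heading to the repository, here is a minimal sketch, assuming the checkpoint loads as a standard 🤗transformers causal LM (the Software section below suggests the codebase builds on transformers). The prompt unit IDs are placeholders; real audio-to-audio usage (HuBERT tokenization and vocoding) goes through slamkit.

```python
# Minimal sketch, assuming slprl/slam loads as a standard causal LM over
# 500 speech units. The prompt IDs below are placeholders; in practice they
# come from slamkit's speech tokenizer, and a vocoder maps the output back
# to audio.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("slprl/slam")
model.eval()

prompt_units = torch.tensor([[17, 342, 8, 499, 23, 101]])  # placeholder unit IDs
with torch.no_grad():
    out = model.generate(prompt_units, max_new_tokens=64, do_sample=True, top_k=50)
print(out[0].tolist())  # continuation as speech-unit IDs
```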
@@ -61,7 +62,7 @@ This model was trained on a subset of [LibriSpeech](https://huggingface.co/datas
 dataset [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 
 ### Training Procedure
-This model was trained by next token prediction over several 
+This model was trained by next token prediction over several datasets, and then trained with DPO over [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).
 Please refer to the [paper]() or [code](https://github.com/slp-rl/slamkit) for the full training recipes.
 
 #### Preprocessing
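The procedure above pairs next-token pretraining with a DPO stage over SpokenSwag. For reference, the standard DPO objective on sequence log-probabilities looks like the sketch below; this is illustrative, and the exact recipe used here is in the slamkit code.

```python
# The standard DPO objective (illustrative; see the slamkit code for the
# recipe actually used): prefer the "chosen" continuation over the
# "rejected" one, measured relative to a frozen reference model.
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each input: summed log-probability of a full speech-token sequence."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```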
@@ -93,7 +94,7 @@ This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores an
 
 #### Software
 The model was trained using the [*SlamKit*](https://github.com/slp-rl/slamkit) codebase which builds upon 🤗transformers extending it to support
-easy and 
+easy and efficient training of Speech Language Models.
 
 ## Citation
 