gallilmaimon committed · Commit f932655 (verified) · 1 Parent(s): 08e7e0e

Update README.md

Files changed (1):
  1. README.md +13 -54
README.md CHANGED
@@ -12,14 +12,17 @@ base_model:
  This is a Speech Language Model trained for generating audio continuations over discrete [Hubert tokens](https://huggingface.co/slprl/mhubert-base-25hz).


-
  ## Model Details

  ### Model Description
+ This is a Speech Language Model, fine-tuned from [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) over a vocabulary of 500
+ speech tokens extracted from the 11th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). It was trained as part of
+ ["*Slamming*: Training a Speech Language Model on One GPU in a Day"], focusing on efficient training. For a stronger model trained with
+ slightly more compute (2*A100 for 2 days), see [slam_scaled](https://huggingface.co/slprl/slam).

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+ The model was trained by next-token prediction over a subset of LibriSpeech, Libri-Light and the synthetic dataset
+ [sTinyStories](https://huggingface.co/datasets/slprl/sTinyStories). It was then trained with DPO over
+ [SpokenSwag](https://huggingface.co/datasets/slprl/SpokenSwag).

  - **Developed by:** [SLP-RL](https://huggingface.co/slprl)
  - **Model type:** SpeechLM
@@ -45,25 +48,14 @@ This is a base SpeechLM and as such can be used to generate continuations for spe
  This model was trained on curated speech datasets which contain mainly audio-books and stories; as such, the outputs should not be treated as factual in any way.


- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model
-
  We refer users to the official repository for full usage explanations - [github](https://github.com/slp-rl/slam).


  ## Training Details
- We highly encourage users to read the full [paper](), for full training details.
+ We highly encourage users to read the full [paper]() for full training details; a brief overview is provided below.
+

  ### Training Data
  This model was trained on a subset of [LibriSpeech] train, [Libri-Light]() and the synthetic dataset
@@ -84,42 +76,11 @@ We encourage you to explore the official repository for full details - [github](

  - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]

  ## Evaluation
+ The paper provides full results; we give some results here and also refer to the [demo page]() to listen to some samples.

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
+ **ADD Table**

@@ -134,12 +95,10 @@ This model was trained as part of ["*Slamming*: Training a Speech Language Model
  This model was trained using **only a single Nvidia A5000 GPU**, 16 CPU cores and 24 GB of RAM for **24 hours**.

  #### Software
- The model was trained using the [*Slam*](https://github.com/slp-rl/slam) codebase which builds upon transformers extending it to support easy and efficent training of
- Speech Language Models.
+ The model was trained using the [*Slam*](https://github.com/slp-rl/slam) codebase, which builds upon 🤗transformers and extends it to support
+ easy and efficient training of Speech Language Models.

  ## Citation

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
  **BibTeX:**
  Soon!
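
The updated description says the model operates over 500 discrete speech tokens taken from the 11th layer of [mhubert-25hz](https://huggingface.co/slprl/mhubert-base-25hz). As a rough, unofficial sketch of that front end (it assumes the checkpoint loads through the standard 🤗 transformers HuBERT classes, and the 500-unit quantizer below is only a named placeholder; the official [slam](https://github.com/slp-rl/slam) repo ships the real tokenization code):

```python
# Unofficial sketch: continuous features from layer 11 of mhubert-25hz,
# which would then be quantized into the 500-unit vocabulary.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, HubertModel

hubert_id = "slprl/mhubert-base-25hz"  # assumes a transformers-compatible checkpoint
feature_extractor = AutoFeatureExtractor.from_pretrained(hubert_id)
hubert = HubertModel.from_pretrained(hubert_id).eval()

wav, sr = torchaudio.load("prompt.wav")
wav = torchaudio.functional.resample(wav, sr, 16_000).mean(dim=0)  # mono, 16 kHz

inputs = feature_extractor(wav.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    out = hubert(**inputs, output_hidden_states=True)
layer11 = out.hidden_states[11]  # (1, frames, hidden) features at roughly 25 Hz

# Hypothetical quantization step: map each frame to its nearest of 500 centroids.
# `load_quantizer` and "kmeans_500.bin" are placeholder names, not a real API.
# quantizer = load_quantizer("kmeans_500.bin")
# units = quantizer(layer11[0])  # sequence of ints in [0, 500)
```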
 
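The "How to Get Started with the Model" section defers to the official repository for usage. Purely as an illustration of the general causal-LM pattern for unit continuation (the checkpoint id `slprl/slam`, the availability of a tokenizer for the unit vocabulary, and the `<unit_k>` token convention are all assumptions here, not a documented interface), a minimal sketch might look like:

```python
# Assumption-laden sketch of generating a unit continuation with the standard
# 🤗 transformers causal-LM API; consult the slam repo for the supported usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "slprl/slam"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

# A prompt of discrete HuBERT unit ids (25 Hz frames, vocabulary of 500 units),
# written with a made-up "<unit_k>" token convention purely for illustration.
prompt_units = "".join(f"<unit_{u}>" for u in [17, 402, 88, 88, 231])
inputs = tokenizer(prompt_units, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, do_sample=True, top_p=0.95, max_new_tokens=100)

# The generated unit ids still have to be vocoded back to a waveform with the
# unit-based vocoder used by the slam codebase (not shown here).
print(tokenizer.decode(generated[0], skip_special_tokens=False))
```

After decoding, the unit sequence must still pass through a unit-to-speech vocoder to become audio; the slam repository documents the end-to-end pipeline.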