Update README.md
Speechless is a compact, open-source text-to-semantics model (1B parameters) designed to generate semantic representations of audio directly as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Unlike traditional pipelines that rely on generating and processing audio (TTS → ASR), Speechless eliminates this complexity by converting text directly into semantic speech tokens, simplifying training, saving resources, and enabling scalability, especially for low-resource languages.

Trained on ~400 hours of English and ~1000 hours of Vietnamese data, Speechless is a core component of the Ichigo v0.5 family.

For more details, check out our official [blog post]().

You can use the example code below to load the model.

```python
import torch
from transformers import pipeline

model_id = "homebrewltd/Speechless-llama3.2-v0.1"

# Load the model through the standard text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# The prompt is prefixed with <|reserved_special_token_69|>; the model then
# appends discrete sound/duration tokens for the given text
pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")

>>> [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
```
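
The model's output is a flat string of `<|sound_…|>` and `<|duration_…|>` tokens appended after the prompt. As a minimal sketch (an illustration only, not part of the model's official API), the numeric sound-token IDs can be recovered from `generated_text` with a regular expression:

```python
import re

# Unofficial helper (an assumption, not shipped with the model): pull the
# numeric IDs out of a generated string like the example output above.
def extract_sound_ids(generated_text: str) -> list[int]:
    return [int(token_id) for token_id in re.findall(r"<\|sound_(\d+)\|>", generated_text)]

example = "<|sound_1968|><|sound_0464|><|duration_02|><|sound_0634|>"
print(extract_sound_ids(example))  # -> [1968, 464, 634]
```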

| **Parameter**              | **Value**               |
|----------------------------|-------------------------|
| **Epochs**                 | 2                       |
| **Global Batch Size**      | 144                     |
| **Learning Rate**          | 3e-4                    |
| **Learning Scheduler**     | Cosine                  |
| **Optimizer**              | AdamW                   |
| **Warmup Ratio**           | 0.05                    |
| **Weight Decay**           | 0.01                    |
| **Max Sequence Length**    | 512                     |
| **Clip Grad Norm**         | 1.0                     |
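
For reference, here is a rough sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The actual training script and hardware setup are not specified here; the per-device batch size and gradient-accumulation split below are illustrative assumptions that simply multiply out to the global batch size of 144.

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above onto TrainingArguments.
# Only the values from the table are taken as given; the device/accumulation
# split and the output path are assumptions for the sketch.
training_args = TrainingArguments(
    output_dir="./speechless-checkpoints",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=18,         # e.g. 18 x 8 accumulation steps = 144
    gradient_accumulation_steps=8,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=1.0,                      # clip grad norm
)
# The max sequence length (512) is enforced on the tokenizer/data side,
# not through TrainingArguments.
```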
## Evaluation