Update README.md
Speechless is a compact, open-source text-to-semantics model (1B parameters) designed to generate semantic representations of audio directly as discrete tokens, bypassing the need for a text-to-speech (TTS) model. Unlike traditional pipelines that rely on generating and processing audio (TTS → ASR), Speechless eliminates this complexity by converting text directly into semantic speech tokens, simplifying training, saving resources, and enabling scalability, especially for low-resource languages.

Trained on ~400 hours of English and ~1000 hours of Vietnamese data, Speechless is a core component of the Ichigo v0.5 family.

For more details, check out our official [blog post]().

You can use the example code below to load the model.

```python
import torch
from transformers import pipeline

model_id = "homebrewltd/Speechless-llama3.2-v0.1"

# Load the model through the standard text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# The prompt is prefixed with <|reserved_special_token_69|>; the model then
# appends discrete sound/duration tokens for the given text
pipe("<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research")

>>> [{'generated_text': '<|reserved_special_token_69|>I’m Speechless – A Model Developed by Homebrew Research.assistant\n\n<|sound_1968|><|sound_0464|><|sound_0642|><|duration_02|><|sound_0634|><|sound_0105|><|duration_02|><|sound_1745|><|duration_02|><|sound_1345|><|sound_0210|><|sound_1312|><|sound_1312|>'}]
```
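
The model's output is a flat string of `<|sound_…|>` and `<|duration_…|>` tokens appended after the prompt. As a minimal sketch (an illustration only, not part of the model's official API), the numeric sound-token IDs can be recovered from `generated_text` with a regular expression:

```python
import re

# Unofficial helper (an assumption, not shipped with the model): pull the
# numeric IDs out of a generated string like the example output above.
def extract_sound_ids(generated_text: str) -> list[int]:
    return [int(token_id) for token_id in re.findall(r"<\|sound_(\d+)\|>", generated_text)]

example = "<|sound_1968|><|sound_0464|><|duration_02|><|sound_0634|>"
print(extract_sound_ids(example))  # -> [1968, 464, 634]
```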

| **Parameter**              | **Value**               |
|----------------------------|-------------------------|
| **Epochs**                 | 2                       |
| **Global Batch Size**      | 144                     |
| **Learning Rate**          | 3e-4                    |
| **Learning Scheduler**     | Cosine                  |
| **Optimizer**              | AdamW                   |
| **Warmup Ratio**           | 0.05                    |
| **Weight Decay**           | 0.01                    |
| **Max Sequence Length**    | 512                     |
| **Clip Grad Norm**         | 1.0                     |
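
For reference, here is a rough sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`. The actual training script and hardware setup are not specified here; the per-device batch size and gradient-accumulation split below are illustrative assumptions that simply multiply out to the global batch size of 144.

```python
from transformers import TrainingArguments

# Illustrative mapping of the table above onto TrainingArguments.
# Only the values from the table are taken as given; the device/accumulation
# split and the output path are assumptions for the sketch.
training_args = TrainingArguments(
    output_dir="./speechless-checkpoints",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=18,         # e.g. 18 x 8 accumulation steps = 144
    gradient_accumulation_steps=8,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=1.0,                      # clip grad norm
)
# The max sequence length (512) is enforced on the tokenizer/data side,
# not through TrainingArguments.
```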
## Evaluation