---
license: mit
---

This is a streamlined-interface version of [WavTokenizer-large-speech-75token](https://huggingface.co/novateur/WavTokenizer-large-speech-75token/tree/main) that exposes the model as separate encoder and decoder components.

- Model size reduced from 1.75 GB to ~330 MB by keeping only the components needed for inference
- Split interface (82 MB encoder, 248 MB decoder)

The model is split into:

- `encoder/`: Handles audio encoding
- `decoder/`: Handles decoding and synthesis
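
Because the two components live in separate folders, it should be possible to fetch only the one you need. The sketch below is a minimal illustration and not part of this repository: `patterns_for` and `download_component` are hypothetical helper names, the repo id is a placeholder you must fill in, and it assumes the `huggingface_hub` package is installed. It uses `snapshot_download` with `allow_patterns` to restrict the download to a single folder.

```python
# Folder names taken from the layout described above.
COMPONENTS = ("encoder", "decoder")


def patterns_for(component: str) -> list[str]:
    """Return the glob pattern list that selects one component folder."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component {component!r}; expected one of {COMPONENTS}")
    return [f"{component}/*"]


def download_component(repo_id: str, component: str, local_dir: str) -> str:
    """Download only one component folder and return the local snapshot path.

    Hypothetical helper: `repo_id` is whatever id this repository has on the Hub.
    """
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns_for(component),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Placeholder repo id: replace with the actual id of this repository.
    path = download_component("<namespace>/<repo-name>", "encoder", "./wavtokenizer-encoder")
    print(path)
```

Downloading only `encoder/` this way would transfer roughly 82 MB instead of the full ~330 MB, which may be useful when a pipeline only needs to tokenize audio.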