# SNAC-Vocos
A trainer for [SNAC](https://github.com/hubertsiuzdak/snac) (Multi-Scale Neural Audio Codec) in which the decoder is replaced with [Vocos](https://github.com/gemelo-ai/vocos).

## Installation
Python 3.9 or later is recommended.  
Clone the repository:
```bash
git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos
```
Install packages:
```bash
pip install -r requirements.txt
```
## Inference
See [infer.py](./infer.py) for inference instructions and usage examples.

## Available Models
| Model name | Hugging Face | Corpus | Domain |
|:------------|:--------|:--------|:--------|
|snac_vocos_16khz_hop200_scale8421_1kh | [🤗](https://huggingface.co/hertz-pj/snac-vocos) | 1k hours | Speech (Mandarin/English) |
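
The model name appears to encode the codec's time resolution. Assuming `hop200` means one finest-level frame per 200 samples at 16 kHz, and `scale8421` means four codebooks with temporal strides 8, 4, 2 and 1 (as with SNAC's `vq_strides`), the per-codebook token rates can be worked out as below. This is a sketch under those assumptions; the helper `codebook_token_rates` is hypothetical and not part of this repository.

```python
# Hypothetical helper, not part of SNAC-Vocos. Assumes "hop200" = 200 samples
# per finest-level frame and "scale8421" = per-codebook temporal strides.
from typing import List


def codebook_token_rates(sample_rate: int, hop_length: int, strides: List[int]) -> List[float]:
    """Return tokens per second for each codebook.

    The finest frame rate is sample_rate / hop_length; a codebook with
    stride s emits one token every s frames, so its rate is divided by s.
    """
    frame_rate = sample_rate / hop_length
    return [frame_rate / s for s in strides]


if __name__ == "__main__":
    rates = codebook_token_rates(sample_rate=16000, hop_length=200, strides=[8, 4, 2, 1])
    print(rates)       # [10.0, 20.0, 40.0, 80.0]
    print(sum(rates))  # 150.0
```

Under these assumptions, one second of audio maps to 10 + 20 + 40 + 80 = 150 tokens across the four codebooks; the bitrate also depends on the codebook size, which the table does not state.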


## Training
1. Prepare a filelist of audio files for the training and validation sets, e.g. [train.list](./data/train.list).
2. Fill in a config file, e.g. [snac_vocos.yaml](./config/snac_vocos_nq4_scale8421_16khz.yaml). The main parameters to pay attention to are `batch_size`, `filelist_path`, `save_dir`, and `device`.
3. Start training:
```bash
python train.py fit --config ./configs/snac_vocos.yaml
```
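
The filelist in step 1 is assumed here to be a plain-text file with one audio path per line, as in Vocos; compare with [train.list](./data/train.list) before relying on this. A minimal stdlib-only sketch that builds train and validation lists from a directory of audio files might look like the following. The function names, the `.wav`/`.flac` extensions, the 1% validation split, and the `val.list` file name are illustrative assumptions, not part of this repository.

```python
# Sketch: build train/validation filelists with one audio path per line.
# Assumed format (as in Vocos); check data/train.list in this repo before use.
import random
from pathlib import Path
from typing import Iterable, List, Tuple

AUDIO_EXTS = {".wav", ".flac"}  # illustrative; extend as needed


def collect_audio(root: str, exts: Iterable[str] = AUDIO_EXTS) -> List[str]:
    """Recursively collect audio file paths under `root`, sorted for reproducibility."""
    wanted = {e.lower() for e in exts}
    return sorted(
        str(p.resolve())
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def write_filelists(
    root: str, train_path: str, val_path: str, val_ratio: float = 0.01, seed: int = 0
) -> Tuple[int, int]:
    """Shuffle the collected paths, split them, write both lists, and return their sizes."""
    paths = collect_audio(root)
    if not paths:
        raise ValueError(f"no audio files found under {root}")
    random.Random(seed).shuffle(paths)  # fixed seed keeps the split reproducible
    n_val = max(1, int(len(paths) * val_ratio))
    val, train = paths[:n_val], paths[n_val:]
    for out_path, items in ((train_path, train), (val_path, val)):
        Path(out_path).parent.mkdir(parents=True, exist_ok=True)
        Path(out_path).write_text("".join(p + "\n" for p in items), encoding="utf-8")
    return len(train), len(val)
```

For example, `write_filelists("/path/to/audio", "data/train.list", "data/val.list")` would write both lists; the resulting paths then go into `filelist_path` in the config.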

## TODO
- [x] Release code
- [x] Release a checkpoint trained on 1k hours of speech (Mandarin/English).
- [ ] Demo page.


## Acknowledgements
This implementation uses parts of the code from the following GitHub repositories:
- [SNAC](https://github.com/hubertsiuzdak/snac)
- [WavTokenizer](https://github.com/jishengpeng/WavTokenizer/)