---
license: apache-2.0
base_model:
- microsoft/wavlm-large
pipeline_tag: audio-to-audio
library_name: torch
datasets:
- mythicinfinity/libritts
---

# ⚡ FocalCodec

A low-bitrate single-codebook 16 kHz speech codec based on [focal modulation](https://arxiv.org/abs/2203.11926).

This repository contains the **12.5 Hz checkpoint** trained on **LibriTTS 960**, as described in the preprint.
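
To make the numbers above concrete, here is a minimal back-of-the-envelope sketch in plain Python. It is not part of the FocalCodec API. The 16 kHz sample rate and the 12.5 Hz token rate come from the description above; the helper names are hypothetical, the rounding in `num_tokens` is an assumption (the real codec may pad or truncate differently), and the codebook size passed to `bitrate_bps` is a placeholder assumption, since this card does not state it.

```python
import math

SAMPLE_RATE_HZ = 16_000  # input sample rate, from the description above
TOKEN_RATE_HZ = 12.5     # token rate of this checkpoint, from the description above


def samples_per_token(sample_rate_hz: float = SAMPLE_RATE_HZ,
                      token_rate_hz: float = TOKEN_RATE_HZ) -> float:
    """Number of input audio samples summarized by one token."""
    return sample_rate_hz / token_rate_hz


def num_tokens(duration_s: float, token_rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Approximate token count for a clip (assumes rounding up; the real codec may differ)."""
    return math.ceil(duration_s * token_rate_hz)


def bitrate_bps(codebook_size: int, token_rate_hz: float = TOKEN_RATE_HZ) -> float:
    """Bitrate of a single-codebook codec: tokens per second times bits per token."""
    return token_rate_hz * math.log2(codebook_size)


if __name__ == "__main__":
    print(samples_per_token())  # samples covered by each token
    print(num_tokens(10.0))     # tokens for a 10-second clip
    # ASSUMPTION: 8192 is a placeholder codebook size, not taken from this card.
    print(bitrate_bps(8192))
```

Because a single codebook is used, the bitrate depends only on the token rate and the number of bits needed to index one codebook entry, so swapping in the actual codebook size from the preprint gives the checkpoint's real bitrate.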

- **Preprint**: https://arxiv.org/abs/2502.04465

- **Project Page**: https://lucadellalib.github.io/focalcodec-web/

- 💾 **GitHub**: https://github.com/lucadellalib/focalcodec

<img src="focalcodec.png" width="700">

---------------------------------------------------------------------------------------------------------

## ▶️ Quickstart

See the README at: https://github.com/lucadellalib/focalcodec

---------------------------------------------------------------------------------------------------------

## @ Citing

```
@article{dellalibera2025focalcodec,
    title   = {{FocalCodec}: Low-Bitrate Speech Coding via Focal Modulation Networks},
    author  = {Luca {Della Libera} and Francesco Paissan and Cem Subakan and Mirco Ravanelli},
    journal = {arXiv preprint arXiv:2502.04465},
    year    = {2025},
}
```

---------------------------------------------------------------------------------------------------------

## 📧 Contact

[[email protected]](mailto:[email protected])

---------------------------------------------------------------------------------------------------------