---
license: cc-by-4.0
extra_gated_prompt: "You agree to abide by the terms of the CC-BY 4.0 license and provide accurate information about your intended use of this model."
extra_gated_fields:
  Full Name: text
  Organization (if applicable): text
  Country: country
  IP Location: ip_location
  Intended Use Case:
    type: select
    options:
      - Research
      - Education
      - Accessibility (e.g., assistive technology)
      - Creative Projects (e.g., audiobooks, podcasts)
      - Commercial
      - label: Other (please specify)
        value: other
  Please describe how you intend to use this model: text
  Do you plan to use this model for commercial purposes?:
    type: select
    options:
      - Yes
      - No
  I agree to comply with the terms of the CC-BY license when using this model: checkbox
  I confirm that the information provided above is complete and accurate: checkbox
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/Q0I-vdI_e3Kbwjw2IYJtQ.png" width="500">

⚠️ **WORK IN PROGRESS**: This model is still in early training stages. Current checkpoints produce low-quality, garbled speech. Updated checkpoints will be released as training progresses.

# OpenF5-TTS

A commercial-friendly version of F5-TTS retrained from scratch on permissively licensed data.

Trained on the [Emilia-YODAS (CC-BY)](https://huggingface.co/datasets/amphion/Emilia-Dataset) dataset using the F5-TTS Small configuration.

[**GitHub Repository + Details**](https://github.com/fakerybakery/OpenF5-TTS)

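## Usage

The model needs its own model config, checkpoint, and vocabulary files, so pass all three to the F5-TTS CLI:

```bash
# Install the F5-TTS package, which provides the f5-tts_infer-cli command
pip install f5-tts

# Download the model config, checkpoint, and vocab into ./openf5
huggingface-cli download mrfakename/OpenF5-Intermediate --local-dir openf5

# Run inference with the downloaded config (-mc), checkpoint (-p), and vocab (-v)
f5-tts_infer-cli -mc openf5/model_config.yaml -p openf5/model_last.pt -v openf5/vocab.txt
```
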
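Because the inference command depends on three files being present in the download directory, a small guard can catch an incomplete download before inference starts. The following is only a sketch, not part of the repository: the function names `missing_openf5_files` and `run_openf5` are hypothetical, and the file names are taken from the command above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the name of each required file that is
# missing from the given directory, one per line.
missing_openf5_files() {
  local dir="$1" f
  for f in model_config.yaml model_last.pt vocab.txt; do
    [ -f "$dir/$f" ] || printf '%s\n' "$f"
  done
}

# Hypothetical wrapper: run inference only when all three files exist.
# The directory defaults to "openf5", matching the download step above.
run_openf5() {
  local dir="${1:-openf5}" missing
  missing="$(missing_openf5_files "$dir")"
  if [ -n "$missing" ]; then
    printf 'Missing in %s:\n%s\n' "$dir" "$missing" >&2
    return 1
  fi
  f5-tts_infer-cli -mc "$dir/model_config.yaml" -p "$dir/model_last.pt" -v "$dir/vocab.txt"
}
```

After sourcing these functions, calling `run_openf5` (or `run_openf5 path/to/dir`) either starts inference or lists the missing files on stderr and returns a non-zero status.
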
## Training Progress

Audio samples from successive training checkpoints, newest first. All samples use this text:

> I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring.

Reference audio:

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/Yfkcl6KS75B4b04n6vtz1.wav"></audio>

### ~600K steps (Current Checkpoint)

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/WKWAAOfI2mXqMUC9jLnSK.wav"></audio>

### ~550K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/5zsJuaYFJuXjKITESE2N8.wav"></audio>

### ~500K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/MfSfl-EOE3e7s-AHcVnvT.wav"></audio>

### ~450K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/h9DiqU70Oe-3lW6JfyX7X.wav"></audio>

### ~400K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/QsWOwiU2NJ9UrvHLc_szd.wav"></audio>

### ~350K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/OLC6g606zOBYUQ_ZaG1Db.wav"></audio>

### ~300K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/7kGKvov7tn1SbpJy-AV4r.wav"></audio>

### ~250K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/4CwJjM8m6D5evRln4URq2.wav"></audio>

### ~200K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/RDg4pzNKjxyxf7WCv52nH.wav"></audio>

### ~150K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/dW2gp2Z-Lh4Fr6TbT4Jq2.wav"></audio>

(Starting to hear traces of the expected text!)

### ~100K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/ECrM4SY9suLl26vubvCnk.wav"></audio>

### ~75K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/_KmZwBY9uW-toxBRylkkK.wav"></audio>

### ~50K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/S7U66DLsnp53wegwa-XIX.wav"></audio>

### ~25K steps

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/62e54f0eae9d3f10acb95cb9/ukM1_y7DdgI6Vnth7a35E.wav"></audio>

## Support

Discussions are disabled until training is complete. For issues, please open a GitHub Issue in the [repository](https://github.com/fakerybakery/OpenF5-TTS).

## License

- **Model**: [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) - Free for commercial use
- **Scripts**: MIT License

**Note:** No restrictions are placed on usage of the outputs of the model. While attribution is appreciated, it is not required for outputs of the model.

THE MODEL IS PROVIDED “AS IS” UNDER ITS OPEN LICENSE. THE AUTHORS AND CONTRIBUTORS DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. USERS ARE SOLELY RESPONSIBLE FOR ENSURING COMPLIANCE WITH APPLICABLE COPYRIGHT LAWS, INCLUDING THE USE OF INPUT DATA AND GENERATED OUTPUTS. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES, INCLUDING BUT NOT LIMITED TO DAMAGES RESULTING FROM LOSS OF USE, DATA, OR PROFITS, OR ANY CLAIMS RELATED TO THE MODEL’S OUTPUTS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE, OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.