g2pW-canto-20241206-bert-base

This is a G2P (Grapheme-to-Phoneme) model trained on the Naozumi0512/g2p-Cantonese-aggregate-pos-retag dataset and evaluated on the yue-g2p-benchmark.

Model Overview

The model is built on hon9kon9ize/bert-base-cantonese. For more details, see https://github.com/Naozumi520/g2pW-Cantonese.

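For reference, the underlying encoder can be loaded on its own with the Hugging Face transformers library. This is a minimal sketch that loads only the pretrained Cantonese BERT backbone named above, not the fine-tuned G2P model.

```python
# Sketch: load the pretrained backbone (hon9kon9ize/bert-base-cantonese) with
# Hugging Face transformers. This is NOT the fine-tuned g2pW G2P model itself.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-base-cantonese")
model = AutoModel.from_pretrained("hon9kon9ize/bert-base-cantonese")

inputs = tokenizer("廣東話", return_tensors="pt")
outputs = model(**inputs)  # contextual hidden states from the backbone
```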

Dataset

The model was trained on the Naozumi0512/g2p-Cantonese-aggregate-pos-retag dataset, which includes:

  • 68,500 Cantonese words/phrases with corresponding phonetic transcriptions.
  • Data is formatted to align with the CPP (Chinese Polyphones with Pinyin) structure.
  • Sources include:
    • Rime Cantonese Input Schema (jyut6ping3.words.dict.yaml)
    • 粵典 Words.hk
    • CantoDict
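
As a minimal sketch, the dataset can be pulled directly with the Hugging Face datasets library. The split and field names shown in the comments are assumptions, not confirmed; inspect the first record to see the actual CPP-style columns.

```python
# Sketch: load the training dataset with the datasets library.
# Assumes the dataset loads with its default configuration and has a "train"
# split; column names are illustrative only.
from datasets import load_dataset

ds = load_dataset("Naozumi0512/g2p-Cantonese-aggregate-pos-retag")
print(ds)               # available splits and column names
print(ds["train"][0])   # one CPP-style record (character, context, Jyutping label, ...)
```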

Evaluation

The model was evaluated on the yue-g2p-benchmark:

| Metric             | Score  |
|--------------------|--------|
| Accuracy           | 0.9117 |
| Phoneme Error Rate | 0.0274 |
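
To make the second metric concrete, the sketch below computes a standard phoneme error rate: the Levenshtein edit distance between predicted and reference phoneme sequences, divided by the reference length. This is a generic definition and may differ in detail from the benchmark's own scoring script.

```python
# Generic phoneme error rate (PER): edit distance between predicted and
# reference phoneme sequences, normalised by reference length. Not guaranteed
# to match the exact scoring used by the yue-g2p-benchmark.
from typing import Sequence


def phoneme_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    # Standard edit-distance dynamic programming over phoneme tokens.
    m, n = len(reference), len(hypothesis)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution / match
            )
    return dp[m][n] / max(m, 1)


# Hypothetical example: Jyutping syllables treated as phoneme tokens.
print(phoneme_error_rate(["gwong2", "dung1", "waa2"], ["gwong2", "dung1", "waa6"]))  # 0.333...
```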

Inference

Inference code and usage instructions are available on the 20241206-bert-base branch of the g2pW-Cantonese repository: https://github.com/Naozumi520/g2pW-Cantonese/tree/20241206-bert-base
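
As a rough sketch only: the upstream g2pW project exposes a G2PWConverter class, and the Cantonese fork linked above is assumed here to offer a similar interface. Check the 20241206-bert-base branch README for the actual entry point and model paths.

```python
# Hedged sketch, assuming the fork mirrors the upstream g2pW interface.
from g2pw import G2PWConverter  # assumed import path, following upstream g2pW

converter = G2PWConverter()    # loads the model weights (path/options may differ)
print(converter("廣東話"))      # expected: Jyutping transcriptions for each character
```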
