khanhld3 committed
Commit f651420 · 1 Parent(s): d6f86ad

[test] init

Files changed (3):
  1. .DS_Store +0 -0
  2. README.md +6 -4
  3. dataset.tsv +15 -0
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
README.md CHANGED
@@ -81,12 +81,14 @@ model-index:
 ---
 <a name = "description" ></a>
 ### Model Description
-**ChunkFormer-Large-Vie** is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the innovative **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model has been fine-tuned on approximately **2000 hours** of Vietnamese speech data sourced from diverse datasets.
+**ChunkFormer-Large-Vie** is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the innovative **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model has been fine-tuned on approximately **2000 hours** of public Vietnamese speech data sourced from diverse datasets. A list of datasets can be found [**HERE**](dataset.tsv).
+
+**!!! Please note that only the train subsets were used for tuning the model.**
 
 ---
 <a name = "implementation" ></a>
 ### Documentation and Implementation
-The [documentation](#) and [implementation](#) of ChunkFormer are publicly available.
+The [Documentation](#) and [Implementation](#) of ChunkFormer are publicly available.
 
 ---
 <a name = "benchmark" ></a>
@@ -112,14 +114,14 @@ pip install -r requirements.txt
 2. **Download the Model Checkpoint from Hugging Face**
 ```bash
 git lfs install
-git clone https://huggingface.co/khanhld/chunkformer-large-vietnamese
+git clone https://huggingface.co/khanhld/chunkformer-large-vie
 ```
 This will download the model checkpoint to the checkpoints folder inside your chunkformer directory.
 
 3. **Run the model**
 ```bash
 python decode.py \
-    --model_checkpoint path/to/chunkformer-large-vietnamese \
+    --model_checkpoint path/to/chunkformer-large-vie \
     --long_form_audio path/to/long_audio.wav \
     --chunk_size 64 \
     --left_context_size 128 \
dataset.tsv ADDED
@@ -0,0 +1,15 @@
+
+data	Estimated hours	Link
+AILAB-VNUHCM/vivos	15	https://huggingface.co/datasets/AILAB-VNUHCM/vivos
+doof-ferb/vlsp2020_vinai_100h	100	https://huggingface.co/datasets/doof-ferb/vlsp2020_vinai_100h
+doof-ferb/fpt_fosd	100	https://huggingface.co/datasets/doof-ferb/fpt_fosd
+doof-ferb/infore1_25hours	25	https://huggingface.co/datasets/doof-ferb/infore1_25hours
+linhtran92/viet_bud500	500	https://huggingface.co/datasets/linhtran92/viet_bud500
+doof-ferb/LSVSC	100	https://huggingface.co/datasets/doof-ferb/LSVSC
+doof-ferb/vais1000	2	https://huggingface.co/datasets/doof-ferb/vais1000
+doof-ferb/VietMed_labeled	3	https://huggingface.co/datasets/doof-ferb/VietMed_labeled
+NhutP/VSV-1100	1100	https://huggingface.co/datasets/NhutP/VSV-1100
+doof-ferb/Speech-MASSIVE_vie	1	https://huggingface.co/datasets/doof-ferb/Speech-MASSIVE_vie
+doof-ferb/BibleMMS_vie	1	https://huggingface.co/datasets/doof-ferb/BibleMMS_vie
+capleaf/viVoice	1000	https://huggingface.co/datasets/capleaf/viVoice
+linhtran92/viet_youtube_asr_corpus_v2	100	https://huggingface.co/datasets/linhtran92/viet_youtube_asr_corpus_v2
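As a quick sanity check on the dataset table above, the per-dataset hour estimates can be summed with a short script. This is a standalone sketch: the rows are inlined from the diff rather than read from `dataset.tsv`, and the second tab-separated column is taken to be the estimated hours.

```python
# Rows inlined from dataset.tsv as shown in the diff above.
TSV = """\
data\tEstimated hours\tLink
AILAB-VNUHCM/vivos\t15\thttps://huggingface.co/datasets/AILAB-VNUHCM/vivos
doof-ferb/vlsp2020_vinai_100h\t100\thttps://huggingface.co/datasets/doof-ferb/vlsp2020_vinai_100h
doof-ferb/fpt_fosd\t100\thttps://huggingface.co/datasets/doof-ferb/fpt_fosd
doof-ferb/infore1_25hours\t25\thttps://huggingface.co/datasets/doof-ferb/infore1_25hours
linhtran92/viet_bud500\t500\thttps://huggingface.co/datasets/linhtran92/viet_bud500
doof-ferb/LSVSC\t100\thttps://huggingface.co/datasets/doof-ferb/LSVSC
doof-ferb/vais1000\t2\thttps://huggingface.co/datasets/doof-ferb/vais1000
doof-ferb/VietMed_labeled\t3\thttps://huggingface.co/datasets/doof-ferb/VietMed_labeled
NhutP/VSV-1100\t1100\thttps://huggingface.co/datasets/NhutP/VSV-1100
doof-ferb/Speech-MASSIVE_vie\t1\thttps://huggingface.co/datasets/doof-ferb/Speech-MASSIVE_vie
doof-ferb/BibleMMS_vie\t1\thttps://huggingface.co/datasets/doof-ferb/BibleMMS_vie
capleaf/viVoice\t1000\thttps://huggingface.co/datasets/capleaf/viVoice
linhtran92/viet_youtube_asr_corpus_v2\t100\thttps://huggingface.co/datasets/linhtran92/viet_youtube_asr_corpus_v2
"""

def total_hours(tsv_text: str) -> int:
    """Sum the second (estimated-hours) column, skipping the header row."""
    rows = tsv_text.strip().splitlines()[1:]
    return sum(int(row.split("\t")[1]) for row in rows)

print(total_hours(TSV))  # → 3047
```

The full-dataset estimates total roughly 3000 hours, while the README cites about 2000 hours of fine-tuning data; the gap is consistent with the README's note that only the train subsets of these datasets were used.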