khanhld3 committed
Commit f651420 · 1 Parent(s): d6f86ad

[test] init

Files changed (3):
  1. .DS_Store +0 -0
  2. README.md +6 -4
  3. dataset.tsv +15 -0
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
README.md CHANGED
@@ -81,12 +81,14 @@ model-index:
 ---
 <a name = "description" ></a>
 ### Model Description
-**ChunkFormer-Large-Vie** is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the innovative **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model has been fine-tuned on approximately **2000 hours** of Vietnamese speech data sourced from diverse datasets.
+**ChunkFormer-Large-Vie** is a large-scale Vietnamese Automatic Speech Recognition (ASR) model based on the innovative **ChunkFormer** architecture, introduced at **ICASSP 2025**. The model has been fine-tuned on approximately **2000 hours** of public Vietnamese speech data sourced from diverse datasets. A list of datasets can be found [**HERE**](dataset.tsv).
+
+**!!! Please note that only the train subsets were used for tuning the model.**
 
 ---
 <a name = "implementation" ></a>
 ### Documentation and Implementation
-The [documentation](#) and [implementation](#) of ChunkFormer are publicly available.
+The [Documentation](#) and [Implementation](#) of ChunkFormer are publicly available.
 
 ---
 <a name = "benchmark" ></a>
@@ -112,14 +114,14 @@ pip install -r requirements.txt
 2. **Download the Model Checkpoint from Hugging Face**
 ```bash
 git lfs install
-git clone https://huggingface.co/khanhld/chunkformer-large-vietnamese
+git clone https://huggingface.co/khanhld/chunkformer-large-vie
 ```
 This will download the model checkpoint to the checkpoints folder inside your chunkformer directory.
 
 3. **Run the model**
 ```bash
 python decode.py \
-    --model_checkpoint path/to/chunkformer-large-vietnamese \
+    --model_checkpoint path/to/chunkformer-large-vie \
     --long_form_audio path/to/long_audio.wav \
     --chunk_size 64 \
     --left_context_size 128 \
dataset.tsv ADDED
@@ -0,0 +1,15 @@
+
+data	Estimated hours	Link
+AILAB-VNUHCM/vivos	15	https://huggingface.co/datasets/AILAB-VNUHCM/vivos
+doof-ferb/vlsp2020_vinai_100h	100	https://huggingface.co/datasets/doof-ferb/vlsp2020_vinai_100h
+doof-ferb/fpt_fosd	100	https://huggingface.co/datasets/doof-ferb/fpt_fosd
+doof-ferb/infore1_25hours	25	https://huggingface.co/datasets/doof-ferb/infore1_25hours
+linhtran92/viet_bud500	500	https://huggingface.co/datasets/linhtran92/viet_bud500
+doof-ferb/LSVSC	100	https://huggingface.co/datasets/doof-ferb/LSVSC
+doof-ferb/vais1000	2	https://huggingface.co/datasets/doof-ferb/vais1000
+doof-ferb/VietMed_labeled	3	https://huggingface.co/datasets/doof-ferb/VietMed_labeled
+NhutP/VSV-1100	1100	https://huggingface.co/datasets/NhutP/VSV-1100
+doof-ferb/Speech-MASSIVE_vie	1	https://huggingface.co/datasets/doof-ferb/Speech-MASSIVE_vie
+doof-ferb/BibleMMS_vie	1	https://huggingface.co/datasets/doof-ferb/BibleMMS_vie
+capleaf/viVoice	1000	https://huggingface.co/datasets/capleaf/viVoice
+linhtran92/viet_youtube_asr_corpus_v2	100	https://huggingface.co/datasets/linhtran92/viet_youtube_asr_corpus_v2
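As a quick sanity check on the dataset table above, the per-dataset hour estimates can be summed with a short script. This is a standalone sketch: the rows are inlined from the diff rather than read from `dataset.tsv`, and the second tab-separated column is taken to be the estimated hours.

```python
# Rows inlined from dataset.tsv as shown in the diff above.
TSV = """\
data\tEstimated hours\tLink
AILAB-VNUHCM/vivos\t15\thttps://huggingface.co/datasets/AILAB-VNUHCM/vivos
doof-ferb/vlsp2020_vinai_100h\t100\thttps://huggingface.co/datasets/doof-ferb/vlsp2020_vinai_100h
doof-ferb/fpt_fosd\t100\thttps://huggingface.co/datasets/doof-ferb/fpt_fosd
doof-ferb/infore1_25hours\t25\thttps://huggingface.co/datasets/doof-ferb/infore1_25hours
linhtran92/viet_bud500\t500\thttps://huggingface.co/datasets/linhtran92/viet_bud500
doof-ferb/LSVSC\t100\thttps://huggingface.co/datasets/doof-ferb/LSVSC
doof-ferb/vais1000\t2\thttps://huggingface.co/datasets/doof-ferb/vais1000
doof-ferb/VietMed_labeled\t3\thttps://huggingface.co/datasets/doof-ferb/VietMed_labeled
NhutP/VSV-1100\t1100\thttps://huggingface.co/datasets/NhutP/VSV-1100
doof-ferb/Speech-MASSIVE_vie\t1\thttps://huggingface.co/datasets/doof-ferb/Speech-MASSIVE_vie
doof-ferb/BibleMMS_vie\t1\thttps://huggingface.co/datasets/doof-ferb/BibleMMS_vie
capleaf/viVoice\t1000\thttps://huggingface.co/datasets/capleaf/viVoice
linhtran92/viet_youtube_asr_corpus_v2\t100\thttps://huggingface.co/datasets/linhtran92/viet_youtube_asr_corpus_v2
"""

def total_hours(tsv_text: str) -> int:
    """Sum the second (estimated-hours) column, skipping the header row."""
    rows = tsv_text.strip().splitlines()[1:]
    return sum(int(row.split("\t")[1]) for row in rows)

print(total_hours(TSV))  # → 3047
```

The full-dataset estimates total roughly 3000 hours, while the README cites about 2000 hours of fine-tuning data; the gap is consistent with the README's note that only the train subsets of these datasets were used.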