Audio-to-Audio
ASLP-lab committed
Commit 87723f3 · verified · 1 Parent(s): 546e0c5

Update README.md

Files changed (1)
  1. README.md +104 -112
README.md CHANGED
@@ -1,112 +1,104 @@
- ---
- license: apache-2.0
- language:
- - zh
- - en
- - es
- - de
- - fr
- pipeline_tag: audio-to-audio
- ---
-
- # LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
-
- ## Introduction
-
- LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
-
- - **Noise Suppression (SE)**
- - **Target Speaker Extraction (TSE)**
- - **Packet Loss Concealment (PLC)**
- - **Acoustic Echo Cancellation (AEC)**
- - **Speech Separation (SS)**
-
- To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.
-
- For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
-
- ## Demo
-
- You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
-
- ## Installation
-
- ### 1. Clone the repository
-
- ```bash
- git clone https://github.com/your-repo/LLaSE-G1.git
- cd LLaSE-G1
- ```
-
- ### 2. Create a Conda environment and install dependencies
-
- ```bash
- conda create -n llase python=3.10
- conda activate llase
- pip install -r requirements.txt
- ```
-
- ### 3. Download Pretrained Models
-
- LLaSE-G1 requires three additional pre-trained models to function properly. You can download them using the provided shell script:
-
- ```bash
- bash ./ckpt/download.sh
- ```
-
- Alternatively, you can download them manually and place them in the `./ckpt/` directory.
-
- ## Inference
-
- The main inference script is **`inference.py`**. The inference process consists of two stages:
-
- 1. Extract the 6th-layer features from WavLM.
- 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
-
- ### Running Inference
-
- To run inference, configure the parameters in `./config/test.yml`:
-
- | Parameter | Description |
- | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
- | `infer_feat_too` | Whether to extract WavLM features during inference. |
- | `inference_time` | Number of inference iterations. |
- | `feat_dir` | Directory containing extracted features. |
- | `wav_dir` | Directory of processed audio files. |
- | `task` | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
-
- Command to run inference:
-
- ```bash
- bash inference.sh
- ```
-
- ## Results
-
- Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
-
- ## Model Checkpoints
-
- Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
-
- ## Citation
-
- If you find this work useful, please cite our paper:
-
- ```bibtex
- @article{yourpaper2025,
- title={LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement},
- author={Your Name and Others},
- journal={ACL},
- year={2025}
- }
- ```
-
- ## License
-
- This project is released under the **Apache-2.0**.
-
- ## Contact
-
- For any questions, please contact: `[email protected]`
-
+ ---
+ license: apache-2.0
+ language:
+ - zh
+ - en
+ - es
+ - de
+ - fr
+ pipeline_tag: audio-to-audio
+ ---
+
+ # LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
+
+ ## Introduction
+
+ LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
+
+ - **Noise Suppression (SE)**
+ - **Target Speaker Extraction (TSE)**
+ - **Packet Loss Concealment (PLC)**
+ - **Acoustic Echo Cancellation (AEC)**
+ - **Speech Separation (SS)**
+
+ To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.
+
+ For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
+
+ ## Demo
+
+ You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+
+ ## Installation
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone https://github.com/your-repo/LLaSE-G1.git
+ cd LLaSE-G1
+ ```
+
+ ### 2. Create a Conda environment and install dependencies
+
+ ```bash
+ conda create -n llase python=3.10
+ conda activate llase
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Download Pretrained Models
+
+ LLaSE-G1 requires three additional pre-trained models to function properly. You can download them using the provided shell script:
+
+ ```bash
+ bash ./ckpt/download.sh
+ ```
+
+ Alternatively, you can download them manually and place them in the `./ckpt/` directory.
+
+ ## Inference
+
+ The main inference script is **`inference.py`**. The inference process consists of two stages:
+
+ 1. Extract the 6th-layer features from WavLM.
+ 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
+
+ ### Running Inference
+
+ To run inference, configure the parameters in `./config/test.yml`:
+
+ | Parameter | Description |
+ | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `infer_feat_too` | Whether to extract WavLM features during inference. |
+ | `inference_time` | Number of inference iterations. |
+ | `feat_dir` | Directory containing extracted features. |
+ | `wav_dir` | Directory of processed audio files. |
+ | `task` | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
+
+ Command to run inference:
+
+ ```bash
+ bash inference.sh
+ ```
+
+ ## Results
+
+ Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+
+ ## Model Checkpoints
+
+ Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
+
+ ## Citation
+
+ If you find this work useful, please cite our paper:
+
+
+ ## License
+
+ This project is released under the **Apache-2.0**.
+
+ ## Contact
+
+ For any questions, please contact: `[email protected]`
+
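
The `./config/test.yml` parameters listed in the README table can be illustrated with a small validated structure. This is a minimal sketch, not code from the LLaSE-G1 repository: the `InferenceConfig` class, the `to_yaml` helper, the field types, the flat `key: value` layout, and the example values are all assumptions. Only the five parameter names and the five task codes come from the README.

```python
from dataclasses import dataclass

# Task codes listed in the README's parameter table.
VALID_TASKS = ("SE", "TSE", "PLC", "AEC", "SS")


@dataclass
class InferenceConfig:
    """Hypothetical container for the keys in ./config/test.yml.

    Field names come from the README; the types are assumptions.
    """
    infer_feat_too: bool  # whether to extract WavLM features during inference
    inference_time: int   # number of inference iterations
    feat_dir: str         # directory containing extracted features
    wav_dir: str          # directory of processed audio files
    task: str             # one of VALID_TASKS

    def __post_init__(self):
        # Reject task codes that the README does not list.
        if self.task not in VALID_TASKS:
            raise ValueError(f"task must be one of {VALID_TASKS}, got {self.task!r}")
        if self.inference_time < 1:
            raise ValueError("inference_time must be at least 1")


def to_yaml(cfg: InferenceConfig) -> str:
    """Render the config as flat `key: value` lines (assumed layout, no YAML library)."""
    lines = []
    for key, value in vars(cfg).items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; the real paths depend on the local setup.
    cfg = InferenceConfig(True, 1, "./feat", "./wav", "SE")
    print(to_yaml(cfg))
```

Validating `task` up front catches a mistyped task code before `bash inference.sh` is launched, rather than partway through a run.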