Update README.md
README.md
---
license: apache-2.0
language:
- zh
- en
- es
- de
- fr
pipeline_tag: audio-to-audio
---

# LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

## Introduction

LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:

- **Noise Suppression (SE)**
- **Target Speaker Extraction (TSE)**
- **Packet Loss Concealment (PLC)**
- **Acoustic Echo Cancellation (AEC)**
- **Speech Separation (SS)**

To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.

For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)

## Demo

You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/your-repo/LLaSE-G1.git
cd LLaSE-G1
```

### 2. Create a Conda environment and install dependencies

```bash
conda create -n llase python=3.10
conda activate llase
pip install -r requirements.txt
```

### 3. Download Pretrained Models

LLaSE-G1 requires three additional pre-trained models to function properly. You can download them using the provided shell script:

```bash
bash ./ckpt/download.sh
```

Alternatively, you can download them manually and place them in the `./ckpt/` directory.
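For manual downloads, a minimal `huggingface-cli` sketch is given below. The repository IDs and target folders are assumptions (apart from `ASLP-lab/LLaSE-G1`, which is listed under Model Checkpoints); check `./ckpt/download.sh` for the exact sources and the directory layout the configs expect.

```bash
# Sketch only: repo IDs and target folders are assumptions; see ./ckpt/download.sh
# for the authoritative model list and directory layout.
pip install -U "huggingface_hub[cli]"

# LLaSE-G1 LM checkpoint (repo listed in the Model Checkpoints section below)
huggingface-cli download ASLP-lab/LLaSE-G1 --local-dir ./ckpt/LLaSE-G1

# WavLM encoder (assumed to be WavLM-Large)
huggingface-cli download microsoft/wavlm-large --local-dir ./ckpt/wavlm-large

# X-Codec2 codec (assumed repo ID; verify against download.sh)
huggingface-cli download HKUSTAudio/xcodec2 --local-dir ./ckpt/xcodec2
```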
## Inference

The main inference script is **`inference.py`**. The inference process consists of two stages:

1. Extract the 6th-layer features from WavLM.
2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.

### Running Inference

To run inference, configure the parameters in `./config/test.yml`:

| Parameter        | Description |
| ---------------- | ----------- |
| `infer_feat_too` | Whether to extract WavLM features during inference. |
| `inference_time` | Number of inference iterations. |
| `feat_dir`       | Directory containing extracted features. |
| `wav_dir`        | Directory of processed audio files. |
| `task`           | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
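The snippet below writes a minimal, illustrative `./config/test.yml`. The keys follow the table above; the values and paths are placeholders rather than the repository defaults, so adapt them to your own data before running.

```bash
# Illustrative sketch only: keys follow the table above; values and paths are placeholders.
cat > ./config/test.yml <<'EOF'
infer_feat_too: true      # extract WavLM features as part of this run
inference_time: 1         # number of inference iterations
feat_dir: ./exp/feat      # where extracted WavLM features are stored
wav_dir: ./exp/wav        # where processed audio files are written
task: SE                  # one of: SE, TSE, PLC, AEC, SS
EOF
```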
Command to run inference:

```bash
bash inference.sh
```

## Results

Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).

## Model Checkpoints

Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).

## Citation

If you find this work useful, please cite our paper:

## License

This project is released under the **Apache-2.0** license.

## Contact

For any questions, please contact: `[email protected]`