Audio-to-Audio
BeauKang01 committed
Commit 4fa4c3a · 1 Parent(s): 1c37d30

change in readme

Files changed (1)
  1. README.md +103 -1
README.md CHANGED
@@ -7,4 +7,106 @@ language:
  - de
  - fr
  pipeline_tag: audio-to-audio
- ---
+ ---
+
+ # LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
+
+ ## Introduction
+
+ LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
+
+ - **Noise Suppression (SE)**
+ - **Target Speaker Extraction (TSE)**
+ - **Packet Loss Concealment (PLC)**
+ - **Acoustic Echo Cancellation (AEC)**
+ - **Speech Separation (SS)**
+
+ To mitigate acoustic inconsistency, LLaSE-G1 employs continuous representations from **WavLM** as input and predicts speech tokens using **X-Codec2**, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, demonstrating scaling effects at test time and emerging capabilities for unseen speech enhancement tasks.
+
+ For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
+
+ ## Demo
+
+ You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+
+ ## Installation
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone https://github.com/your-repo/LLaSE-G1.git
+ cd LLaSE-G1
+ ```
+
+ ### 2. Create a Conda environment and install dependencies
+
+ ```bash
+ conda create -n llase python=3.10
+ conda activate llase
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Download pretrained models
+
+ LLaSE-G1 requires three additional pretrained models. Download them with the provided shell script:
+
+ ```bash
+ bash ./ckpt/download.sh
+ ```
+
+ Alternatively, download them manually and place them in the `./ckpt/` directory.
+
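Before running inference, it can help to confirm that `./ckpt/` actually holds the extra models. The following is a minimal sketch and not part of the repository: the helper name `list_checkpoints` is hypothetical, and because the README does not list the checkpoint file names, the check only counts entries other than `download.sh` (assuming one file or folder per model) against the three models mentioned above.

```python
from pathlib import Path

EXPECTED_MODELS = 3  # the README states three additional pretrained models


def list_checkpoints(ckpt_dir="./ckpt"):
    """Hypothetical helper: list entries in ckpt_dir, ignoring the download script."""
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    # Assumption: each model is stored as one file or one folder in ckpt_dir.
    return sorted(p.name for p in root.iterdir() if p.name != "download.sh")


if __name__ == "__main__":
    found = list_checkpoints()
    if len(found) < EXPECTED_MODELS:
        print(f"Found {len(found)} entries, expected at least {EXPECTED_MODELS}: {found}")
    else:
        print(f"Checkpoint entries: {found}")
```

If fewer than three entries are reported, rerunning `bash ./ckpt/download.sh` or completing the manual download is the likely next step.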
+ ## Inference
+
+ The main inference script is **`inference.py`**. The inference process consists of two stages:
+
+ 1. Extract the 6th-layer features from WavLM.
+ 2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
+
+ ### Running Inference
+
+ To run inference, configure the parameters in `./config/test.yml`:
+
+ | Parameter | Description |
+ | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+ | `infer_feat_too` | Whether to extract WavLM features during inference. |
+ | `inference_time` | Number of inference iterations. |
+ | `feat_dir` | Directory containing extracted features. |
+ | `wav_dir` | Directory of processed audio files. |
+ | `task` | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
+
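The parameters in the table can be sanity-checked before launching a run. The following is a minimal sketch, not code from the repository: the function name `validate_config`, the assumed value types (a boolean for `infer_feat_too`, a positive integer for `inference_time`, strings for the two directories), and the example values are all assumptions. Only the parameter names and the five task codes come from the table.

```python
# Hypothetical helper, not part of LLaSE-G1. Value types are assumptions.
VALID_TASKS = {"SE", "TSE", "PLC", "AEC", "SS"}  # task codes from the table


def validate_config(cfg):
    """Return a list of problems found in a test.yml-style mapping."""
    problems = []
    # Assumed type: boolean flag.
    if not isinstance(cfg.get("infer_feat_too"), bool):
        problems.append("infer_feat_too should be true or false")
    # Assumed type: positive integer (bool is rejected because it subclasses int).
    n = cfg.get("inference_time")
    if isinstance(n, bool) or not isinstance(n, int) or n < 1:
        problems.append("inference_time should be a positive integer")
    # Assumed type: non-empty path strings.
    for key in ("feat_dir", "wav_dir"):
        value = cfg.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} should be a non-empty path string")
    task = cfg.get("task")
    if not isinstance(task, str) or task not in VALID_TASKS:
        problems.append(f"task should be one of {sorted(VALID_TASKS)}")
    return problems


example = {
    "infer_feat_too": True,
    "inference_time": 1,
    "feat_dir": "./feats",  # placeholder path
    "wav_dir": "./wavs",    # placeholder path
    "task": "SE",
}
print(validate_config(example))  # an empty list means no problems were found
```

The mapping passed to `validate_config` would typically come from parsing `./config/test.yml` with a YAML loader; the actual layout of that file may differ from the flat structure assumed here.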
+ Example command to run inference:
+
+ ```bash
+ python inference.py --config ./config/test.yml
+ ```
+
+ ## Results
+
+ Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+
+ ## Model Checkpoints
+
+ Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
+
+ ## Citation
+
+ If you find this work useful, please cite our paper:
+
+ ```bibtex
+ @article{yourpaper2025,
+ title={LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement},
+ author={Your Name and Others},
+ journal={ACL},
+ year={2025}
+ }
+ ```
+
+ ## License
+
+ This project is released under the **MIT License**.
+
+ ## Contact
+
+ For any questions, please contact: `[email protected]`
+