ASLP-lab / LLaSE-G1

Audio-to-Audio


BeauKang01 committed on Mar 1

Commit 4fa4c3a

1 Parent(s): 1c37d30

change in readme


Files changed (1)

README.md +103 -1

README.md CHANGED

@@ -7,4 +7,106 @@ language:
 - de
 - fr
 pipeline_tag: audio-to-audio
----
+---
+# LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
+## Introduction
+LLaSE-G1 is a unified speech enhancement model capable of handling multiple tasks without extra task prompts, including:
+- **Noise Suppression (SE)**
+- **Target Speaker Extraction (TSE)**
+- **Packet Loss Concealment (PLC)**
+- **Acoustic Echo Cancellation (AEC)**
+- **Speech Separation (SS)**
+To mitigate acoustic inconsistency, LLaSE-G1 takes continuous representations from **WavLM** as input and predicts **X-Codec2** speech tokens, maximizing acoustic preservation. The model surpasses prior task-specific discriminative and generative speech enhancement models, shows scaling effects at test time, and exhibits emergent capabilities on unseen speech enhancement tasks.
+For more details, refer to our paper: [LLaSE-G1 Paper](https://submission-papers.github.io/LLaSE-G1-demo-page/)
+## Demo
+You can listen to the enhancement results on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+## Installation
+### 1. Clone the repository
+```bash
+git clone https://github.com/your-repo/LLaSE-G1.git
+cd LLaSE-G1
+```
+### 2. Create a Conda environment and install dependencies
+```bash
+conda create -n llase python=3.10
+conda activate llase
+pip install -r requirements.txt
+```
+### 3. Download Pretrained Models
+LLaSE-G1 requires three additional pretrained models. You can download them with the provided shell script:
+```bash
+bash ./ckpt/download.sh
+```
+Alternatively, you can download them manually and place them in the `./ckpt/` directory.
+## Inference
+The main inference script is **`inference.py`**. The inference process consists of two stages:
+1. Extract the 6th-layer features from WavLM.
+2. Use the language model (LM) to predict speech tokens, and then decode them into audio using **X-Codec2**.
+### Running Inference
+To run inference, configure the parameters in `./config/test.yml`:
+| Parameter        | Description                                                                                                                                                            |
+| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `infer_feat_too` | Whether to extract WavLM features during inference.                                                                                                                    |
+| `inference_time` | Number of inference iterations.                                                                                                                                        |
+| `feat_dir`       | Directory containing extracted features.                                                                                                                               |
+| `wav_dir`        | Directory of processed audio files.                                                                                                                                    |
+| `task`           | Task type: `SE` (Noise Suppression), `TSE` (Target Speaker Extraction), `PLC` (Packet Loss Concealment), `AEC` (Acoustic Echo Cancellation), `SS` (Speech Separation). |
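+A minimal sketch of a config that sets only the parameters in the table above, written from the shell with a heredoc. It assumes the keys sit at the top level of the YAML file; the shipped `./config/test.yml` may nest them or require further fields, and the file name `my_se.yml`, the directory paths, and the iteration count are hypothetical placeholders:
+```bash
+# Hypothetical config: keys follow the parameter table, values are placeholders.
+mkdir -p ./config
+cat > ./config/my_se.yml <<'EOF'
+infer_feat_too: true       # extract WavLM features during inference
+inference_time: 1          # number of inference iterations (placeholder)
+feat_dir: ./feats/my_se    # hypothetical feature directory
+wav_dir: ./wavs/my_se      # hypothetical audio directory
+task: SE                   # one of SE, TSE, PLC, AEC, SS
+EOF
+```
+Writing to a separate file leaves the shipped `test.yml` untouched; pass the new file with `--config ./config/my_se.yml`.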
+Example command to run inference:
+```bash
+python inference.py --config ./config/test.yml
+```
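+Before launching a long run, it can help to confirm that the `task` value is one of the five documented task types. The helper below is a hypothetical sketch, not part of the repository; it assumes `task` appears as an unquoted key at the start of a line in the YAML config:
+```bash
+# Hypothetical pre-flight check (not shipped with LLaSE-G1): read the first
+# top-level `task:` value and accept only the documented task types.
+validate_task() {
+  local cfg="$1"
+  local task
+  task=$(sed -nE 's/^task:[[:space:]]*([A-Za-z]+).*/\1/p' "$cfg" | head -n 1)
+  case "$task" in
+    SE|TSE|PLC|AEC|SS) echo "task=$task" ;;
+    *) echo "unsupported or missing task '$task' in $cfg" >&2; return 1 ;;
+  esac
+}
+```
+Usage: `validate_task ./config/test.yml && python inference.py --config ./config/test.yml` runs inference only when the task value is recognized.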
+## Results
+Samples processed by LLaSE-G1 can be found on our [Demo Page](https://submission-papers.github.io/LLaSE-G1-demo-page/).
+## Model Checkpoints
+Our pretrained model is available on [Hugging Face](https://huggingface.co/ASLP-lab/LLaSE-G1).
+## Citation
+If you find this work useful, please cite our paper:
+```bibtex
+@article{yourpaper2025,
+  title={LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement},
+  author={Your Name and Others},
+  journal={ACL},
+  year={2025}
+}
+```
+## License
+This project is released under the **MIT License**.
+## Contact
+For any questions, please contact: `[email protected]`