Commit f034034 (verified) · Parent(s): 05a2f4a
JusperLee committed: Update README.md

Files changed (1): README.md (+31 -2)
README.md CHANGED

@@ -7,7 +7,9 @@ language:
  - en
  ---

- <h3 align="center">Apollo: Band-sequence Modeling for High-Quality Audio Restoration</h3>
+ <p align="center">
+ </p>
+ <h3 align="center">TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation</h3>
  <p align="center">
  <strong>Mohan Xu<sup>*</sup>, Kai Li<sup>*</sup>, Guo Chen, Xiaolin Hu</strong><br>
  <strong>Tsinghua University, Beijing, China</strong><br>
@@ -33,6 +35,33 @@ language:

  In this paper, we propose a speech separation model with significantly reduced parameter size and computational cost: the Time-Frequency Interleaved Gain Extraction and Reconstruction network (TIGER). TIGER leverages prior knowledge to divide frequency bands and applies compression to the frequency information. We employ a multi-scale selective attention (MSA) module to extract contextual features and introduce a full-frequency-frame attention (F^3A) module to capture both temporal and frequency contextual information. Additionally, to evaluate speech separation models more realistically in complex acoustic environments, we introduce a novel dataset called EchoSet. This dataset includes noise and realistic reverberation (e.g., accounting for object occlusions and material properties), with speech from two speakers overlapping at random proportions. Experimental results show that TIGER significantly outperforms the state-of-the-art (SOTA) model TF-GridNet on EchoSet in both inference speed and separation quality, while reducing the number of parameters by 94.3% and the MACs by 95.3%. These results indicate that, by using frequency band-splitting and an interleaved modeling structure, TIGER substantially reduces parameters and computational cost while maintaining high performance. Notably, TIGER is the first speech separation model with fewer than 1 million parameters to achieve performance close to the SOTA model.

+ ## TIGER
+
+ Overall pipeline of the TIGER model architecture and its modules.
+
+ ![TIGER model architecture](assets/TIGER.png)
+
+ ## Results
+
+ Performance comparisons of TIGER and existing separation models on ***Libri2Mix, LRS2-2Mix, and EchoSet***. Bold indicates the best result; italics indicate the second best.
+
+ ![Separation performance on Libri2Mix, LRS2-2Mix, and EchoSet](assets/result.png)
+
+ Efficiency comparisons of TIGER and other models.
+
+ ![Efficiency comparison](assets/efficiency.png)
+
+ Comparison of the performance and efficiency of cinematic sound separation models on DnR. '*' indicates that the result is taken from the original DnR paper.
+
+ ![Performance and efficiency on DnR](assets/dnr.png)
+
+ ## 📦 Installation
+
+ ```bash
+ git clone https://github.com/JusperLee/TIGER.git
+ cd TIGER
+ pip install -r requirements.txt
+ ```

  ## 🚀 Quick Start
 
@@ -71,4 +100,4 @@ python audio_test.py --conf_dir configs/tiger.yml

  ## 📧 Contact

- If you have any questions, please feel free to contact us via `[email protected]`.
+ If you have any questions, please feel free to contact us via `[email protected]`.
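As background for the new abstract text above: the band-split step it describes (dividing frequency bands and compressing the frequency information) can be sketched in a few lines. The following is a minimal PyTorch-style illustration, not the repository's implementation; the `BandSplit` name, band edges, and embedding size are all assumptions.

```python
# Illustrative band-split: project uneven STFT frequency bands to fixed-size
# embeddings. Band edges and dimensions are hypothetical, not TIGER's values.
import torch
import torch.nn as nn

class BandSplit(nn.Module):
    def __init__(self, band_edges, emb_dim=64):
        super().__init__()
        self.band_edges = band_edges
        # One projection per band: (2 * band_width) -> emb_dim, where the
        # factor 2 covers the real and imaginary parts of the STFT.
        self.proj = nn.ModuleList(
            nn.Linear(2 * (hi - lo), emb_dim) for lo, hi in band_edges
        )

    def forward(self, spec):
        # spec: (batch, freq_bins, frames, 2) -- real/imag STFT of the mixture
        outs = []
        for (lo, hi), proj in zip(self.band_edges, self.proj):
            band = spec[:, lo:hi]                       # (B, band, T, 2)
            band = band.permute(0, 2, 1, 3).flatten(2)  # (B, T, 2*band)
            outs.append(proj(band))                     # (B, T, emb_dim)
        return torch.stack(outs, dim=1)                 # (B, n_bands, T, emb_dim)

# Toy run: 129 frequency bins split into three hypothetical bands.
split = BandSplit([(0, 16), (16, 48), (48, 129)])
print(split(torch.randn(1, 129, 100, 2)).shape)  # torch.Size([1, 3, 100, 64])
```

Compressing every band to the same embedding width lets the downstream modules operate on a small, fixed feature dimension regardless of FFT size, which is consistent with the parameter and MAC reductions the abstract reports.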
 
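The "interleaved" part of TIGER's name refers to alternating modeling passes along the time axis and the frequency-band axis. Below is a minimal sketch of that pattern, with plain GRUs standing in for the MSA and F^3A modules described in the abstract; the block structure and residual connection are assumptions, not TIGER's actual code.

```python
# Illustrative time-frequency interleaving: alternate sequence modeling along
# the time axis and the band axis. GRUs are stand-ins for MSA / F^3A.
import torch
import torch.nn as nn

class InterleavedBlock(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.time_module = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.freq_module = nn.GRU(emb_dim, emb_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, n_bands, frames, emb_dim)
        b, k, t, d = x.shape
        # Pass 1: model temporal context within each band.
        h = self.time_module(x.reshape(b * k, t, d))[0].reshape(b, k, t, d)
        # Pass 2: model cross-band (frequency) context at each frame.
        h = h.permute(0, 2, 1, 3).reshape(b * t, k, d)
        h = self.freq_module(h)[0].reshape(b, t, k, d).permute(0, 2, 1, 3)
        return x + h  # residual connection (an assumption)

stack = nn.Sequential(*[InterleavedBlock() for _ in range(4)])
print(stack(torch.randn(1, 3, 100, 64)).shape)  # torch.Size([1, 3, 100, 64])
```

Because each pass treats one axis as the sequence dimension and folds the others into the batch, the same small modules are reused across all bands and frames — one reason an interleaved design can stay under 1 M parameters.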