Update README.md
README.md CHANGED

@@ -4,7 +4,7 @@ language:
 - en
 ---
 
-#
+# <i>SegFace</i> Model Card
 
 <div align="center">
 
@@ -15,10 +15,6 @@ language:
 
 ## Introduction
 
-<div align="center">
-<img src='assets/visual_abstract.png' height="50%" width="50%">
-</div>
-
 The key contributions of our work are,
 1. We introduce a lightweight transformer decoder with learnable class-specific tokens, that ensures each token is dedicated to a specific class, thereby enabling independent modeling of classes. The design effectively addresses the challenge of poor segmentation performance of long-tail classes, prevalent in existing methods.
 2. Our multi-scale feature extraction and MLP fusion strategy, combined with a transformer decoder that leverages learnable class-specific tokens, mitigates the dominance of head classes during training and enhances the feature representation of long-tail classes.
@@ -28,7 +24,7 @@ The key contributions of our work are,
 
 ## Training Framework
 <div align="center">
-<img src='assets/segface'>
+<img src='assets/segface.png'>
 </div>
 
 The proposed architecture, <i>SegFace</i>, addresses face segmentation by enhancing the performance on long-tail classes through a transformer-based approach. Specifically, multi-scale features are first extracted from an image encoder and then fused using an MLP fusion module to form face tokens. These tokens, along with class-specific tokens, undergo self-attention, face-to-token, and token-to-face cross-attention operations, refining both class and face tokens to enhance class-specific features. Finally, the upscaled face tokens and learned class tokens are combined to produce segmentation maps for each facial region.
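The decoder flow in that final paragraph can be made concrete with a short sketch. The following is a minimal PyTorch illustration of the described token refinement, not the authors' released code: every module name, dimension, and class count here is an assumption, and the upsampling of the face tokens back to image resolution is omitted.

```python
# Minimal sketch of the SegFace decoder flow described above. All names,
# shapes, and hyperparameters are illustrative assumptions, not the authors'
# implementation: face tokens stand in for the MLP-fused multi-scale encoder
# features, class tokens are learned per-class embeddings, and one pass runs
# self-attention plus the two cross-attention directions from the README.
import torch
import torch.nn as nn

class SegFaceDecoderSketch(nn.Module):
    def __init__(self, dim=256, num_classes=19, num_heads=8):
        super().__init__()
        # One learnable token per facial class (19 is a hypothetical count).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.token_to_face = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.face_to_token = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, face_tokens):
        # face_tokens: (B, HW, dim), the MLP-fused multi-scale features.
        B, _, _ = face_tokens.shape
        cls = self.class_tokens.unsqueeze(0).expand(B, -1, -1)  # (B, C, dim)
        num_cls = cls.shape[1]

        # Self-attention over the concatenated class + face tokens.
        x = torch.cat([cls, face_tokens], dim=1)
        x, _ = self.self_attn(x, x, x)
        cls, face_tokens = x[:, :num_cls], x[:, num_cls:]

        # "Token-to-face": class tokens query the face tokens.
        cls, _ = self.token_to_face(cls, face_tokens, face_tokens)
        # "Face-to-token": face tokens query the refined class tokens.
        face_tokens, _ = self.face_to_token(face_tokens, cls, cls)

        # Per-class maps as dot products of face tokens with class tokens
        # (upscaling of face_tokens is skipped in this sketch).
        logits = torch.einsum('bnd,bcd->bcn', face_tokens, cls)  # (B, C, HW)
        return logits
```

Under these assumptions, each row of `class_tokens` gathers evidence only for its own class, which is the mechanism the README credits for independent modeling of long-tail classes; reshaping `logits` to (B, C, H, W) and upsampling would yield the per-region segmentation maps.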