Vittorio Pippi committed · Commit dd90e2c · Parent(s): 9b27178

Fix the YAML metadata

README.md CHANGED
````diff
@@ -1,6 +1,3 @@
-# Emuru Convolutional VAE
-
-```yaml
 ---
 language:
 - "en"
@@ -18,9 +15,8 @@ metrics:
 - CER
 library_name: diffusers
 ---
-```
 
-##
+## Emuru Convolutional VAE
 
 This repository hosts the **Emuru Convolutional VAE**, described in our paper. The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image \( I \in \mathbb{R}^{3 \times W \times H} \) to a latent representation with a single channel and spatial dimensions \( h \times w \) (where \( h = H/8 \) and \( w = W/8 \)). This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
 
````