Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ pipeline_tag: unconditional-image-generation
TarFlow, proposed by [Zhai et al., 2024], builds its affine coupling layers from stacks of autoregressive Transformer blocks (similar to MAF), making the transforms non-volume-preserving; combined with guidance and denoising, it achieves state-of-the-art results across multiple benchmarks.
-Its sampling process is extremely slow, and we want to accelerate it in []. Because the trained parameters are not released in [Zhai et al., 2024], we retrain the TarFlow models and upload them here.
+Its sampling process is extremely slow, and we want to accelerate it in [Liu and Qin, 2025]. Because the trained parameters are not released in [Zhai et al., 2024], we retrain the TarFlow models and upload them here.
As mentioned in [Zhai et al., 2024], a TarFlow model can be denoted as P-Ch-T-K-pε, with patch size (P), model channel size (Ch), number of autoregressive flow blocks (T), number of attention layers in each flow block (K), and the input noise variance pε that yields the best sampling quality for generation tasks.
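For concreteness, here is a tiny helper showing how this naming scheme can be read off. The field names and the example string `4-1024-8-8-0.07` are our own illustrative assumptions, not identifiers from the TarFlow codebase:

```python
from dataclasses import dataclass

@dataclass
class TarFlowSpec:
    patch_size: int        # P
    channels: int          # Ch
    num_blocks: int        # T: autoregressive flow blocks
    layers_per_block: int  # K: attention layers per flow block
    noise_var: float       # pε: input noise variance used for sampling

def parse_spec(name: str) -> TarFlowSpec:
    """Split a 'P-Ch-T-K-pε' model name into its five fields."""
    p, ch, t, k, eps = name.split("-")
    return TarFlowSpec(int(p), int(ch), int(t), int(k), float(eps))

# Hypothetical example: a patch-4 model with 1024 channels,
# 8 flow blocks of 8 attention layers each, sampled with noise variance 0.07.
print(parse_spec("4-1024-8-8-0.07"))
```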
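To make the architecture above concrete, here is a minimal PyTorch sketch of one MAF-style autoregressive affine coupling block, assuming a causal Transformer predicts a per-token scale and shift from preceding tokens. All class and argument names here are our own assumptions, not the reference implementation:

```python
import torch
import torch.nn as nn

class AffineCouplingBlock(nn.Module):
    """Sketch of one autoregressive flow block (MAF-style affine coupling).

    A causal Transformer reads the token sequence and predicts a scale and
    shift for each token from the tokens before it, so the Jacobian stays
    triangular and the log-determinant is just a sum over the scales
    (non-volume preserving).
    """

    def __init__(self, dim: int, channels: int, num_layers: int, num_heads: int = 8):
        super().__init__()
        self.proj_in = nn.Linear(dim, channels)
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.proj_out = nn.Linear(channels, 2 * dim)  # -> (log_scale, shift)

    def forward(self, x: torch.Tensor):
        # x: (batch, tokens, dim). Shift the sequence right by one so that
        # token t is transformed using only tokens < t (autoregressive).
        b, t, d = x.shape
        ctx = torch.cat(
            [torch.zeros(b, 1, d, device=x.device, dtype=x.dtype), x[:, :-1]], dim=1
        )
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(x.device)
        h = self.transformer(self.proj_in(ctx), mask=mask)
        log_s, shift = self.proj_out(h).chunk(2, dim=-1)
        z = (x - shift) * torch.exp(-log_s)   # affine coupling, data -> noise
        log_det = -log_s.sum(dim=(1, 2))      # non-volume-preserving term
        return z, log_det
```

In the P-Ch-T-K-pε notation above, a full model would stack T such blocks, each with K attention layers, and sampling would invert each block token by token, which is what makes generation slow.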
@@ -45,4 +45,5 @@ The sampling traces may look like this:
From top to bottom: Img128cond, Img64cond (patch4), Img64uncond, AFHQ. From left to right: noise, Block 7-0, denoised image.
[1] Zhai S, Zhang R, Nakkiran P, et al. Normalizing flows are capable generative models[J]. arXiv preprint arXiv:2412.06329, 2024.
+
+[2] Liu B, Qin Z. Accelerate TarFlow sampling with GS-Jacobi iteration[J]. arXiv preprint arXiv:2505.12849, 2025.