Upload folder using huggingface_hub
- .gitattributes +2 -0
- README.md +63 -3
- assets/logo.png +3 -0
- assets/teaser.png +3 -0
- transformer/config.json +33 -0
- transformer/diffusion_pytorch_model-00001-of-00003.safetensors +3 -0
- transformer/diffusion_pytorch_model-00002-of-00003.safetensors +3 -0
- transformer/diffusion_pytorch_model-00003-of-00003.safetensors +3 -0
- transformer/diffusion_pytorch_model.safetensors.index.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/logo.png filter=lfs diff=lfs merge=lfs -text
+assets/teaser.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,63 @@
----
-license: mit
----
+---
+license: mit
+---
+
+
+<div align="center">
+
+# Aether: Geometric-Aware Unified World Modeling
+
+</div>
+
+<div align="center">
+<img width="400" alt="image" src="assets/logo.png">
+<!-- <br> -->
+</div>
+
+<div align="center">
+<a href='https://arxiv.org/abs/2503.18945'><img src='https://img.shields.io/badge/arXiv-2503.18945-red'></a>
+<a href='https://aether-world.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
+<a href=''><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo%20(Coming%20Soon)-blue'></a>
+</div>
+
+Aether addresses a fundamental challenge in AI: integrating geometric reconstruction with generative modeling
+for human-like spatial reasoning. Our framework unifies three core capabilities: (1) **4D dynamic reconstruction**,
+(2) **action-conditioned video prediction**, and (3) **goal-conditioned visual planning**. Trained entirely on
+synthetic data, Aether achieves strong zero-shot generalization to real-world scenarios.
+
+<div align="center">
+<img src="assets/teaser.png" alt="Teaser" width="800"/>
+</div>
+
+
+## 📝 Citation
+If you find this work useful in your research, please consider citing:
+
+```bibtex
+@article{aether,
+title = {Aether: Geometric-Aware Unified World Modeling},
+author = {Aether Team and Haoyi Zhu and Yifan Wang and Jianjun Zhou and Wenzheng Chang and Yang Zhou and Zizun Li and Junyi Chen and Chunhua Shen and Jiangmiao Pang and Tong He},
+journal = {arXiv preprint arXiv:2503.18945},
+year = {2025}
+}
+```
+
+## ⚖️ License
+This repository is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+## 🙏 Acknowledgements
+Our work is primarily built upon
+[Accelerate](https://github.com/huggingface/accelerate),
+[Diffusers](https://github.com/huggingface/diffusers),
+[CogVideoX](https://github.com/THUDM/CogVideo),
+[Finetrainers](https://github.com/a-r-r-o-w/finetrainers),
+[DepthAnyVideo](https://github.com/Nightmare-n/DepthAnyVideo),
+[CUT3R](https://github.com/CUT3R/CUT3R),
+[MonST3R](https://github.com/Junyi42/monst3r),
+[VBench](https://github.com/Vchitect/VBench),
+[GST](https://github.com/SOTAMak1r/GST),
+[SPA](https://github.com/HaoyiZhu/SPA),
+[DroidCalib](https://github.com/boschresearch/DroidCalib),
+[Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2),
+[ceres-solver](https://github.com/ceres-solver/ceres-solver), etc.
+We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
assets/logo.png
ADDED
Binary file stored with Git LFS.
assets/teaser.png
ADDED
Binary file stored with Git LFS.
transformer/config.json
ADDED
@@ -0,0 +1,33 @@
+{
+  "_class_name": "CogVideoXTransformer3DModel",
+  "_diffusers_version": "0.32.2",
+  "_name_or_path": "THUDM/CogVideoX-5b-I2V",
+  "activation_fn": "gelu-approximate",
+  "attention_bias": true,
+  "attention_head_dim": 64,
+  "dropout": 0.0,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 96,
+  "max_text_seq_length": 226,
+  "norm_elementwise_affine": true,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 48,
+  "num_layers": 42,
+  "ofs_embed_dim": null,
+  "out_channels": 56,
+  "patch_bias": true,
+  "patch_size": 2,
+  "patch_size_t": null,
+  "sample_frames": 41,
+  "sample_height": 60,
+  "sample_width": 90,
+  "spatial_interpolation_scale": 1.875,
+  "temporal_compression_ratio": 4,
+  "temporal_interpolation_scale": 1.0,
+  "text_embed_dim": 4096,
+  "time_embed_dim": 512,
+  "timestep_activation_fn": "silu",
+  "use_learned_positional_embeddings": false,
+  "use_rotary_positional_embeddings": true
+}
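A few derived sizes follow from the values in `transformer/config.json`. The sketch below is illustrative only and rests on assumptions the diff does not state: that the config is read the way Diffusers reads a `CogVideoXTransformer3DModel` config (inner width = heads × head dim, spatial patches of `patch_size` × `patch_size` over the latent grid, and `(sample_frames - 1) // temporal_compression_ratio + 1` latent frames). The helper name `derived_shapes` is hypothetical, and only a subset of the config keys is copied in.

```python
import json

# Subset of transformer/config.json, copied from the diff above.
CONFIG_SUBSET = json.loads("""
{
  "attention_head_dim": 64,
  "num_attention_heads": 48,
  "patch_size": 2,
  "sample_frames": 41,
  "sample_height": 60,
  "sample_width": 90,
  "temporal_compression_ratio": 4
}
""")


def derived_shapes(cfg):
    """Compute sizes implied by the config (assumed CogVideoX conventions)."""
    # Width of the transformer's hidden states.
    inner_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]
    # Number of latent frames after temporal compression.
    latent_frames = (cfg["sample_frames"] - 1) // cfg["temporal_compression_ratio"] + 1
    # Spatial patches per latent frame.
    patches_per_frame = (cfg["sample_height"] // cfg["patch_size"]) * (
        cfg["sample_width"] // cfg["patch_size"]
    )
    return {
        "inner_dim": inner_dim,
        "latent_frames": latent_frames,
        "patches_per_frame": patches_per_frame,
        "video_tokens": latent_frames * patches_per_frame,
    }


shapes = derived_shapes(CONFIG_SUBSET)
print(shapes)
```

Under these assumptions the default sample shape yields 3072-wide hidden states and 11 latent frames of 1350 patches each. The text tokens (`max_text_seq_length`) are not counted here.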
transformer/diffusion_pytorch_model-00001-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ba6d3d89bb92a2d9c42e025090317477b2e653b6e081c61d311b6aff866ef020
+size 4979268296
transformer/diffusion_pytorch_model-00002-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:69db4f65a4e99f0ff7fc05574287d3264fb7c1114edfd108d921a89c58640b4e
+size 4948039832
transformer/diffusion_pytorch_model-00003-of-00003.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:52691657f95be290feb229f0665cb11c97fa47a479ec2b9c44e6cb94a3f4b20c
+size 1216323744
transformer/diffusion_pytorch_model.safetensors.index.json
ADDED
The diff for this file is too large to render.
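The three `.safetensors` entries in this commit are Git LFS pointer files, not the weights themselves: each records a spec version, a SHA-256 object ID, and the size of the real file in bytes. As a minimal sketch, the pointers can be parsed and the shard sizes summed as follows. The function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` line layout shown in the hunks above.

```python
# Pointer contents copied from the three shard hunks in this commit.
POINTERS = [
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:ba6d3d89bb92a2d9c42e025090317477b2e653b6e081c61d311b6aff866ef020\n"
    "size 4979268296\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:69db4f65a4e99f0ff7fc05574287d3264fb7c1114edfd108d921a89c58640b4e\n"
    "size 4948039832\n",
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:52691657f95be290feb229f0665cb11c97fa47a479ec2b9c44e6cb94a3f4b20c\n"
    "size 1216323744\n",
]


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size."""
    # Each non-empty line is "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


parsed = [parse_lfs_pointer(p) for p in POINTERS]
total_bytes = sum(p["size"] for p in parsed)
print(f"{len(parsed)} shards, {total_bytes} bytes ({total_bytes / 2**30:.2f} GiB)")
```

The sizes come from the pointers alone, so this estimates the download size without fetching any weights.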