HaoyiZhu committed on
Commit 0a95223 · verified · 1 Parent(s): efd1512

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/logo.png filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
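
The hunk above adds two image paths to the set of patterns routed through Git LFS. As a rough illustration of what those lines mean, the sketch below extracts every pattern whose attributes include `filter=lfs`. The helper name `lfs_patterns` and the inlined `GITATTRIBUTES` text are hypothetical and exist only for this example; the parser splits on whitespace and does not handle quoted patterns containing spaces.

```python
# Hypothetical helper: list the patterns a .gitattributes file sends through Git LFS.
# The text below reproduces only the lines visible in this hunk.
GITATTRIBUTES = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/logo.png filter=lfs diff=lfs merge=lfs -text
assets/teaser.png filter=lfs diff=lfs merge=lfs -text
"""


def lfs_patterns(text: str) -> list[str]:
    """Return the path patterns whose attribute list contains filter=lfs."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First token is the pattern, the rest are attributes.
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


tracked = lfs_patterns(GITATTRIBUTES)
print(tracked)
```

On a real checkout, `git check-attr filter -- assets/logo.png` is the authoritative way to confirm which filter applies to a path.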
README.md CHANGED
@@ -1,3 +1,63 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+
+ <div align="center">
+
+ # Aether: Geometric-Aware Unified World Modeling
+
+ </div>
+
+ <div align="center">
+ <img width="400" alt="image" src="assets/logo.png">
+ <!-- <br> -->
+ </div>
+
+ <div align="center">
+ <a href='https://arxiv.org/abs/2503.18945'><img src='https://img.shields.io/badge/arXiv-2503.18945-red'></a> &nbsp;
+ <a href='https://aether-world.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;
+ <a href=''><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo%20(Coming%20Soon)-blue'></a> &nbsp;
+ </div>
+
+ Aether addresses a fundamental challenge in AI: integrating geometric reconstruction with generative modeling
+ for human-like spatial reasoning. Our framework unifies three core capabilities: (1) **4D dynamic reconstruction**,
+ (2) **action-conditioned video prediction**, and (3) **goal-conditioned visual planning**. Trained entirely on
+ synthetic data, Aether achieves strong zero-shot generalization to real-world scenarios.
+
+ <div align="center">
+ <img src="assets/teaser.png" alt="Teaser" width="800"/>
+ </div>
+
+
+ ## 📝 Citation
+ If you find this work useful in your research, please consider citing:
+
+ ```bibtex
+ @article{aether,
+ title = {Aether: Geometric-Aware Unified World Modeling},
+ author = {Aether Team and Haoyi Zhu and Yifan Wang and Jianjun Zhou and Wenzheng Chang and Yang Zhou and Zizun Li and Junyi Chen and Chunhua Shen and Jiangmiao Pang and Tong He},
+ journal = {arXiv preprint arXiv:2503.18945},
+ year = {2025}
+ }
+ ```
+
+ ## ⚖️ License
+ This repository is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgements
+ Our work is primarily built upon
+ [Accelerate](https://github.com/huggingface/accelerate),
+ [Diffusers](https://github.com/huggingface/diffusers),
+ [CogVideoX](https://github.com/THUDM/CogVideo),
+ [Finetrainers](https://github.com/a-r-r-o-w/finetrainers),
+ [DepthAnyVideo](https://github.com/Nightmare-n/DepthAnyVideo),
+ [CUT3R](https://github.com/CUT3R/CUT3R),
+ [MonST3R](https://github.com/Junyi42/monst3r),
+ [VBench](https://github.com/Vchitect/VBench),
+ [GST](https://github.com/SOTAMak1r/GST),
+ [SPA](https://github.com/HaoyiZhu/SPA),
+ [DroidCalib](https://github.com/boschresearch/DroidCalib),
+ [Grounded-SAM-2](https://github.com/IDEA-Research/Grounded-SAM-2),
+ [ceres-solver](https://github.com/ceres-solver/ceres-solver), etc.
+ We extend our gratitude to all these authors for their generously open-sourced code and their significant contributions to the community.
assets/logo.png ADDED

Git LFS Details

  • SHA256: 1fcc6a3c8e5fc8206ce96ca50f85b06aa337d38354b98b4faef986f06026550e
  • Pointer size: 132 Bytes
  • Size of remote file: 1.29 MB
assets/teaser.png ADDED

Git LFS Details

  • SHA256: 3b9cfe7dbabbb999ad75f78ef3a38ffb0ed9f56303cff3a0d9ebfa90bf29031c
  • Pointer size: 133 Bytes
  • Size of remote file: 11.8 MB
transformer/config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_class_name": "CogVideoXTransformer3DModel",
+   "_diffusers_version": "0.32.2",
+   "_name_or_path": "THUDM/CogVideoX-5b-I2V",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 64,
+   "dropout": 0.0,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 96,
+   "max_text_seq_length": 226,
+   "norm_elementwise_affine": true,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 48,
+   "num_layers": 42,
+   "ofs_embed_dim": null,
+   "out_channels": 56,
+   "patch_bias": true,
+   "patch_size": 2,
+   "patch_size_t": null,
+   "sample_frames": 41,
+   "sample_height": 60,
+   "sample_width": 90,
+   "spatial_interpolation_scale": 1.875,
+   "temporal_compression_ratio": 4,
+   "temporal_interpolation_scale": 1.0,
+   "text_embed_dim": 4096,
+   "time_embed_dim": 512,
+   "timestep_activation_fn": "silu",
+   "use_learned_positional_embeddings": false,
+   "use_rotary_positional_embeddings": true
+ }
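
To make a few of the numbers in this config easier to read, the sketch below derives some quantities from it using only the standard library. It rests on assumptions that are not stated in this commit: that the hidden width is `num_attention_heads * attention_head_dim`, that `sample_height` and `sample_width` are latent-space sizes, and that `sample_frames` counts pixel-space frames which are compressed by `temporal_compression_ratio` with the first frame kept separately, as in CogVideoX-style models. `CONFIG_JSON` is a hand-copied subset of the file above.

```python
import json

# Subset of transformer/config.json copied from the diff above.
CONFIG_JSON = """
{
  "attention_head_dim": 64,
  "num_attention_heads": 48,
  "patch_size": 2,
  "sample_frames": 41,
  "sample_height": 60,
  "sample_width": 90,
  "temporal_compression_ratio": 4
}
"""

cfg = json.loads(CONFIG_JSON)

# Assumption: hidden width = heads x per-head dimension.
inner_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]

# Assumption: the first frame is kept and the rest are compressed in time.
latent_frames = (cfg["sample_frames"] - 1) // cfg["temporal_compression_ratio"] + 1

# Assumption: sample_height/width are latent sizes, split into square patches.
tokens_per_frame = (cfg["sample_height"] // cfg["patch_size"]) * (
    cfg["sample_width"] // cfg["patch_size"]
)
video_tokens = latent_frames * tokens_per_frame

print(f"inner_dim={inner_dim}")
print(f"latent_frames={latent_frames}")
print(f"video_tokens={video_tokens}")
```

If those assumptions hold, the transformer attends over `video_tokens` visual tokens plus up to `max_text_seq_length` text tokens per sample.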
transformer/diffusion_pytorch_model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba6d3d89bb92a2d9c42e025090317477b2e653b6e081c61d311b6aff866ef020
+ size 4979268296
transformer/diffusion_pytorch_model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69db4f65a4e99f0ff7fc05574287d3264fb7c1114edfd108d921a89c58640b4e
+ size 4948039832
transformer/diffusion_pytorch_model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52691657f95be290feb229f0665cb11c97fa47a479ec2b9c44e6cb94a3f4b20c
+ size 1216323744
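
The three shard entries above are Git LFS pointer files, not the weights themselves: each records a spec version, a SHA-256 object ID, and the byte size of the real file. The sketch below parses the three pointers shown in this diff and totals the declared sizes. The function name `parse_lfs_pointer` and the `POINTERS` dictionary are hypothetical names for this example only.

```python
# Pointer contents copied from the three shard hunks above.
POINTERS = {
    "diffusion_pytorch_model-00001-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:ba6d3d89bb92a2d9c42e025090317477b2e653b6e081c61d311b6aff866ef020\n"
        "size 4979268296\n"
    ),
    "diffusion_pytorch_model-00002-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:69db4f65a4e99f0ff7fc05574287d3264fb7c1114edfd108d921a89c58640b4e\n"
        "size 4948039832\n"
    ),
    "diffusion_pytorch_model-00003-of-00003.safetensors": (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:52691657f95be290feb229f0665cb11c97fa47a479ec2b9c44e6cb94a3f4b20c\n"
        "size 1216323744\n"
    ),
}


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


parsed = {name: parse_lfs_pointer(body) for name, body in POINTERS.items()}
total_bytes = sum(p["size"] for p in parsed.values())
print(f"{total_bytes} bytes (~{total_bytes / 1e9:.2f} GB) across {len(parsed)} shards")
```

The summed size is a quick way to estimate the download before fetching the shards; after download, each file's SHA-256 can be compared against the `digest` field.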
transformer/diffusion_pytorch_model.safetensors.index.json ADDED
The diff for this file is too large to render.