diff --git a/.gitattributes b/.gitattributes
index a6344aac8c09253b3b630fb776ae94478aa0275b..2c1d7b64c47bc3b72646f0f9c904f8e14fc6d997 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -33,3 +33,20 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
+examples/drone_video.mp4 filter=lfs diff=lfs merge=lfs -text
+examples/IMG_9730.mov filter=lfs diff=lfs merge=lfs -text
+examples/IMG_9731.mov filter=lfs diff=lfs merge=lfs -text
+examples/IMG_9732.mov filter=lfs diff=lfs merge=lfs -text
+examples/test16.mov filter=lfs diff=lfs merge=lfs -text
+examples/test17.mov filter=lfs diff=lfs merge=lfs -text
+examples/test18.mov filter=lfs diff=lfs merge=lfs -text
+examples/test19.mov filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/lib.win-amd64-3.9/lietorch_backends.cp39-win_amd64.pyd filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/altcorr/correlation.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/altcorr/correlation_kernel.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/fastba/ba.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/fastba/ba_cuda.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/lietorch/src/lietorch.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/lietorch/src/lietorch_cpu.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/build/temp.win-amd64-3.9/Release/dpvo/lietorch/src/lietorch_gpu.obj filter=lfs diff=lfs merge=lfs -text
+third-party/DPVO/dist/dpvo-0.0.0-py3.9-win-amd64.egg filter=lfs diff=lfs merge=lfs -text
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..cfd2c0ac8c42ba5f555553be291595b170717b60
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 Soyong Shin
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
index 4a1d051a1c83a04f6e20264ae633dbf6eb1ec8d5..378da23c710053edf04083c9d2679032b126a2db 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,120 @@
----
-title: Motionbert Meta Sapiens
-emoji: 🌍
-colorFrom: green
-colorTo: indigo
-sdk: docker
-pinned: false
-short_description: Sapiens
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
+
+
+[Paper](https://arxiv.org/abs/2312.07531)
+[Colab Demo](https://colab.research.google.com/drive/1ysUtGSwidTQIdBQRhq0hj63KbseFujkn?usp=sharing)
+[3D Human Pose Estimation on 3DPW](https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw?p=wham-reconstructing-world-grounded-humans)
+[3D Human Pose Estimation on EMDB](https://paperswithcode.com/sota/3d-human-pose-estimation-on-emdb?p=wham-reconstructing-world-grounded-humans)
+
+
+https://github.com/yohanshin/WHAM/assets/46889727/da4602b4-0597-4e64-8da4-ab06931b23ee
+
+
+## Introduction
+This repository is the official [PyTorch](https://pytorch.org/) implementation of [WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion](https://arxiv.org/abs/2312.07531). For more information, please visit our [project page](https://wham.is.tue.mpg.de/).
+
+
+## Installation
+Please see [Installation](docs/INSTALL.md) for details.
+
+
+## Quick Demo
+
+### [Google Colab for WHAM demo is now available](https://colab.research.google.com/drive/1ysUtGSwidTQIdBQRhq0hj63KbseFujkn?usp=sharing)
+
+### Registration
+
+To download the SMPL body models (Neutral, Female, and Male), you need to register for [SMPL](https://smpl.is.tue.mpg.de/) and [SMPLify](https://smplify.is.tue.mpg.de/). The username and password for both websites will be used when fetching the demo data.
+
+Next, run the following script to fetch the demo data. This script will download all required dependencies, including trained models and demo videos.
+
+```bash
+bash fetch_demo_data.sh
+```
+
+You can try it with one example video:
+```
+python demo.py --video examples/IMG_9732.mov --visualize
+```
+
+By default, we assume the camera focal length following [CLIFF](https://github.com/haofanwang/CLIFF). You can instead specify known camera intrinsics [fx fy cx cy] for SLAM, as in the example below:
+```
+python demo.py --video examples/drone_video.mp4 --calib examples/drone_calib.txt --visualize
+```
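+
+The calibration file is a single line of space-separated intrinsics `fx fy cx cy`; for example, `examples/drone_calib.txt` contains:
+```
+1321.0 1321.0 960.0 540.0
+```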
+
+You can skip SLAM if you only need the motion in camera coordinates:
+```
+python demo.py --video examples/IMG_9732.mov --visualize --estimate_local_only
+```
+
+You can further refine the WHAM results using Temporal SMPLify as post-processing, which improves both 2D alignment and 3D accuracy. Simply add the `--run_smplify` flag when running the demo.
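+
+For example, to combine SMPLify refinement with visualization on one of the bundled example videos:
+```bash
+python demo.py --video examples/IMG_9732.mov --visualize --run_smplify
+```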
+
+## Docker
+
+Please refer to [Docker](docs/DOCKER.md) for details.
+
+## Python API
+
+Please refer to [API](docs/API.md) for details.
+
+## Dataset
+Please see [Dataset](docs/DATASET.md) for details.
+
+## Evaluation
+```bash
+# Evaluate on 3DPW dataset
+python -m lib.eval.evaluate_3dpw --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar
+
+# Evaluate on RICH dataset
+python -m lib.eval.evaluate_rich --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar
+
+# Evaluate on EMDB dataset (also computes W-MPJPE and WA-MPJPE)
+python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 1 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar # EMDB 1
+
+python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 2 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar # EMDB 2
+```
+
+## Training
+WHAM training consists of two stages: (1) 2D-to-SMPL lifting on the AMASS dataset, and (2) fine-tuning with feature integration on the video datasets. Please see [Dataset](docs/DATASET.md) for preprocessing the training datasets.
+
+### Stage 1.
+```bash
+python train.py --cfg configs/yamls/stage1.yaml
+```
+
+### Stage 2.
+Training stage 2 requires the pretrained results from stage 1. You can use your own stage 1 checkpoint, or download the pretrained weights from [Google Drive](https://drive.google.com/file/d/1Erjkho7O0bnZFawarntICRUCroaKabRE/view?usp=sharing) and save them as `checkpoints/wham_stage1.pth.tar`.
+```bash
+python train.py --cfg configs/yamls/stage2.yaml TRAIN.CHECKPOINT checkpoints/wham_stage1.pth.tar
+```
+
+### Train with BEDLAM
+TBD
+
+## Acknowledgement
+We sincerely thank Hongwei Yi and Silvia Zuffi for the discussions and proofreading. Part of this work was done while Soyong Shin was an intern at the Max Planck Institute for Intelligent Systems.
+
+The base implementation is largely borrowed from [VIBE](https://github.com/mkocabas/VIBE) and [TCMR](https://github.com/hongsukchoi/TCMR_RELEASE). We use [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) for 2D keypoint detection, and [DPVO](https://github.com/princeton-vl/DPVO) and [DROID-SLAM](https://github.com/princeton-vl/DROID-SLAM) for extracting camera motion. Please visit their official websites for more details.
+
+## TODO
+
+- [ ] Data preprocessing
+
+- [x] Training implementation
+
+- [x] Colab demo release
+
+- [x] Demo for custom videos
+
+## Citation
+```
+@InProceedings{shin2023wham,
+  title     = {WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion},
+  author    = {Shin, Soyong and Kim, Juyong and Halilaj, Eni and Black, Michael J.},
+  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
+  year      = {2024}
+}
+```
+
+## License
+Please see [License](./LICENSE) for details.
+
+## Contact
+Please contact soyongs@andrew.cmu.edu for any questions related to this work.
diff --git a/checkpoints/dpvo.pth b/checkpoints/dpvo.pth
new file mode 100644
index 0000000000000000000000000000000000000000..25b16864668c8625f38021d17cc534258ff6297f
--- /dev/null
+++ b/checkpoints/dpvo.pth
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:30d02dc2b88a321cf99aad8e4ea1152a44d791b5b65bf95ad036922819c0ff12
+size 14167743
diff --git a/checkpoints/hmr2a.ckpt b/checkpoints/hmr2a.ckpt
new file mode 100644
index 0000000000000000000000000000000000000000..46b527a17ee53d47a16caf15c3d61c878a4790be
--- /dev/null
+++ b/checkpoints/hmr2a.ckpt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2dcf79638109781d1ae5f5c44fee5f55bc83291c210653feead9b7f04fa6f20e
+size 2709494041
diff --git a/checkpoints/vitpose-h-multi-coco.pth b/checkpoints/vitpose-h-multi-coco.pth
new file mode 100644
index 0000000000000000000000000000000000000000..2072ac0591d04aacf4c55af4104d31ee5eb604cd
--- /dev/null
+++ b/checkpoints/vitpose-h-multi-coco.pth
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:50e33f4077ef2a6bcfd7110c58742b24c5859b7798fb0eedd6d2215e0a8980bc
+size 2549075546
diff --git a/checkpoints/wham_vit_bedlam_w_3dpw.pth.tar b/checkpoints/wham_vit_bedlam_w_3dpw.pth.tar
new file mode 100644
index 0000000000000000000000000000000000000000..8ad730690f89ceb46bb3415e675688beb0c8998d
--- /dev/null
+++ b/checkpoints/wham_vit_bedlam_w_3dpw.pth.tar
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:91d250d2d298b00f200aa39df36253b55ca434188c2934d8e91e5e0777fb67fd
+size 527307587
diff --git a/checkpoints/wham_vit_w_3dpw.pth.tar b/checkpoints/wham_vit_w_3dpw.pth.tar
new file mode 100644
index 0000000000000000000000000000000000000000..5bc1ba919994f301d0c4df9b5c88d4cbfb871321
--- /dev/null
+++ b/checkpoints/wham_vit_w_3dpw.pth.tar
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9835bcbc952221ad72fa72e768e1f4620e96788b12cecd676a3b1dbee057dd66
+size 527307587
diff --git a/checkpoints/yolov8x.pt b/checkpoints/yolov8x.pt
new file mode 100644
index 0000000000000000000000000000000000000000..a0510bf3bb96a465f97b81dab2dd2f437e2cccbe
--- /dev/null
+++ b/checkpoints/yolov8x.pt
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c4d5a3f000d771762f03fc8b57ebd0aae324aeaefdd6e68492a9c4470f2d1e8b
+size 136867539
diff --git a/configs/__pycache__/config.cpython-39.pyc b/configs/__pycache__/config.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..d54c1c5e7237aaa07a716b7513de4a26bfdbac2c
Binary files /dev/null and b/configs/__pycache__/config.cpython-39.pyc differ
diff --git a/configs/__pycache__/constants.cpython-39.pyc b/configs/__pycache__/constants.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5159979482d30d508ad35f1370115d7154714364
Binary files /dev/null and b/configs/__pycache__/constants.cpython-39.pyc differ
diff --git a/configs/config.py b/configs/config.py
new file mode 100644
index 0000000000000000000000000000000000000000..2d58747ff089106fb22345a461345e833d711f6a
--- /dev/null
+++ b/configs/config.py
@@ -0,0 +1,111 @@
+import argparse
+from yacs.config import CfgNode as CN
+
+# Configuration variable
+cfg = CN()
+
+cfg.TITLE = 'default'
+cfg.OUTPUT_DIR = 'results'
+cfg.EXP_NAME = 'default'
+cfg.DEVICE = 'cuda'
+cfg.DEBUG = False
+cfg.EVAL = False
+cfg.RESUME = False
+cfg.LOGDIR = ''
+cfg.NUM_WORKERS = 5
+cfg.SEED_VALUE = -1
+cfg.SUMMARY_ITER = 50
+cfg.MODEL_CONFIG = ''
+cfg.FLIP_EVAL = False
+
+cfg.TRAIN = CN()
+cfg.TRAIN.STAGE = 'stage1'
+cfg.TRAIN.DATASET_EVAL = '3dpw'
+cfg.TRAIN.CHECKPOINT = ''
+cfg.TRAIN.BATCH_SIZE = 64
+cfg.TRAIN.START_EPOCH = 0
+cfg.TRAIN.END_EPOCH = 999
+cfg.TRAIN.OPTIM = 'Adam'
+cfg.TRAIN.LR = 3e-4
+cfg.TRAIN.LR_FINETUNE = 5e-5
+cfg.TRAIN.LR_PATIENCE = 5
+cfg.TRAIN.LR_DECAY_RATIO = 0.1
+cfg.TRAIN.WD = 0.0
+cfg.TRAIN.MOMENTUM = 0.9
+cfg.TRAIN.MILESTONES = [50, 70]
+
+cfg.DATASET = CN()
+cfg.DATASET.SEQLEN = 81
+cfg.DATASET.RATIO = [1.0, 0, 0, 0, 0]
+
+cfg.MODEL = CN()
+cfg.MODEL.BACKBONE = 'vit'
+
+cfg.LOSS = CN()
+cfg.LOSS.SHAPE_LOSS_WEIGHT = 0.001
+cfg.LOSS.JOINT2D_LOSS_WEIGHT = 5.
+cfg.LOSS.JOINT3D_LOSS_WEIGHT = 5.
+cfg.LOSS.VERTS3D_LOSS_WEIGHT = 1.
+cfg.LOSS.POSE_LOSS_WEIGHT = 1.
+cfg.LOSS.CASCADED_LOSS_WEIGHT = 0.0
+cfg.LOSS.CONTACT_LOSS_WEIGHT = 0.04
+cfg.LOSS.ROOT_VEL_LOSS_WEIGHT = 0.001
+cfg.LOSS.ROOT_POSE_LOSS_WEIGHT = 0.4
+cfg.LOSS.SLIDING_LOSS_WEIGHT = 0.5
+cfg.LOSS.CAMERA_LOSS_WEIGHT = 0.04
+cfg.LOSS.LOSS_WEIGHT = 60.
+cfg.LOSS.CAMERA_LOSS_SKIP_EPOCH = 5
+
+
+def get_cfg_defaults():
+    """Get a yacs CfgNode object with default values for WHAM."""
+ # Return a clone so that the defaults will not be altered
+ # This is for the "local variable" use pattern
+ return cfg.clone()
+
+
+def get_cfg(args, test):
+ """
+ Define configuration.
+ """
+ import os
+
+ cfg = get_cfg_defaults()
+ if os.path.exists(args.cfg):
+ cfg.merge_from_file(args.cfg)
+
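+    # Command-line opts (e.g. "TRAIN.CHECKPOINT <path>") are merged last, so they override values from the YAML file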
+ cfg.merge_from_list(args.opts)
+ if test:
+ cfg.merge_from_list(['EVAL', True])
+
+ return cfg.clone()
+
+
+def bool_arg(value):
+ if value.lower() in ('yes', 'true', 't', 'y', '1'):
+ return True
+ elif value.lower() in ('no', 'false', 'f', 'n', '0'):
+        return False
+    raise argparse.ArgumentTypeError(f'Boolean value expected, got {value!r}')
+
+
+def parse_args(test=False):
+ parser = argparse.ArgumentParser()
+ parser.add_argument('-c', '--cfg', type=str, default='./configs/debug.yaml', help='cfg file path')
+ parser.add_argument(
+ "--eval-set", type=str, default='3dpw', help="Evaluation dataset")
+ parser.add_argument(
+ "--eval-split", type=str, default='test', help="Evaluation data split")
+ parser.add_argument('--render', default=False, type=bool_arg,
+ help='Render SMPL meshes after the evaluation')
+ parser.add_argument('--save-results', default=False, type=bool_arg,
+ help='Save SMPL parameters after the evaluation')
+ parser.add_argument(
+ "opts", default=None, nargs=argparse.REMAINDER,
+ help="Modify config options using the command-line")
+
+ args = parser.parse_args()
+ print(args, end='\n\n')
+ cfg_file = args.cfg
+ cfg = get_cfg(args, test)
+
+ return cfg, cfg_file, args
\ No newline at end of file
diff --git a/configs/constants.py b/configs/constants.py
new file mode 100644
index 0000000000000000000000000000000000000000..ded9d2241a5ecfa192c09af4f27cfd969d1e11fa
--- /dev/null
+++ b/configs/constants.py
@@ -0,0 +1,59 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+
+IMG_FEAT_DIM = {
+ 'resnet': 2048,
+ 'vit': 1024
+}
+
+N_JOINTS = 17
+root = 'dataset'
+class PATHS:
+ # Raw data folders
+ PARSED_DATA = f'{root}/parsed_data'
+ AMASS_PTH = f'{root}/AMASS'
+ THREEDPW_PTH = f'{root}/3DPW'
+ HUMAN36M_PTH = f'{root}/Human36M'
+ RICH_PTH = f'{root}/RICH'
+ EMDB_PTH = f'{root}/EMDB'
+
+ # Processed labels
+ AMASS_LABEL = f'{root}/parsed_data/amass.pth'
+ THREEDPW_LABEL = f'{root}/parsed_data/3dpw_dset_backbone.pth'
+ MPII3D_LABEL = f'{root}/parsed_data/mpii3d_dset_backbone.pth'
+ HUMAN36M_LABEL = f'{root}/parsed_data/human36m_dset_backbone.pth'
+ INSTA_LABEL = f'{root}/parsed_data/insta_dset_backbone.pth'
+ BEDLAM_LABEL = f'{root}/parsed_data/bedlam_train_backbone.pth'
+
+class KEYPOINTS:
+ NUM_JOINTS = N_JOINTS
+ H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9]
+ H36M_TO_J14 = H36M_TO_J17[:14]
+ J17_TO_H36M = [14, 3, 4, 5, 2, 1, 0, 15, 12, 16, 13, 9, 10, 11, 8, 7, 6]
+ COCO_AUG_DICT = f'{root}/body_models/coco_aug_dict.pth'
+ TREE = [[5, 6], 0, 0, 1, 2, -1, -1, 5, 6, 7, 8, -1, -1, 11, 12, 13, 14, 15, 15, 15, 16, 16, 16]
+
+ # STD scale for video noise
+ S_BIAS = 1e-1
+ S_JITTERING = 5e-2
+ S_PEAK = 3e-1
+ S_PEAK_MASK = 5e-3
+ S_MASK = 0.03
+
+
+class BMODEL:
+ MAIN_JOINTS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21] # reduced_joints
+
+ FLDR = f'{root}/body_models/smpl/'
+ SMPLX2SMPL = f'{root}/body_models/smplx2smpl.pkl'
+ FACES = f'{root}/body_models/smpl_faces.npy'
+ MEAN_PARAMS = f'{root}/body_models/smpl_mean_params.npz'
+ JOINTS_REGRESSOR_WHAM = f'{root}/body_models/J_regressor_wham.npy'
+ JOINTS_REGRESSOR_H36M = f'{root}/body_models/J_regressor_h36m.npy'
+ JOINTS_REGRESSOR_EXTRA = f'{root}/body_models/J_regressor_extra.npy'
+ JOINTS_REGRESSOR_FEET = f'{root}/body_models/J_regressor_feet.npy'
+ PARENTS = torch.tensor([
+ -1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21])
\ No newline at end of file
diff --git a/configs/yamls/demo.yaml b/configs/yamls/demo.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..ed67a32e22ee65ee9c0c67cf4e98bd3a8f577af6
--- /dev/null
+++ b/configs/yamls/demo.yaml
@@ -0,0 +1,14 @@
+LOGDIR: ''
+DEVICE: 'cuda'
+EXP_NAME: 'demo'
+OUTPUT_DIR: 'experiments/'
+NUM_WORKERS: 0
+MODEL_CONFIG: 'configs/yamls/model_base.yaml'
+FLIP_EVAL: True
+
+TRAIN:
+ STAGE: 'stage2'
+ CHECKPOINT: 'checkpoints/wham_vit_bedlam_w_3dpw.pth.tar'
+
+MODEL:
+ BACKBONE: 'vit'
\ No newline at end of file
diff --git a/configs/yamls/model_base.yaml b/configs/yamls/model_base.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..89a1bd8c7392a973cc269601db24960e9924d42e
--- /dev/null
+++ b/configs/yamls/model_base.yaml
@@ -0,0 +1,7 @@
+architecture: 'RNN'
+in_dim: 49
+n_iters: 1
+pose_dr: 0.15
+d_embed: 512
+n_layers: 3
+layer: 'LSTM'
\ No newline at end of file
diff --git a/configs/yamls/stage1.yaml b/configs/yamls/stage1.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a25fd4bd219cc0842a7f81dcb226048c0793a12c
--- /dev/null
+++ b/configs/yamls/stage1.yaml
@@ -0,0 +1,28 @@
+LOGDIR: ''
+DEVICE: 'cuda'
+EXP_NAME: 'train_stage1'
+OUTPUT_DIR: 'experiments/'
+NUM_WORKERS: 8
+MODEL_CONFIG: 'configs/yamls/model_base.yaml'
+FLIP_EVAL: True
+SEED_VALUE: 42
+
+TRAIN:
+ LR: 5e-4
+ BATCH_SIZE: 64
+ END_EPOCH: 100
+ STAGE: 'stage1'
+ CHECKPOINT: ''
+ MILESTONES: [60, 80]
+
+LOSS:
+ SHAPE_LOSS_WEIGHT: 0.004
+ JOINT3D_LOSS_WEIGHT: 0.4
+ JOINT2D_LOSS_WEIGHT: 0.1
+ POSE_LOSS_WEIGHT: 8.0
+ CASCADED_LOSS_WEIGHT: 0.0
+ SLIDING_LOSS_WEIGHT: 0.5
+ CAMERA_LOSS_WEIGHT: 0.04
+ ROOT_VEL_LOSS_WEIGHT: 0.001
+ LOSS_WEIGHT: 50.0
+ CAMERA_LOSS_SKIP_EPOCH: 5
\ No newline at end of file
diff --git a/configs/yamls/stage2.yaml b/configs/yamls/stage2.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..57e69e0a3bc740daa073e829e01e9053e6dae885
--- /dev/null
+++ b/configs/yamls/stage2.yaml
@@ -0,0 +1,37 @@
+LOGDIR: ''
+DEVICE: 'cuda'
+EXP_NAME: 'train_stage2'
+OUTPUT_DIR: 'experiments'
+NUM_WORKERS: 8
+MODEL_CONFIG: 'configs/yamls/model_base.yaml'
+FLIP_EVAL: True
+SEED_VALUE: 42
+
+TRAIN:
+ LR: 1e-4
+ LR_FINETUNE: 1e-5
+ STAGE: 'stage2'
+ CHECKPOINT: 'checkpoints/wham_stage1.pth.tar'
+ BATCH_SIZE: 64
+ END_EPOCH: 40
+ MILESTONES: [20, 30]
+ LR_DECAY_RATIO: 0.2
+
+MODEL:
+ BACKBONE: 'vit'
+
+LOSS:
+ SHAPE_LOSS_WEIGHT: 0.0
+ JOINT2D_LOSS_WEIGHT: 3.0
+ JOINT3D_LOSS_WEIGHT: 6.0
+ POSE_LOSS_WEIGHT: 1.0
+ CASCADED_LOSS_WEIGHT: 0.05
+ SLIDING_LOSS_WEIGHT: 0.5
+ CAMERA_LOSS_WEIGHT: 0.01
+ ROOT_VEL_LOSS_WEIGHT: 0.001
+ LOSS_WEIGHT: 60.0
+ CAMERA_LOSS_SKIP_EPOCH: 0
+
+DATASET:
+ SEQLEN: 81
+ RATIO: [0.2, 0.2, 0.2, 0.2, 0.2]
\ No newline at end of file
diff --git a/configs/yamls/stage2_b.yaml b/configs/yamls/stage2_b.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9dfa7524d95f943edcbaaf72918c565535773e35
--- /dev/null
+++ b/configs/yamls/stage2_b.yaml
@@ -0,0 +1,38 @@
+LOGDIR: ''
+DEVICE: 'cuda'
+EXP_NAME: 'train_stage2_b'
+OUTPUT_DIR: 'experiments'
+NUM_WORKERS: 8
+MODEL_CONFIG: 'configs/yamls/model_base.yaml'
+FLIP_EVAL: True
+SEED_VALUE: 42
+
+TRAIN:
+ LR: 1e-4
+ LR_FINETUNE: 1e-5
+ STAGE: 'stage2'
+ CHECKPOINT: 'checkpoints/wham_stage1.pth.tar'
+ BATCH_SIZE: 64
+ END_EPOCH: 80
+ MILESTONES: [40, 50, 70]
+ LR_DECAY_RATIO: 0.2
+
+MODEL:
+ BACKBONE: 'vit'
+
+LOSS:
+ SHAPE_LOSS_WEIGHT: 0.0
+ JOINT2D_LOSS_WEIGHT: 5.0
+ JOINT3D_LOSS_WEIGHT: 5.0
+ VERTS3D_LOSS_WEIGHT: 1.0
+ POSE_LOSS_WEIGHT: 3.0
+ CASCADED_LOSS_WEIGHT: 0.05
+ SLIDING_LOSS_WEIGHT: 0.5
+ CAMERA_LOSS_WEIGHT: 0.01
+ ROOT_VEL_LOSS_WEIGHT: 0.001
+ LOSS_WEIGHT: 60.0
+ CAMERA_LOSS_SKIP_EPOCH: 0
+
+DATASET:
+ SEQLEN: 81
+ RATIO: [0.2, 0.2, 0.2, 0.2, 0.0, 0.2]
\ No newline at end of file
diff --git a/dataset/body_models/J_regressor_coco.npy b/dataset/body_models/J_regressor_coco.npy
new file mode 100644
index 0000000000000000000000000000000000000000..3eed75e4c494ded1e239dc939e50182491b2c9f3
--- /dev/null
+++ b/dataset/body_models/J_regressor_coco.npy
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0cd49241810715e752aa7384363b7bc09fb96b386ca99aa1c3eb2c0d15d6b8b9
+size 468648
diff --git a/dataset/body_models/J_regressor_feet.npy b/dataset/body_models/J_regressor_feet.npy
new file mode 100644
index 0000000000000000000000000000000000000000..8731b49f3a6632f26910d77be25b132ca4f041a7
--- /dev/null
+++ b/dataset/body_models/J_regressor_feet.npy
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7ef9e6d64796f2f342983a9fde6a6d9f8e3544f1239e7f86aa4f6b7aa82f4cf6
+size 220608
diff --git a/dataset/body_models/J_regressor_h36m.npy b/dataset/body_models/J_regressor_h36m.npy
new file mode 100644
index 0000000000000000000000000000000000000000..d8ea80f7f2fa4c3fde21c543d28376b84e22d77a
--- /dev/null
+++ b/dataset/body_models/J_regressor_h36m.npy
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c655cd7013d7829eb9acbebf0e43f952a3fa0305a53c35880e39192bfb6444a0
+size 937168
diff --git a/dataset/body_models/J_regressor_wham.npy b/dataset/body_models/J_regressor_wham.npy
new file mode 100644
index 0000000000000000000000000000000000000000..0befeb8ff8ec0882510cabf925d7ab96d73c7efe
--- /dev/null
+++ b/dataset/body_models/J_regressor_wham.npy
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f938dcfd5cd88d0b19ee34e442d49f1dc370d3d8c4f5aef57a93d0cf2e267c4c
+size 854488
diff --git a/dataset/body_models/smpl/SMPL_FEMALE.pkl b/dataset/body_models/smpl/SMPL_FEMALE.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..92a201f4839bd95c1c1986437c7c6a02d7d1ae99
--- /dev/null
+++ b/dataset/body_models/smpl/SMPL_FEMALE.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a583c1b98e4afc19042641f1bae5cd8a1f712a6724886291a7627ec07acd408d
+size 39056454
diff --git a/dataset/body_models/smpl/SMPL_MALE.pkl b/dataset/body_models/smpl/SMPL_MALE.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..43dfecc57d9b7aa99cd2398df818ba252be7f605
--- /dev/null
+++ b/dataset/body_models/smpl/SMPL_MALE.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e8c0bbbbc635dcb166ed29c303fb4bef16ea5f623e5a89263495a9e403575bd
+size 39056404
diff --git a/dataset/body_models/smpl/SMPL_NEUTRAL.pkl b/dataset/body_models/smpl/SMPL_NEUTRAL.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..26574fd104c4b69467f3c7c3516a8508d8a1a36e
--- /dev/null
+++ b/dataset/body_models/smpl/SMPL_NEUTRAL.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:98e65c74ad9b998783132f00880d1025a8d64b158e040e6ef13a557e5098bc42
+size 39001280
diff --git a/dataset/body_models/smpl/__MACOSX/._smpl b/dataset/body_models/smpl/__MACOSX/._smpl
new file mode 100644
index 0000000000000000000000000000000000000000..ecd992ce89eb63ad13ac00ecb1840eb08669d78e
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/._smpl differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/._.DS_Store b/dataset/body_models/smpl/__MACOSX/smpl/._.DS_Store
new file mode 100644
index 0000000000000000000000000000000000000000..09fa6bdda3a49951cf3fb7aa68796ee7d5c71310
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/._.DS_Store differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/.___init__.py b/dataset/body_models/smpl/__MACOSX/smpl/.___init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..198315090137148619e28344fa871854f05f2afd
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/.___init__.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/._models b/dataset/body_models/smpl/__MACOSX/smpl/._models
new file mode 100644
index 0000000000000000000000000000000000000000..33583c02c45f5acd6d0a92c24ffdfc98ffc99594
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/._models differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/._smpl_webuser b/dataset/body_models/smpl/__MACOSX/smpl/._smpl_webuser
new file mode 100644
index 0000000000000000000000000000000000000000..ffe2ae843d4c972cdad070513b7a1f0702998da8
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/._smpl_webuser differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/models/basicModel_f_lbs_10_207_0_v1.0.0.pkl b/dataset/body_models/smpl/__MACOSX/smpl/models/basicModel_f_lbs_10_207_0_v1.0.0.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..92a201f4839bd95c1c1986437c7c6a02d7d1ae99
--- /dev/null
+++ b/dataset/body_models/smpl/__MACOSX/smpl/models/basicModel_f_lbs_10_207_0_v1.0.0.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a583c1b98e4afc19042641f1bae5cd8a1f712a6724886291a7627ec07acd408d
+size 39056454
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/models/basicmodel_m_lbs_10_207_0_v1.0.0.pkl b/dataset/body_models/smpl/__MACOSX/smpl/models/basicmodel_m_lbs_10_207_0_v1.0.0.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..43dfecc57d9b7aa99cd2398df818ba252be7f605
--- /dev/null
+++ b/dataset/body_models/smpl/__MACOSX/smpl/models/basicmodel_m_lbs_10_207_0_v1.0.0.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e8c0bbbbc635dcb166ed29c303fb4bef16ea5f623e5a89263495a9e403575bd
+size 39056404
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._LICENSE.txt b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._LICENSE.txt
new file mode 100644
index 0000000000000000000000000000000000000000..6df69f4fe6a82caa314bf48708774f6c577cade3
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._LICENSE.txt differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._README.txt b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._README.txt
new file mode 100644
index 0000000000000000000000000000000000000000..a3c0861d4d29455621650e98336cb97c09b3e124
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._README.txt differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/.___init__.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/.___init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..9d3d005dbdb334ed5c90e8f2a05eafb71307b3e5
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/.___init__.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._hello_world b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._hello_world
new file mode 100644
index 0000000000000000000000000000000000000000..815dfc0483eb0314c48268ed68102a6442e97982
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._hello_world differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._lbs.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._lbs.py
new file mode 100644
index 0000000000000000000000000000000000000000..c141f0b71b678ee836ef1b58733749b8aea579c9
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._lbs.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._posemapper.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._posemapper.py
new file mode 100644
index 0000000000000000000000000000000000000000..f4a067064fdd6a84bd9f7a6042579d744c501927
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._posemapper.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._serialization.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._serialization.py
new file mode 100644
index 0000000000000000000000000000000000000000..5349c7ffefc22416559dcb7ef5cba17164de4391
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._serialization.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._verts.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._verts.py
new file mode 100644
index 0000000000000000000000000000000000000000..e3dbe30d07990f310b3c5bd767953c0715247e20
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/._verts.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._hello_smpl.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._hello_smpl.py
new file mode 100644
index 0000000000000000000000000000000000000000..660954a98795d0a7bf3e7f431936d531229fd661
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._hello_smpl.py differ
diff --git a/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._render_smpl.py b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._render_smpl.py
new file mode 100644
index 0000000000000000000000000000000000000000..6b8b3bece60fad5e282a73384e6e06fa8d04737d
Binary files /dev/null and b/dataset/body_models/smpl/__MACOSX/smpl/smpl_webuser/hello_world/._render_smpl.py differ
diff --git a/dataset/body_models/smpl_faces.npy b/dataset/body_models/smpl_faces.npy
new file mode 100644
index 0000000000000000000000000000000000000000..4b0c3c149ef8a1899182c056ed2cb24746ae7199
--- /dev/null
+++ b/dataset/body_models/smpl_faces.npy
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51fc11ebadb0487d74bef220c4eea43f014609249f0121413c1fc629d859fecb
+size 165392
diff --git a/dataset/body_models/smpl_mean_params.npz b/dataset/body_models/smpl_mean_params.npz
new file mode 100644
index 0000000000000000000000000000000000000000..c6f60a76976b877cbc08345b2977c6ddd83ced87
--- /dev/null
+++ b/dataset/body_models/smpl_mean_params.npz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6fd6dd687800da946d0a0492383f973b92ec20f166a0b829775882868c35fcdd
+size 1310
diff --git a/dataset/body_models/smplx2smpl.pkl b/dataset/body_models/smplx2smpl.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..0f25e10571181989524020c803280607b7ee9a85
--- /dev/null
+++ b/dataset/body_models/smplx2smpl.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c1d912d121ad98132e4492d8e7a0f1a8cf4412811e14a7ef8cb337bb48eef99e
+size 578019251
diff --git a/demo.py b/demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..178255379cec258316cfe7e642d9688219c39faf
--- /dev/null
+++ b/demo.py
@@ -0,0 +1,234 @@
+import os
+import argparse
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import joblib
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs.config import get_cfg_defaults
+from lib.data.datasets import CustomDataset
+from lib.utils.imutils import avg_preds
+from lib.utils.transforms import matrix_to_axis_angle
+from lib.models import build_network, build_body_model
+from lib.models.preproc.detector import DetectionModel
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.smplify import TemporalSMPLify
+
+try:
+ from lib.models.preproc.slam import SLAMModel
+ _run_global = True
+except:
+    logger.info('DPVO is not properly installed. Estimating motion in local (camera) coordinates only!')
+ _run_global = False
+
+def run(cfg,
+ video,
+ output_pth,
+ network,
+ calib=None,
+ run_global=True,
+ save_pkl=False,
+ visualize=False):
+
+ cap = cv2.VideoCapture(video)
+    assert cap.isOpened(), f'Failed to load video file {video}'
+ fps = cap.get(cv2.CAP_PROP_FPS)
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+ width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
+
+    # Whether to estimate motion in global coordinates
+ run_global = run_global and _run_global
+
+ # Preprocess
+ with torch.no_grad():
+ if not (osp.exists(osp.join(output_pth, 'tracking_results.pth')) and
+ osp.exists(osp.join(output_pth, 'slam_results.pth'))):
+
+ detector = DetectionModel(cfg.DEVICE.lower())
+ extractor = FeatureExtractor(cfg.DEVICE.lower(), cfg.FLIP_EVAL)
+
+ if run_global: slam = SLAMModel(video, output_pth, width, height, calib)
+ else: slam = None
+
+ bar = Bar('Preprocess: 2D detection and SLAM', fill='#', max=length)
+ while (cap.isOpened()):
+ flag, img = cap.read()
+ if not flag: break
+
+ # 2D detection and tracking
+ detector.track(img, fps, length)
+
+ # SLAM
+ if slam is not None:
+ slam.track()
+
+ bar.next()
+
+ tracking_results = detector.process(fps)
+
+ if slam is not None:
+ slam_results = slam.process()
+ else:
+ slam_results = np.zeros((length, 7))
+ slam_results[:, 3] = 1.0 # Unit quaternion
+
+ # Extract image features
+ # TODO: Merge this into the previous while loop with an online bbox smoothing.
+ tracking_results = extractor.run(video, tracking_results)
+            logger.info('Completed data preprocessing!')
+
+ # Save the processed data
+ joblib.dump(tracking_results, osp.join(output_pth, 'tracking_results.pth'))
+ joblib.dump(slam_results, osp.join(output_pth, 'slam_results.pth'))
+            logger.info(f'Saved processed data at {output_pth}')
+
+ # If the processed data already exists, load the processed data
+ else:
+ tracking_results = joblib.load(osp.join(output_pth, 'tracking_results.pth'))
+ slam_results = joblib.load(osp.join(output_pth, 'slam_results.pth'))
+            logger.info(f'Processed data already exists at {output_pth}. Loading it.')
+
+ # Build dataset
+ dataset = CustomDataset(cfg, tracking_results, slam_results, width, height, fps)
+
+ # run WHAM
+ results = defaultdict(dict)
+
+ n_subjs = len(dataset)
+ for subj in range(n_subjs):
+
+ with torch.no_grad():
+ if cfg.FLIP_EVAL:
+ # Forward pass with flipped input
+ flipped_batch = dataset.load_data(subj, True)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = flipped_batch
+ flipped_pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel, return_y_up=True, **kwargs)
+
+ # Forward pass with normal input
+ batch = dataset.load_data(subj)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = batch
+ pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel, return_y_up=True, **kwargs)
+
+ # Merge two predictions
+ flipped_pose, flipped_shape = flipped_pred['pose'].squeeze(0), flipped_pred['betas'].squeeze(0)
+ pose, shape = pred['pose'].squeeze(0), pred['betas'].squeeze(0)
+ flipped_pose, pose = flipped_pose.reshape(-1, 24, 6), pose.reshape(-1, 24, 6)
+ avg_pose, avg_shape = avg_preds(pose, shape, flipped_pose, flipped_shape)
+ avg_pose = avg_pose.reshape(-1, 144)
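+                # Reorder the flipped prediction's foot-contact channels ([2, 3, 0, 1]) so left/right match the unflipped prediction before averaging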
+ avg_contact = (flipped_pred['contact'][..., [2, 3, 0, 1]] + pred['contact']) / 2
+
+ # Refine trajectory with merged prediction
+ network.pred_pose = avg_pose.view_as(network.pred_pose)
+ network.pred_shape = avg_shape.view_as(network.pred_shape)
+ network.pred_contact = avg_contact.view_as(network.pred_contact)
+ output = network.forward_smpl(**kwargs)
+ pred = network.refine_trajectory(output, cam_angvel, return_y_up=True)
+
+ else:
+ # data
+ batch = dataset.load_data(subj)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = batch
+
+ # inference
+ pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel, return_y_up=True, **kwargs)
+
+ if args.run_smplify:
+ smplify = TemporalSMPLify(smpl, img_w=width, img_h=height, device=cfg.DEVICE)
+ input_keypoints = dataset.tracking_results[_id]['keypoints']
+ pred = smplify.fit(pred, input_keypoints, **kwargs)
+
+ with torch.no_grad():
+ network.pred_pose = pred['pose']
+ network.pred_shape = pred['betas']
+ network.pred_cam = pred['cam']
+ output = network.forward_smpl(**kwargs)
+ pred = network.refine_trajectory(output, cam_angvel, return_y_up=True)
+
+ # ========= Store results ========= #
+ pred_body_pose = matrix_to_axis_angle(pred['poses_body']).cpu().numpy().reshape(-1, 69)
+ pred_root = matrix_to_axis_angle(pred['poses_root_cam']).cpu().numpy().reshape(-1, 3)
+ pred_root_world = matrix_to_axis_angle(pred['poses_root_world']).cpu().numpy().reshape(-1, 3)
+ pred_pose = np.concatenate((pred_root, pred_body_pose), axis=-1)
+ pred_pose_world = np.concatenate((pred_root_world, pred_body_pose), axis=-1)
+ pred_trans = (pred['trans_cam'] - network.output.offset).cpu().numpy()
+
+ results[_id]['pose'] = pred_pose
+ results[_id]['trans'] = pred_trans
+ results[_id]['pose_world'] = pred_pose_world
+ results[_id]['trans_world'] = pred['trans_world'].cpu().squeeze(0).numpy()
+ results[_id]['betas'] = pred['betas'].cpu().squeeze(0).numpy()
+ results[_id]['verts'] = (pred['verts_cam'] + pred['trans_cam'].unsqueeze(1)).cpu().numpy()
+ results[_id]['frame_ids'] = frame_id
+
+ if save_pkl:
+ joblib.dump(results, osp.join(output_pth, "wham_output.pkl"))
+
+ # Visualize
+ if visualize:
+ from lib.vis.run_vis import run_vis_on_demo
+ with torch.no_grad():
+ run_vis_on_demo(cfg, video, results, output_pth, network.smpl, vis_global=run_global)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument('--video', type=str,
+ default='examples/demo_video.mp4',
+ help='input video path or youtube link')
+
+ parser.add_argument('--output_pth', type=str, default='output/demo',
+ help='output folder to write results')
+
+ parser.add_argument('--calib', type=str, default=None,
+ help='Camera calibration file path')
+
+ parser.add_argument('--estimate_local_only', action='store_true',
+ help='Only estimate motion in camera coordinate if True')
+
+ parser.add_argument('--visualize', action='store_true',
+ help='Visualize the output mesh if True')
+
+ parser.add_argument('--save_pkl', action='store_true',
+ help='Save output as pkl file')
+
+ parser.add_argument('--run_smplify', action='store_true',
+ help='Run Temporal SMPLify for post processing')
+
+ args = parser.parse_args()
+
+ cfg = get_cfg_defaults()
+ cfg.merge_from_file('configs/yamls/demo.yaml')
+
+ logger.info(f'GPU name -> {torch.cuda.get_device_name()}')
+ logger.info(f'GPU feat -> {torch.cuda.get_device_properties("cuda")}')
+
+ # ========= Load WHAM ========= #
+ smpl_batch_size = cfg.TRAIN.BATCH_SIZE * cfg.DATASET.SEQLEN
+ smpl = build_body_model(cfg.DEVICE, smpl_batch_size)
+ network = build_network(cfg, smpl)
+ network.eval()
+
+ # Output folder
+ sequence = '.'.join(args.video.split('/')[-1].split('.')[:-1])
+ output_pth = osp.join(args.output_pth, sequence)
+ os.makedirs(output_pth, exist_ok=True)
+
+ run(cfg,
+ args.video,
+ output_pth,
+ network,
+ args.calib,
+ run_global=not args.estimate_local_only,
+ save_pkl=args.save_pkl,
+ visualize=args.visualize)
+
+ print()
+ logger.info('Done !')
\ No newline at end of file
diff --git a/docs/API.md b/docs/API.md
new file mode 100644
index 0000000000000000000000000000000000000000..511ac3a0374218d567a97a7e36562d2140d6d074
--- /dev/null
+++ b/docs/API.md
@@ -0,0 +1,18 @@
+## Python API
+
+To use the Python API of WHAM, please complete the basic installation first ([Installation](INSTALL.md) or [Docker](DOCKER.md)).
+
+If you use the Docker environment, run:
+
+```bash
+cd /path/to/WHAM
+docker run -it -v .:/code/ --rm yusun9/wham-vitpose-dpvo-cuda11.3-python3.9 python
+```
+
+Then you can run WHAM from Python code like this:
+```python
+from wham_api import WHAM_API
+wham_model = WHAM_API()
+input_video_path = 'examples/IMG_9732.mov'
+results, tracking_results, slam_results = wham_model(input_video_path)
+```
\ No newline at end of file
diff --git a/docs/DATASET.md b/docs/DATASET.md
new file mode 100644
index 0000000000000000000000000000000000000000..b47a4def1b56f8c39f56052b556edd6602ce8ab0
--- /dev/null
+++ b/docs/DATASET.md
@@ -0,0 +1,42 @@
+# Dataset
+
+## Training Data
+We use the [AMASS](https://amass.is.tue.mpg.de/), [InstaVariety](https://github.com/akanazawa/human_dynamics/blob/master/doc/insta_variety.md), [MPI-INF-3DHP](https://vcai.mpi-inf.mpg.de/3dhp-dataset/), [Human3.6M](http://vision.imar.ro/human3.6m/description.php), and [3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW/) datasets for training. Please register on their websites to download and process the data. You can download the parsed ViT versions of the InstaVariety, MPI-INF-3DHP, Human3.6M, and 3DPW data from [Google Drive](https://drive.google.com/drive/folders/13T2ghVvrw_fEk3X-8L0e6DVSYx_Og8o3?usp=sharing) and save them under the `dataset/parsed_data` folder.
+
+### Process AMASS dataset
+After downloading AMASS dataset, you can process it by running:
+```bash
+python -m lib.data_utils.amass_utils
+```
+The processed data will be stored at `dataset/parsed_data/amass.pth`.
+
+### Process 3DPW, MPII3D, Human3.6M, and InstaVariety datasets
+First, visit [TCMR](https://github.com/hongsukchoi/TCMR_RELEASE) and download the preprocessed data to `dataset/parsed_data/TCMR_preproc/`.
+
+Next, run 2D keypoint detection using [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) and store the results under `dataset/detection_results/` (one sub-folder per dataset and per sequence). You may need to download all images to prepare the detection results.
+
+For the Human3.6M, MPII3D, and InstaVariety datasets, you also need to download the [NeuralAnnot](https://github.com/mks0601/NeuralAnnot_RELEASE) pseudo-ground-truth SMPL labels. As mentioned in our paper, we do not supervise WHAM on these labels; we only use them for the neural initialization step.
+
+Finally, run the following commands to preprocess all training data.
+```bash
+python -m lib.data_utils.threedpw_train_utils # 3DPW dataset
+# [Coming] python -m lib.data_utils.human36m_train_utils # Human3.6M dataset
+# [Coming] python -m lib.data_utils.mpii3d_train_utils # MPI-INF-3DHP dataset
+# [Coming] python -m lib.data_utils.insta_train_utils # InstaVariety dataset
+```
+
+### Process BEDLAM dataset
+Will be updated.
+
+## Evaluation Data
+We use [3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW/), [RICH](https://rich.is.tue.mpg.de/), and [EMDB](https://eth-ait.github.io/emdb/) for the evaluation. We provide the parsed data for the evaluation. Please download the data from [Google Drive](https://drive.google.com/drive/folders/13T2ghVvrw_fEk3X-8L0e6DVSYx_Og8o3?usp=sharing) and place them at `dataset/parsed_data/`.
+
+To process the data yourself, please
+1) Download the parsed 3DPW data from [TCMR](https://github.com/hongsukchoi/TCMR_RELEASE) and store it at `dataset/parsed_data/TCMR_preproc/`.
+2) Run [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) on all test data and store the results under `dataset/detection_results/`.
+3) Run the following commands.
+```bash
+python -m lib.data_utils.threedpw_eval_utils --split <"val" or "test"> # 3DPW dataset
+python -m lib.data_utils.emdb_eval_utils --split <"1" or "2"> # EMDB dataset
+python -m lib.data_utils.rich_eval_utils # RICH dataset
+```
\ No newline at end of file
diff --git a/docs/DOCKER.md b/docs/DOCKER.md
new file mode 100644
index 0000000000000000000000000000000000000000..23f45a20ed665442674f3398ba18b363bb5f7508
--- /dev/null
+++ b/docs/DOCKER.md
@@ -0,0 +1,23 @@
+## Installation
+
+### Prerequisites
+1. Please make sure you have installed [Docker](https://www.docker.com/) and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) before proceeding.
+
+2. Please prepare the essential data for inference:
+To download the SMPL body models (Neutral, Female, and Male), you need to register for [SMPL](https://smpl.is.tue.mpg.de/) and [SMPLify](https://smplify.is.tue.mpg.de/). The username and password for both websites will be used when fetching the demo data.
+Next, run the following script to fetch the demo data. This script will download all required dependencies, including trained models and demo videos.
+```bash
+bash fetch_demo_data.sh
+```
+
+### Usage
+1. Pull the Docker image from Docker Hub:
+```bash
+docker pull yusun9/wham-vitpose-dpvo-cuda11.3-python3.9:latest
+```
+
+2. Run the code inside the Docker environment:
+```bash
+cd /path/to/WHAM
+docker run -v .:/code/ --rm yusun9/wham-vitpose-dpvo-cuda11.3-python3.9 python demo.py --video examples/IMG_9732.mov
+```
\ No newline at end of file
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
new file mode 100644
index 0000000000000000000000000000000000000000..9f9038009109d1d375b88fd5533d22750a0c53a3
--- /dev/null
+++ b/docs/INSTALL.md
@@ -0,0 +1,38 @@
+# Installation
+
+WHAM has been implemented and tested on Ubuntu 20.04 and 22.04 with Python 3.9. We provide an [anaconda](https://www.anaconda.com/) environment to run WHAM, as below.
+
+```bash
+# Clone the repo
+git clone https://github.com/yohanshin/WHAM.git --recursive
+cd WHAM/
+
+# Create Conda environment
+conda create -n wham python=3.9
+conda activate wham
+
+# Install PyTorch libraries
+conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
+
+# Install PyTorch3D (optional) for visualization
+conda install -c fvcore -c iopath -c conda-forge fvcore iopath
+pip install pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu113_pyt1110/download.html
+
+# Install WHAM dependencies
+pip install -r requirements.txt
+
+# Install ViTPose
+pip install -v -e third-party/ViTPose
+
+# Install DPVO
+cd third-party/DPVO
+wget https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip
+unzip eigen-3.4.0.zip -d thirdparty && rm -rf eigen-3.4.0.zip
+conda install pytorch-scatter=2.0.9 -c rusty1s
+conda install cudatoolkit-dev=11.3.1 -c conda-forge
+
+# ONLY IF your GCC version is larger than 10
+conda install -c conda-forge gxx=9.5
+
+pip install .
+```
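+
+Optionally, you can verify that PyTorch was installed and can see your GPU before running WHAM:
+```bash
+python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
+```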
diff --git a/examples/IMG_9730.mov b/examples/IMG_9730.mov
new file mode 100644
index 0000000000000000000000000000000000000000..96e46dcccd85d1561f436db14563ae10103364b8
--- /dev/null
+++ b/examples/IMG_9730.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3739b87ba0c64d047df3d8f5479c530377788fdab4c2283925477894a1d252f9
+size 21526220
diff --git a/examples/IMG_9731.mov b/examples/IMG_9731.mov
new file mode 100644
index 0000000000000000000000000000000000000000..4d409b0a40c04ffeccf55ee215737ca7cb2c14a3
--- /dev/null
+++ b/examples/IMG_9731.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:116ad3f95743524283a234fd9e7a1152b28a04536ab5975f4e4e71c547d9e1a6
+size 22633328
diff --git a/examples/IMG_9732.mov b/examples/IMG_9732.mov
new file mode 100644
index 0000000000000000000000000000000000000000..7ba45a3d48213e3c52c46569131485c93c44429d
--- /dev/null
+++ b/examples/IMG_9732.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:168773c92e0112361dcd1da4154c915983490e58ff89102c1a65edb28d505813
+size 23960355
diff --git a/examples/drone_calib.txt b/examples/drone_calib.txt
new file mode 100644
index 0000000000000000000000000000000000000000..00052336e94e9f785a3ce455700a4bd888d213ce
--- /dev/null
+++ b/examples/drone_calib.txt
@@ -0,0 +1 @@
+1321.0 1321.0 960.0 540.0
\ No newline at end of file
diff --git a/examples/drone_video.mp4 b/examples/drone_video.mp4
new file mode 100644
index 0000000000000000000000000000000000000000..bdcf1548b1ccff376f1d7f358ddd3ff9184ed7af
--- /dev/null
+++ b/examples/drone_video.mp4
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0da55210a305c3c75caa732c46b7330bb3d4e39ebeb9bc3af1e2b100dd8990c1
+size 20601030
diff --git a/examples/test16.mov b/examples/test16.mov
new file mode 100644
index 0000000000000000000000000000000000000000..b00eafc06cd037c17fd61b660030fefabda6a38a
--- /dev/null
+++ b/examples/test16.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f068400bf962e732e5517af45397694f84fae0a6592085b9dd3781fdbacaa550
+size 1567779
diff --git a/examples/test17.mov b/examples/test17.mov
new file mode 100644
index 0000000000000000000000000000000000000000..de4b0669d6867fb70d5acbfe24c0929eb96b2c79
--- /dev/null
+++ b/examples/test17.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ce06d8885332fd0b770273010dbd4da20a0867a386dc55925f85198651651253
+size 2299497
diff --git a/examples/test18.mov b/examples/test18.mov
new file mode 100644
index 0000000000000000000000000000000000000000..97d0466333a92cb4877c80e0d22eeedde01732ee
--- /dev/null
+++ b/examples/test18.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:66fc6eb20e1c8525070c8004bed621e0acc2712accace1dbf1eb72fced62bb14
+size 2033756
diff --git a/examples/test19.mov b/examples/test19.mov
new file mode 100644
index 0000000000000000000000000000000000000000..5414b6061cf39260b47ebd8bcfb124980be6fe3e
--- /dev/null
+++ b/examples/test19.mov
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:878219571dbf0e8ff56f4ba4bf325f90f46a730b57a35a2df91f4f509af616d8
+size 1940593
diff --git a/fetch_demo_data.sh b/fetch_demo_data.sh
new file mode 100644
index 0000000000000000000000000000000000000000..1f5fcfd2a4558a3df705854404c734cbb5a82f66
--- /dev/null
+++ b/fetch_demo_data.sh
@@ -0,0 +1,50 @@
+#!/bin/bash
+urle () { [[ "${1}" ]] || return 1; local LANG=C i x; for (( i = 0; i < ${#1}; i++ )); do x="${1:i:1}"; [[ "${x}" == [a-zA-Z0-9.~-] ]] && echo -n "${x}" || printf '%%%02X' "'${x}"; done; echo; }
+
+# SMPL Neutral model
+echo -e "\nYou need to register at https://smplify.is.tue.mpg.de"
+read -p "Username (SMPLify):" username
+read -p "Password (SMPLify):" password
+username=$(urle $username)
+password=$(urle $password)
+
+mkdir -p dataset/body_models/smpl
+wget --post-data "username=$username&password=$password" 'https://download.is.tue.mpg.de/download.php?domain=smplify&resume=1&sfile=mpips_smplify_public_v2.zip' -O './dataset/body_models/smplify.zip' --no-check-certificate --continue
+unzip dataset/body_models/smplify.zip -d dataset/body_models/smplify
+mv dataset/body_models/smplify/smplify_public/code/models/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl dataset/body_models/smpl/SMPL_NEUTRAL.pkl
+rm -rf dataset/body_models/smplify
+rm -rf dataset/body_models/smplify.zip
+
+# SMPL Male and Female model
+echo -e "\nYou need to register at https://smpl.is.tue.mpg.de"
+read -p "Username (SMPL):" username
+read -p "Password (SMPL):" password
+username=$(urle $username)
+password=$(urle $password)
+
+wget --post-data "username=$username&password=$password" 'https://download.is.tue.mpg.de/download.php?domain=smpl&sfile=SMPL_python_v.1.0.0.zip' -O './dataset/body_models/smpl.zip' --no-check-certificate --continue
+unzip dataset/body_models/smpl.zip -d dataset/body_models/smpl
+mv dataset/body_models/smpl/smpl/models/basicModel_f_lbs_10_207_0_v1.0.0.pkl dataset/body_models/smpl/SMPL_FEMALE.pkl
+mv dataset/body_models/smpl/smpl/models/basicmodel_m_lbs_10_207_0_v1.0.0.pkl dataset/body_models/smpl/SMPL_MALE.pkl
+rm -rf dataset/body_models/smpl/smpl
+rm -rf dataset/body_models/smpl.zip
+
+# Auxiliary SMPL-related data
+wget "https://drive.google.com/uc?id=1pbmzRbWGgae6noDIyQOnohzaVnX_csUZ&export=download&confirm=t" -O 'dataset/body_models.tar.gz'
+tar -xvf dataset/body_models.tar.gz -C dataset/
+rm -rf dataset/body_models.tar.gz
+
+# Checkpoints
+mkdir -p checkpoints
+gdown "https://drive.google.com/uc?id=1i7kt9RlCCCNEW2aYaDWVr-G778JkLNcB&export=download&confirm=t" -O 'checkpoints/wham_vit_w_3dpw.pth.tar'
+gdown "https://drive.google.com/uc?id=19qkI-a6xuwob9_RFNSPWf1yWErwVVlks&export=download&confirm=t" -O 'checkpoints/wham_vit_bedlam_w_3dpw.pth.tar'
+gdown "https://drive.google.com/uc?id=1J6l8teyZrL0zFzHhzkC7efRhU0ZJ5G9Y&export=download&confirm=t" -O 'checkpoints/hmr2a.ckpt'
+gdown "https://drive.google.com/uc?id=1kXTV4EYb-BI3H7J-bkR3Bc4gT9zfnHGT&export=download&confirm=t" -O 'checkpoints/dpvo.pth'
+gdown "https://drive.google.com/uc?id=1zJ0KP23tXD42D47cw1Gs7zE2BA_V_ERo&export=download&confirm=t" -O 'checkpoints/yolov8x.pt'
+gdown "https://drive.google.com/uc?id=1xyF7F3I7lWtdq82xmEPVQ5zl4HaasBso&export=download&confirm=t" -O 'checkpoints/vitpose-h-multi-coco.pth'
+
+# Demo videos
+gdown "https://drive.google.com/uc?id=1KjfODCcOUm_xIMLLR54IcjJtf816Dkc7&export=download&confirm=t" -O 'examples.tar.gz'
+tar -xvf examples.tar.gz
+rm -rf examples.tar.gz
+
diff --git a/lib/core/loss.py b/lib/core/loss.py
new file mode 100644
index 0000000000000000000000000000000000000000..1fc182812479a90cfab611a140e57a787dab3ce3
--- /dev/null
+++ b/lib/core/loss.py
@@ -0,0 +1,438 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import torch.nn as nn
+from torch.nn import functional as F
+
+from configs import constants as _C
+from lib.utils import transforms
+from lib.utils.kp_utils import root_centering
+
+class WHAMLoss(nn.Module):
+ def __init__(
+ self,
+ cfg=None,
+ device=None,
+ ):
+ super(WHAMLoss, self).__init__()
+
+ self.cfg = cfg
+ self.n_joints = _C.KEYPOINTS.NUM_JOINTS
+ self.criterion = nn.MSELoss()
+ self.criterion_noreduce = nn.MSELoss(reduction='none')
+
+ self.pose_loss_weight = cfg.LOSS.POSE_LOSS_WEIGHT
+ self.shape_loss_weight = cfg.LOSS.SHAPE_LOSS_WEIGHT
+ self.keypoint_2d_loss_weight = cfg.LOSS.JOINT2D_LOSS_WEIGHT
+ self.keypoint_3d_loss_weight = cfg.LOSS.JOINT3D_LOSS_WEIGHT
+ self.cascaded_loss_weight = cfg.LOSS.CASCADED_LOSS_WEIGHT
+ self.vertices_loss_weight = cfg.LOSS.VERTS3D_LOSS_WEIGHT
+ self.contact_loss_weight = cfg.LOSS.CONTACT_LOSS_WEIGHT
+ self.root_vel_loss_weight = cfg.LOSS.ROOT_VEL_LOSS_WEIGHT
+ self.root_pose_loss_weight = cfg.LOSS.ROOT_POSE_LOSS_WEIGHT
+ self.sliding_loss_weight = cfg.LOSS.SLIDING_LOSS_WEIGHT
+ self.camera_loss_weight = cfg.LOSS.CAMERA_LOSS_WEIGHT
+ self.loss_weight = cfg.LOSS.LOSS_WEIGHT
+
+ kp_weights = [
+ 0.5, 0.5, 0.5, 0.5, 0.5, # Face
+ 1.5, 1.5, 4, 4, 4, 4, # Arms
+ 1.5, 1.5, 4, 4, 4, 4, # Legs
+ 4, 4, 1.5, 1.5, 4, 4, # Legs
+ 4, 4, 1.5, 1.5, 4, 4, # Arms
+ 0.5, 0.5 # Head
+ ]
+
+ theta_weights = [
+ 0.1, 1.0, 1.0, 1.0, 1.0, # pelvis, lhip, rhip, spine1, lknee
+            1.0, 1.0, 1.0, 1.0, 1.0,    # rknee, spine2, lankle, rankle, spine3
+            0.1, 0.1,           # Foot
+            1.0, 1.0, 1.0, 1.0, 1.0, 1.0,       # neck, lcollar, rcollar, head, lshoulder, rshoulder
+ 1.0, 1.0, 1.0, 1.0, # lelbow, relbow, lwrist, rwrist
+ 0.1, 0.1, # Hand
+ ]
+ self.theta_weights = torch.tensor([[theta_weights]]).float().to(device)
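+        # Normalize the per-joint pose weights so that they average to 1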
+ self.theta_weights /= self.theta_weights.mean()
+ self.kp_weights = torch.tensor([kp_weights]).float().to(device)
+
+ self.epoch = -1
+ self.step()
+
+ def step(self):
+ self.epoch += 1
+ self.skip_camera_loss = self.epoch < self.cfg.LOSS.CAMERA_LOSS_SKIP_EPOCH
+
+ def forward(self, pred, gt):
+
+ loss = 0.0
+ b, f = gt['kp3d'].shape[:2]
+
+ # <======= Predictions and Groundtruths
+ pred_betas = pred['betas']
+ pred_pose = pred['pose'].reshape(b, f, -1, 6)
+ pred_kp3d_nn = pred['kp3d_nn']
+ pred_kp3d_smpl = root_centering(pred['kp3d'].reshape(b, f, -1, 3))
+ pred_full_kp2d = pred['full_kp2d']
+ pred_weak_kp2d = pred['weak_kp2d']
+ pred_contact = pred['contact']
+ pred_vel_root = pred['vel_root']
+ pred_pose_root = pred['poses_root_r6d'][:, 1:]
+ pred_vel_root_ref = pred['vel_root_refined']
+ pred_pose_root_ref = pred['poses_root_r6d_refined'][:, 1:]
+ pred_cam_r = transforms.matrix_to_rotation_6d(pred['R'])
+
+ gt_betas = gt['betas']
+ gt_pose = gt['pose']
+ gt_kp3d = root_centering(gt['kp3d'])
+ gt_full_kp2d = gt['full_kp2d']
+ gt_weak_kp2d = gt['weak_kp2d']
+ gt_contact = gt['contact']
+ gt_vel_root = gt['vel_root']
+ gt_pose_root = gt['pose_root'][:, 1:]
+ gt_cam_angvel = gt['cam_angvel']
+ gt_cam_r = transforms.matrix_to_rotation_6d(gt['R'][:, 1:])
+ bbox = gt['bbox']
+ # =======>
+
+ loss_keypoints_full = full_projected_keypoint_loss(
+ pred_full_kp2d,
+ gt_full_kp2d,
+ bbox,
+ self.kp_weights,
+ criterion=self.criterion_noreduce,
+ )
+
+ loss_keypoints_weak = weak_projected_keypoint_loss(
+ pred_weak_kp2d,
+ gt_weak_kp2d,
+ self.kp_weights,
+ criterion=self.criterion_noreduce
+ )
+
+ # Compute 3D keypoint loss
+ loss_keypoints_3d_nn = keypoint_3d_loss(
+ pred_kp3d_nn,
+ gt_kp3d[:, :, :self.n_joints],
+ self.kp_weights[:, :self.n_joints],
+ criterion=self.criterion_noreduce,
+ )
+
+ loss_keypoints_3d_smpl = keypoint_3d_loss(
+ pred_kp3d_smpl,
+ gt_kp3d,
+ self.kp_weights,
+ criterion=self.criterion_noreduce,
+ )
+
+ loss_cascaded = keypoint_3d_loss(
+ pred_kp3d_nn,
+ torch.cat((pred_kp3d_smpl[:, :, :self.n_joints], gt_kp3d[:, :, :self.n_joints, -1:]), dim=-1),
+ self.kp_weights[:, :self.n_joints] * 0.5,
+ criterion=self.criterion_noreduce,
+ )
+
+ loss_vertices = vertices_loss(
+ pred['verts_cam'],
+ gt['verts'],
+ gt['has_verts'],
+ criterion=self.criterion_noreduce,
+ )
+
+ # Compute loss on SMPL parameters
+ smpl_mask = gt['has_smpl']
+ loss_regr_pose, loss_regr_betas = smpl_losses(
+ pred_pose,
+ pred_betas,
+ gt_pose,
+ gt_betas,
+ self.theta_weights,
+ smpl_mask,
+ criterion=self.criterion_noreduce
+ )
+
+ # Compute loss on foot contact
+ loss_contact = contact_loss(
+ pred_contact,
+ gt_contact,
+ self.criterion_noreduce
+ )
+
+ # Compute loss on root velocity and angular velocity
+ loss_vel_root, loss_pose_root = root_loss(
+ pred_vel_root,
+ pred_pose_root,
+ gt_vel_root,
+ gt_pose_root,
+ gt_contact,
+ self.criterion_noreduce
+ )
+
+ # Root loss after trajectory refinement
+ loss_vel_root_ref, loss_pose_root_ref = root_loss(
+ pred_vel_root_ref,
+ pred_pose_root_ref,
+ gt_vel_root,
+ gt_pose_root,
+ gt_contact,
+ self.criterion_noreduce
+ )
+
+ # Camera prediction loss
+ loss_camera = camera_loss(
+ pred_cam_r,
+ gt_cam_r,
+ gt_cam_angvel[:, 1:],
+ gt['has_traj'],
+ self.criterion_noreduce,
+ self.skip_camera_loss
+ )
+
+ # Foot sliding loss
+ loss_sliding = sliding_loss(
+ pred['feet'],
+ gt_contact,
+ )
+
+ # Foot sliding loss
+ loss_sliding_ref = sliding_loss(
+ pred['feet_refined'],
+ gt_contact,
+ )
+
+ loss_keypoints = loss_keypoints_full + loss_keypoints_weak
+ loss_keypoints *= self.keypoint_2d_loss_weight
+ loss_keypoints_3d_smpl *= self.keypoint_3d_loss_weight
+ loss_keypoints_3d_nn *= self.keypoint_3d_loss_weight
+ loss_cascaded *= self.cascaded_loss_weight
+ loss_vertices *= self.vertices_loss_weight
+ loss_contact *= self.contact_loss_weight
+ loss_root = loss_vel_root * self.root_vel_loss_weight + loss_pose_root * self.root_pose_loss_weight
+ loss_root_ref = loss_vel_root_ref * self.root_vel_loss_weight + loss_pose_root_ref * self.root_pose_loss_weight
+
+ loss_regr_pose *= self.pose_loss_weight
+ loss_regr_betas *= self.shape_loss_weight
+
+ loss_sliding *= self.sliding_loss_weight
+ loss_camera *= self.camera_loss_weight
+ loss_sliding_ref *= self.sliding_loss_weight
+
+ loss_dict = {
+ 'pose': loss_regr_pose * self.loss_weight,
+ 'betas': loss_regr_betas * self.loss_weight,
+ '2d': loss_keypoints * self.loss_weight,
+ '3d': loss_keypoints_3d_smpl * self.loss_weight,
+ '3d_nn': loss_keypoints_3d_nn * self.loss_weight,
+ 'casc': loss_cascaded * self.loss_weight,
+ 'v3d': loss_vertices * self.loss_weight,
+ 'contact': loss_contact * self.loss_weight,
+ 'root': loss_root * self.loss_weight,
+ 'root_ref': loss_root_ref * self.loss_weight,
+ 'sliding': loss_sliding * self.loss_weight,
+ 'camera': loss_camera * self.loss_weight,
+ 'sliding_ref': loss_sliding_ref * self.loss_weight,
+ }
+
+ loss = sum(loss for loss in loss_dict.values())
+
+ return loss, loss_dict
+
+
+def root_loss(
+ pred_vel_root,
+ pred_pose_root,
+ gt_vel_root,
+ gt_pose_root,
+ stationary,
+ criterion
+):
+
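+ # Samples without world-coordinate trajectory labels carry zero-filled gt_pose_root /
+ # gt_vel_root, and frames without contact annotation are marked with -1; mask them out.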
+ mask_r = (gt_pose_root != 0.0).all(dim=-1).all(dim=-1)
+ mask_v = (gt_vel_root != 0.0).all(dim=-1).all(dim=-1)
+ mask_s = (stationary != -1).any(dim=1).any(dim=1)
+ mask_v = mask_v * mask_s
+
+ if mask_r.any():
+ loss_r = criterion(pred_pose_root, gt_pose_root)[mask_r].mean()
+ else:
+ loss_r = torch.FloatTensor(1).fill_(0.).to(gt_pose_root.device)[0]
+
+ if mask_v.any():
+ loss_v = 0
+ T = gt_vel_root.shape[1]  # number of frames; root velocity is (B, F, 3)
+ ws_list = [1, 3, 9, 27]
+ # Multi-scale cumulative velocity error over non-overlapping windows of length ws
+ for ws in ws_list:
+ tmp_v = 0
+ for m in range(T//ws):
+ cumulative_v = torch.sum(pred_vel_root[:, m*ws:(m+1)*ws] - gt_vel_root[:, m*ws:(m+1)*ws], dim=1)
+ tmp_v += torch.norm(cumulative_v, dim=-1)
+ loss_v += tmp_v
+ loss_v = loss_v[mask_v].mean()
+ else:
+ loss_v = torch.FloatTensor(1).fill_(0.).to(gt_vel_root.device)[0]
+
+ return loss_v, loss_r
+
+
+def contact_loss(
+ pred_stationary,
+ gt_stationary,
+ criterion,
+):
+
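+ # A contact label of -1 marks frames without ground-truth contact annotation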
+ mask = gt_stationary != -1
+ if mask.any():
+ loss = criterion(pred_stationary, gt_stationary)[mask].mean()
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_stationary.device)[0]
+ return loss
+
+
+
+def full_projected_keypoint_loss(
+ pred_keypoints_2d,
+ gt_keypoints_2d,
+ bbox,
+ weight,
+ criterion,
+):
+
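+ # The datasets store the bbox scale divided by 200; multiplying by 200 recovers the
+ # person's size in pixels so the 2D error is normalized by person scale.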
+ scale = bbox[..., 2:] * 200.
+ conf = gt_keypoints_2d[..., -1]
+
+ if (conf > 0).any():
+ loss = torch.mean(
+ weight * (conf * torch.norm(pred_keypoints_2d - gt_keypoints_2d[..., :2], dim=-1)
+ ) / scale, dim=1).mean() * conf.mean()
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_keypoints_2d.device)[0]
+ return loss
+
+
+def weak_projected_keypoint_loss(
+ pred_keypoints_2d,
+ gt_keypoints_2d,
+ weight,
+ criterion,
+):
+
+ conf = gt_keypoints_2d[..., -1]
+ if (conf > 0).any():
+ loss = torch.mean(
+ weight * (conf * torch.norm(pred_keypoints_2d - gt_keypoints_2d[..., :2], dim=-1)
+ ), dim=1).mean() * conf.mean() * 5
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_keypoints_2d.device)[0]
+ return loss
+
+
+def keypoint_3d_loss(
+ pred_keypoints_3d,
+ gt_keypoints_3d,
+ weight,
+ criterion,
+):
+
+ conf = gt_keypoints_3d[..., -1]
+ if (conf > 0).any():
+ if weight.shape[-2] > 17:
+ pred_keypoints_3d[..., -14:] = pred_keypoints_3d[..., -14:] - pred_keypoints_3d[..., -14:].mean(dim=-2, keepdims=True)
+ gt_keypoints_3d[..., -14:] = gt_keypoints_3d[..., -14:] - gt_keypoints_3d[..., -14:].mean(dim=-2, keepdims=True)
+
+ loss = torch.mean(
+ weight * (conf * torch.norm(pred_keypoints_3d - gt_keypoints_3d[..., :3], dim=-1)
+ ), dim=1).mean() * conf.mean()
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_keypoints_3d.device)[0]
+ return loss
+
+
+def vertices_loss(
+ pred_verts,
+ gt_verts,
+ mask,
+ criterion,
+):
+
+ if mask.sum() > 0:
+ # Align
+ pred_verts = pred_verts.view_as(gt_verts)
+ pred_verts = pred_verts - pred_verts.mean(-2, True)
+ gt_verts = gt_verts - gt_verts.mean(-2, True)
+
+ # loss = criterion(pred_verts, gt_verts).mean() * mask.float().mean()
+ # loss = torch.mean(
+ # (torch.norm(pred_verts - gt_verts, dim=-1)[mask]
+ # ), dim=1).mean() * mask.float().mean()
+ loss = torch.mean(
+ (torch.norm(pred_verts - gt_verts, p=1, dim=-1)[mask]
+ ), dim=1).mean() * mask.float().mean()
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_verts.device)[0]
+ return loss
+
+
+def smpl_losses(
+ pred_pose,
+ pred_betas,
+ gt_pose,
+ gt_betas,
+ weight,
+ mask,
+ criterion,
+):
+
+ if mask.any().item():
+ loss_regr_pose = torch.mean(
+ weight * torch.square(pred_pose - gt_pose)[mask].mean(-1)
+ ) * mask.float().mean()
+ loss_regr_betas = F.mse_loss(pred_betas, gt_betas, reduction='none')[mask].mean() * mask.float().mean()
+ else:
+ loss_regr_pose = torch.FloatTensor(1).fill_(0.).to(gt_pose.device)[0]
+ loss_regr_betas = torch.FloatTensor(1).fill_(0.).to(gt_pose.device)[0]
+
+ return loss_regr_pose, loss_regr_betas
+
+
+def camera_loss(
+ pred_cam_r,
+ gt_cam_r,
+ cam_angvel,
+ mask,
+ criterion,
+ skip
+):
+ # mask = (gt_cam_r != 0.0).all(dim=-1).all(dim=-1)
+
+ if mask.any() and not skip:
+ # Camera pose loss in 6D representation
+ loss_r = criterion(pred_cam_r, gt_cam_r)[mask].mean()
+
+ # Reconstruct camera angular velocity and compute reconstruction loss
+ pred_R = transforms.rotation_6d_to_matrix(pred_cam_r)
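+ # The 6D representation of the identity rotation is [1, 0, 0, 0, 1, 0]; subtracting it
+ # and scaling by 30 (fps) matches how the input cam_angvel is encoded by the datasets.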
+ cam_angvel_from_R = transforms.matrix_to_rotation_6d(pred_R[:, :-1] @ pred_R[:, 1:].transpose(-1, -2))
+ cam_angvel_from_R = (cam_angvel_from_R - torch.tensor([[[1, 0, 0, 0, 1, 0]]]).to(cam_angvel)) * 30
+ loss_a = criterion(cam_angvel, cam_angvel_from_R)[mask].mean()
+
+ loss = loss_r + loss_a
+ else:
+ loss = torch.FloatTensor(1).fill_(0.).to(gt_cam_r.device)[0]
+
+ return loss
+
+
+def sliding_loss(
+ foot_position,
+ contact_prob,
+):
+ """ Compute foot skate loss when foot is assumed to be on contact with ground
+
+ foot_position: 3D foot (heel and toe) position, torch.Tensor (B, F, 4, 3)
+ contact_prob: contact probability of foot (heel and toe), torch.Tensor (B, F, 4)
+ """
+
+ contact_mask = (contact_prob > 0.5).detach().float()
+ foot_velocity = foot_position[:, 1:] - foot_position[:, :-1]
+ loss = (torch.norm(foot_velocity, dim=-1) * contact_mask[:, 1:]).mean()
+ return loss
diff --git a/lib/core/trainer.py b/lib/core/trainer.py
new file mode 100644
index 0000000000000000000000000000000000000000..90d10d4a4bfec7e47373b2520dd78c4eef9ee2ae
--- /dev/null
+++ b/lib/core/trainer.py
@@ -0,0 +1,341 @@
+# -*- coding: utf-8 -*-
+
+# Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
+# holder of all proprietary rights on this computer program.
+# You can only use this computer program if you have closed
+# a license agreement with MPG or you get the right to use the computer
+# program from someone who is authorized to grant you that right.
+# Any use of the computer program without a valid license is prohibited and
+# liable to prosecution.
+#
+# Copyright©2019 Max-Planck-Gesellschaft zur Förderung
+# der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
+# for Intelligent Systems. All rights reserved.
+#
+# Contact: ps-license@tuebingen.mpg.de
+
+import time
+import torch
+import shutil
+import logging
+import numpy as np
+import os.path as osp
+from progress.bar import Bar
+
+from configs import constants as _C
+from lib.utils import transforms
+from lib.utils.utils import AverageMeter, prepare_batch
+from lib.eval.eval_utils import (
+ compute_accel,
+ compute_error_accel,
+ batch_align_by_pelvis,
+ batch_compute_similarity_transform_torch,
+)
+from lib.models import build_body_model
+
+logger = logging.getLogger(__name__)
+
+class Trainer():
+ def __init__(self,
+ data_loaders,
+ network,
+ optimizer,
+ criterion=None,
+ train_stage='syn',
+ start_epoch=0,
+ checkpoint=None,
+ end_epoch=999,
+ lr_scheduler=None,
+ device=None,
+ writer=None,
+ debug=False,
+ resume=False,
+ logdir='output',
+ performance_type='min',
+ summary_iter=1,
+ ):
+
+ self.train_loader, self.valid_loader = data_loaders
+
+ # Model and optimizer
+ self.network = network
+ self.optimizer = optimizer
+
+ # Training parameters
+ self.train_stage = train_stage
+ self.start_epoch = start_epoch
+ self.end_epoch = end_epoch
+ self.criterion = criterion
+ self.lr_scheduler = lr_scheduler
+ self.device = device
+ self.writer = writer
+ self.debug = debug
+ self.resume = resume
+ self.logdir = logdir
+ self.summary_iter = summary_iter
+
+ self.performance_type = performance_type
+ self.train_global_step = 0
+ self.valid_global_step = 0
+ self.epoch = 0
+ self.best_performance = float('inf') if performance_type == 'min' else -float('inf')
+ self.summary_loss_keys = ['pose']
+
+ self.evaluation_accumulators = dict.fromkeys(
+ ['pred_j3d', 'target_j3d', 'pve'])# 'pred_verts', 'target_verts'])
+
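+ # SMPL-vertex-to-H36M joint regressor, restricted to the 14 common joints used for evaluation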
+ self.J_regressor_eval = torch.from_numpy(
+ np.load(_C.BMODEL.JOINTS_REGRESSOR_H36M)
+ )[_C.KEYPOINTS.H36M_TO_J14, :].unsqueeze(0).float().to(device)
+
+ if self.writer is None:
+ from torch.utils.tensorboard import SummaryWriter
+ self.writer = SummaryWriter(log_dir=self.logdir)
+
+ if self.device is None:
+ self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ if checkpoint is not None:
+ self.load_pretrained(checkpoint)
+
+ def train(self, ):
+ # Single epoch training routine
+
+ losses = AverageMeter()
+ kp_2d_loss = AverageMeter()
+ kp_3d_loss = AverageMeter()
+
+ timer = {
+ 'data': 0,
+ 'forward': 0,
+ 'loss': 0,
+ 'backward': 0,
+ 'batch': 0,
+ }
+ self.network.train()
+ start = time.time()
+ summary_string = ''
+
+ bar = Bar(f'Epoch {self.epoch + 1}/{self.end_epoch}', fill='#', max=len(self.train_loader))
+ for i, batch in enumerate(self.train_loader):
+
+ # <======= Feedforward
+ x, inits, features, kwargs, gt = prepare_batch(batch, self.device, self.train_stage=='stage2')
+ timer['data'] = time.time() - start
+ start = time.time()
+ pred = self.network(x, inits, features, **kwargs)
+ timer['forward'] = time.time() - start
+ start = time.time()
+ # =======>
+
+ # <======= Backprop
+ loss, loss_dict = self.criterion(pred, gt)
+ timer['loss'] = time.time() - start
+ start = time.time()
+
+ # Backward pass with gradient clipping (max norm 1.0)
+ self.optimizer.zero_grad()
+ loss.backward()
+ torch.nn.utils.clip_grad_norm_(self.network.parameters(), 1.0)
+ self.optimizer.step()
+ # =======>
+
+ # <======= Log training info
+ total_loss = loss
+ losses.update(total_loss.item(), x.size(0))
+ kp_2d_loss.update(loss_dict['2d'].item(), x.size(0))
+ kp_3d_loss.update(loss_dict['3d'].item(), x.size(0))
+
+ timer['backward'] = time.time() - start
+ timer['batch'] = timer['data'] + timer['forward'] + timer['loss'] + timer['backward']
+ start = time.time()
+
+ summary_string = f'({i + 1}/{len(self.train_loader)}) | Total: {bar.elapsed_td} ' \
+ f'| loss: {losses.avg:.2f} | 2d: {kp_2d_loss.avg:.2f} ' \
+ f'| 3d: {kp_3d_loss.avg:.2f} '
+
+ for k, v in loss_dict.items():
+ if k in self.summary_loss_keys:
+ summary_string += f' | {k}: {v:.2f}'
+ if (i + 1) % self.summary_iter == 0:
+ self.writer.add_scalar('train_loss/'+k, v, global_step=self.train_global_step)
+
+ if (i + 1) % self.summary_iter == 0:
+ self.writer.add_scalar('train_loss/loss', total_loss.item(), global_step=self.train_global_step)
+
+ self.train_global_step += 1
+ bar.suffix = summary_string
+ bar.next(1)
+
+ if torch.isnan(total_loss):
+ exit('Nan value in loss, exiting!...')
+ # =======>
+
+ logger.info(summary_string)
+ bar.finish()
+
+ def validate(self, ):
+ self.network.eval()
+
+ start = time.time()
+ summary_string = ''
+ bar = Bar('Validation', fill='#', max=len(self.valid_loader))
+
+ if self.evaluation_accumulators is not None:
+ for k,v in self.evaluation_accumulators.items():
+ self.evaluation_accumulators[k] = []
+
+ with torch.no_grad():
+ for i, batch in enumerate(self.valid_loader):
+ x, inits, features, kwargs, gt = prepare_batch(batch, self.device, self.train_stage=='stage2')
+
+ # <======= Feedforward
+ pred = self.network(x, inits, features, **kwargs)
+
+ # The 3DPW dataset provides ground-truth SMPL vertices.
+ # NOTE: Following SPIN, we compute PVE against the ground truth from the gendered SMPL mesh.
+ smpl = build_body_model(self.device, batch_size=len(pred['verts_cam']), gender=batch['gender'][0])
+ gt_output = smpl.get_output(
+ body_pose=transforms.rotation_6d_to_matrix(gt['pose'][0, :, 1:]),
+ global_orient=transforms.rotation_6d_to_matrix(gt['pose'][0, :, :1]),
+ betas=gt['betas'][0],
+ pose2rot=False
+ )
+
+ pred_j3d = torch.matmul(self.J_regressor_eval, pred['verts_cam']).cpu()
+ target_j3d = torch.matmul(self.J_regressor_eval, gt_output.vertices).cpu()
+ pred_verts = pred['verts_cam'].cpu()
+ target_verts = gt_output.vertices.cpu()
+
+ pred_j3d, target_j3d, pred_verts, target_verts = batch_align_by_pelvis(
+ [pred_j3d, target_j3d, pred_verts, target_verts], [2, 3]
+ )
+
+ self.evaluation_accumulators['pred_j3d'].append(pred_j3d.numpy())
+ self.evaluation_accumulators['target_j3d'].append(target_j3d.numpy())
+ pve = np.sqrt(np.sum((target_verts.numpy() - pred_verts.numpy()) ** 2, axis=-1)).mean(-1) * 1e3
+ self.evaluation_accumulators['pve'].append(pve[:, None])
+ # =======>
+
+ batch_time = time.time() - start
+
+ summary_string = f'({i + 1}/{len(self.valid_loader)}) | batch: {batch_time * 10.0:.4}ms | ' \
+ f'Total: {bar.elapsed_td} | ETA: {bar.eta_td:}'
+
+ self.valid_global_step += 1
+ bar.suffix = summary_string
+ bar.next()
+
+ logger.info(summary_string)
+
+ bar.finish()
+
+ def evaluate(self, ):
+ for k, v in self.evaluation_accumulators.items():
+ self.evaluation_accumulators[k] = np.vstack(v)
+
+ pred_j3ds = self.evaluation_accumulators['pred_j3d']
+ target_j3ds = self.evaluation_accumulators['target_j3d']
+
+ pred_j3ds = torch.from_numpy(pred_j3ds).float()
+ target_j3ds = torch.from_numpy(target_j3ds).float()
+
+ print(f'Evaluating on {pred_j3ds.shape[0]} number of poses...')
+ errors = torch.sqrt(((pred_j3ds - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy()
+ S1_hat = batch_compute_similarity_transform_torch(pred_j3ds, target_j3ds)
+ errors_pa = torch.sqrt(((S1_hat - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy()
+
+ m2mm = 1000
+ accel = np.mean(compute_accel(pred_j3ds)) * m2mm
+ accel_err = np.mean(compute_error_accel(joints_pred=pred_j3ds, joints_gt=target_j3ds)) * m2mm
+ mpjpe = np.mean(errors) * m2mm
+ pa_mpjpe = np.mean(errors_pa) * m2mm
+
+ eval_dict = {
+ 'mpjpe': mpjpe,
+ 'pa-mpjpe': pa_mpjpe,
+ 'accel': accel,
+ 'accel_err': accel_err
+ }
+
+ if 'pve' in self.evaluation_accumulators.keys():
+ eval_dict.update({'pve': self.evaluation_accumulators['pve'].mean()})
+
+ log_str = f'Epoch {self.epoch}, '
+ log_str += ' '.join([f'{k.upper()}: {v:.4f},'for k,v in eval_dict.items()])
+ logger.info(log_str)
+
+ for k,v in eval_dict.items():
+ self.writer.add_scalar(f'error/{k}', v, global_step=self.epoch)
+
+ # return (mpjpe + pa_mpjpe) / 2.
+ return pa_mpjpe
+
+ def save_model(self, performance, epoch):
+ save_dict = {
+ 'epoch': epoch,
+ 'model': self.network.state_dict(),
+ 'performance': performance,
+ 'optimizer': self.optimizer.state_dict(),
+ }
+
+ filename = osp.join(self.logdir, 'checkpoint.pth.tar')
+ torch.save(save_dict, filename)
+
+ if self.performance_type == 'min':
+ is_best = performance < self.best_performance
+ else:
+ is_best = performance > self.best_performance
+
+ if is_best:
+ logger.info('Best performance achieved, saving it!')
+ self.best_performance = performance
+ shutil.copyfile(filename, osp.join(self.logdir, 'model_best.pth.tar'))
+
+ with open(osp.join(self.logdir, 'best.txt'), 'w') as f:
+ f.write(str(float(performance)))
+
+ def fit(self):
+ for epoch in range(self.start_epoch, self.end_epoch):
+ self.epoch = epoch
+ self.train()
+ self.validate()
+ performance = self.evaluate()
+
+ self.criterion.step()
+ if self.lr_scheduler is not None:
+ self.lr_scheduler.step()
+
+ # log the learning rate
+ for param_group in self.optimizer.param_groups[:2]:
+ print(f'Learning rate {param_group["lr"]}')
+ self.writer.add_scalar('lr', param_group['lr'], global_step=self.epoch)
+
+ logger.info(f'Epoch {epoch+1} performance: {performance:.4f}')
+
+ self.save_model(performance, epoch)
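+ # Re-chunk the training videos for the next epoch; BaseDataset.prepare_video_batch
+ # rotates the chunk start offset by epoch % 4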
+ self.train_loader.dataset.prepare_video_batch()
+
+ self.writer.close()
+
+ def load_pretrained(self, model_path):
+ if osp.isfile(model_path):
+ checkpoint = torch.load(model_path)
+
+ # network
+ ignore_keys = ['smpl.body_pose', 'smpl.betas', 'smpl.global_orient', 'smpl.J_regressor_extra', 'smpl.J_regressor_eval']
+ ignore_keys2 = [k for k in checkpoint['model'].keys() if 'integrator' in k]
+ ignore_keys.extend(ignore_keys2)
+ model_state_dict = {k: v for k, v in checkpoint['model'].items() if k not in ignore_keys}
+ model_state_dict = {k: v for k, v in model_state_dict.items() if k in self.network.state_dict().keys()}
+ self.network.load_state_dict(model_state_dict, strict=False)
+
+ if self.resume:
+ self.start_epoch = checkpoint['epoch']
+ self.best_performance = checkpoint['performance']
+ self.optimizer.load_state_dict(checkpoint['optimizer'])
+
+ logger.info(f"=> loaded checkpoint '{model_path}' "
+ f"(epoch {self.start_epoch}, performance {self.best_performance})")
+ else:
+ logger.info(f"=> no checkpoint found at '{model_path}'")
\ No newline at end of file
diff --git a/lib/data/__init__.py b/lib/data/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/lib/data/__pycache__/__init__.cpython-39.pyc b/lib/data/__pycache__/__init__.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..9a50ecdd3317f879777921e8717d6f0c6a46e68d
Binary files /dev/null and b/lib/data/__pycache__/__init__.cpython-39.pyc differ
diff --git a/lib/data/__pycache__/_dataset.cpython-39.pyc b/lib/data/__pycache__/_dataset.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e6eec1e688837ce128f0d6eded25c07424d45025
Binary files /dev/null and b/lib/data/__pycache__/_dataset.cpython-39.pyc differ
diff --git a/lib/data/_dataset.py b/lib/data/_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..17889ecacbf4c5bf601bb4c058d51e8b4f603745
--- /dev/null
+++ b/lib/data/_dataset.py
@@ -0,0 +1,77 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import numpy as np
+from skimage.util.shape import view_as_windows
+
+from configs import constants as _C
+from .utils.normalizer import Normalizer
+from ..utils.imutils import transform
+
+class BaseDataset(torch.utils.data.Dataset):
+ def __init__(self, cfg, training=True):
+ super(BaseDataset, self).__init__()
+ self.epoch = 0
+ self.training = training
+ self.n_joints = _C.KEYPOINTS.NUM_JOINTS
+ self.n_frames = cfg.DATASET.SEQLEN + 1
+ self.keypoints_normalizer = Normalizer(cfg)
+
+ def prepare_video_batch(self):
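+ # Rotate the chunk start offset with the epoch so successive epochs see
+ # different sub-windows of each video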
+ r = self.epoch % 4
+
+ self.video_indices = []
+ vid_name = self.labels['vid']
+ if isinstance(vid_name, torch.Tensor): vid_name = vid_name.numpy()
+ video_names_unique, group = np.unique(
+ vid_name, return_index=True)
+ perm = np.argsort(group)
+ group_perm = group[perm]
+ indices = np.split(
+ np.arange(0, self.labels['vid'].shape[0]), group_perm[1:]
+ )
+ for idx in range(len(video_names_unique)):
+ indexes = indices[idx]
+ if indexes.shape[0] < self.n_frames: continue
+ chunks = view_as_windows(
+ indexes, (self.n_frames), step=self.n_frames // 4
+ )
+ start_finish = chunks[r::4, (0, -1)].tolist()
+ self.video_indices += start_finish
+
+ self.epoch += 1
+
+ def __len__(self):
+ if self.training:
+ return len(self.video_indices)
+ else:
+ return len(self.labels['kp2d'])
+
+ def __getitem__(self, index):
+ return self.get_single_sequence(index)
+
+ def get_single_sequence(self, index):
+ raise NotImplementedError('get_single_sequence is not implemented')
+
+ def get_naive_intrinsics(self, res):
+ # Heuristic intrinsics: focal length set to the image diagonal (roughly a 53-degree diagonal FOV)
+ img_w, img_h = res
+ self.focal_length = (img_w * img_w + img_h * img_h) ** 0.5
+ self.cam_intrinsics = torch.eye(3).repeat(1, 1, 1).float()
+ self.cam_intrinsics[:, 0, 0] = self.focal_length
+ self.cam_intrinsics[:, 1, 1] = self.focal_length
+ self.cam_intrinsics[:, 0, 2] = img_w/2.
+ self.cam_intrinsics[:, 1, 2] = img_h/2.
+
+ def j2d_processing(self, kp, bbox):
+ center = bbox[..., :2]
+ scale = bbox[..., -1:]
+ nparts = kp.shape[0]
+ for i in range(nparts):
+ kp[i, 0:2] = transform(kp[i, 0:2] + 1, center, scale,
+ [224, 224])
+ kp[:, :2] = 2. * kp[:, :2] / 224 - 1.
+ kp = kp.astype('float32')
+ return kp
\ No newline at end of file
diff --git a/lib/data/dataloader.py b/lib/data/dataloader.py
new file mode 100644
index 0000000000000000000000000000000000000000..7ff536d354d01bc2f1d5cb0f3531d7daeacd7bc1
--- /dev/null
+++ b/lib/data/dataloader.py
@@ -0,0 +1,46 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+
+from .datasets import EvalDataset, DataFactory
+from ..utils.data_utils import make_collate_fn
+
+
+def setup_eval_dataloader(cfg, data, split='test', backbone=None):
+ if backbone is None:
+ backbone = cfg.MODEL.BACKBONE
+
+ dataset = EvalDataset(cfg, data, split, backbone)
+ dloader = torch.utils.data.DataLoader(
+ dataset,
+ batch_size=1,
+ num_workers=0,
+ shuffle=False,
+ pin_memory=True,
+ collate_fn=make_collate_fn()
+ )
+ return dloader
+
+
+def setup_train_dataloader(cfg, ):
+ n_workers = 0 if cfg.DEBUG else cfg.NUM_WORKERS
+
+ train_dataset = DataFactory(cfg, cfg.TRAIN.STAGE)
+ dloader = torch.utils.data.DataLoader(
+ train_dataset,
+ batch_size=cfg.TRAIN.BATCH_SIZE,
+ num_workers=n_workers,
+ shuffle=True,
+ pin_memory=True,
+ collate_fn=make_collate_fn()
+ )
+ return dloader
+
+
+def setup_dloaders(cfg, dset='3dpw', split='val'):
+ test_dloader = setup_eval_dataloader(cfg, dset, split, cfg.MODEL.BACKBONE)
+ train_dloader = setup_train_dataloader(cfg)
+
+ return train_dloader, test_dloader
\ No newline at end of file
diff --git a/lib/data/datasets/__init__.py b/lib/data/datasets/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..20c4b909be3802c5a99b87be26487d4ecb33a499
--- /dev/null
+++ b/lib/data/datasets/__init__.py
@@ -0,0 +1,3 @@
+from .dataset_eval import EvalDataset
+from .dataset_custom import CustomDataset
+from .mixed_dataset import DataFactory
\ No newline at end of file
diff --git a/lib/data/datasets/__pycache__/__init__.cpython-39.pyc b/lib/data/datasets/__pycache__/__init__.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..1ebff022afe5a20e7a69e827fe14a7a503ac7434
Binary files /dev/null and b/lib/data/datasets/__pycache__/__init__.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/amass.cpython-39.pyc b/lib/data/datasets/__pycache__/amass.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..cb96f2f3ee07b0c663cd09f34b2c10b06651c514
Binary files /dev/null and b/lib/data/datasets/__pycache__/amass.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/bedlam.cpython-39.pyc b/lib/data/datasets/__pycache__/bedlam.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c83e0fae3c4c94927cd2f3eb1c26cdc87720700f
Binary files /dev/null and b/lib/data/datasets/__pycache__/bedlam.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/dataset2d.cpython-39.pyc b/lib/data/datasets/__pycache__/dataset2d.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..61a27240a9b9ae75d82cd58289162ab1fa308b62
Binary files /dev/null and b/lib/data/datasets/__pycache__/dataset2d.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/dataset3d.cpython-39.pyc b/lib/data/datasets/__pycache__/dataset3d.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..7900ba39389db4bb73e550b2b803d6f7a490e8d0
Binary files /dev/null and b/lib/data/datasets/__pycache__/dataset3d.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/dataset_custom.cpython-39.pyc b/lib/data/datasets/__pycache__/dataset_custom.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f5dad3c63cb490577cc37e517db84a51e0d4fb6a
Binary files /dev/null and b/lib/data/datasets/__pycache__/dataset_custom.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/dataset_eval.cpython-39.pyc b/lib/data/datasets/__pycache__/dataset_eval.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..3654858aa591072f5b712193bcc7a448579881be
Binary files /dev/null and b/lib/data/datasets/__pycache__/dataset_eval.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/mixed_dataset.cpython-39.pyc b/lib/data/datasets/__pycache__/mixed_dataset.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..de4a4c9cc8900aa6a027fbe161198f8c2e21fc5a
Binary files /dev/null and b/lib/data/datasets/__pycache__/mixed_dataset.cpython-39.pyc differ
diff --git a/lib/data/datasets/__pycache__/videos.cpython-39.pyc b/lib/data/datasets/__pycache__/videos.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..f9c7b3e16f22dd028aad45cf03b716e145506643
Binary files /dev/null and b/lib/data/datasets/__pycache__/videos.cpython-39.pyc differ
diff --git a/lib/data/datasets/amass.py b/lib/data/datasets/amass.py
new file mode 100644
index 0000000000000000000000000000000000000000..e05cef153787bc17d143272bb7ad8c077d03902b
--- /dev/null
+++ b/lib/data/datasets/amass.py
@@ -0,0 +1,173 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import joblib
+from lib.utils import transforms
+
+from configs import constants as _C
+
+from ..utils.augmentor import *
+from .._dataset import BaseDataset
+from ...models import build_body_model
+from ...utils import data_utils as d_utils
+from ...utils.kp_utils import root_centering
+
+
+
+def compute_contact_label(feet, thr=1e-2, alpha=5):
+ vel = torch.zeros_like(feet[..., 0])
+ label = torch.zeros_like(feet[..., 0])
+
+ vel[1:-1] = (feet[2:] - feet[:-2]).norm(dim=-1) / 2.0
+ vel[0] = vel[1].clone()
+ vel[-1] = vel[-2].clone()
+
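+ # Soft contact label from a sigmoid of foot velocity: ~1 when the velocity is well
+ # below thr (foot in contact), ~0 when it is well above thr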
+ label = 1 / (1 + torch.exp(alpha * (thr ** -1) * (vel - thr)))
+ return label
+
+
+class AMASSDataset(BaseDataset):
+ def __init__(self, cfg):
+ label_pth = _C.PATHS.AMASS_LABEL
+ super(AMASSDataset, self).__init__(cfg, training=True)
+
+ self.supervise_pose = cfg.TRAIN.STAGE == 'stage1'
+ self.labels = joblib.load(label_pth)
+ self.SequenceAugmentor = SequenceAugmentor(cfg.DATASET.SEQLEN + 1)
+
+ # Load augmentators
+ self.VideoAugmentor = VideoAugmentor(cfg)
+ self.SMPLAugmentor = SMPLAugmentor(cfg)
+ self.d_img_feature = _C.IMG_FEAT_DIM[cfg.MODEL.BACKBONE]
+
+ self.n_frames = int(cfg.DATASET.SEQLEN * self.SequenceAugmentor.l_factor) + 1
+ self.smpl = build_body_model('cpu', self.n_frames)
+ self.prepare_video_batch()
+
+ # Naive assumption of image intrinsics
+ self.img_w, self.img_h = 1000, 1000
+ self.get_naive_intrinsics((self.img_w, self.img_h))
+
+ self.CameraAugmentor = CameraAugmentor(cfg.DATASET.SEQLEN + 1, self.img_w, self.img_h, self.focal_length)
+
+
+ @property
+ def __name__(self, ):
+ return 'AMASS'
+
+ def get_input(self, target):
+ gt_kp3d = target['kp3d']
+ inpt_kp3d = self.VideoAugmentor(gt_kp3d[:, :self.n_joints, :-1].clone())
+ kp2d = perspective_projection(inpt_kp3d, self.cam_intrinsics)
+ mask = self.VideoAugmentor.get_mask()
+ kp2d, bbox = self.keypoints_normalizer(kp2d, target['res'], self.cam_intrinsics, 224, 224)
+
+ target['bbox'] = bbox[1:]
+ target['kp2d'] = kp2d
+ target['mask'] = mask[1:]
+ target['features'] = torch.zeros((self.SMPLAugmentor.n_frames, self.d_img_feature)).float()
+ return target
+
+ def get_groundtruth(self, target):
+ # GT 1. Joints
+ gt_kp3d = target['kp3d']
+ gt_kp2d = perspective_projection(gt_kp3d, self.cam_intrinsics)
+ target['kp3d'] = torch.cat((gt_kp3d, torch.ones_like(gt_kp3d[..., :1]) * float(self.supervise_pose)), dim=-1)
+ target['full_kp2d'] = torch.cat((gt_kp2d, torch.ones_like(gt_kp2d[..., :1]) * float(self.supervise_pose)), dim=-1)[1:]
+ target['weak_kp2d'] = torch.zeros_like(target['full_kp2d'])
+ target['init_kp3d'] = root_centering(gt_kp3d[:1, :self.n_joints].clone()).reshape(1, -1)
+ target['verts'] = torch.zeros((self.SMPLAugmentor.n_frames, 6890, 3)).float()
+
+ # GT 2. Root pose
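+ # Per-frame world translation delta, re-expressed in the root (pelvis) frame:
+ # vel_root[t] = R_root[t]^T @ (transl[t+1] - transl[t])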
+ vel_world = (target['transl'][1:] - target['transl'][:-1])
+ pose_root = target['pose_root'].clone()
+ vel_root = (pose_root[:-1].transpose(-1, -2) @ vel_world.unsqueeze(-1)).squeeze(-1)
+ target['vel_root'] = vel_root.clone()
+ target['pose_root'] = transforms.matrix_to_rotation_6d(pose_root)
+ target['init_root'] = target['pose_root'][:1].clone()
+
+ # GT 3. Foot contact
+ contact = compute_contact_label(target['feet'])
+ if 'tread' in target['vid']:
+ target['contact'] = torch.ones_like(contact) * (-1)
+ else:
+ target['contact'] = contact
+
+ return target
+
+ def forward_smpl(self, target):
+ output = self.smpl.get_output(
+ body_pose=torch.cat((target['init_pose'][:, 1:], target['pose'][1:, 1:])),
+ global_orient=torch.cat((target['init_pose'][:, :1], target['pose'][1:, :1])),
+ betas=target['betas'],
+ pose2rot=False)
+
+ target['transl'] = target['transl'] - output.offset
+ target['transl'] = target['transl'] - target['transl'][0]
+ target['kp3d'] = output.joints
+ target['feet'] = output.feet[1:] + target['transl'][1:].unsqueeze(-2)
+
+ return target
+
+ def augment_data(self, target):
+ # Augmentation 1. SMPL params augmentation
+ target = self.SMPLAugmentor(target)
+
+ # Augmentation 2. Sequence speed augmentation
+ target = self.SequenceAugmentor(target)
+
+ # Get world-coordinate SMPL
+ target = self.forward_smpl(target)
+
+ # Augmentation 3. Virtual camera generation
+ target = self.CameraAugmentor(target)
+
+ return target
+
+ def load_amass(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # Load AMASS labels
+ pose = torch.from_numpy(self.labels['pose'][start_index:end_index+1].copy())
+ pose = transforms.axis_angle_to_matrix(pose.reshape(-1, 24, 3))
+ transl = torch.from_numpy(self.labels['transl'][start_index:end_index+1].copy())
+ betas = torch.from_numpy(self.labels['betas'][start_index:end_index+1].copy())
+
+ # Stack GT
+ target.update({'vid': self.labels['vid'][start_index],
+ 'pose': pose,
+ 'transl': transl,
+ 'betas': betas})
+
+ return target
+
+ def get_single_sequence(self, index):
+ target = {'res': torch.tensor([self.img_w, self.img_h]).float(),
+ 'cam_intrinsics': self.cam_intrinsics.clone(),
+ 'has_full_screen': torch.tensor(True),
+ 'has_smpl': torch.tensor(self.supervise_pose),
+ 'has_traj': torch.tensor(True),
+ 'has_verts': torch.tensor(False),}
+
+ target = self.load_amass(index, target)
+ target = self.augment_data(target)
+ target = self.get_groundtruth(target)
+ target = self.get_input(target)
+
+ target = d_utils.prepare_keypoints_data(target)
+ target = d_utils.prepare_smpl_data(target)
+
+ return target
+
+
+def perspective_projection(points, cam_intrinsics, rotation=None, translation=None):
+ K = cam_intrinsics
+ if rotation is not None:
+ points = torch.matmul(rotation, points.transpose(1, 2)).transpose(1, 2)
+ if translation is not None:
+ points = points + translation.unsqueeze(1)
+ projected_points = points / points[:, :, -1].unsqueeze(-1)
+ projected_points = torch.einsum('bij,bkj->bki', K, projected_points.float())
+ return projected_points[:, :, :-1]
\ No newline at end of file
diff --git a/lib/data/datasets/bedlam.py b/lib/data/datasets/bedlam.py
new file mode 100644
index 0000000000000000000000000000000000000000..9aba9757388218c16b641aa2ad9fcaa00fc37900
--- /dev/null
+++ b/lib/data/datasets/bedlam.py
@@ -0,0 +1,165 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import joblib
+from lib.utils import transforms
+
+from configs import constants as _C
+
+from .amass import compute_contact_label, perspective_projection
+from ..utils.augmentor import *
+from .._dataset import BaseDataset
+from ...models import build_body_model
+from ...utils import data_utils as d_utils
+from ...utils.kp_utils import root_centering
+
+class BEDLAMDataset(BaseDataset):
+ def __init__(self, cfg):
+ label_pth = _C.PATHS.BEDLAM_LABEL.replace('backbone', cfg.MODEL.BACKBONE)
+ super(BEDLAMDataset, self).__init__(cfg, training=True)
+
+ self.labels = joblib.load(label_pth)
+
+ self.VideoAugmentor = VideoAugmentor(cfg)
+ self.SMPLAugmentor = SMPLAugmentor(cfg, False)
+
+ self.smpl = build_body_model('cpu', self.n_frames)
+ self.prepare_video_batch()
+
+ @property
+ def __name__(self, ):
+ return 'BEDLAM'
+
+ def get_inputs(self, index, target, vis_thr=0.6):
+ start_index, end_index = self.video_indices[index]
+
+ bbox = self.labels['bbox'][start_index:end_index+1].clone()
+ bbox[:, 2] = bbox[:, 2] / 200
+
+ gt_kp3d = target['kp3d']
+ inpt_kp3d = self.VideoAugmentor(gt_kp3d[:, :self.n_joints, :-1].clone())
+ # kp2d = perspective_projection(inpt_kp3d, target['K'])
+ kp2d = perspective_projection(inpt_kp3d, self.cam_intrinsics)
+ mask = self.VideoAugmentor.get_mask()
+ # kp2d, bbox = self.keypoints_normalizer(kp2d, target['res'], self.cam_intrinsics, 224, 224, bbox)
+ kp2d, bbox = self.keypoints_normalizer(kp2d, target['res'], self.cam_intrinsics, 224, 224)
+
+ target['bbox'] = bbox[1:]
+ target['kp2d'] = kp2d
+ target['mask'] = mask[1:]
+
+ # Image features
+ target['features'] = self.labels['features'][start_index+1:end_index+1].clone()
+
+ return target
+
+ def get_groundtruth(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # GT 1. Joints
+ gt_kp3d = target['kp3d']
+ # gt_kp2d = perspective_projection(gt_kp3d, target['K'])
+ gt_kp2d = perspective_projection(gt_kp3d, self.cam_intrinsics)
+ target['kp3d'] = torch.cat((gt_kp3d, torch.ones_like(gt_kp3d[..., :1])), dim=-1)
+ # target['full_kp2d'] = torch.cat((gt_kp2d, torch.zeros_like(gt_kp2d[..., :1])), dim=-1)[1:]
+ target['full_kp2d'] = torch.cat((gt_kp2d, torch.ones_like(gt_kp2d[..., :1])), dim=-1)[1:]
+ target['weak_kp2d'] = torch.zeros_like(target['full_kp2d'])
+ target['init_kp3d'] = root_centering(gt_kp3d[:1, :self.n_joints].clone()).reshape(1, -1)
+
+ # GT 2. Root pose
+ w_transl = self.labels['w_trans'][start_index:end_index+1]
+ pose_root = transforms.axis_angle_to_matrix(self.labels['root'][start_index:end_index+1])
+ vel_world = (w_transl[1:] - w_transl[:-1])
+ vel_root = (pose_root[:-1].transpose(-1, -2) @ vel_world.unsqueeze(-1)).squeeze(-1)
+ target['vel_root'] = vel_root.clone()
+ target['pose_root'] = transforms.matrix_to_rotation_6d(pose_root)
+ target['init_root'] = target['pose_root'][:1].clone()
+
+ return target
+
+ def forward_smpl(self, target):
+ output = self.smpl.get_output(
+ body_pose=torch.cat((target['init_pose'][:, 1:], target['pose'][1:, 1:])),
+ global_orient=torch.cat((target['init_pose'][:, :1], target['pose'][1:, :1])),
+ betas=target['betas'],
+ transl=target['transl'],
+ pose2rot=False)
+
+ target['kp3d'] = output.joints + output.offset.unsqueeze(1)
+ target['feet'] = output.feet[1:] + target['transl'][1:].unsqueeze(-2)
+ target['verts'] = output.vertices[1:, ].clone()
+
+ return target
+
+ def augment_data(self, target):
+ # Augmentation 1. SMPL params augmentation
+ target = self.SMPLAugmentor(target)
+
+ # Get world-coordinate SMPL
+ target = self.forward_smpl(target)
+
+ return target
+
+ def load_camera(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # Get camera info
+ extrinsics = self.labels['extrinsics'][start_index:end_index+1].clone()
+ R = extrinsics[:, :3, :3]
+ T = extrinsics[:, :3, -1]
+ K = self.labels['intrinsics'][start_index:end_index+1].clone()
+ width, height = K[0, 0, 2] * 2, K[0, 1, 2] * 2
+ target['R'] = R
+ target['res'] = torch.tensor([width, height]).float()
+
+ # Compute angular velocity
+ cam_angvel = transforms.matrix_to_rotation_6d(R[:-1] @ R[1:].transpose(-1, -2))
+ cam_angvel = cam_angvel - torch.tensor([[1, 0, 0, 0, 1, 0]]).to(cam_angvel) # Normalize
+ target['cam_angvel'] = cam_angvel * 3e1 # BEDLAM is 30-fps
+
+ target['K'] = K # Use GT camera intrinsics for projecting keypoints
+ self.get_naive_intrinsics(target['res'])
+ target['cam_intrinsics'] = self.cam_intrinsics
+
+ return target
+
+ def load_params(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # Load AMASS labels
+ pose = self.labels['pose'][start_index:end_index+1].clone()
+ pose = transforms.axis_angle_to_matrix(pose.reshape(-1, 24, 3))
+ transl = self.labels['c_trans'][start_index:end_index+1].clone()
+ betas = self.labels['betas'][start_index:end_index+1, :10].clone()
+
+ # Stack GT
+ target.update({'vid': self.labels['vid'][start_index].clone(),
+ 'pose': pose,
+ 'transl': transl,
+ 'betas': betas})
+
+ return target
+
+
+ def get_single_sequence(self, index):
+ target = {'has_full_screen': torch.tensor(True),
+ 'has_smpl': torch.tensor(True),
+ 'has_traj': torch.tensor(False),
+ 'has_verts': torch.tensor(True),
+
+ # Null contact label
+ 'contact': torch.ones((self.n_frames - 1, 4)) * (-1),
+ }
+
+ target = self.load_params(index, target)
+ target = self.load_camera(index, target)
+ target = self.augment_data(target)
+ target = self.get_groundtruth(index, target)
+ target = self.get_inputs(index, target)
+
+ target = d_utils.prepare_keypoints_data(target)
+ target = d_utils.prepare_smpl_data(target)
+
+ return target
\ No newline at end of file
diff --git a/lib/data/datasets/dataset2d.py b/lib/data/datasets/dataset2d.py
new file mode 100644
index 0000000000000000000000000000000000000000..d2b06f8b5d61c951922f55ca17e4982cc3a0ba07
--- /dev/null
+++ b/lib/data/datasets/dataset2d.py
@@ -0,0 +1,140 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import joblib
+
+from .._dataset import BaseDataset
+from ..utils.augmentor import *
+from ...utils import data_utils as d_utils
+from ...utils import transforms
+from ...models import build_body_model
+from ...utils.kp_utils import convert_kps, root_centering
+
+
+class Dataset2D(BaseDataset):
+ def __init__(self, cfg, fname, training):
+ super(Dataset2D, self).__init__(cfg, training)
+
+ self.epoch = 0
+ self.n_frames = cfg.DATASET.SEQLEN + 1
+ self.labels = joblib.load(fname)
+
+ if self.training:
+ self.prepare_video_batch()
+
+ self.smpl = build_body_model('cpu', self.n_frames)
+ self.SMPLAugmentor = SMPLAugmentor(cfg, False)
+
+ def __getitem__(self, index):
+ return self.get_single_sequence(index)
+
+ def get_inputs(self, index, target, vis_thr=0.6):
+ start_index, end_index = self.video_indices[index]
+
+ # 2D keypoints detection
+ kp2d = self.labels['kp2d'][start_index:end_index+1][..., :2].clone()
+ kp2d, bbox = self.keypoints_normalizer(kp2d, target['res'], target['cam_intrinsics'], 224, 224, target['bbox'])
+ target['bbox'] = bbox[1:]
+ target['kp2d'] = kp2d
+
+ # Detection mask
+ target['mask'] = ~self.labels['joints2D'][start_index+1:end_index+1][..., -1].clone().bool()
+
+ # Image features
+ target['features'] = self.labels['features'][start_index+1:end_index+1].clone()
+
+ return target
+
+ def get_labels(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # SMPL parameters
+ # NOTE: We use NeuralAnnot labels for Human36m and MPII3D only for the 0th frame input.
+ # We do not supervise the network on SMPL parameters.
+ target['pose'] = transforms.axis_angle_to_matrix(
+ self.labels['pose'][start_index:end_index+1].clone().reshape(-1, 24, 3))
+ target['betas'] = self.labels['betas'][start_index:end_index+1].clone() # No t
+
+ # Apply SMPL augmentor (y-axis rotation and initial frame noise)
+ target = self.SMPLAugmentor(target)
+
+ # 2D keypoints
+ kp2d = self.labels['kp2d'][start_index:end_index+1].clone().float()[..., :2]
+ gt_kp2d = torch.zeros((self.n_frames - 1, 31, 2))
+ gt_kp2d[:, :17] = kp2d[1:].clone()
+
+ # Set 0 confidence to the masked keypoints
+ mask = torch.zeros((self.n_frames - 1, 31))
+ mask[:, :17] = self.labels['joints2D'][start_index+1:end_index+1][..., -1].clone()
+ mask = torch.logical_and(gt_kp2d.mean(-1) != 0, mask)
+ gt_kp2d = torch.cat((gt_kp2d, mask.float().unsqueeze(-1)), dim=-1)
+
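+ # 2D-only datasets are supervised with "weak" 2D keypoints in the normalized bbox crop;
+ # the full-image 2D keypoints are zeroed out below and not supervised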
+ _gt_kp2d = gt_kp2d.clone()
+ for idx in range(len(_gt_kp2d)):
+ _gt_kp2d[idx][..., :2] = torch.from_numpy(
+ self.j2d_processing(gt_kp2d[idx][..., :2].numpy().copy(),
+ target['bbox'][idx].numpy().copy()))
+
+ target['weak_kp2d'] = _gt_kp2d.clone()
+ target['full_kp2d'] = torch.zeros_like(gt_kp2d)
+ target['kp3d'] = torch.zeros((kp2d.shape[0], 31, 4))
+
+ # No SMPL vertices available
+ target['verts'] = torch.zeros((self.n_frames - 1, 6890, 3)).float()
+ return target
+
+ def get_init_frame(self, target):
+ # Prepare initial frame
+ output = self.smpl.get_output(
+ body_pose=target['init_pose'][:, 1:],
+ global_orient=target['init_pose'][:, :1],
+ betas=target['betas'][:1],
+ pose2rot=False
+ )
+ target['init_kp3d'] = root_centering(output.joints[:1, :self.n_joints]).reshape(1, -1)
+
+ return target
+
+ def get_single_sequence(self, index):
+ # Camera parameters
+ res = (224.0, 224.0)
+ bbox = torch.tensor([112.0, 112.0, 1.12])
+ res = torch.tensor(res)
+ self.get_naive_intrinsics(res)
+ bbox = bbox.repeat(self.n_frames, 1)
+
+ # Universal target
+ target = {'has_full_screen': torch.tensor(False),
+ 'has_smpl': torch.tensor(self.has_smpl),
+ 'has_traj': torch.tensor(self.has_traj),
+ 'has_verts': torch.tensor(False),
+ 'transl': torch.zeros((self.n_frames, 3)),
+
+ # Camera parameters and bbox
+ 'res': res,
+ 'cam_intrinsics': self.cam_intrinsics,
+ 'bbox': bbox,
+
+ # Null camera motion
+ 'R': torch.eye(3).repeat(self.n_frames, 1, 1),
+ 'cam_angvel': torch.zeros((self.n_frames - 1, 6)),
+
+ # Null root orientation and velocity
+ 'pose_root': torch.zeros((self.n_frames, 6)),
+ 'vel_root': torch.zeros((self.n_frames - 1, 3)),
+ 'init_root': torch.zeros((1, 6)),
+
+ # Null contact label
+ 'contact': torch.ones((self.n_frames - 1, 4)) * (-1)
+ }
+
+ self.get_inputs(index, target)
+ self.get_labels(index, target)
+ self.get_init_frame(target)
+
+ target = d_utils.prepare_keypoints_data(target)
+ target = d_utils.prepare_smpl_data(target)
+
+ return target
\ No newline at end of file
diff --git a/lib/data/datasets/dataset3d.py b/lib/data/datasets/dataset3d.py
new file mode 100644
index 0000000000000000000000000000000000000000..ed8e3c122d1df5f96a9c0b3695da9927460a2ee0
--- /dev/null
+++ b/lib/data/datasets/dataset3d.py
@@ -0,0 +1,172 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import joblib
+import numpy as np
+
+from .._dataset import BaseDataset
+from ..utils.augmentor import *
+from ...utils import data_utils as d_utils
+from ...utils import transforms
+from ...models import build_body_model
+from ...utils.kp_utils import convert_kps, root_centering
+
+
+class Dataset3D(BaseDataset):
+ def __init__(self, cfg, fname, training):
+ super(Dataset3D, self).__init__(cfg, training)
+
+ self.epoch = 0
+ self.labels = joblib.load(fname)
+ self.n_frames = cfg.DATASET.SEQLEN + 1
+
+ if self.training:
+ self.prepare_video_batch()
+
+ self.smpl = build_body_model('cpu', self.n_frames)
+ self.SMPLAugmentor = SMPLAugmentor(cfg, False)
+ self.VideoAugmentor = VideoAugmentor(cfg)
+
+ def __getitem__(self, index):
+ return self.get_single_sequence(index)
+
+ def get_inputs(self, index, target, vis_thr=0.6):
+ start_index, end_index = self.video_indices[index]
+
+ # 2D keypoints detection
+ kp2d = self.labels['kp2d'][start_index:end_index+1][..., :2].clone()
+ bbox = self.labels['bbox'][start_index:end_index+1][..., [0, 1, -1]].clone()
+ bbox[:, 2] = bbox[:, 2] / 200
+ kp2d, bbox = self.keypoints_normalizer(kp2d, target['res'], self.cam_intrinsics, 224, 224, bbox)
+
+ target['bbox'] = bbox[1:]
+ target['kp2d'] = kp2d
+ target['mask'] = self.labels['kp2d'][start_index+1:end_index+1][..., -1] < vis_thr
+
+ # Image features
+ target['features'] = self.labels['features'][start_index+1:end_index+1].clone()
+
+ return target
+
+ def get_labels(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # SMPL parameters
+ # NOTE: We use NeuralAnnot labels for Human36m and MPII3D only for the 0th frame input.
+ # We do not supervise the network on SMPL parameters.
+ target['pose'] = transforms.axis_angle_to_matrix(
+ self.labels['pose'][start_index:end_index+1].clone().reshape(-1, 24, 3))
+ target['betas'] = self.labels['betas'][start_index:end_index+1].clone() # No t
+
+ # Apply SMPL augmentor (y-axis rotation and initial frame noise)
+ target = self.SMPLAugmentor(target)
+
+ # 3D and 2D keypoints
+ if self.__name__ == 'ThreeDPW': # 3DPW has SMPL labels
+ gt_kp3d = self.labels['joints3D'][start_index:end_index+1].clone()
+ gt_kp2d = self.labels['joints2D'][start_index+1:end_index+1, ..., :2].clone()
+ gt_kp3d = root_centering(gt_kp3d.clone())
+
+ else: # Human36m and MPII do not have SMPL labels
+ gt_kp3d = torch.zeros((self.n_frames, self.n_joints + 14, 3))
+ gt_kp3d[:, self.n_joints:] = convert_kps(self.labels['joints3D'][start_index:end_index+1], 'spin', 'common')
+ gt_kp2d = torch.zeros((self.n_frames - 1, self.n_joints + 14, 2))
+ gt_kp2d[:, self.n_joints:] = convert_kps(self.labels['joints2D'][start_index+1:end_index+1, ..., :2], 'spin', 'common')
+
+ conf = self.mask.repeat(self.n_frames, 1).unsqueeze(-1)
+ gt_kp2d = torch.cat((gt_kp2d, conf[1:]), dim=-1)
+ gt_kp3d = torch.cat((gt_kp3d, conf), dim=-1)
+ target['kp3d'] = gt_kp3d
+ target['full_kp2d'] = gt_kp2d
+ target['weak_kp2d'] = torch.zeros_like(gt_kp2d)
+
+ if self.__name__ != 'ThreeDPW': # 3DPW does not contain world-coordinate motion
+ # Foot ground contact labels for Human36M and MPII3D
+ target['contact'] = self.labels['stationaries'][start_index+1:end_index+1].clone()
+ else:
+ # No foot ground contact label available for 3DPW
+ target['contact'] = torch.ones((self.n_frames - 1, 4)) * (-1)
+
+ if self.has_verts:
+ # SMPL vertices available for 3DPW
+ with torch.no_grad():
+ start_index, end_index = self.video_indices[index]
+ gender = self.labels['gender'][start_index].item()
+ output = self.smpl_gender[gender](
+ body_pose=target['pose'][1:, 1:],
+ global_orient=target['pose'][1:, :1],
+ betas=target['betas'][1:],
+ pose2rot=False,
+ )
+ target['verts'] = output.vertices.clone()
+ else:
+ # No SMPL vertices available
+ target['verts'] = torch.zeros((self.n_frames - 1, 6890, 3)).float()
+
+ return target
+
+ def get_init_frame(self, target):
+ # Prepare initial frame
+ output = self.smpl.get_output(
+ body_pose=target['init_pose'][:, 1:],
+ global_orient=target['init_pose'][:, :1],
+ betas=target['betas'][:1],
+ pose2rot=False
+ )
+ target['init_kp3d'] = root_centering(output.joints[:1, :self.n_joints]).reshape(1, -1)
+
+ return target
+
+ def get_camera_info(self, index, target):
+ start_index, end_index = self.video_indices[index]
+
+ # Intrinsics
+ target['res'] = self.labels['res'][start_index:end_index+1][0].clone()
+ self.get_naive_intrinsics(target['res'])
+ target['cam_intrinsics'] = self.cam_intrinsics.clone()
+
+ # Extrinsics pose
+ R = self.labels['cam_poses'][start_index:end_index+1, :3, :3].clone().float()
+ yaw = transforms.axis_angle_to_matrix(torch.tensor([[0, 2 * np.pi * np.random.uniform(), 0]])).float()
+ if self.__name__ == 'Human36M':
+ # Map Z-up to Y-down coordinate
+ zup2ydown = transforms.axis_angle_to_matrix(torch.tensor([[-np.pi/2, 0, 0]])).float()
+ zup2ydown = torch.matmul(yaw, zup2ydown)
+ R = torch.matmul(R, zup2ydown)
+ elif self.__name__ == 'MPII3D':
+ # Map Y-up to Y-down coordinate
+ yup2ydown = transforms.axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float()
+ yup2ydown = torch.matmul(yaw, yup2ydown)
+ R = torch.matmul(R, yup2ydown)
+
+ return target
+
+ def get_single_sequence(self, index):
+ # Universal target
+ target = {'has_full_screen': torch.tensor(True),
+ 'has_smpl': torch.tensor(self.has_smpl),
+ 'has_traj': torch.tensor(self.has_traj),
+ 'has_verts': torch.tensor(self.has_verts),
+ 'transl': torch.zeros((self.n_frames, 3)),
+
+ # Null camera motion
+ 'R': torch.eye(3).repeat(self.n_frames, 1, 1),
+ 'cam_angvel': torch.zeros((self.n_frames - 1, 6)),
+
+ # Null root orientation and velocity
+ 'pose_root': torch.zeros((self.n_frames, 6)),
+ 'vel_root': torch.zeros((self.n_frames - 1, 3)),
+ 'init_root': torch.zeros((1, 6)),
+ }
+
+ self.get_camera_info(index, target)
+ self.get_inputs(index, target)
+ self.get_labels(index, target)
+ self.get_init_frame(target)
+
+ target = d_utils.prepare_keypoints_data(target)
+ target = d_utils.prepare_smpl_data(target)
+
+ return target
\ No newline at end of file
diff --git a/lib/data/datasets/dataset_custom.py b/lib/data/datasets/dataset_custom.py
new file mode 100644
index 0000000000000000000000000000000000000000..dd27140050725aa715a42f411dc270c12c13723d
--- /dev/null
+++ b/lib/data/datasets/dataset_custom.py
@@ -0,0 +1,115 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+
+from ..utils.normalizer import Normalizer
+from ...models import build_body_model
+from ...utils import transforms
+from ...utils.kp_utils import root_centering
+from ...utils.imutils import compute_cam_intrinsics
+
+KEYPOINTS_THR = 0.3
+
+def convert_dpvo_to_cam_angvel(traj, fps):
+ """Function to convert DPVO trajectory output to camera angular velocity"""
+
+ # 0 ~ 3: translation, 3 ~ 7: Quaternion
+ quat = traj[:, 3:]
+
+ # Convert quaternion from (x, y, z, w) to (w, x, y, z)
+ quat = quat[:, [3, 0, 1, 2]]
+
+ # The quaternion encodes the camera-to-world transform; transpose to get world-to-camera
+ cam2world = transforms.quaternion_to_matrix(torch.from_numpy(quat)).float()
+ R = cam2world.mT
+
+ # Compute the rotational changes over time.
+ cam_angvel = transforms.matrix_to_axis_angle(R[:-1] @ R[1:].transpose(-1, -2))
+
+ # Convert matrix to 6D representation
+ cam_angvel = transforms.matrix_to_rotation_6d(transforms.axis_angle_to_matrix(cam_angvel))
+
+ # Normalize 6D angular velocity
+ cam_angvel = cam_angvel - torch.tensor([[1, 0, 0, 0, 1, 0]]).to(cam_angvel) # Normalize
+ cam_angvel = cam_angvel * fps
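+ # Angular velocity has one entry fewer than the number of frames; pad with one extra
+ # entry so the length matches the frame count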
+ cam_angvel = torch.cat((cam_angvel, cam_angvel[:1]), dim=0)
+ return cam_angvel
+
+
+class CustomDataset(torch.utils.data.Dataset):
+ def __init__(self, cfg, tracking_results, slam_results, width, height, fps):
+
+ self.tracking_results = tracking_results
+ self.slam_results = slam_results
+ self.width = width
+ self.height = height
+ self.fps = fps
+ self.res = torch.tensor([width, height]).float()
+ self.intrinsics = compute_cam_intrinsics(self.res)
+
+ self.device = cfg.DEVICE.lower()
+
+ self.smpl = build_body_model('cpu')
+ self.keypoints_normalizer = Normalizer(cfg)
+
+ self._to = lambda x: x.unsqueeze(0).to(self.device)
+
+ def __len__(self):
+ return len(self.tracking_results.keys())
+
+ def load_data(self, index, flip=False):
+ if flip:
+ self.prefix = 'flipped_'
+ else:
+ self.prefix = ''
+
+ return self.__getitem__(index)
+
+ def __getitem__(self, _index):
+ if _index >= len(self): return
+
+ index = sorted(list(self.tracking_results.keys()))[_index]
+
+ # Process 2D keypoints
+ kp2d = torch.from_numpy(self.tracking_results[index][self.prefix + 'keypoints']).float()
+ mask = kp2d[..., -1] < KEYPOINTS_THR
+ bbox = torch.from_numpy(self.tracking_results[index][self.prefix + 'bbox']).float()
+
+ norm_kp2d, _ = self.keypoints_normalizer(
+ kp2d[..., :-1].clone(), self.res, self.intrinsics, 224, 224, bbox
+ )
+
+ # Process image features
+ features = self.tracking_results[index][self.prefix + 'features']
+
+ # Process initial pose
+ init_output = self.smpl.get_output(
+ global_orient=self.tracking_results[index][self.prefix + 'init_global_orient'],
+ body_pose=self.tracking_results[index][self.prefix + 'init_body_pose'],
+ betas=self.tracking_results[index][self.prefix + 'init_betas'],
+ pose2rot=False,
+ return_full_pose=True
+ )
+ init_kp3d = root_centering(init_output.joints[:, :17], 'coco')
+ init_kp = torch.cat((init_kp3d.reshape(1, -1), norm_kp2d[0].clone().reshape(1, -1)), dim=-1)
+ init_smpl = transforms.matrix_to_rotation_6d(init_output.full_pose)
+ init_root = transforms.matrix_to_rotation_6d(init_output.global_orient)
+
+ # Process SLAM results
+ cam_angvel = convert_dpvo_to_cam_angvel(self.slam_results, self.fps)
+
+ return (
+ index, # subject id
+ self._to(norm_kp2d), # 2d keypoints
+ (self._to(init_kp), self._to(init_smpl)), # initial pose
+ self._to(features), # image features
+ self._to(mask), # keypoints mask
+ init_root.to(self.device), # initial root orientation
+ self._to(cam_angvel), # camera angular velocity
+ self.tracking_results[index]['frame_id'], # frame indices
+ {'cam_intrinsics': self._to(self.intrinsics), # other keyword arguments
+ 'bbox': self._to(bbox),
+ 'res': self._to(self.res)},
+ )
\ No newline at end of file
diff --git a/lib/data/datasets/dataset_eval.py b/lib/data/datasets/dataset_eval.py
new file mode 100644
index 0000000000000000000000000000000000000000..85d7569027b09b08b91c6cc9c35404a800709aee
--- /dev/null
+++ b/lib/data/datasets/dataset_eval.py
@@ -0,0 +1,113 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import torch
+import joblib
+
+from configs import constants as _C
+from .._dataset import BaseDataset
+from ...utils import transforms
+from ...utils import data_utils as d_utils
+from ...utils.kp_utils import root_centering
+
+FPS = 30
+class EvalDataset(BaseDataset):
+ def __init__(self, cfg, data, split, backbone):
+ super(EvalDataset, self).__init__(cfg, False)
+
+ self.prefix = ''
+ self.data = data
+ parsed_data_path = os.path.join(_C.PATHS.PARSED_DATA, f'{data}_{split}_{backbone}.pth')
+ self.labels = joblib.load(parsed_data_path)
+
+ def load_data(self, index, flip=False):
+ if flip:
+ self.prefix = 'flipped_'
+ else:
+ self.prefix = ''
+
+ target = self.__getitem__(index)
+ for key, val in target.items():
+ if isinstance(val, torch.Tensor):
+ target[key] = val.unsqueeze(0)
+ return target
+
+ def __getitem__(self, index):
+ target = {}
+ target = self.get_data(index)
+ target = d_utils.prepare_keypoints_data(target)
+ target = d_utils.prepare_smpl_data(target)
+
+ return target
+
+ def __len__(self):
+ return len(self.labels['kp2d'])
+
+ def prepare_labels(self, index, target):
+ # Ground truth SMPL parameters
+ target['pose'] = transforms.axis_angle_to_matrix(self.labels['pose'][index].reshape(-1, 24, 3))
+ target['betas'] = self.labels['betas'][index]
+ target['gender'] = self.labels['gender'][index]
+
+ # Sequence information
+ target['res'] = self.labels['res'][index][0]
+ target['vid'] = self.labels['vid'][index]
+ target['frame_id'] = self.labels['frame_id'][index][1:]
+
+ # Camera information
+ self.get_naive_intrinsics(target['res'])
+ target['cam_intrinsics'] = self.cam_intrinsics
+ R = self.labels['cam_poses'][index][:, :3, :3].clone()
+ if 'emdb' in self.data.lower():
+ # Use groundtruth camera angular velocity.
+ # Can be updated with SLAM results if you have it.
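+            # The frame-to-frame relative rotation is expressed in 6D form, the identity
+            # rotation ([1, 0, 0, 0, 1, 0]) is subtracted, and the result is scaled by
+            # FPS to obtain an angular velocity in per-second units.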
+ cam_angvel = transforms.matrix_to_rotation_6d(R[:-1] @ R[1:].transpose(-1, -2))
+ cam_angvel = (cam_angvel - torch.tensor([[1, 0, 0, 0, 1, 0]]).to(cam_angvel)) * FPS
+ target['R'] = R
+ else:
+ cam_angvel = torch.zeros((len(target['pose']) - 1, 6))
+ target['cam_angvel'] = cam_angvel
+ return target
+
+ def prepare_inputs(self, index, target):
+ for key in ['features', 'bbox']:
+ data = self.labels[self.prefix + key][index][1:]
+ target[key] = data
+
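+        # Keep the bbox center and size, converting the size to the 200-pixel scale convention.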
+ bbox = self.labels[self.prefix + 'bbox'][index][..., [0, 1, -1]].clone().float()
+ bbox[:, 2] = bbox[:, 2] / 200
+
+ # Normalize keypoints
+ kp2d, bbox = self.keypoints_normalizer(
+ self.labels[self.prefix + 'kp2d'][index][..., :2].clone().float(),
+ target['res'], target['cam_intrinsics'], 224, 224, bbox)
+ target['kp2d'] = kp2d
+ target['bbox'] = bbox[1:]
+
+        # Mask out low-confidence keypoints
+ mask = self.labels[self.prefix + 'kp2d'][index][..., -1] < 0.3
+ target['input_kp2d'] = self.labels['kp2d'][index][1:]
+ target['input_kp2d'][mask[1:]] *= 0
+ target['mask'] = mask[1:]
+
+ return target
+
+ def prepare_initialization(self, index, target):
+ # Initial frame per-frame estimation
+ target['init_kp3d'] = root_centering(self.labels[self.prefix + 'init_kp3d'][index][:1, :self.n_joints]).reshape(1, -1)
+ target['init_pose'] = transforms.axis_angle_to_matrix(self.labels[self.prefix + 'init_pose'][index][:1]).cpu()
+ pose_root = target['pose'][:, 0].clone()
+ target['init_root'] = transforms.matrix_to_rotation_6d(pose_root)
+
+ return target
+
+ def get_data(self, index):
+ target = {}
+
+ target = self.prepare_labels(index, target)
+ target = self.prepare_inputs(index, target)
+ target = self.prepare_initialization(index, target)
+
+ return target
\ No newline at end of file
diff --git a/lib/data/datasets/mixed_dataset.py b/lib/data/datasets/mixed_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..33168238d2745014b92c8d9809d54c8e422291f4
--- /dev/null
+++ b/lib/data/datasets/mixed_dataset.py
@@ -0,0 +1,61 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import numpy as np
+
+from .amass import AMASSDataset
+from .videos import Human36M, ThreeDPW, MPII3D, InstaVariety
+from .bedlam import BEDLAMDataset
+from lib.utils.data_utils import make_collate_fn
+
+
+class DataFactory(torch.utils.data.Dataset):
+ def __init__(self, cfg, train_stage='syn'):
+ super(DataFactory, self).__init__()
+
+ if train_stage == 'stage1':
+ self.datasets = [AMASSDataset(cfg)]
+ self.dataset_names = ['AMASS']
+ elif train_stage == 'stage2':
+ self.datasets = [
+ AMASSDataset(cfg), ThreeDPW(cfg),
+ Human36M(cfg), MPII3D(cfg), InstaVariety(cfg)
+ ]
+ self.dataset_names = ['AMASS', '3DPW', 'Human36M', 'MPII3D', 'Insta']
+
+ if len(cfg.DATASET.RATIO) == 6: # Use BEDLAM
+ self.datasets.append(BEDLAMDataset(cfg))
+ self.dataset_names.append('BEDLAM')
+
+ self._set_partition(cfg.DATASET.RATIO)
+ self.lengths = [len(ds) for ds in self.datasets]
+
+ @property
+ def __name__(self, ):
+ return 'MixedData'
+
+ def prepare_video_batch(self):
+        for ds in self.datasets:
+            ds.prepare_video_batch()
+ self.lengths = [len(ds) for ds in self.datasets]
+
+ def _set_partition(self, partition):
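+        # Convert the per-dataset sampling ratios into cumulative probabilities,
+        # which __getitem__ uses to pick a dataset for each sample.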
+        self.ratio = partition
+        self.partition = np.array(partition).cumsum()
+        self.partition /= self.partition[-1]
+
+ def __len__(self):
+ return int(np.array([l for l, r in zip(self.lengths, self.ratio) if r > 0]).mean())
+
+ def __getitem__(self, index):
+ # Get the dataset to sample from
+ p = np.random.rand()
+ for i in range(len(self.datasets)):
+ if p <= self.partition[i]:
+ if len(self.datasets) == 1:
+ return self.datasets[i][index % self.lengths[i]]
+ else:
+ d_index = np.random.randint(0, self.lengths[i])
+ return self.datasets[i][d_index]
\ No newline at end of file
diff --git a/lib/data/datasets/videos.py b/lib/data/datasets/videos.py
new file mode 100644
index 0000000000000000000000000000000000000000..2001e86c64774849ef5f47f36248b7cef8f54afa
--- /dev/null
+++ b/lib/data/datasets/videos.py
@@ -0,0 +1,105 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import torch
+
+from configs import constants as _C
+from .dataset3d import Dataset3D
+from .dataset2d import Dataset2D
+from ...utils.kp_utils import convert_kps
+from smplx import SMPL
+
+
+class Human36M(Dataset3D):
+ def __init__(self, cfg, dset='train'):
+ parsed_data_path = os.path.join(_C.PATHS.PARSED_DATA, f'human36m_{dset}_backbone.pth')
+ parsed_data_path = parsed_data_path.replace('backbone', cfg.MODEL.BACKBONE.lower())
+ super(Human36M, self).__init__(cfg, parsed_data_path, dset=='train')
+
+ self.has_3d = True
+ self.has_traj = True
+ self.has_smpl = False
+ self.has_verts = False
+
+        # Of the 31-joint format, only the 14 common joints are available
+ self.mask = torch.zeros(_C.KEYPOINTS.NUM_JOINTS + 14)
+ self.mask[-14:] = 1
+
+ @property
+ def __name__(self, ):
+ return 'Human36M'
+
+ def compute_3d_keypoints(self, index):
+ return convert_kps(self.labels['joints3D'][index], 'spin', 'h36m'
+ )[:, _C.KEYPOINTS.H36M_TO_J14].float()
+
+class MPII3D(Dataset3D):
+ def __init__(self, cfg, dset='train'):
+ parsed_data_path = os.path.join(_C.PATHS.PARSED_DATA, f'mpii3d_{dset}_backbone.pth')
+ parsed_data_path = parsed_data_path.replace('backbone', cfg.MODEL.BACKBONE.lower())
+ super(MPII3D, self).__init__(cfg, parsed_data_path, dset=='train')
+
+ self.has_3d = True
+ self.has_traj = True
+ self.has_smpl = False
+ self.has_verts = False
+
+        # Of the 31-joint format, only the 14 common joints are available
+ self.mask = torch.zeros(_C.KEYPOINTS.NUM_JOINTS + 14)
+ self.mask[-14:] = 1
+
+ @property
+ def __name__(self, ):
+ return 'MPII3D'
+
+ def compute_3d_keypoints(self, index):
+ return convert_kps(self.labels['joints3D'][index], 'spin', 'h36m'
+ )[:, _C.KEYPOINTS.H36M_TO_J17].float()
+
+class ThreeDPW(Dataset3D):
+ def __init__(self, cfg, dset='train'):
+ parsed_data_path = os.path.join(_C.PATHS.PARSED_DATA, f'3dpw_{dset}_backbone.pth')
+ parsed_data_path = parsed_data_path.replace('backbone', cfg.MODEL.BACKBONE.lower())
+ super(ThreeDPW, self).__init__(cfg, parsed_data_path, dset=='train')
+
+ self.has_3d = True
+ self.has_traj = False
+ self.has_smpl = True
+ self.has_verts = True # In testing
+
+        # Of the 31-joint format, only the 14 common joints are available
+ self.mask = torch.zeros(_C.KEYPOINTS.NUM_JOINTS + 14)
+ self.mask[:-14] = 1
+
+ self.smpl_gender = {
+ 0: SMPL(_C.BMODEL.FLDR, gender='male', num_betas=10),
+ 1: SMPL(_C.BMODEL.FLDR, gender='female', num_betas=10)
+ }
+
+ @property
+ def __name__(self, ):
+ return 'ThreeDPW'
+
+ def compute_3d_keypoints(self, index):
+ return self.labels['joints3D'][index]
+
+
+class InstaVariety(Dataset2D):
+ def __init__(self, cfg, dset='train'):
+ parsed_data_path = os.path.join(_C.PATHS.PARSED_DATA, f'insta_{dset}_backbone.pth')
+ parsed_data_path = parsed_data_path.replace('backbone', cfg.MODEL.BACKBONE.lower())
+ super(InstaVariety, self).__init__(cfg, parsed_data_path, dset=='train')
+
+ self.has_3d = False
+ self.has_traj = False
+ self.has_smpl = False
+
+        # Of the 31-joint format, only the 17 COCO joints are available
+ self.mask = torch.zeros(_C.KEYPOINTS.NUM_JOINTS + 14)
+ self.mask[:17] = 1
+
+ @property
+ def __name__(self, ):
+ return 'InstaVariety'
\ No newline at end of file
diff --git a/lib/data/utils/__pycache__/augmentor.cpython-39.pyc b/lib/data/utils/__pycache__/augmentor.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c0330486fd2e6ed222ac687d420bb112eea0349b
Binary files /dev/null and b/lib/data/utils/__pycache__/augmentor.cpython-39.pyc differ
diff --git a/lib/data/utils/__pycache__/normalizer.cpython-39.pyc b/lib/data/utils/__pycache__/normalizer.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c46765d436822c52ba518881a0c17eca7da9b165
Binary files /dev/null and b/lib/data/utils/__pycache__/normalizer.cpython-39.pyc differ
diff --git a/lib/data/utils/augmentor.py b/lib/data/utils/augmentor.py
new file mode 100644
index 0000000000000000000000000000000000000000..baa338013ba668192f854b39faf8cf2399803996
--- /dev/null
+++ b/lib/data/utils/augmentor.py
@@ -0,0 +1,292 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+from configs import constants as _C
+
+import torch
+import numpy as np
+from torch.nn import functional as F
+
+from ...utils import transforms
+
+__all__ = ['VideoAugmentor', 'SMPLAugmentor', 'SequenceAugmentor', 'CameraAugmentor']
+
+
+num_joints = _C.KEYPOINTS.NUM_JOINTS
+class VideoAugmentor():
+ def __init__(self, cfg, train=True):
+ self.train = train
+ self.l = cfg.DATASET.SEQLEN + 1
+ self.aug_dict = torch.load(_C.KEYPOINTS.COCO_AUG_DICT)
+
+ def get_jitter(self, ):
+ """Guassian jitter modeling."""
+ jittering_noise = torch.normal(
+ mean=torch.zeros((self.l, num_joints, 3)),
+ std=self.aug_dict['jittering'].reshape(1, num_joints, 1).expand(self.l, -1, 3)
+ ) * _C.KEYPOINTS.S_JITTERING
+ return jittering_noise
+
+ def get_lfhp(self, ):
+ """Low-frequency high-peak noise modeling."""
+ def get_peak_noise_mask():
+ peak_noise_mask = torch.rand(self.l, num_joints).float() * self.aug_dict['pmask'].squeeze(0)
+ peak_noise_mask = peak_noise_mask < _C.KEYPOINTS.S_PEAK_MASK
+ return peak_noise_mask
+
+ peak_noise_mask = get_peak_noise_mask()
+ peak_noise = peak_noise_mask.float().unsqueeze(-1).repeat(1, 1, 3)
+ peak_noise = peak_noise * torch.randn(3) * self.aug_dict['peak'].reshape(1, -1, 1) * _C.KEYPOINTS.S_PEAK
+ return peak_noise
+
+ def get_bias(self, ):
+ """Bias noise modeling."""
+ bias_noise = torch.normal(
+ mean=torch.zeros((num_joints, 3)), std=self.aug_dict['bias'].reshape(num_joints, 1)
+ ).unsqueeze(0) * _C.KEYPOINTS.S_BIAS
+ return bias_noise
+
+ def get_mask(self, scale=None):
+ """Mask modeling."""
+
+ if scale is None:
+ scale = _C.KEYPOINTS.S_MASK
+        # Per-frame, per-joint random masking
+ mask = torch.rand(self.l, num_joints) < scale
+ visible = (~mask).clone()
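+        # Propagate occlusions down the kinematic tree: a child joint remains visible
+        # only if its parent joint(s) are also visible.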
+ for child in range(num_joints):
+ parent = _C.KEYPOINTS.TREE[child]
+ if parent == -1: continue
+ if isinstance(parent, list):
+ visible[:, child] *= (visible[:, parent[0]] * visible[:, parent[1]])
+ else:
+ visible[:, child] *= visible[:, parent]
+ mask = (~visible).clone()
+
+ return mask
+
+ def __call__(self, keypoints):
+ keypoints += self.get_bias() + self.get_jitter() + self.get_lfhp()
+ return keypoints
+
+
+class SMPLAugmentor():
+ noise_scale = 1e-2
+
+ def __init__(self, cfg, augment=True):
+ self.n_frames = cfg.DATASET.SEQLEN
+ self.augment = augment
+
+ def __call__(self, target):
+ if not self.augment:
+ # Only add initial frame augmentation
+ if not 'init_pose' in target:
+ target['init_pose'] = target['pose'][:1] @ self.get_initial_pose_augmentation()
+ return target
+
+ n_frames = target['pose'].shape[0]
+
+ # Global rotation
+ rmat = self.get_global_augmentation()
+ target['pose'][:, 0] = rmat @ target['pose'][:, 0]
+ target['transl'] = (rmat.squeeze() @ target['transl'].T).T
+
+ # Shape
+ shape_noise = self.get_shape_augmentation(n_frames)
+ target['betas'] = target['betas'] + shape_noise
+
+        # Initial-frame mis-prediction noise
+ target['init_pose'] = target['pose'][:1] @ self.get_initial_pose_augmentation()
+
+ return target
+
+ def get_global_augmentation(self, ):
+ """Global coordinate augmentation. Random rotation around y-axis"""
+
+ angle_y = torch.rand(1) * 2 * np.pi * float(self.augment)
+ aa = torch.tensor([0.0, angle_y, 0.0]).float().unsqueeze(0)
+ rmat = transforms.axis_angle_to_matrix(aa)
+
+ return rmat
+
+ def get_shape_augmentation(self, n_frames):
+ """Shape noise modeling."""
+
+ shape_noise = torch.normal(
+ mean=torch.zeros((1, 10)),
+ std=torch.ones((1, 10)) * 0.1 * float(self.augment)).expand(n_frames, 10)
+
+ return shape_noise
+
+ def get_initial_pose_augmentation(self, ):
+ """Initial frame pose noise modeling. Random rotation around all joints."""
+
+ euler = torch.normal(
+ mean=torch.zeros((24, 3)),
+ std=torch.ones((24, 3))
+ ) * self.noise_scale #* float(self.augment)
+ rmat = transforms.axis_angle_to_matrix(euler)
+
+ return rmat.unsqueeze(0)
+
+
+class SequenceAugmentor:
+ """Augment the play speed of the motion sequence"""
+ l_factor = 1.5
+ def __init__(self, l_default):
+ self.l_default = l_default
+
+ def __call__(self, target):
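+        # Sample a random clip length and resample the motion back to the default
+        # length, which effectively changes the playback speed of the sequence.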
+ l = torch.randint(low=int(self.l_default / self.l_factor), high=int(self.l_default * self.l_factor), size=(1, ))
+
+ pose = transforms.matrix_to_rotation_6d(target['pose'])
+ resampled_pose = F.interpolate(
+ pose[:l].permute(1, 2, 0), self.l_default, mode='linear', align_corners=True
+ ).permute(2, 0, 1)
+ resampled_pose = transforms.rotation_6d_to_matrix(resampled_pose)
+
+ transl = target['transl'].unsqueeze(1)
+ resampled_transl = F.interpolate(
+ transl[:l].permute(1, 2, 0), self.l_default, mode='linear', align_corners=True
+ ).squeeze(0).T
+
+ target['pose'] = resampled_pose
+ target['transl'] = resampled_transl
+ target['betas'] = target['betas'][:self.l_default]
+
+ return target
+
+
+class CameraAugmentor:
+ rx_factor = np.pi/8
+ ry_factor = np.pi/4
+ rz_factor = np.pi/8
+
+ pitch_std = np.pi/8
+ pitch_mean = np.pi/36
+ roll_std = np.pi/24
+ t_factor = 1
+
+ tz_scale = 10
+ tz_min = 2
+
+ motion_prob = 0.75
+ interp_noise = 0.2
+
+ def __init__(self, l, w, h, f):
+ self.l = l
+ self.w = w
+ self.h = h
+ self.f = f
+ self.fov_tol = 1.2 * (0.5 ** 0.5)
+
+ def __call__(self, target):
+
+ R, T = self.create_camera(target)
+
+ if np.random.rand() < self.motion_prob:
+ R = self.create_rotation_move(R)
+ T = self.create_translation_move(T)
+
+ return self.apply(target, R, T)
+
+ def create_camera(self, target):
+ """Create the initial frame camera pose"""
+ yaw = np.random.rand() * 2 * np.pi
+ pitch = np.random.normal(scale=self.pitch_std) + self.pitch_mean
+ roll = np.random.normal(scale=self.roll_std)
+
+ yaw_rm = transforms.axis_angle_to_matrix(torch.tensor([[0, yaw, 0]]).float())
+ pitch_rm = transforms.axis_angle_to_matrix(torch.tensor([[pitch, 0, 0]]).float())
+ roll_rm = transforms.axis_angle_to_matrix(torch.tensor([[0, 0, roll]]).float())
+ R = (roll_rm @ pitch_rm @ yaw_rm)
+
+ # Place people in the scene
+ tz = np.random.rand() * self.tz_scale + self.tz_min
+ max_d = self.w * tz / self.f / 2
+ tx = np.random.normal(scale=0.25) * max_d
+ ty = np.random.normal(scale=0.25) * max_d
+ dist = torch.tensor([tx, ty, tz]).float()
+ T = dist - torch.matmul(R, target['transl'][0])
+
+ return R.repeat(self.l, 1, 1), T.repeat(self.l, 1)
+
+ def create_rotation_move(self, R):
+ """Create rotational move for the camera"""
+
+ # Create final camera pose
+ rx = np.random.normal(scale=self.rx_factor)
+ ry = np.random.normal(scale=self.ry_factor)
+ rz = np.random.normal(scale=self.rz_factor)
+ Rf = R[0] @ transforms.axis_angle_to_matrix(torch.tensor([rx, ry, rz]).float())
+
+        # Interpolate between the initial and final poses
+ Rs = torch.stack((R[0], Rf))
+ rs = transforms.matrix_to_rotation_6d(Rs).numpy()
+ rs_move = self.noisy_interpolation(rs)
+ R_move = transforms.rotation_6d_to_matrix(torch.from_numpy(rs_move).float())
+ return R_move
+
+ def create_translation_move(self, T):
+ """Create translational move for the camera"""
+
+ # Create final camera position
+ tx = np.random.normal(scale=self.t_factor)
+ ty = np.random.normal(scale=self.t_factor)
+ tz = np.random.normal(scale=self.t_factor)
+ Ts = np.array([[0, 0, 0], [tx, ty, tz]])
+
+ T_move = self.noisy_interpolation(Ts)
+ T_move = torch.from_numpy(T_move).float()
+ return T_move + T
+
+ def noisy_interpolation(self, data):
+ """Non-linear interpolation with noise"""
+
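+        # Jitter the interior sample positions of a linear ramp so the interpolated
+        # camera trajectory has a slightly non-constant velocity.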
+ dim = data.shape[-1]
+ output = np.zeros((self.l, dim))
+
+ linspace = np.stack([np.linspace(0, 1, self.l) for _ in range(dim)])
+ noise = (linspace[0, 1] - linspace[0, 0]) * self.interp_noise
+ space_noise = np.stack([np.random.uniform(-noise, noise, self.l - 2) for _ in range(dim)])
+
+ linspace[:, 1:-1] = linspace[:, 1:-1] + space_noise
+ for i in range(dim):
+ output[:, i] = np.interp(linspace[i], np.array([0., 1.,]), data[:, i])
+ return output
+
+ def apply(self, target, R, T):
+ target['R'] = R
+ target['T'] = T
+
+ # Recompute the translation
+ transl_cam = torch.matmul(R, target['transl'].unsqueeze(-1)).squeeze(-1)
+ transl_cam = transl_cam + T
+ if transl_cam[..., 2].min() < 0.5: # If the person is too close to the camera
+ transl_cam[..., 2] = transl_cam[..., 2] + (1.0 - transl_cam[..., 2].min())
+
+        # If the subject falls outside the field of view, move the camera back
+ fov = torch.div(transl_cam[..., :2], transl_cam[..., 2:]).abs()
+ if fov.max() > self.fov_tol:
+ t_max = transl_cam[fov.max(1)[0].max(0)[1].item()]
+ z_trg = t_max[:2].abs().max(0)[0] / self.fov_tol
+ pad = z_trg - t_max[2]
+ transl_cam[..., 2] = transl_cam[..., 2] + pad
+
+ target['transl_cam'] = transl_cam
+
+ # Transform world coordinate to camera coordinate
+ target['pose_root'] = target['pose'][:, 0].clone()
+ target['pose'][:, 0] = R @ target['pose'][:, 0] # pose
+ target['init_pose'][:, 0] = R[:1] @ target['init_pose'][:, 0] # init pose
+
+ # Compute angular velocity
+ cam_angvel = transforms.matrix_to_rotation_6d(R[:-1] @ R[1:].transpose(-1, -2))
+ cam_angvel = cam_angvel - torch.tensor([[1, 0, 0, 0, 1, 0]]).to(cam_angvel) # Normalize
+ target['cam_angvel'] = cam_angvel * 3e1 # assume 30-fps
+
+ if 'kp3d' in target:
+ target['kp3d'] = torch.matmul(R, target['kp3d'].transpose(1, 2)).transpose(1, 2) + target['transl_cam'].unsqueeze(1)
+
+ return target
\ No newline at end of file
diff --git a/lib/data/utils/normalizer.py b/lib/data/utils/normalizer.py
new file mode 100644
index 0000000000000000000000000000000000000000..9ea8c049f424b4896c9c473802ff6908328e8b36
--- /dev/null
+++ b/lib/data/utils/normalizer.py
@@ -0,0 +1,105 @@
+import torch
+import numpy as np
+
+from ...utils.imutils import transform_keypoints
+
+class Normalizer:
+ def __init__(self, cfg):
+ pass
+
+ def __call__(self, kp_2d, res, cam_intrinsics, patch_width=224, patch_height=224, bbox=None, mask=None):
+ if bbox is None:
+ bbox = compute_bbox_from_keypoints(kp_2d, do_augment=True, mask=mask)
+
+ out_kp_2d = self.bbox_normalization(kp_2d, bbox, res, patch_width, patch_height)
+ return out_kp_2d, bbox
+
+ def bbox_normalization(self, kp_2d, bbox, res, patch_width, patch_height):
+ to_torch = False
+ if isinstance(kp_2d, torch.Tensor):
+ to_torch = True
+ kp_2d = kp_2d.numpy()
+ bbox = bbox.numpy()
+
+ out_kp_2d = np.zeros_like(kp_2d)
+ for idx in range(len(out_kp_2d)):
+ out_kp_2d[idx] = transform_keypoints(kp_2d[idx], bbox[idx][:3], patch_width, patch_height)[0]
+ out_kp_2d[idx] = normalize_keypoints_to_patch(out_kp_2d[idx], patch_width)
+
+ if to_torch:
+ out_kp_2d = torch.from_numpy(out_kp_2d)
+ bbox = torch.from_numpy(bbox)
+
+ centers = normalize_keypoints_to_image(bbox[:, :2].unsqueeze(1), res).squeeze(1)
+ scale = bbox[:, 2:] * 200 / res.max()
+ location = torch.cat((centers, scale), dim=-1)
+
+ out_kp_2d = out_kp_2d.reshape(out_kp_2d.shape[0], -1)
+ out_kp_2d = torch.cat((out_kp_2d, location), dim=-1)
+ return out_kp_2d
+
+
+def normalize_keypoints_to_patch(kp_2d, crop_size=224, inv=False):
+ # Normalize keypoints between -1, 1
+ if not inv:
+ ratio = 1.0 / crop_size
+ kp_2d = 2.0 * kp_2d * ratio - 1.0
+ else:
+ ratio = 1.0 / crop_size
+ kp_2d = (kp_2d + 1.0)/(2*ratio)
+
+ return kp_2d
+
+
+def normalize_keypoints_to_image(x, res):
+ res = res.to(x.device)
+ scale = res.max(-1)[0].reshape(-1)
+ mean = torch.stack([res[..., 0] / scale, res[..., 1] / scale], dim=-1).to(x.device)
+ x = (2 * x / scale.reshape(*[1 for i in range(len(x.shape[1:]))]) - \
+ mean.reshape(*[1 for i in range(len(x.shape[1:-1]))], -1))
+ return x
+
+
+def compute_bbox_from_keypoints(X, do_augment=False, mask=None):
+ def smooth_bbox(bb):
+ # Smooth bounding box detection
+ import scipy.signal as signal
+ smoothed = np.array([signal.medfilt(param, int(30 / 2)) for param in bb])
+ return smoothed
+
+ def do_augmentation(scale_factor=0.2, trans_factor=0.05):
+ _scaleFactor = np.random.uniform(1.0 - scale_factor, 1.2 + scale_factor)
+ _trans_x = np.random.uniform(-trans_factor, trans_factor)
+ _trans_y = np.random.uniform(-trans_factor, trans_factor)
+
+ return _scaleFactor, _trans_x, _trans_y
+
+ if do_augment:
+ scaleFactor, trans_x, trans_y = do_augmentation()
+ else:
+ scaleFactor, trans_x, trans_y = 1.2, 0.0, 0.0
+
+ if mask is None:
+ bbox = [X[:, :, 0].min(-1)[0], X[:, :, 1].min(-1)[0],
+ X[:, :, 0].max(-1)[0], X[:, :, 1].max(-1)[0]]
+ else:
+ bbox = []
+ for x, _mask in zip(X, mask):
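+            # If too many joints are flagged as occluded, ignore the mask and use all
+            # keypoints to avoid a degenerate bounding box.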
+ if _mask.sum() > 10:
+ _mask[:] = False
+ _bbox = [x[~_mask, 0].min(-1)[0], x[~_mask, 1].min(-1)[0],
+ x[~_mask, 0].max(-1)[0], x[~_mask, 1].max(-1)[0]]
+ bbox.append(_bbox)
+ bbox = torch.tensor(bbox).T
+
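+    # Build a square (cx, cy, scale) box; scale follows the common 200-pixel convention.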
+ cx, cy = [(bbox[2]+bbox[0])/2, (bbox[3]+bbox[1])/2]
+ bbox_w = bbox[2] - bbox[0]
+ bbox_h = bbox[3] - bbox[1]
+ bbox_size = torch.stack((bbox_w, bbox_h)).max(0)[0]
+ scale = bbox_size * scaleFactor
+ bbox = torch.stack((cx + trans_x * scale, cy + trans_y * scale, scale / 200))
+
+ if do_augment:
+ bbox = torch.from_numpy(smooth_bbox(bbox.numpy()))
+
+ return bbox.T
\ No newline at end of file
diff --git a/lib/data_utils/amass_utils.py b/lib/data_utils/amass_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..94e4e5b6dad8772568928272ba97d972837a8dda
--- /dev/null
+++ b/lib/data_utils/amass_utils.py
@@ -0,0 +1,107 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import os.path as osp
+from collections import defaultdict
+
+import torch
+import joblib
+import numpy as np
+from tqdm import tqdm
+from smplx import SMPL
+
+from configs import constants as _C
+from lib.utils.data_utils import map_dmpl_to_smpl, transform_global_coordinate
+
+
+@torch.no_grad()
+def process_amass():
+ target_fps = 30
+
+ _, seqs, _ = next(os.walk(_C.PATHS.AMASS_PTH))
+
+ zup2ydown = torch.Tensor(
+ [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
+ ).unsqueeze(0).float()
+
+ smpl_dict = {'male': SMPL(model_path=_C.BMODEL.FLDR, gender='male'),
+ 'female': SMPL(model_path=_C.BMODEL.FLDR, gender='female'),
+ 'neutral': SMPL(model_path=_C.BMODEL.FLDR)}
+ processed_data = defaultdict(list)
+
+ for seq in (seq_bar := tqdm(sorted(seqs), leave=True)):
+ seq_bar.set_description(f'Dataset: {seq}')
+ seq_fldr = osp.join(_C.PATHS.AMASS_PTH, seq)
+ _, subjs, _ = next(os.walk(seq_fldr))
+
+ for subj in (subj_bar := tqdm(sorted(subjs), leave=False)):
+ subj_bar.set_description(f'Subject: {subj}')
+ subj_fldr = osp.join(seq_fldr, subj)
+ acts = [x for x in os.listdir(subj_fldr) if x.endswith('.npz')]
+
+ for act in (act_bar := tqdm(sorted(acts), leave=False)):
+ act_bar.set_description(f'Action: {act}')
+
+ # Load data
+ fname = osp.join(subj_fldr, act)
+ if fname.endswith('shape.npz') or fname.endswith('stagei.npz'):
+ # Skip shape and stagei files
+ continue
+ data = dict(np.load(fname, allow_pickle=True))
+
+ # Resample data to target_fps
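+                # Keep every k-th frame, where k = round(mocap_framerate / target_fps).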
+ key = [k for k in data.keys() if 'mocap_frame' in k][0]
+ mocap_framerate = data[key]
+ retain_freq = int(mocap_framerate / target_fps + 0.5)
+ num_frames = len(data['poses'][::retain_freq])
+
+ # Skip if the sequence is too short
+ if num_frames < 25: continue
+
+ # Get SMPL groundtruth from MoSh fitting
+ pose = map_dmpl_to_smpl(torch.from_numpy(data['poses'][::retain_freq]).float())
+ transl = torch.from_numpy(data['trans'][::retain_freq]).float()
+ betas = torch.from_numpy(
+ np.repeat(data['betas'][:10][np.newaxis], pose.shape[0], axis=0)).float()
+
+ # Convert Z-up coordinate to Y-down
+ pose, transl = transform_global_coordinate(pose, zup2ydown, transl)
+ pose = pose.reshape(-1, 72)
+
+ # Create SMPL mesh
+ gender = str(data['gender'])
+ if not gender in ['male', 'female', 'neutral']:
+ if 'female' in gender: gender = 'female'
+ elif 'neutral' in gender: gender = 'neutral'
+ elif 'male' in gender: gender = 'male'
+
+ output = smpl_dict[gender](body_pose=pose[:, 3:],
+ global_orient=pose[:, :3],
+ betas=betas,
+ transl=transl)
+ vertices = output.vertices
+
+                # Assume the motion starts at zero height (shift so the lowest body point of frame 0 lies on the ground)
+ init_height = vertices[0].max(0)[0][1]
+ transl[:, 1] = transl[:, 1] + init_height
+ vertices[:, :, 1] = vertices[:, :, 1] - init_height
+
+ # Append data
+ processed_data['pose'].append(pose.numpy())
+ processed_data['betas'].append(betas.numpy())
+ processed_data['transl'].append(transl.numpy())
+ processed_data['vid'].append(np.array([f'{seq}_{subj}_{act}'] * pose.shape[0]))
+
+ for key, val in processed_data.items():
+ processed_data[key] = np.concatenate(val)
+
+ joblib.dump(processed_data, _C.PATHS.AMASS_LABEL)
+ print('\nDone!')
+
+if __name__ == '__main__':
+ out_path = '/'.join(_C.PATHS.AMASS_LABEL.split('/')[:-1])
+ os.makedirs(out_path, exist_ok=True)
+
+ process_amass()
\ No newline at end of file
diff --git a/lib/data_utils/emdb_eval_utils.py b/lib/data_utils/emdb_eval_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..d519cb9d289c68534c84bef9dfa47183e415a67e
--- /dev/null
+++ b/lib/data_utils/emdb_eval_utils.py
@@ -0,0 +1,189 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import pickle
+import joblib
+import argparse
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from lib.models.smpl import SMPL
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.preproc.backbone.utils import process_image
+from lib.utils import transforms
+from lib.utils.imutils import (
+ flip_kp, flip_bbox
+)
+
+dataset = defaultdict(list)
+detection_results_dir = 'dataset/detection_results/EMDB'
+
+def is_dset(emdb_pkl_file, dset):
+ target_dset = 'emdb' + dset
+ with open(emdb_pkl_file, "rb") as f:
+ data = pickle.load(f)
+ return data[target_dset]
+
+@torch.no_grad()
+def preprocess(dset, batch_size):
+
+ tt = lambda x: torch.from_numpy(x).float()
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+ save_pth = osp.join(_C.PATHS.PARSED_DATA, f'emdb_{dset}_vit.pth') # Use ViT feature extractor
+ extractor = FeatureExtractor(device, flip_eval=True, max_batch_size=batch_size)
+
+ all_emdb_pkl_files = sorted(glob(os.path.join(_C.PATHS.EMDB_PTH, "*/*/*_data.pkl")))
+ emdb_sequence_roots = []
+ both = []
+ for emdb_pkl_file in all_emdb_pkl_files:
+ if is_dset(emdb_pkl_file, dset):
+ emdb_sequence_roots.append(os.path.dirname(emdb_pkl_file))
+
+ smpl = {
+ 'neutral': SMPL(model_path=_C.BMODEL.FLDR),
+ 'male': SMPL(model_path=_C.BMODEL.FLDR, gender='male'),
+ 'female': SMPL(model_path=_C.BMODEL.FLDR, gender='female'),
+ }
+
+ for sequence in emdb_sequence_roots:
+ subj, seq = sequence.split('/')[-2:]
+ annot_pth = glob(osp.join(sequence, '*_data.pkl'))[0]
+ annot = pickle.load(open(annot_pth, 'rb'))
+
+ # Get ground truth data
+ gender = annot['gender']
+ masks = annot['good_frames_mask']
+ poses_body = annot["smpl"]["poses_body"]
+ poses_root = annot["smpl"]["poses_root"]
+ betas = np.repeat(annot["smpl"]["betas"].reshape((1, -1)), repeats=annot["n_frames"], axis=0)
+ extrinsics = annot["camera"]["extrinsics"]
+ width, height = annot['camera']['width'], annot['camera']['height']
+ xyxys = annot['bboxes']['bboxes']
+
+        # Map to camera coordinates
+ poses_root_cam = transforms.matrix_to_axis_angle(tt(extrinsics[:, :3, :3]) @ transforms.axis_angle_to_matrix(tt(poses_root)))
+ poses = np.concatenate([poses_root_cam, poses_body], axis=-1)
+
+ pred_kp2d = np.load(osp.join(detection_results_dir, f'{subj}_{seq}.npy'))
+
+ # ======== Extract features ======== #
+ imname_list = sorted(glob(osp.join(sequence, 'images/*')))
+ bboxes, frame_ids, patch_list, features, flipped_features = [], [], [], [], []
+ bar = Bar(f'Load images', fill='#', max=len(imname_list))
+ for idx, (imname, xyxy, mask) in enumerate(zip(imname_list, xyxys, masks)):
+ if not mask: continue
+
+ # ========= Load image ========= #
+ img_rgb = cv2.cvtColor(cv2.imread(imname), cv2.COLOR_BGR2RGB)
+
+ # ========= Load bbox ========= #
+ x1, y1, x2, y2 = xyxy
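+            # Convert the xyxy box to (center_x, center_y, size), where size is the
+            # larger side length divided by 1.1.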
+ bbox = np.array([(x1 + x2)/2., (y1 + y2)/2., max(x2 - x1, y2 - y1) / 1.1])
+
+ # ========= Process image ========= #
+ norm_img, crop_img = process_image(img_rgb, bbox[:2], bbox[2] / 200, 256, 256)
+
+ patch_list.append(torch.from_numpy(norm_img).unsqueeze(0).float())
+ bboxes.append(bbox)
+ frame_ids.append(idx)
+ bar.next()
+
+ patch_list = torch.split(torch.cat(patch_list), batch_size)
+ bboxes = torch.from_numpy(np.stack(bboxes)).float()
+ for i, patch in enumerate(patch_list):
+ bbox = bboxes[i*batch_size:min((i+1)*batch_size, len(frame_ids))].float().cuda()
+ bbox_center = bbox[:, :2]
+ bbox_scale = bbox[:, 2] / 200
+
+ feature = extractor.model(patch.cuda(), encode=True)
+ features.append(feature.cpu())
+
+ flipped_feature = extractor.model(torch.flip(patch, (3, )).cuda(), encode=True)
+ flipped_features.append(flipped_feature.cpu())
+
+ if i == 0:
+ init_patch = patch[[0]].clone()
+
+ features = torch.cat(features)
+ flipped_features = torch.cat(flipped_features)
+ res_h, res_w = img_rgb.shape[:2]
+
+ # ======== Append data ======== #
+ dataset['gender'].append(gender)
+ dataset['bbox'].append(bboxes)
+ dataset['res'].append(torch.tensor([[width, height]]).repeat(len(frame_ids), 1).float())
+ dataset['vid'].append(f'{subj}_{seq}')
+ dataset['pose'].append(tt(poses)[frame_ids])
+ dataset['betas'].append(tt(betas)[frame_ids])
+ dataset['kp2d'].append(tt(pred_kp2d)[frame_ids])
+ dataset['frame_id'].append(torch.from_numpy(np.array(frame_ids)))
+ dataset['cam_poses'].append(tt(extrinsics)[frame_ids])
+ dataset['features'].append(features)
+ dataset['flipped_features'].append(flipped_features)
+
+ # Flipped data
+ dataset['flipped_bbox'].append(
+ torch.from_numpy(flip_bbox(dataset['bbox'][-1].clone().numpy(), res_w, res_h)).float()
+ )
+ dataset['flipped_kp2d'].append(
+ torch.from_numpy(flip_kp(dataset['kp2d'][-1].clone().numpy(), res_w)).float()
+ )
+ # ======== Append data ======== #
+
+ # Pad 1 frame
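+        # Pad each tensor by duplicating its first frame; the eval dataloader later
+        # slices per-frame inputs with [1:], restoring the original length.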
+ for key, val in dataset.items():
+ if isinstance(val[-1], torch.Tensor):
+ dataset[key][-1] = torch.cat((val[-1][:1].clone(), val[-1][:]), dim=0)
+
+ # Initial predictions
+ bbox = bboxes[:1].clone().cuda()
+ bbox_center = bbox[:, :2].clone()
+ bbox_scale = bbox[:, 2].clone() / 200
+ kwargs = {'img_w': torch.tensor(res_w).repeat(1).float().cuda(),
+ 'img_h': torch.tensor(res_h).repeat(1).float().cuda(),
+ 'bbox_center': bbox_center, 'bbox_scale': bbox_scale}
+
+ pred_global_orient, pred_pose, pred_shape, _ = extractor.model(init_patch.cuda(), **kwargs)
+ pred_output = smpl['neutral'].get_output(global_orient=pred_global_orient.cpu(),
+ body_pose=pred_pose.cpu(),
+ betas=pred_shape.cpu(),
+ pose2rot=False)
+ init_kp3d = pred_output.joints
+ init_pose = transforms.matrix_to_axis_angle(torch.cat((pred_global_orient, pred_pose), dim=1))
+
+ dataset['init_kp3d'].append(init_kp3d)
+ dataset['init_pose'].append(init_pose.cpu())
+
+ # Flipped initial predictions
+ bbox_center[:, 0] = res_w - bbox_center[:, 0]
+ pred_global_orient, pred_pose, pred_shape, _ = extractor.model(torch.flip(init_patch, (3, )).cuda(), **kwargs)
+ pred_output = smpl['neutral'].get_output(global_orient=pred_global_orient.cpu(),
+ body_pose=pred_pose.cpu(),
+ betas=pred_shape.cpu(),
+ pose2rot=False)
+ init_kp3d = pred_output.joints
+ init_pose = transforms.matrix_to_axis_angle(torch.cat((pred_global_orient, pred_pose), dim=1))
+
+ dataset['flipped_init_kp3d'].append(init_kp3d)
+ dataset['flipped_init_pose'].append(init_pose.cpu())
+
+ joblib.dump(dataset, save_pth)
+ logger.info(f'==> Done !')
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('-s', '--split', type=str, choices=['1', '2'], help='Data split')
+    parser.add_argument('-b', '--batch_size', type=int, default=128, help='Batch size')
+ args = parser.parse_args()
+
+ preprocess(args.split, args.batch_size)
\ No newline at end of file
diff --git a/lib/data_utils/rich_eval_utils.py b/lib/data_utils/rich_eval_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..f4a174ad2a867159da007e59819a4030910728a7
--- /dev/null
+++ b/lib/data_utils/rich_eval_utils.py
@@ -0,0 +1,69 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import pickle
+import joblib
+import argparse
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from lib.models.smpl import SMPL
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.preproc.backbone.utils import process_image
+from lib.utils import transforms
+from lib.utils.imutils import (
+ flip_kp, flip_bbox
+)
+
+dataset = defaultdict(list)
+detection_results_dir = 'dataset/detection_results/RICH'
+
+def extract_cam_param_xml(xml_path='', dtype=torch.float32):
+
+ import xml.etree.ElementTree as ET
+ tree = ET.parse(xml_path)
+
+ extrinsics_mat = [float(s) for s in tree.find('./CameraMatrix/data').text.split()]
+ intrinsics_mat = [float(s) for s in tree.find('./Intrinsics/data').text.split()]
+ # distortion_vec = [float(s) for s in tree.find('./Distortion/data').text.split()]
+
+ focal_length_x = intrinsics_mat[0]
+ focal_length_y = intrinsics_mat[4]
+ center = torch.tensor([[intrinsics_mat[2], intrinsics_mat[5]]], dtype=dtype)
+
+ rotation = torch.tensor([[extrinsics_mat[0], extrinsics_mat[1], extrinsics_mat[2]],
+ [extrinsics_mat[4], extrinsics_mat[5], extrinsics_mat[6]],
+ [extrinsics_mat[8], extrinsics_mat[9], extrinsics_mat[10]]], dtype=dtype)
+
+ translation = torch.tensor([[extrinsics_mat[3], extrinsics_mat[7], extrinsics_mat[11]]], dtype=dtype)
+
+ # t = -Rc --> c = -R^Tt
+ cam_center = [ -extrinsics_mat[0]*extrinsics_mat[3] - extrinsics_mat[4]*extrinsics_mat[7] - extrinsics_mat[8]*extrinsics_mat[11],
+ -extrinsics_mat[1]*extrinsics_mat[3] - extrinsics_mat[5]*extrinsics_mat[7] - extrinsics_mat[9]*extrinsics_mat[11],
+ -extrinsics_mat[2]*extrinsics_mat[3] - extrinsics_mat[6]*extrinsics_mat[7] - extrinsics_mat[10]*extrinsics_mat[11]]
+
+ cam_center = torch.tensor([cam_center], dtype=dtype)
+
+ return focal_length_x, focal_length_y, center, rotation, translation, cam_center
+
+@torch.no_grad()
+def preprocess(dset, batch_size):
+    # TODO: Preprocessing for the RICH dataset is not implemented yet.
+    raise NotImplementedError('RICH preprocessing is not implemented yet.')
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('-s', '--split', type=str, choices=['1', '2'], help='Data split')
+    parser.add_argument('-b', '--batch_size', type=int, default=128, help='Batch size')
+ args = parser.parse_args()
+
+ preprocess(args.split, args.batch_size)
\ No newline at end of file
diff --git a/lib/data_utils/threedpw_eval_utils.py b/lib/data_utils/threedpw_eval_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..ac981693c3a639fbbfd65af49d76b53c3e44c9e0
--- /dev/null
+++ b/lib/data_utils/threedpw_eval_utils.py
@@ -0,0 +1,185 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import pickle
+import joblib
+import argparse
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from lib.models.smpl import SMPL
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.preproc.backbone.utils import process_image
+from lib.utils import transforms
+from lib.utils.imutils import (
+ flip_kp, flip_bbox
+)
+
+
+dataset = defaultdict(list)
+detection_results_dir = 'dataset/detection_results/3DPW'
+tcmr_annot_pth = 'dataset/parsed_data/TCMR_preproc/3dpw_dset_db.pt'
+
+@torch.no_grad()
+def preprocess(dset, batch_size):
+
+ if dset == 'val': _dset = 'validation'
+ else: _dset = dset
+
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+ save_pth = osp.join(_C.PATHS.PARSED_DATA, f'3pdw_{dset}_vit.pth') # Use ViT feature extractor
+ extractor = FeatureExtractor(device, flip_eval=True, max_batch_size=batch_size)
+
+ tcmr_data = joblib.load(tcmr_annot_pth.replace('dset', dset))
+ smpl_neutral = SMPL(model_path=_C.BMODEL.FLDR)
+
+ annot_file_list, idxs = np.unique(tcmr_data['vid_name'], return_index=True)
+ idxs = idxs.tolist()
+ annot_file_list = [annot_file_list[idxs.index(idx)] for idx in sorted(idxs)]
+ annot_file_list = [osp.join(_C.PATHS.THREEDPW_PTH, 'sequenceFiles', _dset, annot_file[:-2] + '.pkl') for annot_file in annot_file_list]
+ annot_file_list = list(dict.fromkeys(annot_file_list))
+
+ for annot_file in annot_file_list:
+ seq = annot_file.split('/')[-1].split('.')[0]
+
+ data = pickle.load(open(annot_file, 'rb'), encoding='latin1')
+
+ num_people = len(data['poses'])
+ num_frames = len(data['img_frame_ids'])
+ assert (data['poses2d'][0].shape[0] == num_frames)
+
+ K = torch.from_numpy(data['cam_intrinsics']).unsqueeze(0).float()
+
+ for p_id in range(num_people):
+
+ logger.info(f'==> {seq} {p_id}')
+ gender = {'m': 'male', 'f': 'female'}[data['genders'][p_id]]
+
+ # ======== Add TCMR data ======== #
+ vid_name = f'{seq}_{p_id}'
+ tcmr_ids = [i for i, v in enumerate(tcmr_data['vid_name']) if vid_name in v]
+ frame_ids = tcmr_data['frame_id'][tcmr_ids]
+
+ pose = torch.from_numpy(data['poses'][p_id]).float()[frame_ids]
+ shape = torch.from_numpy(data['betas'][p_id][:10]).float().repeat(pose.size(0), 1)
+ pose = torch.from_numpy(tcmr_data['pose'][tcmr_ids]).float() # Camera coordinate
+ cam_poses = torch.from_numpy(data['cam_poses'][frame_ids]).float()
+
+ # ======== Get detection results ======== #
+ fname = f'{seq}_{p_id}.npy'
+ pred_kp2d = torch.from_numpy(
+ np.load(osp.join(detection_results_dir, fname))
+ ).float()[frame_ids]
+ # ======== Get detection results ======== #
+
+ img_paths = sorted(glob(osp.join(_C.PATHS.THREEDPW_PTH, 'imageFiles', seq, '*.jpg')))
+ img_paths = [img_path for i, img_path in enumerate(img_paths) if i in frame_ids]
+ img = cv2.imread(img_paths[0]); res_h, res_w = img.shape[:2]
+ vid_idxs = fname.split('.')[0]
+
+ # ======== Append data ======== #
+ dataset['gender'].append(gender)
+ dataset['vid'].append(vid_idxs)
+ dataset['pose'].append(pose)
+ dataset['betas'].append(shape)
+ dataset['cam_poses'].append(cam_poses)
+ dataset['frame_id'].append(torch.from_numpy(frame_ids))
+ dataset['res'].append(torch.tensor([[res_w, res_h]]).repeat(len(frame_ids), 1).float())
+ dataset['bbox'].append(torch.from_numpy(tcmr_data['bbox'][tcmr_ids].copy()).float())
+ dataset['kp2d'].append(pred_kp2d)
+
+ # Flipped data
+ dataset['flipped_bbox'].append(
+ torch.from_numpy(flip_bbox(dataset['bbox'][-1].clone().numpy(), res_w, res_h)).float()
+ )
+ dataset['flipped_kp2d'].append(
+ torch.from_numpy(flip_kp(dataset['kp2d'][-1].clone().numpy(), res_w)).float()
+ )
+ # ======== Append data ======== #
+
+ # ======== Extract features ======== #
+ patch_list = []
+ bboxes = dataset['bbox'][-1].clone().numpy()
+ bar = Bar(f'Load images', fill='#', max=len(img_paths))
+
+ for img_path, bbox in zip(img_paths, bboxes):
+ img_rgb = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
+ norm_img, crop_img = process_image(img_rgb, bbox[:2], bbox[2] / 200, 256, 256)
+ patch_list.append(torch.from_numpy(norm_img).unsqueeze(0).float())
+ bar.next()
+
+ patch_list = torch.split(torch.cat(patch_list), batch_size)
+ features, flipped_features = [], []
+ for i, patch in enumerate(patch_list):
+ feature = extractor.model(patch.cuda(), encode=True)
+ features.append(feature.cpu())
+
+ flipped_feature = extractor.model(torch.flip(patch, (3, )).cuda(), encode=True)
+ flipped_features.append(flipped_feature.cpu())
+
+ if i == 0:
+ init_patch = patch[[0]].clone()
+
+ features = torch.cat(features)
+ flipped_features = torch.cat(flipped_features)
+ dataset['features'].append(features)
+ dataset['flipped_features'].append(flipped_features)
+ # ======== Extract features ======== #
+
+ # Pad 1 frame
+ for key, val in dataset.items():
+ if isinstance(val[-1], torch.Tensor):
+ dataset[key][-1] = torch.cat((val[-1][:1].clone(), val[-1][:]), dim=0)
+
+ # Initial predictions
+ bbox = torch.from_numpy(bboxes[:1].copy()).float().cuda()
+ bbox_center = bbox[:, :2].clone()
+ bbox_scale = bbox[:, 2].clone() / 200
+ kwargs = {'img_w': torch.tensor(res_w).repeat(1).float().cuda(),
+ 'img_h': torch.tensor(res_h).repeat(1).float().cuda(),
+ 'bbox_center': bbox_center, 'bbox_scale': bbox_scale}
+
+ pred_global_orient, pred_pose, pred_shape, _ = extractor.model(init_patch.cuda(), **kwargs)
+ pred_output = smpl_neutral.get_output(global_orient=pred_global_orient.cpu(),
+ body_pose=pred_pose.cpu(),
+ betas=pred_shape.cpu(),
+ pose2rot=False)
+ init_kp3d = pred_output.joints
+ init_pose = transforms.matrix_to_axis_angle(torch.cat((pred_global_orient, pred_pose), dim=1))
+
+ dataset['init_kp3d'].append(init_kp3d)
+ dataset['init_pose'].append(init_pose.cpu())
+
+ # Flipped initial predictions
+ bbox_center[:, 0] = res_w - bbox_center[:, 0]
+ pred_global_orient, pred_pose, pred_shape, _ = extractor.model(torch.flip(init_patch, (3, )).cuda(), **kwargs)
+ pred_output = smpl_neutral.get_output(global_orient=pred_global_orient.cpu(),
+ body_pose=pred_pose.cpu(),
+ betas=pred_shape.cpu(),
+ pose2rot=False)
+ init_kp3d = pred_output.joints
+ init_pose = transforms.matrix_to_axis_angle(torch.cat((pred_global_orient, pred_pose), dim=1))
+
+ dataset['flipped_init_kp3d'].append(init_kp3d)
+ dataset['flipped_init_pose'].append(init_pose.cpu())
+
+ joblib.dump(dataset, save_pth)
+ logger.info(f'\n ==> Done !')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('-s', '--split', type=str, choices=['val', 'test'], help='Data split')
+    parser.add_argument('-b', '--batch_size', type=int, default=128, help='Batch size')
+ args = parser.parse_args()
+
+ preprocess(args.split, args.batch_size)
\ No newline at end of file
diff --git a/lib/data_utils/threedpw_train_utils.py b/lib/data_utils/threedpw_train_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..637b0e22066f35d55ea57f7d4216e0b09e88db37
--- /dev/null
+++ b/lib/data_utils/threedpw_train_utils.py
@@ -0,0 +1,146 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import pickle
+import joblib
+import argparse
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from lib.models.smpl import SMPL
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.preproc.backbone.utils import process_image
+
+dataset = defaultdict(list)
+detection_results_dir = 'dataset/detection_results/3DPW'
+tcmr_annot_pth = 'dataset/parsed_data/TCMR_preproc/3dpw_train_db.pt'
+
+
+@torch.no_grad()
+def preprocess(batch_size):
+ device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
+ save_pth = osp.join(_C.PATHS.PARSED_DATA, f'3pdw_train_vit.pth') # Use ViT feature extractor
+ extractor = FeatureExtractor(device, flip_eval=True, max_batch_size=batch_size)
+
+ tcmr_data = joblib.load(tcmr_annot_pth)
+
+ annot_file_list, idxs = np.unique(tcmr_data['vid_name'], return_index=True)
+ idxs = idxs.tolist()
+ annot_file_list = [annot_file_list[idxs.index(idx)] for idx in sorted(idxs)]
+ annot_file_list = [osp.join(_C.PATHS.THREEDPW_PTH, 'sequenceFiles', 'train', annot_file[:-2] + '.pkl') for annot_file in annot_file_list]
+ annot_file_list = list(dict.fromkeys(annot_file_list))
+
+ vid_idx = 0
+ for annot_file in annot_file_list:
+ seq = annot_file.split('/')[-1].split('.')[0]
+
+ data = pickle.load(open(annot_file, 'rb'), encoding='latin1')
+
+ num_people = len(data['poses'])
+ num_frames = len(data['img_frame_ids'])
+ assert (data['poses2d'][0].shape[0] == num_frames)
+
+ K = torch.from_numpy(data['cam_intrinsics']).unsqueeze(0).float()
+
+ for p_id in range(num_people):
+
+ logger.info(f'==> {seq} {p_id}')
+ gender = {'m': 'male', 'f': 'female'}[data['genders'][p_id]]
+ smpl_gender = SMPL(model_path=_C.BMODEL.FLDR, gender=gender)
+
+ # ======== Add TCMR data ======== #
+ vid_name = f'{seq}_{p_id}'
+ tcmr_ids = [i for i, v in enumerate(tcmr_data['vid_name']) if vid_name in v]
+ frame_ids = tcmr_data['frame_id'][tcmr_ids]
+
+ pose = torch.from_numpy(data['poses'][p_id]).float()[frame_ids]
+ shape = torch.from_numpy(data['betas'][p_id][:10]).float().repeat(pose.size(0), 1)
+ trans = torch.from_numpy(data['trans'][p_id]).float()[frame_ids]
+ cam_poses = torch.from_numpy(data['cam_poses'][frame_ids]).float()
+
+ # ======== Align the mesh params ======== #
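+            # TCMR poses are already in camera coordinates; recover a camera-space
+            # translation by matching the mean vertex position of the original
+            # world-space mesh (mapped through the camera extrinsics) to the new mesh.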
+ Rc = cam_poses[:, :3, :3]
+ Tc = cam_poses[:, :3, 3]
+ org_output = smpl_gender.get_output(betas=shape, body_pose=pose[:,3:], global_orient=pose[:,:3], transl=trans)
+ org_v0 = (org_output.vertices + org_output.offset.unsqueeze(1)).mean(1)
+ pose = torch.from_numpy(tcmr_data['pose'][tcmr_ids]).float()
+
+ output = smpl_gender.get_output(betas=shape, body_pose=pose[:,3:], global_orient=pose[:,:3])
+ v0 = (output.vertices + output.offset.unsqueeze(1)).mean(1)
+ trans = (Rc @ org_v0.reshape(-1, 3, 1)).reshape(-1, 3) + Tc - v0
+ j3d = output.joints + (output.offset + trans).unsqueeze(1)
+ j2d = torch.div(j3d, j3d[..., 2:])
+ kp2d = torch.matmul(K, j2d.transpose(-1, -2)).transpose(-1, -2)[..., :2]
+ # ======== Align the mesh params ======== #
+
+ # ======== Get detection results ======== #
+ fname = f'{seq}_{p_id}.npy'
+ pred_kp2d = torch.from_numpy(
+ np.load(osp.join(detection_results_dir, fname))
+ ).float()[frame_ids]
+ # ======== Get detection results ======== #
+
+ img_paths = sorted(glob(osp.join(_C.PATHS.THREEDPW_PTH, 'imageFiles', seq, '*.jpg')))
+ img_paths = [img_path for i, img_path in enumerate(img_paths) if i in frame_ids]
+ img = cv2.imread(img_paths[0]); res_h, res_w = img.shape[:2]
+ vid_idxs = torch.from_numpy(np.array([vid_idx] * len(img_paths)).astype(int))
+ vid_idx += 1
+
+ # ======== Append data ======== #
+ dataset['bbox'].append(torch.from_numpy(tcmr_data['bbox'][tcmr_ids].copy()).float())
+ dataset['res'].append(torch.tensor([[res_w, res_h]]).repeat(len(frame_ids), 1).float())
+ dataset['vid'].append(vid_idxs)
+ dataset['pose'].append(pose)
+ dataset['betas'].append(shape)
+ dataset['transl'].append(trans)
+ dataset['kp2d'].append(pred_kp2d)
+ dataset['joints3D'].append(j3d)
+ dataset['joints2D'].append(kp2d)
+ dataset['frame_id'].append(torch.from_numpy(frame_ids))
+ dataset['cam_poses'].append(cam_poses)
+ dataset['gender'].append(torch.tensor([['male','female'].index(gender)]).repeat(len(frame_ids)))
+ # ======== Append data ======== #
+
+ # ======== Extract features ======== #
+ patch_list = []
+ bboxes = dataset['bbox'][-1].clone().numpy()
+ bar = Bar(f'Load images', fill='#', max=len(img_paths))
+
+ for img_path, bbox in zip(img_paths, bboxes):
+ img_rgb = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
+ norm_img, crop_img = process_image(img_rgb, bbox[:2], bbox[2] / 200, 256, 256)
+ patch_list.append(torch.from_numpy(norm_img).unsqueeze(0).float())
+ bar.next()
+
+ patch_list = torch.split(torch.cat(patch_list), batch_size)
+ features = []
+ for i, patch in enumerate(patch_list):
+ pred = extractor.model(patch.cuda(), encode=True)
+ features.append(pred.cpu())
+
+ features = torch.cat(features)
+ dataset['features'].append(features)
+ # ======== Extract features ======== #
+
+ for key in dataset.keys():
+ dataset[key] = torch.cat(dataset[key])
+
+ joblib.dump(dataset, save_pth)
+ logger.info(f'\n ==> Done !')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+    parser.add_argument('-b', '--batch_size', type=int, default=128, help='Batch size')
+ args = parser.parse_args()
+
+ preprocess(args.batch_size)
\ No newline at end of file
diff --git a/lib/eval/eval_utils.py b/lib/eval/eval_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..270491c09c942c3aa47dfc53224f050dea91353c
--- /dev/null
+++ b/lib/eval/eval_utils.py
@@ -0,0 +1,482 @@
+# Some functions are borrowed from https://github.com/akanazawa/human_dynamics/blob/master/src/evaluation/eval_util.py
+# Adhere to their license when using these functions
+from pathlib import Path
+
+import torch
+import numpy as np
+from matplotlib import pyplot as plt
+
+
+def compute_accel(joints):
+ """
+ Computes acceleration of 3D joints.
+ Args:
+ joints (Nx25x3).
+ Returns:
+ Accelerations (N-2).
+ """
+ velocities = joints[1:] - joints[:-1]
+ acceleration = velocities[1:] - velocities[:-1]
+ acceleration_normed = np.linalg.norm(acceleration, axis=2)
+ return np.mean(acceleration_normed, axis=1)
+
+
+def compute_error_accel(joints_gt, joints_pred, vis=None):
+ """
+    Computes the acceleration error:
+        1/(N-2) \sum_{i=1}^{N-2} || a_gt(i) - a_pred(i) ||_2, where a(i) = X_{i-1} - 2 X_i + X_{i+1}.
+    Note that for each frame that is not visible, three entries in the
+    acceleration error should be zeroed out.
+ Args:
+ joints_gt (Nx14x3).
+ joints_pred (Nx14x3).
+ vis (N).
+ Returns:
+ error_accel (N-2).
+ """
+ # (N-2)x14x3
+ accel_gt = joints_gt[:-2] - 2 * joints_gt[1:-1] + joints_gt[2:]
+ accel_pred = joints_pred[:-2] - 2 * joints_pred[1:-1] + joints_pred[2:]
+
+ normed = np.linalg.norm(accel_pred - accel_gt, axis=2)
+
+ if vis is None:
+ new_vis = np.ones(len(normed), dtype=bool)
+ else:
+ invis = np.logical_not(vis)
+ invis1 = np.roll(invis, -1)
+ invis2 = np.roll(invis, -2)
+ new_invis = np.logical_or(invis, np.logical_or(invis1, invis2))[:-2]
+ new_vis = np.logical_not(new_invis)
+
+ return np.mean(normed[new_vis], axis=1)
+
+
+def compute_error_verts(pred_verts, target_verts=None, target_theta=None):
+ """
+    Computes the mean per-vertex error over the 6890 SMPL surface vertices.
+    Args:
+        pred_verts (Nx6890x3).
+        target_verts (Nx6890x3), or target_theta (Nx85) from which they are regressed.
+ Returns:
+ error_verts (N).
+ """
+
+ if target_verts is None:
+ from lib.models.smpl import SMPL_MODEL_DIR
+ from lib.models.smpl import SMPL
+ device = 'cpu'
+ smpl = SMPL(
+ SMPL_MODEL_DIR,
+ batch_size=1, # target_theta.shape[0],
+ ).to(device)
+
+ betas = torch.from_numpy(target_theta[:,75:]).to(device)
+ pose = torch.from_numpy(target_theta[:,3:75]).to(device)
+
+ target_verts = []
+ b_ = torch.split(betas, 5000)
+ p_ = torch.split(pose, 5000)
+
+ for b,p in zip(b_,p_):
+ output = smpl(betas=b, body_pose=p[:, 3:], global_orient=p[:, :3], pose2rot=True)
+ target_verts.append(output.vertices.detach().cpu().numpy())
+
+ target_verts = np.concatenate(target_verts, axis=0)
+
+ assert len(pred_verts) == len(target_verts)
+ error_per_vert = np.sqrt(np.sum((target_verts - pred_verts) ** 2, axis=2))
+ return np.mean(error_per_vert, axis=1)
+
+
+def compute_similarity_transform(S1, S2):
+ '''
+ Computes a similarity transform (sR, t) that takes
+ a set of 3D points S1 (3 x N) closest to a set of 3D points S2,
+    where R is a 3x3 rotation matrix, t is a 3x1 translation, and s is a scale factor,
+    i.e. solves the orthogonal Procrustes problem.
+ '''
+ transposed = False
+ if S1.shape[0] != 3 and S1.shape[0] != 2:
+ S1 = S1.T
+ S2 = S2.T
+ transposed = True
+ assert(S2.shape[1] == S1.shape[1])
+
+ # 1. Remove mean.
+ mu1 = S1.mean(axis=1, keepdims=True)
+ mu2 = S2.mean(axis=1, keepdims=True)
+ X1 = S1 - mu1
+ X2 = S2 - mu2
+
+ # 2. Compute variance of X1 used for scale.
+ var1 = np.sum(X1**2)
+
+ # 3. The outer product of X1 and X2.
+ K = X1.dot(X2.T)
+
+ # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
+ # singular vectors of K.
+ U, s, Vh = np.linalg.svd(K)
+ V = Vh.T
+ # Construct Z that fixes the orientation of R to get det(R)=1.
+ Z = np.eye(U.shape[0])
+ Z[-1, -1] *= np.sign(np.linalg.det(U.dot(V.T)))
+ # Construct R.
+ R = V.dot(Z.dot(U.T))
+
+ # 5. Recover scale.
+ scale = np.trace(R.dot(K)) / var1
+
+ # 6. Recover translation.
+ t = mu2 - scale*(R.dot(mu1))
+
+ # 7. Error:
+ S1_hat = scale*R.dot(S1) + t
+
+ if transposed:
+ S1_hat = S1_hat.T
+
+ return S1_hat
+
+
+def compute_similarity_transform_torch(S1, S2):
+ '''
+ Computes a similarity transform (sR, t) that takes
+ a set of 3D points S1 (3 x N) closest to a set of 3D points S2,
+    where R is a 3x3 rotation matrix, t is a 3x1 translation, and s is a scale factor,
+    i.e. solves the orthogonal Procrustes problem.
+ '''
+ transposed = False
+ if S1.shape[0] != 3 and S1.shape[0] != 2:
+ S1 = S1.T
+ S2 = S2.T
+ transposed = True
+ assert (S2.shape[1] == S1.shape[1])
+
+ # 1. Remove mean.
+ mu1 = S1.mean(axis=1, keepdims=True)
+ mu2 = S2.mean(axis=1, keepdims=True)
+ X1 = S1 - mu1
+ X2 = S2 - mu2
+
+ # print('X1', X1.shape)
+
+ # 2. Compute variance of X1 used for scale.
+ var1 = torch.sum(X1 ** 2)
+
+ # print('var', var1.shape)
+
+ # 3. The outer product of X1 and X2.
+ K = X1.mm(X2.T)
+
+ # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
+ # singular vectors of K.
+ U, s, V = torch.svd(K)
+ # V = Vh.T
+ # Construct Z that fixes the orientation of R to get det(R)=1.
+ Z = torch.eye(U.shape[0], device=S1.device)
+ Z[-1, -1] *= torch.sign(torch.det(U @ V.T))
+ # Construct R.
+ R = V.mm(Z.mm(U.T))
+
+ # print('R', X1.shape)
+
+ # 5. Recover scale.
+ scale = torch.trace(R.mm(K)) / var1
+ # print(R.shape, mu1.shape)
+ # 6. Recover translation.
+ t = mu2 - scale * (R.mm(mu1))
+ # print(t.shape)
+
+ # 7. Error:
+ S1_hat = scale * R.mm(S1) + t
+
+ if transposed:
+ S1_hat = S1_hat.T
+
+ return S1_hat
+
+
+def batch_compute_similarity_transform_torch(S1, S2):
+ '''
+ Computes a similarity transform (sR, t) that takes
+ a set of 3D points S1 (3 x N) closest to a set of 3D points S2,
+    where R is a 3x3 rotation matrix, t is a 3x1 translation, and s is a scale factor,
+    i.e. solves the orthogonal Procrustes problem.
+ '''
+ transposed = False
+ if S1.shape[0] != 3 and S1.shape[0] != 2:
+ S1 = S1.permute(0,2,1)
+ S2 = S2.permute(0,2,1)
+ transposed = True
+ assert(S2.shape[1] == S1.shape[1])
+
+ # 1. Remove mean.
+ mu1 = S1.mean(axis=-1, keepdims=True)
+ mu2 = S2.mean(axis=-1, keepdims=True)
+
+ X1 = S1 - mu1
+ X2 = S2 - mu2
+
+ # 2. Compute variance of X1 used for scale.
+ var1 = torch.sum(X1**2, dim=1).sum(dim=1)
+
+ # 3. The outer product of X1 and X2.
+ K = X1.bmm(X2.permute(0,2,1))
+
+ # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
+ # singular vectors of K.
+ U, s, V = torch.svd(K)
+
+ # Construct Z that fixes the orientation of R to get det(R)=1.
+ Z = torch.eye(U.shape[1], device=S1.device).unsqueeze(0)
+ Z = Z.repeat(U.shape[0],1,1)
+ Z[:,-1, -1] *= torch.sign(torch.det(U.bmm(V.permute(0,2,1))))
+
+ # Construct R.
+ R = V.bmm(Z.bmm(U.permute(0,2,1)))
+
+ # 5. Recover scale.
+ scale = torch.cat([torch.trace(x).unsqueeze(0) for x in R.bmm(K)]) / var1
+
+ # 6. Recover translation.
+ t = mu2 - (scale.unsqueeze(-1).unsqueeze(-1) * (R.bmm(mu1)))
+
+ # 7. Error:
+ S1_hat = scale.unsqueeze(-1).unsqueeze(-1) * R.bmm(S1) + t
+
+ if transposed:
+ S1_hat = S1_hat.permute(0,2,1)
+
+ return S1_hat
+
+
+def align_by_pelvis(joints):
+ """
+    Assumes joints is 14 x 3 in LSP order, so the hip indices are [3, 2].
+    Takes the midpoint of the two hips and subtracts it from all joints.
+ """
+
+ left_id = 2
+ right_id = 3
+
+ pelvis = (joints[left_id, :] + joints[right_id, :]) / 2.0
+ return joints - np.expand_dims(pelvis, axis=0)
+
+
+def compute_errors(gt3ds, preds):
+ """
+ Gets MPJPE after pelvis alignment + MPJPE after Procrustes.
+ Evaluates on the 14 common joints.
+ Inputs:
+ - gt3ds: N x 14 x 3
+ - preds: N x 14 x 3
+ """
+ errors, errors_pa = [], []
+ for i, (gt3d, pred) in enumerate(zip(gt3ds, preds)):
+ gt3d = gt3d.reshape(-1, 3)
+ # Root align.
+ gt3d = align_by_pelvis(gt3d)
+ pred3d = align_by_pelvis(pred)
+
+ joint_error = np.sqrt(np.sum((gt3d - pred3d)**2, axis=1))
+ errors.append(np.mean(joint_error))
+
+ # Get PA error.
+ pred3d_sym = compute_similarity_transform(pred3d, gt3d)
+ pa_error = np.sqrt(np.sum((gt3d - pred3d_sym)**2, axis=1))
+ errors_pa.append(np.mean(pa_error))
+
+ return errors, errors_pa
+
+
+def batch_align_by_pelvis(data_list, pelvis_idxs):
+ """
+ Assumes data is given as [pred_j3d, target_j3d, pred_verts, target_verts].
+    Each element has shape (frames, num_points, 3).
+    The pelvis is specified by one or two joint indices.
+    Aligns all data to the corresponding pelvis location.
+ """
+
+ pred_j3d, target_j3d, pred_verts, target_verts = data_list
+
+ pred_pelvis = pred_j3d[:, pelvis_idxs].mean(dim=1, keepdims=True).clone()
+ target_pelvis = target_j3d[:, pelvis_idxs].mean(dim=1, keepdims=True).clone()
+
+ # Align to the pelvis
+ pred_j3d = pred_j3d - pred_pelvis
+ target_j3d = target_j3d - target_pelvis
+ pred_verts = pred_verts - pred_pelvis
+ target_verts = target_verts - target_pelvis
+
+ return (pred_j3d, target_j3d, pred_verts, target_verts)
+
+def compute_jpe(S1, S2):
+ return torch.sqrt(((S1 - S2) ** 2).sum(dim=-1)).mean(dim=-1).numpy()
+
+
+# The functions below are borrowed from SLAHMR official implementation.
+# Reference: https://github.com/vye16/slahmr/blob/main/slahmr/eval/tools.py
+def global_align_joints(gt_joints, pred_joints):
+ """
+ :param gt_joints (T, J, 3)
+ :param pred_joints (T, J, 3)
+ """
+ s_glob, R_glob, t_glob = align_pcl(
+ gt_joints.reshape(-1, 3), pred_joints.reshape(-1, 3)
+ )
+ pred_glob = (
+ s_glob * torch.einsum("ij,tnj->tni", R_glob, pred_joints) + t_glob[None, None]
+ )
+ return pred_glob
+
+
+def first_align_joints(gt_joints, pred_joints):
+ """
+ align the first two frames
+ :param gt_joints (T, J, 3)
+ :param pred_joints (T, J, 3)
+ """
+ # (1, 1), (1, 3, 3), (1, 3)
+ s_first, R_first, t_first = align_pcl(
+ gt_joints[:2].reshape(1, -1, 3), pred_joints[:2].reshape(1, -1, 3)
+ )
+ pred_first = (
+ s_first * torch.einsum("tij,tnj->tni", R_first, pred_joints) + t_first[:, None]
+ )
+ return pred_first
+
+
+def local_align_joints(gt_joints, pred_joints):
+ """
+ :param gt_joints (T, J, 3)
+ :param pred_joints (T, J, 3)
+ """
+ s_loc, R_loc, t_loc = align_pcl(gt_joints, pred_joints)
+ pred_loc = (
+ s_loc[:, None] * torch.einsum("tij,tnj->tni", R_loc, pred_joints)
+ + t_loc[:, None]
+ )
+ return pred_loc
+
+
+def align_pcl(Y, X, weight=None, fixed_scale=False):
+ """align similarity transform to align X with Y using umeyama method
+ X' = s * R * X + t is aligned with Y
+ :param Y (*, N, 3) first trajectory
+ :param X (*, N, 3) second trajectory
+ :param weight (*, N, 1) optional weight of valid correspondences
+ :returns s (*, 1), R (*, 3, 3), t (*, 3)
+ """
+ *dims, N, _ = Y.shape
+ N = torch.ones(*dims, 1, 1) * N
+
+ if weight is not None:
+ Y = Y * weight
+ X = X * weight
+ N = weight.sum(dim=-2, keepdim=True) # (*, 1, 1)
+
+ # subtract mean
+ my = Y.sum(dim=-2) / N[..., 0] # (*, 3)
+ mx = X.sum(dim=-2) / N[..., 0]
+ y0 = Y - my[..., None, :] # (*, N, 3)
+ x0 = X - mx[..., None, :]
+
+ if weight is not None:
+ y0 = y0 * weight
+ x0 = x0 * weight
+
+ # correlation
+ C = torch.matmul(y0.transpose(-1, -2), x0) / N # (*, 3, 3)
+ U, D, Vh = torch.linalg.svd(C) # (*, 3, 3), (*, 3), (*, 3, 3)
+
+ S = torch.eye(3).reshape(*(1,) * (len(dims)), 3, 3).repeat(*dims, 1, 1)
+ neg = torch.det(U) * torch.det(Vh.transpose(-1, -2)) < 0
+ S[neg, 2, 2] = -1
+
+ R = torch.matmul(U, torch.matmul(S, Vh)) # (*, 3, 3)
+
+ D = torch.diag_embed(D) # (*, 3, 3)
+ if fixed_scale:
+ s = torch.ones(*dims, 1, device=Y.device, dtype=torch.float32)
+ else:
+ var = torch.sum(torch.square(x0), dim=(-1, -2), keepdim=True) / N # (*, 1, 1)
+ s = (
+ torch.diagonal(torch.matmul(D, S), dim1=-2, dim2=-1).sum(
+ dim=-1, keepdim=True
+ )
+ / var[..., 0]
+ ) # (*, 1)
+
+ t = my - s * torch.matmul(R, mx[..., None])[..., 0] # (*, 3)
+
+ return s, R, t
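+
+# Illustrative usage (hypothetical tensor names): align a flattened predicted trajectory
+# to the ground truth, as global_align_joints / first_align_joints above do for (T, J, 3) inputs:
+#   s, R, t = align_pcl(gt_joints.reshape(-1, 3), pred_joints.reshape(-1, 3))
+#   pred_aligned = s * torch.einsum("ij,nj->ni", R, pred_joints.reshape(-1, 3)) + t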
+
+
+def compute_foot_sliding(target_output, pred_output, masks, thr=1e-2):
+ """compute foot sliding error
+    The foot-ground contact label is computed using a threshold of 1 cm/frame.
+ Args:
+ target_output (SMPL ModelOutput).
+ pred_output (SMPL ModelOutput).
+ masks (N).
+ Returns:
+ error (N frames in contact).
+ """
+
+ # Foot vertices idxs
+ foot_idxs = [3216, 3387, 6617, 6787]
+
+ # Compute contact label
+ foot_loc = target_output.vertices[masks][:, foot_idxs]
+ foot_disp = (foot_loc[1:] - foot_loc[:-1]).norm(2, dim=-1)
+ contact = foot_disp[:] < thr
+
+ pred_feet_loc = pred_output.vertices[:, foot_idxs]
+ pred_disp = (pred_feet_loc[1:] - pred_feet_loc[:-1]).norm(2, dim=-1)
+
+ error = pred_disp[contact]
+
+ return error.cpu().numpy()
+
+
+def compute_jitter(pred_output, fps=30):
+ """compute jitter of the motion
+ Args:
+ pred_output (SMPL ModelOutput).
+ fps (float).
+ Returns:
+ jitter (N-3).
+ """
+
+ pred3d = pred_output.joints[:, :24]
+
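+    # Third-order finite difference of the joint trajectories (approximately jerk),
+    # scaled by fps^3 to convert per-frame differences into per-second units.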
+ pred_jitter = torch.norm(
+ (pred3d[3:] - 3 * pred3d[2:-1] + 3 * pred3d[1:-2] - pred3d[:-3]) * (fps**3),
+ dim=2,
+ ).mean(dim=-1)
+
+ return pred_jitter.cpu().numpy() / 10.0
+
+
+def compute_rte(target_trans, pred_trans):
+ # Compute the global alignment
+ _, rot, trans = align_pcl(target_trans[None, :], pred_trans[None, :], fixed_scale=True)
+ pred_trans_hat = (
+ torch.einsum("tij,tnj->tni", rot, pred_trans[None, :]) + trans[None, :]
+ )[0]
+
+ # Compute the entire displacement of ground truth trajectory
+ disps, disp = [], 0
+ for p1, p2 in zip(target_trans, target_trans[1:]):
+ delta = (p2 - p1).norm(2, dim=-1)
+ disp += delta
+ disps.append(disp)
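+    # Note: only the final cumulative displacement `disp` (total path length) is used below.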
+
+ # Compute absolute root-translation-error (RTE)
+ rte = torch.norm(target_trans - pred_trans_hat, 2, dim=-1)
+
+ # Normalize it to the displacement
+ return (rte / disp).numpy()
\ No newline at end of file
diff --git a/lib/eval/evaluate_3dpw.py b/lib/eval/evaluate_3dpw.py
new file mode 100644
index 0000000000000000000000000000000000000000..36736288f5c02d128d17fffc1598606ada009631
--- /dev/null
+++ b/lib/eval/evaluate_3dpw.py
@@ -0,0 +1,181 @@
+import os
+import time
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import torch
+import imageio
+import numpy as np
+from smplx import SMPL
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from configs.config import parse_args
+from lib.data.dataloader import setup_eval_dataloader
+from lib.models import build_network, build_body_model
+from lib.eval.eval_utils import (
+ compute_error_accel,
+ batch_align_by_pelvis,
+ batch_compute_similarity_transform_torch,
+)
+from lib.utils import transforms
+from lib.utils.utils import prepare_output_dir
+from lib.utils.utils import prepare_batch
+from lib.utils.imutils import avg_preds
+
+try:
+ from lib.vis.renderer import Renderer
+ _render = True
+except Exception:
+ print("PyTorch3D is not properly installed! Cannot render the SMPL mesh")
+ _render = False
+
+
+m2mm = 1e3
+@torch.no_grad()
+def main(cfg, args):
+ torch.backends.cuda.matmul.allow_tf32 = False
+ torch.backends.cudnn.allow_tf32 = False
+
+ logger.info(f'GPU name -> {torch.cuda.get_device_name()}')
+ logger.info(f'GPU feat -> {torch.cuda.get_device_properties("cuda")}')
+
+ # ========= Dataloaders ========= #
+ eval_loader = setup_eval_dataloader(cfg, '3dpw', 'test', cfg.MODEL.BACKBONE)
+ logger.info(f'Dataset loaded')
+
+ # ========= Load WHAM ========= #
+ smpl_batch_size = cfg.TRAIN.BATCH_SIZE * cfg.DATASET.SEQLEN
+ smpl = build_body_model(cfg.DEVICE, smpl_batch_size)
+ network = build_network(cfg, smpl)
+ network.eval()
+
+ # Build SMPL models with each gender
+ smpl = {k: SMPL(_C.BMODEL.FLDR, gender=k).to(cfg.DEVICE) for k in ['male', 'female', 'neutral']}
+
+ # Load vertices -> joints regression matrix to evaluate
+ J_regressor_eval = torch.from_numpy(
+ np.load(_C.BMODEL.JOINTS_REGRESSOR_H36M)
+ )[_C.KEYPOINTS.H36M_TO_J14, :].unsqueeze(0).float().to(cfg.DEVICE)
+ pelvis_idxs = [2, 3]
+
+ accumulator = defaultdict(list)
+ bar = Bar('Inference', fill='#', max=len(eval_loader))
+ with torch.no_grad():
+ for i in range(len(eval_loader)):
+ # Original batch
+ batch = eval_loader.dataset.load_data(i, False)
+ x, inits, features, kwargs, gt = prepare_batch(batch, cfg.DEVICE, cfg.TRAIN.STAGE=='stage2')
+
+ if cfg.FLIP_EVAL:
+ flipped_batch = eval_loader.dataset.load_data(i, True)
+ f_x, f_inits, f_features, f_kwargs, _ = prepare_batch(flipped_batch, cfg.DEVICE, cfg.TRAIN.STAGE=='stage2')
+
+ # Forward pass with flipped input
+ flipped_pred = network(f_x, f_inits, f_features, **f_kwargs)
+
+ # Forward pass with normal input
+ pred = network(x, inits, features, **kwargs)
+
+ if cfg.FLIP_EVAL:
+ # Merge two predictions
+ flipped_pose, flipped_shape = flipped_pred['pose'].squeeze(0), flipped_pred['betas'].squeeze(0)
+ pose, shape = pred['pose'].squeeze(0), pred['betas'].squeeze(0)
+ flipped_pose, pose = flipped_pose.reshape(-1, 24, 6), pose.reshape(-1, 24, 6)
+ avg_pose, avg_shape = avg_preds(pose, shape, flipped_pose, flipped_shape)
+ avg_pose = avg_pose.reshape(-1, 144)
+
+ # Refine trajectory with merged prediction
+ network.pred_pose = avg_pose.view_as(network.pred_pose)
+ network.pred_shape = avg_shape.view_as(network.pred_shape)
+ pred = network.forward_smpl(**kwargs)
+
+ # <======= Build predicted SMPL
+ pred_output = smpl['neutral'](body_pose=pred['poses_body'],
+ global_orient=pred['poses_root_cam'],
+ betas=pred['betas'].squeeze(0),
+ pose2rot=False)
+ pred_verts = pred_output.vertices.cpu()
+ pred_j3d = torch.matmul(J_regressor_eval, pred_output.vertices).cpu()
+ # =======>
+
+ # <======= Build groundtruth SMPL
+ target_output = smpl[batch['gender']](
+ body_pose=transforms.rotation_6d_to_matrix(gt['pose'][0, :, 1:]),
+ global_orient=transforms.rotation_6d_to_matrix(gt['pose'][0, :, :1]),
+ betas=gt['betas'][0],
+ pose2rot=False)
+ target_verts = target_output.vertices.cpu()
+ target_j3d = torch.matmul(J_regressor_eval, target_output.vertices).cpu()
+ # =======>
+
+ # <======= Compute performance of the current sequence
+ pred_j3d, target_j3d, pred_verts, target_verts = batch_align_by_pelvis(
+ [pred_j3d, target_j3d, pred_verts, target_verts], pelvis_idxs
+ )
+ S1_hat = batch_compute_similarity_transform_torch(pred_j3d, target_j3d)
+ pa_mpjpe = torch.sqrt(((S1_hat - target_j3d) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ mpjpe = torch.sqrt(((pred_j3d - target_j3d) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ pve = torch.sqrt(((pred_verts - target_verts) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ accel = compute_error_accel(joints_pred=pred_j3d, joints_gt=target_j3d)[1:-1]
+            accel = accel * (30 ** 2) # per frame^2 to per s^2
+ # =======>
+
+ summary_string = f'{batch["vid"]} | PA-MPJPE: {pa_mpjpe.mean():.1f} MPJPE: {mpjpe.mean():.1f} PVE: {pve.mean():.1f}'
+ bar.suffix = summary_string
+ bar.next()
+
+ # <======= Accumulate the results over entire sequences
+ accumulator['pa_mpjpe'].append(pa_mpjpe)
+ accumulator['mpjpe'].append(mpjpe)
+ accumulator['pve'].append(pve)
+ accumulator['accel'].append(accel)
+ # =======>
+
+ # <======= (Optional) Render the prediction
+ if not (_render and args.render):
+ # Skip if PyTorch3D is not installed or rendering argument is not parsed.
+ continue
+
+ # Save path
+ viz_pth = osp.join('output', 'visualization')
+ os.makedirs(viz_pth, exist_ok=True)
+
+ # Build Renderer
+ width, height = batch['cam_intrinsics'][0][0, :2, -1].numpy() * 2
+ focal_length = batch['cam_intrinsics'][0][0, 0, 0].item()
+ renderer = Renderer(width, height, focal_length, cfg.DEVICE, smpl['neutral'].faces)
+
+ # Get images and writer
+ frame_list = batch['frame_id'][0].numpy()
+ imname_list = sorted(glob(osp.join(_C.PATHS.THREEDPW_PTH, 'imageFiles', batch['vid'][:-2], '*.jpg')))
+ writer = imageio.get_writer(osp.join(viz_pth, batch['vid'] + '.mp4'),
+ mode='I', format='FFMPEG', fps=30, macro_block_size=1)
+
+ # Skip the invalid frames
+            for j, frame in enumerate(frame_list):
+                image = imageio.imread(imname_list[frame])
+
+                # NOTE: pred['verts'] differs from pred_verts as we subtracted the offset from the SMPL mesh.
+                # Check line 121 in lib/models/smpl.py
+                vertices = pred['verts_cam'][j] + pred['trans_cam'][[j]]
+ image = renderer.render_mesh(vertices, image)
+ writer.append_data(image)
+ writer.close()
+ # =======>
+
+ for k, v in accumulator.items():
+ accumulator[k] = np.concatenate(v).mean()
+
+ print('')
+ log_str = 'Evaluation on 3DPW, '
+ log_str += ' '.join([f'{k.upper()}: {v:.4f},'for k,v in accumulator.items()])
+ logger.info(log_str)
+
+if __name__ == '__main__':
+ cfg, cfg_file, args = parse_args(test=True)
+ cfg = prepare_output_dir(cfg, cfg_file)
+
+ main(cfg, args)
\ No newline at end of file
diff --git a/lib/eval/evaluate_emdb.py b/lib/eval/evaluate_emdb.py
new file mode 100644
index 0000000000000000000000000000000000000000..09a0b7be239c6d13f03dd3e00a621aa9245e3da8
--- /dev/null
+++ b/lib/eval/evaluate_emdb.py
@@ -0,0 +1,228 @@
+import os
+import time
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import torch
+import pickle
+import numpy as np
+from smplx import SMPL
+from loguru import logger
+from progress.bar import Bar
+
+from configs import constants as _C
+from configs.config import parse_args
+from lib.data.dataloader import setup_eval_dataloader
+from lib.models import build_network, build_body_model
+from lib.eval.eval_utils import (
+ compute_jpe,
+ compute_rte,
+ compute_jitter,
+ compute_error_accel,
+ compute_foot_sliding,
+ batch_align_by_pelvis,
+ first_align_joints,
+ global_align_joints,
+    batch_compute_similarity_transform_torch,
+)
+from lib.utils import transforms
+from lib.utils.utils import prepare_output_dir
+from lib.utils.utils import prepare_batch
+from lib.utils.imutils import avg_preds
+
+"""
+This is a tentative script to evaluate WHAM on the EMDB dataset.
+The current implementation requires the EMDB dataset to be downloaded at ./datasets/EMDB/
+"""
+
+m2mm = 1e3
+@torch.no_grad()
+def main(cfg, args):
+ torch.backends.cuda.matmul.allow_tf32 = False
+ torch.backends.cudnn.allow_tf32 = False
+
+ logger.info(f'GPU name -> {torch.cuda.get_device_name()}')
+ logger.info(f'GPU feat -> {torch.cuda.get_device_properties("cuda")}')
+
+ # ========= Dataloaders ========= #
+ eval_loader = setup_eval_dataloader(cfg, 'emdb', args.eval_split, cfg.MODEL.BACKBONE)
+ logger.info(f'Dataset loaded')
+
+ # ========= Load WHAM ========= #
+ smpl_batch_size = cfg.TRAIN.BATCH_SIZE * cfg.DATASET.SEQLEN
+ smpl = build_body_model(cfg.DEVICE, smpl_batch_size)
+ network = build_network(cfg, smpl)
+ network.eval()
+
+ # Build SMPL models with each gender
+ smpl = {k: SMPL(_C.BMODEL.FLDR, gender=k).to(cfg.DEVICE) for k in ['male', 'female', 'neutral']}
+
+ # Load vertices -> joints regression matrix to evaluate
+ pelvis_idxs = [1, 2]
+
+ # WHAM uses Y-down coordinate system, while EMDB dataset uses Y-up one.
+ yup2ydown = transforms.axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float().to(cfg.DEVICE)
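+    # (A 180-degree rotation about the x-axis converts between the Y-up and Y-down conventions.)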
+
+ # To torch tensor function
+ tt = lambda x: torch.from_numpy(x).float().to(cfg.DEVICE)
+ accumulator = defaultdict(list)
+ bar = Bar('Inference', fill='#', max=len(eval_loader))
+ with torch.no_grad():
+ for i in range(len(eval_loader)):
+ # Original batch
+ batch = eval_loader.dataset.load_data(i, False)
+ x, inits, features, kwargs, gt = prepare_batch(batch, cfg.DEVICE, cfg.TRAIN.STAGE == 'stage2')
+
+ # Align with groundtruth data to the first frame
+ cam2yup = batch['R'][0][:1].to(cfg.DEVICE)
+ cam2ydown = cam2yup @ yup2ydown
+ cam2root = transforms.rotation_6d_to_matrix(inits[1][:, 0, 0])
+ ydown2root = cam2ydown.mT @ cam2root
+ ydown2root = transforms.matrix_to_rotation_6d(ydown2root)
+ kwargs['init_root'][:, 0] = ydown2root
+
+ if cfg.FLIP_EVAL:
+ flipped_batch = eval_loader.dataset.load_data(i, True)
+ f_x, f_inits, f_features, f_kwargs, _ = prepare_batch(flipped_batch, cfg.DEVICE, cfg.TRAIN.STAGE == 'stage2')
+
+ # Forward pass with flipped input
+ flipped_pred = network(f_x, f_inits, f_features, **f_kwargs)
+
+ # Forward pass with normal input
+ pred = network(x, inits, features, **kwargs)
+
+ if cfg.FLIP_EVAL:
+ # Merge two predictions
+ flipped_pose, flipped_shape = flipped_pred['pose'].squeeze(0), flipped_pred['betas'].squeeze(0)
+ pose, shape = pred['pose'].squeeze(0), pred['betas'].squeeze(0)
+ flipped_pose, pose = flipped_pose.reshape(-1, 24, 6), pose.reshape(-1, 24, 6)
+ avg_pose, avg_shape = avg_preds(pose, shape, flipped_pose, flipped_shape)
+ avg_pose = avg_pose.reshape(-1, 144)
+ avg_contact = (flipped_pred['contact'][..., [2, 3, 0, 1]] + pred['contact']) / 2
+
+ # Refine trajectory with merged prediction
+ network.pred_pose = avg_pose.view_as(network.pred_pose)
+ network.pred_shape = avg_shape.view_as(network.pred_shape)
+ network.pred_contact = avg_contact.view_as(network.pred_contact)
+ output = network.forward_smpl(**kwargs)
+ pred = network.refine_trajectory(output, return_y_up=True, **kwargs)
+
+ # <======= Prepare groundtruth data
+ subj, seq = batch['vid'][:2], batch['vid'][3:]
+ annot_pth = glob(osp.join(_C.PATHS.EMDB_PTH, subj, seq, '*_data.pkl'))[0]
+ annot = pickle.load(open(annot_pth, 'rb'))
+
+ masks = annot['good_frames_mask']
+ gender = annot['gender']
+ poses_body = annot["smpl"]["poses_body"]
+ poses_root = annot["smpl"]["poses_root"]
+ betas = np.repeat(annot["smpl"]["betas"].reshape((1, -1)), repeats=annot["n_frames"], axis=0)
+ trans = annot["smpl"]["trans"]
+ extrinsics = annot["camera"]["extrinsics"]
+
+            # Map to camera coordinate
+ poses_root_cam = transforms.matrix_to_axis_angle(tt(extrinsics[:, :3, :3]) @ transforms.axis_angle_to_matrix(tt(poses_root)))
+
+ # Groundtruth global motion
+ target_glob = smpl[gender](body_pose=tt(poses_body), global_orient=tt(poses_root), betas=tt(betas), transl=tt(trans))
+ target_j3d_glob = target_glob.joints[:, :24][masks]
+
+ # Groundtruth local motion
+ target_cam = smpl[gender](body_pose=tt(poses_body), global_orient=poses_root_cam, betas=tt(betas))
+ target_verts_cam = target_cam.vertices[masks]
+ target_j3d_cam = target_cam.joints[:, :24][masks]
+ # =======>
+
+ # Convert WHAM global orient to Y-up coordinate
+ poses_root = pred['poses_root_world'].squeeze(0)
+ pred_trans = pred['trans_world'].squeeze(0)
+ poses_root = yup2ydown.mT @ poses_root
+ pred_trans = (yup2ydown.mT @ pred_trans.unsqueeze(-1)).squeeze(-1)
+
+ # <======= Build predicted motion
+ # Predicted global motion
+ pred_glob = smpl['neutral'](body_pose=pred['poses_body'], global_orient=poses_root.unsqueeze(1), betas=pred['betas'].squeeze(0), transl=pred_trans, pose2rot=False)
+ pred_j3d_glob = pred_glob.joints[:, :24]
+
+ # Predicted local motion
+ pred_cam = smpl['neutral'](body_pose=pred['poses_body'], global_orient=pred['poses_root_cam'], betas=pred['betas'].squeeze(0), pose2rot=False)
+ pred_verts_cam = pred_cam.vertices
+ pred_j3d_cam = pred_cam.joints[:, :24]
+ # =======>
+
+ # <======= Evaluation on the local motion
+ pred_j3d_cam, target_j3d_cam, pred_verts_cam, target_verts_cam = batch_align_by_pelvis(
+ [pred_j3d_cam, target_j3d_cam, pred_verts_cam, target_verts_cam], pelvis_idxs
+ )
+ S1_hat = batch_compute_similarity_transform_torch(pred_j3d_cam, target_j3d_cam)
+ pa_mpjpe = torch.sqrt(((S1_hat - target_j3d_cam) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy() * m2mm
+ mpjpe = torch.sqrt(((pred_j3d_cam - target_j3d_cam) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy() * m2mm
+ pve = torch.sqrt(((pred_verts_cam - target_verts_cam) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy() * m2mm
+ accel = compute_error_accel(joints_pred=pred_j3d_cam.cpu(), joints_gt=target_j3d_cam.cpu())[1:-1]
+            accel = accel * (30 ** 2) # per frame^2 to per s^2
+
+ summary_string = f'{batch["vid"]} | PA-MPJPE: {pa_mpjpe.mean():.1f} MPJPE: {mpjpe.mean():.1f} PVE: {pve.mean():.1f}'
+ bar.suffix = summary_string
+ bar.next()
+ # =======>
+
+ # <======= Evaluation on the global motion
+ chunk_length = 100
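+            # W-MPJPE aligns only the first two frames of each chunk to the ground truth,
+            # while WA-MPJPE aligns the entire chunk before measuring joint error.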
+ w_mpjpe, wa_mpjpe = [], []
+ for start in range(0, masks.sum(), chunk_length):
+ end = min(masks.sum(), start + chunk_length)
+
+ target_j3d = target_j3d_glob[start:end].clone().cpu()
+ pred_j3d = pred_j3d_glob[start:end].clone().cpu()
+
+ w_j3d = first_align_joints(target_j3d, pred_j3d)
+ wa_j3d = global_align_joints(target_j3d, pred_j3d)
+
+ w_jpe = compute_jpe(target_j3d, w_j3d)
+ wa_jpe = compute_jpe(target_j3d, wa_j3d)
+ w_mpjpe.append(w_jpe)
+ wa_mpjpe.append(wa_jpe)
+
+ w_mpjpe = np.concatenate(w_mpjpe) * m2mm
+ wa_mpjpe = np.concatenate(wa_mpjpe) * m2mm
+
+ # Additional metrics
+ rte = compute_rte(torch.from_numpy(trans[masks]), pred_trans.cpu()) * 1e2
+ jitter = compute_jitter(pred_glob, fps=30)
+ foot_sliding = compute_foot_sliding(target_glob, pred_glob, masks) * m2mm
+ # =======>
+
+ # <======= Accumulate the results over entire sequences
+ accumulator['pa_mpjpe'].append(pa_mpjpe)
+ accumulator['mpjpe'].append(mpjpe)
+ accumulator['pve'].append(pve)
+ accumulator['accel'].append(accel)
+ accumulator['wa_mpjpe'].append(wa_mpjpe)
+ accumulator['w_mpjpe'].append(w_mpjpe)
+ accumulator['RTE'].append(rte)
+ accumulator['jitter'].append(jitter)
+ accumulator['FS'].append(foot_sliding)
+ # =======>
+
+ for k, v in accumulator.items():
+ accumulator[k] = np.concatenate(v).mean()
+
+ print('')
+ log_str = f'Evaluation on EMDB {args.eval_split}, '
+ log_str += ' '.join([f'{k.upper()}: {v:.4f},'for k,v in accumulator.items()])
+ logger.info(log_str)
+
+if __name__ == '__main__':
+ cfg, cfg_file, args = parse_args(test=True)
+ cfg = prepare_output_dir(cfg, cfg_file)
+
+ main(cfg, args)
\ No newline at end of file
diff --git a/lib/eval/evaluate_rich.py b/lib/eval/evaluate_rich.py
new file mode 100644
index 0000000000000000000000000000000000000000..5504ca6862dd73b27a5e88420699862e507f9ad9
--- /dev/null
+++ b/lib/eval/evaluate_rich.py
@@ -0,0 +1,156 @@
+import os
+import os.path as osp
+from collections import defaultdict
+from time import time
+
+import torch
+import joblib
+import numpy as np
+from loguru import logger
+from smplx import SMPL, SMPLX
+from progress.bar import Bar
+
+from configs import constants as _C
+from configs.config import parse_args
+from lib.data.dataloader import setup_eval_dataloader
+from lib.models import build_network, build_body_model
+from lib.eval.eval_utils import (
+ compute_error_accel,
+ batch_align_by_pelvis,
+ batch_compute_similarity_transform_torch,
+)
+from lib.utils import transforms
+from lib.utils.utils import prepare_output_dir
+from lib.utils.utils import prepare_batch
+from lib.utils.imutils import avg_preds
+
+m2mm = 1e3
+smplx2smpl = torch.from_numpy(joblib.load(_C.BMODEL.SMPLX2SMPL)['matrix']).unsqueeze(0).float().cuda()
+@torch.no_grad()
+def main(cfg, args):
+ torch.backends.cuda.matmul.allow_tf32 = False
+ torch.backends.cudnn.allow_tf32 = False
+
+ logger.info(f'GPU name -> {torch.cuda.get_device_name()}')
+ logger.info(f'GPU feat -> {torch.cuda.get_device_properties("cuda")}')
+
+ # ========= Dataloaders ========= #
+ eval_loader = setup_eval_dataloader(cfg, 'rich', 'test', cfg.MODEL.BACKBONE)
+ logger.info(f'Dataset loaded')
+
+ # ========= Load WHAM ========= #
+ smpl_batch_size = cfg.TRAIN.BATCH_SIZE * cfg.DATASET.SEQLEN
+ smpl = build_body_model(cfg.DEVICE, smpl_batch_size)
+ network = build_network(cfg, smpl)
+ network.eval()
+
+ # Build neutral SMPL model for WHAM and gendered SMPLX models for the groundtruth data
+ smpl = SMPL(_C.BMODEL.FLDR, gender='neutral').to(cfg.DEVICE)
+
+ # Load vertices -> joints regression matrix to evaluate
+ J_regressor_eval = smpl.J_regressor.clone().unsqueeze(0)
+ pelvis_idxs = [1, 2]
+
+ accumulator = defaultdict(list)
+ bar = Bar('Inference', fill='#', max=len(eval_loader))
+ with torch.no_grad():
+ for i in range(len(eval_loader)):
+ time_dict = {}
+ _t = time()
+
+ # Original batch
+ batch = eval_loader.dataset.load_data(i, False)
+ x, inits, features, kwargs, gt = prepare_batch(batch, cfg.DEVICE, cfg.TRAIN.STAGE=='stage2')
+
+ # <======= Inference
+ if cfg.FLIP_EVAL:
+ flipped_batch = eval_loader.dataset.load_data(i, True)
+ f_x, f_inits, f_features, f_kwargs, _ = prepare_batch(flipped_batch, cfg.DEVICE, cfg.TRAIN.STAGE=='stage2')
+
+ # Forward pass with flipped input
+ flipped_pred = network(f_x, f_inits, f_features, **f_kwargs)
+ time_dict['inference_flipped'] = time() - _t; _t = time()
+
+ # Forward pass with normal input
+ pred = network(x, inits, features, **kwargs)
+ time_dict['inference'] = time() - _t; _t = time()
+
+ if cfg.FLIP_EVAL:
+ # Merge two predictions
+ flipped_pose, flipped_shape = flipped_pred['pose'].squeeze(0), flipped_pred['betas'].squeeze(0)
+ pose, shape = pred['pose'].squeeze(0), pred['betas'].squeeze(0)
+ flipped_pose, pose = flipped_pose.reshape(-1, 24, 6), pose.reshape(-1, 24, 6)
+ avg_pose, avg_shape = avg_preds(pose, shape, flipped_pose, flipped_shape)
+ avg_pose = avg_pose.reshape(-1, 144)
+
+ # Refine trajectory with merged prediction
+ network.pred_pose = avg_pose.view_as(network.pred_pose)
+ network.pred_shape = avg_shape.view_as(network.pred_shape)
+ pred = network.forward_smpl(**kwargs)
+ time_dict['averaging'] = time() - _t; _t = time()
+ # =======>
+
+ # <======= Build predicted SMPL
+ pred_output = smpl(body_pose=pred['poses_body'],
+ global_orient=pred['poses_root_cam'],
+ betas=pred['betas'].squeeze(0),
+ pose2rot=False)
+ pred_verts = pred_output.vertices.cpu()
+ pred_j3d = torch.matmul(J_regressor_eval, pred_output.vertices).cpu()
+ time_dict['building prediction'] = time() - _t; _t = time()
+ # =======>
+
+ # <======= Build groundtruth SMPL (from SMPLX)
+ smplx = SMPLX(_C.BMODEL.FLDR.replace('smpl', 'smplx'),
+ gender=batch['gender'],
+ batch_size=len(pred_verts)
+ ).to(cfg.DEVICE)
+ gt_pose = transforms.matrix_to_axis_angle(transforms.rotation_6d_to_matrix(gt['pose'][0]))
+ target_output = smplx(
+ body_pose=gt_pose[:, 1:-2].reshape(-1, 63),
+ global_orient=gt_pose[:, 0],
+ betas=gt['betas'][0])
+ target_verts = torch.matmul(smplx2smpl, target_output.vertices.cuda()).cpu()
+ target_j3d = torch.matmul(J_regressor_eval, target_verts.to(cfg.DEVICE)).cpu()
+ time_dict['building target'] = time() - _t; _t = time()
+ # =======>
+
+ # <======= Compute performance of the current sequence
+ pred_j3d, target_j3d, pred_verts, target_verts = batch_align_by_pelvis(
+ [pred_j3d, target_j3d, pred_verts, target_verts], pelvis_idxs
+ )
+ S1_hat = batch_compute_similarity_transform_torch(pred_j3d, target_j3d)
+ pa_mpjpe = torch.sqrt(((S1_hat - target_j3d) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ mpjpe = torch.sqrt(((pred_j3d - target_j3d) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ pve = torch.sqrt(((pred_verts - target_verts) ** 2).sum(dim=-1)).mean(dim=-1).numpy() * m2mm
+ accel = compute_error_accel(joints_pred=pred_j3d, joints_gt=target_j3d)[1:-1]
+            accel = accel * (30 ** 2) # per frame^2 to per s^2
+ time_dict['evaluating'] = time() - _t; _t = time()
+ # =======>
+
+ # summary_string = f'{batch["vid"]} | PA-MPJPE: {pa_mpjpe.mean():.1f} MPJPE: {mpjpe.mean():.1f} PVE: {pve.mean():.1f}'
+ summary_string = f'{batch["vid"]} | ' + ' '.join([f'{k}: {v:.1f} s' for k, v in time_dict.items()])
+ bar.suffix = summary_string
+ bar.next()
+
+ # <======= Accumulate the results over entire sequences
+ accumulator['pa_mpjpe'].append(pa_mpjpe)
+ accumulator['mpjpe'].append(mpjpe)
+ accumulator['pve'].append(pve)
+ accumulator['accel'].append(accel)
+
+ # =======>
+
+ for k, v in accumulator.items():
+ accumulator[k] = np.concatenate(v).mean()
+
+ print('')
+ log_str = 'Evaluation on RICH, '
+ log_str += ' '.join([f'{k.upper()}: {v:.4f},'for k,v in accumulator.items()])
+ logger.info(log_str)
+
+if __name__ == '__main__':
+ cfg, cfg_file, args = parse_args(test=True)
+ cfg = prepare_output_dir(cfg, cfg_file)
+
+ main(cfg, args)
\ No newline at end of file
diff --git a/lib/models/__init__.py b/lib/models/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..d203b287a2024e1d1d5c0ccf9846f59ca87efe55
--- /dev/null
+++ b/lib/models/__init__.py
@@ -0,0 +1,40 @@
+import os, sys
+import yaml
+import torch
+from loguru import logger
+
+from configs import constants as _C
+from .smpl import SMPL
+
+
+def build_body_model(device, batch_size=1, gender='neutral', **kwargs):
+ sys.stdout = open(os.devnull, 'w')
+ body_model = SMPL(
+ model_path=_C.BMODEL.FLDR,
+ gender=gender,
+ batch_size=batch_size,
+ create_transl=False).to(device)
+ sys.stdout = sys.__stdout__
+ return body_model
+
+
+def build_network(cfg, smpl):
+ from .wham import Network
+
+ with open(cfg.MODEL_CONFIG, 'r') as f:
+ model_config = yaml.safe_load(f)
+ model_config.update({'d_feat': _C.IMG_FEAT_DIM[cfg.MODEL.BACKBONE]})
+
+ network = Network(smpl, **model_config).to(cfg.DEVICE)
+
+ # Load Checkpoint
+ if os.path.isfile(cfg.TRAIN.CHECKPOINT):
+ checkpoint = torch.load(cfg.TRAIN.CHECKPOINT)
+ ignore_keys = ['smpl.body_pose', 'smpl.betas', 'smpl.global_orient', 'smpl.J_regressor_extra', 'smpl.J_regressor_eval']
+ model_state_dict = {k: v for k, v in checkpoint['model'].items() if k not in ignore_keys}
+ network.load_state_dict(model_state_dict, strict=False)
+ logger.info(f"=> loaded checkpoint '{cfg.TRAIN.CHECKPOINT}' ")
+ else:
+ logger.info(f"=> Warning! no checkpoint found at '{cfg.TRAIN.CHECKPOINT}'.")
+
+ return network
\ No newline at end of file
diff --git a/lib/models/__pycache__/__init__.cpython-39.pyc b/lib/models/__pycache__/__init__.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..6448bf47a6483675d1066ee310b79de26e5a1a65
Binary files /dev/null and b/lib/models/__pycache__/__init__.cpython-39.pyc differ
diff --git a/lib/models/__pycache__/smpl.cpython-39.pyc b/lib/models/__pycache__/smpl.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..2795edbd166325aa0584f5a5bc89e048a22a3ace
Binary files /dev/null and b/lib/models/__pycache__/smpl.cpython-39.pyc differ
diff --git a/lib/models/__pycache__/wham.cpython-39.pyc b/lib/models/__pycache__/wham.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..0825def69bd50c6e3b95bdf4577e0b2fd1f9b437
Binary files /dev/null and b/lib/models/__pycache__/wham.cpython-39.pyc differ
diff --git a/lib/models/layers/__init__.py b/lib/models/layers/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..9bcd2ab8be0f475741e9cec1d896930fd1c40021
--- /dev/null
+++ b/lib/models/layers/__init__.py
@@ -0,0 +1,2 @@
+from .modules import MotionEncoder, MotionDecoder, TrajectoryDecoder, TrajectoryRefiner, Integrator
+from .utils import rollout_global_motion, compute_camera_pose, reset_root_velocity, compute_camera_motion
\ No newline at end of file
diff --git a/lib/models/layers/__pycache__/__init__.cpython-39.pyc b/lib/models/layers/__pycache__/__init__.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..cb6793839e899cd00867dd79278b82420a3a74f9
Binary files /dev/null and b/lib/models/layers/__pycache__/__init__.cpython-39.pyc differ
diff --git a/lib/models/layers/__pycache__/modules.cpython-39.pyc b/lib/models/layers/__pycache__/modules.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..4c9734807e8a2080b4ff90378dc5c98ab8604423
Binary files /dev/null and b/lib/models/layers/__pycache__/modules.cpython-39.pyc differ
diff --git a/lib/models/layers/__pycache__/utils.cpython-39.pyc b/lib/models/layers/__pycache__/utils.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..fc2b15810eafd2e56618bfb56a0b0958f520d44c
Binary files /dev/null and b/lib/models/layers/__pycache__/utils.cpython-39.pyc differ
diff --git a/lib/models/layers/modules.py b/lib/models/layers/modules.py
new file mode 100644
index 0000000000000000000000000000000000000000..b0f4aca22175f3676a204564a51fc9dd02cf434e
--- /dev/null
+++ b/lib/models/layers/modules.py
@@ -0,0 +1,262 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import numpy as np
+from torch import nn
+from configs import constants as _C
+from .utils import rollout_global_motion
+from lib.utils.transforms import axis_angle_to_matrix
+
+
+class Regressor(nn.Module):
+ def __init__(self, in_dim, hid_dim, out_dims, init_dim, layer='LSTM', n_layers=2, n_iters=1):
+ super().__init__()
+ self.n_outs = len(out_dims)
+
+ self.rnn = getattr(nn, layer.upper())(
+ in_dim + init_dim, hid_dim, n_layers,
+ bidirectional=False, batch_first=True, dropout=0.3)
+
+ for i, out_dim in enumerate(out_dims):
+ setattr(self, 'declayer%d'%i, nn.Linear(hid_dim, out_dim))
+ nn.init.xavier_uniform_(getattr(self, 'declayer%d'%i).weight, gain=0.01)
+
+ def forward(self, x, inits, h0):
+ xc = torch.cat([x, *inits], dim=-1)
+ xc, h0 = self.rnn(xc, h0)
+
+ preds = []
+ for j in range(self.n_outs):
+ out = getattr(self, 'declayer%d'%j)(xc)
+ preds.append(out)
+
+ return preds, xc, h0
+
+
+class NeuralInitialization(nn.Module):
+ def __init__(self, in_dim, hid_dim, layer, n_layers):
+ super().__init__()
+
+ out_dim = hid_dim
+ self.n_layers = n_layers
+ self.num_inits = int(layer.upper() == 'LSTM') + 1
+ out_dim *= self.num_inits * n_layers
+
+ self.linear1 = nn.Linear(in_dim, hid_dim)
+ self.linear2 = nn.Linear(hid_dim, hid_dim * self.n_layers)
+ self.linear3 = nn.Linear(hid_dim * self.n_layers, out_dim)
+ self.relu1 = nn.ReLU()
+ self.relu2 = nn.ReLU()
+
+ def forward(self, x):
+ b = x.shape[0]
+
+ out = self.linear3(self.relu2(self.linear2(self.relu1(self.linear1(x)))))
+ out = out.view(b, self.num_inits, self.n_layers, -1).permute(1, 2, 0, 3).contiguous()
+
+ if self.num_inits == 2:
+ return tuple([_ for _ in out])
+ return out[0]
+
+
+class Integrator(nn.Module):
+ def __init__(self, in_channel, out_channel, hid_channel=1024):
+ super().__init__()
+
+ self.layer1 = nn.Linear(in_channel, hid_channel)
+ self.relu1 = nn.ReLU()
+ self.dr1 = nn.Dropout(0.1)
+
+ self.layer2 = nn.Linear(hid_channel, hid_channel)
+ self.relu2 = nn.ReLU()
+ self.dr2 = nn.Dropout(0.1)
+
+ self.layer3 = nn.Linear(hid_channel, out_channel)
+
+
+ def forward(self, x, feat):
+ res = x
+ mask = (feat != 0).all(dim=-1).all(dim=-1)
+
+ out = torch.cat((x, feat), dim=-1)
+ out = self.layer1(out)
+ out = self.relu1(out)
+ out = self.dr1(out)
+
+ out = self.layer2(out)
+ out = self.relu2(out)
+ out = self.dr2(out)
+
+ out = self.layer3(out)
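+        # Apply the residual connection only to samples whose auxiliary feature is entirely
+        # non-zero, so the MLP predicts a correction when the extra signal is available.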
+ out[mask] = out[mask] + res[mask]
+
+ return out
+
+
+class MotionEncoder(nn.Module):
+ def __init__(self,
+ in_dim,
+ d_embed,
+ pose_dr,
+ rnn_type,
+ n_layers,
+ n_joints):
+ super().__init__()
+
+ self.n_joints = n_joints
+
+ self.embed_layer = nn.Linear(in_dim, d_embed)
+ self.pos_drop = nn.Dropout(pose_dr)
+
+ # Keypoints initializer
+ self.neural_init = NeuralInitialization(n_joints * 3 + in_dim, d_embed, rnn_type, n_layers)
+
+ # 3d keypoints regressor
+ self.regressor = Regressor(
+ d_embed, d_embed, [n_joints * 3], n_joints * 3, rnn_type, n_layers)
+
+ def forward(self, x, init):
+ """ Forward pass of motion encoder.
+ """
+
+ self.b, self.f = x.shape[:2]
+ x = self.embed_layer(x.reshape(self.b, self.f, -1))
+ x = self.pos_drop(x)
+
+ h0 = self.neural_init(init)
+ pred_list = [init[..., :self.n_joints * 3]]
+ motion_context_list = []
+
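+        # Autoregressive rollout: each frame's predicted 3D keypoints are fed back
+        # as the recurrent regressor's initialization for the next frame.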
+ for i in range(self.f):
+ (pred_kp3d, ), motion_context, h0 = self.regressor(x[:, [i]], pred_list[-1:], h0)
+ motion_context_list.append(motion_context)
+ pred_list.append(pred_kp3d)
+
+ pred_kp3d = torch.cat(pred_list[1:], dim=1).view(self.b, self.f, -1, 3)
+ motion_context = torch.cat(motion_context_list, dim=1)
+
+ # Merge 3D keypoints with motion context
+ motion_context = torch.cat((motion_context, pred_kp3d.reshape(self.b, self.f, -1)), dim=-1)
+ return pred_kp3d, motion_context
+
+
+class TrajectoryDecoder(nn.Module):
+ def __init__(self,
+ d_embed,
+ rnn_type,
+ n_layers):
+ super().__init__()
+
+ # Trajectory regressor
+ self.regressor = Regressor(
+ d_embed, d_embed, [3, 6], 12, rnn_type, n_layers, )
+
+ def forward(self, x, root, cam_a, h0=None):
+ """ Forward pass of trajectory decoder.
+ """
+
+ b, f = x.shape[:2]
+ pred_root_list, pred_vel_list = [root[:, :1]], []
+
+ for i in range(f):
+ # Global coordinate estimation
+ (pred_rootv, pred_rootr), _, h0 = self.regressor(
+ x[:, [i]], [pred_root_list[-1], cam_a[:, [i]]], h0)
+
+ pred_root_list.append(pred_rootr)
+ pred_vel_list.append(pred_rootv)
+
+ pred_root = torch.cat(pred_root_list, dim=1).view(b, f + 1, -1)
+ pred_vel = torch.cat(pred_vel_list, dim=1).view(b, f, -1)
+
+ return pred_root, pred_vel
+
+
+class MotionDecoder(nn.Module):
+ def __init__(self,
+ d_embed,
+ rnn_type,
+ n_layers):
+ super().__init__()
+
+ self.n_pose = 24
+
+ # SMPL pose initialization
+ self.neural_init = NeuralInitialization(len(_C.BMODEL.MAIN_JOINTS) * 6, d_embed, rnn_type, n_layers)
+
+ # 3d keypoints regressor
+ self.regressor = Regressor(
+ d_embed, d_embed, [self.n_pose * 6, 10, 3, 4], self.n_pose * 6, rnn_type, n_layers)
+
+ def forward(self, x, init):
+ """ Forward pass of motion decoder.
+ """
+ b, f = x.shape[:2]
+
+ h0 = self.neural_init(init[:, :, _C.BMODEL.MAIN_JOINTS].reshape(b, 1, -1))
+
+ # Recursive prediction of SMPL parameters
+ pred_pose_list = [init.reshape(b, 1, -1)]
+ pred_shape_list, pred_cam_list, pred_contact_list = [], [], []
+
+ for i in range(f):
+ # Camera coordinate estimation
+ (pred_pose, pred_shape, pred_cam, pred_contact), _, h0 = self.regressor(x[:, [i]], pred_pose_list[-1:], h0)
+ pred_pose_list.append(pred_pose)
+ pred_shape_list.append(pred_shape)
+ pred_cam_list.append(pred_cam)
+ pred_contact_list.append(pred_contact)
+
+ pred_pose = torch.cat(pred_pose_list[1:], dim=1).view(b, f, -1)
+ pred_shape = torch.cat(pred_shape_list, dim=1).view(b, f, -1)
+ pred_cam = torch.cat(pred_cam_list, dim=1).view(b, f, -1)
+ pred_contact = torch.cat(pred_contact_list, dim=1).view(b, f, -1)
+
+ return pred_pose, pred_shape, pred_cam, pred_contact
+
+
+class TrajectoryRefiner(nn.Module):
+ def __init__(self,
+ d_embed,
+ d_hidden,
+ rnn_type,
+ n_layers):
+ super().__init__()
+
+ d_input = d_embed + 12
+ self.refiner = Regressor(
+ d_input, d_hidden, [6, 3], 9, rnn_type, n_layers)
+
+ def forward(self, context, pred_vel, output, cam_angvel, return_y_up):
+ b, f = context.shape[:2]
+
+ # Register values
+ pred_root = output['poses_root_r6d'].clone().detach()
+ feet = output['feet'].clone().detach()
+ contact = output['contact'].clone().detach()
+
+        feet_vel = torch.cat((torch.zeros_like(feet[:, :1]), feet[:, 1:] - feet[:, :-1]), dim=1) * 30  # per-frame displacement -> velocity at 30 fps
+ feet = (feet_vel * contact.unsqueeze(-1)).reshape(b, f, -1) # Velocity input
+ inpt_feat = torch.cat([context, feet], dim=-1)
+
+ (delta_root, delta_vel), _, _ = self.refiner(inpt_feat, [pred_root[:, 1:], pred_vel], h0=None)
+ pred_root[:, 1:] = pred_root[:, 1:] + delta_root
+ pred_vel = pred_vel + delta_vel
+
+ # root_world, trans_world = rollout_global_motion(pred_root, pred_vel)
+
+ # if return_y_up:
+ # yup2ydown = axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float().to(root_world.device)
+ # root_world = yup2ydown.mT @ root_world
+ # trans_world = (yup2ydown.mT @ trans_world.unsqueeze(-1)).squeeze(-1)
+
+ output.update({
+ 'poses_root_r6d_refined': pred_root,
+ 'vel_root_refined': pred_vel,
+ # 'poses_root_world': root_world,
+ # 'trans_world': trans_world,
+ })
+
+ return output
\ No newline at end of file
diff --git a/lib/models/layers/utils.py b/lib/models/layers/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..6a67ec77b8572f8ca26d4ded8f4731da4bb7b616
--- /dev/null
+++ b/lib/models/layers/utils.py
@@ -0,0 +1,52 @@
+import torch
+from lib.utils import transforms
+
+
+
+def rollout_global_motion(root_r, root_v, init_trans=None):
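+    # Rotate per-frame root velocities into the world frame using the preceding root
+    # orientation, then cumulatively sum them to recover the global translation.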
+ b, f = root_v.shape[:2]
+ root = transforms.rotation_6d_to_matrix(root_r[:])
+ vel_world = (root[:, :-1] @ root_v.unsqueeze(-1)).squeeze(-1)
+ trans = torch.cumsum(vel_world, dim=1)
+
+ if init_trans is not None: trans = trans + init_trans
+ return root[:, 1:], trans
+
+def compute_camera_motion(output, root_c_d6d, root_w, trans, pred_cam):
+ root_c = transforms.rotation_6d_to_matrix(root_c_d6d) # Root orient in cam coord
+ cam_R = root_c @ root_w.mT
+ pelvis_cam = output.full_cam.view_as(pred_cam)
+ pelvis_world = (cam_R.mT @ pelvis_cam.unsqueeze(-1)).squeeze(-1)
+ cam_T_world = pelvis_world - trans
+ cam_T = (cam_R @ cam_T_world.unsqueeze(-1)).squeeze(-1)
+
+ return cam_R, cam_T
+
+def compute_camera_pose(root_c_d6d, root_w):
+ root_c = transforms.rotation_6d_to_matrix(root_c_d6d) # Root orient in cam coord
+ cam_R = root_c @ root_w.mT
+ return cam_R
+
+
+def reset_root_velocity(smpl, output, stationary, pred_ori, pred_vel, thr=0.7):
+ b, f = pred_vel.shape[:2]
+
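+    # Subtract the mean world velocity of feet predicted to be in contact (probability > thr)
+    # from the root velocity, so contacting feet do not slide.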
+ stationary_mask = (stationary.clone().detach() > thr).unsqueeze(-1).float()
+ poses_root = transforms.rotation_6d_to_matrix(pred_ori.clone().detach())
+ vel_world = (poses_root[:, 1:] @ pred_vel.clone().detach().unsqueeze(-1)).squeeze(-1)
+
+ output = smpl.get_output(body_pose=output.body_pose.clone().detach(),
+ global_orient=poses_root[:, 1:].reshape(-1, 1, 3, 3),
+ betas=output.betas.clone().detach(),
+ pose2rot=False)
+ feet = output.feet.reshape(b, f, 4, 3)
+ feet_vel = feet[:, 1:] - feet[:, :-1] + vel_world[:, 1:].unsqueeze(-2)
+ feet_vel = torch.cat((torch.zeros_like(feet_vel[:, :1]), feet_vel), dim=1)
+
+ stationary_vel = feet_vel * stationary_mask
+ del_vel = stationary_vel.sum(dim=2) / ((stationary_vel != 0).sum(dim=2) + 1e-4)
+ vel_world_update = vel_world - del_vel
+
+ vel_root = (poses_root[:, 1:].mT @ vel_world_update.unsqueeze(-1)).squeeze(-1)
+
+ return vel_root
\ No newline at end of file
diff --git a/lib/models/preproc/__pycache__/detector.cpython-39.pyc b/lib/models/preproc/__pycache__/detector.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..79cc37dbe7ad97032f937cacc11496380dba2476
Binary files /dev/null and b/lib/models/preproc/__pycache__/detector.cpython-39.pyc differ
diff --git a/lib/models/preproc/__pycache__/extractor.cpython-39.pyc b/lib/models/preproc/__pycache__/extractor.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..2e2e36df8bd3833d3b5e804bab3ecd9a4aaaa4a2
Binary files /dev/null and b/lib/models/preproc/__pycache__/extractor.cpython-39.pyc differ
diff --git a/lib/models/preproc/__pycache__/slam.cpython-39.pyc b/lib/models/preproc/__pycache__/slam.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..00df033befc3345b66a10c48439fe09dae546d12
Binary files /dev/null and b/lib/models/preproc/__pycache__/slam.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/hmr2.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/hmr2.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..465485605b609e3c833afaddb8d0a56af49ce6c1
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/hmr2.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/pose_transformer.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/pose_transformer.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..4c0abc591506f68c5675e9f378169305e69ef47b
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/pose_transformer.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/smpl_head.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/smpl_head.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..794288fed3709651cbffcf555144e538d05b2f5e
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/smpl_head.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/t_cond_mlp.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/t_cond_mlp.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..fef62b6c4677103a86ecff02739b8cb35546597c
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/t_cond_mlp.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/utils.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/utils.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..1ab5e2d17bbc88f56cf7dd3961a3351bd6c15dc1
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/utils.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/__pycache__/vit.cpython-39.pyc b/lib/models/preproc/backbone/__pycache__/vit.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5ce3d6121d4f74ae622b0e5e4b49eae7c680fbff
Binary files /dev/null and b/lib/models/preproc/backbone/__pycache__/vit.cpython-39.pyc differ
diff --git a/lib/models/preproc/backbone/hmr2.py b/lib/models/preproc/backbone/hmr2.py
new file mode 100644
index 0000000000000000000000000000000000000000..bf5a87b874585e9c0cc1b09aeda939fd365de5be
--- /dev/null
+++ b/lib/models/preproc/backbone/hmr2.py
@@ -0,0 +1,77 @@
+import os
+
+import torch
+import einops
+import torch.nn as nn
+# import pytorch_lightning as pl
+
+from yacs.config import CfgNode
+from .vit import vit
+from .smpl_head import SMPLTransformerDecoderHead
+
+# class HMR2(pl.LightningModule):
+class HMR2(nn.Module):
+
+ def __init__(self):
+ """
+        Setup the HMR2 model (ViT backbone + SMPL transformer decoder head).
+ """
+ super().__init__()
+
+ # Create backbone feature extractor
+ self.backbone = vit()
+
+ # Create SMPL head
+ self.smpl_head = SMPLTransformerDecoderHead()
+
+
+ def decode(self, x):
+
+ batch_size = x.shape[0]
+ pred_smpl_params, pred_cam, _ = self.smpl_head(x)
+
+ # Compute model vertices, joints and the projected joints
+ pred_smpl_params['global_orient'] = pred_smpl_params['global_orient'].reshape(batch_size, -1, 3, 3)
+ pred_smpl_params['body_pose'] = pred_smpl_params['body_pose'].reshape(batch_size, -1, 3, 3)
+ pred_smpl_params['betas'] = pred_smpl_params['betas'].reshape(batch_size, -1)
+ return pred_smpl_params['global_orient'], pred_smpl_params['body_pose'], pred_smpl_params['betas'], pred_cam
+
+ def forward(self, x, encode=False, **kwargs):
+ """
+        Run a forward step of the network
+        Args:
+            x (torch.Tensor): Batch of cropped input images
+            encode (bool): If True, return the encoded per-image token instead of SMPL parameters
+        Returns:
+            Tuple of (global_orient, body_pose, betas, pred_cam), or the encoded token when encode=True
+ """
+
+ # Use RGB image as input
+ batch_size = x.shape[0]
+
+ # Compute conditioning features using the backbone
+ # if using ViT backbone, we need to use a different aspect ratio
+ conditioning_feats = self.backbone(x[:,:,:,32:-32])
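+        # In encode mode, return the SMPL-head transformer token as a per-image feature
+        # instead of decoding SMPL parameters.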
+ if encode:
+ conditioning_feats = einops.rearrange(conditioning_feats, 'b c h w -> b (h w) c')
+ token = torch.zeros(batch_size, 1, 1).to(x.device)
+ token_out = self.smpl_head.transformer(token, context=conditioning_feats)
+ return token_out.squeeze(1)
+
+ pred_smpl_params, pred_cam, _ = self.smpl_head(conditioning_feats)
+
+ # Compute model vertices, joints and the projected joints
+ pred_smpl_params['global_orient'] = pred_smpl_params['global_orient'].reshape(batch_size, -1, 3, 3)
+ pred_smpl_params['body_pose'] = pred_smpl_params['body_pose'].reshape(batch_size, -1, 3, 3)
+ pred_smpl_params['betas'] = pred_smpl_params['betas'].reshape(batch_size, -1)
+ return pred_smpl_params['global_orient'], pred_smpl_params['body_pose'], pred_smpl_params['betas'], pred_cam
+
+
+def hmr2(checkpoint_pth):
+ model = HMR2()
+ if os.path.exists(checkpoint_pth):
+ model.load_state_dict(torch.load(checkpoint_pth, map_location='cpu')['state_dict'], strict=False)
+ print(f'Load backbone weight: {checkpoint_pth}')
+ return model
\ No newline at end of file
diff --git a/lib/models/preproc/backbone/pose_transformer.py b/lib/models/preproc/backbone/pose_transformer.py
new file mode 100644
index 0000000000000000000000000000000000000000..ed09143c324169acf7231c020ed8b5ead0964423
--- /dev/null
+++ b/lib/models/preproc/backbone/pose_transformer.py
@@ -0,0 +1,357 @@
+from inspect import isfunction
+from typing import Callable, Optional
+
+import torch
+from einops import rearrange
+from einops.layers.torch import Rearrange
+from torch import nn
+
+from .t_cond_mlp import (
+ AdaptiveLayerNorm1D,
+ FrequencyEmbedder,
+ normalization_layer,
+)
+# from .vit import Attention, FeedForward
+
+
+def exists(val):
+ return val is not None
+
+
+def default(val, d):
+ if exists(val):
+ return val
+ return d() if isfunction(d) else d
+
+
+class PreNorm(nn.Module):
+ def __init__(self, dim: int, fn: Callable, norm: str = "layer", norm_cond_dim: int = -1):
+ super().__init__()
+ self.norm = normalization_layer(norm, dim, norm_cond_dim)
+ self.fn = fn
+
+ def forward(self, x: torch.Tensor, *args, **kwargs):
+ if isinstance(self.norm, AdaptiveLayerNorm1D):
+ return self.fn(self.norm(x, *args), **kwargs)
+ else:
+ return self.fn(self.norm(x), **kwargs)
+
+
+class FeedForward(nn.Module):
+ def __init__(self, dim, hidden_dim, dropout=0.0):
+ super().__init__()
+ self.net = nn.Sequential(
+ nn.Linear(dim, hidden_dim),
+ nn.GELU(),
+ nn.Dropout(dropout),
+ nn.Linear(hidden_dim, dim),
+ nn.Dropout(dropout),
+ )
+
+ def forward(self, x):
+ return self.net(x)
+
+
+class Attention(nn.Module):
+ def __init__(self, dim, heads=8, dim_head=64, dropout=0.0):
+ super().__init__()
+ inner_dim = dim_head * heads
+ project_out = not (heads == 1 and dim_head == dim)
+
+ self.heads = heads
+ self.scale = dim_head**-0.5
+
+ self.attend = nn.Softmax(dim=-1)
+ self.dropout = nn.Dropout(dropout)
+
+ self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
+
+ self.to_out = (
+ nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
+ if project_out
+ else nn.Identity()
+ )
+
+ def forward(self, x):
+ qkv = self.to_qkv(x).chunk(3, dim=-1)
+ q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), qkv)
+
+ dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale
+
+ attn = self.attend(dots)
+ attn = self.dropout(attn)
+
+ out = torch.matmul(attn, v)
+ out = rearrange(out, "b h n d -> b n (h d)")
+ return self.to_out(out)
+
+
+class CrossAttention(nn.Module):
+ def __init__(self, dim, context_dim=None, heads=8, dim_head=64, dropout=0.0):
+ super().__init__()
+ inner_dim = dim_head * heads
+ project_out = not (heads == 1 and dim_head == dim)
+
+ self.heads = heads
+ self.scale = dim_head**-0.5
+
+ self.attend = nn.Softmax(dim=-1)
+ self.dropout = nn.Dropout(dropout)
+
+ context_dim = default(context_dim, dim)
+ self.to_kv = nn.Linear(context_dim, inner_dim * 2, bias=False)
+ self.to_q = nn.Linear(dim, inner_dim, bias=False)
+
+ self.to_out = (
+ nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
+ if project_out
+ else nn.Identity()
+ )
+
+ def forward(self, x, context=None):
+ context = default(context, x)
+ k, v = self.to_kv(context).chunk(2, dim=-1)
+ q = self.to_q(x)
+ q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), [q, k, v])
+
+ dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale
+
+ attn = self.attend(dots)
+ attn = self.dropout(attn)
+
+ out = torch.matmul(attn, v)
+ out = rearrange(out, "b h n d -> b n (h d)")
+ return self.to_out(out)
+
+
+class Transformer(nn.Module):
+ def __init__(
+ self,
+ dim: int,
+ depth: int,
+ heads: int,
+ dim_head: int,
+ mlp_dim: int,
+ dropout: float = 0.0,
+ norm: str = "layer",
+ norm_cond_dim: int = -1,
+ ):
+ super().__init__()
+ self.layers = nn.ModuleList([])
+ for _ in range(depth):
+ sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
+ ff = FeedForward(dim, mlp_dim, dropout=dropout)
+ self.layers.append(
+ nn.ModuleList(
+ [
+ PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
+ PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
+ ]
+ )
+ )
+
+ def forward(self, x: torch.Tensor, *args):
+ for attn, ff in self.layers:
+ x = attn(x, *args) + x
+ x = ff(x, *args) + x
+ return x
+
+
+class TransformerCrossAttn(nn.Module):
+ def __init__(
+ self,
+ dim: int,
+ depth: int,
+ heads: int,
+ dim_head: int,
+ mlp_dim: int,
+ dropout: float = 0.0,
+ norm: str = "layer",
+ norm_cond_dim: int = -1,
+ context_dim: Optional[int] = None,
+ ):
+ super().__init__()
+ self.layers = nn.ModuleList([])
+ for _ in range(depth):
+ sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
+ ca = CrossAttention(
+ dim, context_dim=context_dim, heads=heads, dim_head=dim_head, dropout=dropout
+ )
+ ff = FeedForward(dim, mlp_dim, dropout=dropout)
+ self.layers.append(
+ nn.ModuleList(
+ [
+ PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
+ PreNorm(dim, ca, norm=norm, norm_cond_dim=norm_cond_dim),
+ PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
+ ]
+ )
+ )
+
+ def forward(self, x: torch.Tensor, *args, context=None, context_list=None):
+ if context_list is None:
+ context_list = [context] * len(self.layers)
+ if len(context_list) != len(self.layers):
+ raise ValueError(f"len(context_list) != len(self.layers) ({len(context_list)} != {len(self.layers)})")
+
+ for i, (self_attn, cross_attn, ff) in enumerate(self.layers):
+ x = self_attn(x, *args) + x
+ x = cross_attn(x, *args, context=context_list[i]) + x
+ x = ff(x, *args) + x
+ return x
+
+
+class DropTokenDropout(nn.Module):
+ def __init__(self, p: float = 0.1):
+ super().__init__()
+ if p < 0 or p > 1:
+ raise ValueError(
+ "dropout probability has to be between 0 and 1, " "but got {}".format(p)
+ )
+ self.p = p
+
+ def forward(self, x: torch.Tensor):
+ # x: (batch_size, seq_len, dim)
+ if self.training and self.p > 0:
+ zero_mask = torch.full_like(x[0, :, 0], self.p).bernoulli().bool()
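+            # Note: the same token positions are dropped for every sample in the batch.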
+ # TODO: permutation idx for each batch using torch.argsort
+ if zero_mask.any():
+ x = x[:, ~zero_mask, :]
+ return x
+
+
+class ZeroTokenDropout(nn.Module):
+ def __init__(self, p: float = 0.1):
+ super().__init__()
+ if p < 0 or p > 1:
+ raise ValueError(
+ "dropout probability has to be between 0 and 1, " "but got {}".format(p)
+ )
+ self.p = p
+
+ def forward(self, x: torch.Tensor):
+ # x: (batch_size, seq_len, dim)
+ if self.training and self.p > 0:
+ zero_mask = torch.full_like(x[:, :, 0], self.p).bernoulli().bool()
+ # Zero-out the masked tokens
+ x[zero_mask, :] = 0
+ return x
+
+
+class TransformerEncoder(nn.Module):
+ def __init__(
+ self,
+ num_tokens: int,
+ token_dim: int,
+ dim: int,
+ depth: int,
+ heads: int,
+ mlp_dim: int,
+ dim_head: int = 64,
+ dropout: float = 0.0,
+ emb_dropout: float = 0.0,
+ emb_dropout_type: str = "drop",
+ emb_dropout_loc: str = "token",
+ norm: str = "layer",
+ norm_cond_dim: int = -1,
+ token_pe_numfreq: int = -1,
+ ):
+ super().__init__()
+ if token_pe_numfreq > 0:
+ token_dim_new = token_dim * (2 * token_pe_numfreq + 1)
+ self.to_token_embedding = nn.Sequential(
+ Rearrange("b n d -> (b n) d", n=num_tokens, d=token_dim),
+ FrequencyEmbedder(token_pe_numfreq, token_pe_numfreq - 1),
+ Rearrange("(b n) d -> b n d", n=num_tokens, d=token_dim_new),
+ nn.Linear(token_dim_new, dim),
+ )
+ else:
+ self.to_token_embedding = nn.Linear(token_dim, dim)
+ self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, dim))
+ if emb_dropout_type == "drop":
+ self.dropout = DropTokenDropout(emb_dropout)
+ elif emb_dropout_type == "zero":
+ self.dropout = ZeroTokenDropout(emb_dropout)
+ else:
+ raise ValueError(f"Unknown emb_dropout_type: {emb_dropout_type}")
+ self.emb_dropout_loc = emb_dropout_loc
+
+ self.transformer = Transformer(
+ dim, depth, heads, dim_head, mlp_dim, dropout, norm=norm, norm_cond_dim=norm_cond_dim
+ )
+
+ def forward(self, inp: torch.Tensor, *args, **kwargs):
+ x = inp
+
+ if self.emb_dropout_loc == "input":
+ x = self.dropout(x)
+ x = self.to_token_embedding(x)
+
+ if self.emb_dropout_loc == "token":
+ x = self.dropout(x)
+ b, n, _ = x.shape
+ x += self.pos_embedding[:, :n]
+
+ if self.emb_dropout_loc == "token_afterpos":
+ x = self.dropout(x)
+ x = self.transformer(x, *args)
+ return x
+
+
+class TransformerDecoder(nn.Module):
+ def __init__(
+ self,
+ num_tokens: int,
+ token_dim: int,
+ dim: int,
+ depth: int,
+ heads: int,
+ mlp_dim: int,
+ dim_head: int = 64,
+ dropout: float = 0.0,
+ emb_dropout: float = 0.0,
+ emb_dropout_type: str = 'drop',
+ norm: str = "layer",
+ norm_cond_dim: int = -1,
+ context_dim: Optional[int] = None,
+ skip_token_embedding: bool = False,
+ ):
+ super().__init__()
+ if not skip_token_embedding:
+ self.to_token_embedding = nn.Linear(token_dim, dim)
+ else:
+ self.to_token_embedding = nn.Identity()
+ if token_dim != dim:
+ raise ValueError(
+ f"token_dim ({token_dim}) != dim ({dim}) when skip_token_embedding is True"
+ )
+
+ self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, dim))
+ if emb_dropout_type == "drop":
+ self.dropout = DropTokenDropout(emb_dropout)
+ elif emb_dropout_type == "zero":
+ self.dropout = ZeroTokenDropout(emb_dropout)
+ elif emb_dropout_type == "normal":
+ self.dropout = nn.Dropout(emb_dropout)
+
+ self.transformer = TransformerCrossAttn(
+ dim,
+ depth,
+ heads,
+ dim_head,
+ mlp_dim,
+ dropout,
+ norm=norm,
+ norm_cond_dim=norm_cond_dim,
+ context_dim=context_dim,
+ )
+
+ def forward(self, inp: torch.Tensor, *args, context=None, context_list=None):
+ x = self.to_token_embedding(inp)
+ b, n, _ = x.shape
+
+ x = self.dropout(x)
+ x += self.pos_embedding[:, :n]
+
+ x = self.transformer(x, *args, context=context, context_list=context_list)
+ return x
diff --git a/lib/models/preproc/backbone/smpl_head.py b/lib/models/preproc/backbone/smpl_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..14493b8398deff1afffd6a9584bb7e54f5146626
--- /dev/null
+++ b/lib/models/preproc/backbone/smpl_head.py
@@ -0,0 +1,128 @@
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import numpy as np
+import einops
+
+from configs import constants as _C
+from lib.utils.transforms import axis_angle_to_matrix
+from .pose_transformer import TransformerDecoder
+
+def rot6d_to_rotmat(x: torch.Tensor) -> torch.Tensor:
+ """
+ Convert 6D rotation representation to 3x3 rotation matrix.
+ Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
+ Args:
+ x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
+ Returns:
+ torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
+ """
+ x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
+ a1 = x[:, :, 0]
+ a2 = x[:, :, 1]
+ b1 = F.normalize(a1)
+ b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
+ b3 = torch.cross(b1, b2, dim=-1)
+ return torch.stack((b1, b2, b3), dim=-1)
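+# Sanity-check example (illustrative, not part of the original module): the 6D vector
+# (1, 0, 0, 0, 1, 0) encodes the first two columns of the identity rotation, so
+#   rot6d_to_rotmat(torch.tensor([[1., 0., 0., 0., 1., 0.]]))
+# should return a (1, 3, 3) identity matrix.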
+
+def build_smpl_head(cfg):
+ smpl_head_type = 'transformer_decoder'
+ if smpl_head_type == 'transformer_decoder':
+ return SMPLTransformerDecoderHead(cfg)
+ else:
+ raise ValueError('Unknown SMPL head type: {}'.format(smpl_head_type))
+
+class SMPLTransformerDecoderHead(nn.Module):
+ """ Cross-attention based SMPL Transformer decoder
+ """
+
+ def __init__(self, cfg=None): # cfg accepted for compatibility with build_smpl_head; settings below are hardcoded
+ super().__init__()
+ self.joint_rep_type = '6d'
+ self.joint_rep_dim = {'6d': 6, 'aa': 3}[self.joint_rep_type]
+ npose = self.joint_rep_dim * 24
+ self.npose = npose
+ self.input_is_mean_shape = False
+ transformer_args = dict(
+ num_tokens=1,
+ token_dim=(npose + 10 + 3) if self.input_is_mean_shape else 1,
+ dim=1024,
+ )
+ transformer_args_from_cfg = dict(
+ depth=6, heads=8, mlp_dim=1024, dim_head=64, dropout=0.0, emb_dropout=0.0, norm='layer', context_dim=1280
+ )
+ transformer_args = (transformer_args | transformer_args_from_cfg)
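+ # Note: the dict-union operator "|" (PEP 584) requires Python >= 3.9.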
+ self.transformer = TransformerDecoder(
+ **transformer_args
+ )
+ dim = transformer_args['dim']
+ self.decpose = nn.Linear(dim, npose)
+ self.decshape = nn.Linear(dim, 10)
+ self.deccam = nn.Linear(dim, 3)
+
+ mean_params = np.load(_C.BMODEL.MEAN_PARAMS)
+ init_body_pose = torch.from_numpy(mean_params['pose'].astype(np.float32)).unsqueeze(0)
+ init_betas = torch.from_numpy(mean_params['shape'].astype('float32')).unsqueeze(0)
+ init_cam = torch.from_numpy(mean_params['cam'].astype(np.float32)).unsqueeze(0)
+ self.register_buffer('init_body_pose', init_body_pose)
+ self.register_buffer('init_betas', init_betas)
+ self.register_buffer('init_cam', init_cam)
+
+ def forward(self, x, **kwargs):
+
+ batch_size = x.shape[0]
+ # vit pretrained backbone is channel-first. Change to token-first
+
+ init_body_pose = self.init_body_pose.expand(batch_size, -1)
+ init_betas = self.init_betas.expand(batch_size, -1)
+ init_cam = self.init_cam.expand(batch_size, -1)
+
+ # TODO: Convert init_body_pose to aa rep if needed
+ if self.joint_rep_type == 'aa':
+ raise NotImplementedError
+
+ pred_body_pose = init_body_pose
+ pred_betas = init_betas
+ pred_cam = init_cam
+ pred_body_pose_list = []
+ pred_betas_list = []
+ pred_cam_list = []
+
+ # Input token to transformer is zero token
+ if len(x.shape) > 2:
+ x = einops.rearrange(x, 'b c h w -> b (h w) c')
+ if self.input_is_mean_shape:
+ token = torch.cat([pred_body_pose, pred_betas, pred_cam], dim=1)[:,None,:]
+ else:
+ token = torch.zeros(batch_size, 1, 1).to(x.device)
+
+ # Pass through transformer
+ token_out = self.transformer(token, context=x)
+ token_out = token_out.squeeze(1) # (B, C)
+ else:
+ token_out = x
+
+ # Readout from token_out
+ pred_body_pose = self.decpose(token_out) + pred_body_pose
+ pred_betas = self.decshape(token_out) + pred_betas
+ pred_cam = self.deccam(token_out) + pred_cam
+ pred_body_pose_list.append(pred_body_pose)
+ pred_betas_list.append(pred_betas)
+ pred_cam_list.append(pred_cam)
+
+ # Convert self.joint_rep_type -> rotmat
+ joint_conversion_fn = {
+ '6d': rot6d_to_rotmat,
+ 'aa': lambda x: axis_angle_to_matrix(x.view(-1, 3).contiguous())
+ }[self.joint_rep_type]
+
+ pred_smpl_params_list = {}
+ pred_smpl_params_list['body_pose'] = torch.cat([joint_conversion_fn(pbp).view(batch_size, -1, 3, 3)[:, 1:, :, :] for pbp in pred_body_pose_list], dim=0)
+ pred_smpl_params_list['betas'] = torch.cat(pred_betas_list, dim=0)
+ pred_smpl_params_list['cam'] = torch.cat(pred_cam_list, dim=0)
+ pred_body_pose = joint_conversion_fn(pred_body_pose).view(batch_size, 24, 3, 3)
+
+ pred_smpl_params = {'global_orient': pred_body_pose[:, [0]],
+ 'body_pose': pred_body_pose[:, 1:],
+ 'betas': pred_betas}
+ return pred_smpl_params, pred_cam, pred_smpl_params_list
\ No newline at end of file
diff --git a/lib/models/preproc/backbone/t_cond_mlp.py b/lib/models/preproc/backbone/t_cond_mlp.py
new file mode 100644
index 0000000000000000000000000000000000000000..954ae695b30f06dcb88ded4514da3f2c693149f6
--- /dev/null
+++ b/lib/models/preproc/backbone/t_cond_mlp.py
@@ -0,0 +1,198 @@
+import copy
+from typing import List, Optional
+
+import torch
+
+
+class AdaptiveLayerNorm1D(torch.nn.Module):
+ def __init__(self, data_dim: int, norm_cond_dim: int):
+ super().__init__()
+ if data_dim <= 0:
+ raise ValueError(f"data_dim must be positive, but got {data_dim}")
+ if norm_cond_dim <= 0:
+ raise ValueError(f"norm_cond_dim must be positive, but got {norm_cond_dim}")
+ self.norm = torch.nn.LayerNorm(
+ data_dim
+ ) # TODO: Check if elementwise_affine=True is correct
+ self.linear = torch.nn.Linear(norm_cond_dim, 2 * data_dim)
+ torch.nn.init.zeros_(self.linear.weight)
+ torch.nn.init.zeros_(self.linear.bias)
+
+ def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
+ # x: (batch, ..., data_dim)
+ # t: (batch, norm_cond_dim)
+ # return: (batch, data_dim)
+ x = self.norm(x)
+ alpha, beta = self.linear(t).chunk(2, dim=-1)
+
+ # Add singleton dimensions to alpha and beta
+ if x.dim() > 2:
+ alpha = alpha.view(alpha.shape[0], *([1] * (x.dim() - 2)), alpha.shape[1])
+ beta = beta.view(beta.shape[0], *([1] * (x.dim() - 2)), beta.shape[1])
+
+ return x * (1 + alpha) + beta
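+# FiLM-style conditioning: the conditioning vector t predicts a per-channel scale (1 + alpha)
+# and shift beta applied on top of LayerNorm; the zero-initialised linear layer makes this
+# start out as a plain LayerNorm.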
+
+
+class SequentialCond(torch.nn.Sequential):
+ def forward(self, input, *args, **kwargs):
+ for module in self:
+ if isinstance(module, (AdaptiveLayerNorm1D, SequentialCond, ResidualMLPBlock)):
+ # print(f'Passing on args to {module}', [a.shape for a in args])
+ input = module(input, *args, **kwargs)
+ else:
+ # print(f'Skipping passing args to {module}', [a.shape for a in args])
+ input = module(input)
+ return input
+
+
+def normalization_layer(norm: Optional[str], dim: int, norm_cond_dim: int = -1):
+ if norm == "batch":
+ return torch.nn.BatchNorm1d(dim)
+ elif norm == "layer":
+ return torch.nn.LayerNorm(dim)
+ elif norm == "ada":
+ assert norm_cond_dim > 0, f"norm_cond_dim must be positive, got {norm_cond_dim}"
+ return AdaptiveLayerNorm1D(dim, norm_cond_dim)
+ elif norm is None:
+ return torch.nn.Identity()
+ else:
+ raise ValueError(f"Unknown norm: {norm}")
+
+
+def linear_norm_activ_dropout(
+ input_dim: int,
+ output_dim: int,
+ activation: torch.nn.Module = torch.nn.ReLU(),
+ bias: bool = True,
+ norm: Optional[str] = "layer", # Options: ada/batch/layer
+ dropout: float = 0.0,
+ norm_cond_dim: int = -1,
+) -> SequentialCond:
+ layers = []
+ layers.append(torch.nn.Linear(input_dim, output_dim, bias=bias))
+ if norm is not None:
+ layers.append(normalization_layer(norm, output_dim, norm_cond_dim))
+ layers.append(copy.deepcopy(activation))
+ if dropout > 0.0:
+ layers.append(torch.nn.Dropout(dropout))
+ return SequentialCond(*layers)
+
+
+def create_simple_mlp(
+ input_dim: int,
+ hidden_dims: List[int],
+ output_dim: int,
+ activation: torch.nn.Module = torch.nn.ReLU(),
+ bias: bool = True,
+ norm: Optional[str] = "layer", # Options: ada/batch/layer
+ dropout: float = 0.0,
+ norm_cond_dim: int = -1,
+) -> SequentialCond:
+ layers = []
+ prev_dim = input_dim
+ for hidden_dim in hidden_dims:
+ layers.extend(
+ linear_norm_activ_dropout(
+ prev_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
+ )
+ )
+ prev_dim = hidden_dim
+ layers.append(torch.nn.Linear(prev_dim, output_dim, bias=bias))
+ return SequentialCond(*layers)
+
+
+class ResidualMLPBlock(torch.nn.Module):
+ def __init__(
+ self,
+ input_dim: int,
+ hidden_dim: int,
+ num_hidden_layers: int,
+ output_dim: int,
+ activation: torch.nn.Module = torch.nn.ReLU(),
+ bias: bool = True,
+ norm: Optional[str] = "layer", # Options: ada/batch/layer
+ dropout: float = 0.0,
+ norm_cond_dim: int = -1,
+ ):
+ super().__init__()
+ if not (input_dim == output_dim == hidden_dim):
+ raise NotImplementedError(
+ f"input_dim {input_dim} != output_dim {output_dim} is not implemented"
+ )
+
+ layers = []
+ prev_dim = input_dim
+ for i in range(num_hidden_layers):
+ layers.append(
+ linear_norm_activ_dropout(
+ prev_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
+ )
+ )
+ prev_dim = hidden_dim
+ self.model = SequentialCond(*layers)
+ self.skip = torch.nn.Identity()
+
+ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
+ return x + self.model(x, *args, **kwargs)
+
+
+class ResidualMLP(torch.nn.Module):
+ def __init__(
+ self,
+ input_dim: int,
+ hidden_dim: int,
+ num_hidden_layers: int,
+ output_dim: int,
+ activation: torch.nn.Module = torch.nn.ReLU(),
+ bias: bool = True,
+ norm: Optional[str] = "layer", # Options: ada/batch/layer
+ dropout: float = 0.0,
+ num_blocks: int = 1,
+ norm_cond_dim: int = -1,
+ ):
+ super().__init__()
+ self.input_dim = input_dim
+ self.model = SequentialCond(
+ linear_norm_activ_dropout(
+ input_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
+ ),
+ *[
+ ResidualMLPBlock(
+ hidden_dim,
+ hidden_dim,
+ num_hidden_layers,
+ hidden_dim,
+ activation,
+ bias,
+ norm,
+ dropout,
+ norm_cond_dim,
+ )
+ for _ in range(num_blocks)
+ ],
+ torch.nn.Linear(hidden_dim, output_dim, bias=bias),
+ )
+
+ def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
+ return self.model(x, *args, **kwargs)
+
+
+class FrequencyEmbedder(torch.nn.Module):
+ def __init__(self, num_frequencies, max_freq_log2):
+ super().__init__()
+ frequencies = 2 ** torch.linspace(0, max_freq_log2, steps=num_frequencies)
+ self.register_buffer("frequencies", frequencies)
+
+ def forward(self, x):
+ # x should be of size (N,) or (N, D)
+ N = x.size(0)
+ if x.dim() == 1: # (N,)
+ x = x.unsqueeze(1) # (N, D) where D=1
+ x_unsqueezed = x.unsqueeze(-1) # (N, D, 1)
+ scaled = self.frequencies.view(1, 1, -1) * x_unsqueezed # (N, D, num_frequencies)
+ s = torch.sin(scaled)
+ c = torch.cos(scaled)
+ embedded = torch.cat([s, c, x_unsqueezed], dim=-1).view(
+ N, -1
+ ) # (N, D * 2 * num_frequencies + D)
+ return embedded
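+# Width example (illustrative): with num_frequencies=4 and 3-D inputs, each coordinate
+# contributes 4 sines, 4 cosines and the raw value, so the output has 3 * (2 * 4 + 1) = 27 features.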
diff --git a/lib/models/preproc/backbone/utils.py b/lib/models/preproc/backbone/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..44bbf64966953d246be2994f31656d90a4f7e8f9
--- /dev/null
+++ b/lib/models/preproc/backbone/utils.py
@@ -0,0 +1,115 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os
+import os.path as osp
+from collections import OrderedDict
+
+import cv2
+import numpy as np
+from skimage.filters import gaussian
+
+
+def get_transform(center, scale, res, rot=0):
+ """Generate transformation matrix."""
+ # res: (height, width), (rows, cols)
+ crop_aspect_ratio = res[0] / float(res[1])
+ h = 200 * scale
+ w = h / crop_aspect_ratio
+ t = np.zeros((3, 3))
+ t[0, 0] = float(res[1]) / w
+ t[1, 1] = float(res[0]) / h
+ t[0, 2] = res[1] * (-float(center[0]) / w + .5)
+ t[1, 2] = res[0] * (-float(center[1]) / h + .5)
+ t[2, 2] = 1
+ if not rot == 0:
+ rot = -rot # To match direction of rotation from cropping
+ rot_mat = np.zeros((3, 3))
+ rot_rad = rot * np.pi / 180
+ sn, cs = np.sin(rot_rad), np.cos(rot_rad)
+ rot_mat[0, :2] = [cs, -sn]
+ rot_mat[1, :2] = [sn, cs]
+ rot_mat[2, 2] = 1
+ # Need to rotate around center
+ t_mat = np.eye(3)
+ t_mat[0, 2] = -res[1] / 2
+ t_mat[1, 2] = -res[0] / 2
+ t_inv = t_mat.copy()
+ t_inv[:2, 2] *= -1
+ t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
+ return t
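+# Note: `scale` follows the usual person-crop convention where scale 1.0 corresponds to a
+# 200-pixel-tall box, i.e. the transform maps h = 200 * scale source pixels onto the crop height.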
+
+
+def transform(pt, center, scale, res, invert=0, rot=0):
+ """Transform pixel location to different reference."""
+ t = get_transform(center, scale, res, rot=rot)
+ if invert:
+ t = np.linalg.inv(t)
+ new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
+ new_pt = np.dot(t, new_pt)
+ return np.array([round(new_pt[0]), round(new_pt[1])], dtype=int) + 1
+
+
+def crop(img, center, scale, res):
+ """
+ Crop image according to the supplied bounding box.
+ res: [rows, cols]
+ """
+ # Upper left point
+ ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
+ # Bottom right point
+ br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
+
+ new_shape = [br[1] - ul[1], br[0] - ul[0]]
+ if len(img.shape) > 2:
+ new_shape += [img.shape[2]]
+ new_img = np.zeros(new_shape, dtype=np.float32)
+
+ # Range to fill new array
+ new_x = max(0, -ul[0]), min(br[0], len(img[0])) - ul[0]
+ new_y = max(0, -ul[1]), min(br[1], len(img)) - ul[1]
+ # Range to sample from original image
+ old_x = max(0, ul[0]), min(len(img[0]), br[0])
+ old_y = max(0, ul[1]), min(len(img), br[1])
+ try:
+ new_img[new_y[0]:new_y[1], new_x[0]:new_x[1]] = img[old_y[0]:old_y[1], old_x[0]:old_x[1]]
+ except Exception as e:
+ print(e)
+
+ new_img = cv2.resize(new_img, (res[1], res[0])) # (cols, rows)
+
+ return new_img, ul, br
+
+
+
+def process_image(orig_img_rgb, center, scale, crop_height=256, crop_width=192, blur=False, do_crop=True):
+ """
+ Normalize the image with ImageNet statistics and, if do_crop is set, crop it to the
+ bounding box given by center and scale before resizing to (crop_height, crop_width).
+ """
+
+ if blur:
+ # Blur image to avoid aliasing artifacts
+ downsampling_factor = ((scale * 200 * 1.0) / crop_height)
+ downsampling_factor = downsampling_factor / 2.0
+ if downsampling_factor > 1.1:
+ orig_img_rgb = gaussian(orig_img_rgb, sigma=(downsampling_factor-1)/2, channel_axis=2, preserve_range=True)
+
+ IMG_NORM_MEAN = [0.485, 0.456, 0.406]
+ IMG_NORM_STD = [0.229, 0.224, 0.225]
+
+ if do_crop:
+ img, ul, br = crop(orig_img_rgb, center, scale, (crop_height, crop_width))
+ else:
+ img = orig_img_rgb.copy()
+ crop_img = img.copy()
+
+ img = img / 255.
+ mean = np.array(IMG_NORM_MEAN, dtype=np.float32)
+ std = np.array(IMG_NORM_STD, dtype=np.float32)
+ norm_img = (img - mean) / std
+ norm_img = np.transpose(norm_img, (2, 0, 1))
+
+ return norm_img, crop_img
\ No newline at end of file
diff --git a/lib/models/preproc/backbone/vit.py b/lib/models/preproc/backbone/vit.py
new file mode 100644
index 0000000000000000000000000000000000000000..0d4cb7abeb3365aa7c9b13a0d204b0009abf6a2e
--- /dev/null
+++ b/lib/models/preproc/backbone/vit.py
@@ -0,0 +1,348 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import math
+
+import torch
+from functools import partial
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.utils.checkpoint as checkpoint
+
+from timm.models.layers import drop_path, to_2tuple, trunc_normal_
+
+def vit():
+ return ViT(
+ img_size=(256, 192),
+ patch_size=16,
+ embed_dim=1280,
+ depth=32,
+ num_heads=16,
+ ratio=1,
+ use_checkpoint=False,
+ mlp_ratio=4,
+ qkv_bias=True,
+ drop_path_rate=0.55,
+ )
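+# This configuration (embed_dim=1280, depth=32, heads=16, 256x192 input) corresponds to a
+# ViT-Huge backbone, matching the ViTPose-H setup used in the preprocessing detector.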
+
+def get_abs_pos(abs_pos, h, w, ori_h, ori_w, has_cls_token=True):
+ """
+ Calculate absolute positional embeddings. If needed, resize embeddings and remove cls_token
+ dimension for the original embeddings.
+ Args:
+ abs_pos (Tensor): absolute positional embeddings with (1, num_position, C).
+ h, w (int): target token grid size of the input image.
+ ori_h, ori_w (int): token grid size the embeddings were trained for.
+ has_cls_token (bool): If true, abs_pos has one extra embedding for the cls token.
+
+ Returns:
+ Absolute positional embeddings after processing, with shape (1, h * w (+1 for the cls token), C)
+ """
+ cls_token = None
+ B, L, C = abs_pos.shape
+ if has_cls_token:
+ cls_token = abs_pos[:, 0:1]
+ abs_pos = abs_pos[:, 1:]
+
+ if ori_h != h or ori_w != w:
+ new_abs_pos = F.interpolate(
+ abs_pos.reshape(1, ori_h, ori_w, -1).permute(0, 3, 1, 2),
+ size=(h, w),
+ mode="bicubic",
+ align_corners=False,
+ ).permute(0, 2, 3, 1).reshape(B, -1, C)
+
+ else:
+ new_abs_pos = abs_pos
+
+ if cls_token is not None:
+ new_abs_pos = torch.cat([cls_token, new_abs_pos], dim=1)
+ return new_abs_pos
+
+class DropPath(nn.Module):
+ """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
+ """
+ def __init__(self, drop_prob=None):
+ super(DropPath, self).__init__()
+ self.drop_prob = drop_prob
+
+ def forward(self, x):
+ return drop_path(x, self.drop_prob, self.training)
+
+ def extra_repr(self):
+ return 'p={}'.format(self.drop_prob)
+
+class Mlp(nn.Module):
+ def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
+ super().__init__()
+ out_features = out_features or in_features
+ hidden_features = hidden_features or in_features
+ self.fc1 = nn.Linear(in_features, hidden_features)
+ self.act = act_layer()
+ self.fc2 = nn.Linear(hidden_features, out_features)
+ self.drop = nn.Dropout(drop)
+
+ def forward(self, x):
+ x = self.fc1(x)
+ x = self.act(x)
+ x = self.fc2(x)
+ x = self.drop(x)
+ return x
+
+class Attention(nn.Module):
+ def __init__(
+ self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,
+ proj_drop=0., attn_head_dim=None,):
+ super().__init__()
+ self.num_heads = num_heads
+ head_dim = dim // num_heads
+ self.dim = dim
+
+ if attn_head_dim is not None:
+ head_dim = attn_head_dim
+ all_head_dim = head_dim * self.num_heads
+
+ self.scale = qk_scale or head_dim ** -0.5
+
+ self.qkv = nn.Linear(dim, all_head_dim * 3, bias=qkv_bias)
+
+ self.attn_drop = nn.Dropout(attn_drop)
+ self.proj = nn.Linear(all_head_dim, dim)
+ self.proj_drop = nn.Dropout(proj_drop)
+
+ def forward(self, x):
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
+ q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple)
+
+ q = q * self.scale
+ attn = (q @ k.transpose(-2, -1))
+
+ attn = attn.softmax(dim=-1)
+ attn = self.attn_drop(attn)
+
+ x = (attn @ v).transpose(1, 2).reshape(B, N, -1)
+ x = self.proj(x)
+ x = self.proj_drop(x)
+
+ return x
+
+class Block(nn.Module):
+
+ def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None,
+ drop=0., attn_drop=0., drop_path=0., act_layer=nn.GELU,
+ norm_layer=nn.LayerNorm, attn_head_dim=None
+ ):
+ super().__init__()
+
+ self.norm1 = norm_layer(dim)
+ self.attn = Attention(
+ dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
+ attn_drop=attn_drop, proj_drop=drop, attn_head_dim=attn_head_dim
+ )
+
+ # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
+ self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
+ self.norm2 = norm_layer(dim)
+ mlp_hidden_dim = int(dim * mlp_ratio)
+ self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
+
+ def forward(self, x):
+ x = x + self.drop_path(self.attn(self.norm1(x)))
+ x = x + self.drop_path(self.mlp(self.norm2(x)))
+ return x
+
+
+class PatchEmbed(nn.Module):
+ """ Image to Patch Embedding
+ """
+ def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, ratio=1):
+ super().__init__()
+ img_size = to_2tuple(img_size)
+ patch_size = to_2tuple(patch_size)
+ num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0]) * (ratio ** 2)
+ self.patch_shape = (int(img_size[0] // patch_size[0] * ratio), int(img_size[1] // patch_size[1] * ratio))
+ self.origin_patch_shape = (int(img_size[0] // patch_size[0]), int(img_size[1] // patch_size[1]))
+ self.img_size = img_size
+ self.patch_size = patch_size
+ self.num_patches = num_patches
+
+ self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=(patch_size[0] // ratio), padding=4 + 2 * (ratio//2-1))
+
+ def forward(self, x, **kwargs):
+ B, C, H, W = x.shape
+ x = self.proj(x)
+ Hp, Wp = x.shape[2], x.shape[3]
+
+ x = x.flatten(2).transpose(1, 2)
+ return x, (Hp, Wp)
+
+
+class HybridEmbed(nn.Module):
+ """ CNN Feature Map Embedding
+ Extract feature map from CNN, flatten, project to embedding dim.
+ """
+ def __init__(self, backbone, img_size=224, feature_size=None, in_chans=3, embed_dim=768):
+ super().__init__()
+ assert isinstance(backbone, nn.Module)
+ img_size = to_2tuple(img_size)
+ self.img_size = img_size
+ self.backbone = backbone
+ if feature_size is None:
+ with torch.no_grad():
+ training = backbone.training
+ if training:
+ backbone.eval()
+ o = self.backbone(torch.zeros(1, in_chans, img_size[0], img_size[1]))[-1]
+ feature_size = o.shape[-2:]
+ feature_dim = o.shape[1]
+ backbone.train(training)
+ else:
+ feature_size = to_2tuple(feature_size)
+ feature_dim = self.backbone.feature_info.channels()[-1]
+ self.num_patches = feature_size[0] * feature_size[1]
+ self.proj = nn.Linear(feature_dim, embed_dim)
+
+ def forward(self, x):
+ x = self.backbone(x)[-1]
+ x = x.flatten(2).transpose(1, 2)
+ x = self.proj(x)
+ return x
+
+
+class ViT(nn.Module):
+
+ def __init__(self,
+ img_size=224, patch_size=16, in_chans=3, num_classes=80, embed_dim=768, depth=12,
+ num_heads=12, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
+ drop_path_rate=0., hybrid_backbone=None, norm_layer=None, use_checkpoint=False,
+ frozen_stages=-1, ratio=1, last_norm=True,
+ patch_padding='pad', freeze_attn=False, freeze_ffn=False,
+ ):
+ # Protect mutable default arguments
+ super(ViT, self).__init__()
+ norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
+ self.num_classes = num_classes
+ self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models
+ self.frozen_stages = frozen_stages
+ self.use_checkpoint = use_checkpoint
+ self.patch_padding = patch_padding
+ self.freeze_attn = freeze_attn
+ self.freeze_ffn = freeze_ffn
+ self.depth = depth
+
+ if hybrid_backbone is not None:
+ self.patch_embed = HybridEmbed(
+ hybrid_backbone, img_size=img_size, in_chans=in_chans, embed_dim=embed_dim)
+ else:
+ self.patch_embed = PatchEmbed(
+ img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, ratio=ratio)
+ num_patches = self.patch_embed.num_patches
+
+ # since the pretraining model has class token
+ self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
+
+ dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule
+
+ self.blocks = nn.ModuleList([
+ Block(
+ dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
+ drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,
+ )
+ for i in range(depth)])
+
+ self.last_norm = norm_layer(embed_dim) if last_norm else nn.Identity()
+
+ if self.pos_embed is not None:
+ trunc_normal_(self.pos_embed, std=.02)
+
+ self._freeze_stages()
+
+ def _freeze_stages(self):
+ """Freeze parameters."""
+ if self.frozen_stages >= 0:
+ self.patch_embed.eval()
+ for param in self.patch_embed.parameters():
+ param.requires_grad = False
+
+ for i in range(1, self.frozen_stages + 1):
+ m = self.blocks[i]
+ m.eval()
+ for param in m.parameters():
+ param.requires_grad = False
+
+ if self.freeze_attn:
+ for i in range(0, self.depth):
+ m = self.blocks[i]
+ m.attn.eval()
+ m.norm1.eval()
+ for param in m.attn.parameters():
+ param.requires_grad = False
+ for param in m.norm1.parameters():
+ param.requires_grad = False
+
+ if self.freeze_ffn:
+ self.pos_embed.requires_grad = False
+ self.patch_embed.eval()
+ for param in self.patch_embed.parameters():
+ param.requires_grad = False
+ for i in range(0, self.depth):
+ m = self.blocks[i]
+ m.mlp.eval()
+ m.norm2.eval()
+ for param in m.mlp.parameters():
+ param.requires_grad = False
+ for param in m.norm2.parameters():
+ param.requires_grad = False
+
+ def init_weights(self):
+ """Initialize the weights in backbone.
+ Args:
+ pretrained (str, optional): Path to pre-trained weights.
+ Defaults to None.
+ """
+ def _init_weights(m):
+ if isinstance(m, nn.Linear):
+ trunc_normal_(m.weight, std=.02)
+ if isinstance(m, nn.Linear) and m.bias is not None:
+ nn.init.constant_(m.bias, 0)
+ elif isinstance(m, nn.LayerNorm):
+ nn.init.constant_(m.bias, 0)
+ nn.init.constant_(m.weight, 1.0)
+
+ self.apply(_init_weights)
+
+ def get_num_layers(self):
+ return len(self.blocks)
+
+ @torch.jit.ignore
+ def no_weight_decay(self):
+ return {'pos_embed', 'cls_token'}
+
+ def forward_features(self, x):
+ B, C, H, W = x.shape
+ x, (Hp, Wp) = self.patch_embed(x)
+
+ if self.pos_embed is not None:
+ # Add the cls-token entry too so all parameters are used (helps multi-GPU training);
+ # since the first element of the (sin-cos style) pos embed is zero, this changes nothing.
+ x = x + self.pos_embed[:, 1:] + self.pos_embed[:, :1]
+
+ for blk in self.blocks:
+ if self.use_checkpoint:
+ x = checkpoint.checkpoint(blk, x)
+ else:
+ x = blk(x)
+
+ x = self.last_norm(x)
+
+ xp = x.permute(0, 2, 1).reshape(B, -1, Hp, Wp).contiguous()
+
+ return xp
+
+ def forward(self, x):
+ x = self.forward_features(x)
+ return x
+
+ def train(self, mode=True):
+ """Convert the model into training mode."""
+ super().train(mode)
+ self._freeze_stages()
\ No newline at end of file
diff --git a/lib/models/preproc/detector.py b/lib/models/preproc/detector.py
new file mode 100644
index 0000000000000000000000000000000000000000..d6fa0a8adac0f0f9cdda46b2a4ae38c203750e1f
--- /dev/null
+++ b/lib/models/preproc/detector.py
@@ -0,0 +1,146 @@
+from __future__ import annotations
+
+import os
+import os.path as osp
+from collections import defaultdict
+
+import numpy as np
+import torch
+import torch.nn as nn
+import scipy.signal as signal
+from progress.bar import Bar
+
+from ultralytics import YOLO
+from mmpose.apis import (
+ inference_top_down_pose_model,
+ init_pose_model,
+ get_track_id,
+ vis_pose_result,
+)
+
+ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
+VIT_DIR = osp.join(ROOT_DIR, "third-party/ViTPose")
+
+VIS_THRESH = 0.3
+BBOX_CONF = 0.5
+TRACKING_THR = 0.1
+MINIMUM_FRAMES = 30
+MINIMUM_JOINTS = 6
+
+class DetectionModel(object):
+ def __init__(self, device):
+
+ # ViTPose
+ pose_model_cfg = osp.join(VIT_DIR, 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_huge_coco_256x192.py')
+ pose_model_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'vitpose-h-multi-coco.pth')
+ self.pose_model = init_pose_model(pose_model_cfg, pose_model_ckpt, device=device.lower())
+
+ # YOLO
+ bbox_model_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'yolov8x.pt')
+ self.bbox_model = YOLO(bbox_model_ckpt)
+
+ self.device = device
+ self.initialize_tracking()
+
+ def initialize_tracking(self, ):
+ self.next_id = 0
+ self.frame_id = 0
+ self.pose_results_last = []
+ self.tracking_results = {
+ 'id': [],
+ 'frame_id': [],
+ 'bbox': [],
+ 'keypoints': []
+ }
+
+ def xyxy_to_cxcys(self, bbox, s_factor=1.05):
+ cx, cy = bbox[[0, 2]].mean(), bbox[[1, 3]].mean()
+ scale = max(bbox[2] - bbox[0], bbox[3] - bbox[1]) / 200 * s_factor
+ return np.array([[cx, cy, scale]])
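+ # The returned scale is the longer bbox side divided by 200 (padded by s_factor), matching
+ # the 200-pixel crop convention used in backbone/utils.py.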
+
+ def compute_bboxes_from_keypoints(self, s_factor=1.2):
+ X = self.tracking_results['keypoints'].copy()
+ mask = X[..., -1] > VIS_THRESH
+
+ bbox = np.zeros((len(X), 3))
+ for i, (kp, m) in enumerate(zip(X, mask)):
+ bb = [kp[m, 0].min(), kp[m, 1].min(),
+ kp[m, 0].max(), kp[m, 1].max()]
+ cx, cy = [(bb[2]+bb[0])/2, (bb[3]+bb[1])/2]
+ bb_w = bb[2] - bb[0]
+ bb_h = bb[3] - bb[1]
+ s = np.stack((bb_w, bb_h)).max()
+ bb = np.array((cx, cy, s))
+ bbox[i] = bb
+
+ bbox[:, 2] = bbox[:, 2] * s_factor / 200.0
+ self.tracking_results['bbox'] = bbox
+
+ def track(self, img, fps, length):
+
+ # bbox detection
+ bboxes = self.bbox_model.predict(
+ img, device=self.device, classes=0, conf=BBOX_CONF, save=False, verbose=False
+ )[0].boxes.xyxy.detach().cpu().numpy()
+ bboxes = [{'bbox': bbox} for bbox in bboxes]
+
+ # keypoints detection
+ pose_results, returned_outputs = inference_top_down_pose_model(
+ self.pose_model,
+ img,
+ person_results=bboxes,
+ format='xyxy',
+ return_heatmap=False,
+ outputs=None)
+
+ # person identification
+ pose_results, self.next_id = get_track_id(
+ pose_results,
+ self.pose_results_last,
+ self.next_id,
+ use_oks=False,
+ tracking_thr=TRACKING_THR,
+ use_one_euro=True,
+ fps=fps)
+
+ for pose_result in pose_results:
+ n_valid = (pose_result['keypoints'][:, -1] > VIS_THRESH).sum()
+ if n_valid < MINIMUM_JOINTS: continue
+
+ _id = pose_result['track_id']
+ xyxy = pose_result['bbox']
+ bbox = self.xyxy_to_cxcys(xyxy)
+
+ self.tracking_results['id'].append(_id)
+ self.tracking_results['frame_id'].append(self.frame_id)
+ self.tracking_results['bbox'].append(bbox)
+ self.tracking_results['keypoints'].append(pose_result['keypoints'])
+
+ self.frame_id += 1
+ self.pose_results_last = pose_results
+
+ def process(self, fps):
+ for key in ['id', 'frame_id', 'keypoints']:
+ self.tracking_results[key] = np.array(self.tracking_results[key])
+ self.compute_bboxes_from_keypoints()
+
+ output = defaultdict(lambda: defaultdict(list))
+ ids = np.unique(self.tracking_results['id'])
+ for _id in ids:
+ idxs = np.where(self.tracking_results['id'] == _id)[0]
+ for key, val in self.tracking_results.items():
+ if key == 'id': continue
+ output[_id][key] = val[idxs]
+
+ # Smooth bounding box detection
+ ids = list(output.keys())
+ for _id in ids:
+ if len(output[_id]['bbox']) < MINIMUM_FRAMES:
+ del output[_id]
+ continue
+
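+ # Median-filter each bbox parameter over a window of roughly half a second, rounded to an
+ # odd number of frames as required by signal.medfilt.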
+ kernel = int(int(fps/2) / 2) * 2 + 1
+ smoothed_bbox = np.array([signal.medfilt(param, kernel) for param in output[_id]['bbox'].T]).T
+ output[_id]['bbox'] = smoothed_bbox
+
+ return output
\ No newline at end of file
diff --git a/lib/models/preproc/extractor.py b/lib/models/preproc/extractor.py
new file mode 100644
index 0000000000000000000000000000000000000000..25a77973492d7e09160c6282c965dcfefd439c97
--- /dev/null
+++ b/lib/models/preproc/extractor.py
@@ -0,0 +1,112 @@
+from __future__ import annotations
+
+import os
+import os.path as osp
+from collections import defaultdict
+
+import cv2
+import torch
+import numpy as np
+import scipy.signal as signal
+from progress.bar import Bar
+from scipy.ndimage import gaussian_filter1d
+
+from configs import constants as _C
+from .backbone.hmr2 import hmr2
+from .backbone.utils import process_image
+from ...utils.imutils import flip_kp, flip_bbox
+
+ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
+
+class FeatureExtractor(object):
+ def __init__(self, device, flip_eval=False, max_batch_size=64):
+
+ self.device = device
+ self.flip_eval = flip_eval
+ self.max_batch_size = max_batch_size
+
+ ckpt = osp.join(ROOT_DIR, 'checkpoints', 'hmr2a.ckpt')
+ self.model = hmr2(ckpt).to(device).eval()
+
+ def run(self, video, tracking_results, patch_h=256, patch_w=256):
+
+ if osp.isfile(video):
+ cap = cv2.VideoCapture(video)
+ is_video = True
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+ width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
+ else: # Image list
+ cap = video
+ is_video = False
+ length = len(video)
+ height, width = cv2.imread(video[0]).shape[:2]
+
+ frame_id = 0
+ bar = Bar('Feature extraction ...', fill='#', max=length)
+ while True:
+ if is_video:
+ flag, img = cap.read()
+ if not flag:
+ break
+ else:
+ if frame_id >= len(cap):
+ break
+ img = cv2.imread(cap[frame_id])
+
+ for _id, val in tracking_results.items():
+ if frame_id not in val['frame_id']: continue
+
+ frame_id2 = np.where(val['frame_id'] == frame_id)[0][0]
+ bbox = val['bbox'][frame_id2]
+ cx, cy, scale = bbox
+
+ norm_img, crop_img = process_image(img[..., ::-1], [cx, cy], scale, patch_h, patch_w)
+ norm_img = torch.from_numpy(norm_img).unsqueeze(0).to(self.device)
+ feature = self.model(norm_img, encode=True)
+ tracking_results[_id]['features'].append(feature.cpu())
+
+ if frame_id2 == 0: # First frame of this subject
+ tracking_results = self.predict_init(norm_img, tracking_results, _id, flip_eval=False)
+
+ if self.flip_eval:
+ flipped_bbox = flip_bbox(bbox, width, height)
+ tracking_results[_id]['flipped_bbox'].append(flipped_bbox)
+
+ keypoints = val['keypoints'][frame_id2]
+ flipped_keypoints = flip_kp(keypoints, width)
+ tracking_results[_id]['flipped_keypoints'].append(flipped_keypoints)
+
+ flipped_features = self.model(torch.flip(norm_img, (3, )), encode=True)
+ tracking_results[_id]['flipped_features'].append(flipped_features.cpu())
+
+ if frame_id2 == 0:
+ tracking_results = self.predict_init(torch.flip(norm_img, (3, )), tracking_results, _id, flip_eval=True)
+
+ bar.next()
+ frame_id += 1
+
+ return self.process(tracking_results)
+
+ def predict_init(self, norm_img, tracking_results, _id, flip_eval=False):
+ prefix = 'flipped_' if flip_eval else ''
+
+ pred_global_orient, pred_body_pose, pred_betas, _ = self.model(norm_img, encode=False)
+ tracking_results[_id][prefix + 'init_global_orient'] = pred_global_orient.cpu()
+ tracking_results[_id][prefix + 'init_body_pose'] = pred_body_pose.cpu()
+ tracking_results[_id][prefix + 'init_betas'] = pred_betas.cpu()
+ return tracking_results
+
+ def process(self, tracking_results):
+ output = defaultdict(dict)
+
+ for _id, results in tracking_results.items():
+
+ for key, val in results.items():
+ if isinstance(val, list):
+ if isinstance(val[0], torch.Tensor):
+ val = torch.cat(val)
+ elif isinstance(val[0], np.ndarray):
+ val = np.array(val)
+ output[_id][key] = val
+
+ return output
\ No newline at end of file
diff --git a/lib/models/preproc/slam.py b/lib/models/preproc/slam.py
new file mode 100644
index 0000000000000000000000000000000000000000..58f68198843eb4beef3f6e1f07783e00239a030a
--- /dev/null
+++ b/lib/models/preproc/slam.py
@@ -0,0 +1,70 @@
+import cv2
+import numpy as np
+import glob
+import os.path as osp
+import os
+import time
+import torch
+from pathlib import Path
+from multiprocessing import Process, Queue
+
+from dpvo.utils import Timer
+from dpvo.dpvo import DPVO
+from dpvo.config import cfg
+from dpvo.stream import image_stream, video_stream
+
+ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
+DPVO_DIR = osp.join(ROOT_DIR, "third-party/DPVO")
+
+
+class SLAMModel(object):
+ def __init__(self, video, output_pth, width, height, calib=None, stride=1, skip=0, buffer=2048):
+
+ if calib is None or not osp.exists(calib):
+ calib = osp.join(output_pth, 'calib.txt')
+ if not osp.exists(calib):
+ self.estimate_intrinsics(width, height, calib)
+
+ self.dpvo_cfg = osp.join(DPVO_DIR, 'config/default.yaml')
+ self.dpvo_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'dpvo.pth')
+
+ self.buffer = buffer
+ self.times = []
+ self.slam = None
+ self.queue = Queue(maxsize=8)
+ self.reader = Process(target=video_stream, args=(self.queue, video, calib, stride, skip))
+ self.reader.start()
+
+ def estimate_intrinsics(self, width, height, calib):
+ focal_length = (height ** 2 + width ** 2) ** 0.5
+ center_x = width / 2
+ center_y = height / 2
+
+ with open(calib, 'w') as fopen:
+ line = f'{focal_length} {focal_length} {center_x} {center_y}'
+ fopen.write(line)
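+ # Heuristic pinhole intrinsics when no calibration is given: focal length set to the image
+ # diagonal (in pixels) and principal point at the image centre, written as "fx fy cx cy".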
+
+ def track(self, ):
+ (t, image, intrinsics) = self.queue.get()
+
+ if t < 0: return
+
+ image = torch.from_numpy(image).permute(2,0,1).cuda()
+ intrinsics = torch.from_numpy(intrinsics).cuda()
+
+ if self.slam is None:
+ cfg.merge_from_file(self.dpvo_cfg)
+ cfg.BUFFER_SIZE = self.buffer
+ self.slam = DPVO(cfg, self.dpvo_ckpt, ht=image.shape[1], wd=image.shape[2], viz=False)
+
+ with Timer("SLAM", enabled=False):
+ t = time.time()
+ self.slam(t, image, intrinsics)
+ self.times.append(time.time() - t)
+
+ def process(self, ):
+ for _ in range(12):
+ self.slam.update()
+
+ self.reader.join()
+ return self.slam.terminate()[0]
\ No newline at end of file
diff --git a/lib/models/smpl.py b/lib/models/smpl.py
new file mode 100644
index 0000000000000000000000000000000000000000..663ae77b872e7623c032d73001f963cdc29abb0c
--- /dev/null
+++ b/lib/models/smpl.py
@@ -0,0 +1,264 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import os, sys
+
+import torch
+import numpy as np
+from lib.utils import transforms
+
+from smplx import SMPL as _SMPL
+from smplx.utils import SMPLOutput as ModelOutput
+from smplx.lbs import vertices2joints
+
+from configs import constants as _C
+
+class SMPL(_SMPL):
+ """ Extension of the official SMPL implementation to support more joints """
+
+ def __init__(self, *args, **kwargs):
+ sys.stdout = open(os.devnull, 'w')
+ super(SMPL, self).__init__(*args, **kwargs)
+ sys.stdout = sys.__stdout__
+
+ J_regressor_wham = np.load(_C.BMODEL.JOINTS_REGRESSOR_WHAM)
+ J_regressor_eval = np.load(_C.BMODEL.JOINTS_REGRESSOR_H36M)
+ self.register_buffer('J_regressor_wham', torch.tensor(
+ J_regressor_wham, dtype=torch.float32))
+ self.register_buffer('J_regressor_eval', torch.tensor(
+ J_regressor_eval, dtype=torch.float32))
+ self.register_buffer('J_regressor_feet', torch.from_numpy(
+ np.load(_C.BMODEL.JOINTS_REGRESSOR_FEET)
+ ).float())
+
+ def get_local_pose_from_reduced_global_pose(self, reduced_pose):
+ full_pose = torch.eye(
+ 3, device=reduced_pose.device
+ )[(None, ) * 2].repeat(reduced_pose.shape[0], 24, 1, 1)
+ full_pose[:, _C.BMODEL.MAIN_JOINTS] = reduced_pose
+ return full_pose
+
+ def forward(self,
+ pred_rot6d,
+ betas,
+ cam=None,
+ cam_intrinsics=None,
+ bbox=None,
+ res=None,
+ return_full_pose=False,
+ **kwargs):
+
+ rotmat = transforms.rotation_6d_to_matrix(pred_rot6d.reshape(*pred_rot6d.shape[:2], -1, 6)
+ ).reshape(-1, 24, 3, 3)
+
+ output = self.get_output(body_pose=rotmat[:, 1:],
+ global_orient=rotmat[:, :1],
+ betas=betas.view(-1, 10),
+ pose2rot=False,
+ return_full_pose=return_full_pose)
+
+ if cam is not None:
+ joints3d = output.joints.reshape(*cam.shape[:2], -1, 3)
+
+ # Weak perspective projection (for InstaVariety)
+ weak_cam = convert_weak_perspective_to_perspective(cam)
+
+ weak_joints2d = weak_perspective_projection(
+ joints3d,
+ rotation=torch.eye(3, device=cam.device).unsqueeze(0).unsqueeze(0).expand(*cam.shape[:2], -1, -1),
+ translation=weak_cam,
+ focal_length=5000.,
+ camera_center=torch.zeros(*cam.shape[:2], 2, device=cam.device)
+ )
+ output.weak_joints2d = weak_joints2d
+
+ # Full perspective projection
+ full_cam = convert_pare_to_full_img_cam(
+ cam,
+ bbox[:, :, 2] * 200.,
+ bbox[:, :, :2],
+ res[:, 0].unsqueeze(-1),
+ res[:, 1].unsqueeze(-1),
+ focal_length=cam_intrinsics[:, :, 0, 0]
+ )
+
+ full_joints2d = full_perspective_projection(
+ joints3d,
+ translation=full_cam,
+ cam_intrinsics=cam_intrinsics,
+ )
+ output.full_joints2d = full_joints2d
+ output.full_cam = full_cam.reshape(-1, 3)
+
+ return output
+
+ def forward_nd(self,
+ pred_rot6d,
+ root,
+ betas,
+ return_full_pose=False):
+
+ rotmat = transforms.rotation_6d_to_matrix(pred_rot6d.reshape(*pred_rot6d.shape[:2], -1, 6)
+ ).reshape(-1, 24, 3, 3)
+
+ output = self.get_output(body_pose=rotmat[:, 1:],
+ global_orient=root.reshape(-1, 1, 3, 3),
+ betas=betas.view(-1, 10),
+ pose2rot=False,
+ return_full_pose=return_full_pose)
+
+ return output
+
+ def get_output(self, *args, **kwargs):
+ kwargs['get_skin'] = True
+ smpl_output = super(SMPL, self).forward(*args, **kwargs)
+ joints = vertices2joints(self.J_regressor_wham, smpl_output.vertices)
+ feet = vertices2joints(self.J_regressor_feet, smpl_output.vertices)
+
+ offset = joints[..., [11, 12], :].mean(-2)
+ if 'transl' in kwargs:
+ offset = offset - kwargs['transl']
+ vertices = smpl_output.vertices - offset.unsqueeze(-2)
+ joints = joints - offset.unsqueeze(-2)
+ feet = feet - offset.unsqueeze(-2)
+
+ output = ModelOutput(vertices=vertices,
+ global_orient=smpl_output.global_orient,
+ body_pose=smpl_output.body_pose,
+ joints=joints,
+ betas=smpl_output.betas,
+ full_pose=smpl_output.full_pose)
+ output.feet = feet
+ output.offset = offset
+ return output
+
+ def get_offset(self, *args, **kwargs):
+ kwargs['get_skin'] = True
+ smpl_output = super(SMPL, self).forward(*args, **kwargs)
+ joints = vertices2joints(self.J_regressor_wham, smpl_output.vertices)
+
+ offset = joints[..., [11, 12], :].mean(-2)
+ return offset
+
+ def get_faces(self):
+ return np.array(self.faces)
+
+
+def convert_weak_perspective_to_perspective(
+ weak_perspective_camera,
+ focal_length=5000.,
+ img_res=224,
+):
+
+ perspective_camera = torch.stack(
+ [
+ weak_perspective_camera[..., 1],
+ weak_perspective_camera[..., 2],
+ 2 * focal_length / (img_res * weak_perspective_camera[..., 0] + 1e-9)
+ ],
+ dim=-1
+ )
+ return perspective_camera
+
+
+def weak_perspective_projection(
+ points,
+ rotation,
+ translation,
+ focal_length,
+ camera_center,
+ img_res=224,
+ normalize_joints2d=True,
+):
+ """
+ This function computes the perspective projection of a set of points.
+ Input:
+ points (b, f, N, 3): 3D points
+ rotation (b, f, 3, 3): Camera rotation
+ translation (b, f, 3): Camera translation
+ focal_length (b, f,) or scalar: Focal length
+ camera_center (b, f, 2): Camera center
+ """
+
+ K = torch.zeros([*points.shape[:2], 3, 3], device=points.device)
+ K[:,:,0,0] = focal_length
+ K[:,:,1,1] = focal_length
+ K[:,:,2,2] = 1.
+ K[:,:,:-1, -1] = camera_center
+
+ # Transform points
+ points = torch.einsum('bfij,bfkj->bfki', rotation, points)
+ points = points + translation.unsqueeze(-2)
+
+ # Apply perspective distortion
+ projected_points = points / points[...,-1].unsqueeze(-1)
+
+ # Apply camera intrinsics
+ projected_points = torch.einsum('bfij,bfkj->bfki', K, projected_points)
+
+ if normalize_joints2d:
+ projected_points = projected_points / (img_res / 2.)
+
+ return projected_points[..., :-1]
+
+
+def full_perspective_projection(
+ points,
+ cam_intrinsics,
+ rotation=None,
+ translation=None,
+):
+
+ K = cam_intrinsics
+
+ if rotation is not None:
+ points = (rotation @ points.transpose(-1, -2)).transpose(-1, -2)
+ if translation is not None:
+ points = points + translation.unsqueeze(-2)
+ projected_points = points / points[..., -1].unsqueeze(-1)
+ projected_points = (K @ projected_points.transpose(-1, -2)).transpose(-1, -2)
+ return projected_points[..., :-1]
+
+
+def convert_pare_to_full_img_cam(
+ pare_cam,
+ bbox_height,
+ bbox_center,
+ img_w,
+ img_h,
+ focal_length,
+ crop_res=224
+):
+
+ s, tx, ty = pare_cam[..., 0], pare_cam[..., 1], pare_cam[..., 2]
+ res = crop_res
+ r = bbox_height / res
+ tz = 2 * focal_length / (r * res * s)
+
+ cx = 2 * (bbox_center[..., 0] - (img_w / 2.)) / (s * bbox_height)
+ cy = 2 * (bbox_center[..., 1] - (img_h / 2.)) / (s * bbox_height)
+
+ cam_t = torch.stack([tx + cx, ty + cy, tz], dim=-1)
+ return cam_t
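+# Depth recovery sketch: with r * res = bbox_height, the weak-perspective scale s maps to a
+# full-image depth of tz = 2 * focal_length / (s * bbox_height).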
+
+
+def cam_crop2full(crop_cam, center, scale, full_img_shape, focal_length):
+ """
+ convert the camera parameters from the crop camera to the full camera
+ :param crop_cam: shape=(N, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
+ :param center: shape=(N, 2) bbox coordinates (c_x, c_y)
+ :param scale: shape=(N) square bbox resolution (b / 200)
+ :param full_img_shape: shape=(N, 2) original image height and width
+ :param focal_length: shape=(N,)
+ :return:
+ """
+ img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
+ cx, cy, b = center[:, 0], center[:, 1], scale * 200
+ w_2, h_2 = img_w / 2., img_h / 2.
+ bs = b * crop_cam[:, 0] + 1e-9
+ tz = 2 * focal_length / bs
+ tx = (2 * (cx - w_2) / bs) + crop_cam[:, 1]
+ ty = (2 * (cy - h_2) / bs) + crop_cam[:, 2]
+ full_cam = torch.stack([tx, ty, tz], dim=-1)
+ return full_cam
\ No newline at end of file
diff --git a/lib/models/smplify/__init__.py b/lib/models/smplify/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..58929cd8baf5311a542399815dedd924342c079e
--- /dev/null
+++ b/lib/models/smplify/__init__.py
@@ -0,0 +1 @@
+from .smplify import TemporalSMPLify
\ No newline at end of file
diff --git a/lib/models/smplify/__pycache__/__init__.cpython-39.pyc b/lib/models/smplify/__pycache__/__init__.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e0c0f394bb9c29d5cba7d7267e540a5d8c58591e
Binary files /dev/null and b/lib/models/smplify/__pycache__/__init__.cpython-39.pyc differ
diff --git a/lib/models/smplify/__pycache__/losses.cpython-39.pyc b/lib/models/smplify/__pycache__/losses.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..883e47f65196e5c4fe1960c725602410cd2adb5c
Binary files /dev/null and b/lib/models/smplify/__pycache__/losses.cpython-39.pyc differ
diff --git a/lib/models/smplify/__pycache__/smplify.cpython-39.pyc b/lib/models/smplify/__pycache__/smplify.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..34ad471d72e5e93d5a7bde1820434031b53a0522
Binary files /dev/null and b/lib/models/smplify/__pycache__/smplify.cpython-39.pyc differ
diff --git a/lib/models/smplify/losses.py b/lib/models/smplify/losses.py
new file mode 100644
index 0000000000000000000000000000000000000000..be42e2b1fbbc374109eeb5aab674cdf64999ceb5
--- /dev/null
+++ b/lib/models/smplify/losses.py
@@ -0,0 +1,87 @@
+import torch
+
+def gmof(x, sigma):
+ """
+ Geman-McClure error function
+ """
+ x_squared = x ** 2
+ sigma_squared = sigma ** 2
+ return (sigma_squared * x_squared) / (sigma_squared + x_squared)
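+# For |x| much smaller than sigma this behaves like x ** 2, while for large residuals it
+# saturates towards sigma ** 2, limiting the influence of outlier keypoints.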
+
+
+def compute_jitter(x):
+ """
+ Compute jitter for the input tensor
+ """
+ return torch.linalg.norm(x[:, 2:] + x[:, :-2] - 2 * x[:, 1:-1], dim=-1)
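+# Norm of the discrete second difference x[t+1] - 2 * x[t] + x[t-1], i.e. a per-frame
+# acceleration magnitude used as a temporal smoothness penalty.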
+
+
+class SMPLifyLoss(torch.nn.Module):
+ def __init__(self,
+ res,
+ cam_intrinsics,
+ init_pose,
+ device,
+ **kwargs
+ ):
+
+ super().__init__()
+
+ self.res = res
+ self.cam_intrinsics = cam_intrinsics
+ self.init_pose = torch.from_numpy(init_pose).float().to(device)
+
+ def forward(self, output, params, input_keypoints, bbox,
+ reprojection_weight=100., regularize_weight=60.0,
+ consistency_weight=10.0, sprior_weight=0.04,
+ smooth_weight=20.0, sigma=100):
+
+ pose, shape, cam = params
+ scale = bbox[..., 2:].unsqueeze(-1) * 200.
+
+ # Loss 1. Data term
+ pred_keypoints = output.full_joints2d[..., :17, :]
+ joints_conf = input_keypoints[..., -1:]
+ reprojection_error = gmof(pred_keypoints - input_keypoints[..., :-1], sigma)
+ reprojection_error = ((reprojection_error * joints_conf) / scale).mean()
+
+ # Loss 2. Regularization term
+ regularize_error = torch.linalg.norm(pose - self.init_pose, dim=-1).mean()
+
+ # Loss 3. Shape prior and consistency error
+ consistency_error = shape.std(dim=1).mean()
+ sprior_error = torch.linalg.norm(shape, dim=-1).mean()
+ shape_error = sprior_weight * sprior_error + consistency_weight * consistency_error
+
+ # Loss 4. Smooth loss
+ pose_diff = compute_jitter(pose).mean()
+ cam_diff = compute_jitter(cam).mean()
+ smooth_error = pose_diff + cam_diff
+
+ # Sum up losses
+ loss = {
+ 'reprojection': reprojection_weight * reprojection_error,
+ 'regularize': regularize_weight * regularize_error,
+ 'shape': shape_error,
+ 'smooth': smooth_weight * smooth_error
+ }
+
+ return loss
+
+ def create_closure(self,
+ optimizer,
+ smpl,
+ params,
+ bbox,
+ input_keypoints):
+
+ def closure():
+ optimizer.zero_grad()
+ output = smpl(*params, cam_intrinsics=self.cam_intrinsics, bbox=bbox, res=self.res)
+
+ loss_dict = self.forward(output, params, input_keypoints, bbox)
+ loss = sum(loss_dict.values())
+ loss.backward()
+ return loss
+
+ return closure
\ No newline at end of file
diff --git a/lib/models/smplify/smplify.py b/lib/models/smplify/smplify.py
new file mode 100644
index 0000000000000000000000000000000000000000..4ffdfac904bce5187a5b5ad69f7d9f973c02eb49
--- /dev/null
+++ b/lib/models/smplify/smplify.py
@@ -0,0 +1,83 @@
+import os
+import torch
+from tqdm import tqdm
+
+from lib.models import build_body_model
+from .losses import SMPLifyLoss
+
+class TemporalSMPLify():
+
+ def __init__(self,
+ smpl=None,
+ lr=1e-2,
+ num_iters=5,
+ num_steps=10,
+ img_w=None,
+ img_h=None,
+ device=None
+ ):
+
+ self.smpl = smpl
+ self.lr = lr
+ self.num_iters = num_iters
+ self.num_steps = num_steps
+ self.img_w = img_w
+ self.img_h = img_h
+ self.device = device
+
+ def fit(self, init_pred, keypoints, bbox, **kwargs):
+
+ def to_params(param):
+ return torch.from_numpy(param).float().to(self.device).requires_grad_(True)
+
+ pose = init_pred['pose'].detach().cpu().numpy()
+ betas = init_pred['betas'].detach().cpu().numpy()
+ cam = init_pred['cam'].detach().cpu().numpy()
+ keypoints = torch.from_numpy(keypoints).float().unsqueeze(0).to(self.device)
+
+ BN = pose.shape[1]
+ lr = self.lr
+
+ # Stage 1. Optimize translation
+ params = [to_params(pose), to_params(betas), to_params(cam)]
+ optim_params = [params[2]]
+
+ optimizer = torch.optim.LBFGS(
+ optim_params,
+ lr=lr,
+ max_iter=self.num_iters,
+ line_search_fn='strong_wolfe')
+
+ loss_fn = SMPLifyLoss(init_pose=pose, device=self.device, **kwargs)
+
+ closure = loss_fn.create_closure(optimizer,
+ self.smpl,
+ params,
+ bbox,
+ keypoints)
+
+ for j in (j_bar := tqdm(range(self.num_steps), leave=False)):
+ optimizer.zero_grad()
+ loss = optimizer.step(closure)
+ msg = f'Loss: {loss.item():.1f}'
+ j_bar.set_postfix_str(msg)
+
+
+ # Stage 2. Optimize all params
+ optimizer = torch.optim.LBFGS(
+ params,
+ lr=lr * BN,
+ max_iter=self.num_iters,
+ line_search_fn='strong_wolfe')
+
+ for j in (j_bar := tqdm(range(self.num_steps), leave=False)):
+ optimizer.zero_grad()
+ loss = optimizer.step(closure)
+ msg = f'Loss: {loss.item():.1f}'
+ j_bar.set_postfix_str(msg)
+
+ init_pred['pose'] = params[0].detach()
+ init_pred['betas'] = params[1].detach()
+ init_pred['cam'] = params[2].detach()
+
+ return init_pred
\ No newline at end of file
diff --git a/lib/models/wham.py b/lib/models/wham.py
new file mode 100644
index 0000000000000000000000000000000000000000..0a5dbb605c6080d3b6ae8e3ec3fb651e84f6b5e6
--- /dev/null
+++ b/lib/models/wham.py
@@ -0,0 +1,210 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+from torch import nn
+import numpy as np
+
+from configs import constants as _C
+from lib.models.layers import (MotionEncoder, MotionDecoder, TrajectoryDecoder, TrajectoryRefiner, Integrator,
+ rollout_global_motion, reset_root_velocity, compute_camera_motion)
+from lib.utils.transforms import axis_angle_to_matrix
+
+
+class Network(nn.Module):
+ def __init__(self,
+ smpl,
+ pose_dr=0.1,
+ d_embed=512,
+ n_layers=3,
+ d_feat=2048,
+ rnn_type='LSTM',
+ **kwargs
+ ):
+ super().__init__()
+
+ n_joints = _C.KEYPOINTS.NUM_JOINTS
+ self.smpl = smpl
+ in_dim = n_joints * 2 + 3
+ d_context = d_embed + n_joints * 3
+
+ self.mask_embedding = nn.Parameter(torch.zeros(1, 1, n_joints, 2))
+
+ # Module 1. Motion Encoder
+ self.motion_encoder = MotionEncoder(in_dim=in_dim,
+ d_embed=d_embed,
+ pose_dr=pose_dr,
+ rnn_type=rnn_type,
+ n_layers=n_layers,
+ n_joints=n_joints)
+
+ self.trajectory_decoder = TrajectoryDecoder(d_embed=d_context,
+ rnn_type=rnn_type,
+ n_layers=n_layers)
+
+ # Module 3. Feature Integrator
+ self.integrator = Integrator(in_channel=d_feat + d_context,
+ out_channel=d_context)
+
+ # Module 4. Motion Decoder
+ self.motion_decoder = MotionDecoder(d_embed=d_context,
+ rnn_type=rnn_type,
+ n_layers=n_layers)
+
+ # Module 5. Trajectory Refiner
+ self.trajectory_refiner = TrajectoryRefiner(d_embed=d_context,
+ d_hidden=d_embed,
+ rnn_type=rnn_type,
+ n_layers=2)
+
+ def compute_global_feet(self, root_world, trans):
+ # Compute world-coordinate motion
+ cam_R, cam_T = compute_camera_motion(self.output, self.pred_pose[:, :, :6], root_world, trans, self.pred_cam)
+ feet_cam = self.output.feet.reshape(self.b, self.f, -1, 3) + self.output.full_cam.reshape(self.b, self.f, 1, 3)
+ feet_world = (cam_R.mT @ (feet_cam - cam_T.unsqueeze(-2)).mT).mT
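+ # Undo the per-frame camera extrinsics: R_cam^T @ (feet_cam - t_cam) expresses the
+ # camera-frame foot positions in world coordinates.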
+
+ return feet_world, cam_R
+
+ def forward_smpl(self, **kwargs):
+ self.output = self.smpl(self.pred_pose,
+ self.pred_shape,
+ cam=self.pred_cam,
+ return_full_pose=not self.training,
+ **kwargs,
+ )
+
+ from loguru import logger
+ logger.info(f"Output Joints: {self.output.joints}")
+ logger.info(f"Output Vertices: {self.output.vertices}")
+
+ # Save joints and vertices as .npy arrays
+
+ np.save('joints.npy', self.output.joints.detach().cpu().numpy())
+ np.save('vertices.npy', self.output.vertices.detach().cpu().numpy())
+
+ # Feet location in global coordinate
+ root_world, trans = rollout_global_motion(self.pred_root, self.pred_vel)
+ feet_world, cam_R = self.compute_global_feet(root_world, trans)
+
+ # Return output
+ output = {'feet': feet_world,
+ 'contact': self.pred_contact,
+ 'pose': self.pred_pose,
+ 'betas': self.pred_shape,
+ 'cam': self.pred_cam,
+ 'poses_root_cam': self.output.global_orient,
+ 'poses_root_r6d': self.pred_root,
+ 'vel_root': self.pred_vel,
+ 'pose_root': self.pred_root,
+ 'verts_cam': self.output.vertices}
+
+ if self.training:
+ output.update({
+ 'kp3d': self.output.joints,
+ 'kp3d_nn': self.pred_kp3d,
+ 'full_kp2d': self.output.full_joints2d,
+ 'weak_kp2d': self.output.weak_joints2d,
+ 'R': cam_R,
+ })
+ else:
+ output.update({
+ 'poses_root_r6d': self.pred_root,
+ 'trans_cam': self.output.full_cam,
+ 'poses_body': self.output.body_pose})
+
+ return output
+
+
+ def preprocess(self, x, mask):
+ self.b, self.f = x.shape[:2]
+
+ # Treat masked keypoints
+ mask_embedding = mask.unsqueeze(-1) * self.mask_embedding
+ _mask = mask.unsqueeze(-1).repeat(1, 1, 1, 2).reshape(self.b, self.f, -1)
+ _mask = torch.cat((_mask, torch.zeros_like(_mask[..., :3])), dim=-1)
+ _mask_embedding = mask_embedding.reshape(self.b, self.f, -1)
+ _mask_embedding = torch.cat((_mask_embedding, torch.zeros_like(_mask_embedding[..., :3])), dim=-1)
+ x[_mask] = 0.0
+ x = x + _mask_embedding
+ return x
+
+
+ def rollout(self, output, pred_root, pred_vel, return_y_up):
+ root_world, trans_world = rollout_global_motion(pred_root, pred_vel)
+
+ if return_y_up:
+ yup2ydown = axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float().to(root_world.device)
+ root_world = yup2ydown.mT @ root_world
+ trans_world = (yup2ydown.mT @ trans_world.unsqueeze(-1)).squeeze(-1)
+
+ output.update({
+ 'poses_root_world': root_world,
+ 'trans_world': trans_world,
+ })
+
+ return output
+
+
+ def refine_trajectory(self, output, cam_angvel, return_y_up, **kwargs):
+
+ # --------- Refine trajectory --------- #
+ update_vel = reset_root_velocity(self.smpl, self.output, self.pred_contact, self.pred_root, self.pred_vel, thr=0.5)
+ output = self.trajectory_refiner(self.old_motion_context, update_vel, output, cam_angvel, return_y_up=return_y_up)
+ # --------- #
+
+ # Do rollout
+ output = self.rollout(output, output['poses_root_r6d_refined'], output['vel_root_refined'], return_y_up)
+
+ # --------- Compute refined feet --------- #
+ if self.training:
+ feet_world, cam_R = self.compute_global_feet(output['poses_root_world'], output['trans_world'])
+ output.update({'feet_refined': feet_world})
+
+ return output
+
+
+ def forward(self, x, inits, img_features=None, mask=None, init_root=None, cam_angvel=None,
+ cam_intrinsics=None, bbox=None, res=None, return_y_up=False, refine_traj=True, **kwargs):
+
+ x = self.preprocess(x, mask)
+ init_kp, init_smpl = inits
+
+ # --------- Inference --------- #
+ # Stage 1. Encode motion
+ pred_kp3d, motion_context = self.motion_encoder(x, init_kp)
+ self.old_motion_context = motion_context.detach().clone()
+
+ # Stage 2. Decode global trajectory
+ pred_root, pred_vel = self.trajectory_decoder(motion_context, init_root, cam_angvel)
+
+ # Stage 3. Integrate features
+ if img_features is not None and self.integrator is not None:
+ motion_context = self.integrator(motion_context, img_features)
+
+ # Stage 4. Decode SMPL motion
+ pred_pose, pred_shape, pred_cam, pred_contact = self.motion_decoder(motion_context, init_smpl)
+ # --------- #
+
+ # --------- Register predictions --------- #
+ self.pred_kp3d = pred_kp3d
+ self.pred_root = pred_root
+ self.pred_vel = pred_vel
+ self.pred_pose = pred_pose
+ self.pred_shape = pred_shape
+ self.pred_cam = pred_cam
+ self.pred_contact = pred_contact
+ # --------- #
+
+ # --------- Build SMPL --------- #
+ output = self.forward_smpl(cam_intrinsics=cam_intrinsics, bbox=bbox, res=res)
+ # --------- #
+
+ # --------- Refine trajectory --------- #
+ if refine_traj:
+ output = self.refine_trajectory(output, cam_angvel, return_y_up)
+ else:
+ output = self.rollout(output, self.pred_root, self.pred_vel, return_y_up)
+ # --------- #
+
+ return output
\ No newline at end of file
diff --git a/lib/utils/__pycache__/data_utils.cpython-39.pyc b/lib/utils/__pycache__/data_utils.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..5ed06882a0c504e0570e95131ea640028f364595
Binary files /dev/null and b/lib/utils/__pycache__/data_utils.cpython-39.pyc differ
diff --git a/lib/utils/__pycache__/imutils.cpython-39.pyc b/lib/utils/__pycache__/imutils.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..8af34e7acf1e79c9e7b19114f443bc52925624a3
Binary files /dev/null and b/lib/utils/__pycache__/imutils.cpython-39.pyc differ
diff --git a/lib/utils/__pycache__/kp_utils.cpython-39.pyc b/lib/utils/__pycache__/kp_utils.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..9922411db11f03716c92d05f0d24d50b25024408
Binary files /dev/null and b/lib/utils/__pycache__/kp_utils.cpython-39.pyc differ
diff --git a/lib/utils/__pycache__/transforms.cpython-39.pyc b/lib/utils/__pycache__/transforms.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..7b48d3647a8ea3b9c860ee99c0bb4f6abf2ce0d1
Binary files /dev/null and b/lib/utils/__pycache__/transforms.cpython-39.pyc differ
diff --git a/lib/utils/data_utils.py b/lib/utils/data_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..b53a19cb1f5486e268f94d9a1ab98a7b7b1eb1df
--- /dev/null
+++ b/lib/utils/data_utils.py
@@ -0,0 +1,113 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import numpy as np
+
+from lib.utils import transforms
+
+
+def make_collate_fn():
+ def collate_fn(items):
+ items = list(filter(lambda x: x is not None, items))
+ batch = dict()
+ try: batch['vid'] = [item['vid'] for item in items]
+ except: pass
+ try: batch['gender'] = [item['gender'] for item in items]
+ except: pass
+ for key in items[0].keys():
+ try: batch[key] = torch.stack([item[key] for item in items])
+ except: pass
+ return batch
+
+ return collate_fn
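+
+# Illustrative usage (sketch; assumes a dataset yielding dicts of equally-shaped tensors
+# plus optional 'vid' / 'gender' string entries — names here are hypothetical):
+#   >>> loader = torch.utils.data.DataLoader(dataset, batch_size=8,
+#   ...                                      collate_fn=make_collate_fn())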
+
+
+def prepare_keypoints_data(target):
+ """Prepare keypoints data"""
+
+ # Prepare 2D keypoints
+ target['init_kp2d'] = target['kp2d'][:1]
+ target['kp2d'] = target['kp2d'][1:]
+ if 'kp3d' in target:
+ target['kp3d'] = target['kp3d'][1:]
+
+ return target
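+
+# Sketch (illustrative shapes): the first frame is split off as the initial keypoint
+# state and the remaining frames become the sequence to be modelled:
+#   >>> target = {'kp2d': torch.randn(81, 17, 3)}
+#   >>> target = prepare_keypoints_data(target)   # init_kp2d: (1, 17, 3), kp2d: (80, 17, 3)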
+
+
+def prepare_smpl_data(target):
+ if 'pose' in target.keys():
+ # Use only the main joints
+ pose = target['pose'][:]
+ # 6-D Rotation representation
+ pose6d = transforms.matrix_to_rotation_6d(pose)
+ target['pose'] = pose6d[1:]
+
+ if 'betas' in target.keys():
+ target['betas'] = target['betas'][1:]
+
+ # Translation and shape parameters
+ if 'transl' in target.keys():
+ target['cam'] = target['transl'][1:]
+
+ # Initial pose and translation
+ target['init_pose'] = transforms.matrix_to_rotation_6d(target['init_pose'])
+
+ return target
+
+
+def append_target(target, label, key_list, idx1, idx2=None, pad=True):
+ for key in key_list:
+ if idx2 is None: data = label[key][idx1]
+ else: data = label[key][idx1:idx2+1]
+ if not pad: data = data[2:]
+ target[key] = data
+
+ return target
+
+
+def map_dmpl_to_smpl(pose):
+ """ Map AMASS DMPL pose representation to SMPL pose representation
+
+ Args:
+ pose - tensor / array with shape of (n_frames, 156)
+
+ Return:
+ pose - tensor / array with shape of (n_frames, 24, 3)
+ """
+
+ pose = pose.reshape(pose.shape[0], -1, 3)
+ pose[:, 23] = pose[:, 37] # right hand
+ if isinstance(pose, np.ndarray): pose = pose[:, :24].copy()
+ else: pose = pose[:, :24].clone()
+ return pose
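+
+# Example (illustrative): AMASS stores 52-joint poses (156 values per frame); this keeps
+# the first 24 SMPL joints after copying joint 37 into SMPL's right-hand slot (23):
+#   >>> pose = np.zeros((100, 156))
+#   >>> map_dmpl_to_smpl(pose).shape
+#   (100, 24, 3)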
+
+
+def transform_global_coordinate(pose, T, transl=None):
+ """ Transform global coordinate of dataset with respect to the given matrix.
+ Different datasets use different global coordinate systems,
+ so we unify all of them into a canonical coordinate system.
+
+ Args:
+ pose - SMPL pose; tensor / array
+ T - Transformation matrix
+ transl - SMPL translation
+ """
+
+ return_to_numpy = False
+ if isinstance(pose, np.ndarray):
+ return_to_numpy = True
+ pose = torch.from_numpy(pose).float()
+ if transl is not None: transl = torch.from_numpy(transl).float()
+
+ pose = transforms.axis_angle_to_matrix(pose)
+ pose[:, 0] = T @ pose[:, 0]
+ pose = transforms.matrix_to_axis_angle(pose)
+ if transl is not None:
+ transl = (T @ transl.T).squeeze().T
+
+ if return_to_numpy:
+ pose = pose.detach().numpy()
+ if transl is not None: transl = transl.detach().numpy()
+ return pose, transl
\ No newline at end of file
diff --git a/lib/utils/imutils.py b/lib/utils/imutils.py
new file mode 100644
index 0000000000000000000000000000000000000000..fc8529ff58a6074d0ee46b0f55d15e601c206b2f
--- /dev/null
+++ b/lib/utils/imutils.py
@@ -0,0 +1,363 @@
+import cv2
+import torch
+import random
+import numpy as np
+from . import transforms
+
+def do_augmentation(scale_factor=0.2, trans_factor=0.1):
+ scale = random.uniform(1.2 - scale_factor, 1.2 + scale_factor)
+ trans_x = random.uniform(-trans_factor, trans_factor)
+ trans_y = random.uniform(-trans_factor, trans_factor)
+
+ return scale, trans_x, trans_y
+
+def get_transform(center, scale, res, rot=0):
+ """Generate transformation matrix."""
+ # res: (height, width), (rows, cols)
+ crop_aspect_ratio = res[0] / float(res[1])
+ h = 200 * scale
+ w = h / crop_aspect_ratio
+ t = np.zeros((3, 3))
+ t[0, 0] = float(res[1]) / w
+ t[1, 1] = float(res[0]) / h
+ t[0, 2] = res[1] * (-float(center[0]) / w + .5)
+ t[1, 2] = res[0] * (-float(center[1]) / h + .5)
+ t[2, 2] = 1
+ if not rot == 0:
+ rot = -rot # To match direction of rotation from cropping
+ rot_mat = np.zeros((3, 3))
+ rot_rad = rot * np.pi / 180
+ sn, cs = np.sin(rot_rad), np.cos(rot_rad)
+ rot_mat[0, :2] = [cs, -sn]
+ rot_mat[1, :2] = [sn, cs]
+ rot_mat[2, 2] = 1
+ # Need to rotate around center
+ t_mat = np.eye(3)
+ t_mat[0, 2] = -res[1] / 2
+ t_mat[1, 2] = -res[0] / 2
+ t_inv = t_mat.copy()
+ t_inv[:2, 2] *= -1
+ t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
+ return t
+
+
+def transform(pt, center, scale, res, invert=0, rot=0):
+ """Transform pixel location to different reference."""
+ t = get_transform(center, scale, res, rot=rot)
+ if invert:
+ t = np.linalg.inv(t)
+ new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
+ new_pt = np.dot(t, new_pt)
+ return np.array([round(new_pt[0]), round(new_pt[1])], dtype=int) + 1
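+
+# Illustrative example (assumed values): a bbox centred at (320, 240) with scale 1.0
+# (a 200 px square box) mapped into a 256 x 256 crop lands at roughly the crop centre:
+#   >>> transform([320, 240], center=[320, 240], scale=1.0, res=(256, 256))
+#   -> approximately [128, 128]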
+
+
+def crop_cliff(img, center, scale, res):
+ """
+ Crop image according to the supplied bounding box.
+ res: [rows, cols]
+ """
+ # Upper left point
+ ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
+ # Bottom right point
+ br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
+
+ # Padding so that when rotated proper amount of context is included
+ pad = int(np.linalg.norm(br - ul) / 2 - float(br[1] - ul[1]) / 2)
+
+ new_shape = [br[1] - ul[1], br[0] - ul[0]]
+ if len(img.shape) > 2:
+ new_shape += [img.shape[2]]
+ new_img = np.zeros(new_shape, dtype=np.float32)
+
+ # Range to fill new array
+ new_x = max(0, -ul[0]), min(br[0], len(img[0])) - ul[0]
+ new_y = max(0, -ul[1]), min(br[1], len(img)) - ul[1]
+ # Range to sample from original image
+ old_x = max(0, ul[0]), min(len(img[0]), br[0])
+ old_y = max(0, ul[1]), min(len(img), br[1])
+
+ try:
+ new_img[new_y[0]:new_y[1], new_x[0]:new_x[1]] = img[old_y[0]:old_y[1], old_x[0]:old_x[1]]
+ except Exception as e:
+ print(e)
+
+ new_img = cv2.resize(new_img, (res[1], res[0])) # (cols, rows)
+
+ return new_img, ul, br
+
+
+def obtain_bbox(center, scale, res, org_res):
+ # Upper left point
+ ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
+ # Bottom right point
+ br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
+
+ # Padding so that when rotated proper amount of context is included
+ pad = int(np.linalg.norm(br - ul) / 2 - float(br[1] - ul[1]) / 2)
+
+ # Range to sample from original image
+ old_x = max(0, ul[0]), min(org_res[0], br[0])
+ old_y = max(0, ul[1]), min(org_res[1], br[1])
+
+ return old_x, old_y
+
+
+def cam_crop2full(crop_cam, bbox, full_img_shape, focal_length=None):
+ """
+ convert the camera parameters from the crop camera to the full camera
+ :param crop_cam: shape=(N, T, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
+ :param bbox: shape=(N, T, 3) bounding boxes given as (c_x, c_y, b / 200)
+ :param full_img_shape: shape=(N, 2) original image height and width
+ :param focal_length: shape=(N,); if None, estimated from the image diagonal
+ :return:
+ """
+
+ cx = bbox[..., 0].clone(); cy = bbox[..., 1].clone(); b = bbox[..., 2].clone() * 200
+ img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
+ w_2, h_2 = img_w / 2., img_h / 2.
+ bs = b * crop_cam[:, :, 0] + 1e-9
+
+ if focal_length is None:
+ focal_length = (img_w * img_w + img_h * img_h) ** 0.5
+
+ tz = 2 * focal_length.unsqueeze(-1) / bs
+ tx = (2 * (cx - w_2.unsqueeze(-1)) / bs) + crop_cam[:, :, 1]
+ ty = (2 * (cy - h_2.unsqueeze(-1)) / bs) + crop_cam[:, :, 2]
+ full_cam = torch.stack([tx, ty, tz], dim=-1)
+ return full_cam
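+
+# Shape sketch (illustrative values, not from the original code):
+#   >>> crop_cam = torch.tensor([[[0.9, 0.02, -0.05]]])   # (N=1, T=1, 3): weak-persp (s, tx, ty)
+#   >>> bbox = torch.tensor([[[540., 360., 1.5]]])        # (1, 1, 3): c_x, c_y, b / 200
+#   >>> full_shape = torch.tensor([[720., 1280.]])        # (1, 2): img_h, img_w
+#   >>> cam_crop2full(crop_cam, bbox, full_shape).shape
+#   torch.Size([1, 1, 3])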
+
+
+def cam_pred2full(crop_cam, center, scale, full_img_shape, focal_length=2000.,):
+ """
+ Reference CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation
+
+ convert the camera parameters from the crop camera to the full camera
+ :param crop_cam: shape=(N, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
+ :param center: shape=(N, 2) bbox coordinates (c_x, c_y)
+ :param scale: shape=(N, ) square bbox resolution (b / 200)
+ :param full_img_shape: shape=(N, 2) original image height and width
+ :param focal_length: shape=(N,)
+ :return:
+ """
+
+ # img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
+ img_w, img_h = full_img_shape[:, 0], full_img_shape[:, 1]
+ cx, cy, b = center[:, 0], center[:, 1], scale * 200
+ w_2, h_2 = img_w / 2., img_h / 2.
+ bs = b * crop_cam[:, 0] + 1e-9
+ tz = 2 * focal_length / bs
+ tx = (2 * (cx - w_2) / bs) + crop_cam[:, 1]
+ ty = (2 * (cy - h_2) / bs) + crop_cam[:, 2]
+ full_cam = torch.stack([tx, ty, tz], dim=-1)
+ return full_cam
+
+
+def cam_full2pred(full_cam, center, scale, full_img_shape, focal_length=2000.):
+ # img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
+ img_w, img_h = full_img_shape[:, 0], full_img_shape[:, 1]
+ cx, cy, b = center[:, 0], center[:, 1], scale * 200
+ w_2, h_2 = img_w / 2., img_h / 2.
+
+ bs = (2 * focal_length / full_cam[:, 2])
+ _s = bs / b
+ _tx = full_cam[:, 0] - (2 * (cx - w_2) / bs)
+ _ty = full_cam[:, 1] - (2 * (cy - h_2) / bs)
+ crop_cam = torch.stack([_s, _tx, _ty], dim=-1)
+ return crop_cam
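+
+# cam_pred2full / cam_full2pred are inverses of each other; a round trip with assumed
+# values recovers the original crop camera (illustrative sketch):
+#   >>> crop_cam = torch.tensor([[0.9, 0.02, -0.05]])
+#   >>> center, scale = torch.tensor([[540., 360.]]), torch.tensor([1.5])
+#   >>> shape = torch.tensor([[1280., 720.]])   # note: unpacked as (img_w, img_h) above
+#   >>> full = cam_pred2full(crop_cam, center, scale, shape)
+#   >>> cam_full2pred(full, center, scale, shape)   # ~= crop_cam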
+
+
+def obtain_camera_intrinsics(image_shape, focal_length):
+ res_w = image_shape[..., 0].clone()
+ res_h = image_shape[..., 1].clone()
+ K = torch.eye(3).unsqueeze(0).expand(focal_length.shape[0], -1, -1).to(focal_length.device)
+ K[..., 0, 0] = focal_length.clone()
+ K[..., 1, 1] = focal_length.clone()
+ K[..., 0, 2] = res_w / 2
+ K[..., 1, 2] = res_h / 2
+
+ return K.unsqueeze(1)
+
+
+def trans_point2d(pt_2d, trans):
+ src_pt = np.array([pt_2d[0], pt_2d[1], 1.]).T
+ dst_pt = np.dot(trans, src_pt)
+ return dst_pt[0:2]
+
+def rotate_2d(pt_2d, rot_rad):
+ x = pt_2d[0]
+ y = pt_2d[1]
+ sn, cs = np.sin(rot_rad), np.cos(rot_rad)
+ xx = x * cs - y * sn
+ yy = x * sn + y * cs
+ return np.array([xx, yy], dtype=np.float32)
+
+def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False):
+ # augment size with scale
+ src_w = src_width * scale
+ src_h = src_height * scale
+ src_center = np.zeros(2)
+ src_center[0] = c_x
+ src_center[1] = c_y # np.array([c_x, c_y], dtype=np.float32)
+ # augment rotation
+ rot_rad = np.pi * rot / 180
+ src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
+ src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)
+
+ dst_w = dst_width
+ dst_h = dst_height
+ dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
+ dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
+ dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)
+
+ src = np.zeros((3, 2), dtype=np.float32)
+ src[0, :] = src_center
+ src[1, :] = src_center + src_downdir
+ src[2, :] = src_center + src_rightdir
+
+ dst = np.zeros((3, 2), dtype=np.float32)
+ dst[0, :] = dst_center
+ dst[1, :] = dst_center + dst_downdir
+ dst[2, :] = dst_center + dst_rightdir
+
+ if inv:
+ trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
+ else:
+ trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
+
+ return trans
+
+def transform_keypoints(kp_2d, bbox, patch_width, patch_height):
+
+ center_x, center_y, scale = bbox[:3]
+ width = height = scale * 200
+ # scale, rot = 1.2, 0
+ scale, rot = 1.0, 0
+
+ # generate transformation
+ trans = gen_trans_from_patch_cv(
+ center_x,
+ center_y,
+ width,
+ height,
+ patch_width,
+ patch_height,
+ scale,
+ rot,
+ inv=False,
+ )
+
+ for n_jt in range(kp_2d.shape[0]):
+ kp_2d[n_jt] = trans_point2d(kp_2d[n_jt], trans)
+
+ return kp_2d, trans
+
+
+def transform(pt, center, scale, res, invert=0, rot=0):
+ """Transform pixel location to different reference."""
+ t = get_transform(center, scale, res, rot=rot)
+ if invert:
+ t = np.linalg.inv(t)
+ new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
+ new_pt = np.dot(t, new_pt)
+ return new_pt[:2].astype(int) + 1
+
+
+def compute_cam_intrinsics(res):
+ img_w, img_h = res
+ focal_length = (img_w * img_w + img_h * img_h) ** 0.5
+ cam_intrinsics = torch.eye(3).repeat(1, 1, 1).float()
+ cam_intrinsics[:, 0, 0] = focal_length
+ cam_intrinsics[:, 1, 1] = focal_length
+ cam_intrinsics[:, 0, 2] = img_w/2.
+ cam_intrinsics[:, 1, 2] = img_h/2.
+ return cam_intrinsics
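+
+# Example (illustrative): a 1280 x 720 image gives a diagonal focal length of roughly
+# 1469 px and a principal point at the image centre (640, 360):
+#   >>> K = compute_cam_intrinsics((1280, 720))   # -> shape (1, 3, 3)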
+
+
+def flip_kp(kp, img_w=None):
+ """Flip keypoints."""
+
+ flipped_parts = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
+ kp = kp[..., flipped_parts, :]
+
+ if img_w is not None:
+ # Assume 2D keypoints
+ kp[...,0] = img_w - kp[...,0]
+ return kp
+
+
+def flip_bbox(bbox, img_w, img_h):
+ center = bbox[..., :2]
+ scale = bbox[..., -1:]
+
+ WH = np.ones_like(center)
+ WH[..., 0] *= img_w
+ WH[..., 1] *= img_h
+
+ center = center - WH/2
+ center[...,0] = - center[...,0]
+ center = center + WH/2
+
+ flipped_bbox = np.concatenate((center, scale), axis=-1)
+ return flipped_bbox
+
+
+def flip_pose(rotation, representation='rotation_6d'):
+ """Flip pose.
+ The flipping is based on SMPL parameters.
+ """
+
+ BN = rotation.shape[0]
+
+ if representation == 'axis_angle':
+ pose = rotation.reshape(BN, -1).transpose(0, 1)
+ elif representation == 'matrix':
+ pose = transforms.matrix_to_axis_angle(rotation).reshape(BN, -1).transpose(0, 1)
+ elif representation == 'rotation_6d':
+ pose = transforms.matrix_to_axis_angle(
+ transforms.rotation_6d_to_matrix(rotation)
+ ).reshape(BN, -1).transpose(0, 1)
+ else:
+ raise ValueError(f"Unknown representation: {representation}")
+
+ SMPL_JOINTS_FLIP_PERM = [0, 2, 1, 3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 14, 13, 15, 17, 16, 19, 18, 21, 20, 23, 22]
+ SMPL_POSE_FLIP_PERM = []
+ for i in SMPL_JOINTS_FLIP_PERM:
+ SMPL_POSE_FLIP_PERM.append(3*i)
+ SMPL_POSE_FLIP_PERM.append(3*i+1)
+ SMPL_POSE_FLIP_PERM.append(3*i+2)
+
+ pose = pose[SMPL_POSE_FLIP_PERM]
+
+ # we also negate the second and the third dimension of the axis-angle
+ pose[1::3] = -pose[1::3]
+ pose[2::3] = -pose[2::3]
+ pose = pose.transpose(0, 1).reshape(BN, -1, 3)
+
+ if representation == 'axis_angle':
+ return pose
+ elif representation == 'matrix':
+ return transforms.axis_angle_to_matrix(pose)
+ else:
+ return transforms.matrix_to_rotation_6d(
+ transforms.axis_angle_to_matrix(pose)
+ )
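+
+# Illustrative sketch (assumed shapes): flipping a batch of SMPL poses given in the
+# 6-D rotation representation swaps left/right joints and mirrors each rotation:
+#   >>> pose6d = torch.randn(2, 24, 6)
+#   >>> flip_pose(pose6d).shape
+#   torch.Size([2, 24, 6])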
+
+def avg_preds(rotation, shape, flipped_rotation, flipped_shape, representation='rotation_6d'):
+ # Rotation
+ flipped_rotation = flip_pose(flipped_rotation, representation=representation)
+
+ if representation != 'matrix':
+ flipped_rotation = eval(f'transforms.{representation}_to_matrix')(flipped_rotation)
+ rotation = eval(f'transforms.{representation}_to_matrix')(rotation)
+
+ avg_rotation = torch.stack([rotation, flipped_rotation])
+ avg_rotation = transforms.avg_rot(avg_rotation)
+
+ if representation != 'matrix':
+ avg_rotation = eval(f'transforms.matrix_to_{representation}')(avg_rotation)
+
+ # Shape
+ avg_shape = (shape + flipped_shape) / 2.0
+
+ return avg_rotation, avg_shape
\ No newline at end of file
diff --git a/lib/utils/kp_utils.py b/lib/utils/kp_utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..b23f4d012b65194363dcb98c554f722b52222251
--- /dev/null
+++ b/lib/utils/kp_utils.py
@@ -0,0 +1,761 @@
+from __future__ import absolute_import
+from __future__ import print_function
+from __future__ import division
+
+import torch
+import numpy as np
+
+from configs import constants as _C
+
+def root_centering(X, joint_type='coco'):
+ """Center the root joint to the pelvis."""
+ if joint_type != 'common' and X.shape[-2] == 14: return X
+
+ conf = None
+ if X.shape[-1] == 4:
+ conf = X[..., -1:]
+ X = X[..., :-1]
+
+ if X.shape[-2] == 31:
+ X[..., :17, :] = X[..., :17, :] - X[..., [12, 11], :].mean(-2, keepdims=True)
+ X[..., 17:, :] = X[..., 17:, :] - X[..., [19, 20], :].mean(-2, keepdims=True)
+
+ elif joint_type == 'coco':
+ X = X - X[..., [12, 11], :].mean(-2, keepdims=True)
+
+ elif joint_type == 'common':
+ X = X - X[..., [2, 3], :].mean(-2, keepdims=True)
+
+ if conf is not None:
+ X = torch.cat((X, conf), dim=-1)
+
+ return X
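+
+# Example (illustrative): COCO-format 3-D keypoints are centred on the mid-hip point:
+#   >>> kp3d = torch.randn(16, 17, 3)
+#   >>> centred = root_centering(kp3d.clone())   # pelvis (mean of hips 11, 12) at origin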
+
+
+def convert_kps(joints2d, src, dst):
+ src_names = eval(f'get_{src}_joint_names')()
+ dst_names = eval(f'get_{dst}_joint_names')()
+
+ if isinstance(joints2d, np.ndarray):
+ out_joints2d = np.zeros((*joints2d.shape[:-2], len(dst_names), joints2d.shape[-1]))
+ else:
+ out_joints2d = torch.zeros((*joints2d.shape[:-2], len(dst_names), joints2d.shape[-1]), device=joints2d.device)
+
+ for idx, jn in enumerate(dst_names):
+ if jn in src_names:
+ out_joints2d[..., idx, :] = joints2d[..., src_names.index(jn), :]
+
+ return out_joints2d
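+
+# Example (illustrative): mapping COCO keypoints (17 joints) to the 'common' 14-joint
+# format; joints missing from the source (e.g. neck, headtop) are left as zeros:
+#   >>> kp_coco = np.random.rand(120, 17, 3)
+#   >>> convert_kps(kp_coco, 'coco', 'common').shape
+#   (120, 14, 3)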
+
+def get_perm_idxs(src, dst):
+ src_names = eval(f'get_{src}_joint_names')()
+ dst_names = eval(f'get_{dst}_joint_names')()
+ idxs = [src_names.index(h) for h in dst_names if h in src_names]
+ return idxs
+
+def get_mpii3d_test_joint_names():
+ return [
+ 'headtop', # 'head_top',
+ 'neck',
+ 'rshoulder',# 'right_shoulder',
+ 'relbow',# 'right_elbow',
+ 'rwrist',# 'right_wrist',
+ 'lshoulder',# 'left_shoulder',
+ 'lelbow', # 'left_elbow',
+ 'lwrist', # 'left_wrist',
+ 'rhip', # 'right_hip',
+ 'rknee', # 'right_knee',
+ 'rankle',# 'right_ankle',
+ 'lhip',# 'left_hip',
+ 'lknee',# 'left_knee',
+ 'lankle',# 'left_ankle'
+ 'hip',# 'pelvis',
+ 'Spine (H36M)',# 'spine',
+ 'Head (H36M)',# 'head'
+ ]
+
+def get_mpii3d_joint_names():
+ return [
+ 'spine3', # 0,
+ 'spine4', # 1,
+ 'spine2', # 2,
+ 'Spine (H36M)', #'spine', # 3,
+ 'hip', # 'pelvis', # 4,
+ 'neck', # 5,
+ 'Head (H36M)', # 'head', # 6,
+ "headtop", # 'head_top', # 7,
+ 'left_clavicle', # 8,
+ "lshoulder", # 'left_shoulder', # 9,
+ "lelbow", # 'left_elbow',# 10,
+ "lwrist", # 'left_wrist',# 11,
+ 'left_hand',# 12,
+ 'right_clavicle',# 13,
+ 'rshoulder',# 'right_shoulder',# 14,
+ 'relbow',# 'right_elbow',# 15,
+ 'rwrist',# 'right_wrist',# 16,
+ 'right_hand',# 17,
+ 'lhip', # left_hip',# 18,
+ 'lknee', # 'left_knee',# 19,
+ 'lankle', #left ankle # 20
+ 'left_foot', # 21
+ 'left_toe', # 22
+ "rhip", # 'right_hip',# 23
+ "rknee", # 'right_knee',# 24
+ "rankle", #'right_ankle', # 25
+ 'right_foot',# 26
+ 'right_toe' # 27
+ ]
+
+def get_insta_joint_names():
+ return [
+ 'OP RHeel',
+ 'OP RKnee',
+ 'OP RHip',
+ 'OP LHip',
+ 'OP LKnee',
+ 'OP LHeel',
+ 'OP RWrist',
+ 'OP RElbow',
+ 'OP RShoulder',
+ 'OP LShoulder',
+ 'OP LElbow',
+ 'OP LWrist',
+ 'OP Neck',
+ 'headtop',
+ 'OP Nose',
+ 'OP LEye',
+ 'OP REye',
+ 'OP LEar',
+ 'OP REar',
+ 'OP LBigToe',
+ 'OP RBigToe',
+ 'OP LSmallToe',
+ 'OP RSmallToe',
+ 'OP LAnkle',
+ 'OP RAnkle',
+ ]
+
+def get_insta_skeleton():
+ return np.array(
+ [
+ [0 , 1],
+ [1 , 2],
+ [2 , 3],
+ [3 , 4],
+ [4 , 5],
+ [6 , 7],
+ [7 , 8],
+ [8 , 9],
+ [9 ,10],
+ [2 , 8],
+ [3 , 9],
+ [10,11],
+ [8 ,12],
+ [9 ,12],
+ [12,13],
+ [12,14],
+ [14,15],
+ [14,16],
+ [15,17],
+ [16,18],
+ [0 ,20],
+ [20,22],
+ [5 ,19],
+ [19,21],
+ [5 ,23],
+ [0 ,24],
+ ])
+
+def get_staf_skeleton():
+ return np.array(
+ [
+ [0, 1],
+ [1, 2],
+ [2, 3],
+ [3, 4],
+ [1, 5],
+ [5, 6],
+ [6, 7],
+ [1, 8],
+ [8, 9],
+ [9, 10],
+ [10, 11],
+ [8, 12],
+ [12, 13],
+ [13, 14],
+ [0, 15],
+ [0, 16],
+ [15, 17],
+ [16, 18],
+ [2, 9],
+ [5, 12],
+ [1, 19],
+ [20, 19],
+ ]
+ )
+
+def get_staf_joint_names():
+ return [
+ 'OP Nose', # 0,
+ 'OP Neck', # 1,
+ 'OP RShoulder', # 2,
+ 'OP RElbow', # 3,
+ 'OP RWrist', # 4,
+ 'OP LShoulder', # 5,
+ 'OP LElbow', # 6,
+ 'OP LWrist', # 7,
+ 'OP MidHip', # 8,
+ 'OP RHip', # 9,
+ 'OP RKnee', # 10,
+ 'OP RAnkle', # 11,
+ 'OP LHip', # 12,
+ 'OP LKnee', # 13,
+ 'OP LAnkle', # 14,
+ 'OP REye', # 15,
+ 'OP LEye', # 16,
+ 'OP REar', # 17,
+ 'OP LEar', # 18,
+ 'Neck (LSP)', # 19,
+ 'Top of Head (LSP)', # 20,
+ ]
+
+def get_spin_joint_names():
+ return [
+ 'OP Nose', # 0
+ 'OP Neck', # 1
+ 'OP RShoulder', # 2
+ 'OP RElbow', # 3
+ 'OP RWrist', # 4
+ 'OP LShoulder', # 5
+ 'OP LElbow', # 6
+ 'OP LWrist', # 7
+ 'OP MidHip', # 8
+ 'OP RHip', # 9
+ 'OP RKnee', # 10
+ 'OP RAnkle', # 11
+ 'OP LHip', # 12
+ 'OP LKnee', # 13
+ 'OP LAnkle', # 14
+ 'OP REye', # 15
+ 'OP LEye', # 16
+ 'OP REar', # 17
+ 'OP LEar', # 18
+ 'OP LBigToe', # 19
+ 'OP LSmallToe', # 20
+ 'OP LHeel', # 21
+ 'OP RBigToe', # 22
+ 'OP RSmallToe', # 23
+ 'OP RHeel', # 24
+ 'rankle', # 25
+ 'rknee', # 26
+ 'rhip', # 27
+ 'lhip', # 28
+ 'lknee', # 29
+ 'lankle', # 30
+ 'rwrist', # 31
+ 'relbow', # 32
+ 'rshoulder', # 33
+ 'lshoulder', # 34
+ 'lelbow', # 35
+ 'lwrist', # 36
+ 'neck', # 37
+ 'headtop', # 38
+ 'hip', # 39 'Pelvis (MPII)', # 39
+ 'thorax', # 40 'Thorax (MPII)', # 40
+ 'Spine (H36M)', # 41
+ 'Jaw (H36M)', # 42
+ 'Head (H36M)', # 43
+ 'nose', # 44
+ 'leye', # 45 'Left Eye', # 45
+ 'reye', # 46 'Right Eye', # 46
+ 'lear', # 47 'Left Ear', # 47
+ 'rear', # 48 'Right Ear', # 48
+ ]
+
+def get_h36m_joint_names():
+ return [
+ 'hip', # 0
+ 'lhip', # 1
+ 'lknee', # 2
+ 'lankle', # 3
+ 'rhip', # 4
+ 'rknee', # 5
+ 'rankle', # 6
+ 'Spine (H36M)', # 7
+ 'neck', # 8
+ 'Head (H36M)', # 9
+ 'headtop', # 10
+ 'lshoulder', # 11
+ 'lelbow', # 12
+ 'lwrist', # 13
+ 'rshoulder', # 14
+ 'relbow', # 15
+ 'rwrist', # 16
+ ]
+
+# The same joints in the original H36M naming: 'Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head_top', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist'
+
+def get_spin_skeleton():
+ return np.array(
+ [
+ [0 , 1],
+ [1 , 2],
+ [2 , 3],
+ [3 , 4],
+ [1 , 5],
+ [5 , 6],
+ [6 , 7],
+ [1 , 8],
+ [8 , 9],
+ [9 ,10],
+ [10,11],
+ [8 ,12],
+ [12,13],
+ [13,14],
+ [0 ,15],
+ [0 ,16],
+ [15,17],
+ [16,18],
+ [21,19],
+ [19,20],
+ [14,21],
+ [11,24],
+ [24,22],
+ [22,23],
+ [0 ,38],
+ ]
+ )
+
+def get_posetrack_joint_names():
+ return [
+ "nose",
+ "neck",
+ "headtop",
+ "lear",
+ "rear",
+ "lshoulder",
+ "rshoulder",
+ "lelbow",
+ "relbow",
+ "lwrist",
+ "rwrist",
+ "lhip",
+ "rhip",
+ "lknee",
+ "rknee",
+ "lankle",
+ "rankle"
+ ]
+
+def get_posetrack_original_kp_names():
+ return [
+ 'nose',
+ 'head_bottom',
+ 'head_top',
+ 'left_ear',
+ 'right_ear',
+ 'left_shoulder',
+ 'right_shoulder',
+ 'left_elbow',
+ 'right_elbow',
+ 'left_wrist',
+ 'right_wrist',
+ 'left_hip',
+ 'right_hip',
+ 'left_knee',
+ 'right_knee',
+ 'left_ankle',
+ 'right_ankle'
+ ]
+
+def get_pennaction_joint_names():
+ return [
+ "headtop", # 0
+ "lshoulder", # 1
+ "rshoulder", # 2
+ "lelbow", # 3
+ "relbow", # 4
+ "lwrist", # 5
+ "rwrist", # 6
+ "lhip" , # 7
+ "rhip" , # 8
+ "lknee", # 9
+ "rknee" , # 10
+ "lankle", # 11
+ "rankle" # 12
+ ]
+
+def get_common_joint_names():
+ return [
+ "rankle", # 0 "lankle", # 0
+ "rknee", # 1 "lknee", # 1
+ "rhip", # 2 "lhip", # 2
+ "lhip", # 3 "rhip", # 3
+ "lknee", # 4 "rknee", # 4
+ "lankle", # 5 "rankle", # 5
+ "rwrist", # 6 "lwrist", # 6
+ "relbow", # 7 "lelbow", # 7
+ "rshoulder", # 8 "lshoulder", # 8
+ "lshoulder", # 9 "rshoulder", # 9
+ "lelbow", # 10 "relbow", # 10
+ "lwrist", # 11 "rwrist", # 11
+ "neck", # 12 "neck", # 12
+ "headtop", # 13 "headtop", # 13
+ ]
+
+def get_coco_common_joint_names():
+ return [
+ "nose", # 0
+ "leye", # 1
+ "reye", # 2
+ "lear", # 3
+ "rear", # 4
+ "lshoulder", # 5
+ "rshoulder", # 6
+ "lelbow", # 7
+ "relbow", # 8
+ "lwrist", # 9
+ "rwrist", # 10
+ "lhip", # 11
+ "rhip", # 12
+ "lknee", # 13
+ "rknee", # 14
+ "lankle", # 15
+ "rankle", # 16
+ "neck", # 17 "neck", # 12
+ "headtop", # 18 "headtop", # 13
+ ]
+
+def get_common_skeleton():
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 1, 2 ],
+ [ 3, 4 ],
+ [ 4, 5 ],
+ [ 6, 7 ],
+ [ 7, 8 ],
+ [ 8, 2 ],
+ [ 8, 9 ],
+ [ 9, 3 ],
+ [ 2, 3 ],
+ [ 8, 12],
+ [ 9, 10],
+ [12, 9 ],
+ [10, 11],
+ [12, 13],
+ ]
+ )
+
+def get_coco_joint_names():
+ return [
+ "nose", # 0
+ "leye", # 1
+ "reye", # 2
+ "lear", # 3
+ "rear", # 4
+ "lshoulder", # 5
+ "rshoulder", # 6
+ "lelbow", # 7
+ "relbow", # 8
+ "lwrist", # 9
+ "rwrist", # 10
+ "lhip", # 11
+ "rhip", # 12
+ "lknee", # 13
+ "rknee", # 14
+ "lankle", # 15
+ "rankle", # 16
+ ]
+
+def get_coco_skeleton():
+ # 0 - nose,
+ # 1 - leye,
+ # 2 - reye,
+ # 3 - lear,
+ # 4 - rear,
+ # 5 - lshoulder,
+ # 6 - rshoulder,
+ # 7 - lelbow,
+ # 8 - relbow,
+ # 9 - lwrist,
+ # 10 - rwrist,
+ # 11 - lhip,
+ # 12 - rhip,
+ # 13 - lknee,
+ # 14 - rknee,
+ # 15 - lankle,
+ # 16 - rankle,
+ return np.array(
+ [
+ [15, 13],
+ [13, 11],
+ [16, 14],
+ [14, 12],
+ [11, 12],
+ [ 5, 11],
+ [ 6, 12],
+ [ 5, 6 ],
+ [ 5, 7 ],
+ [ 6, 8 ],
+ [ 7, 9 ],
+ [ 8, 10],
+ [ 1, 2 ],
+ [ 0, 1 ],
+ [ 0, 2 ],
+ [ 1, 3 ],
+ [ 2, 4 ],
+ [ 3, 5 ],
+ [ 4, 6 ]
+ ]
+ )
+
+def get_mpii_joint_names():
+ return [
+ "rankle", # 0
+ "rknee", # 1
+ "rhip", # 2
+ "lhip", # 3
+ "lknee", # 4
+ "lankle", # 5
+ "hip", # 6
+ "thorax", # 7
+ "neck", # 8
+ "headtop", # 9
+ "rwrist", # 10
+ "relbow", # 11
+ "rshoulder", # 12
+ "lshoulder", # 13
+ "lelbow", # 14
+ "lwrist", # 15
+ ]
+
+def get_mpii_skeleton():
+ # 0 - rankle,
+ # 1 - rknee,
+ # 2 - rhip,
+ # 3 - lhip,
+ # 4 - lknee,
+ # 5 - lankle,
+ # 6 - hip,
+ # 7 - thorax,
+ # 8 - neck,
+ # 9 - headtop,
+ # 10 - rwrist,
+ # 11 - relbow,
+ # 12 - rshoulder,
+ # 13 - lshoulder,
+ # 14 - lelbow,
+ # 15 - lwrist,
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 1, 2 ],
+ [ 2, 6 ],
+ [ 6, 3 ],
+ [ 3, 4 ],
+ [ 4, 5 ],
+ [ 6, 7 ],
+ [ 7, 8 ],
+ [ 8, 9 ],
+ [ 7, 12],
+ [12, 11],
+ [11, 10],
+ [ 7, 13],
+ [13, 14],
+ [14, 15]
+ ]
+ )
+
+def get_aich_joint_names():
+ return [
+ "rshoulder", # 0
+ "relbow", # 1
+ "rwrist", # 2
+ "lshoulder", # 3
+ "lelbow", # 4
+ "lwrist", # 5
+ "rhip", # 6
+ "rknee", # 7
+ "rankle", # 8
+ "lhip", # 9
+ "lknee", # 10
+ "lankle", # 11
+ "headtop", # 12
+ "neck", # 13
+ ]
+
+def get_aich_skeleton():
+ # 0 - rshoulder,
+ # 1 - relbow,
+ # 2 - rwrist,
+ # 3 - lshoulder,
+ # 4 - lelbow,
+ # 5 - lwrist,
+ # 6 - rhip,
+ # 7 - rknee,
+ # 8 - rankle,
+ # 9 - lhip,
+ # 10 - lknee,
+ # 11 - lankle,
+ # 12 - headtop,
+ # 13 - neck,
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 1, 2 ],
+ [ 3, 4 ],
+ [ 4, 5 ],
+ [ 6, 7 ],
+ [ 7, 8 ],
+ [ 9, 10],
+ [10, 11],
+ [12, 13],
+ [13, 0 ],
+ [13, 3 ],
+ [ 0, 6 ],
+ [ 3, 9 ]
+ ]
+ )
+
+def get_3dpw_joint_names():
+ return [
+ "nose", # 0
+ "thorax", # 1
+ "rshoulder", # 2
+ "relbow", # 3
+ "rwrist", # 4
+ "lshoulder", # 5
+ "lelbow", # 6
+ "lwrist", # 7
+ "rhip", # 8
+ "rknee", # 9
+ "rankle", # 10
+ "lhip", # 11
+ "lknee", # 12
+ "lankle", # 13
+ ]
+
+def get_3dpw_skeleton():
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 1, 2 ],
+ [ 2, 3 ],
+ [ 3, 4 ],
+ [ 1, 5 ],
+ [ 5, 6 ],
+ [ 6, 7 ],
+ [ 2, 8 ],
+ [ 5, 11],
+ [ 8, 11],
+ [ 8, 9 ],
+ [ 9, 10],
+ [11, 12],
+ [12, 13]
+ ]
+ )
+
+def get_smplcoco_joint_names():
+ return [
+ "rankle", # 0
+ "rknee", # 1
+ "rhip", # 2
+ "lhip", # 3
+ "lknee", # 4
+ "lankle", # 5
+ "rwrist", # 6
+ "relbow", # 7
+ "rshoulder", # 8
+ "lshoulder", # 9
+ "lelbow", # 10
+ "lwrist", # 11
+ "neck", # 12
+ "headtop", # 13
+ "nose", # 14
+ "leye", # 15
+ "reye", # 16
+ "lear", # 17
+ "rear", # 18
+ ]
+
+def get_smplcoco_skeleton():
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 1, 2 ],
+ [ 3, 4 ],
+ [ 4, 5 ],
+ [ 6, 7 ],
+ [ 7, 8 ],
+ [ 8, 12],
+ [12, 9 ],
+ [ 9, 10],
+ [10, 11],
+ [12, 13],
+ [14, 15],
+ [15, 17],
+ [16, 18],
+ [14, 16],
+ [ 8, 2 ],
+ [ 9, 3 ],
+ [ 2, 3 ],
+ ]
+ )
+
+def get_smpl_joint_names():
+ return [
+ 'hips', # 0
+ 'leftUpLeg', # 1
+ 'rightUpLeg', # 2
+ 'spine', # 3
+ 'leftLeg', # 4
+ 'rightLeg', # 5
+ 'spine1', # 6
+ 'leftFoot', # 7
+ 'rightFoot', # 8
+ 'spine2', # 9
+ 'leftToeBase', # 10
+ 'rightToeBase', # 11
+ 'neck', # 12
+ 'leftShoulder', # 13
+ 'rightShoulder', # 14
+ 'head', # 15
+ 'leftArm', # 16
+ 'rightArm', # 17
+ 'leftForeArm', # 18
+ 'rightForeArm', # 19
+ 'leftHand', # 20
+ 'rightHand', # 21
+ 'leftHandIndex1', # 22
+ 'rightHandIndex1', # 23
+ ]
+
+def get_smpl_skeleton():
+ return np.array(
+ [
+ [ 0, 1 ],
+ [ 0, 2 ],
+ [ 0, 3 ],
+ [ 1, 4 ],
+ [ 2, 5 ],
+ [ 3, 6 ],
+ [ 4, 7 ],
+ [ 5, 8 ],
+ [ 6, 9 ],
+ [ 7, 10],
+ [ 8, 11],
+ [ 9, 12],
+ [ 9, 13],
+ [ 9, 14],
+ [12, 15],
+ [13, 16],
+ [14, 17],
+ [16, 18],
+ [17, 19],
+ [18, 20],
+ [19, 21],
+ [20, 22],
+ [21, 23],
+ ]
+ )
\ No newline at end of file
diff --git a/lib/utils/transforms.py b/lib/utils/transforms.py
new file mode 100644
index 0000000000000000000000000000000000000000..cb207f302a51797b501e32d05c572ddc461c473e
--- /dev/null
+++ b/lib/utils/transforms.py
@@ -0,0 +1,828 @@
+"""This transforms function is mainly borrowed from PyTorch3D"""
+
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+
+from typing import Optional, Union
+
+import torch
+import torch.nn.functional as F
+
+Device = Union[str, torch.device]
+
+"""
+The transformation matrices returned from the functions in this file assume
+the points on which the transformation will be applied are column vectors.
+i.e. the R matrix is structured as
+
+ R = [
+ [Rxx, Rxy, Rxz],
+ [Ryx, Ryy, Ryz],
+ [Rzx, Rzy, Rzz],
+ ] # (3, 3)
+
+This matrix can be applied to column vectors by post multiplication
+by the points e.g.
+
+ points = [[0], [1], [2]] # (3 x 1) xyz coordinates of a point
+ transformed_points = R * points
+
+To apply the same matrix to points which are row vectors, the R matrix
+can be transposed and pre multiplied by the points:
+
+e.g.
+ points = [[0, 1, 2]] # (1 x 3) xyz coordinates of a point
+ transformed_points = points * R.transpose(1, 0)
+"""
+
+
+def quaternion_to_matrix(quaternions: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as quaternions to rotation matrices.
+
+ Args:
+ quaternions: quaternions with real part first,
+ as tensor of shape (..., 4).
+
+ Returns:
+ Rotation matrices as tensor of shape (..., 3, 3).
+ """
+ r, i, j, k = torch.unbind(quaternions, -1)
+ # pyre-fixme[58]: `/` is not supported for operand types `float` and `Tensor`.
+ two_s = 2.0 / (quaternions * quaternions).sum(-1)
+
+ o = torch.stack(
+ (
+ 1 - two_s * (j * j + k * k),
+ two_s * (i * j - k * r),
+ two_s * (i * k + j * r),
+ two_s * (i * j + k * r),
+ 1 - two_s * (i * i + k * k),
+ two_s * (j * k - i * r),
+ two_s * (i * k - j * r),
+ two_s * (j * k + i * r),
+ 1 - two_s * (i * i + j * j),
+ ),
+ -1,
+ )
+ return o.reshape(quaternions.shape[:-1] + (3, 3))
+
+
+
+def _copysign(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
+ """
+ Return a tensor where each element has the absolute value taken from the,
+ corresponding element of a, with sign taken from the corresponding
+ element of b. This is like the standard copysign floating-point operation,
+ but is not careful about negative 0 and NaN.
+
+ Args:
+ a: source tensor.
+ b: tensor whose signs will be used, of the same shape as a.
+
+ Returns:
+ Tensor of the same shape as a with the signs of b.
+ """
+ signs_differ = (a < 0) != (b < 0)
+ return torch.where(signs_differ, -a, a)
+
+
+def _sqrt_positive_part(x: torch.Tensor) -> torch.Tensor:
+ """
+ Returns torch.sqrt(torch.max(0, x))
+ but with a zero subgradient where x is 0.
+ """
+ ret = torch.zeros_like(x)
+ positive_mask = x > 0
+ ret[positive_mask] = torch.sqrt(x[positive_mask])
+ return ret
+
+
+def matrix_to_quaternion(matrix: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as rotation matrices to quaternions.
+
+ Args:
+ matrix: Rotation matrices as tensor of shape (..., 3, 3).
+
+ Returns:
+ quaternions with real part first, as tensor of shape (..., 4).
+ """
+ if matrix.size(-1) != 3 or matrix.size(-2) != 3:
+ raise ValueError(f"Invalid rotation matrix shape {matrix.shape}.")
+
+ batch_dim = matrix.shape[:-2]
+ m00, m01, m02, m10, m11, m12, m20, m21, m22 = torch.unbind(
+ matrix.reshape(batch_dim + (9,)), dim=-1
+ )
+
+ q_abs = _sqrt_positive_part(
+ torch.stack(
+ [
+ 1.0 + m00 + m11 + m22,
+ 1.0 + m00 - m11 - m22,
+ 1.0 - m00 + m11 - m22,
+ 1.0 - m00 - m11 + m22,
+ ],
+ dim=-1,
+ )
+ )
+
+ # we produce the desired quaternion multiplied by each of r, i, j, k
+ quat_by_rijk = torch.stack(
+ [
+ # pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
+ # `int`.
+ torch.stack([q_abs[..., 0] ** 2, m21 - m12, m02 - m20, m10 - m01], dim=-1),
+ # pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
+ # `int`.
+ torch.stack([m21 - m12, q_abs[..., 1] ** 2, m10 + m01, m02 + m20], dim=-1),
+ # pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
+ # `int`.
+ torch.stack([m02 - m20, m10 + m01, q_abs[..., 2] ** 2, m12 + m21], dim=-1),
+ # pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
+ # `int`.
+ torch.stack([m10 - m01, m20 + m02, m21 + m12, q_abs[..., 3] ** 2], dim=-1),
+ ],
+ dim=-2,
+ )
+
+ # We floor here at 0.1 but the exact level is not important; if q_abs is small,
+ # the candidate won't be picked.
+ flr = torch.tensor(0.1).to(dtype=q_abs.dtype, device=q_abs.device)
+ quat_candidates = quat_by_rijk / (2.0 * q_abs[..., None].max(flr))
+
+ # if not for numerical problems, quat_candidates[i] should be same (up to a sign),
+ # forall i; we pick the best-conditioned one (with the largest denominator)
+
+ return quat_candidates[
+ F.one_hot(q_abs.argmax(dim=-1), num_classes=4) > 0.5, :
+ ].reshape(batch_dim + (4,))
+
+
+
+def _axis_angle_rotation(axis: str, angle: torch.Tensor) -> torch.Tensor:
+ """
+ Return the rotation matrices for one of the rotations about an axis
+ of which Euler angles describe, for each value of the angle given.
+
+ Args:
+ axis: Axis label "X", "Y" or "Z".
+ angle: any shape tensor of Euler angles in radians
+
+ Returns:
+ Rotation matrices as tensor of shape (..., 3, 3).
+ """
+
+ cos = torch.cos(angle)
+ sin = torch.sin(angle)
+ one = torch.ones_like(angle)
+ zero = torch.zeros_like(angle)
+
+ if axis == "X":
+ R_flat = (one, zero, zero, zero, cos, -sin, zero, sin, cos)
+ elif axis == "Y":
+ R_flat = (cos, zero, sin, zero, one, zero, -sin, zero, cos)
+ elif axis == "Z":
+ R_flat = (cos, -sin, zero, sin, cos, zero, zero, zero, one)
+ else:
+ raise ValueError("letter must be either X, Y or Z.")
+
+ return torch.stack(R_flat, -1).reshape(angle.shape + (3, 3))
+
+
+def euler_angles_to_matrix(euler_angles: torch.Tensor, convention: str) -> torch.Tensor:
+ """
+ Convert rotations given as Euler angles in radians to rotation matrices.
+
+ Args:
+ euler_angles: Euler angles in radians as tensor of shape (..., 3).
+ convention: Convention string of three uppercase letters from
+ {"X", "Y", and "Z"}.
+
+ Returns:
+ Rotation matrices as tensor of shape (..., 3, 3).
+ """
+ if euler_angles.dim() == 0 or euler_angles.shape[-1] != 3:
+ raise ValueError("Invalid input euler angles.")
+ if len(convention) != 3:
+ raise ValueError("Convention must have 3 letters.")
+ if convention[1] in (convention[0], convention[2]):
+ raise ValueError(f"Invalid convention {convention}.")
+ for letter in convention:
+ if letter not in ("X", "Y", "Z"):
+ raise ValueError(f"Invalid letter {letter} in convention string.")
+ matrices = [
+ _axis_angle_rotation(c, e)
+ for c, e in zip(convention, torch.unbind(euler_angles, -1))
+ ]
+ # return functools.reduce(torch.matmul, matrices)
+ return torch.matmul(torch.matmul(matrices[0], matrices[1]), matrices[2])
+
+
+
+def _angle_from_tan(
+ axis: str, other_axis: str, data, horizontal: bool, tait_bryan: bool
+) -> torch.Tensor:
+ """
+ Extract the first or third Euler angle from the two members of
+ the matrix which are positive constant times its sine and cosine.
+
+ Args:
+ axis: Axis label "X", "Y" or "Z" for the angle we are finding.
+ other_axis: Axis label "X", "Y" or "Z" for the middle axis in the
+ convention.
+ data: Rotation matrices as tensor of shape (..., 3, 3).
+ horizontal: Whether we are looking for the angle for the third axis,
+ which means the relevant entries are in the same row of the
+ rotation matrix. If not, they are in the same column.
+ tait_bryan: Whether the first and third axes in the convention differ.
+
+ Returns:
+ Euler Angles in radians for each matrix in data as a tensor
+ of shape (...).
+ """
+
+ i1, i2 = {"X": (2, 1), "Y": (0, 2), "Z": (1, 0)}[axis]
+ if horizontal:
+ i2, i1 = i1, i2
+ even = (axis + other_axis) in ["XY", "YZ", "ZX"]
+ if horizontal == even:
+ return torch.atan2(data[..., i1], data[..., i2])
+ if tait_bryan:
+ return torch.atan2(-data[..., i2], data[..., i1])
+ return torch.atan2(data[..., i2], -data[..., i1])
+
+
+def _index_from_letter(letter: str) -> int:
+ if letter == "X":
+ return 0
+ if letter == "Y":
+ return 1
+ if letter == "Z":
+ return 2
+ raise ValueError("letter must be either X, Y or Z.")
+
+
+def matrix_to_euler_angles(matrix: torch.Tensor, convention: str) -> torch.Tensor:
+ """
+ Convert rotations given as rotation matrices to Euler angles in radians.
+
+ Args:
+ matrix: Rotation matrices as tensor of shape (..., 3, 3).
+ convention: Convention string of three uppercase letters.
+
+ Returns:
+ Euler angles in radians as tensor of shape (..., 3).
+ """
+ if len(convention) != 3:
+ raise ValueError("Convention must have 3 letters.")
+ if convention[1] in (convention[0], convention[2]):
+ raise ValueError(f"Invalid convention {convention}.")
+ for letter in convention:
+ if letter not in ("X", "Y", "Z"):
+ raise ValueError(f"Invalid letter {letter} in convention string.")
+ if matrix.size(-1) != 3 or matrix.size(-2) != 3:
+ raise ValueError(f"Invalid rotation matrix shape {matrix.shape}.")
+ i0 = _index_from_letter(convention[0])
+ i2 = _index_from_letter(convention[2])
+ tait_bryan = i0 != i2
+ if tait_bryan:
+ central_angle = torch.asin(
+ matrix[..., i0, i2] * (-1.0 if i0 - i2 in [-1, 2] else 1.0)
+ )
+ else:
+ central_angle = torch.acos(matrix[..., i0, i0])
+
+ o = (
+ _angle_from_tan(
+ convention[0], convention[1], matrix[..., i2], False, tait_bryan
+ ),
+ central_angle,
+ _angle_from_tan(
+ convention[2], convention[1], matrix[..., i0, :], True, tait_bryan
+ ),
+ )
+ return torch.stack(o, -1)
+
+
+
+def random_quaternions(
+ n: int, dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
+) -> torch.Tensor:
+ """
+ Generate random quaternions representing rotations,
+ i.e. versors with nonnegative real part.
+
+ Args:
+ n: Number of quaternions in a batch to return.
+ dtype: Type to return.
+ device: Desired device of returned tensor. Default:
+ uses the current device for the default tensor type.
+
+ Returns:
+ Quaternions as tensor of shape (N, 4).
+ """
+ if isinstance(device, str):
+ device = torch.device(device)
+ o = torch.randn((n, 4), dtype=dtype, device=device)
+ s = (o * o).sum(1)
+ o = o / _copysign(torch.sqrt(s), o[:, 0])[:, None]
+ return o
+
+
+
+def random_rotations(
+ n: int, dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
+) -> torch.Tensor:
+ """
+ Generate random rotations as 3x3 rotation matrices.
+
+ Args:
+ n: Number of rotation matrices in a batch to return.
+ dtype: Type to return.
+ device: Device of returned tensor. Default: if None,
+ uses the current device for the default tensor type.
+
+ Returns:
+ Rotation matrices as tensor of shape (n, 3, 3).
+ """
+ quaternions = random_quaternions(n, dtype=dtype, device=device)
+ return quaternion_to_matrix(quaternions)
+
+
+
+def random_rotation(
+ dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
+) -> torch.Tensor:
+ """
+ Generate a single random 3x3 rotation matrix.
+
+ Args:
+ dtype: Type to return
+ device: Device of returned tensor. Default: if None,
+ uses the current device for the default tensor type
+
+ Returns:
+ Rotation matrix as tensor of shape (3, 3).
+ """
+ return random_rotations(1, dtype, device)[0]
+
+
+
+def standardize_quaternion(quaternions: torch.Tensor) -> torch.Tensor:
+ """
+ Convert a unit quaternion to a standard form: one in which the real
+ part is non negative.
+
+ Args:
+ quaternions: Quaternions with real part first,
+ as tensor of shape (..., 4).
+
+ Returns:
+ Standardized quaternions as tensor of shape (..., 4).
+ """
+ return torch.where(quaternions[..., 0:1] < 0, -quaternions, quaternions)
+
+
+
+def quaternion_raw_multiply(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
+ """
+ Multiply two quaternions.
+ Usual torch rules for broadcasting apply.
+
+ Args:
+ a: Quaternions as tensor of shape (..., 4), real part first.
+ b: Quaternions as tensor of shape (..., 4), real part first.
+
+ Returns:
+ The product of a and b, a tensor of quaternions shape (..., 4).
+ """
+ aw, ax, ay, az = torch.unbind(a, -1)
+ bw, bx, by, bz = torch.unbind(b, -1)
+ ow = aw * bw - ax * bx - ay * by - az * bz
+ ox = aw * bx + ax * bw + ay * bz - az * by
+ oy = aw * by - ax * bz + ay * bw + az * bx
+ oz = aw * bz + ax * by - ay * bx + az * bw
+ return torch.stack((ow, ox, oy, oz), -1)
+
+
+
+def quaternion_multiply(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
+ """
+ Multiply two quaternions representing rotations, returning the quaternion
+ representing their composition, i.e. the versor with nonnegative real part.
+ Usual torch rules for broadcasting apply.
+
+ Args:
+ a: Quaternions as tensor of shape (..., 4), real part first.
+ b: Quaternions as tensor of shape (..., 4), real part first.
+
+ Returns:
+ The product of a and b, a tensor of quaternions of shape (..., 4).
+ """
+ ab = quaternion_raw_multiply(a, b)
+ return standardize_quaternion(ab)
+
+
+
+def quaternion_invert(quaternion: torch.Tensor) -> torch.Tensor:
+ """
+ Given a quaternion representing rotation, get the quaternion representing
+ its inverse.
+
+ Args:
+ quaternion: Quaternions as tensor of shape (..., 4), with real part
+ first, which must be versors (unit quaternions).
+
+ Returns:
+ The inverse, a tensor of quaternions of shape (..., 4).
+ """
+
+ scaling = torch.tensor([1, -1, -1, -1], device=quaternion.device)
+ return quaternion * scaling
+
+
+
+def quaternion_apply(quaternion: torch.Tensor, point: torch.Tensor) -> torch.Tensor:
+ """
+ Apply the rotation given by a quaternion to a 3D point.
+ Usual torch rules for broadcasting apply.
+
+ Args:
+ quaternion: Tensor of quaternions, real part first, of shape (..., 4).
+ point: Tensor of 3D points of shape (..., 3).
+
+ Returns:
+ Tensor of rotated points of shape (..., 3).
+ """
+ if point.size(-1) != 3:
+ raise ValueError(f"Points are not in 3D, {point.shape}.")
+ real_parts = point.new_zeros(point.shape[:-1] + (1,))
+ point_as_quaternion = torch.cat((real_parts, point), -1)
+ out = quaternion_raw_multiply(
+ quaternion_raw_multiply(quaternion, point_as_quaternion),
+ quaternion_invert(quaternion),
+ )
+ return out[..., 1:]
+
+
+
+def axis_angle_to_matrix(axis_angle: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as axis/angle to rotation matrices.
+
+ Args:
+ axis_angle: Rotations given as a vector in axis angle form,
+ as a tensor of shape (..., 3), where the magnitude is
+ the angle turned anticlockwise in radians around the
+ vector's direction.
+
+ Returns:
+ Rotation matrices as tensor of shape (..., 3, 3).
+ """
+ return quaternion_to_matrix(axis_angle_to_quaternion(axis_angle))
+
+
+
+def matrix_to_axis_angle(matrix: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as rotation matrices to axis/angle.
+
+ Args:
+ matrix: Rotation matrices as tensor of shape (..., 3, 3).
+
+ Returns:
+ Rotations given as a vector in axis angle form, as a tensor
+ of shape (..., 3), where the magnitude is the angle
+ turned anticlockwise in radians around the vector's
+ direction.
+ """
+ return quaternion_to_axis_angle(matrix_to_quaternion(matrix))
+
+
+
+def axis_angle_to_quaternion(axis_angle: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as axis/angle to quaternions.
+
+ Args:
+ axis_angle: Rotations given as a vector in axis angle form,
+ as a tensor of shape (..., 3), where the magnitude is
+ the angle turned anticlockwise in radians around the
+ vector's direction.
+
+ Returns:
+ quaternions with real part first, as tensor of shape (..., 4).
+ """
+ angles = torch.norm(axis_angle, p=2, dim=-1, keepdim=True)
+ half_angles = angles * 0.5
+ eps = 1e-6
+ small_angles = angles.abs() < eps
+ sin_half_angles_over_angles = torch.empty_like(angles)
+ sin_half_angles_over_angles[~small_angles] = (
+ torch.sin(half_angles[~small_angles]) / angles[~small_angles]
+ )
+ # for x small, sin(x/2) is about x/2 - (x/2)^3/6
+ # so sin(x/2)/x is about 1/2 - (x*x)/48
+ sin_half_angles_over_angles[small_angles] = (
+ 0.5 - (angles[small_angles] * angles[small_angles]) / 48
+ )
+ quaternions = torch.cat(
+ [torch.cos(half_angles), axis_angle * sin_half_angles_over_angles], dim=-1
+ )
+ return quaternions
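+
+# Example (illustrative): a rotation of pi/2 about the z-axis maps to the quaternion
+# (cos(pi/4), 0, 0, sin(pi/4)):
+#   >>> axis_angle_to_quaternion(torch.tensor([[0.0, 0.0, 1.5708]]))
+#   -> approximately [[0.7071, 0.0, 0.0, 0.7071]]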
+
+
+
+def quaternion_to_axis_angle(quaternions: torch.Tensor) -> torch.Tensor:
+ """
+ Convert rotations given as quaternions to axis/angle.
+
+ Args:
+ quaternions: quaternions with real part first,
+ as tensor of shape (..., 4).
+
+ Returns:
+ Rotations given as a vector in axis angle form, as a tensor
+ of shape (..., 3), where the magnitude is the angle
+ turned anticlockwise in radians around the vector's
+ direction.
+ """
+ norms = torch.norm(quaternions[..., 1:], p=2, dim=-1, keepdim=True)
+ half_angles = torch.atan2(norms, quaternions[..., :1])
+ angles = 2 * half_angles
+ eps = 1e-6
+ small_angles = angles.abs() < eps
+ sin_half_angles_over_angles = torch.empty_like(angles)
+ sin_half_angles_over_angles[~small_angles] = (
+ torch.sin(half_angles[~small_angles]) / angles[~small_angles]
+ )
+ # for x small, sin(x/2) is about x/2 - (x/2)^3/6
+ # so sin(x/2)/x is about 1/2 - (x*x)/48
+ sin_half_angles_over_angles[small_angles] = (
+ 0.5 - (angles[small_angles] * angles[small_angles]) / 48
+ )
+ return quaternions[..., 1:] / sin_half_angles_over_angles
+
+
+
+def rotation_6d_to_matrix(d6: torch.Tensor) -> torch.Tensor:
+ """
+ Converts 6D rotation representation by Zhou et al. [1] to rotation matrix
+ using Gram--Schmidt orthogonalization per Section B of [1].
+ Args:
+ d6: 6D rotation representation, of size (*, 6)
+
+ Returns:
+ batch of rotation matrices of size (*, 3, 3)
+
+ [1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H.
+ On the Continuity of Rotation Representations in Neural Networks.
+ IEEE Conference on Computer Vision and Pattern Recognition, 2019.
+ Retrieved from http://arxiv.org/abs/1812.07035
+ """
+
+ a1, a2 = d6[..., :3], d6[..., 3:]
+ b1 = F.normalize(a1, dim=-1)
+ b2 = a2 - (b1 * a2).sum(-1, keepdim=True) * b1
+ b2 = F.normalize(b2, dim=-1)
+ b3 = torch.cross(b1, b2, dim=-1)
+ return torch.stack((b1, b2, b3), dim=-2)
+
+
+def matrix_to_rotation_6d(matrix: torch.Tensor) -> torch.Tensor:
+ """
+ Converts rotation matrices to 6D rotation representation by Zhou et al. [1]
+ by dropping the last row. Note that 6D representation is not unique.
+ Args:
+ matrix: batch of rotation matrices of size (*, 3, 3)
+
+ Returns:
+ 6D rotation representation, of size (*, 6)
+
+ [1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H.
+ On the Continuity of Rotation Representations in Neural Networks.
+ IEEE Conference on Computer Vision and Pattern Recognition, 2019.
+ Retrieved from http://arxiv.org/abs/1812.07035
+ """
+ batch_dim = matrix.size()[:-2]
+ return matrix[..., :2, :].clone().reshape(batch_dim + (6,))
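+
+# Round-trip sketch (illustrative): the 6-D representation drops the last row of the
+# rotation matrix, and the Gram-Schmidt step in rotation_6d_to_matrix reconstructs it:
+#   >>> R = random_rotations(4)                           # (4, 3, 3)
+#   >>> rotation_6d_to_matrix(matrix_to_rotation_6d(R))   # ~= R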
+
+
+def clean_rotation_6d(d6d: torch.Tensor) -> torch.Tensor:
+ """
+ Clean rotation 6d by converting it to matrix and then reconvert to d6
+ """
+ matrix = rotation_6d_to_matrix(d6d)
+ d6d = matrix_to_rotation_6d(matrix)
+ return d6d
+
+
+def rot6d_to_rotmat(x):
+ """Convert 6D rotation representation to 3x3 rotation matrix.
+ Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
+ Input:
+ (B,6) Batch of 6-D rotation representations
+ Output:
+ (B,3,3) Batch of corresponding rotation matrices
+ """
+ if x.shape[-1] == 6:
+ batch_dim = x.size()[:-1]
+ else:
+ x = x.reshape(*x.shape[:-1], -1, 6)
+ batch_dim = x.size()[:-1]
+
+ x = x.reshape(*batch_dim, 3, 2)
+ a1, a2 = x[..., 0], x[..., 1]
+
+ b1 = F.normalize(a1, dim=-1)
+ b2 = a2 - (b1 * a2).sum(-1, keepdim=True) * b1
+ b2 = F.normalize(b2, dim=-1)
+ b3 = torch.cross(b1, b2, dim=-1)
+
+ return torch.stack((b1, b2, b3), dim=-1)
+
+
+def rotmat_to_rot6d(x):
+ """Inverse computation of rot6d_to_rotmat."""
+ batch_dim = x.size()[:-2]
+ return x[..., :2].clone().reshape(batch_dim + (6,))
+
+
+def convert_rotation_matrix_to_homogeneous(rotation_matrix):
+ "Add empty translation vector to Rotation matrix"""
+
+ transl = torch.zeros_like(rotation_matrix[...,:1])
+ rotation_matrix_hom = torch.cat((rotation_matrix, transl), dim=-1)
+
+ return rotation_matrix_hom
+
+
+def rotation_matrix_to_angle_axis(rotation_matrix):
+ """Convert 3x4 rotation matrix to Rodrigues vector
+
+ Args:
+ rotation_matrix (Tensor): rotation matrix.
+
+ Returns:
+ Tensor: Rodrigues vector transformation.
+
+ Shape:
+ - Input: :math:`(N, 3, 4)`
+ - Output: :math:`(N, 3)`
+
+ Example:
+ >>> input = torch.rand(2, 3, 4) # Nx3x4
+ >>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3
+ """
+
+ if rotation_matrix.size(-1) == 3:
+ rotation_matrix = convert_rotation_matrix_to_homogeneous(rotation_matrix)
+
+ quaternion = rotation_matrix_to_quaternion(rotation_matrix)
+ return quaternion_to_angle_axis(quaternion)
+
+
+def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
+ """Convert 3x4 rotation matrix to 4d quaternion vector
+
+ This algorithm is based on algorithm described in
+ https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
+
+ Args:
+ rotation_matrix (Tensor): the rotation matrix to convert.
+
+ Return:
+ Tensor: the rotation in quaternion
+
+ Shape:
+ - Input: :math:`(N, 3, 4)`
+ - Output: :math:`(N, 4)`
+
+ Example:
+ >>> input = torch.rand(4, 3, 4) # Nx3x4
+ >>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4
+ """
+ if not torch.is_tensor(rotation_matrix):
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
+ type(rotation_matrix)))
+
+ if len(rotation_matrix.shape) > 3:
+ raise ValueError(
+ "Input size must be a three dimensional tensor. Got {}".format(
+ rotation_matrix.shape))
+ if not rotation_matrix.shape[-2:] == (3, 4):
+ raise ValueError(
+ "Input size must be a N x 3 x 4 tensor. Got {}".format(
+ rotation_matrix.shape))
+
+ rmat_t = torch.transpose(rotation_matrix, 1, 2)
+
+ mask_d2 = rmat_t[:, 2, 2] < eps
+
+ mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
+ mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
+
+ t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
+ q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
+ t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
+ t0_rep = t0.repeat(4, 1).t()
+
+ t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
+ q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
+ rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
+ t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
+ t1_rep = t1.repeat(4, 1).t()
+
+ t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
+ q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
+ rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
+ rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
+ t2_rep = t2.repeat(4, 1).t()
+
+ t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
+ q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
+ rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
+ rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
+ t3_rep = t3.repeat(4, 1).t()
+
+ mask_c0 = mask_d2 * mask_d0_d1
+ # mask_c1 = mask_d2 * (1 - mask_d0_d1)
+ mask_c1 = mask_d2 * ~mask_d0_d1
+ # mask_c2 = (1 - mask_d2) * mask_d0_nd1
+ mask_c2 = ~mask_d2 * mask_d0_nd1
+ # mask_c3 = (1 - mask_d2) * (1 - mask_d0_nd1)
+ mask_c3 = ~mask_d2 * ~mask_d0_nd1
+ mask_c0 = mask_c0.view(-1, 1).type_as(q0)
+ mask_c1 = mask_c1.view(-1, 1).type_as(q1)
+ mask_c2 = mask_c2.view(-1, 1).type_as(q2)
+ mask_c3 = mask_c3.view(-1, 1).type_as(q3)
+
+ q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
+ q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa
+ t2_rep * mask_c2 + t3_rep * mask_c3) # noqa
+ q *= 0.5
+ return q
+
+
+def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
+ """Convert quaternion vector to angle axis of rotation.
+
+ Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
+
+ Args:
+ quaternion (torch.Tensor): tensor with quaternions.
+
+ Return:
+ torch.Tensor: tensor with angle axis of rotation.
+
+ Shape:
+ - Input: :math:`(*, 4)` where `*` means, any number of dimensions
+ - Output: :math:`(*, 3)`
+
+ Example:
+ >>> quaternion = torch.rand(2, 4) # Nx4
+ >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3
+ """
+ if not torch.is_tensor(quaternion):
+ raise TypeError("Input type is not a torch.Tensor. Got {}".format(
+ type(quaternion)))
+
+ if not quaternion.shape[-1] == 4:
+ raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
+ .format(quaternion.shape))
+ # unpack input and compute conversion
+ q1: torch.Tensor = quaternion[..., 1]
+ q2: torch.Tensor = quaternion[..., 2]
+ q3: torch.Tensor = quaternion[..., 3]
+ sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3
+
+ sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
+ cos_theta: torch.Tensor = quaternion[..., 0]
+ two_theta: torch.Tensor = 2.0 * torch.where(
+ cos_theta < 0.0,
+ torch.atan2(-sin_theta, -cos_theta),
+ torch.atan2(sin_theta, cos_theta))
+
+ k_pos: torch.Tensor = two_theta / sin_theta
+ k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
+ k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
+
+ angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
+ angle_axis[..., 0] += q1 * k
+ angle_axis[..., 1] += q2 * k
+ angle_axis[..., 2] += q3 * k
+ return angle_axis
+
+
+def avg_rot(rot):
+ # input [B,...,3,3] --> output [...,3,3]
+ rot = rot.mean(dim=0)
+ U, _, V = torch.svd(rot)
+ rot = U @ V.transpose(-1, -2)
+ return rot
\ No newline at end of file
diff --git a/lib/utils/utils.py b/lib/utils/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..36679d136065c2c14a91bfbf242825284e88892f
--- /dev/null
+++ b/lib/utils/utils.py
@@ -0,0 +1,265 @@
+# -*- coding: utf-8 -*-
+
+# Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
+# holder of all proprietary rights on this computer program.
+# You can only use this computer program if you have closed
+# a license agreement with MPG or you get the right to use the computer
+# program from someone who is authorized to grant you that right.
+# Any use of the computer program without a valid license is prohibited and
+# liable to prosecution.
+#
+# Copyright©2019 Max-Planck-Gesellschaft zur Förderung
+# der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
+# for Intelligent Systems. All rights reserved.
+#
+# Contact: ps-license@tuebingen.mpg.de
+
+import os
+import yaml
+import torch
+import shutil
+import logging
+import operator
+from tqdm import tqdm
+from os import path as osp
+from functools import reduce
+from typing import List, Union
+from collections import OrderedDict
+from torch.optim.lr_scheduler import _LRScheduler
+
+class CustomScheduler(_LRScheduler):
+ def __init__(self, optimizer, lr_lambda):
+ self.lr_lambda = lr_lambda
+ super(CustomScheduler, self).__init__(optimizer)
+
+ def get_lr(self):
+ return [base_lr * self.lr_lambda(self.last_epoch)
+ for base_lr in self.base_lrs]
+
+def lr_decay_fn(epoch):
+ if epoch == 0: return 1.0
+ if epoch % big_epoch == 0:
+ return big_decay
+ else:
+ return small_decay
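+
+# Usage sketch (illustrative; big_epoch / big_decay / small_decay above are assumed to be
+# defined by the caller, e.g. as module-level constants). A self-contained variant binds
+# assumed constants explicitly and plugs into CustomScheduler (optimizer is any existing
+# torch.optim optimizer instance):
+#   >>> decay = lambda epoch: 1.0 if epoch == 0 else (0.95 if epoch % 30 == 0 else 0.99)
+#   >>> scheduler = CustomScheduler(optimizer, lr_lambda=decay)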
+
+def save_obj(v, f, file_name='output.obj'):
+ obj_file = open(file_name, 'w')
+ for i in range(len(v)):
+ obj_file.write('v ' + str(v[i][0]) + ' ' + str(v[i][1]) + ' ' + str(v[i][2]) + '\n')
+ for i in range(len(f)):
+ obj_file.write('f ' + str(f[i][0]+1) + '/' + str(f[i][0]+1) + ' ' + str(f[i][1]+1) + '/' + str(f[i][1]+1) + ' ' + str(f[i][2]+1) + '/' + str(f[i][2]+1) + '\n')
+ obj_file.close()
+
+
+def check_data_pararell(train_weight):
+ new_state_dict = OrderedDict()
+ for k, v in train_weight.items():
+ name = k[7:] if k.startswith('module.') else k # remove the `module.` prefix added by nn.DataParallel
+ new_state_dict[name] = v
+ return new_state_dict
+
+
+def get_from_dict(dict, keys):
+ return reduce(operator.getitem, keys, dict)
+
+
+def tqdm_enumerate(iter):
+ i = 0
+ for y in tqdm(iter):
+ yield i, y
+ i += 1
+
+
+def iterdict(d):
+ for k,v in d.items():
+ if isinstance(v, dict):
+ d[k] = dict(v)
+ iterdict(v)
+ return d
+
+
+def accuracy(output, target):
+ _, pred = output.topk(1)
+ pred = pred.view(-1)
+
+ correct = pred.eq(target).sum()
+
+ return correct.item(), target.size(0) - correct.item()
+
+
+def lr_decay(optimizer, step, lr, decay_step, gamma):
+ lr = lr * gamma ** (step/decay_step)
+ for param_group in optimizer.param_groups:
+ param_group['lr'] = lr
+ return lr
+
+
+def step_decay(optimizer, step, lr, decay_step, gamma):
+ lr = lr * gamma ** (step / decay_step)
+ for param_group in optimizer.param_groups:
+ param_group['lr'] = lr
+ return lr
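+
+# Note: lr_decay and step_decay implement the same exponential schedule,
+# lr_t = lr0 * gamma ** (t / decay_step). For example, with lr0 = 1e-3, gamma = 0.1 and
+# decay_step = 1000, the learning rate becomes 1e-4 at step 1000 and 1e-5 at step 2000.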
+
+
+def read_yaml(filename):
+    # yaml.load without an explicit Loader is deprecated (and an error in PyYAML >= 6)
+    with open(filename, 'r') as f:
+        return yaml.load(f, Loader=yaml.FullLoader)
+
+
+def write_yaml(filename, object):
+ with open(filename, 'w') as f:
+ yaml.dump(object, f)
+
+
+def save_dict_to_yaml(obj, filename, mode='w'):
+ with open(filename, mode) as f:
+ yaml.dump(obj, f, default_flow_style=False)
+
+
+def save_to_file(obj, filename, mode='w'):
+ with open(filename, mode) as f:
+ f.write(obj)
+
+
+def concatenate_dicts(dict_list, dim=0):
+ rdict = dict.fromkeys(dict_list[0].keys())
+ for k in rdict.keys():
+ rdict[k] = torch.cat([d[k] for d in dict_list], dim=dim)
+ return rdict
+
+
+def bool_to_string(x: Union[List[bool],bool]) -> Union[List[str],str]:
+ """
+ boolean to string conversion
+ :param x: list or bool to be converted
+ :return: string converted thing
+ """
+ if isinstance(x, bool):
+ return [str(x)]
+ for i, j in enumerate(x):
+ x[i]=str(j)
+ return x
+
+
+def checkpoint2model(checkpoint, key='gen_state_dict'):
+ state_dict = checkpoint[key]
+ print(f'Performance of loaded model on 3DPW is {checkpoint["performance"]:.2f}mm')
+ # del state_dict['regressor.mean_theta']
+ return state_dict
+
+
+def get_optimizer(cfg, model, optim_type, momentum, stage):
+ if stage == 'stage2':
+ param_list = [{'params': model.integrator.parameters()}]
+ for name, param in model.named_parameters():
+ # if 'integrator' not in name and 'motion_encoder' not in name and 'trajectory_decoder' not in name:
+ if 'integrator' not in name:
+ param_list.append({'params': param, 'lr': cfg.TRAIN.LR_FINETUNE})
+ else:
+ param_list = [{'params': model.parameters()}]
+
+ if optim_type in ['sgd', 'SGD']:
+ opt = torch.optim.SGD(lr=cfg.TRAIN.LR, params=param_list, momentum=momentum)
+ elif optim_type in ['Adam', 'adam', 'ADAM']:
+ opt = torch.optim.Adam(lr=cfg.TRAIN.LR, params=param_list, weight_decay=cfg.TRAIN.WD, betas=(0.9, 0.999))
+    else:
+        raise ValueError(f'Unsupported optimizer type: {optim_type}')
+
+ return opt
+
+
+def create_logger(logdir, phase='train'):
+ os.makedirs(logdir, exist_ok=True)
+
+ log_file = osp.join(logdir, f'{phase}_log.txt')
+
+ head = '%(asctime)-15s %(message)s'
+ logging.basicConfig(filename=log_file,
+ format=head)
+ logger = logging.getLogger()
+ logger.setLevel(logging.INFO)
+ console = logging.StreamHandler()
+ logging.getLogger('').addHandler(console)
+
+ return logger
+
+
+class AverageMeter(object):
+ def __init__(self):
+ self.val = 0
+ self.avg = 0
+ self.sum = 0
+ self.count = 0
+
+ def update(self, val, n=1):
+ self.val = val
+ self.sum += val * n
+ self.count += n
+ self.avg = self.sum / self.count
+
+
+def prepare_output_dir(cfg, cfg_file):
+
+ # ==== create logdir
+ logdir = osp.join(cfg.OUTPUT_DIR, cfg.EXP_NAME)
+ os.makedirs(logdir, exist_ok=True)
+ shutil.copy(src=cfg_file, dst=osp.join(cfg.OUTPUT_DIR, 'config.yaml'))
+
+ cfg.LOGDIR = logdir
+
+ # save config
+ save_dict_to_yaml(cfg, osp.join(cfg.LOGDIR, 'config.yaml'))
+
+ return cfg
+
+
+def prepare_groundtruth(batch, device):
+ groundtruths = dict()
+ gt_keys = ['pose', 'cam', 'betas', 'kp3d', 'bbox'] # Evaluation
+ gt_keys += ['pose_root', 'vel_root', 'weak_kp2d', 'verts', # Training
+ 'full_kp2d', 'contact', 'R', 'cam_angvel',
+ 'has_smpl', 'has_traj', 'has_full_screen', 'has_verts']
+ for gt_key in gt_keys:
+ if gt_key in batch.keys():
+ dtype = torch.float32 if batch[gt_key].dtype == torch.float64 else batch[gt_key].dtype
+ groundtruths[gt_key] = batch[gt_key].to(dtype=dtype, device=device)
+
+ return groundtruths
+
+def prepare_auxiliary(batch, device):
+ aux = dict()
+ aux_keys = ['mask', 'bbox', 'res', 'cam_intrinsics', 'init_root', 'cam_angvel']
+ for key in aux_keys:
+ if key in batch.keys():
+ dtype = torch.float32 if batch[key].dtype == torch.float64 else batch[key].dtype
+ aux[key] = batch[key].to(dtype=dtype, device=device)
+
+ return aux
+
+def prepare_input(batch, device, use_features):
+ # Input keypoints data
+ kp2d = batch['kp2d'].to(device).float()
+
+ # Input features
+ if use_features and 'features' in batch.keys():
+ features = batch['features'].to(device).float()
+ else:
+ features = None
+
+ # Initial SMPL parameters
+ init_smpl = batch['init_pose'].to(device).float()
+
+ # Initial keypoints
+ init_kp = torch.cat((
+ batch['init_kp3d'], batch['init_kp2d']
+ ), dim=-1).to(device).float()
+
+ return kp2d, (init_kp, init_smpl), features
+
+
+def prepare_batch(batch, device, use_features=True):
+ x, inits, features = prepare_input(batch, device, use_features)
+ aux = prepare_auxiliary(batch, device)
+ groundtruths = prepare_groundtruth(batch, device)
+
+ return x, inits, features, aux, groundtruths
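+
+
+# Typical call pattern (illustrative sketch; `dataloader` is a hypothetical name, not
+# defined in this module):
+#
+#   for batch in dataloader:
+#       x, inits, features, aux, gts = prepare_batch(batch, device, use_features=True)
+#       # x: 2D keypoints, inits: (initial 3D/2D keypoints, initial SMPL params),
+#       # aux: camera/bbox metadata, gts: ground-truth tensors present in the batch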
\ No newline at end of file
diff --git a/lib/vis/__pycache__/renderer.cpython-39.pyc b/lib/vis/__pycache__/renderer.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..efcbde1428aa6e99af3ad25474a154ea0358b64b
Binary files /dev/null and b/lib/vis/__pycache__/renderer.cpython-39.pyc differ
diff --git a/lib/vis/__pycache__/run_vis.cpython-39.pyc b/lib/vis/__pycache__/run_vis.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..28210c497b31cf312b0baf27d3608fcb3374fa8b
Binary files /dev/null and b/lib/vis/__pycache__/run_vis.cpython-39.pyc differ
diff --git a/lib/vis/__pycache__/tools.cpython-39.pyc b/lib/vis/__pycache__/tools.cpython-39.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..787508d20d954cf556be085974608fb11a103312
Binary files /dev/null and b/lib/vis/__pycache__/tools.cpython-39.pyc differ
diff --git a/lib/vis/renderer.py b/lib/vis/renderer.py
new file mode 100644
index 0000000000000000000000000000000000000000..1f4dc6e4e78f99c96272c38e2035bd4af9c645a3
--- /dev/null
+++ b/lib/vis/renderer.py
@@ -0,0 +1,313 @@
+import cv2
+import torch
+import numpy as np
+
+from pytorch3d.renderer import (
+ PerspectiveCameras,
+ TexturesVertex,
+ PointLights,
+ Materials,
+ RasterizationSettings,
+ MeshRenderer,
+ MeshRasterizer,
+ SoftPhongShader,
+)
+from pytorch3d.structures import Meshes
+from pytorch3d.structures.meshes import join_meshes_as_scene
+from pytorch3d.renderer.cameras import look_at_rotation
+
+from .tools import get_colors, checkerboard_geometry
+
+
+def overlay_image_onto_background(image, mask, bbox, background):
+ if isinstance(image, torch.Tensor):
+ image = image.detach().cpu().numpy()
+ if isinstance(mask, torch.Tensor):
+ mask = mask.detach().cpu().numpy()
+
+ out_image = background.copy()
+ bbox = bbox[0].int().cpu().numpy().copy()
+ roi_image = out_image[bbox[1]:bbox[3], bbox[0]:bbox[2]]
+
+ roi_image[mask] = image[mask]
+ out_image[bbox[1]:bbox[3], bbox[0]:bbox[2]] = roi_image
+
+ return out_image
+
+
+def update_intrinsics_from_bbox(K_org, bbox):
+ device, dtype = K_org.device, K_org.dtype
+
+ K = torch.zeros((K_org.shape[0], 4, 4)
+ ).to(device=device, dtype=dtype)
+ K[:, :3, :3] = K_org.clone()
+ K[:, 2, 2] = 0
+ K[:, 2, -1] = 1
+ K[:, -1, 2] = 1
+
+ image_sizes = []
+    for idx, box in enumerate(bbox):
+        left, upper, right, lower = box
+ cx, cy = K[idx, 0, 2], K[idx, 1, 2]
+
+ new_cx = cx - left
+ new_cy = cy - upper
+ new_height = max(lower - upper, 1)
+ new_width = max(right - left, 1)
+ new_cx = new_width - new_cx
+ new_cy = new_height - new_cy
+
+ K[idx, 0, 2] = new_cx
+ K[idx, 1, 2] = new_cy
+ image_sizes.append((int(new_height), int(new_width)))
+
+ return K, image_sizes
+
+
+def perspective_projection(x3d, K, R=None, T=None):
+    if R is not None:
+        x3d = torch.matmul(R, x3d.transpose(1, 2)).transpose(1, 2)
+    if T is not None:
+        x3d = x3d + T.transpose(1, 2)
+
+ x2d = torch.div(x3d, x3d[..., 2:])
+ x2d = torch.matmul(K, x2d.transpose(-1, -2)).transpose(-1, -2)[..., :2]
+ return x2d
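+
+# Pinhole model used above (for reference): points are first mapped into the camera
+# frame, X_cam = R @ X + T, then divided by their depth (torch.div yields [x/z, y/z, 1]),
+# and finally multiplied by the intrinsics K, keeping the first two coordinates as
+# pixel locations.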
+
+
+def compute_bbox_from_points(X, img_w, img_h, scaleFactor=1.2):
+ left = torch.clamp(X.min(1)[0][:, 0], min=0, max=img_w)
+ right = torch.clamp(X.max(1)[0][:, 0], min=0, max=img_w)
+ top = torch.clamp(X.min(1)[0][:, 1], min=0, max=img_h)
+ bottom = torch.clamp(X.max(1)[0][:, 1], min=0, max=img_h)
+
+ cx = (left + right) / 2
+ cy = (top + bottom) / 2
+ width = (right - left)
+ height = (bottom - top)
+
+ new_left = torch.clamp(cx - width/2 * scaleFactor, min=0, max=img_w-1)
+ new_right = torch.clamp(cx + width/2 * scaleFactor, min=1, max=img_w)
+ new_top = torch.clamp(cy - height / 2 * scaleFactor, min=0, max=img_h-1)
+ new_bottom = torch.clamp(cy + height / 2 * scaleFactor, min=1, max=img_h)
+
+ bbox = torch.stack((new_left.detach(), new_top.detach(),
+ new_right.detach(), new_bottom.detach())).int().float().T
+
+ return bbox
+
+
+class Renderer():
+ def __init__(self, width, height, focal_length, device, faces=None):
+
+ self.width = width
+ self.height = height
+ self.focal_length = focal_length
+
+ self.device = device
+ if faces is not None:
+ self.faces = torch.from_numpy(
+ (faces).astype('int')
+ ).unsqueeze(0).to(self.device)
+
+ self.initialize_camera_params()
+ self.lights = PointLights(device=device, location=[[0.0, 0.0, -10.0]])
+ self.create_renderer()
+
+ def create_renderer(self):
+ self.renderer = MeshRenderer(
+ rasterizer=MeshRasterizer(
+ raster_settings=RasterizationSettings(
+ image_size=self.image_sizes[0],
+ blur_radius=1e-5),
+ ),
+ shader=SoftPhongShader(
+ device=self.device,
+ lights=self.lights,
+ )
+ )
+
+ def create_camera(self, R=None, T=None):
+ if R is not None:
+ self.R = R.clone().view(1, 3, 3).to(self.device)
+ if T is not None:
+ self.T = T.clone().view(1, 3).to(self.device)
+
+ return PerspectiveCameras(
+ device=self.device,
+ R=self.R.mT,
+ T=self.T,
+ K=self.K_full,
+ image_size=self.image_sizes,
+ in_ndc=False)
+
+
+ def initialize_camera_params(self):
+ """Hard coding for camera parameters
+ TODO: Do some soft coding"""
+
+ # Extrinsics
+ self.R = torch.diag(
+ torch.tensor([1, 1, 1])
+ ).float().to(self.device).unsqueeze(0)
+
+ self.T = torch.tensor(
+ [0, 0, 0]
+ ).unsqueeze(0).float().to(self.device)
+
+ # Intrinsics
+ self.K = torch.tensor(
+ [[self.focal_length, 0, self.width/2],
+ [0, self.focal_length, self.height/2],
+ [0, 0, 1]]
+ ).unsqueeze(0).float().to(self.device)
+ self.bboxes = torch.tensor([[0, 0, self.width, self.height]]).float()
+ self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, self.bboxes)
+ self.cameras = self.create_camera()
+
+
+ def set_ground(self, length, center_x, center_z):
+ device = self.device
+ v, f, vc, fc = map(torch.from_numpy, checkerboard_geometry(length=length, c1=center_x, c2=center_z, up="y"))
+ v, f, vc = v.to(device), f.to(device), vc.to(device)
+ self.ground_geometry = [v, f, vc]
+
+
+ def update_bbox(self, x3d, scale=2.0, mask=None):
+ """ Update bbox of cameras from the given 3d points
+
+ x3d: input 3D keypoints (or vertices), (num_frames, num_points, 3)
+ """
+
+ if x3d.size(-1) != 3:
+ x2d = x3d.unsqueeze(0)
+ else:
+ x2d = perspective_projection(x3d.unsqueeze(0), self.K, self.R, self.T.reshape(1, 3, 1))
+
+ if mask is not None:
+ x2d = x2d[:, ~mask]
+
+ bbox = compute_bbox_from_points(x2d, self.width, self.height, scale)
+ self.bboxes = bbox
+
+ self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, bbox)
+ self.cameras = self.create_camera()
+ self.create_renderer()
+
+ def reset_bbox(self,):
+ bbox = torch.zeros((1, 4)).float().to(self.device)
+ bbox[0, 2] = self.width
+ bbox[0, 3] = self.height
+ self.bboxes = bbox
+
+ self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, bbox)
+ self.cameras = self.create_camera()
+ self.create_renderer()
+
+ def render_mesh(self, vertices, background, colors=[0.8, 0.8, 0.8]):
+ self.update_bbox(vertices[::50], scale=1.2)
+ vertices = vertices.unsqueeze(0)
+
+ if colors[0] > 1: colors = [c / 255. for c in colors]
+ verts_features = torch.tensor(colors).reshape(1, 1, 3).to(device=vertices.device, dtype=vertices.dtype)
+ verts_features = verts_features.repeat(1, vertices.shape[1], 1)
+ textures = TexturesVertex(verts_features=verts_features)
+
+ mesh = Meshes(verts=vertices,
+ faces=self.faces,
+ textures=textures,)
+
+ materials = Materials(
+ device=self.device,
+ specular_color=(colors, ),
+ shininess=0
+ )
+
+ results = torch.flip(
+ self.renderer(mesh, materials=materials, cameras=self.cameras, lights=self.lights),
+ [1, 2]
+ )
+ image = results[0, ..., :3] * 255
+ mask = results[0, ..., -1] > 1e-3
+
+ image = overlay_image_onto_background(image, mask, self.bboxes, background.copy())
+ self.reset_bbox()
+ return image
+
+
+ def render_with_ground(self, verts, faces, colors, cameras, lights):
+ """
+ :param verts (B, V, 3)
+ :param faces (F, 3)
+ :param colors (B, 3)
+ """
+
+ # (B, V, 3), (B, F, 3), (B, V, 3)
+ verts, faces, colors = prep_shared_geometry(verts, faces, colors)
+ # (V, 3), (F, 3), (V, 3)
+ gv, gf, gc = self.ground_geometry
+ verts = list(torch.unbind(verts, dim=0)) + [gv]
+ faces = list(torch.unbind(faces, dim=0)) + [gf]
+ colors = list(torch.unbind(colors, dim=0)) + [gc[..., :3]]
+ mesh = create_meshes(verts, faces, colors)
+
+ materials = Materials(
+ device=self.device,
+ shininess=0
+ )
+
+ results = self.renderer(mesh, cameras=cameras, lights=lights, materials=materials)
+ image = (results[0, ..., :3].cpu().numpy() * 255).astype(np.uint8)
+
+ return image
+
+
+def prep_shared_geometry(verts, faces, colors):
+ """
+ :param verts (B, V, 3)
+ :param faces (F, 3)
+ :param colors (B, 4)
+ """
+ B, V, _ = verts.shape
+ F, _ = faces.shape
+ colors = colors.unsqueeze(1).expand(B, V, -1)[..., :3]
+ faces = faces.unsqueeze(0).expand(B, F, -1)
+ return verts, faces, colors
+
+
+def create_meshes(verts, faces, colors):
+ """
+ :param verts (B, V, 3)
+ :param faces (B, F, 3)
+ :param colors (B, V, 3)
+ """
+ textures = TexturesVertex(verts_features=colors)
+ meshes = Meshes(verts=verts, faces=faces, textures=textures)
+ return join_meshes_as_scene(meshes)
+
+
+def get_global_cameras(verts, device, distance=5, position=(-5.0, 5.0, 0.0)):
+ positions = torch.tensor([position]).repeat(len(verts), 1)
+ targets = verts.mean(1)
+
+ directions = targets - positions
+ directions = directions / torch.norm(directions, dim=-1).unsqueeze(-1) * distance
+ positions = targets - directions
+
+ rotation = look_at_rotation(positions, targets, ).mT
+ translation = -(rotation @ positions.unsqueeze(-1)).squeeze(-1)
+
+ lights = PointLights(device=device, location=[position])
+ return rotation, translation, lights
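+
+# Note on the convention above (as constructed, not an external spec): the camera sits
+# `distance` units away from the per-frame mean vertex, offset toward `position`, and
+# translation = -rotation @ camera_position, so the pair (rotation, translation) maps
+# world points into the camera frame as R @ X + t.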
+
+
+def _get_global_cameras(verts, device, min_distance=3, chunk_size=100):
+
+ # split into smaller chunks to visualize
+ start_idxs = list(range(0, len(verts), chunk_size))
+ end_idxs = [min(start_idx + chunk_size, len(verts)) for start_idx in start_idxs]
+
+ Rs, Ts = [], []
+ for start_idx, end_idx in zip(start_idxs, end_idxs):
+ vert = verts[start_idx:end_idx].clone()
+ import pdb; pdb.set_trace()
\ No newline at end of file
diff --git a/lib/vis/run_vis.py b/lib/vis/run_vis.py
new file mode 100644
index 0000000000000000000000000000000000000000..d940a624ab5109b43fb732292f99b75c5461e684
--- /dev/null
+++ b/lib/vis/run_vis.py
@@ -0,0 +1,92 @@
+import os
+import os.path as osp
+
+import cv2
+import torch
+import imageio
+import numpy as np
+from progress.bar import Bar
+
+from lib.vis.renderer import Renderer, get_global_cameras
+
+def run_vis_on_demo(cfg, video, results, output_pth, smpl, vis_global=True):
+ # to torch tensor
+ tt = lambda x: torch.from_numpy(x).float().to(cfg.DEVICE)
+
+ cap = cv2.VideoCapture(video)
+ fps = cap.get(cv2.CAP_PROP_FPS)
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+ width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
+
+    # create renderer with the CLIFF focal length estimate
+ focal_length = (width ** 2 + height ** 2) ** 0.5
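+    # (the image diagonal, sqrt(W^2 + H^2), is the focal-length heuristic CLIFF uses when
+    # the true camera intrinsics are unknown)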
+ renderer = Renderer(width, height, focal_length, cfg.DEVICE, smpl.faces)
+
+ if vis_global:
+        # set up the subject shown in global coordinates
+        # the current implementation only visualizes the subject that appears in the most frames
+ n_frames = {k: len(results[k]['frame_ids']) for k in results.keys()}
+ sid = max(n_frames, key=n_frames.get)
+ global_output = smpl.get_output(
+ body_pose=tt(results[sid]['pose_world'][:, 3:]),
+ global_orient=tt(results[sid]['pose_world'][:, :3]),
+ betas=tt(results[sid]['betas']),
+ transl=tt(results[sid]['trans_world']))
+ verts_glob = global_output.vertices.cpu()
+ verts_glob[..., 1] = verts_glob[..., 1] - verts_glob[..., 1].min()
+ cx, cz = (verts_glob.mean(1).max(0)[0] + verts_glob.mean(1).min(0)[0])[[0, 2]] / 2.0
+ sx, sz = (verts_glob.mean(1).max(0)[0] - verts_glob.mean(1).min(0)[0])[[0, 2]]
+ scale = max(sx.item(), sz.item()) * 1.5
+
+ # set default ground
+ renderer.set_ground(scale, cx.item(), cz.item())
+
+ # build global camera
+ global_R, global_T, global_lights = get_global_cameras(verts_glob, cfg.DEVICE)
+
+ # build default camera
+ default_R, default_T = torch.eye(3), torch.zeros(3)
+
+ writer = imageio.get_writer(
+ osp.join(output_pth, 'output.mp4'),
+ fps=fps, mode='I', format='FFMPEG', macro_block_size=1
+ )
+ bar = Bar('Rendering results ...', fill='#', max=length)
+
+ frame_i = 0
+ _global_R, _global_T = None, None
+ # run rendering
+ while (cap.isOpened()):
+ flag, org_img = cap.read()
+ if not flag: break
+ img = org_img[..., ::-1].copy()
+
+ # render onto the input video
+ renderer.create_camera(default_R, default_T)
+ for _id, val in results.items():
+ # render onto the image
+ frame_i2 = np.where(val['frame_ids'] == frame_i)[0]
+ if len(frame_i2) == 0: continue
+ frame_i2 = frame_i2[0]
+ img = renderer.render_mesh(torch.from_numpy(val['verts'][frame_i2]).to(cfg.DEVICE), img)
+
+ if vis_global:
+ # render the global coordinate
+ if frame_i in results[sid]['frame_ids']:
+ frame_i3 = np.where(results[sid]['frame_ids'] == frame_i)[0]
+ verts = verts_glob[[frame_i3]].to(cfg.DEVICE)
+ faces = renderer.faces.clone().squeeze(0)
+ colors = torch.ones((1, 4)).float().to(cfg.DEVICE); colors[..., :3] *= 0.9
+
+ if _global_R is None:
+ _global_R = global_R[frame_i3].clone(); _global_T = global_T[frame_i3].clone()
+ cameras = renderer.create_camera(global_R[frame_i3], global_T[frame_i3])
+ img_glob = renderer.render_with_ground(verts, faces, colors, cameras, global_lights)
+
+                try: img = np.concatenate((img, img_glob), axis=1)
+                except Exception: img = np.concatenate((img, np.ones_like(img) * 255), axis=1)
+
+ writer.append_data(img)
+ bar.next()
+ frame_i += 1
+ writer.close()
\ No newline at end of file
diff --git a/lib/vis/tools.py b/lib/vis/tools.py
new file mode 100644
index 0000000000000000000000000000000000000000..503e4377bcaa18d4d541381963fdd4458b81a7c1
--- /dev/null
+++ b/lib/vis/tools.py
@@ -0,0 +1,822 @@
+import os
+import math
+import cv2
+import numpy as np
+import torch
+from PIL import Image
+
+
+def read_image(path, scale=1):
+ im = Image.open(path)
+ if scale == 1:
+ return np.array(im)
+ W, H = im.size
+ w, h = int(scale * W), int(scale * H)
+    return np.array(im.resize((w, h), Image.LANCZOS))  # Image.ANTIALIAS was removed in Pillow 10
+
+
+def transform_torch3d(T_c2w):
+ """
+ :param T_c2w (*, 4, 4)
+ returns (*, 3, 3), (*, 3)
+ """
+ R1 = torch.tensor(
+ [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0],], device=T_c2w.device,
+ )
+ R2 = torch.tensor(
+ [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],], device=T_c2w.device,
+ )
+ cam_R, cam_t = T_c2w[..., :3, :3], T_c2w[..., :3, 3]
+ cam_R = torch.einsum("...ij,jk->...ik", cam_R, R1)
+ cam_t = torch.einsum("ij,...j->...i", R2, cam_t)
+ return cam_R, cam_t
+
+
+def transform_pyrender(T_c2w):
+ """
+ :param T_c2w (*, 4, 4)
+ """
+ T_vis = torch.tensor(
+ [
+ [1.0, 0.0, 0.0, 0.0],
+ [0.0, -1.0, 0.0, 0.0],
+ [0.0, 0.0, -1.0, 0.0],
+ [0.0, 0.0, 0.0, 1.0],
+ ],
+ device=T_c2w.device,
+ )
+ return torch.einsum(
+ "...ij,jk->...ik", torch.einsum("ij,...jk->...ik", T_vis, T_c2w), T_vis
+ )
+
+
+def smpl_to_geometry(verts, faces, vis_mask=None, track_ids=None):
+ """
+ :param verts (B, T, V, 3)
+ :param faces (F, 3)
+ :param vis_mask (optional) (B, T) visibility of each person
+ :param track_ids (optional) (B,)
+ returns list of T verts (B, V, 3), faces (F, 3), colors (B, 3)
+ where B is different depending on the visibility of the people
+ """
+ B, T = verts.shape[:2]
+ device = verts.device
+
+ # (B, 3)
+ colors = (
+ track_to_colors(track_ids)
+ if track_ids is not None
+        else torch.ones(B, 3, device=device) * 0.5
+ )
+
+ # list T (B, V, 3), T (B, 3), T (F, 3)
+ return filter_visible_meshes(verts, colors, faces, vis_mask)
+
+
+def filter_visible_meshes(verts, colors, faces, vis_mask=None, vis_opacity=False):
+ """
+ :param verts (B, T, V, 3)
+ :param colors (B, 3)
+ :param faces (F, 3)
+ :param vis_mask (optional tensor, default None) (B, T) ternary mask
+ -1 if not in frame
+ 0 if temporarily occluded
+ 1 if visible
+ :param vis_opacity (optional bool, default False)
+ if True, make occluded people alpha=0.5, otherwise alpha=1
+ returns a list of T lists verts (Bi, V, 3), colors (Bi, 4), faces (F, 3)
+ """
+ # import ipdb; ipdb.set_trace()
+ B, T = verts.shape[:2]
+ faces = [faces for t in range(T)]
+ if vis_mask is None:
+ verts = [verts[:, t] for t in range(T)]
+ colors = [colors for t in range(T)]
+ return verts, colors, faces
+
+ # render occluded and visible, but not removed
+ vis_mask = vis_mask >= 0
+ if vis_opacity:
+ alpha = 0.5 * (vis_mask[..., None] + 1)
+ else:
+ alpha = (vis_mask[..., None] >= 0).float()
+ vert_list = [verts[vis_mask[:, t], t] for t in range(T)]
+ colors = [
+ torch.cat([colors[vis_mask[:, t]], alpha[vis_mask[:, t], t]], dim=-1)
+ for t in range(T)
+ ]
+ bounds = get_bboxes(verts, vis_mask)
+ return vert_list, colors, faces, bounds
+
+
+def get_bboxes(verts, vis_mask):
+ """
+ return bb_min, bb_max, and mean for each track (B, 3) over entire trajectory
+ :param verts (B, T, V, 3)
+ :param vis_mask (B, T)
+ """
+ B, T, *_ = verts.shape
+ bb_min, bb_max, mean = [], [], []
+ for b in range(B):
+ v = verts[b, vis_mask[b, :T]] # (Tb, V, 3)
+ bb_min.append(v.amin(dim=(0, 1)))
+ bb_max.append(v.amax(dim=(0, 1)))
+ mean.append(v.mean(dim=(0, 1)))
+ bb_min = torch.stack(bb_min, dim=0)
+ bb_max = torch.stack(bb_max, dim=0)
+ mean = torch.stack(mean, dim=0)
+ # point to a track that's long and close to the camera
+ zs = mean[:, 2]
+ counts = vis_mask[:, :T].sum(dim=-1) # (B,)
+ mask = counts < 0.8 * T
+ zs[mask] = torch.inf
+ sel = torch.argmin(zs)
+ return bb_min.amin(dim=0), bb_max.amax(dim=0), mean[sel]
+
+
+def track_to_colors(track_ids):
+ """
+ :param track_ids (B)
+ """
+ color_map = torch.from_numpy(get_colors()).to(track_ids)
+ return color_map[track_ids] / 255 # (B, 3)
+
+
+def get_colors():
+ # color_file = os.path.abspath(os.path.join(__file__, "../colors_phalp.txt"))
+ color_file = os.path.abspath(os.path.join(__file__, "../colors.txt"))
+ RGB_tuples = np.vstack(
+ [
+ np.loadtxt(color_file, skiprows=0),
+ # np.loadtxt(color_file, skiprows=1),
+ np.random.uniform(0, 255, size=(10000, 3)),
+ [[0, 0, 0]],
+ ]
+ )
+ b = np.where(RGB_tuples == 0)
+ RGB_tuples[b] = 1
+ return RGB_tuples.astype(np.float32)
+
+
+def checkerboard_geometry(
+ length=12.0,
+ color0=[0.8, 0.9, 0.9],
+ color1=[0.6, 0.7, 0.7],
+ tile_width=0.5,
+ alpha=1.0,
+ up="y",
+ c1=0.0,
+ c2=0.0,
+):
+ assert up == "y" or up == "z"
+ color0 = np.array(color0 + [alpha])
+ color1 = np.array(color1 + [alpha])
+ radius = length / 2.0
+ num_rows = num_cols = max(2, int(length / tile_width))
+ vertices = []
+ vert_colors = []
+ faces = []
+ face_colors = []
+ for i in range(num_rows):
+ for j in range(num_cols):
+ u0, v0 = j * tile_width - radius, i * tile_width - radius
+ us = np.array([u0, u0, u0 + tile_width, u0 + tile_width])
+ vs = np.array([v0, v0 + tile_width, v0 + tile_width, v0])
+ zs = np.zeros(4)
+ if up == "y":
+ cur_verts = np.stack([us, zs, vs], axis=-1) # (4, 3)
+ cur_verts[:, 0] += c1
+ cur_verts[:, 2] += c2
+ else:
+ cur_verts = np.stack([us, vs, zs], axis=-1) # (4, 3)
+ cur_verts[:, 0] += c1
+ cur_verts[:, 1] += c2
+
+ cur_faces = np.array(
+ [[0, 1, 3], [1, 2, 3], [0, 3, 1], [1, 3, 2]], dtype=np.int64
+ )
+ cur_faces += 4 * (i * num_cols + j) # the number of previously added verts
+ use_color0 = (i % 2 == 0 and j % 2 == 0) or (i % 2 == 1 and j % 2 == 1)
+ cur_color = color0 if use_color0 else color1
+ cur_colors = np.array([cur_color, cur_color, cur_color, cur_color])
+
+ vertices.append(cur_verts)
+ faces.append(cur_faces)
+ vert_colors.append(cur_colors)
+ face_colors.append(cur_colors)
+
+ vertices = np.concatenate(vertices, axis=0).astype(np.float32)
+ vert_colors = np.concatenate(vert_colors, axis=0).astype(np.float32)
+ faces = np.concatenate(faces, axis=0).astype(np.float32)
+ face_colors = np.concatenate(face_colors, axis=0).astype(np.float32)
+
+ return vertices, faces, vert_colors, face_colors
+
+
+def camera_marker_geometry(radius, height, up):
+ assert up == "y" or up == "z"
+ if up == "y":
+ vertices = np.array(
+ [
+ [-radius, -radius, 0],
+ [radius, -radius, 0],
+ [radius, radius, 0],
+ [-radius, radius, 0],
+ [0, 0, height],
+ ]
+ )
+ else:
+ vertices = np.array(
+ [
+ [-radius, 0, -radius],
+ [radius, 0, -radius],
+ [radius, 0, radius],
+ [-radius, 0, radius],
+ [0, -height, 0],
+ ]
+ )
+
+ faces = np.array(
+ [[0, 3, 1], [1, 3, 2], [0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4],]
+ )
+
+ face_colors = np.array(
+ [
+ [1.0, 1.0, 1.0, 1.0],
+ [1.0, 1.0, 1.0, 1.0],
+ [0.0, 1.0, 0.0, 1.0],
+ [1.0, 0.0, 0.0, 1.0],
+ [0.0, 1.0, 0.0, 1.0],
+ [1.0, 0.0, 0.0, 1.0],
+ ]
+ )
+ return vertices, faces, face_colors
+
+
+def vis_keypoints(
+ keypts_list,
+ img_size,
+ radius=6,
+ thickness=3,
+ kpt_score_thr=0.3,
+ dataset="TopDownCocoDataset",
+):
+ """
+ Visualize keypoints
+ From ViTPose/mmpose/apis/inference.py
+ """
+ palette = np.array(
+ [
+ [255, 128, 0],
+ [255, 153, 51],
+ [255, 178, 102],
+ [230, 230, 0],
+ [255, 153, 255],
+ [153, 204, 255],
+ [255, 102, 255],
+ [255, 51, 255],
+ [102, 178, 255],
+ [51, 153, 255],
+ [255, 153, 153],
+ [255, 102, 102],
+ [255, 51, 51],
+ [153, 255, 153],
+ [102, 255, 102],
+ [51, 255, 51],
+ [0, 255, 0],
+ [0, 0, 255],
+ [255, 0, 0],
+ [255, 255, 255],
+ ]
+ )
+
+ if dataset in (
+ "TopDownCocoDataset",
+ "BottomUpCocoDataset",
+ "TopDownOCHumanDataset",
+ "AnimalMacaqueDataset",
+ ):
+ # show the results
+ skeleton = [
+ [15, 13],
+ [13, 11],
+ [16, 14],
+ [14, 12],
+ [11, 12],
+ [5, 11],
+ [6, 12],
+ [5, 6],
+ [5, 7],
+ [6, 8],
+ [7, 9],
+ [8, 10],
+ [1, 2],
+ [0, 1],
+ [0, 2],
+ [1, 3],
+ [2, 4],
+ [3, 5],
+ [4, 6],
+ ]
+
+ pose_link_color = palette[
+ [0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]
+ ]
+ pose_kpt_color = palette[
+ [16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]
+ ]
+
+ elif dataset == "TopDownCocoWholeBodyDataset":
+ # show the results
+ skeleton = [
+ [15, 13],
+ [13, 11],
+ [16, 14],
+ [14, 12],
+ [11, 12],
+ [5, 11],
+ [6, 12],
+ [5, 6],
+ [5, 7],
+ [6, 8],
+ [7, 9],
+ [8, 10],
+ [1, 2],
+ [0, 1],
+ [0, 2],
+ [1, 3],
+ [2, 4],
+ [3, 5],
+ [4, 6],
+ [15, 17],
+ [15, 18],
+ [15, 19],
+ [16, 20],
+ [16, 21],
+ [16, 22],
+ [91, 92],
+ [92, 93],
+ [93, 94],
+ [94, 95],
+ [91, 96],
+ [96, 97],
+ [97, 98],
+ [98, 99],
+ [91, 100],
+ [100, 101],
+ [101, 102],
+ [102, 103],
+ [91, 104],
+ [104, 105],
+ [105, 106],
+ [106, 107],
+ [91, 108],
+ [108, 109],
+ [109, 110],
+ [110, 111],
+ [112, 113],
+ [113, 114],
+ [114, 115],
+ [115, 116],
+ [112, 117],
+ [117, 118],
+ [118, 119],
+ [119, 120],
+ [112, 121],
+ [121, 122],
+ [122, 123],
+ [123, 124],
+ [112, 125],
+ [125, 126],
+ [126, 127],
+ [127, 128],
+ [112, 129],
+ [129, 130],
+ [130, 131],
+ [131, 132],
+ ]
+
+ pose_link_color = palette[
+ [0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]
+ + [16, 16, 16, 16, 16, 16]
+ + [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
+ + [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
+ ]
+ pose_kpt_color = palette[
+ [16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]
+ + [0, 0, 0, 0, 0, 0]
+ + [19] * (68 + 42)
+ ]
+
+ elif dataset == "TopDownAicDataset":
+ skeleton = [
+ [2, 1],
+ [1, 0],
+ [0, 13],
+ [13, 3],
+ [3, 4],
+ [4, 5],
+ [8, 7],
+ [7, 6],
+ [6, 9],
+ [9, 10],
+ [10, 11],
+ [12, 13],
+ [0, 6],
+ [3, 9],
+ ]
+
+ pose_link_color = palette[[9, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 0, 7, 7]]
+ pose_kpt_color = palette[[9, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 0, 0]]
+
+ elif dataset == "TopDownMpiiDataset":
+ skeleton = [
+ [0, 1],
+ [1, 2],
+ [2, 6],
+ [6, 3],
+ [3, 4],
+ [4, 5],
+ [6, 7],
+ [7, 8],
+ [8, 9],
+ [8, 12],
+ [12, 11],
+ [11, 10],
+ [8, 13],
+ [13, 14],
+ [14, 15],
+ ]
+
+ pose_link_color = palette[[16, 16, 16, 16, 16, 16, 7, 7, 0, 9, 9, 9, 9, 9, 9]]
+ pose_kpt_color = palette[[16, 16, 16, 16, 16, 16, 7, 7, 0, 0, 9, 9, 9, 9, 9, 9]]
+
+ elif dataset == "TopDownMpiiTrbDataset":
+ skeleton = [
+ [12, 13],
+ [13, 0],
+ [13, 1],
+ [0, 2],
+ [1, 3],
+ [2, 4],
+ [3, 5],
+ [0, 6],
+ [1, 7],
+ [6, 7],
+ [6, 8],
+ [7, 9],
+ [8, 10],
+ [9, 11],
+ [14, 15],
+ [16, 17],
+ [18, 19],
+ [20, 21],
+ [22, 23],
+ [24, 25],
+ [26, 27],
+ [28, 29],
+ [30, 31],
+ [32, 33],
+ [34, 35],
+ [36, 37],
+ [38, 39],
+ ]
+
+ pose_link_color = palette[[16] * 14 + [19] * 13]
+ pose_kpt_color = palette[[16] * 14 + [0] * 26]
+
+ elif dataset in ("OneHand10KDataset", "FreiHandDataset", "PanopticDataset"):
+ skeleton = [
+ [0, 1],
+ [1, 2],
+ [2, 3],
+ [3, 4],
+ [0, 5],
+ [5, 6],
+ [6, 7],
+ [7, 8],
+ [0, 9],
+ [9, 10],
+ [10, 11],
+ [11, 12],
+ [0, 13],
+ [13, 14],
+ [14, 15],
+ [15, 16],
+ [0, 17],
+ [17, 18],
+ [18, 19],
+ [19, 20],
+ ]
+
+ pose_link_color = palette[
+ [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
+ ]
+ pose_kpt_color = palette[
+ [0, 0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
+ ]
+
+ elif dataset == "InterHand2DDataset":
+ skeleton = [
+ [0, 1],
+ [1, 2],
+ [2, 3],
+ [4, 5],
+ [5, 6],
+ [6, 7],
+ [8, 9],
+ [9, 10],
+ [10, 11],
+ [12, 13],
+ [13, 14],
+ [14, 15],
+ [16, 17],
+ [17, 18],
+ [18, 19],
+ [3, 20],
+ [7, 20],
+ [11, 20],
+ [15, 20],
+ [19, 20],
+ ]
+
+ pose_link_color = palette[
+ [0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12, 16, 16, 16, 0, 4, 8, 12, 16]
+ ]
+ pose_kpt_color = palette[
+ [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16, 0]
+ ]
+
+ elif dataset == "Face300WDataset":
+ # show the results
+ skeleton = []
+
+ pose_link_color = palette[[]]
+ pose_kpt_color = palette[[19] * 68]
+ kpt_score_thr = 0
+
+ elif dataset == "FaceAFLWDataset":
+ # show the results
+ skeleton = []
+
+ pose_link_color = palette[[]]
+ pose_kpt_color = palette[[19] * 19]
+ kpt_score_thr = 0
+
+ elif dataset == "FaceCOFWDataset":
+ # show the results
+ skeleton = []
+
+ pose_link_color = palette[[]]
+ pose_kpt_color = palette[[19] * 29]
+ kpt_score_thr = 0
+
+ elif dataset == "FaceWFLWDataset":
+ # show the results
+ skeleton = []
+
+ pose_link_color = palette[[]]
+ pose_kpt_color = palette[[19] * 98]
+ kpt_score_thr = 0
+
+ elif dataset == "AnimalHorse10Dataset":
+ skeleton = [
+ [0, 1],
+ [1, 12],
+ [12, 16],
+ [16, 21],
+ [21, 17],
+ [17, 11],
+ [11, 10],
+ [10, 8],
+ [8, 9],
+ [9, 12],
+ [2, 3],
+ [3, 4],
+ [5, 6],
+ [6, 7],
+ [13, 14],
+ [14, 15],
+ [18, 19],
+ [19, 20],
+ ]
+
+ pose_link_color = palette[[4] * 10 + [6] * 2 + [6] * 2 + [7] * 2 + [7] * 2]
+ pose_kpt_color = palette[
+ [4, 4, 6, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 7, 7, 7, 4, 4, 7, 7, 7, 4]
+ ]
+
+ elif dataset == "AnimalFlyDataset":
+ skeleton = [
+ [1, 0],
+ [2, 0],
+ [3, 0],
+ [4, 3],
+ [5, 4],
+ [7, 6],
+ [8, 7],
+ [9, 8],
+ [11, 10],
+ [12, 11],
+ [13, 12],
+ [15, 14],
+ [16, 15],
+ [17, 16],
+ [19, 18],
+ [20, 19],
+ [21, 20],
+ [23, 22],
+ [24, 23],
+ [25, 24],
+ [27, 26],
+ [28, 27],
+ [29, 28],
+ [30, 3],
+ [31, 3],
+ ]
+
+ pose_link_color = palette[[0] * 25]
+ pose_kpt_color = palette[[0] * 32]
+
+ elif dataset == "AnimalLocustDataset":
+ skeleton = [
+ [1, 0],
+ [2, 1],
+ [3, 2],
+ [4, 3],
+ [6, 5],
+ [7, 6],
+ [9, 8],
+ [10, 9],
+ [11, 10],
+ [13, 12],
+ [14, 13],
+ [15, 14],
+ [17, 16],
+ [18, 17],
+ [19, 18],
+ [21, 20],
+ [22, 21],
+ [24, 23],
+ [25, 24],
+ [26, 25],
+ [28, 27],
+ [29, 28],
+ [30, 29],
+ [32, 31],
+ [33, 32],
+ [34, 33],
+ ]
+
+ pose_link_color = palette[[0] * 26]
+ pose_kpt_color = palette[[0] * 35]
+
+ elif dataset == "AnimalZebraDataset":
+ skeleton = [[1, 0], [2, 1], [3, 2], [4, 2], [5, 7], [6, 7], [7, 2], [8, 7]]
+
+ pose_link_color = palette[[0] * 8]
+ pose_kpt_color = palette[[0] * 9]
+
+ elif dataset in "AnimalPoseDataset":
+ skeleton = [
+ [0, 1],
+ [0, 2],
+ [1, 3],
+ [0, 4],
+ [1, 4],
+ [4, 5],
+ [5, 7],
+ [6, 7],
+ [5, 8],
+ [8, 12],
+ [12, 16],
+ [5, 9],
+ [9, 13],
+ [13, 17],
+ [6, 10],
+ [10, 14],
+ [14, 18],
+ [6, 11],
+ [11, 15],
+ [15, 19],
+ ]
+
+ pose_link_color = palette[[0] * 20]
+ pose_kpt_color = palette[[0] * 20]
+    else:
+        raise NotImplementedError(f'Unsupported dataset: {dataset}')
+
+ img_w, img_h = img_size
+ img = 255 * np.ones((img_h, img_w, 3), dtype=np.uint8)
+ img = imshow_keypoints(
+ img,
+ keypts_list,
+ skeleton,
+ kpt_score_thr,
+ pose_kpt_color,
+ pose_link_color,
+ radius,
+ thickness,
+ )
+ alpha = 255 * (img != 255).any(axis=-1, keepdims=True).astype(np.uint8)
+ return np.concatenate([img, alpha], axis=-1)
+
+
+def imshow_keypoints(
+ img,
+ pose_result,
+ skeleton=None,
+ kpt_score_thr=0.3,
+ pose_kpt_color=None,
+ pose_link_color=None,
+ radius=4,
+ thickness=1,
+ show_keypoint_weight=False,
+):
+ """Draw keypoints and links on an image.
+ From ViTPose/mmpose/core/visualization/image.py
+
+ Args:
+ img (H, W, 3) array
+ pose_result (list[kpts]): The poses to draw. Each element kpts is
+ a set of K keypoints as an Kx3 numpy.ndarray, where each
+ keypoint is represented as x, y, score.
+ kpt_score_thr (float, optional): Minimum score of keypoints
+ to be shown. Default: 0.3.
+ pose_kpt_color (np.array[Nx3]`): Color of N keypoints. If None,
+ the keypoint will not be drawn.
+ pose_link_color (np.array[Mx3]): Color of M links. If None, the
+ links will not be drawn.
+ thickness (int): Thickness of lines.
+ show_keypoint_weight (bool): If True, opacity indicates keypoint score
+ """
+ img_h, img_w, _ = img.shape
+ idcs = [0, 16, 15, 18, 17, 5, 2, 6, 3, 7, 4, 12, 9, 13, 10, 14, 11]
+ for kpts in pose_result:
+ kpts = np.array(kpts, copy=False)[idcs]
+
+ # draw each point on image
+ if pose_kpt_color is not None:
+ assert len(pose_kpt_color) == len(kpts)
+ for kid, kpt in enumerate(kpts):
+ x_coord, y_coord, kpt_score = int(kpt[0]), int(kpt[1]), kpt[2]
+ if kpt_score > kpt_score_thr:
+ color = tuple(int(c) for c in pose_kpt_color[kid])
+ if show_keypoint_weight:
+ img_copy = img.copy()
+ cv2.circle(
+ img_copy, (int(x_coord), int(y_coord)), radius, color, -1
+ )
+ transparency = max(0, min(1, kpt_score))
+ cv2.addWeighted(
+ img_copy, transparency, img, 1 - transparency, 0, dst=img
+ )
+ else:
+ cv2.circle(img, (int(x_coord), int(y_coord)), radius, color, -1)
+
+ # draw links
+ if skeleton is not None and pose_link_color is not None:
+ assert len(pose_link_color) == len(skeleton)
+ for sk_id, sk in enumerate(skeleton):
+ pos1 = (int(kpts[sk[0], 0]), int(kpts[sk[0], 1]))
+ pos2 = (int(kpts[sk[1], 0]), int(kpts[sk[1], 1]))
+ if (
+ pos1[0] > 0
+ and pos1[0] < img_w
+ and pos1[1] > 0
+ and pos1[1] < img_h
+ and pos2[0] > 0
+ and pos2[0] < img_w
+ and pos2[1] > 0
+ and pos2[1] < img_h
+ and kpts[sk[0], 2] > kpt_score_thr
+ and kpts[sk[1], 2] > kpt_score_thr
+ ):
+ color = tuple(int(c) for c in pose_link_color[sk_id])
+ if show_keypoint_weight:
+ img_copy = img.copy()
+ X = (pos1[0], pos2[0])
+ Y = (pos1[1], pos2[1])
+ mX = np.mean(X)
+ mY = np.mean(Y)
+ length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5
+ angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1]))
+ stickwidth = 2
+ polygon = cv2.ellipse2Poly(
+ (int(mX), int(mY)),
+ (int(length / 2), int(stickwidth)),
+ int(angle),
+ 0,
+ 360,
+ 1,
+ )
+ cv2.fillConvexPoly(img_copy, polygon, color)
+ transparency = max(
+ 0, min(1, 0.5 * (kpts[sk[0], 2] + kpts[sk[1], 2]))
+ )
+ cv2.addWeighted(
+ img_copy, transparency, img, 1 - transparency, 0, dst=img
+ )
+ else:
+ cv2.line(img, pos1, pos2, color, thickness=thickness)
+
+ return img
\ No newline at end of file
diff --git a/output/demo/test19/output.mp4 b/output/demo/test19/output.mp4
new file mode 100644
index 0000000000000000000000000000000000000000..2b0ff61514173f1d0dbc688a9905a8ec42ec7888
Binary files /dev/null and b/output/demo/test19/output.mp4 differ
diff --git a/output/demo/test19/slam_results.pth b/output/demo/test19/slam_results.pth
new file mode 100644
index 0000000000000000000000000000000000000000..30960920c35d27cbaf981575ce81156881d3013d
--- /dev/null
+++ b/output/demo/test19/slam_results.pth
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eb6e0b47809fe94bdc26bc99318f9eb9beccb005ec81c407887a5bd7223b5b81
+size 2353
diff --git a/output/demo/test19/tracking_results.pth b/output/demo/test19/tracking_results.pth
new file mode 100644
index 0000000000000000000000000000000000000000..69c2c5d2621b19fe51fdbb09bab84294e96cbb30
--- /dev/null
+++ b/output/demo/test19/tracking_results.pth
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3d1b3d6e23597e07daaa1b124cba63a1ea91d3909fe2c903c3c9b2b2819ce140
+size 333898
diff --git a/output/demo/test19/wham_output.pkl b/output/demo/test19/wham_output.pkl
new file mode 100644
index 0000000000000000000000000000000000000000..d994b69baefb5b73245454edb6618eb76c157a77
--- /dev/null
+++ b/output/demo/test19/wham_output.pkl
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1484cca6cf2774c3c0cbefa3d47ed7f2dd04db96b05ebdd14e5a11610a415b3e
+size 3167067
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e9a73a55cf7ecb136437c7a4aacf8cc6bf517250
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,21 @@
+chumpy @ git+https://github.com/mattloper/chumpy
+numpy==1.22.3
+yacs
+joblib
+scikit-image
+opencv-python
+imageio[ffmpeg]
+matplotlib
+tensorboard
+smplx
+progress
+einops
+mmcv==1.3.9
+timm==0.4.9
+munkres
+xtcocotools>=1.8
+loguru
+setuptools==59.5.0
+tqdm
+ultralytics
+gdown==4.6.0
\ No newline at end of file
diff --git a/test.py b/test.py
new file mode 100644
index 0000000000000000000000000000000000000000..d83a39fa183dc9ee60a25f37b05e7807b066152c
--- /dev/null
+++ b/test.py
@@ -0,0 +1,230 @@
+import os
+import argparse
+import os.path as osp
+from glob import glob
+from collections import defaultdict
+
+import cv2
+import torch
+import joblib
+import numpy as np
+from loguru import logger
+from progress.bar import Bar
+
+from configs.config import get_cfg_defaults
+from lib.data.datasets import CustomDataset
+from lib.utils.imutils import avg_preds
+from lib.utils.transforms import matrix_to_axis_angle
+from lib.models import build_network, build_body_model
+from lib.models.preproc.detector import DetectionModel
+from lib.models.preproc.extractor import FeatureExtractor
+from lib.models.smplify import TemporalSMPLify
+
+try:
+ from lib.models.preproc.slam import SLAMModel
+
+ _run_global = True
+except Exception:
+    logger.info('DPVO is not properly installed. Estimating in local coordinates only!')
+ _run_global = False
+
+
+def run(cfg,
+ video,
+ output_pth,
+ network,
+ calib=None,
+ run_global=True,
+ save_pkl=False,
+ visualize=False,
+ run_smplify=False):
+ cap = cv2.VideoCapture(video)
+ assert cap.isOpened(), f'Failed to load video file {video}'
+ fps = cap.get(cv2.CAP_PROP_FPS)
+ length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+ width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
+
+    # Whether to estimate motion in global coordinates
+ run_global = run_global and _run_global
+
+ # Preprocess
+ with torch.no_grad():
+ if not (osp.exists(osp.join(output_pth, 'tracking_results.pth')) and
+ osp.exists(osp.join(output_pth, 'slam_results.pth'))):
+
+ detector = DetectionModel(cfg.DEVICE.lower())
+ extractor = FeatureExtractor(cfg.DEVICE.lower(), cfg.FLIP_EVAL)
+
+ if run_global:
+ slam = SLAMModel(video, output_pth, width, height, calib)
+ else:
+ slam = None
+
+ bar = Bar('Preprocess: 2D detection and SLAM', fill='#', max=length)
+ while (cap.isOpened()):
+ flag, img = cap.read()
+ if not flag: break
+
+ # 2D detection and tracking
+ detector.track(img, fps, length)
+
+ # SLAM
+ if slam is not None:
+ slam.track()
+
+ bar.next()
+
+ tracking_results = detector.process(fps)
+
+ if slam is not None:
+ slam_results = slam.process()
+ else:
+ slam_results = np.zeros((length, 7))
+ slam_results[:, 3] = 1.0 # Unit quaternion
+
+ # Extract image features
+ # TODO: Merge this into the previous while loop with an online bbox smoothing.
+ tracking_results = extractor.run(video, tracking_results)
+            logger.info('Completed data preprocessing!')
+
+ # Save the processed data
+ joblib.dump(tracking_results, osp.join(output_pth, 'tracking_results.pth'))
+ joblib.dump(slam_results, osp.join(output_pth, 'slam_results.pth'))
+            logger.info(f'Saved processed data at {output_pth}')
+
+ # If the processed data already exists, load the processed data
+ else:
+ tracking_results = joblib.load(osp.join(output_pth, 'tracking_results.pth'))
+ slam_results = joblib.load(osp.join(output_pth, 'slam_results.pth'))
+            logger.info(f'Processed data already exists at {output_pth}! Loading it.')
+
+ # Build dataset
+ dataset = CustomDataset(cfg, tracking_results, slam_results, width, height, fps)
+
+ # run WHAM
+ results = defaultdict(dict)
+
+ n_subjs = len(dataset)
+ for subj in range(n_subjs):
+
+ with torch.no_grad():
+ if cfg.FLIP_EVAL:
+ # Forward pass with flipped input
+ flipped_batch = dataset.load_data(subj, True)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = flipped_batch
+ flipped_pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel,
+ return_y_up=True, **kwargs)
+
+ # Forward pass with normal input
+ batch = dataset.load_data(subj)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = batch
+ pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel,
+ return_y_up=True, **kwargs)
+
+ # Merge two predictions
+ flipped_pose, flipped_shape = flipped_pred['pose'].squeeze(0), flipped_pred['betas'].squeeze(0)
+ pose, shape = pred['pose'].squeeze(0), pred['betas'].squeeze(0)
+ flipped_pose, pose = flipped_pose.reshape(-1, 24, 6), pose.reshape(-1, 24, 6)
+ avg_pose, avg_shape = avg_preds(pose, shape, flipped_pose, flipped_shape)
+ avg_pose = avg_pose.reshape(-1, 144)
+                # reorder the flipped prediction's contact probabilities (swap the left/right
+                # channels) before averaging with the unflipped prediction
+                avg_contact = (flipped_pred['contact'][..., [2, 3, 0, 1]] + pred['contact']) / 2
+
+ # Refine trajectory with merged prediction
+ network.pred_pose = avg_pose.view_as(network.pred_pose)
+ network.pred_shape = avg_shape.view_as(network.pred_shape)
+ network.pred_contact = avg_contact.view_as(network.pred_contact)
+ output = network.forward_smpl(**kwargs)
+ pred = network.refine_trajectory(output, cam_angvel, return_y_up=True)
+
+ else:
+ # data
+ batch = dataset.load_data(subj)
+ _id, x, inits, features, mask, init_root, cam_angvel, frame_id, kwargs = batch
+
+ # inference
+ pred = network(x, inits, features, mask=mask, init_root=init_root, cam_angvel=cam_angvel,
+ return_y_up=True, **kwargs)
+
+ # if False:
+        if run_smplify:
+            # NOTE: relies on the module-level `smpl` body model built in the __main__ block below
+            smplify = TemporalSMPLify(smpl, img_w=width, img_h=height, device=cfg.DEVICE)
+ input_keypoints = dataset.tracking_results[_id]['keypoints']
+ pred = smplify.fit(pred, input_keypoints, **kwargs)
+
+ with torch.no_grad():
+ network.pred_pose = pred['pose']
+ network.pred_shape = pred['betas']
+ network.pred_cam = pred['cam']
+ output = network.forward_smpl(**kwargs)
+ pred = network.refine_trajectory(output, cam_angvel, return_y_up=True)
+
+ # ========= Store results ========= #
+ pred_body_pose = matrix_to_axis_angle(pred['poses_body']).cpu().numpy().reshape(-1, 69)
+ pred_root = matrix_to_axis_angle(pred['poses_root_cam']).cpu().numpy().reshape(-1, 3)
+ pred_root_world = matrix_to_axis_angle(pred['poses_root_world']).cpu().numpy().reshape(-1, 3)
+ pred_pose = np.concatenate((pred_root, pred_body_pose), axis=-1)
+ pred_pose_world = np.concatenate((pred_root_world, pred_body_pose), axis=-1)
+ pred_trans = (pred['trans_cam'] - network.output.offset).cpu().numpy()
+
+ results[_id]['pose'] = pred_pose
+ results[_id]['trans'] = pred_trans
+ results[_id]['pose_world'] = pred_pose_world
+ results[_id]['trans_world'] = pred['trans_world'].cpu().squeeze(0).numpy()
+ results[_id]['betas'] = pred['betas'].cpu().squeeze(0).numpy()
+ results[_id]['verts'] = (pred['verts_cam'] + pred['trans_cam'].unsqueeze(1)).cpu().numpy()
+ results[_id]['frame_ids'] = frame_id
+
+ if save_pkl:
+ joblib.dump(results, osp.join(output_pth, "wham_output.pkl"))
+
+ # Visualize
+ if visualize:
+ from lib.vis.run_vis import run_vis_on_demo
+ with torch.no_grad():
+ run_vis_on_demo(cfg, video, results, output_pth, network.smpl, vis_global=run_global)
+
+
+if __name__ == '__main__':
+
+ VIDEO_PATH = "examples/test19.mov"
+ OUTPUT_PATH = "output/demo"
+ CALIB_PATH = None
+ ESTIMATE_LOCAL_ONLY = False
+ VISUALIZE = True
+ SAVE_PKL = True
+ RUN_SMPLIFY = False
+ GENDER = 'male'
+
+
+ cfg = get_cfg_defaults()
+ cfg.merge_from_file('configs/yamls/demo.yaml')
+
+ logger.info(f'GPU name -> {torch.cuda.get_device_name()}')
+ logger.info(f'GPU feat -> {torch.cuda.get_device_properties("cuda")}')
+
+ # ========= Load WHAM ========= #
+ smpl_batch_size = cfg.TRAIN.BATCH_SIZE * cfg.DATASET.SEQLEN
+ smpl = build_body_model(device=cfg.DEVICE, gender=GENDER, batch_size=smpl_batch_size)
+ network = build_network(cfg, smpl)
+ network.eval()
+
+ # Output folder
+ sequence = '.'.join(VIDEO_PATH.split('/')[-1].split('.')[:-1])
+ output_pth = osp.join(OUTPUT_PATH, sequence)
+ os.makedirs(output_pth, exist_ok=True)
+
+ faces_np = network.smpl.get_faces()
+ np.save(osp.join(output_pth, f'faces_{GENDER}.npy'), faces_np)
+
+ run(cfg,
+ VIDEO_PATH,
+ output_pth,
+ network,
+ CALIB_PATH,
+ run_global=not ESTIMATE_LOCAL_ONLY,
+ save_pkl=SAVE_PKL,
+ visualize=VISUALIZE,
+ run_smplify=RUN_SMPLIFY)
+
+ print()
+ logger.info('Done !')
diff --git a/third-party/DPVO/DPRetrieval/CMakeLists.txt b/third-party/DPVO/DPRetrieval/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..8c78c643246f899e4a2ee6c6ff3fd412b9c87132
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/CMakeLists.txt
@@ -0,0 +1,17 @@
+cmake_minimum_required(VERSION 3.4...3.18)
+project(dpretrieval)
+find_package(OpenCV REQUIRED)
+include_directories(${OpenCV_INCLUDE_DIRS})
+
+find_package(DBoW2 REQUIRED)
+include_directories(${DBoW2_INCLUDE_DIRS})
+
+
+add_subdirectory(pybind11)
+pybind11_add_module(dpretrieval src/main.cpp)
+target_link_libraries(dpretrieval PRIVATE ${OpenCV_LIBS} ${DBoW2_LIBS} )
+
+# EXAMPLE_VERSION_INFO is defined by setup.py and passed into the C++ code as a
+# define (VERSION_INFO) here.
+target_compile_definitions(dpretrieval
+ PRIVATE VERSION_INFO=${EXAMPLE_VERSION_INFO})
\ No newline at end of file
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.appveyor.yml b/third-party/DPVO/DPRetrieval/pybind11/.appveyor.yml
new file mode 100644
index 0000000000000000000000000000000000000000..0eed11a3473353796b6fd81edb0b220b55a358e7
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.appveyor.yml
@@ -0,0 +1,37 @@
+version: 1.0.{build}
+image:
+- Visual Studio 2015
+test: off
+skip_branch_with_pr: true
+build:
+ parallel: true
+platform:
+- x86
+environment:
+ matrix:
+ - PYTHON: 36
+ CONFIG: Debug
+ - PYTHON: 27
+ CONFIG: Debug
+install:
+- ps: |
+ $env:CMAKE_GENERATOR = "Visual Studio 14 2015"
+ if ($env:PLATFORM -eq "x64") { $env:PYTHON = "$env:PYTHON-x64" }
+ $env:PATH = "C:\Python$env:PYTHON\;C:\Python$env:PYTHON\Scripts\;$env:PATH"
+ python -W ignore -m pip install --upgrade pip wheel
+ python -W ignore -m pip install pytest numpy --no-warn-script-location pytest-timeout
+- ps: |
+ Start-FileDownload 'https://gitlab.com/libeigen/eigen/-/archive/3.3.7/eigen-3.3.7.zip'
+ 7z x eigen-3.3.7.zip -y > $null
+ $env:CMAKE_INCLUDE_PATH = "eigen-3.3.7;$env:CMAKE_INCLUDE_PATH"
+build_script:
+- cmake -G "%CMAKE_GENERATOR%" -A "%CMAKE_ARCH%"
+ -DCMAKE_CXX_STANDARD=14
+ -DPYBIND11_WERROR=ON
+ -DDOWNLOAD_CATCH=ON
+ -DCMAKE_SUPPRESS_REGENERATION=1
+ .
+- set MSBuildLogger="C:\Program Files\AppVeyor\BuildAgent\Appveyor.MSBuildLogger.dll"
+- cmake --build . --config %CONFIG% --target pytest -- /m /v:m /logger:%MSBuildLogger%
+- cmake --build . --config %CONFIG% --target cpptest -- /m /v:m /logger:%MSBuildLogger%
+on_failure: if exist "tests\test_cmake_build" type tests\test_cmake_build\*.log*
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.clang-format b/third-party/DPVO/DPRetrieval/pybind11/.clang-format
new file mode 100644
index 0000000000000000000000000000000000000000..7ec1c91bf1a83f14af90f67d99ca2dfb12df638d
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.clang-format
@@ -0,0 +1,38 @@
+---
+# See all possible options and defaults with:
+# clang-format --style=llvm --dump-config
+BasedOnStyle: LLVM
+AccessModifierOffset: -4
+AllowShortLambdasOnASingleLine: true
+AlwaysBreakTemplateDeclarations: Yes
+BinPackArguments: false
+BinPackParameters: false
+BreakBeforeBinaryOperators: All
+BreakConstructorInitializers: BeforeColon
+ColumnLimit: 99
+CommentPragmas: 'NOLINT:.*|^ IWYU pragma:'
+IncludeBlocks: Regroup
+IndentCaseLabels: true
+IndentPPDirectives: AfterHash
+IndentWidth: 4
+Language: Cpp
+SpaceAfterCStyleCast: true
+Standard: Cpp11
+StatementMacros: ['PyObject_HEAD']
+TabWidth: 4
+IncludeCategories:
+ - Regex: ''
+ Priority: 4
+ - Regex: '.*'
+ Priority: 5
+...
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.clang-tidy b/third-party/DPVO/DPRetrieval/pybind11/.clang-tidy
new file mode 100644
index 0000000000000000000000000000000000000000..340d4f5965d8706deb3d077ff713608f6a937856
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.clang-tidy
@@ -0,0 +1,72 @@
+FormatStyle: file
+
+Checks: '
+*bugprone*,
+clang-analyzer-optin.performance.Padding,
+clang-analyzer-optin.cplusplus.VirtualCall,
+cppcoreguidelines-init-variables,
+cppcoreguidelines-prefer-member-initializer,
+cppcoreguidelines-pro-type-static-cast-downcast,
+cppcoreguidelines-slicing,
+google-explicit-constructor,
+llvm-namespace-comment,
+misc-misplaced-const,
+misc-non-copyable-objects,
+misc-static-assert,
+misc-throw-by-value-catch-by-reference,
+misc-uniqueptr-reset-release,
+misc-unused-parameters,
+modernize-avoid-bind,
+modernize-make-shared,
+modernize-redundant-void-arg,
+modernize-replace-auto-ptr,
+modernize-replace-disallow-copy-and-assign-macro,
+modernize-replace-random-shuffle,
+modernize-shrink-to-fit,
+modernize-use-auto,
+modernize-use-bool-literals,
+modernize-use-equals-default,
+modernize-use-equals-delete,
+modernize-use-default-member-init,
+modernize-use-noexcept,
+modernize-use-emplace,
+modernize-use-override,
+modernize-use-using,
+*performance*,
+readability-avoid-const-params-in-decls,
+readability-braces-around-statements,
+readability-const-return-type,
+readability-container-size-empty,
+readability-delete-null-pointer,
+readability-else-after-return,
+readability-implicit-bool-conversion,
+readability-inconsistent-declaration-parameter-name,
+readability-make-member-function-const,
+readability-misplaced-array-index,
+readability-non-const-parameter,
+readability-qualified-auto,
+readability-redundant-function-ptr-dereference,
+readability-redundant-smartptr-get,
+readability-redundant-string-cstr,
+readability-simplify-subscript-expr,
+readability-static-accessed-through-instance,
+readability-static-definition-in-anonymous-namespace,
+readability-string-compare,
+readability-suspicious-call-argument,
+readability-uniqueptr-delete-release,
+-bugprone-exception-escape,
+-bugprone-reserved-identifier,
+-bugprone-unused-raii,
+'
+
+CheckOptions:
+- key: performance-for-range-copy.WarnOnAllAutoCopies
+ value: true
+- key: performance-unnecessary-value-param.AllowedTypes
+ value: 'exception_ptr$;'
+- key: readability-implicit-bool-conversion.AllowPointerConditions
+ value: true
+
+HeaderFilterRegex: 'pybind11/.*h'
+
+WarningsAsErrors: '*'
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.cmake-format.yaml b/third-party/DPVO/DPRetrieval/pybind11/.cmake-format.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..2371b63a853bf9b83965a1c1c15e3619942e7f21
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.cmake-format.yaml
@@ -0,0 +1,73 @@
+parse:
+ additional_commands:
+ pybind11_add_module:
+ flags:
+ - THIN_LTO
+ - MODULE
+ - SHARED
+ - NO_EXTRAS
+ - EXCLUDE_FROM_ALL
+ - SYSTEM
+
+format:
+ line_width: 99
+ tab_size: 2
+
+ # If an argument group contains more than this many sub-groups
+ # (parg or kwarg groups) then force it to a vertical layout.
+ max_subgroups_hwrap: 2
+
+ # If a positional argument group contains more than this many
+ # arguments, then force it to a vertical layout.
+ max_pargs_hwrap: 6
+
+ # If a cmdline positional group consumes more than this many
+ # lines without nesting, then invalidate the layout (and nest)
+ max_rows_cmdline: 2
+ separate_ctrl_name_with_space: false
+ separate_fn_name_with_space: false
+ dangle_parens: false
+
+ # If the trailing parenthesis must be 'dangled' on its on
+ # 'line, then align it to this reference: `prefix`: the start'
+ # 'of the statement, `prefix-indent`: the start of the'
+ # 'statement, plus one indentation level, `child`: align to'
+ # the column of the arguments
+ dangle_align: prefix
+ # If the statement spelling length (including space and
+ # parenthesis) is smaller than this amount, then force reject
+ # nested layouts.
+ min_prefix_chars: 4
+
+ # If the statement spelling length (including space and
+ # parenthesis) is larger than the tab width by more than this
+ # amount, then force reject un-nested layouts.
+ max_prefix_chars: 10
+
+ # If a candidate layout is wrapped horizontally but it exceeds
+ # this many lines, then reject the layout.
+ max_lines_hwrap: 2
+
+ line_ending: unix
+
+ # Format command names consistently as 'lower' or 'upper' case
+ command_case: canonical
+
+ # Format keywords consistently as 'lower' or 'upper' case
+ # unchanged is valid too
+ keyword_case: 'upper'
+
+ # A list of command names which should always be wrapped
+ always_wrap: []
+
+ # If true, the argument lists which are known to be sortable
+ # will be sorted lexicographically
+ enable_sort: true
+
+ # If true, the parsers may infer whether or not an argument
+ # list is sortable (without annotation).
+ autosort: false
+
+# Causes a few issues - can be solved later, possibly.
+markup:
+ enable_markup: false
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.pre-commit-config.yaml b/third-party/DPVO/DPRetrieval/pybind11/.pre-commit-config.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..44a1adb5fd7c4f7fe557031f2ab424bf6253a8f1
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.pre-commit-config.yaml
@@ -0,0 +1,147 @@
+# To use:
+#
+# pre-commit run -a
+#
+# Or:
+#
+# pre-commit install # (runs every time you commit in git)
+#
+# To update this file:
+#
+# pre-commit autoupdate
+#
+# See https://github.com/pre-commit/pre-commit
+
+repos:
+# Standard hooks
+- repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v4.1.0
+ hooks:
+ - id: check-added-large-files
+ - id: check-case-conflict
+ - id: check-docstring-first
+ - id: check-merge-conflict
+ - id: check-symlinks
+ - id: check-toml
+ - id: check-yaml
+ - id: debug-statements
+ - id: end-of-file-fixer
+ - id: mixed-line-ending
+ - id: requirements-txt-fixer
+ - id: trailing-whitespace
+ - id: fix-encoding-pragma
+ exclude: ^noxfile.py$
+
+- repo: https://github.com/asottile/pyupgrade
+ rev: v2.31.0
+ hooks:
+ - id: pyupgrade
+
+- repo: https://github.com/PyCQA/isort
+ rev: 5.10.1
+ hooks:
+ - id: isort
+
+# Black, the code formatter, natively supports pre-commit
+- repo: https://github.com/psf/black
+ rev: 21.12b0 # Keep in sync with blacken-docs
+ hooks:
+ - id: black
+
+- repo: https://github.com/asottile/blacken-docs
+ rev: v1.12.0
+ hooks:
+ - id: blacken-docs
+ additional_dependencies:
+ - black==21.12b0 # keep in sync with black hook
+
+# Changes tabs to spaces
+- repo: https://github.com/Lucas-C/pre-commit-hooks
+ rev: v1.1.10
+ hooks:
+ - id: remove-tabs
+
+# Autoremoves unused imports
+- repo: https://github.com/hadialqattan/pycln
+ rev: v1.1.0
+ hooks:
+ - id: pycln
+
+- repo: https://github.com/pre-commit/pygrep-hooks
+ rev: v1.9.0
+ hooks:
+ - id: python-check-blanket-noqa
+ - id: python-check-blanket-type-ignore
+ - id: python-no-log-warn
+ - id: rst-backticks
+ - id: rst-directive-colons
+ - id: rst-inline-touching-normal
+
+# Flake8 also supports pre-commit natively (same author)
+- repo: https://github.com/PyCQA/flake8
+ rev: 4.0.1
+ hooks:
+ - id: flake8
+ additional_dependencies: &flake8_dependencies
+ - flake8-bugbear
+ - pep8-naming
+ exclude: ^(docs/.*|tools/.*)$
+
+- repo: https://github.com/asottile/yesqa
+ rev: v1.3.0
+ hooks:
+ - id: yesqa
+ additional_dependencies: *flake8_dependencies
+
+# CMake formatting
+- repo: https://github.com/cheshirekow/cmake-format-precommit
+ rev: v0.6.13
+ hooks:
+ - id: cmake-format
+ additional_dependencies: [pyyaml]
+ types: [file]
+ files: (\.cmake|CMakeLists.txt)(.in)?$
+
+# Check static types with mypy
+- repo: https://github.com/pre-commit/mirrors-mypy
+ rev: v0.931
+ hooks:
+ - id: mypy
+ # Running per-file misbehaves a bit, so just run on all files, it's fast
+ pass_filenames: false
+ additional_dependencies: [typed_ast]
+
+# Checks the manifest for missing files (native support)
+- repo: https://github.com/mgedmin/check-manifest
+ rev: "0.47"
+ hooks:
+ - id: check-manifest
+ # This is a slow hook, so only run this if --hook-stage manual is passed
+ stages: [manual]
+ additional_dependencies: [cmake, ninja]
+
+- repo: https://github.com/codespell-project/codespell
+ rev: v2.1.0
+ hooks:
+ - id: codespell
+ exclude: ".supp$"
+ args: ["-L", "nd,ot,thist"]
+
+- repo: https://github.com/shellcheck-py/shellcheck-py
+ rev: v0.8.0.3
+ hooks:
+ - id: shellcheck
+
+# The original pybind11 checks for a few C++ style items
+- repo: local
+ hooks:
+ - id: disallow-caps
+ name: Disallow improper capitalization
+ language: pygrep
+ entry: PyBind|Numpy|Cmake|CCache|PyTest
+ exclude: .pre-commit-config.yaml
+
+- repo: https://github.com/pre-commit/mirrors-clang-format
+ rev: "v13.0.0"
+ hooks:
+ - id: clang-format
diff --git a/third-party/DPVO/DPRetrieval/pybind11/.readthedocs.yml b/third-party/DPVO/DPRetrieval/pybind11/.readthedocs.yml
new file mode 100644
index 0000000000000000000000000000000000000000..cf4c6d05049872ee5e8c66c871d1b7aa8ee03876
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/.readthedocs.yml
@@ -0,0 +1,3 @@
+python:
+ version: 3
+requirements_file: docs/requirements.txt
diff --git a/third-party/DPVO/DPRetrieval/pybind11/CMakeLists.txt b/third-party/DPVO/DPRetrieval/pybind11/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ca678c22440ab42bd1ff19621d54ee9b26e269b8
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/CMakeLists.txt
@@ -0,0 +1,299 @@
+# CMakeLists.txt -- Build system for the pybind11 modules
+#
+# Copyright (c) 2015 Wenzel Jakob
+#
+# All rights reserved. Use of this source code is governed by a
+# BSD-style license that can be found in the LICENSE file.
+
+cmake_minimum_required(VERSION 3.4)
+
+# The `cmake_minimum_required(VERSION 3.4...3.22)` syntax does not work with
+# some versions of VS that have a patched CMake 3.11. This forces us to emulate
+# the behavior using the following workaround:
+if(${CMAKE_VERSION} VERSION_LESS 3.22)
+ cmake_policy(VERSION ${CMAKE_MAJOR_VERSION}.${CMAKE_MINOR_VERSION})
+else()
+ cmake_policy(VERSION 3.22)
+endif()
+
+# Avoid infinite recursion if tests include this as a subdirectory
+if(DEFINED PYBIND11_MASTER_PROJECT)
+ return()
+endif()
+
+# Extract project version from source
+file(STRINGS "${CMAKE_CURRENT_SOURCE_DIR}/include/pybind11/detail/common.h"
+ pybind11_version_defines REGEX "#define PYBIND11_VERSION_(MAJOR|MINOR|PATCH) ")
+
+foreach(ver ${pybind11_version_defines})
+ if(ver MATCHES [[#define PYBIND11_VERSION_(MAJOR|MINOR|PATCH) +([^ ]+)$]])
+ set(PYBIND11_VERSION_${CMAKE_MATCH_1} "${CMAKE_MATCH_2}")
+ endif()
+endforeach()
+
+if(PYBIND11_VERSION_PATCH MATCHES [[\.([a-zA-Z0-9]+)$]])
+ set(pybind11_VERSION_TYPE "${CMAKE_MATCH_1}")
+endif()
+string(REGEX MATCH "^[0-9]+" PYBIND11_VERSION_PATCH "${PYBIND11_VERSION_PATCH}")
+
+project(
+ pybind11
+ LANGUAGES CXX
+ VERSION "${PYBIND11_VERSION_MAJOR}.${PYBIND11_VERSION_MINOR}.${PYBIND11_VERSION_PATCH}")
+
+# Standard includes
+include(GNUInstallDirs)
+include(CMakePackageConfigHelpers)
+include(CMakeDependentOption)
+
+if(NOT pybind11_FIND_QUIETLY)
+ message(STATUS "pybind11 v${pybind11_VERSION} ${pybind11_VERSION_TYPE}")
+endif()
+
+# Check if pybind11 is being used directly or via add_subdirectory
+if(CMAKE_SOURCE_DIR STREQUAL PROJECT_SOURCE_DIR)
+ ### Warn if not an out-of-source build
+ if(CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_CURRENT_BINARY_DIR)
+ set(lines
+ "You are building in-place. If that is not what you intended to "
+ "do, you can clean the source directory with:\n"
+ "rm -r CMakeCache.txt CMakeFiles/ cmake_uninstall.cmake pybind11Config.cmake "
+ "pybind11ConfigVersion.cmake tests/CMakeFiles/\n")
+ message(AUTHOR_WARNING ${lines})
+ endif()
+
+ set(PYBIND11_MASTER_PROJECT ON)
+
+ if(OSX AND CMAKE_VERSION VERSION_LESS 3.7)
+ # Bug in macOS CMake < 3.7 is unable to download catch
+ message(WARNING "CMAKE 3.7+ needed on macOS to download catch, and newer HIGHLY recommended")
+ elseif(WINDOWS AND CMAKE_VERSION VERSION_LESS 3.8)
+ # Only tested with 3.8+ in CI.
+ message(WARNING "CMAKE 3.8+ tested on Windows, previous versions untested")
+ endif()
+
+ message(STATUS "CMake ${CMAKE_VERSION}")
+
+ if(CMAKE_CXX_STANDARD)
+ set(CMAKE_CXX_EXTENSIONS OFF)
+ set(CMAKE_CXX_STANDARD_REQUIRED ON)
+ endif()
+
+ set(pybind11_system "")
+
+ set_property(GLOBAL PROPERTY USE_FOLDERS ON)
+else()
+ set(PYBIND11_MASTER_PROJECT OFF)
+ set(pybind11_system SYSTEM)
+endif()
+
+# Options
+option(PYBIND11_INSTALL "Install pybind11 header files?" ${PYBIND11_MASTER_PROJECT})
+option(PYBIND11_TEST "Build pybind11 test suite?" ${PYBIND11_MASTER_PROJECT})
+option(PYBIND11_NOPYTHON "Disable search for Python" OFF)
+set(PYBIND11_INTERNALS_VERSION
+ ""
+ CACHE STRING "Override the ABI version, may be used to enable the unstable ABI.")
+
+cmake_dependent_option(
+ USE_PYTHON_INCLUDE_DIR
+ "Install pybind11 headers in Python include directory instead of default installation prefix"
+ OFF "PYBIND11_INSTALL" OFF)
+
+cmake_dependent_option(PYBIND11_FINDPYTHON "Force new FindPython" OFF
+ "NOT CMAKE_VERSION VERSION_LESS 3.12" OFF)
+
+# NB: when adding a header don't forget to also add it to setup.py
+set(PYBIND11_HEADERS
+ include/pybind11/detail/class.h
+ include/pybind11/detail/common.h
+ include/pybind11/detail/descr.h
+ include/pybind11/detail/init.h
+ include/pybind11/detail/internals.h
+ include/pybind11/detail/type_caster_base.h
+ include/pybind11/detail/typeid.h
+ include/pybind11/attr.h
+ include/pybind11/buffer_info.h
+ include/pybind11/cast.h
+ include/pybind11/chrono.h
+ include/pybind11/common.h
+ include/pybind11/complex.h
+ include/pybind11/options.h
+ include/pybind11/eigen.h
+ include/pybind11/embed.h
+ include/pybind11/eval.h
+ include/pybind11/gil.h
+ include/pybind11/iostream.h
+ include/pybind11/functional.h
+ include/pybind11/numpy.h
+ include/pybind11/operators.h
+ include/pybind11/pybind11.h
+ include/pybind11/pytypes.h
+ include/pybind11/stl.h
+ include/pybind11/stl_bind.h
+ include/pybind11/stl/filesystem.h)
+
+# Compare with grep and warn if mismatched
+if(PYBIND11_MASTER_PROJECT AND NOT CMAKE_VERSION VERSION_LESS 3.12)
+ file(
+ GLOB_RECURSE _pybind11_header_check
+ LIST_DIRECTORIES false
+ RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
+ CONFIGURE_DEPENDS "include/pybind11/*.h")
+ set(_pybind11_here_only ${PYBIND11_HEADERS})
+ set(_pybind11_disk_only ${_pybind11_header_check})
+ list(REMOVE_ITEM _pybind11_here_only ${_pybind11_header_check})
+ list(REMOVE_ITEM _pybind11_disk_only ${PYBIND11_HEADERS})
+ if(_pybind11_here_only)
+ message(AUTHOR_WARNING "PYBIND11_HEADERS has extra files:" ${_pybind11_here_only})
+ endif()
+ if(_pybind11_disk_only)
+ message(AUTHOR_WARNING "PYBIND11_HEADERS is missing files:" ${_pybind11_disk_only})
+ endif()
+endif()
+
+# CMake 3.12 added list(TRANSFORM PREPEND
+# But we can't use it yet
+string(REPLACE "include/" "${CMAKE_CURRENT_SOURCE_DIR}/include/" PYBIND11_HEADERS
+ "${PYBIND11_HEADERS}")
+
+# Cache variable so this can be used in parent projects
+set(pybind11_INCLUDE_DIR
+ "${CMAKE_CURRENT_LIST_DIR}/include"
+ CACHE INTERNAL "Directory where pybind11 headers are located")
+
+# Backward compatible variable for add_subdirectory mode
+if(NOT PYBIND11_MASTER_PROJECT)
+ set(PYBIND11_INCLUDE_DIR
+ "${pybind11_INCLUDE_DIR}"
+ CACHE INTERNAL "")
+endif()
+
+# Note: when creating targets, you cannot use if statements at configure time -
+# you need generator expressions, because those will be placed in the target file.
+# You can also place ifs *in* the Config.in, but not here.
+
+# This section builds targets, but does *not* touch Python
+# Non-IMPORT targets cannot be defined twice
+if(NOT TARGET pybind11_headers)
+ # Build the headers-only target (no Python included):
+ # (long name used here to keep this from clashing in subdirectory mode)
+ add_library(pybind11_headers INTERFACE)
+ add_library(pybind11::pybind11_headers ALIAS pybind11_headers) # to match exported target
+ add_library(pybind11::headers ALIAS pybind11_headers) # easier to use/remember
+
+ target_include_directories(
+ pybind11_headers ${pybind11_system} INTERFACE $<BUILD_INTERFACE:${pybind11_INCLUDE_DIR}>
+ $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>)
+
+ target_compile_features(pybind11_headers INTERFACE cxx_inheriting_constructors cxx_user_literals
+ cxx_right_angle_brackets)
+ if(NOT "${PYBIND11_INTERNALS_VERSION}" STREQUAL "")
+ target_compile_definitions(
+ pybind11_headers INTERFACE "PYBIND11_INTERNALS_VERSION=${PYBIND11_INTERNALS_VERSION}")
+ endif()
+else()
+ # It is invalid to install a target twice, too.
+ set(PYBIND11_INSTALL OFF)
+endif()
+
+include("${CMAKE_CURRENT_SOURCE_DIR}/tools/pybind11Common.cmake")
+
+# Relative directory setting
+if(USE_PYTHON_INCLUDE_DIR AND DEFINED Python_INCLUDE_DIRS)
+ file(RELATIVE_PATH CMAKE_INSTALL_INCLUDEDIR ${CMAKE_INSTALL_PREFIX} ${Python_INCLUDE_DIRS})
+elseif(USE_PYTHON_INCLUDE_DIR AND DEFINED PYTHON_INCLUDE_DIR)
+ file(RELATIVE_PATH CMAKE_INSTALL_INCLUDEDIR ${CMAKE_INSTALL_PREFIX} ${PYTHON_INCLUDE_DIRS})
+endif()
+
+if(PYBIND11_INSTALL)
+ install(DIRECTORY ${pybind11_INCLUDE_DIR}/pybind11 DESTINATION ${CMAKE_INSTALL_INCLUDEDIR})
+ set(PYBIND11_CMAKECONFIG_INSTALL_DIR
+ "${CMAKE_INSTALL_DATAROOTDIR}/cmake/${PROJECT_NAME}"
+ CACHE STRING "install path for pybind11Config.cmake")
+
+ if(IS_ABSOLUTE "${CMAKE_INSTALL_INCLUDEDIR}")
+ set(pybind11_INCLUDEDIR "${CMAKE_INSTALL_FULL_INCLUDEDIR}")
+ else()
+ set(pybind11_INCLUDEDIR "\$\{PACKAGE_PREFIX_DIR\}/${CMAKE_INSTALL_INCLUDEDIR}")
+ endif()
+
+ configure_package_config_file(
+ tools/${PROJECT_NAME}Config.cmake.in "${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}Config.cmake"
+ INSTALL_DESTINATION ${PYBIND11_CMAKECONFIG_INSTALL_DIR})
+
+ if(CMAKE_VERSION VERSION_LESS 3.14)
+ # Remove CMAKE_SIZEOF_VOID_P from ConfigVersion.cmake since the library does
+ # not depend on architecture specific settings or libraries.
+ set(_PYBIND11_CMAKE_SIZEOF_VOID_P ${CMAKE_SIZEOF_VOID_P})
+ unset(CMAKE_SIZEOF_VOID_P)
+
+ write_basic_package_version_file(
+ ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake
+ VERSION ${PROJECT_VERSION}
+ COMPATIBILITY AnyNewerVersion)
+
+ set(CMAKE_SIZEOF_VOID_P ${_PYBIND11_CMAKE_SIZEOF_VOID_P})
+ else()
+ # CMake 3.14+ natively supports header-only libraries
+ write_basic_package_version_file(
+ ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake
+ VERSION ${PROJECT_VERSION}
+ COMPATIBILITY AnyNewerVersion ARCH_INDEPENDENT)
+ endif()
+
+ install(
+ FILES ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}Config.cmake
+ ${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake
+ tools/FindPythonLibsNew.cmake
+ tools/pybind11Common.cmake
+ tools/pybind11Tools.cmake
+ tools/pybind11NewTools.cmake
+ DESTINATION ${PYBIND11_CMAKECONFIG_INSTALL_DIR})
+
+ if(NOT PYBIND11_EXPORT_NAME)
+ set(PYBIND11_EXPORT_NAME "${PROJECT_NAME}Targets")
+ endif()
+
+ install(TARGETS pybind11_headers EXPORT "${PYBIND11_EXPORT_NAME}")
+
+ install(
+ EXPORT "${PYBIND11_EXPORT_NAME}"
+ NAMESPACE "pybind11::"
+ DESTINATION ${PYBIND11_CMAKECONFIG_INSTALL_DIR})
+
+ # Uninstall target
+ if(PYBIND11_MASTER_PROJECT)
+ configure_file("${CMAKE_CURRENT_SOURCE_DIR}/tools/cmake_uninstall.cmake.in"
+ "${CMAKE_CURRENT_BINARY_DIR}/cmake_uninstall.cmake" IMMEDIATE @ONLY)
+
+ add_custom_target(uninstall COMMAND ${CMAKE_COMMAND} -P
+ ${CMAKE_CURRENT_BINARY_DIR}/cmake_uninstall.cmake)
+ endif()
+endif()
+
+# BUILD_TESTING takes priority, but only if this is the master project
+if(PYBIND11_MASTER_PROJECT AND DEFINED BUILD_TESTING)
+ if(BUILD_TESTING)
+ if(_pybind11_nopython)
+ message(FATAL_ERROR "Cannot activate tests in NOPYTHON mode")
+ else()
+ add_subdirectory(tests)
+ endif()
+ endif()
+else()
+ if(PYBIND11_TEST)
+ if(_pybind11_nopython)
+ message(FATAL_ERROR "Cannot activate tests in NOPYTHON mode")
+ else()
+ add_subdirectory(tests)
+ endif()
+ endif()
+endif()
+
+# Better symmetry with find_package(pybind11 CONFIG) mode.
+if(NOT PYBIND11_MASTER_PROJECT)
+ set(pybind11_FOUND
+ TRUE
+ CACHE INTERNAL "True if pybind11 and all required components found on the system")
+endif()
diff --git a/third-party/DPVO/DPRetrieval/pybind11/LICENSE b/third-party/DPVO/DPRetrieval/pybind11/LICENSE
new file mode 100644
index 0000000000000000000000000000000000000000..46b7deda3da0daac2b9c2547237aa38cf449f6b1
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/LICENSE
@@ -0,0 +1,29 @@
+Copyright (c) 2016 Wenzel Jakob <wenzel.jakob@epfl.ch>, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+ list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+Please also refer to the file .github/CONTRIBUTING.md, which clarifies licensing of
+external contributions to this project including patches, pull requests, etc.
diff --git a/third-party/DPVO/DPRetrieval/pybind11/MANIFEST.in b/third-party/DPVO/DPRetrieval/pybind11/MANIFEST.in
new file mode 100644
index 0000000000000000000000000000000000000000..f2502e15cc831c1909d7bcd5a1b3b86936d70a83
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/MANIFEST.in
@@ -0,0 +1,6 @@
+recursive-include pybind11/include/pybind11 *.h
+recursive-include pybind11 *.py
+recursive-include pybind11 py.typed
+recursive-include pybind11 *.pyi
+include pybind11/share/cmake/pybind11/*.cmake
+include LICENSE README.rst pyproject.toml setup.py setup.cfg
diff --git a/third-party/DPVO/DPRetrieval/pybind11/README.rst b/third-party/DPVO/DPRetrieval/pybind11/README.rst
new file mode 100644
index 0000000000000000000000000000000000000000..0f0ed4af2b96abe1dc0e15655bbb0aace117e351
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/README.rst
@@ -0,0 +1,180 @@
+.. figure:: https://github.com/pybind/pybind11/raw/master/docs/pybind11-logo.png
+ :alt: pybind11 logo
+
+**pybind11 — Seamless operability between C++11 and Python**
+
+|Latest Documentation Status| |Stable Documentation Status| |Gitter chat| |GitHub Discussions| |CI| |Build status|
+
+|Repology| |PyPI package| |Conda-forge| |Python Versions|
+
+`Setuptools example <https://github.com/pybind/python_example>`_
+• `Scikit-build example <https://github.com/pybind/scikit_build_example>`_
+• `CMake example <https://github.com/pybind/cmake_example>`_
+
+.. start
+
+
+**pybind11** is a lightweight header-only library that exposes C++ types
+in Python and vice versa, mainly to create Python bindings of existing
+C++ code. Its goals and syntax are similar to the excellent
+`Boost.Python `_
+library by David Abrahams: to minimize boilerplate code in traditional
+extension modules by inferring type information using compile-time
+introspection.
+
+The main issue with Boost.Python—and the reason for creating such a
+similar project—is Boost. Boost is an enormously large and complex suite
+of utility libraries that works with almost every C++ compiler in
+existence. This compatibility has its cost: arcane template tricks and
+workarounds are necessary to support the oldest and buggiest of compiler
+specimens. Now that C++11-compatible compilers are widely available,
+this heavy machinery has become an excessively large and unnecessary
+dependency.
+
+Think of this library as a tiny self-contained version of Boost.Python
+with everything stripped away that isn’t relevant for binding
+generation. Without comments, the core header files only require ~4K
+lines of code and depend on Python (2.7 or 3.5+, or PyPy) and the C++
+standard library. This compact implementation was possible thanks to
+some of the new C++11 language features (specifically: tuples, lambda
+functions and variadic templates). Since its creation, this library has
+grown beyond Boost.Python in many ways, leading to dramatically simpler
+binding code in many common situations.
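+
+A trivially small sketch (the ``example`` module and ``add`` function below
+are only illustrative names) gives a feel for the resulting binding code:
+
+.. code-block:: cpp
+
+ #include <pybind11/pybind11.h>
+
+ int add(int i, int j) { return i + j; }
+
+ PYBIND11_MODULE(example, m) {
+ // Expose the free function to Python as example.add
+ m.def("add", &add, "A function that adds two numbers");
+ }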
+
+Tutorial and reference documentation is provided at
+`pybind11.readthedocs.io <https://pybind11.readthedocs.io/en/stable>`_.
+A PDF version of the manual is available
+`here `_.
+And the source code is always available at
+`github.com/pybind/pybind11 <https://github.com/pybind/pybind11>`_.
+
+
+Core features
+-------------
+
+
+pybind11 can map the following core C++ features to Python:
+
+- Functions accepting and returning custom data structures per value,
+ reference, or pointer
+- Instance methods and static methods
+- Overloaded functions
+- Instance attributes and static attributes
+- Arbitrary exception types
+- Enumerations
+- Callbacks
+- Iterators and ranges
+- Custom operators
+- Single and multiple inheritance
+- STL data structures
+- Smart pointers with reference counting like ``std::shared_ptr``
+- Internal references with correct reference counting
+- C++ classes with virtual (and pure virtual) methods can be extended
+ in Python
+
+Goodies
+-------
+
+In addition to the core functionality, pybind11 provides some extra
+goodies:
+
+- Python 2.7, 3.5+, and PyPy/PyPy3 7.3 are supported with an
+ implementation-agnostic interface.
+
+- It is possible to bind C++11 lambda functions with captured
+ variables. The lambda capture data is stored inside the resulting
+ Python function object.
+
+- pybind11 uses C++11 move constructors and move assignment operators
+ whenever possible to efficiently transfer custom data types.
+
+- It’s easy to expose the internal storage of custom data types through
+ Pythons’ buffer protocols. This is handy e.g. for fast conversion
+ between C++ matrix classes like Eigen and NumPy without expensive
+ copy operations.
+
+- pybind11 can automatically vectorize functions so that they are
+ transparently applied to all entries of one or more NumPy array
+ arguments.
+
+- Python's slice-based access and assignment operations can be
+ supported with just a few lines of code.
+
+- Everything is contained in just a few header files; there is no need
+ to link against any additional libraries.
+
+- Binaries are generally smaller by a factor of at least 2 compared to
+ equivalent bindings generated by Boost.Python. A recent pybind11
+ conversion of PyRosetta, an enormous Boost.Python binding project,
+ `reported `_
+ a binary size reduction of **5.4x** and compile time reduction by
+ **5.8x**.
+
+- Function signatures are precomputed at compile time (using
+ ``constexpr``), leading to smaller binaries.
+
+- With little extra effort, C++ types can be pickled and unpickled
+ similar to regular Python objects.
+
+Supported compilers
+-------------------
+
+1. Clang/LLVM 3.3 or newer (for Apple Xcode’s clang, this is 5.0.0 or
+ newer)
+2. GCC 4.8 or newer
+3. Microsoft Visual Studio 2015 Update 3 or newer
+4. Intel classic C++ compiler 18 or newer (ICC 20.2 tested in CI)
+5. Cygwin/GCC (previously tested on 2.5.1)
+6. NVCC (CUDA 11.0 tested in CI)
+7. NVIDIA PGI (20.9 tested in CI)
+
+About
+-----
+
+This project was created by `Wenzel
+Jakob <https://rgl.epfl.ch/people/wjakob>`_. Significant features and/or
+improvements to the code were contributed by Jonas Adler, Lori A. Burns,
+Sylvain Corlay, Eric Cousineau, Aaron Gokaslan, Ralf Grosse-Kunstleve, Trent Houliston, Axel
+Huebl, @hulucc, Yannick Jadoul, Sergey Lyskov, Johan Mabille, Tomasz Miąsko,
+Dean Moldovan, Ben Pritchard, Jason Rhinelander, Boris Schäling, Pim
+Schellart, Henry Schreiner, Ivan Smirnov, Boris Staletic, and Patrick Stewart.
+
+We thank Google for a generous financial contribution to the continuous
+integration infrastructure used by this project.
+
+
+Contributing
+~~~~~~~~~~~~
+
+See the `contributing
+guide <https://github.com/pybind/pybind11/blob/master/.github/CONTRIBUTING.md>`_
+for information on building and contributing to pybind11.
+
+License
+~~~~~~~
+
+pybind11 is provided under a BSD-style license that can be found in the
+`LICENSE <https://github.com/pybind/pybind11/blob/master/LICENSE>`_
+file. By using, distributing, or contributing to this project, you agree
+to the terms and conditions of this license.
+
+.. |Latest Documentation Status| image:: https://readthedocs.org/projects/pybind11/badge?version=latest
+ :target: http://pybind11.readthedocs.org/en/latest
+.. |Stable Documentation Status| image:: https://img.shields.io/badge/docs-stable-blue.svg
+ :target: http://pybind11.readthedocs.org/en/stable
+.. |Gitter chat| image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
+ :target: https://gitter.im/pybind/Lobby
+.. |CI| image:: https://github.com/pybind/pybind11/workflows/CI/badge.svg
+ :target: https://github.com/pybind/pybind11/actions
+.. |Build status| image:: https://ci.appveyor.com/api/projects/status/riaj54pn4h08xy40?svg=true
+ :target: https://ci.appveyor.com/project/wjakob/pybind11
+.. |PyPI package| image:: https://img.shields.io/pypi/v/pybind11.svg
+ :target: https://pypi.org/project/pybind11/
+.. |Conda-forge| image:: https://img.shields.io/conda/vn/conda-forge/pybind11.svg
+ :target: https://github.com/conda-forge/pybind11-feedstock
+.. |Repology| image:: https://repology.org/badge/latest-versions/python:pybind11.svg
+ :target: https://repology.org/project/python:pybind11/versions
+.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/pybind11.svg
+ :target: https://pypi.org/project/pybind11/
+.. |GitHub Discussions| image:: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
+ :target: https://github.com/pybind/pybind11/discussions
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/Doxyfile b/third-party/DPVO/DPRetrieval/pybind11/docs/Doxyfile
new file mode 100644
index 0000000000000000000000000000000000000000..3dcbc5bfa5bfe77e2325e33a40c3bce720618fcb
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/Doxyfile
@@ -0,0 +1,22 @@
+PROJECT_NAME = pybind11
+INPUT = ../include/pybind11/
+RECURSIVE = YES
+
+GENERATE_HTML = NO
+GENERATE_LATEX = NO
+GENERATE_XML = YES
+XML_OUTPUT = .build/doxygenxml
+XML_PROGRAMLISTING = YES
+
+MACRO_EXPANSION = YES
+EXPAND_ONLY_PREDEF = YES
+EXPAND_AS_DEFINED = PYBIND11_RUNTIME_EXCEPTION
+
+ALIASES = "rst=\verbatim embed:rst"
+ALIASES += "endrst=\endverbatim"
+
+QUIET = YES
+WARNINGS = YES
+WARN_IF_UNDOCUMENTED = NO
+PREDEFINED = PY_MAJOR_VERSION=3 \
+ PYBIND11_NOINLINE
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/_static/theme_overrides.css b/third-party/DPVO/DPRetrieval/pybind11/docs/_static/theme_overrides.css
new file mode 100644
index 0000000000000000000000000000000000000000..8021731bdee9066aea8ebd1a32ab0e0b569a231d
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/_static/theme_overrides.css
@@ -0,0 +1,11 @@
+.wy-table-responsive table td,
+.wy-table-responsive table th {
+ white-space: initial !important;
+}
+.rst-content table.docutils td {
+ vertical-align: top !important;
+}
+div[class^='highlight'] pre {
+ white-space: pre;
+ white-space: pre-wrap;
+}
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/chrono.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/chrono.rst
new file mode 100644
index 0000000000000000000000000000000000000000..6adcc9fd3fa7c65bac94ff584a3188bf98490feb
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/chrono.rst
@@ -0,0 +1,81 @@
+Chrono
+======
+
+When including the additional header file :file:`pybind11/chrono.h` conversions
+from C++11 chrono datatypes to python datetime objects are automatically enabled.
+This header also enables conversions of python floats (often from sources such
+as ``time.monotonic()``, ``time.perf_counter()`` and ``time.process_time()``)
+into durations.
+
+An overview of clocks in C++11
+------------------------------
+
+A point of confusion when using these conversions is the differences between
+clocks provided in C++11. There are three clock types defined by the C++11
+standard and users can define their own if needed. Each of these clocks have
+different properties and when converting to and from python will give different
+results.
+
+The first clock defined by the standard is ``std::chrono::system_clock``. This
+clock measures the current date and time. However, this clock changes with
+updates to the operating system time. For example, if your time is synchronised
+with a time server this clock will change. This makes this clock a poor choice
+for timing purposes but good for measuring the wall time.
+
+The second clock defined in the standard is ``std::chrono::steady_clock``.
+This clock ticks at a steady rate and is never adjusted. This makes it excellent
+for timing purposes, however the value in this clock does not correspond to the
+current date and time. Often this clock will be the amount of time your system
+has been on, although it does not have to be. This clock will never be the same
+clock as the system clock as the system clock can change but steady clocks
+cannot.
+
+The third clock defined in the standard is ``std::chrono::high_resolution_clock``.
+This clock is the clock that has the highest resolution out of the clocks in the
+system. It is normally a typedef to either the system clock or the steady clock
+but can be its own independent clock. This is important as when using these
+conversions as the types you get in python for this clock might be different
+depending on the system.
+If it is a typedef of the system clock, python will get datetime objects, but if
+it is a different clock they will be timedelta objects.
+
+Provided conversions
+--------------------
+
+.. rubric:: C++ to Python
+
+- ``std::chrono::system_clock::time_point`` → ``datetime.datetime``
+ System clock times are converted to python datetime instances. They are
+ in the local timezone, but do not have any timezone information attached
+ to them (they are naive datetime objects).
+
+- ``std::chrono::duration`` → ``datetime.timedelta``
+ Durations are converted to timedeltas, any precision in the duration
+ greater than microseconds is lost by rounding towards zero.
+
+- ``std::chrono::[other_clocks]::time_point`` → ``datetime.timedelta``
+ Any clock time that is not the system clock is converted to a time delta.
+ This timedelta measures the time from the clocks epoch to now.
+
+.. rubric:: Python to C++
+
+- ``datetime.datetime`` or ``datetime.date`` or ``datetime.time`` → ``std::chrono::system_clock::time_point``
+ Date/time objects are converted into system clock timepoints. Any
+ timezone information is ignored and the type is treated as a naive
+ object.
+
+- ``datetime.timedelta`` → ``std::chrono::duration``
+ Time delta are converted into durations with microsecond precision.
+
+- ``datetime.timedelta`` → ``std::chrono::[other_clocks]::time_point``
+ Time deltas that are converted into clock timepoints are treated as
+ the amount of time from the start of the clocks epoch.
+
+- ``float`` → ``std::chrono::duration``
+ Floats that are passed to C++ as durations will be interpreted as a number of
+ seconds. These will be converted to the duration using ``duration_cast``
+ from the float.
+
+- ``float`` → ``std::chrono::[other_clocks]::time_point``
+ Floats that are passed to C++ as time points will be interpreted as the
+ number of seconds from the start of the clocks epoch.
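+
+As a small illustration (the function and module names here are only a
+sketch), bindings that exercise these conversions could look like:
+
+.. code-block:: cpp
+
+ #include <pybind11/chrono.h>
+
+ PYBIND11_MODULE(clocks, m) {
+ // Returned as a naive datetime.datetime in the local timezone
+ m.def("now", []() { return std::chrono::system_clock::now(); });
+ // Accepts a float (seconds) or a datetime.timedelta; returns a timedelta
+ m.def("half", [](std::chrono::duration<double> d) { return d / 2; });
+ }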
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/custom.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/custom.rst
new file mode 100644
index 0000000000000000000000000000000000000000..fee178a0d0d24e7e9d8d9a84bfc9013f65181f84
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/custom.rst
@@ -0,0 +1,93 @@
+Custom type casters
+===================
+
+In very rare cases, applications may require custom type casters that cannot be
+expressed using the abstractions provided by pybind11, thus requiring raw
+Python C API calls. This is fairly advanced usage and should only be pursued by
+experts who are familiar with the intricacies of Python reference counting.
+
+The following snippets demonstrate how this works for a very simple ``inty``
+type that should be convertible from Python types that provide a
+``__int__(self)`` method.
+
+.. code-block:: cpp
+
+ struct inty { long long_value; };
+
+ void print(inty s) {
+ std::cout << s.long_value << std::endl;
+ }
+
+The following Python snippet demonstrates the intended usage from the Python side:
+
+.. code-block:: python
+
+ class A:
+ def __int__(self):
+ return 123
+
+
+ from example import print
+
+ print(A())
+
+To register the necessary conversion routines, it is necessary to add an
+instantiation of the ``pybind11::detail::type_caster<T>`` template.
+Although this is an implementation detail, adding an instantiation of this
+type is explicitly allowed.
+
+.. code-block:: cpp
+
+ namespace pybind11 { namespace detail {
+ template <> struct type_caster<inty> {
+ public:
+ /**
+ * This macro establishes the name 'inty' in
+ * function signatures and declares a local variable
+ * 'value' of type inty
+ */
+ PYBIND11_TYPE_CASTER(inty, const_name("inty"));
+
+ /**
+ * Conversion part 1 (Python->C++): convert a PyObject into a inty
+ * instance or return false upon failure. The second argument
+ * indicates whether implicit conversions should be applied.
+ */
+ bool load(handle src, bool) {
+ /* Extract PyObject from handle */
+ PyObject *source = src.ptr();
+ /* Try converting into a Python integer value */
+ PyObject *tmp = PyNumber_Long(source);
+ if (!tmp)
+ return false;
+ /* Now try to convert into a C++ int */
+ value.long_value = PyLong_AsLong(tmp);
+ Py_DECREF(tmp);
+ /* Ensure return code was OK (to avoid out-of-range errors etc) */
+ return !(value.long_value == -1 && !PyErr_Occurred());
+ }
+
+ /**
+ * Conversion part 2 (C++ -> Python): convert an inty instance into
+ * a Python object. The second and third arguments are used to
+ * indicate the return value policy and parent object (for
+ * ``return_value_policy::reference_internal``) and are generally
+ * ignored by implicit casters.
+ */
+ static handle cast(inty src, return_value_policy /* policy */, handle /* parent */) {
+ return PyLong_FromLong(src.long_value);
+ }
+ };
+ }} // namespace pybind11::detail
+
+.. note::
+
+ A ``type_caster`` defined with ``PYBIND11_TYPE_CASTER(T, ...)`` requires
+ that ``T`` is default-constructible (``value`` is first default constructed
+ and then ``load()`` assigns to it).
+
+.. warning::
+
+ When using custom type casters, it's important to declare them consistently
+ in every compilation unit of the Python extension module. Otherwise,
+ undefined behavior can ensue.
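+
+With the caster in place, binding ``print`` itself needs nothing special; a
+minimal sketch (reusing the ``example`` module name from the Python snippet
+above) would be:
+
+.. code-block:: cpp
+
+ PYBIND11_MODULE(example, m) {
+ // Accepts any Python object that provides __int__(self)
+ m.def("print", &print);
+ }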
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/eigen.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/eigen.rst
new file mode 100644
index 0000000000000000000000000000000000000000..0e7fdb169019497e2e5226492f92483f656c3bb3
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/eigen.rst
@@ -0,0 +1,310 @@
+Eigen
+#####
+
+`Eigen <http://eigen.tuxfamily.org>`_ is a C++ header-based library for dense and
+sparse linear algebra. Due to its popularity and widespread adoption, pybind11
+provides transparent conversion and limited mapping support between Eigen and
+Scientific Python linear algebra data types.
+
+To enable the built-in Eigen support you must include the optional header file
+:file:`pybind11/eigen.h`.
+
+Pass-by-value
+=============
+
+When binding a function with ordinary Eigen dense object arguments (for
+example, ``Eigen::MatrixXd``), pybind11 will accept any input value that is
+already (or convertible to) a ``numpy.ndarray`` with dimensions compatible with
+the Eigen type, copy its values into a temporary Eigen variable of the
+appropriate type, then call the function with this temporary variable.
+
+Sparse matrices are similarly copied to or from
+``scipy.sparse.csr_matrix``/``scipy.sparse.csc_matrix`` objects.
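+
+As a short sketch (the function and module names are only illustrative), a
+pass-by-value binding looks like any other pybind11 binding; the copy happens
+in the generated conversion code rather than in the function itself:
+
+.. code-block:: cpp
+
+ #include <pybind11/eigen.h>
+
+ // Each call copies the numpy input into a temporary Eigen::MatrixXd
+ double sum_matrix(Eigen::MatrixXd mat) { return mat.sum(); }
+
+ PYBIND11_MODULE(example, m) {
+ m.def("sum_matrix", &sum_matrix);
+ }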
+
+Pass-by-reference
+=================
+
+One major limitation of the above is that every data conversion implicitly
+involves a copy, which can be both expensive (for large matrices) and disallows
+binding functions that change their (Matrix) arguments. Pybind11 allows you to
+work around this by using Eigen's ``Eigen::Ref`` class much as you
+would when writing a function taking a generic type in Eigen itself (subject to
+some limitations discussed below).
+
+When calling a bound function accepting a ``Eigen::Ref<const MatrixType>``
+type, pybind11 will attempt to avoid copying by using an ``Eigen::Map`` object
+that maps into the source ``numpy.ndarray`` data: this requires both that the
+data types are the same (e.g. ``dtype='float64'`` and ``MatrixType::Scalar`` is
+``double``); and that the storage is layout compatible. The latter limitation
+is discussed in detail in the section below, and requires careful
+consideration: by default, numpy matrices and Eigen matrices are *not* storage
+compatible.
+
+If the numpy matrix cannot be used as is (either because its types differ, e.g.
+passing an array of integers to an Eigen parameter requiring doubles, or
+because the storage is incompatible), pybind11 makes a temporary copy and
+passes the copy instead.
+
+When a bound function parameter is instead ``Eigen::Ref<MatrixType>`` (note the
+lack of ``const``), pybind11 will only allow the function to be called if it
+can be mapped *and* if the numpy array is writeable (that is
+``a.flags.writeable`` is true). Any access (including modification) made to
+the passed variable will be transparently carried out directly on the
+``numpy.ndarray``.
+
+This means you can write code such as the following and have it work as
+expected:
+
+.. code-block:: cpp
+
+ void scale_by_2(Eigen::Ref<Eigen::VectorXd> v) {
+ v *= 2;
+ }
+
+Note, however, that you will likely run into limitations due to numpy and
+Eigen's differing default storage orders for data; see the below section on
+:ref:`storage_orders` for details on how to bind code that won't run into such
+limitations.
+
+.. note::
+
+ Passing by reference is not supported for sparse types.
+
+Returning values to Python
+==========================
+
+When returning an ordinary dense Eigen matrix type to numpy (e.g.
+``Eigen::MatrixXd`` or ``Eigen::RowVectorXf``) pybind11 keeps the matrix and
+returns a numpy array that directly references the Eigen matrix: no copy of the
+data is performed. The numpy array will have ``array.flags.owndata`` set to
+``False`` to indicate that it does not own the data, and the lifetime of the
+stored Eigen matrix will be tied to the returned ``array``.
+
+If you bind a function with a non-reference, ``const`` return type (e.g.
+``const Eigen::MatrixXd``), the same thing happens except that pybind11 also
+sets the numpy array's ``writeable`` flag to false.
+
+If you return an lvalue reference or pointer, the usual pybind11 rules apply,
+as dictated by the binding function's return value policy (see the
+documentation on :ref:`return_value_policies` for full details). That means,
+without an explicit return value policy, lvalue references will be copied and
+pointers will be managed by pybind11. In order to avoid copying, you should
+explicitly specify an appropriate return value policy, as in the following
+example:
+
+.. code-block:: cpp
+
+ class MyClass {
+ Eigen::MatrixXd big_mat = Eigen::MatrixXd::Zero(10000, 10000);
+ public:
+ Eigen::MatrixXd &getMatrix() { return big_mat; }
+ const Eigen::MatrixXd &viewMatrix() { return big_mat; }
+ };
+
+ // Later, in binding code:
+ py::class_(m, "MyClass")
+ .def(py::init<>())
+ .def("copy_matrix", &MyClass::getMatrix) // Makes a copy!
+ .def("get_matrix", &MyClass::getMatrix, py::return_value_policy::reference_internal)
+ .def("view_matrix", &MyClass::viewMatrix, py::return_value_policy::reference_internal)
+ ;
+
+.. code-block:: python
+
+ a = MyClass()
+ m = a.get_matrix() # flags.writeable = True, flags.owndata = False
+ v = a.view_matrix() # flags.writeable = False, flags.owndata = False
+ c = a.copy_matrix() # flags.writeable = True, flags.owndata = True
+ # m[5,6] and v[5,6] refer to the same element, c[5,6] does not.
+
+Note in this example that ``py::return_value_policy::reference_internal`` is
+used to tie the life of the MyClass object to the life of the returned arrays.
+
+You may also return an ``Eigen::Ref``, ``Eigen::Map`` or other map-like Eigen
+object (for example, the return value of ``matrix.block()`` and related
+methods) that map into a dense Eigen type. When doing so, the default
+behaviour of pybind11 is to simply reference the returned data: you must take
+care to ensure that this data remains valid! You may ask pybind11 to
+explicitly *copy* such a return value by using the
+``py::return_value_policy::copy`` policy when binding the function. You may
+also use ``py::return_value_policy::reference_internal`` or a
+``py::keep_alive`` to ensure the data stays valid as long as the returned numpy
+array does.
+
+When returning such a reference or map, pybind11 additionally respects the
+readonly-status of the returned value, marking the numpy array as non-writeable
+if the reference or map was itself read-only.
+
+.. note::
+
+ Sparse types are always copied when returned.
+
+.. _storage_orders:
+
+Storage orders
+==============
+
+Passing arguments via ``Eigen::Ref`` has some limitations that you must be
+aware of in order to effectively pass matrices by reference. First and
+foremost is that the default ``Eigen::Ref<MatrixType>`` class requires
+contiguous storage along columns (for column-major types, the default in Eigen)
+or rows if ``MatrixType`` is specifically an ``Eigen::RowMajor`` storage type.
+The former, Eigen's default, is incompatible with ``numpy``'s default row-major
+storage, and so you will not be able to pass numpy arrays to Eigen by reference
+without making one of two changes.
+
+(Note that this does not apply to vectors (or column or row matrices): for such
+types the "row-major" and "column-major" distinction is meaningless).
+
+The first approach is to change the use of ``Eigen::Ref<MatrixType>`` to the
+more general ``Eigen::Ref<MatrixType, 0, Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>>``
+(or similar type with a fully dynamic stride type in the third template
+argument). Since this is a rather cumbersome type, pybind11 provides a
+``py::EigenDRef<MatrixType>`` type alias for your convenience (along
+with EigenDMap for the equivalent Map, and EigenDStride for just the stride
+type).
+
+This type allows Eigen to map into any arbitrary storage order. This is not
+the default in Eigen for performance reasons: contiguous storage allows
+vectorization that cannot be done when storage is not known to be contiguous at
+compile time. The default ``Eigen::Ref`` stride type allows non-contiguous
+storage along the outer dimension (that is, the rows of a column-major matrix
+or columns of a row-major matrix), but not along the inner dimension.
+
+This type, however, has the added benefit of also being able to map numpy array
+slices. For example, the following (contrived) example uses Eigen with a numpy
+slice to multiply by 2 all coefficients that are both on even rows (0, 2, 4,
+...) and in columns 2, 5, or 8:
+
+.. code-block:: cpp
+
+ m.def("scale", [](py::EigenDRef m, double c) { m *= c; });
+
+.. code-block:: python
+
+ # a = np.array(...)
+ scale_by_2(myarray[0::2, 2:9:3])
+
+The second approach to avoid copying is more intrusive: rearranging the
+underlying data types to not run into the non-contiguous storage problem in the
+first place. In particular, that means using matrices with ``Eigen::RowMajor``
+storage, where appropriate, such as:
+
+.. code-block:: cpp
+
+ using RowMatrixXd = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
+ // Use RowMatrixXd instead of MatrixXd
+
+Now bound functions accepting ``Eigen::Ref<RowMatrixXd>`` arguments will be
+callable with numpy's (default) arrays without involving a copy.
+
+You can, alternatively, change the storage order that numpy arrays use by
+adding the ``order='F'`` option when creating an array:
+
+.. code-block:: python
+
+ myarray = np.array(source, order="F")
+
+Such an object will be passable to a bound function accepting an
+``Eigen::Ref<const Eigen::MatrixXd>`` (or similar column-major Eigen type).
+
+One major caveat with this approach, however, is that it is not entirely as
+easy as simply flipping all Eigen or numpy usage from one to the other: some
+operations may alter the storage order of a numpy array. For example, ``a2 =
+array.transpose()`` results in ``a2`` being a view of ``array`` that references
+the same data, but in the opposite storage order!
+
+While this approach allows fully optimized vectorized calculations in Eigen, it
+cannot be used with array slices, unlike the first approach.
+
+When *returning* a matrix to Python (either a regular matrix, a reference via
+``Eigen::Ref<>``, or a map/block into a matrix), no special storage
+consideration is required: the created numpy array will have the required
+stride that allows numpy to properly interpret the array, whatever its storage
+order.
+
+Failing rather than copying
+===========================
+
+The default behaviour when binding ``Eigen::Ref<const MatrixType>`` Eigen
+references is to copy matrix values when passed a numpy array that does not
+conform to the element type of ``MatrixType`` or does not have a compatible
+stride layout. If you want to explicitly avoid copying in such a case, you
+should bind arguments using the ``py::arg().noconvert()`` annotation (as
+described in the :ref:`nonconverting_arguments` documentation).
+
+The following example shows an example of arguments that don't allow data
+copying to take place:
+
+.. code-block:: cpp
+
+ // The method and function to be bound:
+ class MyClass {
+ // ...
+ double some_method(const Eigen::Ref<const Eigen::MatrixXd> &matrix) { /* ... */ }
+ };
+ float some_function(const Eigen::Ref<const Eigen::MatrixXf> &big,
+ const Eigen::Ref<const Eigen::MatrixXf> &small) {
+ // ...
+ }
+
+ // The associated binding code:
+ using namespace pybind11::literals; // for "arg"_a
+ py::class_<MyClass>(m, "MyClass")
+ // ... other class definitions
+ .def("some_method", &MyClass::some_method, py::arg().noconvert());
+
+ m.def("some_function", &some_function,
+ "big"_a.noconvert(), // <- Don't allow copying for this arg
+ "small"_a // <- This one can be copied if needed
+ );
+
+With the above binding code, attempting to call the ``some_method(m)``
+method on a ``MyClass`` object, or attempting to call ``some_function(m, m2)``
+will raise a ``RuntimeError`` rather than making a temporary copy of the array.
+It will, however, allow the ``m2`` argument to be copied into a temporary if
+necessary.
+
+Note that explicitly specifying ``.noconvert()`` is not required for *mutable*
+Eigen references (e.g. ``Eigen::Ref<MatrixXd>`` without ``const`` on the
+``MatrixXd``): mutable references will never be called with a temporary copy.
+
+Vectors versus column/row matrices
+==================================
+
+Eigen and numpy have fundamentally different notions of a vector. In Eigen, a
+vector is simply a matrix with the number of columns or rows set to 1 at
+compile time (for a column vector or row vector, respectively). NumPy, in
+contrast, has comparable 2-dimensional 1xN and Nx1 arrays, but *also* has
+1-dimensional arrays of size N.
+
+When passing a 2-dimensional 1xN or Nx1 array to Eigen, the Eigen type must
+have matching dimensions: That is, you cannot pass a 2-dimensional Nx1 numpy
+array to an Eigen value expecting a row vector, or a 1xN numpy array as a
+column vector argument.
+
+On the other hand, pybind11 allows you to pass 1-dimensional arrays of length N
+as Eigen parameters. If the Eigen type can hold a column vector of length N it
+will be passed as such a column vector. If not, but the Eigen type constraints
+will accept a row vector, it will be passed as a row vector. (The column
+vector takes precedence when both are supported, for example, when passing a
+1D numpy array to a MatrixXd argument). Note that the type need not be
+explicitly a vector: it is permitted to pass a 1D numpy array of size 5 to an
+Eigen ``Matrix<double, 1, 5>``: you would end up with a 1x5 Eigen matrix.
+Passing the same to an ``Eigen::MatrixXd`` would result in a 5x1 Eigen matrix.
+
+When returning an Eigen vector to numpy, the conversion is ambiguous: a row
+vector of length 4 could be returned as either a 1D array of length 4, or as a
+2D array of size 1x4. When encountering such a situation, pybind11 compromises
+by considering the returned Eigen type: if it is a compile-time vector--that
+is, the type has either the number of rows or columns set to 1 at compile
+time--pybind11 converts to a 1D numpy array when returning the value. For
+instances that are a vector only at run-time (e.g. ``MatrixXd``,
+``Matrix``), pybind11 returns the vector as a 2D array to
+numpy. If this isn't what you want, you can use ``array.reshape(...)`` to get
+a view of the same data in the desired dimensions.
+
+.. seealso::
+
+ The file :file:`tests/test_eigen.cpp` contains a complete example that
+ shows how to pass Eigen sparse and dense data types in more detail.
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/functional.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/functional.rst
new file mode 100644
index 0000000000000000000000000000000000000000..8b301895cca9dc4ab997c32d77a5d8622325a25a
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/functional.rst
@@ -0,0 +1,109 @@
+Functional
+##########
+
+The following features must be enabled by including :file:`pybind11/functional.h`.
+
+
+Callbacks and passing anonymous functions
+=========================================
+
+The C++11 standard brought lambda functions and the generic polymorphic
+function wrapper ``std::function<>`` to the C++ programming language, which
+enable powerful new ways of working with functions. Lambda functions come in
+two flavors: stateless lambda functions resemble classic function pointers that
+link to an anonymous piece of code, while stateful lambda functions
+additionally depend on captured variables that are stored in an anonymous
+*lambda closure object*.
+
+Here is a simple example of a C++ function that takes an arbitrary function
+(stateful or stateless) with signature ``int -> int`` as an argument and runs
+it with the value 10.
+
+.. code-block:: cpp
+
+ int func_arg(const std::function<int(int)> &f) {
+ return f(10);
+ }
+
+The example below is more involved: it takes a function of signature ``int -> int``
+and returns another function of the same kind. The return value is a stateful
+lambda function, which stores the value ``f`` in the capture object and adds 1 to
+its return value upon execution.
+
+.. code-block:: cpp
+
+ std::function<int(int)> func_ret(const std::function<int(int)> &f) {
+ return [f](int i) {
+ return f(i) + 1;
+ };
+ }
+
+This example demonstrates using python named parameters in C++ callbacks which
+requires using ``py::cpp_function`` as a wrapper. Usage is similar to defining
+methods of classes:
+
+.. code-block:: cpp
+
+ py::cpp_function func_cpp() {
+ return py::cpp_function([](int i) { return i+1; },
+ py::arg("number"));
+ }
+
+After including the extra header file :file:`pybind11/functional.h`, it is almost
+trivial to generate binding code for all of these functions.
+
+.. code-block:: cpp
+
+ #include <pybind11/functional.h>
+
+ PYBIND11_MODULE(example, m) {
+ m.def("func_arg", &func_arg);
+ m.def("func_ret", &func_ret);
+ m.def("func_cpp", &func_cpp);
+ }
+
+The following interactive session shows how to call them from Python.
+
+.. code-block:: pycon
+
+ $ python
+ >>> import example
+ >>> def square(i):
+ ... return i * i
+ ...
+ >>> example.func_arg(square)
+ 100L
+ >>> square_plus_1 = example.func_ret(square)
+ >>> square_plus_1(4)
+ 17L
+ >>> plus_1 = func_cpp()
+ >>> plus_1(number=43)
+ 44L
+
+.. warning::
+
+ Keep in mind that passing a function from C++ to Python (or vice versa)
+ will instantiate a piece of wrapper code that translates function
+ invocations between the two languages. Naturally, this translation
+ increases the computational cost of each function call somewhat. A
+ problematic situation can arise when a function is copied back and forth
+ between Python and C++ many times in a row, in which case the underlying
+ wrappers will accumulate correspondingly. The resulting long sequence of
+ C++ -> Python -> C++ -> ... roundtrips can significantly decrease
+ performance.
+
+ There is one exception: pybind11 detects cases where a stateless function
+ (i.e. a function pointer or a lambda function without captured variables)
+ is passed as an argument to another C++ function exposed in Python. In this
+ case, there is no overhead. Pybind11 will extract the underlying C++
+ function pointer from the wrapped function to sidestep a potential C++ ->
+ Python -> C++ roundtrip. This is demonstrated in :file:`tests/test_callbacks.cpp`.
+
+.. note::
+
+ This functionality is very useful when generating bindings for callbacks in
+ C++ libraries (e.g. GUI libraries, asynchronous networking libraries, etc.).
+
+ The file :file:`tests/test_callbacks.cpp` contains a complete example
+ that demonstrates how to work with callbacks and anonymous functions in
+ more detail.
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/index.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..336aed113a16acc963e7587a43b1dae79b07917f
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/index.rst
@@ -0,0 +1,43 @@
+.. _type-conversions:
+
+Type conversions
+################
+
+Apart from enabling cross-language function calls, a fundamental problem
+that a binding tool like pybind11 must address is to provide access to
+native Python types in C++ and vice versa. There are three fundamentally
+different ways to do this—which approach is preferable for a particular type
+depends on the situation at hand.
+
+1. Use a native C++ type everywhere. In this case, the type must be wrapped
+ using pybind11-generated bindings so that Python can interact with it.
+
+2. Use a native Python type everywhere. It will need to be wrapped so that
+ C++ functions can interact with it.
+
+3. Use a native C++ type on the C++ side and a native Python type on the
+ Python side. pybind11 refers to this as a *type conversion*.
+
+ Type conversions are the most "natural" option in the sense that native
+ (non-wrapped) types are used everywhere. The main downside is that a copy
+ of the data must be made on every Python ↔ C++ transition: this is
+ needed since the C++ and Python versions of the same type generally won't
+ have the same memory layout.
+
+ pybind11 can perform many kinds of conversions automatically. An overview
+ is provided in the table ":ref:`conversion_table`".
+
+The following subsections discuss the differences between these options in more
+detail. The main focus in this section is on type conversions, which represent
+the last case of the above list.
+
+.. toctree::
+ :maxdepth: 1
+
+ overview
+ strings
+ stl
+ functional
+ chrono
+ eigen
+ custom
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/overview.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/overview.rst
new file mode 100644
index 0000000000000000000000000000000000000000..7c512a6051b73c4693492efe8f0898efca53490d
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/overview.rst
@@ -0,0 +1,171 @@
+Overview
+########
+
+.. rubric:: 1. Native type in C++, wrapper in Python
+
+Exposing a custom C++ type using :class:`py::class_` was covered in detail
+in the :doc:`/classes` section. There, the underlying data structure is
+always the original C++ class while the :class:`py::class_` wrapper provides
+a Python interface. Internally, when an object like this is sent from C++ to
+Python, pybind11 will just add the outer wrapper layer over the native C++
+object. Getting it back from Python is just a matter of peeling off the
+wrapper.
+
+.. rubric:: 2. Wrapper in C++, native type in Python
+
+This is the exact opposite situation. Now, we have a type which is native to
+Python, like a ``tuple`` or a ``list``. One way to get this data into C++ is
+with the :class:`py::object` family of wrappers. These are explained in more
+detail in the :doc:`/advanced/pycpp/object` section. We'll just give a quick
+example here:
+
+.. code-block:: cpp
+
+ void print_list(py::list my_list) {
+ for (auto item : my_list)
+ std::cout << item << " ";
+ }
+
+.. code-block:: pycon
+
+ >>> print_list([1, 2, 3])
+ 1 2 3
+
+The Python ``list`` is not converted in any way -- it's just wrapped in a C++
+:class:`py::list` class. At its core it's still a Python object. Copying a
+:class:`py::list` will do the usual reference-counting like in Python.
+Returning the object to Python will just remove the thin wrapper.
+
+.. rubric:: 3. Converting between native C++ and Python types
+
+In the previous two cases we had a native type in one language and a wrapper in
+the other. Now, we have native types on both sides and we convert between them.
+
+.. code-block:: cpp
+
+ void print_vector(const std::vector<int> &v) {
+ for (auto item : v)
+ std::cout << item << "\n";
+ }
+
+.. code-block:: pycon
+
+ >>> print_vector([1, 2, 3])
+ 1 2 3
+
+In this case, pybind11 will construct a new ``std::vector<int>`` and copy each
+element from the Python ``list``. The newly constructed object will be passed
+to ``print_vector``. The same thing happens in the other direction: a new
+``list`` is made to match the value returned from C++.
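+
+The binding side of this example could look as follows (a sketch; the module
+name is arbitrary). Note that the ``std::vector`` conversion is enabled by
+including :file:`pybind11/stl.h`:
+
+.. code-block:: cpp
+
+ #include <pybind11/pybind11.h>
+ #include <pybind11/stl.h> // enables std::vector <-> list conversions
+
+ PYBIND11_MODULE(example, m) {
+ m.def("print_vector", &print_vector);
+ }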
+
+Lots of these conversions are supported out of the box, as shown in the table
+below. They are very convenient, but keep in mind that these conversions are
+fundamentally based on copying data. This is perfectly fine for small immutable
+types but it may become quite expensive for large data structures. This can be
+avoided by overriding the automatic conversion with a custom wrapper (i.e. the
+above-mentioned approach 1). This requires some manual effort and more details
+are available in the :ref:`opaque` section.
+
+.. _conversion_table:
+
+List of all builtin conversions
+-------------------------------
+
+The following basic data types are supported out of the box (some may require
+an additional extension header to be included). To pass other data structures
+as arguments and return values, refer to the section on binding :ref:`classes`.
+
++------------------------------------+---------------------------+-----------------------------------+
+| Data type | Description | Header file |
++====================================+===========================+===================================+
+| ``int8_t``, ``uint8_t`` | 8-bit integers | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``int16_t``, ``uint16_t`` | 16-bit integers | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``int32_t``, ``uint32_t`` | 32-bit integers | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``int64_t``, ``uint64_t`` | 64-bit integers | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``ssize_t``, ``size_t`` | Platform-dependent size | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``float``, ``double`` | Floating point types | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``bool`` | Two-state Boolean type | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``char`` | Character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``char16_t`` | UTF-16 character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``char32_t`` | UTF-32 character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``wchar_t`` | Wide character literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``const char *`` | UTF-8 string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``const char16_t *`` | UTF-16 string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``const char32_t *`` | UTF-32 string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``const wchar_t *`` | Wide string literal | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::string`` | STL dynamic UTF-8 string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::u16string`` | STL dynamic UTF-16 string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::u32string`` | STL dynamic UTF-32 string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::wstring`` | STL dynamic wide string | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::string_view``, | STL C++17 string views | :file:`pybind11/pybind11.h` |
+| ``std::u16string_view``, etc. | | |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::pair<T1, T2>``              | Pair of two custom types  | :file:`pybind11/pybind11.h`       |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::tuple<...>`` | Arbitrary tuple of types | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::reference_wrapper<...>`` | Reference type wrapper | :file:`pybind11/pybind11.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::complex<T>``                | Complex numbers           | :file:`pybind11/complex.h`        |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::array<T, size>``            | STL static array          | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::vector<T>``                 | STL dynamic array         | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::deque<T>``                  | STL double-ended queue    | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::valarray<T>``               | STL value array           | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::list<T>``                   | STL linked list           | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::map<T1, T2>``               | STL ordered map           | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::unordered_map<T1, T2>``     | STL unordered map         | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::set<T>``                    | STL ordered set           | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::unordered_set<T>``          | STL unordered set         | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::optional<T>``               | STL optional type (C++17) | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::experimental::optional<T>`` | STL optional type (exp.)  | :file:`pybind11/stl.h`            |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::variant<...>`` | Type-safe union (C++17) | :file:`pybind11/stl.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::filesystem::path`` | STL path (C++17) [#]_ | :file:`pybind11/stl/filesystem.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::function<...>`` | STL polymorphic function | :file:`pybind11/functional.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::chrono::duration<...>`` | STL time duration | :file:`pybind11/chrono.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``std::chrono::time_point<...>`` | STL date/time | :file:`pybind11/chrono.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``Eigen::Matrix<...>`` | Eigen: dense matrix | :file:`pybind11/eigen.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``Eigen::Map<...>`` | Eigen: mapped memory | :file:`pybind11/eigen.h` |
++------------------------------------+---------------------------+-----------------------------------+
+| ``Eigen::SparseMatrix<...>`` | Eigen: sparse matrix | :file:`pybind11/eigen.h` |
++------------------------------------+---------------------------+-----------------------------------+
+
+.. [#] ``std::filesystem::path`` is converted to ``pathlib.Path`` and
+ ``os.PathLike`` is converted to ``std::filesystem::path``, but this requires
+ Python 3.6 (for ``__fspath__`` support).
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/stl.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/stl.rst
new file mode 100644
index 0000000000000000000000000000000000000000..99099c7237c1f29b82ce02dd2f419068baf6a3fa
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/stl.rst
@@ -0,0 +1,251 @@
+STL containers
+##############
+
+Automatic conversion
+====================
+
+When including the additional header file :file:`pybind11/stl.h`, conversions
+between ``std::vector<>``/``std::deque<>``/``std::list<>``/``std::array<>``/``std::valarray<>``,
+``std::set<>``/``std::unordered_set<>``, and
+``std::map<>``/``std::unordered_map<>`` and the Python ``list``, ``set`` and
+``dict`` data structures are automatically enabled. The types ``std::pair<>``
+and ``std::tuple<>`` are already supported out of the box with just the core
+:file:`pybind11/pybind11.h` header.
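+
+As an illustration of how far this reaches, a signature that nests several of
+these containers converts in a single step (a minimal sketch; the function
+name ``sum_groups`` is made up for this example):
+
+.. code-block:: cpp
+
+    #include <pybind11/stl.h>
+
+    // A Python dict mapping str -> list of int arrives as a fully converted C++ map.
+    m.def("sum_groups", [](const std::map<std::string, std::vector<int>> &groups) {
+        int total = 0;
+        for (const auto &kv : groups)
+            for (int v : kv.second)
+                total += v;
+        return total;
+    });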
+
+The major downside of these implicit conversions is that containers must be
+converted (i.e. copied) on every Python->C++ and C++->Python transition, which
+can have implications on the program semantics and performance. Please read the
+next sections for more details and alternative approaches that avoid this.
+
+.. note::
+
+ Arbitrary nesting of any of these types is possible.
+
+.. seealso::
+
+ The file :file:`tests/test_stl.cpp` contains a complete
+ example that demonstrates how to pass STL data types in more detail.
+
+.. _cpp17_container_casters:
+
+C++17 library containers
+========================
+
+The :file:`pybind11/stl.h` header also includes support for ``std::optional<>``
+and ``std::variant<>``. These require a C++17 compiler and standard library.
+In C++14 mode, ``std::experimental::optional<>`` is supported if available.
+
+Various versions of these containers also exist for C++11 (e.g. in Boost).
+pybind11 provides an easy way to specialize the ``type_caster`` for such
+types:
+
+.. code-block:: cpp
+
+    // `boost::optional` as an example -- can be any `std::optional`-like container
+    namespace pybind11 { namespace detail {
+        template <typename T>
+        struct type_caster<boost::optional<T>> : optional_caster<boost::optional<T>> {};
+    }}
+
+The above should be placed in a header file and included in all translation units
+where automatic conversion is needed. Similarly, a specialization can be provided
+for custom variant types:
+
+.. code-block:: cpp
+
+    // `boost::variant` as an example -- can be any `std::variant`-like container
+    namespace pybind11 { namespace detail {
+        template <typename... Ts>
+        struct type_caster<boost::variant<Ts...>> : variant_caster<boost::variant<Ts...>> {};
+
+        // Specifies the function used to visit the variant -- `apply_visitor` instead of `visit`
+        template <>
+        struct visit_helper<boost::variant> {
+            template <typename... Args>
+            static auto call(Args &&...args) -> decltype(boost::apply_visitor(args...)) {
+                return boost::apply_visitor(args...);
+            }
+        };
+    }} // namespace pybind11::detail
+
+The ``visit_helper`` specialization is not required if your ``name::variant`` provides
+a ``name::visit()`` function. For any other function name, the specialization must be
+included to tell pybind11 how to visit the variant.
+
+.. warning::
+
+ When converting a ``variant`` type, pybind11 follows the same rules as when
+ determining which function overload to call (:ref:`overload_resolution`), and
+ so the same caveats hold. In particular, the order in which the ``variant``'s
+ alternatives are listed is important, since pybind11 will try conversions in
+    this order. This means that, for example, when converting ``variant<int, bool>``,
+    the ``bool`` variant will never be selected, as any Python ``bool`` is already
+    an ``int`` and is convertible to a C++ ``int``. Changing the order of alternatives
+    (and using ``variant<bool, int>``, in this example) provides a solution.
+
+.. note::
+
+ pybind11 only supports the modern implementation of ``boost::variant``
+ which makes use of variadic templates. This requires Boost 1.56 or newer.
+ Additionally, on Windows, MSVC 2017 is required because ``boost::variant``
+ falls back to the old non-variadic implementation on MSVC 2015.
+
+.. _opaque:
+
+Making opaque types
+===================
+
+pybind11 heavily relies on a template matching mechanism to convert parameters
+and return values that are constructed from STL data types such as vectors,
+linked lists, hash tables, etc. This even works in a recursive manner, for
+instance to deal with lists of hash maps of pairs of elementary and custom
+types, etc.
+
+However, a fundamental limitation of this approach is that internal conversions
+between Python and C++ types involve a copy operation that prevents
+pass-by-reference semantics. What does this mean?
+
+Suppose we bind the following function
+
+.. code-block:: cpp
+
+    void append_1(std::vector<int> &v) {
+        v.push_back(1);
+    }
+
+and call it from Python, the following happens:
+
+.. code-block:: pycon
+
+ >>> v = [5, 6]
+ >>> append_1(v)
+ >>> print(v)
+ [5, 6]
+
+As you can see, when passing STL data structures by reference, modifications
+are not propagated back to the Python side. A similar situation arises when
+exposing STL data structures using the ``def_readwrite`` or ``def_readonly``
+functions:
+
+.. code-block:: cpp
+
+    /* ... definition ... */
+
+    class MyClass {
+        std::vector<int> contents;
+    };
+
+    /* ... binding code ... */
+
+    py::class_<MyClass>(m, "MyClass")
+        .def(py::init<>())
+        .def_readwrite("contents", &MyClass::contents);
+
+In this case, properties can be read and written in their entirety. However, an
+``append`` operation involving such a list type has no effect:
+
+.. code-block:: pycon
+
+ >>> m = MyClass()
+ >>> m.contents = [5, 6]
+ >>> print(m.contents)
+ [5, 6]
+ >>> m.contents.append(7)
+ >>> print(m.contents)
+ [5, 6]
+
+Finally, the involved copy operations can be costly when dealing with very
+large lists. To deal with all of the above situations, pybind11 provides a
+macro named ``PYBIND11_MAKE_OPAQUE(T)`` that disables the template-based
+conversion machinery of types, thus rendering them *opaque*. The contents of
+opaque objects are never inspected or extracted, hence they *can* be passed by
+reference. For instance, to turn ``std::vector<int>`` into an opaque type, add
+the declaration
+
+.. code-block:: cpp
+
+    PYBIND11_MAKE_OPAQUE(std::vector<int>);
+
+before any binding code (e.g. invocations to ``class_::def()``, etc.). This
+macro must be specified at the top level (and outside of any namespaces), since
+it adds a template instantiation of ``type_caster``. If your binding code consists of
+multiple compilation units, it must be present in every file (typically via a
+common header) preceding any usage of ``std::vector<int>``. Opaque types must
+also have a corresponding ``class_`` declaration to associate them with a name
+in Python, and to define a set of available operations, e.g.:
+
+.. code-block:: cpp
+
+    py::class_<std::vector<int>>(m, "IntVector")
+        .def(py::init<>())
+        .def("clear", &std::vector<int>::clear)
+        .def("pop_back", &std::vector<int>::pop_back)
+        .def("__len__", [](const std::vector<int> &v) { return v.size(); })
+        .def("__iter__", [](std::vector<int> &v) {
+            return py::make_iterator(v.begin(), v.end());
+        }, py::keep_alive<0, 1>()) /* Keep vector alive while iterator is used */
+        // ....
+
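+Assuming the binding above is compiled into a module named ``example`` (a
+sketch; the module name is not part of the original listing), the opaque
+vector then behaves as follows:
+
+.. code-block:: pycon
+
+    >>> v = example.IntVector()
+    >>> len(v)
+    0
+    >>> list(v)
+    []
+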
+.. seealso::
+
+ The file :file:`tests/test_opaque_types.cpp` contains a complete
+ example that demonstrates how to create and expose opaque types using
+ pybind11 in more detail.
+
+.. _stl_bind:
+
+Binding STL containers
+======================
+
+The ability to expose STL containers as native Python objects is a fairly
+common request, hence pybind11 also provides an optional header file named
+:file:`pybind11/stl_bind.h` that does exactly this. The mapped containers try
+to match the behavior of their native Python counterparts as much as possible.
+
+The following example showcases usage of :file:`pybind11/stl_bind.h`:
+
+.. code-block:: cpp
+
+    // Don't forget this
+    #include <pybind11/stl_bind.h>
+
+    PYBIND11_MAKE_OPAQUE(std::vector<int>);
+    PYBIND11_MAKE_OPAQUE(std::map<std::string, double>);
+
+    // ...
+
+    // later in binding code:
+    py::bind_vector<std::vector<int>>(m, "VectorInt");
+    py::bind_map<std::map<std::string, double>>(m, "MapStringDouble");
+
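+With these bindings in place, the containers behave much like their native
+Python counterparts (a sketch, assuming the code above is built into a module
+named ``example``):
+
+.. code-block:: pycon
+
+    >>> v = example.VectorInt([1, 2, 3])
+    >>> v.append(4)
+    >>> list(v)
+    [1, 2, 3, 4]
+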
+When binding STL containers, pybind11 considers the types of the container's
+elements to decide whether the container should be confined to the local module
+(via the :ref:`module_local` feature). If the container element types are
+anything other than already-bound custom types bound without
+``py::module_local()``, the container binding will have ``py::module_local()``
+applied. This includes converting types such as numeric types, strings, Eigen
+types, and types that have not yet been bound at the time of the STL container
+binding. This module-local binding is designed to avoid potential conflicts
+between module bindings (for example, from two separate modules each attempting
+to bind ``std::vector<int>`` as a python type).
+
+It is possible to override this behavior to force a definition to be either
+module-local or global. To do so, you can pass the attributes
+``py::module_local()`` (to make the binding module-local) or
+``py::module_local(false)`` (to make the binding global) into the
+``py::bind_vector`` or ``py::bind_map`` arguments:
+
+.. code-block:: cpp
+
+    py::bind_vector<std::vector<int>>(m, "VectorInt", py::module_local(false));
+
+Note, however, that such a global binding would make it impossible to load this
+module at the same time as any other pybind module that also attempts to bind
+the same container type (``std::vector<int>`` in the above example).
+
+See :ref:`module_local` for more details on module-local bindings.
+
+.. seealso::
+
+ The file :file:`tests/test_stl_binders.cpp` shows how to use the
+ convenience STL container wrappers.
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/strings.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/strings.rst
new file mode 100644
index 0000000000000000000000000000000000000000..0834de98002eca3e95ddadf4962927b3e94b1060
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/cast/strings.rst
@@ -0,0 +1,305 @@
+Strings, bytes and Unicode conversions
+######################################
+
+.. note::
+
+ This section discusses string handling in terms of Python 3 strings. For
+ Python 2.7, replace all occurrences of ``str`` with ``unicode`` and
+ ``bytes`` with ``str``. Python 2.7 users may find it best to use ``from
+ __future__ import unicode_literals`` to avoid unintentionally using ``str``
+ instead of ``unicode``.
+
+Passing Python strings to C++
+=============================
+
+When a Python ``str`` is passed from Python to a C++ function that accepts
+``std::string`` or ``char *`` as arguments, pybind11 will encode the Python
+string to UTF-8. All Python ``str`` can be encoded in UTF-8, so this operation
+does not fail.
+
+The C++ language is encoding agnostic. It is the responsibility of the
+programmer to track encodings. It's often easiest to simply `use UTF-8
+everywhere <http://utf8everywhere.org/>`_.
+
+.. code-block:: c++
+
+ m.def("utf8_test",
+ [](const std::string &s) {
+ cout << "utf-8 is icing on the cake.\n";
+ cout << s;
+ }
+ );
+ m.def("utf8_charptr",
+ [](const char *s) {
+ cout << "My favorite food is\n";
+ cout << s;
+ }
+ );
+
+.. code-block:: pycon
+
+ >>> utf8_test("🎂")
+ utf-8 is icing on the cake.
+ 🎂
+
+ >>> utf8_charptr("🍕")
+ My favorite food is
+ 🍕
+
+.. note::
+
+ Some terminal emulators do not support UTF-8 or emoji fonts and may not
+ display the example above correctly.
+
+The results are the same whether the C++ function accepts arguments by value or
+reference, and whether or not ``const`` is used.
+
+Passing bytes to C++
+--------------------
+
+A Python ``bytes`` object will be passed to C++ functions that accept
+``std::string`` or ``char*`` *without* conversion. On Python 3, in order to
+make a function *only* accept ``bytes`` (and not ``str``), declare it as taking
+a ``py::bytes`` argument.
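+
+A minimal sketch of such a declaration (the function name ``bytes_length`` is
+illustrative):
+
+.. code-block:: c++
+
+    m.def("bytes_length",
+        [](const py::bytes &b) {
+            std::string s = b;  // copies the raw bytes, no decoding takes place
+            return s.size();
+        }
+    );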
+
+
+Returning C++ strings to Python
+===============================
+
+When a C++ function returns a ``std::string`` or ``char*`` to a Python caller,
+**pybind11 will assume that the string is valid UTF-8** and will decode it to a
+native Python ``str``, using the same API as Python uses to perform
+``bytes.decode('utf-8')``. If this implicit conversion fails, pybind11 will
+raise a ``UnicodeDecodeError``.
+
+.. code-block:: c++
+
+ m.def("std_string_return",
+ []() {
+ return std::string("This string needs to be UTF-8 encoded");
+ }
+ );
+
+.. code-block:: pycon
+
+ >>> isinstance(example.std_string_return(), str)
+ True
+
+
+Because UTF-8 is inclusive of pure ASCII, there is never any issue with
+returning a pure ASCII string to Python. If there is any possibility that the
+string is not pure ASCII, it is necessary to ensure the encoding is valid
+UTF-8.
+
+.. warning::
+
+ Implicit conversion assumes that a returned ``char *`` is null-terminated.
+ If there is no null terminator a buffer overrun will occur.
+
+Explicit conversions
+--------------------
+
+If some C++ code constructs a ``std::string`` that is not a UTF-8 string, one
+can perform an explicit conversion and return a ``py::str`` object. Explicit
+conversion has the same overhead as implicit conversion.
+
+.. code-block:: c++
+
+ // This uses the Python C API to convert Latin-1 to Unicode
+ m.def("str_output",
+ []() {
+ std::string s = "Send your r\xe9sum\xe9 to Alice in HR"; // Latin-1
+ py::str py_s = PyUnicode_DecodeLatin1(s.data(), s.length());
+ return py_s;
+ }
+ );
+
+.. code-block:: pycon
+
+ >>> str_output()
+ 'Send your résumé to Alice in HR'
+
+The `Python C API
+<https://docs.python.org/3/c-api/unicode.html#built-in-codecs>`_ provides
+several built-in codecs.
+
+
+One could also use a third party encoding library such as libiconv to transcode
+to UTF-8.
+
+Return C++ strings without conversion
+-------------------------------------
+
+If the data in a C++ ``std::string`` does not represent text and should be
+returned to Python as ``bytes``, then one can return the data as a
+``py::bytes`` object.
+
+.. code-block:: c++
+
+ m.def("return_bytes",
+ []() {
+ std::string s("\xba\xd0\xba\xd0"); // Not valid UTF-8
+ return py::bytes(s); // Return the data without transcoding
+ }
+ );
+
+.. code-block:: pycon
+
+ >>> example.return_bytes()
+ b'\xba\xd0\xba\xd0'
+
+
+Note the asymmetry: pybind11 will convert ``bytes`` to ``std::string`` without
+encoding, but cannot convert ``std::string`` back to ``bytes`` implicitly.
+
+.. code-block:: c++
+
+ m.def("asymmetry",
+ [](std::string s) { // Accepts str or bytes from Python
+ return s; // Looks harmless, but implicitly converts to str
+ }
+ );
+
+.. code-block:: pycon
+
+ >>> isinstance(example.asymmetry(b"have some bytes"), str)
+ True
+
+ >>> example.asymmetry(b"\xba\xd0\xba\xd0") # invalid utf-8 as bytes
+ UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
+
+
+Wide character strings
+======================
+
+When a Python ``str`` is passed to a C++ function expecting ``std::wstring``,
+``wchar_t*``, ``std::u16string`` or ``std::u32string``, the ``str`` will be
+encoded to UTF-16 or UTF-32 depending on how the C++ compiler implements each
+type, in the platform's native endianness. When strings of these types are
+returned, they are assumed to contain valid UTF-16 or UTF-32, and will be
+decoded to Python ``str``.
+
+.. code-block:: c++
+
+ #define UNICODE
+    #include <windows.h>
+
+ m.def("set_window_text",
+ [](HWND hwnd, std::wstring s) {
+ // Call SetWindowText with null-terminated UTF-16 string
+ ::SetWindowText(hwnd, s.c_str());
+ }
+ );
+ m.def("get_window_text",
+ [](HWND hwnd) {
+ const int buffer_size = ::GetWindowTextLength(hwnd) + 1;
+ auto buffer = std::make_unique< wchar_t[] >(buffer_size);
+
+            ::GetWindowText(hwnd, buffer.get(), buffer_size);
+
+ std::wstring text(buffer.get());
+
+ // wstring will be converted to Python str
+ return text;
+ }
+ );
+
+.. warning::
+
+ Wide character strings may not work as described on Python 2.7 or Python
+ 3.3 compiled with ``--enable-unicode=ucs2``.
+
+Strings in multibyte encodings such as Shift-JIS must be transcoded to
+UTF-8/16/32 before being returned to Python.
+
+
+Character literals
+==================
+
+C++ functions that accept character literals as input will receive the first
+character of a Python ``str`` as their input. If the string is longer than one
+Unicode character, trailing characters will be ignored.
+
+When a character literal is returned from C++ (such as a ``char`` or a
+``wchar_t``), it will be converted to a ``str`` that represents the single
+character.
+
+.. code-block:: c++
+
+ m.def("pass_char", [](char c) { return c; });
+ m.def("pass_wchar", [](wchar_t w) { return w; });
+
+.. code-block:: pycon
+
+ >>> example.pass_char("A")
+ 'A'
+
+While C++ will cast integers to character types (``char c = 0x65;``), pybind11
+does not convert Python integers to characters implicitly. The Python function
+``chr()`` can be used to convert integers to characters.
+
+.. code-block:: pycon
+
+ >>> example.pass_char(0x65)
+ TypeError
+
+ >>> example.pass_char(chr(0x65))
+ 'A'
+
+If the desire is to work with an 8-bit integer, use ``int8_t`` or ``uint8_t``
+as the argument type.
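+
+For example, a sketch of such a binding (``pass_byte`` is an illustrative
+name):
+
+.. code-block:: c++
+
+    // Receives a Python int in the range 0-255 rather than a one-character str
+    m.def("pass_byte", [](uint8_t value) { return value; });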
+
+Grapheme clusters
+-----------------
+
+A single grapheme may be represented by two or more Unicode characters. For
+example 'é' is usually represented as U+00E9 but can also be expressed as the
+combining character sequence U+0065 U+0301 (that is, the letter 'e' followed by
+a combining acute accent). The combining character will be lost if the
+two-character sequence is passed as an argument, even though it renders as a
+single grapheme.
+
+.. code-block:: pycon
+
+ >>> example.pass_wchar("é")
+ 'é'
+
+ >>> combining_e_acute = "e" + "\u0301"
+
+ >>> combining_e_acute
+ 'é'
+
+ >>> combining_e_acute == "é"
+ False
+
+ >>> example.pass_wchar(combining_e_acute)
+ 'e'
+
+Normalizing combining characters before passing the character literal to C++
+may resolve *some* of these issues:
+
+.. code-block:: pycon
+
+ >>> example.pass_wchar(unicodedata.normalize("NFC", combining_e_acute))
+ 'é'
+
+In some languages (Thai for example), there are `graphemes that cannot be
+expressed as a single Unicode code point
+`_, so there is
+no way to capture them in a C++ character type.
+
+
+C++17 string views
+==================
+
+C++17 string views are automatically supported when compiling in C++17 mode.
+They follow the same rules for encoding and decoding as the corresponding STL
+string type (for example, a ``std::u16string_view`` argument will be passed
+UTF-16-encoded data, and a returned ``std::string_view`` will be decoded as
+UTF-8).
+
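+A sketch of such a binding (illustrative name; the module must be compiled in
+C++17 mode):
+
+.. code-block:: c++
+
+    // The Python str argument is encoded to UTF-8, exactly as for std::string
+    m.def("count_utf8_bytes", [](std::string_view s) { return s.size(); });
+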
+References
+==========
+
+* `The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) `_
+* `C++ - Using STL Strings at Win32 API Boundaries `_
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/classes.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/classes.rst
new file mode 100644
index 0000000000000000000000000000000000000000..64322e94bc7e955cafd5edb4a2ca7cbe596fd08f
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/classes.rst
@@ -0,0 +1,1349 @@
+Classes
+#######
+
+This section presents advanced binding code for classes and it is assumed
+that you are already familiar with the basics from :doc:`/classes`.
+
+.. _overriding_virtuals:
+
+Overriding virtual functions in Python
+======================================
+
+Suppose that a C++ class or interface has a virtual function that we'd like
+to override from within Python (we'll focus on the class ``Animal``; ``Dog`` is
+given as a specific example of how one would do this with traditional C++
+code).
+
+.. code-block:: cpp
+
+    class Animal {
+    public:
+        virtual ~Animal() { }
+        virtual std::string go(int n_times) = 0;
+    };
+
+    class Dog : public Animal {
+    public:
+        std::string go(int n_times) override {
+            std::string result;
+            for (int i=0; i<n_times; ++i)
+                result += "woof! ";
+            return result;
+        }
+    };
+
+Let's also suppose that we are given a plain function which calls the
+function ``go()`` on an arbitrary ``Animal`` instance.
+
+.. code-block:: cpp
+
+    std::string call_go(Animal *animal) {
+        return animal->go(3);
+    }
+
+Normally, the binding code for these classes would look as follows:
+
+.. code-block:: cpp
+
+    PYBIND11_MODULE(example, m) {
+        py::class_<Animal>(m, "Animal")
+            .def("go", &Animal::go);
+
+        py::class_<Dog, Animal>(m, "Dog")
+            .def(py::init<>());
+
+        m.def("call_go", &call_go);
+    }
+
+However, these bindings are impossible to extend: ``Animal`` is not
+constructible, and we clearly require some kind of "trampoline" that
+redirects virtual calls back to Python.
+
+Defining a new type of ``Animal`` from within Python is possible but requires a
+helper class that is defined as follows:
+
+.. code-block:: cpp
+
+ class PyAnimal : public Animal {
+ public:
+ /* Inherit the constructors */
+ using Animal::Animal;
+
+ /* Trampoline (need one for each virtual function) */
+ std::string go(int n_times) override {
+ PYBIND11_OVERRIDE_PURE(
+ std::string, /* Return type */
+ Animal, /* Parent class */
+ go, /* Name of function in C++ (must match Python name) */
+ n_times /* Argument(s) */
+ );
+ }
+ };
+
+The macro :c:macro:`PYBIND11_OVERRIDE_PURE` should be used for pure virtual
+functions, and :c:macro:`PYBIND11_OVERRIDE` should be used for functions which have
+a default implementation. There are also two alternate macros
+:c:macro:`PYBIND11_OVERRIDE_PURE_NAME` and :c:macro:`PYBIND11_OVERRIDE_NAME` which
+take a string-valued name argument between the *Parent class* and *Name of the
+function* slots, which defines the name of function in Python. This is required
+when the C++ and Python versions of the
+function have different names, e.g. ``operator()`` vs ``__call__``.
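+
+As a sketch of the name-variant macros, assuming a hypothetical ``Functor``
+interface whose C++ ``operator()`` should appear as ``__call__`` in Python:
+
+.. code-block:: cpp
+
+    // Inside the trampoline class of the hypothetical Functor interface
+    std::string operator()(int n_times) override {
+        PYBIND11_OVERRIDE_PURE_NAME(
+            std::string, /* Return type */
+            Functor,     /* Parent class */
+            "__call__",  /* Name of method in Python */
+            operator(),  /* Name of function in C++ */
+            n_times      /* Argument(s) */
+        );
+    }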
+
+The binding code also needs a few minor adaptations (highlighted):
+
+.. code-block:: cpp
+ :emphasize-lines: 2,3
+
+    PYBIND11_MODULE(example, m) {
+        py::class_<Animal, PyAnimal /* <--- trampoline*/>(m, "Animal")
+            .def(py::init<>())
+            .def("go", &Animal::go);
+
+        py::class_<Dog, Animal>(m, "Dog")
+            .def(py::init<>());
+
+        m.def("call_go", &call_go);
+    }
+
+Importantly, pybind11 is made aware of the trampoline helper class by
+specifying it as an extra template argument to :class:`class_`. (This can also
+be combined with other template arguments such as a custom holder type; the
+order of template types does not matter). Following this, we are able to
+define a constructor as usual.
+
+Bindings should be made against the actual class, not the trampoline helper class.
+
+.. code-block:: cpp
+ :emphasize-lines: 3
+
+    py::class_<Animal, PyAnimal>(m, "Animal")
+        .def(py::init<>())
+        .def("go", &PyAnimal::go); /* <--- THIS IS WRONG, use &Animal::go */
+
+Note, however, that the above is sufficient for allowing Python classes to
+extend ``Animal``, but not ``Dog``: see :ref:`virtual_and_inheritance` for the
+necessary steps required to provide proper overriding support for inherited
+classes.
+
+The Python session below shows how to override ``Animal::go`` and invoke it via
+a virtual method call.
+
+.. code-block:: pycon
+
+ >>> from example import *
+ >>> d = Dog()
+ >>> call_go(d)
+ u'woof! woof! woof! '
+ >>> class Cat(Animal):
+ ... def go(self, n_times):
+ ... return "meow! " * n_times
+ ...
+ >>> c = Cat()
+ >>> call_go(c)
+ u'meow! meow! meow! '
+
+If you are defining a custom constructor in a derived Python class, you *must*
+ensure that you explicitly call the bound C++ constructor using ``__init__``,
+*regardless* of whether it is a default constructor or not. Otherwise, the
+memory for the C++ portion of the instance will be left uninitialized, which
+will generally leave the C++ instance in an invalid state and cause undefined
+behavior if the C++ instance is subsequently used.
+
+.. versionchanged:: 2.6
+ The default pybind11 metaclass will throw a ``TypeError`` when it detects
+ that ``__init__`` was not called by a derived class.
+
+Here is an example:
+
+.. code-block:: python
+
+ class Dachshund(Dog):
+ def __init__(self, name):
+ Dog.__init__(self) # Without this, a TypeError is raised.
+ self.name = name
+
+ def bark(self):
+ return "yap!"
+
+Note that a direct ``__init__`` constructor *should be called*, and ``super()``
+should not be used. For simple cases of linear inheritance, ``super()``
+may work, but once you begin mixing Python and C++ multiple inheritance,
+things will fall apart due to differences between Python's MRO and C++'s
+mechanisms.
+
+Please take a look at the :ref:`macro_notes` before using this feature.
+
+.. note::
+
+ When the overridden type returns a reference or pointer to a type that
+ pybind11 converts from Python (for example, numeric values, std::string,
+ and other built-in value-converting types), there are some limitations to
+ be aware of:
+
+ - because in these cases there is no C++ variable to reference (the value
+ is stored in the referenced Python variable), pybind11 provides one in
+ the PYBIND11_OVERRIDE macros (when needed) with static storage duration.
+ Note that this means that invoking the overridden method on *any*
+ instance will change the referenced value stored in *all* instances of
+ that type.
+
+    - Attempts to modify a non-const reference will not have the desired
+      effect: it will change only the static cache variable, but this change
+      will not propagate to the underlying Python instance, and the change will
+      be replaced the next time the override is invoked.
+
+.. warning::
+
+ The :c:macro:`PYBIND11_OVERRIDE` and accompanying macros used to be called
+ ``PYBIND11_OVERLOAD`` up until pybind11 v2.5.0, and :func:`get_override`
+ used to be called ``get_overload``. This naming was corrected and the older
+ macro and function names may soon be deprecated, in order to reduce
+ confusion with overloaded functions and methods and ``py::overload_cast``
+ (see :ref:`classes`).
+
+.. seealso::
+
+ The file :file:`tests/test_virtual_functions.cpp` contains a complete
+ example that demonstrates how to override virtual functions using pybind11
+ in more detail.
+
+.. _virtual_and_inheritance:
+
+Combining virtual functions and inheritance
+===========================================
+
+When combining virtual methods with inheritance, you need to be sure to provide
+an override for each method for which you want to allow overrides from derived
+python classes. For example, suppose we extend the above ``Animal``/``Dog``
+example as follows:
+
+.. code-block:: cpp
+
+    class Animal {
+    public:
+        virtual std::string go(int n_times) = 0;
+        virtual std::string name() { return "unknown"; }
+    };
+    class Dog : public Animal {
+    public:
+        std::string go(int n_times) override {
+            std::string result;
+            for (int i=0; i<n_times; ++i)
+                result += bark() + " ";
+            return result;
+        }
+        virtual std::string bark() { return "woof!"; }
+    };
+
+then a trampoline is needed not just for ``Animal`` but also for ``Dog``: the
+``Animal`` trampoline must override ``go()`` and ``name()``, while the ``Dog``
+trampoline must repeat those overrides and additionally override the new
+virtual method ``bark()``. Rather than duplicating every virtual method in
+each trampoline, the overrides can be declared once in a set of templated
+trampoline classes (the trailing comma in the ``name()`` and ``bark()``
+overrides is how the macro is invoked for a method that takes no arguments):
+
+.. code-block:: cpp
+
+    template <class AnimalBase = Animal> class PyAnimal : public AnimalBase {
+    public:
+        using AnimalBase::AnimalBase; // Inherit constructors
+        std::string go(int n_times) override { PYBIND11_OVERRIDE_PURE(std::string, AnimalBase, go, n_times); }
+        std::string name() override { PYBIND11_OVERRIDE(std::string, AnimalBase, name, ); }
+    };
+    template <class DogBase = Dog> class PyDog : public PyAnimal<DogBase> {
+    public:
+        using PyAnimal<DogBase>::PyAnimal; // Inherit constructors
+        // Override PyAnimal's pure virtual go() with a non-pure one:
+        std::string go(int n_times) override { PYBIND11_OVERRIDE(std::string, DogBase, go, n_times); }
+        std::string bark() override { PYBIND11_OVERRIDE(std::string, DogBase, bark, ); }
+    };
+
+This technique has the advantage of requiring just one trampoline method to be
+declared per virtual method and pure virtual method override. It does,
+however, require the compiler to generate at least as many methods (and
+possibly more, if both pure virtual and overridden pure virtual methods are
+exposed, as above).
+
+The classes are then registered with pybind11 using:
+
+.. code-block:: cpp
+
+    py::class_<Animal, PyAnimal<>> animal(m, "Animal");
+    py::class_<Dog, Animal, PyDog<>> dog(m, "Dog");
+    py::class_<Husky, Dog, PyDog<Husky>> husky(m, "Husky");
+ // ... add animal, dog, husky definitions
+
+Note that ``Husky`` did not require a dedicated trampoline template class at
+all, since it neither declares any new virtual methods nor provides any pure
+virtual method implementations.
+
+With either the repeated-virtuals or templated trampoline methods in place, you
+can now create a python class that inherits from ``Dog``:
+
+.. code-block:: python
+
+ class ShihTzu(Dog):
+ def bark(self):
+ return "yip!"
+
+.. seealso::
+
+ See the file :file:`tests/test_virtual_functions.cpp` for complete examples
+ using both the duplication and templated trampoline approaches.
+
+.. _extended_aliases:
+
+Extended trampoline class functionality
+=======================================
+
+.. _extended_class_functionality_forced_trampoline:
+
+Forced trampoline class initialisation
+--------------------------------------
+The trampoline classes described in the previous sections are, by default, only
+initialized when needed. More specifically, they are initialized when a python
+class actually inherits from a registered type (instead of merely creating an
+instance of the registered type), or when a registered constructor is only
+valid for the trampoline class but not the registered class. This is primarily
+for performance reasons: when the trampoline class is not needed for anything
+except virtual method dispatching, not initializing the trampoline class
+improves performance by avoiding needing to do a run-time check to see if the
+inheriting python instance has an overridden method.
+
+Sometimes, however, it is useful to always initialize a trampoline class as an
+intermediate class that does more than just handle virtual method dispatching.
+For example, such a class might perform extra class initialization, extra
+destruction operations, and might define new members and methods to enable a
+more python-like interface to a class.
+
+In order to tell pybind11 that it should *always* initialize the trampoline
+class when creating new instances of a type, the class constructors should be
+declared using ``py::init_alias<Args, ...>()`` instead of the usual
+``py::init<Args, ...>()``. This forces construction via the trampoline class,
+ensuring member initialization and (eventual) destruction.
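+
+A sketch of such a registration, reusing the ``Animal``/``PyAnimal`` pair from
+above:
+
+.. code-block:: cpp
+
+    py::class_<Animal, PyAnimal>(m, "Animal")
+        .def(py::init_alias<>());  // every instance is constructed as a PyAnimal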
+
+.. seealso::
+
+ See the file :file:`tests/test_virtual_functions.cpp` for complete examples
+ showing both normal and forced trampoline instantiation.
+
+Different method signatures
+---------------------------
+The macros introduced in :ref:`overriding_virtuals` cover most of the standard
+use cases when exposing C++ classes to Python. Sometimes it is hard or unwieldy
+to create a direct one-on-one mapping between the arguments and method return
+type.
+
+An example would be when the C++ signature contains output arguments using
+references (See also :ref:`faq_reference_arguments`). Another way of solving
+this is to use the method body of the trampoline class to do conversions to the
+input and return of the Python method.
+
+The main building block to do so is :func:`get_override`, which
+allows retrieving a method implemented in Python from within the trampoline's
+methods. Consider for example a C++ method which has the signature
+``bool myMethod(int32_t& value)``, where the return indicates whether
+something should be done with the ``value``. This can be made convenient on the
+Python side by allowing the Python function to return ``None`` or an ``int``:
+
+.. code-block:: cpp
+
+ bool MyClass::myMethod(int32_t& value)
+ {
+ pybind11::gil_scoped_acquire gil; // Acquire the GIL while in this scope.
+ // Try to look up the overridden method on the Python side.
+ pybind11::function override = pybind11::get_override(this, "myMethod");
+ if (override) { // method is found
+ auto obj = override(value); // Call the Python function.
+            if (py::isinstance<py::int_>(obj)) { // check if it returned a Python integer type
+                value = obj.cast<int32_t>(); // Cast it and assign it to the value.
+ return true; // Return true; value should be used.
+ } else {
+ return false; // Python returned none, return false.
+ }
+ }
+ return false; // Alternatively return MyClass::myMethod(value);
+ }
+
+
+.. _custom_constructors:
+
+Custom constructors
+===================
+
+The syntax for binding constructors was previously introduced, but it only
+works when a constructor of the appropriate arguments actually exists on the
+C++ side. To extend this to more general cases, pybind11 makes it possible
+to bind factory functions as constructors. For example, suppose you have a
+class like this:
+
+.. code-block:: cpp
+
+ class Example {
+ private:
+ Example(int); // private constructor
+ public:
+ // Factory function:
+ static Example create(int a) { return Example(a); }
+ };
+
+    py::class_<Example>(m, "Example")
+        .def(py::init(&Example::create));
+
+While it is possible to create a straightforward binding of the static
+``create`` method, it may sometimes be preferable to expose it as a constructor
+on the Python side. This can be accomplished by calling ``.def(py::init(...))``
+with the function reference returning the new instance passed as an argument.
+It is also possible to use this approach to bind a function returning a new
+instance by raw pointer or by the holder (e.g. ``std::unique_ptr``).
+
+The following example shows the different approaches:
+
+.. code-block:: cpp
+
+ class Example {
+ private:
+ Example(int); // private constructor
+ public:
+ // Factory function - returned by value:
+ static Example create(int a) { return Example(a); }
+
+ // These constructors are publicly callable:
+ Example(double);
+ Example(int, int);
+ Example(std::string);
+ };
+
+    py::class_<Example>(m, "Example")
+        // Bind the factory function as a constructor:
+        .def(py::init(&Example::create))
+        // Bind a lambda function returning a pointer wrapped in a holder:
+        .def(py::init([](std::string arg) {
+            return std::unique_ptr<Example>(new Example(arg));
+        }))
+        // Return a raw pointer:
+        .def(py::init([](int a, int b) { return new Example(a, b); }))
+        // You can mix the above with regular C++ constructor bindings as well:
+        .def(py::init<double>())
+        ;
+
+When the constructor is invoked from Python, pybind11 will call the factory
+function and store the resulting C++ instance in the Python instance.
+
+When combining factory function constructors with :ref:`virtual function
+trampolines <overriding_virtuals>` there are two approaches. The first is to
+add a constructor to the alias class that takes a base value by
+rvalue-reference. If such a constructor is available, it will be used to
+construct an alias instance from the value returned by the factory function.
+The second option is to provide two factory functions to ``py::init()``: the
+first will be invoked when no alias class is required (i.e. when the class is
+being used but not inherited from in Python), and the second will be invoked
+when an alias is required.
+
+You can also specify a single factory function that always returns an alias
+instance: this will result in behaviour similar to ``py::init_alias<...>()``,
+as described in the :ref:`extended trampoline class documentation
+<extended_aliases>`.
+
+The following example shows the different factory approaches for a class with
+an alias:
+
+.. code-block:: cpp
+
+ #include
+ class Example {
+ public:
+ // ...
+ virtual ~Example() = default;
+ };
+ class PyExample : public Example {
+ public:
+ using Example::Example;
+ PyExample(Example &&base) : Example(std::move(base)) {}
+ };
+    py::class_<Example, PyExample>(m, "Example")
+ // Returns an Example pointer. If a PyExample is needed, the Example
+ // instance will be moved via the extra constructor in PyExample, above.
+ .def(py::init([]() { return new Example(); }))
+ // Two callbacks:
+ .def(py::init([]() { return new Example(); } /* no alias needed */,
+ []() { return new PyExample(); } /* alias needed */))
+ // *Always* returns an alias instance (like py::init_alias<>())
+ .def(py::init([]() { return new PyExample(); }))
+ ;
+
+Brace initialization
+--------------------
+
+``pybind11::init<>`` internally uses C++11 brace initialization to call the
+constructor of the target class. This means that it can be used to bind
+*implicit* constructors as well:
+
+.. code-block:: cpp
+
+ struct Aggregate {
+ int a;
+ std::string b;
+ };
+
+    py::class_<Aggregate>(m, "Aggregate")
+        .def(py::init<int, std::string>());
+
+.. note::
+
+ Note that brace initialization preferentially invokes constructor overloads
+ taking a ``std::initializer_list``. In the rare event that this causes an
+ issue, you can work around it by using ``py::init(...)`` with a lambda
+ function that constructs the new object as desired.
+
+.. _classes_with_non_public_destructors:
+
+Non-public destructors
+======================
+
+If a class has a private or protected destructor (as might e.g. be the case in
+a singleton pattern), a compile error will occur when creating bindings via
+pybind11. The underlying issue is that the ``std::unique_ptr`` holder type that
+is responsible for managing the lifetime of instances will reference the
+destructor even if no deallocations ever take place. In order to expose classes
+with private or protected destructors, it is possible to override the holder
+type via a holder type argument to ``class_``. Pybind11 provides a helper class
+``py::nodelete`` that disables any destructor invocations. In this case, it is
+crucial that instances are deallocated on the C++ side to avoid memory leaks.
+
+.. code-block:: cpp
+
+ /* ... definition ... */
+
+ class MyClass {
+ private:
+ ~MyClass() { }
+ };
+
+ /* ... binding code ... */
+
+    py::class_<MyClass, std::unique_ptr<MyClass, py::nodelete>>(m, "MyClass")
+        .def(py::init<>())
+
+.. _destructors_that_call_python:
+
+Destructors that call Python
+============================
+
+If a Python function is invoked from a C++ destructor, an exception may be thrown
+of type :class:`error_already_set`. If this error is thrown out of a class destructor,
+``std::terminate()`` will be called, terminating the process. Class destructors
+must catch all exceptions of type :class:`error_already_set` to discard the Python
+exception using :func:`error_already_set::discard_as_unraisable`.
+
+Every Python function should be treated as *possibly throwing*. When a Python generator
+stops yielding items, Python will throw a ``StopIteration`` exception, which can pass
+through C++ destructors if the generator's stack frame holds the last reference to C++
+objects.
+
+For more information, see :ref:`the documentation on exceptions `.
+
+.. code-block:: cpp
+
+ class MyClass {
+ public:
+ ~MyClass() {
+ try {
+ py::print("Even printing is dangerous in a destructor");
+ py::exec("raise ValueError('This is an unraisable exception')");
+ } catch (py::error_already_set &e) {
+                // error_context should be information about where/why the error occurred,
+ // e.g. use __func__ to get the name of the current function
+ e.discard_as_unraisable(__func__);
+ }
+ }
+ };
+
+.. note::
+
+ pybind11 does not support C++ destructors marked ``noexcept(false)``.
+
+.. versionadded:: 2.6
+
+.. _implicit_conversions:
+
+Implicit conversions
+====================
+
+Suppose that instances of two types ``A`` and ``B`` are used in a project, and
+that an ``A`` can easily be converted into an instance of type ``B`` (examples of this
+could be a fixed and an arbitrary precision number type).
+
+.. code-block:: cpp
+
+    py::class_<A>(m, "A")
+        /// ... members ...
+
+    py::class_<B>(m, "B")
+        .def(py::init<A>())
+        /// ... members ...
+
+    m.def("func",
+        [](const B &) { /* .... */ }
+    );
+
+To invoke the function ``func`` using a variable ``a`` containing an ``A``
+instance, we'd have to write ``func(B(a))`` in Python. On the other hand, C++
+will automatically apply an implicit type conversion, which makes it possible
+to directly write ``func(a)``.
+
+In this situation (i.e. where ``B`` has a constructor that converts from
+``A``), the following statement enables similar implicit conversions on the
+Python side:
+
+.. code-block:: cpp
+
+    py::implicitly_convertible<A, B>();
+
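+With this declaration in place, the call sketched above works directly (a
+minimal illustration, assuming ``A`` was bound with a constructor):
+
+.. code-block:: pycon
+
+    >>> a = A()  # assumes A exposes a suitable __init__
+    >>> func(a)  # a is implicitly converted to a B
+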
+.. note::
+
+ Implicit conversions from ``A`` to ``B`` only work when ``B`` is a custom
+ data type that is exposed to Python via pybind11.
+
+ To prevent runaway recursion, implicit conversions are non-reentrant: an
+ implicit conversion invoked as part of another implicit conversion of the
+ same type (i.e. from ``A`` to ``B``) will fail.
+
+.. _static_properties:
+
+Static properties
+=================
+
+The section on :ref:`properties` discussed the creation of instance properties
+that are implemented in terms of C++ getters and setters.
+
+Static properties can also be created in a similar way to expose getters and
+setters of static class attributes. Note that the implicit ``self`` argument
+also exists in this case and is used to pass the Python ``type`` subclass
+instance. This parameter will often not be needed by the C++ side, and the
+following example illustrates how to instantiate a lambda getter function
+that ignores it:
+
+.. code-block:: cpp
+
+    py::class_<Foo>(m, "Foo")
+        .def_property_readonly_static("foo", [](py::object /* self */) { return Foo(); });
+
+Operator overloading
+====================
+
+Suppose that we're given the following ``Vector2`` class with a vector addition
+and scalar multiplication operation, all implemented using overloaded operators
+in C++.
+
+.. code-block:: cpp
+
+ class Vector2 {
+ public:
+ Vector2(float x, float y) : x(x), y(y) { }
+
+ Vector2 operator+(const Vector2 &v) const { return Vector2(x + v.x, y + v.y); }
+ Vector2 operator*(float value) const { return Vector2(x * value, y * value); }
+ Vector2& operator+=(const Vector2 &v) { x += v.x; y += v.y; return *this; }
+ Vector2& operator*=(float v) { x *= v; y *= v; return *this; }
+
+ friend Vector2 operator*(float f, const Vector2 &v) {
+ return Vector2(f * v.x, f * v.y);
+ }
+
+ std::string toString() const {
+ return "[" + std::to_string(x) + ", " + std::to_string(y) + "]";
+ }
+ private:
+ float x, y;
+ };
+
+The following snippet shows how the above operators can be conveniently exposed
+to Python.
+
+.. code-block:: cpp
+
+    #include <pybind11/operators.h>
+
+ PYBIND11_MODULE(example, m) {
+        py::class_<Vector2>(m, "Vector2")
+            .def(py::init<float, float>())
+ .def(py::self + py::self)
+ .def(py::self += py::self)
+ .def(py::self *= float())
+ .def(float() * py::self)
+ .def(py::self * float())
+ .def(-py::self)
+ .def("__repr__", &Vector2::toString);
+ }
+
+Note that a line like
+
+.. code-block:: cpp
+
+ .def(py::self * float())
+
+is really just short hand notation for
+
+.. code-block:: cpp
+
+ .def("__mul__", [](const Vector2 &a, float b) {
+ return a * b;
+ }, py::is_operator())
+
+This can be useful for exposing additional operators that don't exist on the
+C++ side, or to perform other types of customization. The ``py::is_operator``
+flag marker is needed to inform pybind11 that this is an operator, which
+returns ``NotImplemented`` when invoked with incompatible arguments rather than
+throwing a type error.
+
+.. note::
+
+ To use the more convenient ``py::self`` notation, the additional
+ header file :file:`pybind11/operators.h` must be included.
+
+.. seealso::
+
+ The file :file:`tests/test_operator_overloading.cpp` contains a
+ complete example that demonstrates how to work with overloaded operators in
+ more detail.
+
+.. _pickling:
+
+Pickling support
+================
+
+Python's ``pickle`` module provides a powerful facility to serialize and
+de-serialize a Python object graph into a binary data stream. To pickle and
+unpickle C++ classes using pybind11, a ``py::pickle()`` definition must be
+provided. Suppose the class in question has the following signature:
+
+.. code-block:: cpp
+
+ class Pickleable {
+ public:
+ Pickleable(const std::string &value) : m_value(value) { }
+ const std::string &value() const { return m_value; }
+
+ void setExtra(int extra) { m_extra = extra; }
+ int extra() const { return m_extra; }
+ private:
+ std::string m_value;
+ int m_extra = 0;
+ };
+
+Pickling support in Python is enabled by defining the ``__setstate__`` and
+``__getstate__`` methods [#f3]_. For pybind11 classes, use ``py::pickle()``
+to bind these two functions:
+
+.. code-block:: cpp
+
+    py::class_<Pickleable>(m, "Pickleable")
+        .def(py::init<std::string>())
+ .def("value", &Pickleable::value)
+ .def("extra", &Pickleable::extra)
+ .def("setExtra", &Pickleable::setExtra)
+ .def(py::pickle(
+ [](const Pickleable &p) { // __getstate__
+ /* Return a tuple that fully encodes the state of the object */
+ return py::make_tuple(p.value(), p.extra());
+ },
+ [](py::tuple t) { // __setstate__
+ if (t.size() != 2)
+ throw std::runtime_error("Invalid state!");
+
+ /* Create a new C++ instance */
+                Pickleable p(t[0].cast<std::string>());
+
+ /* Assign any additional state */
+                p.setExtra(t[1].cast<int>());
+
+ return p;
+ }
+ ));
+
+The ``__setstate__`` part of the ``py::pickle()`` definition follows the same
+rules as the single-argument version of ``py::init()``. The return type can be
+a value, pointer or holder type. See :ref:`custom_constructors` for details.
+
+An instance can now be pickled as follows:
+
+.. code-block:: python
+
+ try:
+ import cPickle as pickle # Use cPickle on Python 2.7
+ except ImportError:
+ import pickle
+
+ p = Pickleable("test_value")
+ p.setExtra(15)
+ data = pickle.dumps(p, 2)
+
+
+.. note::
+ Note that only the cPickle module is supported on Python 2.7.
+
+ The second argument to ``dumps`` is also crucial: it selects the pickle
+ protocol version 2, since the older version 1 is not supported. Newer
+ versions are also fine—for instance, specify ``-1`` to always use the
+ latest available version. Beware: failure to follow these instructions
+ will cause important pybind11 memory allocation routines to be skipped
+ during unpickling, which will likely lead to memory corruption and/or
+ segmentation faults.
+
+.. seealso::
+
+ The file :file:`tests/test_pickling.cpp` contains a complete example
+ that demonstrates how to pickle and unpickle types using pybind11 in more
+ detail.
+
+.. [#f3] http://docs.python.org/3/library/pickle.html#pickling-class-instances
+
+Deepcopy support
+================
+
+Python normally uses references in assignments. Sometimes a real copy is needed
+to prevent changing all copies. The ``copy`` module [#f5]_ provides these
+capabilities.
+
+On Python 3, a class with pickle support is automatically also (deep)copy
+compatible. However, performance can be improved by adding custom
+``__copy__`` and ``__deepcopy__`` methods. With Python 2.7, these custom methods
+are mandatory for (deep)copy compatibility, because pybind11 only supports
+cPickle.
+
+For simple classes (deep)copy can be enabled by using the copy constructor,
+which should look as follows:
+
+.. code-block:: cpp
+
+    py::class_<Copyable>(m, "Copyable")
+ .def("__copy__", [](const Copyable &self) {
+ return Copyable(self);
+ })
+ .def("__deepcopy__", [](const Copyable &self, py::dict) {
+ return Copyable(self);
+ }, "memo"_a);
+
+.. note::
+
+ Dynamic attributes will not be copied in this example.
+
+.. [#f5] https://docs.python.org/3/library/copy.html
+
+Multiple Inheritance
+====================
+
+pybind11 can create bindings for types that derive from multiple base types
+(aka. *multiple inheritance*). To do so, specify all bases in the template
+arguments of the ``class_`` declaration:
+
+.. code-block:: cpp
+
+    py::class_<MyType, BaseType1, BaseType2, BaseType3>(m, "MyType")
+        ...
+
+The base types can be specified in arbitrary order, and they can even be
+interspersed with alias types and holder types (discussed earlier in this
+document)---pybind11 will automatically find out which is which. The only
+requirement is that the first template argument is the type to be declared.
+
+It is also permitted to inherit multiply from exported C++ classes in Python,
+as well as inheriting from multiple Python and/or pybind11-exported classes.
+
+There is one caveat regarding the implementation of this feature:
+
+When only one base type is specified for a C++ type that actually has multiple
+bases, pybind11 will assume that it does not participate in multiple
+inheritance, which can lead to undefined behavior. In such cases, add the tag
+``multiple_inheritance`` to the class constructor:
+
+.. code-block:: cpp
+
+    py::class_<MyType, BaseType2>(m, "MyType", py::multiple_inheritance());
+
+The tag is redundant and does not need to be specified when multiple base types
+are listed.
+
+.. _module_local:
+
+Module-local class bindings
+===========================
+
+When creating a binding for a class, pybind11 by default makes that binding
+"global" across modules. What this means is that a type defined in one module
+can be returned from any module resulting in the same Python type. For
+example, this allows the following:
+
+.. code-block:: cpp
+
+ // In the module1.cpp binding code for module1:
+    py::class_<Pet>(m, "Pet")
+        .def(py::init<std::string>())
+        .def_readonly("name", &Pet::name);
+
+.. code-block:: cpp
+
+ // In the module2.cpp binding code for module2:
+ m.def("create_pet", [](std::string name) { return new Pet(name); });
+
+.. code-block:: pycon
+
+ >>> from module1 import Pet
+ >>> from module2 import create_pet
+ >>> pet1 = Pet("Kitty")
+ >>> pet2 = create_pet("Doggy")
+ >>> pet2.name()
+ 'Doggy'
+
+When writing binding code for a library, this is usually desirable: this
+allows, for example, splitting up a complex library into multiple Python
+modules.
+
+In some cases, however, this can cause conflicts. For example, suppose two
+unrelated modules make use of an external C++ library and each provide custom
+bindings for one of that library's classes. This will result in an error when
+a Python program attempts to import both modules (directly or indirectly)
+because of conflicting definitions on the external type:
+
+.. code-block:: cpp
+
+ // dogs.cpp
+
+ // Binding for external library class:
+    py::class_<pets::Pet>(m, "Pet")
+        .def("name", &pets::Pet::name);
+
+    // Binding for local extension class:
+    py::class_<Dog, pets::Pet>(m, "Dog")
+        .def(py::init<std::string>());
+
+.. code-block:: cpp
+
+ // cats.cpp, in a completely separate project from the above dogs.cpp.
+
+ // Binding for external library class:
+    py::class_<pets::Pet>(m, "Pet")
+        .def("get_name", &pets::Pet::name);
+
+    // Binding for local extending class:
+    py::class_<Cat, pets::Pet>(m, "Cat")
+        .def(py::init<std::string>());
+
+.. code-block:: pycon
+
+ >>> import cats
+ >>> import dogs
+ Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+ ImportError: generic_type: type "Pet" is already registered!
+
+To get around this, you can tell pybind11 to keep the external class binding
+localized to the module by passing the ``py::module_local()`` attribute into
+the ``py::class_`` constructor:
+
+.. code-block:: cpp
+
+ // Pet binding in dogs.cpp:
+    py::class_<pets::Pet>(m, "Pet", py::module_local())
+        .def("name", &pets::Pet::name);
+
+.. code-block:: cpp
+
+ // Pet binding in cats.cpp:
+ py::class_<pets::Pet>(m, "Pet", py::module_local())
+     .def("get_name", &pets::Pet::name);
+
+This makes the Python-side ``dogs.Pet`` and ``cats.Pet`` into distinct classes,
+avoiding the conflict and allowing both modules to be loaded. C++ code in the
+``dogs`` module that casts or returns a ``Pet`` instance will result in a
+``dogs.Pet`` Python instance, while C++ code in the ``cats`` module will result
+in a ``cats.Pet`` Python instance.
+
+This does come with two caveats, however: First, external modules cannot return
+or cast a ``Pet`` instance to Python (unless they also provide their own local
+bindings). Second, from the Python point of view they are two distinct classes.
+
+Note that the locality only applies in the C++ -> Python direction. When
+passing such a ``py::module_local`` type into a C++ function, the module-local
+classes are still considered. This means that if the following function is
+added to any module (including but not limited to the ``cats`` and ``dogs``
+modules above) it will be callable with either a ``dogs.Pet`` or ``cats.Pet``
+argument:
+
+.. code-block:: cpp
+
+ m.def("pet_name", [](const pets::Pet &pet) { return pet.name(); });
+
+For example, suppose the above function is added to each of ``cats.cpp``,
+``dogs.cpp`` and ``frogs.cpp`` (where ``frogs.cpp`` is some other module that
+does *not* bind ``Pets`` at all).
+
+.. code-block:: pycon
+
+ >>> import cats, dogs, frogs # No error because of the added py::module_local()
+ >>> mycat, mydog = cats.Cat("Fluffy"), dogs.Dog("Rover")
+ >>> (cats.pet_name(mycat), dogs.pet_name(mydog))
+ ('Fluffy', 'Rover')
+ >>> (cats.pet_name(mydog), dogs.pet_name(mycat), frogs.pet_name(mycat))
+ ('Rover', 'Fluffy', 'Fluffy')
+
+It is possible to use ``py::module_local()`` registrations in one module even
+if another module registers the same type globally: within the module with the
+module-local definition, all C++ instances will be cast to the associated bound
+Python type. In other modules any such values are converted to the global
+Python type created elsewhere.
+
+.. note::
+
+ STL bindings (as provided via the optional :file:`pybind11/stl_bind.h`
+ header) apply ``py::module_local`` by default when the bound type might
+ conflict with other modules; see :ref:`stl_bind` for details.
+
+.. note::
+
+ The localization of the bound types is actually tied to the shared object
+ or binary generated by the compiler/linker. For typical modules created
+ with ``PYBIND11_MODULE()``, this distinction is not significant. It is
+ possible, however, when :ref:`embedding` to embed multiple modules in the
+ same binary (see :ref:`embedding_modules`). In such a case, the
+ localization will apply across all embedded modules within the same binary.
+
+.. seealso::
+
+ The file :file:`tests/test_local_bindings.cpp` contains additional examples
+ that demonstrate how ``py::module_local()`` works.
+
+Binding protected member functions
+==================================
+
+It's normally not possible to expose ``protected`` member functions to Python:
+
+.. code-block:: cpp
+
+ class A {
+ protected:
+ int foo() const { return 42; }
+ };
+
+ py::class_<A>(m, "A")
+     .def("foo", &A::foo); // error: 'foo' is a protected member of 'A'
+
+On one hand, this is good because non-``public`` members aren't meant to be
+accessed from the outside. But we may want to make use of ``protected``
+functions in derived Python classes.
+
+The following pattern makes this possible:
+
+.. code-block:: cpp
+
+ class A {
+ protected:
+ int foo() const { return 42; }
+ };
+
+ class Publicist : public A { // helper type for exposing protected functions
+ public:
+ using A::foo; // inherited with different access modifier
+ };
+
+ py::class_<A>(m, "A") // bind the primary class
+     .def("foo", &Publicist::foo); // expose protected methods via the publicist
+
+This works because ``&Publicist::foo`` is exactly the same function as
+``&A::foo`` (same signature and address), just with a different access
+modifier. The only purpose of the ``Publicist`` helper class is to make
+the function name ``public``.
+
+If the intent is to expose ``protected`` ``virtual`` functions which can be
+overridden in Python, the publicist pattern can be combined with the previously
+described trampoline:
+
+.. code-block:: cpp
+
+ class A {
+ public:
+ virtual ~A() = default;
+
+ protected:
+ virtual int foo() const { return 42; }
+ };
+
+ class Trampoline : public A {
+ public:
+ int foo() const override { PYBIND11_OVERRIDE(int, A, foo, ); }
+ };
+
+ class Publicist : public A {
+ public:
+ using A::foo;
+ };
+
+ py::class_<A, Trampoline>(m, "A") // <-- `Trampoline` here
+     .def("foo", &Publicist::foo); // <-- `Publicist` here, not `Trampoline`!
+
+.. note::
+
+ MSVC 2015 has a compiler bug (fixed in version 2017) which
+ requires a more explicit function binding in the form of
+ ``.def("foo", static_cast(&Publicist::foo));``
+ where ``int (A::*)() const`` is the type of ``A::foo``.
+
+Binding final classes
+=====================
+
+Some classes may not be appropriate to inherit from. In C++11, classes can
+use the ``final`` specifier to ensure that a class cannot be inherited from.
+The ``py::is_final`` attribute can be used to ensure that Python classes
+cannot inherit from a specified type. The underlying C++ type does not need
+to be declared final.
+
+.. code-block:: cpp
+
+ class IsFinal final {};
+
+ py::class_<IsFinal>(m, "IsFinal", py::is_final());
+
+When you try to inherit from such a class in Python, you will now get this
+error:
+
+.. code-block:: pycon
+
+ >>> class PyFinalChild(IsFinal):
+ ... pass
+ ...
+ TypeError: type 'IsFinal' is not an acceptable base type
+
+.. note:: This attribute is currently ignored on PyPy
+
+.. versionadded:: 2.6
+
+Binding classes with template parameters
+========================================
+
+pybind11 can also wrap classes that have template parameters. Consider these classes:
+
+.. code-block:: cpp
+
+ struct Cat {};
+ struct Dog {};
+
+ template <typename PetType>
+ struct Cage {
+     Cage(PetType& pet);
+     PetType& get();
+ };
+
+C++ templates may only be instantiated at compile time, so pybind11 can only
+wrap instantiated templated classes. You cannot wrap a non-instantiated template:
+
+.. code-block:: cpp
+
+ // BROKEN (this will not compile)
+ py::class_<Cage>(m, "Cage");
+     .def("get", &Cage::get);
+
+You must explicitly specify each template/type combination that you want to
+wrap separately.
+
+.. code-block:: cpp
+
+ // ok
+ py::class_<Cage<Cat>>(m, "CatCage")
+     .def("get", &Cage<Cat>::get);
+
+ // ok
+ py::class_<Cage<Dog>>(m, "DogCage")
+     .def("get", &Cage<Dog>::get);
+
+If your class methods have template parameters you can wrap those as well,
+but once again each instantiation must be explicitly specified:
+
+.. code-block:: cpp
+
+ template <typename T>
+ struct MyClass {
+     template <typename V>
+     T fn(V v);
+ };
+
+ py::class_<MyClass<int>>(m, "MyClassT")
+     .def("fn", &MyClass<int>::fn<std::string>);
+
+Custom automatic downcasters
+============================
+
+As explained in :ref:`inheritance`, pybind11 comes with built-in
+understanding of the dynamic type of polymorphic objects in C++; that
+is, returning a Pet to Python produces a Python object that knows it's
+wrapping a Dog, if Pet has virtual methods and pybind11 knows about
+Dog and this Pet is in fact a Dog. Sometimes, you might want to
+provide this automatic downcasting behavior when creating bindings for
+a class hierarchy that does not use standard C++ polymorphism, such as
+LLVM [#f4]_. As long as there's some way to determine at runtime
+whether a downcast is safe, you can proceed by specializing the
+``pybind11::polymorphic_type_hook`` template:
+
+.. code-block:: cpp
+
+ enum class PetKind { Cat, Dog, Zebra };
+ struct Pet { // Not polymorphic: has no virtual methods
+ const PetKind kind;
+ int age = 0;
+ protected:
+ Pet(PetKind _kind) : kind(_kind) {}
+ };
+ struct Dog : Pet {
+ Dog() : Pet(PetKind::Dog) {}
+ std::string sound = "woof!";
+ std::string bark() const { return sound; }
+ };
+
+ namespace pybind11 {
+ template<> struct polymorphic_type_hook<Pet> {
+ static const void *get(const Pet *src, const std::type_info*& type) {
+ // note that src may be nullptr
+ if (src && src->kind == PetKind::Dog) {
+ type = &typeid(Dog);
+ return static_cast<const Dog*>(src);
+ }
+ return src;
+ }
+ };
+ } // namespace pybind11
+
+When pybind11 wants to convert a C++ pointer of type ``Base*`` to a
+Python object, it calls ``polymorphic_type_hook<Base>::get()`` to
+determine if a downcast is possible. The ``get()`` function should use
+whatever runtime information is available to determine if its ``src``
+parameter is in fact an instance of some class ``Derived`` that
+inherits from ``Base``. If it finds such a ``Derived``, it sets ``type
+= &typeid(Derived)`` and returns a pointer to the ``Derived`` object
+that contains ``src``. Otherwise, it just returns ``src``, leaving
+``type`` at its default value of nullptr. If you set ``type`` to a
+type that pybind11 doesn't know about, no downcasting will occur, and
+the original ``src`` pointer will be used with its static type
+``Base*``.
+
+It is critical that the returned pointer and ``type`` argument of
+``get()`` agree with each other: if ``type`` is set to something
+non-null, the returned pointer must point to the start of an object
+whose type is ``type``. If the hierarchy being exposed uses only
+single inheritance, a simple ``return src;`` will achieve this just
+fine, but in the general case, you must cast ``src`` to the
+appropriate derived-class pointer (e.g. using
+``static_cast<Derived*>(src)``) before allowing it to be returned as a
+``void*``.
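+
+For completeness, here is a sketch of bindings that could accompany the hook
+above; once ``Pet`` and ``Dog`` are both registered, a function returning a
+``Pet*`` that actually points to a ``Dog`` yields a ``Dog`` on the Python side
+(``pet_store`` is an illustrative function, not part of the example above):
+
+.. code-block:: cpp
+
+ py::class_<Pet>(m, "Pet")
+     .def_readonly("age", &Pet::age);
+ py::class_<Dog, Pet>(m, "Dog")
+     .def(py::init<>())
+     .def("bark", &Dog::bark);
+
+ // Returns a Dog through a Pet pointer; the hook lets Python see a Dog.
+ m.def("pet_store", []() { return static_cast<Pet *>(new Dog()); },
+       py::return_value_policy::take_ownership);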
+
+.. [#f4] https://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html
+
+.. note::
+
+ pybind11's standard support for downcasting objects whose types
+ have virtual methods is implemented using
+ ``polymorphic_type_hook`` too, using the standard C++ ability to
+ determine the most-derived type of a polymorphic object using
+ ``typeid()`` and to cast a base pointer to that most-derived type
+ (even if you don't know what it is) using ``dynamic_cast``.
+
+.. seealso::
+
+ The file :file:`tests/test_tagbased_polymorphic.cpp` contains a
+ more complete example, including a demonstration of how to provide
+ automatic downcasting for an entire class hierarchy without
+ writing one get() function for each class.
+
+Accessing the type object
+=========================
+
+You can get the type object from a C++ class that has already been registered using:
+
+.. code-block:: cpp
+
+ py::type T_py = py::type::of<T>();
+
+You can directly use ``py::type::of(ob)`` to get the type object from any Python
+object, just like ``type(ob)`` in Python.
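+
+For instance, a small sketch (assuming a bound class ``Pet`` with a string
+constructor, as in the earlier examples):
+
+.. code-block:: cpp
+
+ py::type pet_type = py::type::of<Pet>();      // the same object as the Python class
+ py::object instance = pet_type("Ivy");        // calling the type creates an instance
+ assert(py::type::of(instance).is(pet_type));  // equivalent to type(instance) in Python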
+
+.. note::
+
+ Other types, like ``py::type::of<int>()``, do not work, see :ref:`type-conversions`.
+
+.. versionadded:: 2.6
+
+Custom type setup
+=================
+
+For advanced use cases, such as enabling garbage collection support, you may
+wish to directly manipulate the ``PyHeapTypeObject`` corresponding to a
+``py::class_`` definition.
+
+You can do that using ``py::custom_type_setup``:
+
+.. code-block:: cpp
+
+ struct OwnsPythonObjects {
+ py::object value = py::none();
+ };
+ py::class_<OwnsPythonObjects> cls(
+     m, "OwnsPythonObjects", py::custom_type_setup([](PyHeapTypeObject *heap_type) {
+ auto *type = &heap_type->ht_type;
+ type->tp_flags |= Py_TPFLAGS_HAVE_GC;
+ type->tp_traverse = [](PyObject *self_base, visitproc visit, void *arg) {
+ auto &self = py::cast<OwnsPythonObjects&>(py::handle(self_base));
+ Py_VISIT(self.value.ptr());
+ return 0;
+ };
+ type->tp_clear = [](PyObject *self_base) {
+ auto &self = py::cast<OwnsPythonObjects&>(py::handle(self_base));
+ self.value = py::none();
+ return 0;
+ };
+ }));
+ cls.def(py::init<>());
+ cls.def_readwrite("value", &OwnsPythonObjects::value);
+
+.. versionadded:: 2.8
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/embedding.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/embedding.rst
new file mode 100644
index 0000000000000000000000000000000000000000..78a03e7dc02550f010e1ef795eacb8139a3be6ba
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/embedding.rst
@@ -0,0 +1,262 @@
+.. _embedding:
+
+Embedding the interpreter
+#########################
+
+While pybind11 is mainly focused on extending Python using C++, it's also
+possible to do the reverse: embed the Python interpreter into a C++ program.
+All of the other documentation pages still apply here, so refer to them for
+general pybind11 usage. This section will cover a few extra things required
+for embedding.
+
+Getting started
+===============
+
+A basic executable with an embedded interpreter can be created with just a few
+lines of CMake and the ``pybind11::embed`` target, as shown below. For more
+information, see :doc:`/compiling`.
+
+.. code-block:: cmake
+
+ cmake_minimum_required(VERSION 3.4)
+ project(example)
+
+ find_package(pybind11 REQUIRED) # or `add_subdirectory(pybind11)`
+
+ add_executable(example main.cpp)
+ target_link_libraries(example PRIVATE pybind11::embed)
+
+The essential structure of the ``main.cpp`` file looks like this:
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h> // everything needed for embedding
+ namespace py = pybind11;
+
+ int main() {
+ py::scoped_interpreter guard{}; // start the interpreter and keep it alive
+
+ py::print("Hello, World!"); // use the Python API
+ }
+
+The interpreter must be initialized before using any Python API, which includes
+all the functions and classes in pybind11. The RAII guard class ``scoped_interpreter``
+takes care of the interpreter lifetime. After the guard is destroyed, the interpreter
+shuts down and clears its memory. No Python functions can be called after this.
+
+Executing Python code
+=====================
+
+There are a few different ways to run Python code. One option is to use ``eval``,
+``exec`` or ``eval_file``, as explained in :ref:`eval`. Here is a quick example in
+the context of an executable with an embedded interpreter:
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h>
+ namespace py = pybind11;
+
+ int main() {
+ py::scoped_interpreter guard{};
+
+ py::exec(R"(
+ kwargs = dict(name="World", number=42)
+ message = "Hello, {name}! The answer is {number}".format(**kwargs)
+ print(message)
+ )");
+ }
+
+Alternatively, similar results can be achieved using pybind11's API (see
+:doc:`/advanced/pycpp/index` for more details).
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h>
+ namespace py = pybind11;
+ using namespace py::literals;
+
+ int main() {
+ py::scoped_interpreter guard{};
+
+ auto kwargs = py::dict("name"_a="World", "number"_a=42);
+ auto message = "Hello, {name}! The answer is {number}"_s.format(**kwargs);
+ py::print(message);
+ }
+
+The two approaches can also be combined:
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h>
+ #include <iostream>
+
+ namespace py = pybind11;
+ using namespace py::literals;
+
+ int main() {
+ py::scoped_interpreter guard{};
+
+ auto locals = py::dict("name"_a="World", "number"_a=42);
+ py::exec(R"(
+ message = "Hello, {name}! The answer is {number}".format(**locals())
+ )", py::globals(), locals);
+
+ auto message = locals["message"].cast<std::string>();
+ std::cout << message;
+ }
+
+Importing modules
+=================
+
+Python modules can be imported using ``module_::import()``:
+
+.. code-block:: cpp
+
+ py::module_ sys = py::module_::import("sys");
+ py::print(sys.attr("path"));
+
+For convenience, the current working directory is included in ``sys.path`` when
+embedding the interpreter. This makes it easy to import local Python files:
+
+.. code-block:: python
+
+ """calc.py located in the working directory"""
+
+
+ def add(i, j):
+ return i + j
+
+
+.. code-block:: cpp
+
+ py::module_ calc = py::module_::import("calc");
+ py::object result = calc.attr("add")(1, 2);
+ int n = result.cast<int>();
+ assert(n == 3);
+
+Modules can be reloaded using ``module_::reload()`` if the source is modified e.g.
+by an external process. This can be useful in scenarios where the application
+imports a user defined data processing script which needs to be updated after
+changes by the user. Note that this function does not reload modules recursively.
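+
+For instance, a minimal sketch reusing the ``calc`` module from above:
+
+.. code-block:: cpp
+
+ py::module_ calc = py::module_::import("calc");
+ // ... calc.py is edited on disk by the user ...
+ calc.reload();  // re-executes the module's source
+ int n = calc.attr("add")(1, 2).cast<int>();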
+
+.. _embedding_modules:
+
+Adding embedded modules
+=======================
+
+Embedded binary modules can be added using the ``PYBIND11_EMBEDDED_MODULE`` macro.
+Note that the definition must be placed at global scope. They can be imported
+like any other module.
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h>
+ namespace py = pybind11;
+
+ PYBIND11_EMBEDDED_MODULE(fast_calc, m) {
+ // `m` is a `py::module_` which is used to bind functions and classes
+ m.def("add", [](int i, int j) {
+ return i + j;
+ });
+ }
+
+ int main() {
+ py::scoped_interpreter guard{};
+
+ auto fast_calc = py::module_::import("fast_calc");
+ auto result = fast_calc.attr("add")(1, 2).cast<int>();
+ assert(result == 3);
+ }
+
+Unlike extension modules where only a single binary module can be created, on
+the embedded side an unlimited number of modules can be added using multiple
+``PYBIND11_EMBEDDED_MODULE`` definitions (as long as they have unique names).
+
+These modules are added to Python's list of builtins, so they can also be
+imported in pure Python files loaded by the interpreter. Everything interacts
+naturally:
+
+.. code-block:: python
+
+ """py_module.py located in the working directory"""
+ import cpp_module
+
+ a = cpp_module.a
+ b = a + 1
+
+
+.. code-block:: cpp
+
+ #include <pybind11/embed.h>
+ namespace py = pybind11;
+
+ PYBIND11_EMBEDDED_MODULE(cpp_module, m) {
+ m.attr("a") = 1;
+ }
+
+ int main() {
+ py::scoped_interpreter guard{};
+
+ auto py_module = py::module_::import("py_module");
+
+ auto locals = py::dict("fmt"_a="{} + {} = {}", **py_module.attr("__dict__"));
+ assert(locals["a"].cast() == 1);
+ assert(locals["b"].cast() == 2);
+
+ py::exec(R"(
+ c = a + b
+ message = fmt.format(a, b, c)
+ )", py::globals(), locals);
+
+ assert(locals["c"].cast() == 3);
+ assert(locals["message"].cast() == "1 + 2 = 3");
+ }
+
+
+Interpreter lifetime
+====================
+
+The Python interpreter shuts down when ``scoped_interpreter`` is destroyed. After
+this, creating a new instance will restart the interpreter. Alternatively, the
+``initialize_interpreter`` / ``finalize_interpreter`` pair of functions can be used
+to directly set the state at any time.
+
+Modules created with pybind11 can be safely re-initialized after the interpreter
+has been restarted. However, this may not apply to third-party extension modules.
+The issue is that Python itself cannot completely unload extension modules and
+there are several caveats with regard to interpreter restarting. In short, not
+all memory may be freed, either due to Python reference cycles or user-created
+global data. All the details can be found in the CPython documentation.
+
+.. warning::
+
+ Creating two concurrent ``scoped_interpreter`` guards is a fatal error. So is
+ calling ``initialize_interpreter`` for a second time after the interpreter
+ has already been initialized.
+
+ Do not use the raw CPython API functions ``Py_Initialize`` and
+ ``Py_Finalize`` as these do not properly handle the lifetime of
+ pybind11's internal data.
+
+
+Sub-interpreter support
+=======================
+
+Creating multiple copies of ``scoped_interpreter`` is not possible because it
+represents the main Python interpreter. Sub-interpreters are something different
+and they do permit the existence of multiple interpreters. This is an advanced
+feature of the CPython API and should be handled with care. pybind11 does not
+currently offer a C++ interface for sub-interpreters, so refer to the CPython
+documentation for all the details regarding this feature.
+
+We'll just mention a couple of caveats of the sub-interpreter support in pybind11:
+
+ 1. Sub-interpreters will not receive independent copies of embedded modules.
+ Instead, these are shared and modifications in one interpreter may be
+ reflected in another.
+
+ 2. Managing multiple threads, multiple interpreters and the GIL can be
+ challenging and there are several caveats here, even within the pure
+ CPython API (please refer to the Python docs for details). As for
+ pybind11, keep in mind that ``gil_scoped_release`` and ``gil_scoped_acquire``
+ do not take sub-interpreters into account.
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/exceptions.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/exceptions.rst
new file mode 100644
index 0000000000000000000000000000000000000000..7fac473263902f2df3d6c3f4098b70b81a033028
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/exceptions.rst
@@ -0,0 +1,398 @@
+Exceptions
+##########
+
+Built-in C++ to Python exception translation
+============================================
+
+When Python calls C++ code through pybind11, pybind11 provides a C++ exception handler
+that will trap C++ exceptions, translate them to the corresponding Python exception,
+and raise them so that Python code can handle them.
+
+pybind11 defines translations for ``std::exception`` and its standard
+subclasses, and several special exception classes that translate to specific
+Python exceptions. Note that these are not actually Python exceptions, so they
+cannot be examined using the Python C API. Instead, they are pure C++ objects
+that pybind11 will translate the corresponding Python exception when they arrive
+at its exception handler.
+
+.. tabularcolumns:: |p{0.5\textwidth}|p{0.45\textwidth}|
+
++--------------------------------------+--------------------------------------+
+| Exception thrown by C++ | Translated to Python exception type |
++======================================+======================================+
+| :class:`std::exception` | ``RuntimeError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::bad_alloc` | ``MemoryError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::domain_error` | ``ValueError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::invalid_argument` | ``ValueError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::length_error` | ``ValueError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::out_of_range` | ``IndexError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::range_error` | ``ValueError`` |
++--------------------------------------+--------------------------------------+
+| :class:`std::overflow_error` | ``OverflowError`` |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::stop_iteration` | ``StopIteration`` (used to implement |
+| | custom iterators) |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::index_error` | ``IndexError`` (used to indicate out |
+| | of bounds access in ``__getitem__``, |
+| | ``__setitem__``, etc.) |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::key_error` | ``KeyError`` (used to indicate out |
+| | of bounds access in ``__getitem__``, |
+| | ``__setitem__`` in dict-like |
+| | objects, etc.) |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::value_error` | ``ValueError`` (used to indicate |
+| | wrong value passed in |
+| | ``container.remove(...)``) |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::type_error` | ``TypeError`` |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::buffer_error` | ``BufferError`` |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::import_error` | ``ImportError`` |
++--------------------------------------+--------------------------------------+
+| :class:`pybind11::attribute_error` | ``AttributeError`` |
++--------------------------------------+--------------------------------------+
+| Any other exception | ``RuntimeError`` |
++--------------------------------------+--------------------------------------+
+
+Exception translation is not bidirectional. That is, *catching* the C++
+exceptions defined above will not trap exceptions that originate from
+Python. For that, catch :class:`pybind11::error_already_set`. See
+:ref:`below <handling_python_exceptions_cpp>` for further details.
+
+There is also a special exception :class:`cast_error` that is thrown by
+:func:`handle::call` when the input arguments cannot be converted to Python
+objects.
+
+Registering custom translators
+==============================
+
+If the default exception conversion policy described above is insufficient,
+pybind11 also provides support for registering custom exception translators.
+Similar to pybind11 classes, exception translators can be local to the module
+they are defined in or global to the entire python session. To register a simple
+exception conversion that translates a C++ exception into a new Python exception
+using the C++ exception's ``what()`` method, a helper function is available:
+
+.. code-block:: cpp
+
+ py::register_exception<CppExp>(module, "PyExp");
+
+This call creates a Python exception class with the name ``PyExp`` in the given
+module and automatically converts any encountered exceptions of type ``CppExp``
+into Python exceptions of type ``PyExp``.
+
+A matching function is available for registering a local exception translator:
+
+.. code-block:: cpp
+
+ py::register_local_exception<CppExp>(module, "PyExp");
+
+
+It is possible to specify the base class for the exception using the third
+parameter, a ``handle``:
+
+.. code-block:: cpp
+
+ py::register_exception<CppExp>(module, "PyExp", PyExc_RuntimeError);
+ py::register_local_exception<CppExp>(module, "PyExp", PyExc_RuntimeError);
+
+Then ``PyExp`` can be caught both as ``PyExp`` and ``RuntimeError``.
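+
+For example (``some_cpp_function`` is a hypothetical bound function that throws
+``CppExp``):
+
+.. code-block:: pycon
+
+ >>> try:
+ ...     some_cpp_function()
+ ... except RuntimeError as e:  # PyExp was registered with PyExc_RuntimeError as base
+ ...     print(type(e).__name__)
+ ...
+ PyExp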
+
+The class objects of the built-in Python exceptions are listed in the Python
+documentation on `Standard Exceptions <https://docs.python.org/3/c-api/exceptions.html#standard-exceptions>`_.
+The default base class is ``PyExc_Exception``.
+
+When more advanced exception translation is needed, the functions
+``py::register_exception_translator(translator)`` and
+``py::register_local_exception_translator(translator)`` can be used to register
+functions that can translate arbitrary exception types (and which may include
+additional logic to do so). The functions take a stateless callable (e.g. a
+function pointer or a lambda function without captured variables) with the call
+signature ``void(std::exception_ptr)``.
+
+When a C++ exception is thrown, the registered exception translators are tried
+in reverse order of registration (i.e. the last registered translator gets the
+first shot at handling the exception). All local translators will be tried
+before a global translator is tried.
+
+Inside the translator, ``std::rethrow_exception`` should be used within
+a try block to re-throw the exception. One or more catch clauses to catch
+the appropriate exceptions should then be used with each clause using
+``PyErr_SetString`` to set a Python exception or ``ex(string)`` to set
+the python exception to a custom exception type (see below).
+
+To declare a custom Python exception type, declare a ``py::exception`` variable
+and use this in the associated exception translator (note: it is often useful
+to make this a static declaration when using it inside a lambda expression
+without requiring capturing).
+
+The following example demonstrates this for two hypothetical exception classes
+``MyCustomException`` and ``OtherException``: the first is translated to a
+custom Python exception ``MyCustomError``, while the second is translated to a
+standard Python ``RuntimeError``:
+
+.. code-block:: cpp
+
+ static py::exception<MyCustomException> exc(m, "MyCustomError");
+ py::register_exception_translator([](std::exception_ptr p) {
+ try {
+ if (p) std::rethrow_exception(p);
+ } catch (const MyCustomException &e) {
+ exc(e.what());
+ } catch (const OtherException &e) {
+ PyErr_SetString(PyExc_RuntimeError, e.what());
+ }
+ });
+
+Multiple exceptions can be handled by a single translator, as shown in the
+example above. If the exception is not caught by the current translator, the
+previously registered one gets a chance.
+
+If none of the registered exception translators is able to handle the
+exception, it is handled by the default converter as described in the previous
+section.
+
+.. seealso::
+
+ The file :file:`tests/test_exceptions.cpp` contains examples
+ of various custom exception translators and custom exception types.
+
+.. note::
+
+ Call either ``PyErr_SetString`` or a custom exception's call
+ operator (``exc(string)``) for every exception caught in a custom exception
+ translator. Failure to do so will cause Python to crash with ``SystemError:
+ error return without exception set``.
+
+ Exceptions that you do not plan to handle should simply not be caught, or
+ may be explicitly (re-)thrown to delegate it to the other,
+ previously-declared existing exception translators.
+
+ Note that ``libc++`` and ``libstdc++`` behave differently with
+ ``-fvisibility=hidden``. Therefore exceptions that are used across ABI
+ boundaries need to be explicitly exported, as exercised in
+ ``tests/test_exceptions.h``. See also: "Problems with C++ exceptions" on the
+ `GCC Wiki <https://gcc.gnu.org/wiki/Visibility>`_.
+
+
+Local vs Global Exception Translators
+=====================================
+
+When a global exception translator is registered, it will be applied across all
+modules in the reverse order of registration. This can create behavior where the
+order of module import influences how exceptions are translated.
+
+If module1 has the following translator:
+
+.. code-block:: cpp
+
+ py::register_exception_translator([](std::exception_ptr p) {
+     try {
+         if (p) std::rethrow_exception(p);
+     } catch (const std::invalid_argument &e) {
+         PyErr_SetString(PyExc_ValueError, "module1 handled this");
+     }
+ });
+
+and module2 has the following similar translator:
+
+.. code-block:: cpp
+
+ py::register_exception_translator([](std::exception_ptr p) {
+     try {
+         if (p) std::rethrow_exception(p);
+     } catch (const std::invalid_argument &e) {
+         PyErr_SetString(PyExc_ValueError, "module2 handled this");
+     }
+ });
+
+then which translator handles the ``invalid_argument`` will be determined by the
+order that module1 and module2 are imported. Since exception translators are
+applied in the reverse order of registration, whichever module was imported
+last will "win" and that translator will be applied.
+
+If there are multiple pybind11 modules that share exception types (either
+standard built-in or custom) loaded into a single python instance and
+consistent error handling behavior is needed, then local translators should be
+used.
+
+Changing the previous example to use ``register_local_exception_translator``
+would mean that when invalid_argument is thrown in the module2 code, the
+module2 translator will always handle it, while in module1, the module1
+translator will do the same.
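+
+As a sketch, the module2 translator above would become:
+
+.. code-block:: cpp
+
+ // In module2's binding code: this translator only applies to module2.
+ py::register_local_exception_translator([](std::exception_ptr p) {
+     try {
+         if (p) std::rethrow_exception(p);
+     } catch (const std::invalid_argument &e) {
+         PyErr_SetString(PyExc_ValueError, "module2 handled this");
+     }
+ });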
+
+.. _handling_python_exceptions_cpp:
+
+Handling exceptions from Python in C++
+======================================
+
+When C++ calls Python functions, such as in a callback function or when
+manipulating Python objects, and Python raises an ``Exception``, pybind11
+converts the Python exception into a C++ exception of type
+:class:`pybind11::error_already_set` whose payload contains a C++ string textual
+summary and the actual Python exception. ``error_already_set`` is used to
+propagate Python exceptions back to Python (or possibly, handle them in C++).
+
+.. tabularcolumns:: |p{0.5\textwidth}|p{0.45\textwidth}|
+
++--------------------------------------+--------------------------------------+
+| Exception raised in Python | Thrown as C++ exception type |
++======================================+======================================+
+| Any Python ``Exception`` | :class:`pybind11::error_already_set` |
++--------------------------------------+--------------------------------------+
+
+For example:
+
+.. code-block:: cpp
+
+ try {
+ // open("missing.txt", "r")
+ auto file = py::module_::import("io").attr("open")("missing.txt", "r");
+ auto text = file.attr("read")();
+ file.attr("close")();
+ } catch (py::error_already_set &e) {
+ if (e.matches(PyExc_FileNotFoundError)) {
+ py::print("missing.txt not found");
+ } else if (e.matches(PyExc_PermissionError)) {
+ py::print("missing.txt found but not accessible");
+ } else {
+ throw;
+ }
+ }
+
+Note that C++ to Python exception translation does not apply here, since that is
+a method for translating C++ exceptions to Python, not vice versa. The error raised
+from Python is always ``error_already_set``.
+
+This example illustrates this behavior:
+
+.. code-block:: cpp
+
+ try {
+ py::eval("raise ValueError('The Ring')");
+ } catch (py::value_error &boromir) {
+ // Boromir never gets the ring
+ assert(false);
+ } catch (py::error_already_set &frodo) {
+ // Frodo gets the ring
+ py::print("I will take the ring");
+ }
+
+ try {
+ // py::value_error is a request for pybind11 to raise a Python exception
+ throw py::value_error("The ball");
+ } catch (py::error_already_set &cat) {
+ // cat won't catch the ball since
+ // py::value_error is not a Python exception
+ assert(false);
+ } catch (py::value_error &dog) {
+ // dog will catch the ball
+ py::print("Run Spot run");
+ throw; // Throw it again (pybind11 will raise ValueError)
+ }
+
+Handling errors from the Python C API
+=====================================
+
+Where possible, use :ref:`pybind11 wrappers ` instead of calling
+the Python C API directly. When calling the Python C API directly, in
+addition to manually managing reference counts, one must follow the pybind11
+error protocol, which is outlined here.
+
+After calling the Python C API, if Python returns an error,
+``throw py::error_already_set();``, which allows pybind11 to deal with the
+exception and pass it back to the Python interpreter. This includes calls to
+the error setting functions such as ``PyErr_SetString``.
+
+.. code-block:: cpp
+
+ PyErr_SetString(PyExc_TypeError, "C API type error demo");
+ throw py::error_already_set();
+
+ // But it would be easier to simply...
+ throw py::type_error("pybind11 wrapper type error");
+
+Alternately, to ignore the error, call `PyErr_Clear
+`_.
+
+Any Python error must be thrown or cleared, or Python/pybind11 will be left in
+an invalid state.
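+
+For instance, a minimal sketch of probing for an optional attribute with the C
+API and clearing the resulting error (``obj`` is some ``py::object``; the
+attribute name is illustrative):
+
+.. code-block:: cpp
+
+ PyObject *attr = PyObject_GetAttrString(obj.ptr(), "maybe_missing");
+ if (attr == nullptr) {
+     PyErr_Clear();  // the attribute is optional here, so discard the AttributeError
+ } else {
+     py::object value = py::reinterpret_steal<py::object>(attr);
+     // ... use value ...
+ }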
+
+Chaining exceptions ('raise from')
+==================================
+
+In Python 3.3 a mechanism for indicating that exceptions were caused by other
+exceptions was introduced:
+
+.. code-block:: py
+
+ try:
+ print(1 / 0)
+ except Exception as exc:
+ raise RuntimeError("could not divide by zero") from exc
+
+To do a similar thing in pybind11, you can use the ``py::raise_from`` function. It
+sets the current python error indicator, so to continue propagating the exception
+you should ``throw py::error_already_set()`` (Python 3 only).
+
+.. code-block:: cpp
+
+ try {
+ py::eval("print(1 / 0"));
+ } catch (py::error_already_set &e) {
+ py::raise_from(e, PyExc_RuntimeError, "could not divide by zero");
+ throw py::error_already_set();
+ }
+
+.. versionadded:: 2.8
+
+.. _unraisable_exceptions:
+
+Handling unraisable exceptions
+==============================
+
+If a Python function invoked from a C++ destructor or any function marked
+``noexcept(true)`` (collectively, "noexcept functions") throws an exception, there
+is no way to propagate the exception, as such functions may not throw.
+Should they throw or fail to catch any exceptions in their call graph,
+the C++ runtime calls ``std::terminate()`` to abort immediately.
+
+Similarly, Python exceptions raised in a class's ``__del__`` method do not
+propagate, but are logged by Python as an unraisable error. In Python 3.8+, a
+`system hook is triggered
+`_
+and an auditing event is logged.
+
+Any noexcept function should have a try-catch block that traps
+:class:`error_already_set` (or any other exception that can occur). Note that
+pybind11 wrappers around Python exceptions such as
+:class:`pybind11::value_error` are *not* Python exceptions; they are C++
+exceptions that pybind11 catches and converts to Python exceptions. Noexcept
+functions cannot propagate these exceptions either. A useful approach is to
+convert them to Python exceptions and then ``discard_as_unraisable`` as shown
+below.
+
+.. code-block:: cpp
+
+ void nonthrowing_func() noexcept(true) {
+ try {
+ // ...
+ } catch (py::error_already_set &eas) {
+ // Discard the Python error using Python APIs, using the C++ magic
+ // variable __func__. Python already knows the type and value of the
+ // exception object.
+ eas.discard_as_unraisable(__func__);
+ } catch (const std::exception &e) {
+ // Log and discard C++ exceptions.
+ third_party::log(e);
+ }
+ }
+
+.. versionadded:: 2.6
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/functions.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/functions.rst
new file mode 100644
index 0000000000000000000000000000000000000000..5c53fd941c28a751fad92b8e3561b78260601f33
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/functions.rst
@@ -0,0 +1,615 @@
+Functions
+#########
+
+Before proceeding with this section, make sure that you are already familiar
+with the basics of binding functions and classes, as explained in :doc:`/basics`
+and :doc:`/classes`. The following guide is applicable to both free and member
+functions, i.e. *methods* in Python.
+
+.. _return_value_policies:
+
+Return value policies
+=====================
+
+Python and C++ use fundamentally different ways of managing the memory and
+lifetime of objects managed by them. This can lead to issues when creating
+bindings for functions that return a non-trivial type. Just by looking at the
+type information, it is not clear whether Python should take charge of the
+returned value and eventually free its resources, or if this is handled on the
+C++ side. For this reason, pybind11 provides several *return value policy*
+annotations that can be passed to the :func:`module_::def` and
+:func:`class_::def` functions. The default policy is
+:enum:`return_value_policy::automatic`.
+
+Return value policies are tricky, and it's very important to get them right.
+Just to illustrate what can go wrong, consider the following simple example:
+
+.. code-block:: cpp
+
+ /* Function declaration */
+ Data *get_data() { return _data; /* (pointer to a static data structure) */ }
+ ...
+
+ /* Binding code */
+ m.def("get_data", &get_data); // <-- KABOOM, will cause crash when called from Python
+
+What's going on here? When ``get_data()`` is called from Python, the return
+value (a native C++ type) must be wrapped to turn it into a usable Python type.
+In this case, the default return value policy (:enum:`return_value_policy::automatic`)
+causes pybind11 to assume ownership of the static ``_data`` instance.
+
+When Python's garbage collector eventually deletes the Python
+wrapper, pybind11 will also attempt to delete the C++ instance (via ``operator
+delete()``) due to the implied ownership. At this point, the entire application
+will come crashing down, though errors could also be more subtle and involve
+silent data corruption.
+
+In the above example, the policy :enum:`return_value_policy::reference` should have
+been specified so that the global data instance is only *referenced* without any
+implied transfer of ownership, i.e.:
+
+.. code-block:: cpp
+
+ m.def("get_data", &get_data, py::return_value_policy::reference);
+
+On the other hand, this is not the right policy for many other situations,
+where ignoring ownership could lead to resource leaks.
+As a developer using pybind11, it's important to be familiar with the different
+return value policies, including which situation calls for which one of them.
+The following table provides an overview of available policies:
+
+.. tabularcolumns:: |p{0.5\textwidth}|p{0.45\textwidth}|
+
++--------------------------------------------------+----------------------------------------------------------------------------+
+| Return value policy | Description |
++==================================================+============================================================================+
+| :enum:`return_value_policy::take_ownership` | Reference an existing object (i.e. do not create a new copy) and take |
+| | ownership. Python will call the destructor and delete operator when the |
+| | object's reference count reaches zero. Undefined behavior ensues when the |
+| | C++ side does the same, or when the data was not dynamically allocated. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::copy` | Create a new copy of the returned object, which will be owned by Python. |
+| | This policy is comparably safe because the lifetimes of the two instances |
+| | are decoupled. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::move` | Use ``std::move`` to move the return value contents into a new instance |
+| | that will be owned by Python. This policy is comparably safe because the |
+| | lifetimes of the two instances (move source and destination) are decoupled.|
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::reference` | Reference an existing object, but do not take ownership. The C++ side is |
+| | responsible for managing the object's lifetime and deallocating it when |
+| | it is no longer used. Warning: undefined behavior will ensue when the C++ |
+| | side deletes an object that is still referenced and used by Python. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::reference_internal` | Indicates that the lifetime of the return value is tied to the lifetime |
+| | of a parent object, namely the implicit ``this``, or ``self`` argument of |
+| | the called method or property. Internally, this policy works just like |
+| | :enum:`return_value_policy::reference` but additionally applies a |
+| | ``keep_alive<0, 1>`` *call policy* (described in the next section) that |
+| | prevents the parent object from being garbage collected as long as the |
+| | return value is referenced by Python. This is the default policy for |
+| | property getters created via ``def_property``, ``def_readwrite``, etc. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::automatic` | This policy falls back to the policy |
+| | :enum:`return_value_policy::take_ownership` when the return value is a |
+| | pointer. Otherwise, it uses :enum:`return_value_policy::move` or |
+| | :enum:`return_value_policy::copy` for rvalue and lvalue references, |
+| | respectively. See above for a description of what all of these different |
+| | policies do. This is the default policy for ``py::class_``-wrapped types. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+| :enum:`return_value_policy::automatic_reference` | As above, but use policy :enum:`return_value_policy::reference` when the |
+| | return value is a pointer. This is the default conversion policy for |
+| | function arguments when calling Python functions manually from C++ code |
+| | (i.e. via ``handle::operator()``) and the casters in ``pybind11/stl.h``. |
+| | You probably won't need to use this explicitly. |
++--------------------------------------------------+----------------------------------------------------------------------------+
+
+Return value policies can also be applied to properties:
+
+.. code-block:: cpp
+
+ class_<MyClass>(m, "MyClass")
+     .def_property("data", &MyClass::getData, &MyClass::setData,
+                   py::return_value_policy::copy);
+
+Technically, the code above applies the policy to both the getter and the
+setter function, however, the setter doesn't really care about *return*
+value policies which makes this a convenient terse syntax. Alternatively,
+targeted arguments can be passed through the :class:`cpp_function` constructor:
+
+.. code-block:: cpp
+
+ class_<MyClass>(m, "MyClass")
+     .def_property("data",
+         py::cpp_function(&MyClass::getData, py::return_value_policy::copy),
+         py::cpp_function(&MyClass::setData)
+     );
+
+.. warning::
+
+ Code with invalid return value policies might access uninitialized memory or
+ free data structures multiple times, which can lead to hard-to-debug
+ non-determinism and segmentation faults, hence it is worth spending the
+ time to understand all the different options in the table above.
+
+.. note::
+
+ One important aspect of the above policies is that they only apply to
+ instances which pybind11 has *not* seen before, in which case the policy
+ clarifies essential questions about the return value's lifetime and
+ ownership. When pybind11 knows the instance already (as identified by its
+ type and address in memory), it will return the existing Python object
+ wrapper rather than creating a new copy.
+
+.. note::
+
+ The next section on :ref:`call_policies` discusses *call policies* that can be
+ specified *in addition* to a return value policy from the list above. Call
+ policies indicate reference relationships that can involve both return values
+ and parameters of functions.
+
+.. note::
+
+ As an alternative to elaborate call policies and lifetime management logic,
+ consider using smart pointers (see the section on :ref:`smart_pointers` for
+ details). Smart pointers can tell whether an object is still referenced from
+ C++ or Python, which generally eliminates the kinds of inconsistencies that
+ can lead to crashes or undefined behavior. For functions returning smart
+ pointers, it is not necessary to specify a return value policy.
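+
+As a quick sketch (``Data`` is an illustrative type), a function returning a
+``std::shared_ptr`` needs no explicit policy once the class is bound with a
+``shared_ptr`` holder:
+
+.. code-block:: cpp
+
+ py::class_<Data, std::shared_ptr<Data>>(m, "Data")
+     .def(py::init<>());
+
+ // Ownership is shared between C++ and Python; no return_value_policy needed.
+ m.def("make_data", []() { return std::make_shared<Data>(); });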
+
+.. _call_policies:
+
+Additional call policies
+========================
+
+In addition to the above return value policies, further *call policies* can be
+specified to indicate dependencies between parameters or ensure a certain state
+for the function call.
+
+Keep alive
+----------
+
+In general, this policy is required when the C++ object is any kind of container
+and another object is being added to the container. ``keep_alive<Nurse, Patient>``
+indicates that the argument with index ``Patient`` should be kept alive at least
+until the argument with index ``Nurse`` is freed by the garbage collector. Argument
+indices start at one, while zero refers to the return value. For methods, index
+``1`` refers to the implicit ``this`` pointer, while regular arguments begin at
+index ``2``. Arbitrarily many call policies can be specified. When a ``Nurse``
+with value ``None`` is detected at runtime, the call policy does nothing.
+
+When the nurse is not a pybind11-registered type, the implementation internally
+relies on the ability to create a *weak reference* to the nurse object. When
+the nurse object is not a pybind11-registered type and does not support weak
+references, an exception will be thrown.
+
+If you use an incorrect argument index, you will get a ``RuntimeError`` saying
+``Could not activate keep_alive!``. You should review the indices you're using.
+
+Consider the following example: here, the binding code for a list append
+operation ties the lifetime of the newly added element to the underlying
+container:
+
+.. code-block:: cpp
+
+ py::class_<List>(m, "List")
+     .def("append", &List::append, py::keep_alive<1, 2>());
+
+For consistency, the argument indexing is identical for constructors. Index
+``1`` still refers to the implicit ``this`` pointer, i.e. the object which is
+being constructed. Index ``0`` refers to the return type which is presumed to
+be ``void`` when a constructor is viewed like a function. The following example
+ties the lifetime of the constructor element to the constructed object:
+
+.. code-block:: cpp
+
+ py::class_<Nurse>(m, "Nurse")
+     .def(py::init<Patient &>(), py::keep_alive<1, 2>());
+
+.. note::
+
+ ``keep_alive`` is analogous to the ``with_custodian_and_ward`` (if Nurse,
+ Patient != 0) and ``with_custodian_and_ward_postcall`` (if Nurse/Patient ==
+ 0) policies from Boost.Python.
+
+Call guard
+----------
+
+The ``call_guard<T>`` policy allows any scope guard type ``T`` to be placed
+around the function call. For example, this definition:
+
+.. code-block:: cpp
+
+ m.def("foo", foo, py::call_guard());
+
+is equivalent to the following pseudocode:
+
+.. code-block:: cpp
+
+ m.def("foo", [](args...) {
+ T scope_guard;
+ return foo(args...); // forwarded arguments
+ });
+
+The only requirement is that ``T`` is default-constructible, but otherwise any
+scope guard will work. This is very useful in combination with ``gil_scoped_release``.
+See :ref:`gil`.
+
+Multiple guards can also be specified as ``py::call_guard<T1, T2, T3...>``. The
+constructor order is left to right and destruction happens in reverse.
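+
+A common use is releasing the GIL around a long-running C++ call
+(``long_computation`` below is an illustrative function):
+
+.. code-block:: cpp
+
+ // The GIL is released before long_computation() runs and re-acquired afterwards.
+ m.def("long_computation", &long_computation,
+       py::call_guard<py::gil_scoped_release>());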
+
+.. seealso::
+
+ The file :file:`tests/test_call_policies.cpp` contains a complete example
+ that demonstrates using `keep_alive` and `call_guard` in more detail.
+
+.. _python_objects_as_args:
+
+Python objects as arguments
+===========================
+
+pybind11 exposes all major Python types using thin C++ wrapper classes. These
+wrapper classes can also be used as parameters of functions in bindings, which
+makes it possible to directly work with native Python types on the C++ side.
+For instance, the following statement iterates over a Python ``dict``:
+
+.. code-block:: cpp
+
+ void print_dict(const py::dict& dict) {
+ /* Easily interact with Python types */
+ for (auto item : dict)
+ std::cout << "key=" << std::string(py::str(item.first)) << ", "
+ << "value=" << std::string(py::str(item.second)) << std::endl;
+ }
+
+It can be exported:
+
+.. code-block:: cpp
+
+ m.def("print_dict", &print_dict);
+
+And used in Python as usual:
+
+.. code-block:: pycon
+
+ >>> print_dict({"foo": 123, "bar": "hello"})
+ key=foo, value=123
+ key=bar, value=hello
+
+For more information on using Python objects in C++, see :doc:`/advanced/pycpp/index`.
+
+Accepting \*args and \*\*kwargs
+===============================
+
+Python provides a useful mechanism to define functions that accept arbitrary
+numbers of arguments and keyword arguments:
+
+.. code-block:: python
+
+ def generic(*args, **kwargs):
+ ... # do something with args and kwargs
+
+Such functions can also be created using pybind11:
+
+.. code-block:: cpp
+
+ void generic(py::args args, const py::kwargs& kwargs) {
+ /// .. do something with args
+ if (kwargs)
+ /// .. do something with kwargs
+ }
+
+ /// Binding code
+ m.def("generic", &generic);
+
+The class ``py::args`` derives from ``py::tuple`` and ``py::kwargs`` derives
+from ``py::dict``.
+
+You may also use just one or the other, and may combine these with other
+arguments. Note, however, that ``py::kwargs`` must always be the last argument
+of the function, and ``py::args`` implies that any further arguments are
+keyword-only (see :ref:`keyword_only_arguments`).
+
+Please refer to the other examples for details on how to iterate over these,
+and on how to cast their entries into C++ objects. A demonstration is also
+available in ``tests/test_kwargs_and_defaults.cpp``.
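+
+As a small sketch of such iteration (the function name is illustrative):
+
+.. code-block:: cpp
+
+ m.def("describe", [](py::args args, const py::kwargs &kwargs) {
+     for (auto item : args)
+         py::print("positional:", item);
+     for (auto item : kwargs)
+         py::print("keyword:", item.first, "=", item.second);
+ });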
+
+.. note::
+
+ When combining \*args or \*\*kwargs with :ref:`keyword_args` you should
+ *not* include ``py::arg`` tags for the ``py::args`` and ``py::kwargs``
+ arguments.
+
+Default arguments revisited
+===========================
+
+The section on :ref:`default_args` previously discussed basic usage of default
+arguments using pybind11. One noteworthy aspect of their implementation is that
+default arguments are converted to Python objects right at declaration time.
+Consider the following example:
+
+.. code-block:: cpp
+
+ py::class_("MyClass")
+ .def("myFunction", py::arg("arg") = SomeType(123));
+
+In this case, pybind11 must already be set up to deal with values of the type
+``SomeType`` (via a prior instantiation of ``py::class_<SomeType>``), or an
+exception will be thrown.
+
+Another aspect worth highlighting is that the "preview" of the default argument
+in the function signature is generated using the object's ``__repr__`` method.
+If not available, the signature may not be very helpful, e.g.:
+
+.. code-block:: pycon
+
+ FUNCTIONS
+ ...
+ | myFunction(...)
+ | Signature : (MyClass, arg : SomeType = <SomeType object at 0x...>) -> NoneType
+ ...
+
+The first way of addressing this is by defining ``SomeType.__repr__``.
+Alternatively, it is possible to specify the human-readable preview of the
+default argument manually using the ``arg_v`` notation:
+
+.. code-block:: cpp
+
+ py::class_("MyClass")
+ .def("myFunction", py::arg_v("arg", SomeType(123), "SomeType(123)"));
+
+Sometimes it may be necessary to pass a null pointer value as a default
+argument. In this case, remember to cast it to the underlying type in question,
+like so:
+
+.. code-block:: cpp
+
+ py::class_("MyClass")
+ .def("myFunction", py::arg("arg") = static_cast(nullptr));
+
+.. _keyword_only_arguments:
+
+Keyword-only arguments
+======================
+
+Python 3 introduced keyword-only arguments by specifying an unnamed ``*``
+argument in a function definition:
+
+.. code-block:: python
+
+ def f(a, *, b): # a can be positional or via keyword; b must be via keyword
+ pass
+
+
+ f(a=1, b=2) # good
+ f(b=2, a=1) # good
+ f(1, b=2) # good
+ f(1, 2) # TypeError: f() takes 1 positional argument but 2 were given
+
+Pybind11 provides a ``py::kw_only`` object that allows you to implement
+the same behaviour by specifying the object between positional and keyword-only
+argument annotations when registering the function:
+
+.. code-block:: cpp
+
+ m.def("f", [](int a, int b) { /* ... */ },
+ py::arg("a"), py::kw_only(), py::arg("b"));
+
+Note that you currently cannot combine this with a ``py::args`` argument. This
+feature does *not* require Python 3 to work.
+
+.. versionadded:: 2.6
+
+As of pybind11 2.9, a ``py::args`` argument implies that any following arguments
+are keyword-only, as if ``py::kw_only()`` had been specified in the same
+relative location of the argument list as the ``py::args`` argument. The
+``py::kw_only()`` may be included to be explicit about this, but is not
+required. (Prior to 2.9 ``py::args`` may only occur at the end of the argument
+list, or immediately before a ``py::kwargs`` argument at the end).
+
+.. versionadded:: 2.9
+
+Positional-only arguments
+=========================
+
+Python 3.8 introduced a new positional-only argument syntax, using ``/`` in the
+function definition (note that this has been a convention for CPython
+positional arguments, such as in ``pow()``, since Python 2). You can
+do the same thing in any version of Python using ``py::pos_only()``:
+
+.. code-block:: cpp
+
+ m.def("f", [](int a, int b) { /* ... */ },
+ py::arg("a"), py::pos_only(), py::arg("b"));
+
+You now cannot give argument ``a`` by keyword. This can be combined with
+keyword-only arguments, as well.
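+
+For instance, a sketch combining both (the function and argument names are
+illustrative):
+
+.. code-block:: cpp
+
+ // a is positional-only, b may be passed either way, c is keyword-only
+ m.def("g", [](int a, int b, int c) { return a + b + c; },
+       py::arg("a"), py::pos_only(), py::arg("b"), py::kw_only(), py::arg("c"));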
+
+.. versionadded:: 2.6
+
+.. _nonconverting_arguments:
+
+Non-converting arguments
+========================
+
+Certain argument types may support conversion from one type to another. Some
+examples of conversions are:
+
+* :ref:`implicit_conversions` declared using ``py::implicitly_convertible()``
+* Calling a method accepting a double with an integer argument
+* Calling a ``std::complex<float>`` argument with a non-complex Python type
+ (for example, with a float). (Requires the optional ``pybind11/complex.h``
+ header).
+* Calling a function taking an Eigen matrix reference with a numpy array of the
+ wrong type or of an incompatible data layout. (Requires the optional
+ ``pybind11/eigen.h`` header).
+
+This behaviour is sometimes undesirable: the binding code may prefer to raise
+an error rather than convert the argument. This behaviour can be obtained
+through ``py::arg`` by calling the ``.noconvert()`` method of the ``py::arg``
+object, such as:
+
+.. code-block:: cpp
+
+ m.def("floats_only", [](double f) { return 0.5 * f; }, py::arg("f").noconvert());
+ m.def("floats_preferred", [](double f) { return 0.5 * f; }, py::arg("f"));
+
+Attempting to call the second function (the one without ``.noconvert()``) with
+an integer will succeed, but attempting to call the ``.noconvert()`` version
+will fail with a ``TypeError``:
+
+.. code-block:: pycon
+
+ >>> floats_preferred(4)
+ 2.0
+ >>> floats_only(4)
+ Traceback (most recent call last):
+ File "", line 1, in
+ TypeError: floats_only(): incompatible function arguments. The following argument types are supported:
+ 1. (f: float) -> float
+
+ Invoked with: 4
+
+You may, of course, combine this with the :var:`_a` shorthand notation (see
+:ref:`keyword_args`) and/or :ref:`default_args`. It is also permitted to omit
+the argument name by using the ``py::arg()`` constructor without an argument
+name, i.e. by specifying ``py::arg().noconvert()``.
+
+.. note::
+
+ When specifying ``py::arg`` options it is necessary to provide the same
+ number of options as the bound function has arguments. Thus if you want to
+ enable no-convert behaviour for just one of several arguments, you will
+ need to specify a ``py::arg()`` annotation for each argument with the
+ no-convert argument modified to ``py::arg().noconvert()``.
+
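+For example, to disable conversion for only the first of two arguments, both
+arguments still need an annotation (the function name below is illustrative):
+
+.. code-block:: cpp
+
+    m.def("scale", [](double value, double factor) { return value * factor; },
+          py::arg("value").noconvert(), py::arg("factor"));
+
+With this binding, ``scale(1, 2)`` raises a ``TypeError`` because ``value``
+refuses the integer-to-float conversion, while ``scale(1.0, 2)`` succeeds.
+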
+.. _none_arguments:
+
+Allowing/Prohibiting None arguments
+===================================
+
+When a C++ type registered with :class:`py::class_` is passed as an argument to
+a function taking the instance as pointer or shared holder (e.g. ``shared_ptr``
+or a custom, copyable holder as described in :ref:`smart_pointers`), pybind
+allows ``None`` to be passed from Python which results in calling the C++
+function with ``nullptr`` (or an empty holder) for the argument.
+
+To explicitly enable or disable this behaviour, use the
+``.none`` method of the :class:`py::arg` object:
+
+.. code-block:: cpp
+
+ py::class_(m, "Dog").def(py::init<>());
+ py::class_(m, "Cat").def(py::init<>());
+ m.def("bark", [](Dog *dog) -> std::string {
+ if (dog) return "woof!"; /* Called with a Dog instance */
+ else return "(no dog)"; /* Called with None, dog == nullptr */
+ }, py::arg("dog").none(true));
+ m.def("meow", [](Cat *cat) -> std::string {
+ // Can't be called with None argument
+ return "meow";
+ }, py::arg("cat").none(false));
+
+With the above, the Python call ``bark(None)`` will return the string ``"(no
+dog)"``, while attempting to call ``meow(None)`` will raise a ``TypeError``:
+
+.. code-block:: pycon
+
+ >>> from animals import Dog, Cat, bark, meow
+ >>> bark(Dog())
+ 'woof!'
+ >>> meow(Cat())
+ 'meow'
+ >>> bark(None)
+ '(no dog)'
+ >>> meow(None)
+ Traceback (most recent call last):
+ File "", line 1, in
+ TypeError: meow(): incompatible function arguments. The following argument types are supported:
+ 1. (cat: animals.Cat) -> str
+
+ Invoked with: None
+
+The default behaviour when the tag is unspecified is to allow ``None``.
+
+.. note::
+
+ Even when ``.none(true)`` is specified for an argument, ``None`` will be converted to a
+ ``nullptr`` *only* for custom and :ref:`opaque <opaque>` types. Pointers to built-in types
+ (``double *``, ``int *``, ...) and STL types (``std::vector<T> *``, ...; if ``pybind11/stl.h``
+ is included) are copied when converted to C++ (see :doc:`/advanced/cast/overview`) and will
+ not allow ``None`` as argument. To pass an optional argument of these copied types, consider
+ using ``std::optional<T>`` instead.
+
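+As a sketch of the ``std::optional`` approach mentioned in the note above
+(assuming C++17 and that ``pybind11/stl.h`` is included; the function name is
+illustrative):
+
+.. code-block:: cpp
+
+    // requires: #include <pybind11/stl.h> (enables the std::optional<T> caster)
+    m.def("maybe_scale", [](double value, std::optional<double> factor) {
+        // None arrives as std::nullopt rather than as a null pointer
+        return value * factor.value_or(1.0);
+    });
+
+Calling ``maybe_scale(2.0, None)`` then returns ``2.0``, while
+``maybe_scale(2.0, 3.0)`` returns ``6.0``.
+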
+.. _overload_resolution:
+
+Overload resolution order
+=========================
+
+When a function or method with multiple overloads is called from Python,
+pybind11 determines which overload to call in two passes. The first pass
+attempts to call each overload without allowing argument conversion (as if
+every argument had been specified as ``py::arg().noconvert()`` as described
+above).
+
+If no overload succeeds in the no-conversion first pass, a second pass is
+attempted in which argument conversion is allowed (except where prohibited via
+an explicit ``py::arg().noconvert()`` attribute in the function definition).
+
+If the second pass also fails a ``TypeError`` is raised.
+
+Within each pass, overloads are tried in the order they were registered with
+pybind11. If the ``py::prepend()`` tag is added to the definition, a function
+can be placed at the beginning of the overload sequence instead, allowing user
+overloads to precede built-in functions.
+
+What this means in practice is that pybind11 will prefer any overload that does
+not require conversion of arguments to an overload that does, but otherwise
+prefers earlier-defined overloads to later-defined ones.
+
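+A small illustration of the two-pass behaviour (hypothetical bindings):
+
+.. code-block:: cpp
+
+    m.def("process", [](int x) { return "int overload"; });
+    m.def("process", [](double x) { return "double overload"; });
+
+From Python, ``process(4)`` selects the ``int`` overload and ``process(4.5)``
+selects the ``double`` overload, both in the first (no-conversion) pass. An
+argument matching neither type exactly, such as a ``decimal.Decimal``, would
+typically only be matched in the second pass, where conversion (here via
+``__float__``) is permitted.
+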
+.. note::
+
+ pybind11 does *not* further prioritize based on the number/pattern of
+ overloaded arguments. That is, pybind11 does not prioritize a function
+ requiring one conversion over one requiring three, but only prioritizes
+ overloads requiring no conversion at all to overloads that require
+ conversion of at least one argument.
+
+.. versionadded:: 2.6
+
+ The ``py::prepend()`` tag.
+
+Binding functions with template parameters
+==========================================
+
+You can bind functions that have template parameters. Here's a function:
+
+.. code-block:: cpp
+
+ template <typename T>
+ void set(T t);
+
+C++ templates cannot be instantiated at runtime, so you cannot bind the
+non-instantiated function:
+
+.. code-block:: cpp
+
+ // BROKEN (this will not compile)
+ m.def("set", &set);
+
+You must bind each instantiated function template separately. You may bind
+each instantiation with the same name, which will be treated the same as
+an overloaded function:
+
+.. code-block:: cpp
+
+ m.def("set", &set);
+ m.def("set", &set);
+
+Sometimes it's clearer to bind them with separate names, which is also
+an option:
+
+.. code-block:: cpp
+
+ m.def("setInt", &set);
+ m.def("setString", &set);
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/misc.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/misc.rst
new file mode 100644
index 0000000000000000000000000000000000000000..25187070677eafdaca3f001248565a86aebbe762
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/misc.rst
@@ -0,0 +1,337 @@
+Miscellaneous
+#############
+
+.. _macro_notes:
+
+General notes regarding convenience macros
+==========================================
+
+pybind11 provides a few convenience macros such as
+:func:`PYBIND11_DECLARE_HOLDER_TYPE` and ``PYBIND11_OVERRIDE_*``. Since these
+are "just" macros that are evaluated in the preprocessor (which has no concept
+of types), they *will* get confused by commas in a template argument; for
+example, consider:
+
+.. code-block:: cpp
+
+ PYBIND11_OVERRIDE(MyReturnType<T1, T2>, Class<T3>, func)
+
+Because of this limitation, the preprocessor interprets the above as four
+arguments (a new argument begins after each comma) rather than three. To get around this,
+there are two alternatives: you can use a type alias, or you can wrap the type
+using the ``PYBIND11_TYPE`` macro:
+
+.. code-block:: cpp
+
+ // Version 1: using a type alias
+ using ReturnType = MyReturnType<T1, T2>;
+ using ClassType = Class<T3>;
+ PYBIND11_OVERRIDE(ReturnType, ClassType, func);
+
+ // Version 2: using the PYBIND11_TYPE macro:
+ PYBIND11_OVERRIDE(PYBIND11_TYPE(MyReturnType<T1, T2>),
+ PYBIND11_TYPE(Class<T3>), func)
+
+The ``PYBIND11_MAKE_OPAQUE`` macro does *not* require the above workarounds.
+
+.. _gil:
+
+Global Interpreter Lock (GIL)
+=============================
+
+When calling a C++ function from Python, the GIL is always held.
+The classes :class:`gil_scoped_release` and :class:`gil_scoped_acquire` can be
+used to acquire and release the global interpreter lock in the body of a C++
+function call. In this way, long-running C++ code can be parallelized using
+multiple Python threads. Taking :ref:`overriding_virtuals` as an example, this
+could be realized as follows (important changes highlighted):
+
+.. code-block:: cpp
+ :emphasize-lines: 8,9,31,32
+
+ class PyAnimal : public Animal {
+ public:
+ /* Inherit the constructors */
+ using Animal::Animal;
+
+ /* Trampoline (need one for each virtual function) */
+ std::string go(int n_times) override {
+ /* Acquire GIL before calling Python code */
+ py::gil_scoped_acquire acquire;
+
+ PYBIND11_OVERRIDE_PURE(
+ std::string, /* Return type */
+ Animal, /* Parent class */
+ go, /* Name of function */
+ n_times /* Argument(s) */
+ );
+ }
+ };
+
+ PYBIND11_MODULE(example, m) {
+ py::class_<Animal, PyAnimal> animal(m, "Animal");
+ animal
+ .def(py::init<>())
+ .def("go", &Animal::go);
+
+ py::class_(m, "Dog", animal)
+ .def(py::init<>());
+
+ m.def("call_go", [](Animal *animal) -> std::string {
+ /* Release GIL before calling into (potentially long-running) C++ code */
+ py::gil_scoped_release release;
+ return call_go(animal);
+ });
+ }
+
+The ``call_go`` wrapper can also be simplified using the ``call_guard`` policy
+(see :ref:`call_policies`) which yields the same result:
+
+.. code-block:: cpp
+
+ m.def("call_go", &call_go, py::call_guard());
+
+
+Binding sequence data types, iterators, the slicing protocol, etc.
+==================================================================
+
+Please refer to the supplemental example for details.
+
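+As a minimal sketch (assuming a hypothetical C++ ``Sequence`` type that
+provides ``size()``, ``operator[]`` and ``begin()``/``end()``), a sequence-like
+binding might look as follows:
+
+.. code-block:: cpp
+
+    py::class_<Sequence>(m, "Sequence")
+        .def("__len__", [](const Sequence &s) { return s.size(); })
+        .def("__getitem__", [](const Sequence &s, size_t i) {
+            if (i >= s.size())
+                throw py::index_error();
+            return s[i];
+        })
+        .def("__iter__", [](const Sequence &s) {
+            return py::make_iterator(s.begin(), s.end());
+        }, py::keep_alive<0, 1>() /* keep the sequence alive while iterated */);
+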
+.. seealso::
+
+ The file :file:`tests/test_sequences_and_iterators.cpp` contains a
+ complete example that shows how to bind a sequence data type, including
+ length queries (``__len__``), iterators (``__iter__``), the slicing
+ protocol and other kinds of useful operations.
+
+
+Partitioning code over multiple extension modules
+=================================================
+
+It's straightforward to split binding code over multiple extension modules,
+while referencing types that are declared elsewhere. Everything "just" works
+without any special precautions. One exception to this rule occurs when
+extending a type declared in another extension module. Recall the basic example
+from Section :ref:`inheritance`.
+
+.. code-block:: cpp
+
+ py::class_<Pet> pet(m, "Pet");
+ pet.def(py::init<const std::string &>())
+ .def_readwrite("name", &Pet::name);
+
+ py::class_(m, "Dog", pet /* <- specify parent */)
+ .def(py::init())
+ .def("bark", &Dog::bark);
+
+Suppose now that ``Pet`` bindings are defined in a module named ``basic``,
+whereas the ``Dog`` bindings are defined somewhere else. The challenge is of
+course that the variable ``pet`` is not available anymore though it is needed
+to indicate the inheritance relationship to the constructor of ``class_``.
+However, it can be acquired as follows:
+
+.. code-block:: cpp
+
+ py::object pet = (py::object) py::module_::import("basic").attr("Pet");
+
+ py::class_(m, "Dog", pet)
+ .def(py::init())
+ .def("bark", &Dog::bark);
+
+Alternatively, you can specify the base class as a template parameter option to
+``class_``, which performs an automated lookup of the corresponding Python
+type. Like the above code, however, this also requires invoking the ``import``
+function once to ensure that the pybind11 binding code of the module ``basic``
+has been executed:
+
+.. code-block:: cpp
+
+ py::module_::import("basic");
+
+ py::class_(m, "Dog")
+ .def(py::init())
+ .def("bark", &Dog::bark);
+
+Naturally, both methods will fail when there are cyclic dependencies.
+
+Note that pybind11 code compiled with hidden-by-default symbol visibility (e.g.
+via the command line flag ``-fvisibility=hidden`` on GCC/Clang), which is
+required for proper pybind11 functionality, can interfere with the ability to
+access types defined in another extension module. Working around this requires
+manually exporting types that are accessed by multiple extension modules;
+pybind11 provides a macro to do just this:
+
+.. code-block:: cpp
+
+ class PYBIND11_EXPORT Dog : public Animal {
+ ...
+ };
+
+Note also that it is possible (though this would rarely be required) to share arbitrary
+C++ objects between extension modules at runtime. Internal library data is shared
+between modules using capsule machinery [#f6]_ which can be also utilized for
+storing, modifying and accessing user-defined data. Note that an extension module
+will "see" other extensions' data if and only if they were built with the same
+pybind11 version. Consider the following example:
+
+.. code-block:: cpp
+
+ auto data = reinterpret_cast<MyData *>(py::get_shared_data("mydata"));
+ if (!data)
+ data = static_cast<MyData *>(py::set_shared_data("mydata", new MyData(42)));
+
+If the above snippet was used in several separately compiled extension modules,
+the first one to be imported would create a ``MyData`` instance and associate
+a ``"mydata"`` key with a pointer to it. Extensions that are imported later
+would be then able to access the data behind the same pointer.
+
+.. [#f6] https://docs.python.org/3/extending/extending.html#using-capsules
+
+Module Destructors
+==================
+
+pybind11 does not provide an explicit mechanism to invoke cleanup code at
+module destruction time. In rare cases where such functionality is required, it
+is possible to emulate it using Python capsules or weak references with a
+destruction callback.
+
+.. code-block:: cpp
+
+ auto cleanup_callback = []() {
+ // perform cleanup here -- this function is called with the GIL held
+ };
+
+ m.add_object("_cleanup", py::capsule(cleanup_callback));
+
+This approach has the potential downside that instances of classes exposed
+within the module may still be alive when the cleanup callback is invoked
+(whether this is acceptable will generally depend on the application).
+
+Alternatively, the capsule may also be stashed within a type object, which
+ensures that it is not called before all instances of that type have been
+collected:
+
+.. code-block:: cpp
+
+ auto cleanup_callback = []() { /* ... */ };
+ m.attr("BaseClass").attr("_cleanup") = py::capsule(cleanup_callback);
+
+Both approaches also expose a potentially dangerous ``_cleanup`` attribute in
+Python, which may be undesirable from an API standpoint (a premature explicit
+call from Python might lead to undefined behavior). Yet another approach that
+avoids this issue involves a weak reference with a cleanup callback:
+
+.. code-block:: cpp
+
+ // Register a callback function that is invoked when the BaseClass object is collected
+ py::cpp_function cleanup_callback(
+ [](py::handle weakref) {
+ // perform cleanup here -- this function is called with the GIL held
+
+ weakref.dec_ref(); // release weak reference
+ }
+ );
+
+ // Create a weak reference with a cleanup callback and initially leak it
+ (void) py::weakref(m.attr("BaseClass"), cleanup_callback).release();
+
+.. note::
+
+ PyPy does not garbage collect objects when the interpreter exits. An alternative
+ approach (which also works on CPython) is to use the :py:mod:`atexit` module [#f7]_,
+ for example:
+
+ .. code-block:: cpp
+
+ auto atexit = py::module_::import("atexit");
+ atexit.attr("register")(py::cpp_function([]() {
+ // perform cleanup here -- this function is called with the GIL held
+ }));
+
+ .. [#f7] https://docs.python.org/3/library/atexit.html
+
+
+Generating documentation using Sphinx
+=====================================
+
+Sphinx [#f4]_ has the ability to inspect the signatures and documentation
+strings in pybind11-based extension modules to automatically generate beautiful
+documentation in a variety of formats. The python_example repository [#f5]_ contains a
+simple example project which uses this approach.
+
+There are two potential gotchas when using this approach: first, make sure that
+the resulting strings do not contain any :kbd:`TAB` characters, which break the
+docstring parsing routines. You may want to use C++11 raw string literals,
+which are convenient for multi-line comments. Conveniently, any excess
+indentation will automatically be removed by Sphinx. However, for this to
+work, it is important that all lines are indented consistently, i.e.:
+
+.. code-block:: cpp
+
+ // ok
+ m.def("foo", &foo, R"mydelimiter(
+ The foo function
+
+ Parameters
+ ----------
+ )mydelimiter");
+
+ // *not ok*
+ m.def("foo", &foo, R"mydelimiter(The foo function
+
+ Parameters
+ ----------
+ )mydelimiter");
+
+By default, pybind11 automatically generates and prepends a signature to the docstring of a function
+registered with ``module_::def()`` and ``class_::def()``. Sometimes this
+behavior is not desirable, because you want to provide your own signature or remove
+the docstring completely to exclude the function from the Sphinx documentation.
+The class ``options`` allows you to selectively suppress auto-generated signatures:
+
+.. code-block:: cpp
+
+ PYBIND11_MODULE(example, m) {
+ py::options options;
+ options.disable_function_signatures();
+
+ m.def("add", [](int a, int b) { return a + b; }, "A function which adds two numbers");
+ }
+
+Note that changes to the settings affect only function bindings created during the
+lifetime of the ``options`` instance. When it goes out of scope at the end of the module's init function,
+the default settings are restored to prevent unwanted side effects.
+
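+A short sketch of this scoping behaviour (the function names are illustrative):
+
+.. code-block:: cpp
+
+    PYBIND11_MODULE(example, m) {
+        {
+            py::options options;
+            options.disable_function_signatures();
+
+            // No auto-generated signature is prepended to this docstring
+            m.def("quiet", [](int a, int b) { return a + b; }, "Adds two numbers");
+        }   // 'options' goes out of scope here; the default settings are restored
+
+        // This binding gets an auto-generated signature again
+        m.def("verbose", [](int a, int b) { return a + b; }, "Adds two numbers");
+    }
+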
+.. [#f4] http://www.sphinx-doc.org
+.. [#f5] http://github.com/pybind/python_example
+
+.. _avoiding-cpp-types-in-docstrings:
+
+Avoiding C++ types in docstrings
+================================
+
+Docstrings are generated at the time of the declaration, e.g. when ``.def(...)`` is called.
+At this point parameter and return types should be known to pybind11.
+If a custom type is not exposed yet through a ``py::class_`` constructor or a custom type caster,
+its C++ type name will be used instead to generate the signature in the docstring:
+
+.. code-block:: text
+
+ | __init__(...)
+ | __init__(self: example.Foo, arg0: ns::Bar) -> None
+ ^^^^^^^
+
+
+This limitation can be circumvented by ensuring that C++ classes are registered with pybind11
+before they are used as a parameter or return type of a function:
+
+.. code-block:: cpp
+
+ PYBIND11_MODULE(example, m) {
+
+ auto pyFoo = py::class_<ns::Foo>(m, "Foo");
+ auto pyBar = py::class_<ns::Bar>(m, "Bar");
+
+ pyFoo.def(py::init<const ns::Bar&>());
+ pyBar.def(py::init<>());
+ }
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/index.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..b87d4d60824c05e44ff9c2b0a030299e4d122d9e
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/index.rst
@@ -0,0 +1,13 @@
+Python C++ interface
+####################
+
+pybind11 exposes Python types and functions using thin C++ wrappers, which
+makes it possible to conveniently call Python code from C++ without resorting
+to Python's C API.
+
+.. toctree::
+ :maxdepth: 2
+
+ object
+ numpy
+ utilities
diff --git a/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/numpy.rst b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/numpy.rst
new file mode 100644
index 0000000000000000000000000000000000000000..848ef37e0803f5a38d3fe6890f863542bdc0b291
--- /dev/null
+++ b/third-party/DPVO/DPRetrieval/pybind11/docs/advanced/pycpp/numpy.rst
@@ -0,0 +1,463 @@
+.. _numpy:
+
+NumPy
+#####
+
+Buffer protocol
+===============
+
+Python supports an extremely general and convenient approach for exchanging
+data between plugin libraries. Types can expose a buffer view [#f2]_, which
+provides fast direct access to the raw internal data representation. Suppose we
+want to bind the following simplistic Matrix class:
+
+.. code-block:: cpp
+
+ class Matrix {
+ public:
+ Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
+ m_data = new float[rows*cols];
+ }
+ float *data() { return m_data; }
+ size_t rows() const { return m_rows; }
+ size_t cols() const { return m_cols; }
+ private:
+ size_t m_rows, m_cols;
+ float *m_data;
+ };
+
+The following binding code exposes the ``Matrix`` contents as a buffer object,
+making it possible to cast Matrices into NumPy arrays. It is even possible to
+completely avoid copy operations with Python expressions like
+``np.array(matrix_instance, copy = False)``.
+
+.. code-block:: cpp
+
+ py::class_