ca5705cc9c8581d916aca37e6759c44f0b1e70429e49ce83e658a0517cd3d6fe
- LICENSE +21 -0
- README.md +120 -11
- lib/models/layers/__pycache__/utils.cpython-39.pyc +0 -0
- lib/models/layers/modules.py +262 -0
- lib/models/layers/utils.py +52 -0
- lib/models/preproc/__pycache__/detector.cpython-39.pyc +0 -0
- lib/models/preproc/__pycache__/extractor.cpython-39.pyc +0 -0
- lib/models/preproc/__pycache__/slam.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/hmr2.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/pose_transformer.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/smpl_head.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/t_cond_mlp.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/utils.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/__pycache__/vit.cpython-39.pyc +0 -0
- lib/models/preproc/backbone/hmr2.py +77 -0
- lib/models/preproc/backbone/pose_transformer.py +357 -0
- lib/models/preproc/backbone/smpl_head.py +128 -0
- lib/models/preproc/backbone/t_cond_mlp.py +198 -0
- lib/models/preproc/backbone/utils.py +115 -0
- lib/models/preproc/backbone/vit.py +348 -0
- lib/models/preproc/detector.py +146 -0
- lib/models/preproc/extractor.py +112 -0
- lib/models/preproc/slam.py +70 -0
- lib/models/smpl.py +264 -0
- lib/models/smplify/__init__.py +1 -0
- lib/models/smplify/__pycache__/__init__.cpython-39.pyc +0 -0
- lib/models/smplify/__pycache__/losses.cpython-39.pyc +0 -0
- lib/models/smplify/__pycache__/smplify.cpython-39.pyc +0 -0
- lib/models/smplify/losses.py +87 -0
- lib/models/smplify/smplify.py +83 -0
- lib/models/wham.py +210 -0
- lib/utils/__pycache__/data_utils.cpython-39.pyc +0 -0
- lib/utils/__pycache__/imutils.cpython-39.pyc +0 -0
- lib/utils/__pycache__/kp_utils.cpython-39.pyc +0 -0
- lib/utils/__pycache__/transforms.cpython-39.pyc +0 -0
- lib/utils/data_utils.py +113 -0
- lib/utils/imutils.py +363 -0
- lib/utils/kp_utils.py +761 -0
- lib/utils/transforms.py +828 -0
- lib/utils/utils.py +265 -0
- lib/vis/__pycache__/renderer.cpython-39.pyc +0 -0
- lib/vis/__pycache__/run_vis.cpython-39.pyc +0 -0
- lib/vis/__pycache__/tools.cpython-39.pyc +0 -0
- lib/vis/renderer.py +313 -0
- lib/vis/run_vis.py +92 -0
- lib/vis/tools.py +822 -0
- output/demo/test19/output.mp4 +0 -0
- output/demo/test19/slam_results.pth +3 -0
- output/demo/test19/tracking_results.pth +3 -0
- output/demo/test19/wham_output.pkl +3 -0
LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Soyong Shin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
CHANGED
@@ -1,11 +1,120 @@
# WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> [](https://arxiv.org/abs/2312.07531) <a href="https://wham.is.tue.mpg.de/"><img alt="Project" src="https://img.shields.io/badge/-Project%20Page-lightgrey?logo=Google%20Chrome&color=informational&logoColor=white"></a> [](https://colab.research.google.com/drive/1ysUtGSwidTQIdBQRhq0hj63KbseFujkn?usp=sharing)
[](https://paperswithcode.com/sota/3d-human-pose-estimation-on-3dpw?p=wham-reconstructing-world-grounded-humans) [](https://paperswithcode.com/sota/3d-human-pose-estimation-on-emdb?p=wham-reconstructing-world-grounded-humans)

https://github.com/yohanshin/WHAM/assets/46889727/da4602b4-0597-4e64-8da4-ab06931b23ee

## Introduction
This repository is the official [PyTorch](https://pytorch.org/) implementation of [WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion](https://arxiv.org/abs/2312.07531). For more information, please visit our [project page](https://wham.is.tue.mpg.de/).

## Installation
Please see [Installation](docs/INSTALL.md) for details.

## Quick Demo

### [<img src="https://i.imgur.com/QCojoJk.png" width="30"> Google Colab for WHAM demo is now available](https://colab.research.google.com/drive/1ysUtGSwidTQIdBQRhq0hj63KbseFujkn?usp=sharing)

### Registration

To download the SMPL body models (Neutral, Female, and Male), you need to register at [SMPL](https://smpl.is.tue.mpg.de/) and [SMPLify](https://smplify.is.tue.mpg.de/). The username and password for both sites are required when fetching the demo data.

Next, run the following script to fetch the demo data. It downloads all required dependencies, including the trained models and demo videos.

```bash
bash fetch_demo_data.sh
```

You can try it on one example video:
```
python demo.py --video examples/IMG_9732.mov --visualize
```

We assume the camera focal length following [CLIFF](https://github.com/haofanwang/CLIFF). You can also specify known camera intrinsics [fx fy cx cy] for SLAM, as in the demo example below:
```
python demo.py --video examples/drone_video.mp4 --calib examples/drone_calib.txt --visualize
```

You can skip SLAM if you only want camera-coordinate motion:
```
python demo.py --video examples/IMG_9732.mov --visualize --estimate_local_only
```

You can further refine the WHAM results with Temporal SMPLify as a post-processing step, which improves both 2D alignment and 3D accuracy. Simply add the `--run_smplify` flag when running the demo.

## Docker

Please refer to [Docker](docs/DOCKER.md) for details.

## Python API

Please refer to [API](docs/API.md) for details.

## Dataset
Please see [Dataset](docs/DATASET.md) for details.

## Evaluation
```bash
# Evaluate on 3DPW dataset
python -m lib.eval.evaluate_3dpw --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar

# Evaluate on RICH dataset
python -m lib.eval.evaluate_rich --cfg configs/yamls/demo.yaml TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar

# Evaluate on EMDB dataset (also computes W-MPJPE and WA-MPJPE)
python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 1 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar   # EMDB 1
python -m lib.eval.evaluate_emdb --cfg configs/yamls/demo.yaml --eval-split 2 TRAIN.CHECKPOINT checkpoints/wham_vit_w_3dpw.pth.tar   # EMDB 2
```

## Training
WHAM training consists of two stages: (1) 2D-to-SMPL lifting on the AMASS dataset and (2) fine-tuning with feature integration on the video datasets. Please see [Dataset](docs/DATASET.md) for preprocessing the training datasets.

### Stage 1.
```bash
python train.py --cfg configs/yamls/stage1.yaml
```

### Stage 2.
Training stage 2 requires the pretrained results from stage 1. You can use your own pretrained results, or download the weights from [Google Drive](https://drive.google.com/file/d/1Erjkho7O0bnZFawarntICRUCroaKabRE/view?usp=sharing) and save them as `checkpoints/wham_stage1.tar.pth`.
```bash
python train.py --cfg configs/yamls/stage2.yaml TRAIN.CHECKPOINT <PATH-TO-STAGE1-RESULTS>
```

### Train with BEDLAM
TBD

## Acknowledgement
We sincerely thank Hongwei Yi and Silvia Zuffi for the discussion and proofreading. Part of this work was done while Soyong Shin was an intern at the Max Planck Institute for Intelligent Systems.

The base implementation is largely borrowed from [VIBE](https://github.com/mkocabas/VIBE) and [TCMR](https://github.com/hongsukchoi/TCMR_RELEASE). We use [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) for 2D keypoint detection, and [DPVO](https://github.com/princeton-vl/DPVO) and [DROID-SLAM](https://github.com/princeton-vl/DROID-SLAM) for extracting camera motion. Please visit their official websites for more details.

## TODO

- [ ] Data preprocessing
- [x] Training implementation
- [x] Colab demo release
- [x] Demo for custom videos

## Citation
```
@InProceedings{shin2023wham,
    title={WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion},
    author={Shin, Soyong and Kim, Juyong and Halilaj, Eni and Black, Michael J.},
    booktitle={Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}
```

## License
Please see [License](./LICENSE) for details.

## Contact
Please contact [email protected] for any questions related to this work.
lib/models/layers/__pycache__/utils.cpython-39.pyc
ADDED
Binary file (2.08 kB).
lib/models/layers/modules.py
ADDED
@@ -0,0 +1,262 @@
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import torch
import numpy as np
from torch import nn
from configs import constants as _C
from .utils import rollout_global_motion
from lib.utils.transforms import axis_angle_to_matrix


class Regressor(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dims, init_dim, layer='LSTM', n_layers=2, n_iters=1):
        super().__init__()
        self.n_outs = len(out_dims)

        self.rnn = getattr(nn, layer.upper())(
            in_dim + init_dim, hid_dim, n_layers,
            bidirectional=False, batch_first=True, dropout=0.3)

        for i, out_dim in enumerate(out_dims):
            setattr(self, 'declayer%d'%i, nn.Linear(hid_dim, out_dim))
            nn.init.xavier_uniform_(getattr(self, 'declayer%d'%i).weight, gain=0.01)

    def forward(self, x, inits, h0):
        xc = torch.cat([x, *inits], dim=-1)
        xc, h0 = self.rnn(xc, h0)

        preds = []
        for j in range(self.n_outs):
            out = getattr(self, 'declayer%d'%j)(xc)
            preds.append(out)

        return preds, xc, h0


class NeuralInitialization(nn.Module):
    def __init__(self, in_dim, hid_dim, layer, n_layers):
        super().__init__()

        out_dim = hid_dim
        self.n_layers = n_layers
        self.num_inits = int(layer.upper() == 'LSTM') + 1
        out_dim *= self.num_inits * n_layers

        self.linear1 = nn.Linear(in_dim, hid_dim)
        self.linear2 = nn.Linear(hid_dim, hid_dim * self.n_layers)
        self.linear3 = nn.Linear(hid_dim * self.n_layers, out_dim)
        self.relu1 = nn.ReLU()
        self.relu2 = nn.ReLU()

    def forward(self, x):
        b = x.shape[0]

        out = self.linear3(self.relu2(self.linear2(self.relu1(self.linear1(x)))))
        out = out.view(b, self.num_inits, self.n_layers, -1).permute(1, 2, 0, 3).contiguous()

        if self.num_inits == 2:
            return tuple([_ for _ in out])
        return out[0]


class Integrator(nn.Module):
    def __init__(self, in_channel, out_channel, hid_channel=1024):
        super().__init__()

        self.layer1 = nn.Linear(in_channel, hid_channel)
        self.relu1 = nn.ReLU()
        self.dr1 = nn.Dropout(0.1)

        self.layer2 = nn.Linear(hid_channel, hid_channel)
        self.relu2 = nn.ReLU()
        self.dr2 = nn.Dropout(0.1)

        self.layer3 = nn.Linear(hid_channel, out_channel)

    def forward(self, x, feat):
        res = x
        mask = (feat != 0).all(dim=-1).all(dim=-1)

        out = torch.cat((x, feat), dim=-1)
        out = self.layer1(out)
        out = self.relu1(out)
        out = self.dr1(out)

        out = self.layer2(out)
        out = self.relu2(out)
        out = self.dr2(out)

        out = self.layer3(out)
        out[mask] = out[mask] + res[mask]

        return out


class MotionEncoder(nn.Module):
    def __init__(self,
                 in_dim,
                 d_embed,
                 pose_dr,
                 rnn_type,
                 n_layers,
                 n_joints):
        super().__init__()

        self.n_joints = n_joints

        self.embed_layer = nn.Linear(in_dim, d_embed)
        self.pos_drop = nn.Dropout(pose_dr)

        # Keypoints initializer
        self.neural_init = NeuralInitialization(n_joints * 3 + in_dim, d_embed, rnn_type, n_layers)

        # 3d keypoints regressor
        self.regressor = Regressor(
            d_embed, d_embed, [n_joints * 3], n_joints * 3, rnn_type, n_layers)

    def forward(self, x, init):
        """ Forward pass of motion encoder.
        """

        self.b, self.f = x.shape[:2]
        x = self.embed_layer(x.reshape(self.b, self.f, -1))
        x = self.pos_drop(x)

        h0 = self.neural_init(init)
        pred_list = [init[..., :self.n_joints * 3]]
        motion_context_list = []

        for i in range(self.f):
            (pred_kp3d, ), motion_context, h0 = self.regressor(x[:, [i]], pred_list[-1:], h0)
            motion_context_list.append(motion_context)
            pred_list.append(pred_kp3d)

        pred_kp3d = torch.cat(pred_list[1:], dim=1).view(self.b, self.f, -1, 3)
        motion_context = torch.cat(motion_context_list, dim=1)

        # Merge 3D keypoints with motion context
        motion_context = torch.cat((motion_context, pred_kp3d.reshape(self.b, self.f, -1)), dim=-1)
        return pred_kp3d, motion_context


class TrajectoryDecoder(nn.Module):
    def __init__(self,
                 d_embed,
                 rnn_type,
                 n_layers):
        super().__init__()

        # Trajectory regressor
        self.regressor = Regressor(
            d_embed, d_embed, [3, 6], 12, rnn_type, n_layers)

    def forward(self, x, root, cam_a, h0=None):
        """ Forward pass of trajectory decoder.
        """

        b, f = x.shape[:2]
        pred_root_list, pred_vel_list = [root[:, :1]], []

        for i in range(f):
            # Global coordinate estimation
            (pred_rootv, pred_rootr), _, h0 = self.regressor(
                x[:, [i]], [pred_root_list[-1], cam_a[:, [i]]], h0)

            pred_root_list.append(pred_rootr)
            pred_vel_list.append(pred_rootv)

        pred_root = torch.cat(pred_root_list, dim=1).view(b, f + 1, -1)
        pred_vel = torch.cat(pred_vel_list, dim=1).view(b, f, -1)

        return pred_root, pred_vel


class MotionDecoder(nn.Module):
    def __init__(self,
                 d_embed,
                 rnn_type,
                 n_layers):
        super().__init__()

        self.n_pose = 24

        # SMPL pose initialization
        self.neural_init = NeuralInitialization(len(_C.BMODEL.MAIN_JOINTS) * 6, d_embed, rnn_type, n_layers)

        # 3d keypoints regressor
        self.regressor = Regressor(
            d_embed, d_embed, [self.n_pose * 6, 10, 3, 4], self.n_pose * 6, rnn_type, n_layers)

    def forward(self, x, init):
        """ Forward pass of motion decoder.
        """
        b, f = x.shape[:2]

        h0 = self.neural_init(init[:, :, _C.BMODEL.MAIN_JOINTS].reshape(b, 1, -1))

        # Recursive prediction of SMPL parameters
        pred_pose_list = [init.reshape(b, 1, -1)]
        pred_shape_list, pred_cam_list, pred_contact_list = [], [], []

        for i in range(f):
            # Camera coordinate estimation
            (pred_pose, pred_shape, pred_cam, pred_contact), _, h0 = self.regressor(x[:, [i]], pred_pose_list[-1:], h0)
            pred_pose_list.append(pred_pose)
            pred_shape_list.append(pred_shape)
            pred_cam_list.append(pred_cam)
            pred_contact_list.append(pred_contact)

        pred_pose = torch.cat(pred_pose_list[1:], dim=1).view(b, f, -1)
        pred_shape = torch.cat(pred_shape_list, dim=1).view(b, f, -1)
        pred_cam = torch.cat(pred_cam_list, dim=1).view(b, f, -1)
        pred_contact = torch.cat(pred_contact_list, dim=1).view(b, f, -1)

        return pred_pose, pred_shape, pred_cam, pred_contact


class TrajectoryRefiner(nn.Module):
    def __init__(self,
                 d_embed,
                 d_hidden,
                 rnn_type,
                 n_layers):
        super().__init__()

        d_input = d_embed + 12
        self.refiner = Regressor(
            d_input, d_hidden, [6, 3], 9, rnn_type, n_layers)

    def forward(self, context, pred_vel, output, cam_angvel, return_y_up):
        b, f = context.shape[:2]

        # Register values
        pred_root = output['poses_root_r6d'].clone().detach()
        feet = output['feet'].clone().detach()
        contact = output['contact'].clone().detach()

        feet_vel = torch.cat((torch.zeros_like(feet[:, :1]), feet[:, 1:] - feet[:, :-1]), dim=1) * 30   # Normalize to 30 times
        feet = (feet_vel * contact.unsqueeze(-1)).reshape(b, f, -1)   # Velocity input
        inpt_feat = torch.cat([context, feet], dim=-1)

        (delta_root, delta_vel), _, _ = self.refiner(inpt_feat, [pred_root[:, 1:], pred_vel], h0=None)
        pred_root[:, 1:] = pred_root[:, 1:] + delta_root
        pred_vel = pred_vel + delta_vel

        # root_world, trans_world = rollout_global_motion(pred_root, pred_vel)

        # if return_y_up:
        #     yup2ydown = axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float().to(root_world.device)
        #     root_world = yup2ydown.mT @ root_world
        #     trans_world = (yup2ydown.mT @ trans_world.unsqueeze(-1)).squeeze(-1)

        output.update({
            'poses_root_r6d_refined': pred_root,
            'vel_root_refined': pred_vel,
            # 'poses_root_world': root_world,
            # 'trans_world': trans_world,
        })

        return output
lib/models/layers/utils.py
ADDED
@@ -0,0 +1,52 @@
import torch
from lib.utils import transforms


def rollout_global_motion(root_r, root_v, init_trans=None):
    b, f = root_v.shape[:2]
    root = transforms.rotation_6d_to_matrix(root_r[:])
    vel_world = (root[:, :-1] @ root_v.unsqueeze(-1)).squeeze(-1)
    trans = torch.cumsum(vel_world, dim=1)

    if init_trans is not None: trans = trans + init_trans
    return root[:, 1:], trans

def compute_camera_motion(output, root_c_d6d, root_w, trans, pred_cam):
    root_c = transforms.rotation_6d_to_matrix(root_c_d6d)  # Root orient in cam coord
    cam_R = root_c @ root_w.mT
    pelvis_cam = output.full_cam.view_as(pred_cam)
    pelvis_world = (cam_R.mT @ pelvis_cam.unsqueeze(-1)).squeeze(-1)
    cam_T_world = pelvis_world - trans
    cam_T = (cam_R @ cam_T_world.unsqueeze(-1)).squeeze(-1)

    return cam_R, cam_T

def compute_camera_pose(root_c_d6d, root_w):
    root_c = transforms.rotation_6d_to_matrix(root_c_d6d)  # Root orient in cam coord
    cam_R = root_c @ root_w.mT
    return cam_R


def reset_root_velocity(smpl, output, stationary, pred_ori, pred_vel, thr=0.7):
    b, f = pred_vel.shape[:2]

    stationary_mask = (stationary.clone().detach() > thr).unsqueeze(-1).float()
    poses_root = transforms.rotation_6d_to_matrix(pred_ori.clone().detach())
    vel_world = (poses_root[:, 1:] @ pred_vel.clone().detach().unsqueeze(-1)).squeeze(-1)

    output = smpl.get_output(body_pose=output.body_pose.clone().detach(),
                             global_orient=poses_root[:, 1:].reshape(-1, 1, 3, 3),
                             betas=output.betas.clone().detach(),
                             pose2rot=False)
    feet = output.feet.reshape(b, f, 4, 3)
    feet_vel = feet[:, 1:] - feet[:, :-1] + vel_world[:, 1:].unsqueeze(-2)
    feet_vel = torch.cat((torch.zeros_like(feet_vel[:, :1]), feet_vel), dim=1)

    stationary_vel = feet_vel * stationary_mask
    del_vel = stationary_vel.sum(dim=2) / ((stationary_vel != 0).sum(dim=2) + 1e-4)
    vel_world_update = vel_world - del_vel

    vel_root = (poses_root[:, 1:].mT @ vel_world_update.unsqueeze(-1)).squeeze(-1)

    return vel_root
lib/models/preproc/__pycache__/detector.cpython-39.pyc
ADDED
Binary file (4.77 kB).

lib/models/preproc/__pycache__/extractor.cpython-39.pyc
ADDED
Binary file (3.48 kB).

lib/models/preproc/__pycache__/slam.cpython-39.pyc
ADDED
Binary file (2.6 kB).

lib/models/preproc/backbone/__pycache__/hmr2.cpython-39.pyc
ADDED
Binary file (2.43 kB).

lib/models/preproc/backbone/__pycache__/pose_transformer.cpython-39.pyc
ADDED
Binary file (10.8 kB).

lib/models/preproc/backbone/__pycache__/smpl_head.cpython-39.pyc
ADDED
Binary file (4.46 kB).

lib/models/preproc/backbone/__pycache__/t_cond_mlp.cpython-39.pyc
ADDED
Binary file (6.04 kB).

lib/models/preproc/backbone/__pycache__/utils.cpython-39.pyc
ADDED
Binary file (3.59 kB).

lib/models/preproc/backbone/__pycache__/vit.cpython-39.pyc
ADDED
Binary file (11.2 kB).
lib/models/preproc/backbone/hmr2.py
ADDED
@@ -0,0 +1,77 @@
import os

import torch
import einops
import torch.nn as nn
# import pytorch_lightning as pl

from yacs.config import CfgNode
from .vit import vit
from .smpl_head import SMPLTransformerDecoderHead

# class HMR2(pl.LightningModule):
class HMR2(nn.Module):

    def __init__(self):
        """
        Setup HMR2 model
        Args:
            cfg (CfgNode): Config file as a yacs CfgNode
        """
        super().__init__()

        # Create backbone feature extractor
        self.backbone = vit()

        # Create SMPL head
        self.smpl_head = SMPLTransformerDecoderHead()

    def decode(self, x):

        batch_size = x.shape[0]
        pred_smpl_params, pred_cam, _ = self.smpl_head(x)

        # Compute model vertices, joints and the projected joints
        pred_smpl_params['global_orient'] = pred_smpl_params['global_orient'].reshape(batch_size, -1, 3, 3)
        pred_smpl_params['body_pose'] = pred_smpl_params['body_pose'].reshape(batch_size, -1, 3, 3)
        pred_smpl_params['betas'] = pred_smpl_params['betas'].reshape(batch_size, -1)
        return pred_smpl_params['global_orient'], pred_smpl_params['body_pose'], pred_smpl_params['betas'], pred_cam

    def forward(self, x, encode=False, **kwargs):
        """
        Run a forward step of the network
        Args:
            batch (Dict): Dictionary containing batch data
            train (bool): Flag indicating whether it is training or validation mode
        Returns:
            Dict: Dictionary containing the regression output
        """

        # Use RGB image as input
        batch_size = x.shape[0]

        # Compute conditioning features using the backbone
        # if using ViT backbone, we need to use a different aspect ratio
        conditioning_feats = self.backbone(x[:,:,:,32:-32])
        if encode:
            conditioning_feats = einops.rearrange(conditioning_feats, 'b c h w -> b (h w) c')
            token = torch.zeros(batch_size, 1, 1).to(x.device)
            token_out = self.smpl_head.transformer(token, context=conditioning_feats)
            return token_out.squeeze(1)

        pred_smpl_params, pred_cam, _ = self.smpl_head(conditioning_feats)

        # Compute model vertices, joints and the projected joints
        pred_smpl_params['global_orient'] = pred_smpl_params['global_orient'].reshape(batch_size, -1, 3, 3)
        pred_smpl_params['body_pose'] = pred_smpl_params['body_pose'].reshape(batch_size, -1, 3, 3)
        pred_smpl_params['betas'] = pred_smpl_params['betas'].reshape(batch_size, -1)
        return pred_smpl_params['global_orient'], pred_smpl_params['body_pose'], pred_smpl_params['betas'], pred_cam


def hmr2(checkpoint_pth):
    model = HMR2()
    if os.path.exists(checkpoint_pth):
        model.load_state_dict(torch.load(checkpoint_pth, map_location='cpu')['state_dict'], strict=False)
        print(f'Load backbone weight: {checkpoint_pth}')
    return model
lib/models/preproc/backbone/pose_transformer.py
ADDED
@@ -0,0 +1,357 @@
from inspect import isfunction
from typing import Callable, Optional

import torch
from einops import rearrange
from einops.layers.torch import Rearrange
from torch import nn

from .t_cond_mlp import (
    AdaptiveLayerNorm1D,
    FrequencyEmbedder,
    normalization_layer,
)
# from .vit import Attention, FeedForward


def exists(val):
    return val is not None


def default(val, d):
    if exists(val):
        return val
    return d() if isfunction(d) else d


class PreNorm(nn.Module):
    def __init__(self, dim: int, fn: Callable, norm: str = "layer", norm_cond_dim: int = -1):
        super().__init__()
        self.norm = normalization_layer(norm, dim, norm_cond_dim)
        self.fn = fn

    def forward(self, x: torch.Tensor, *args, **kwargs):
        if isinstance(self.norm, AdaptiveLayerNorm1D):
            return self.fn(self.norm(x, *args), **kwargs)
        else:
            return self.fn(self.norm(x), **kwargs)


class FeedForward(nn.Module):
    def __init__(self, dim, hidden_dim, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class Attention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64, dropout=0.0):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head**-0.5

        self.attend = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(dropout)

        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)

        self.to_out = (
            nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
            if project_out
            else nn.Identity()
        )

    def forward(self, x):
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), qkv)

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)
        attn = self.dropout(attn)

        out = torch.matmul(attn, v)
        out = rearrange(out, "b h n d -> b n (h d)")
        return self.to_out(out)


class CrossAttention(nn.Module):
    def __init__(self, dim, context_dim=None, heads=8, dim_head=64, dropout=0.0):
        super().__init__()
        inner_dim = dim_head * heads
        project_out = not (heads == 1 and dim_head == dim)

        self.heads = heads
        self.scale = dim_head**-0.5

        self.attend = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(dropout)

        context_dim = default(context_dim, dim)
        self.to_kv = nn.Linear(context_dim, inner_dim * 2, bias=False)
        self.to_q = nn.Linear(dim, inner_dim, bias=False)

        self.to_out = (
            nn.Sequential(nn.Linear(inner_dim, dim), nn.Dropout(dropout))
            if project_out
            else nn.Identity()
        )

    def forward(self, x, context=None):
        context = default(context, x)
        k, v = self.to_kv(context).chunk(2, dim=-1)
        q = self.to_q(x)
        q, k, v = map(lambda t: rearrange(t, "b n (h d) -> b h n d", h=self.heads), [q, k, v])

        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        attn = self.attend(dots)
        attn = self.dropout(attn)

        out = torch.matmul(attn, v)
        out = rearrange(out, "b h n d -> b n (h d)")
        return self.to_out(out)


class Transformer(nn.Module):
    def __init__(
        self,
        dim: int,
        depth: int,
        heads: int,
        dim_head: int,
        mlp_dim: int,
        dropout: float = 0.0,
        norm: str = "layer",
        norm_cond_dim: int = -1,
    ):
        super().__init__()
        self.layers = nn.ModuleList([])
        for _ in range(depth):
            sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
            ff = FeedForward(dim, mlp_dim, dropout=dropout)
            self.layers.append(
                nn.ModuleList(
                    [
                        PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
                        PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
                    ]
                )
            )

    def forward(self, x: torch.Tensor, *args):
        for attn, ff in self.layers:
            x = attn(x, *args) + x
            x = ff(x, *args) + x
        return x


class TransformerCrossAttn(nn.Module):
    def __init__(
        self,
        dim: int,
        depth: int,
        heads: int,
        dim_head: int,
        mlp_dim: int,
        dropout: float = 0.0,
        norm: str = "layer",
        norm_cond_dim: int = -1,
        context_dim: Optional[int] = None,
    ):
        super().__init__()
        self.layers = nn.ModuleList([])
        for _ in range(depth):
            sa = Attention(dim, heads=heads, dim_head=dim_head, dropout=dropout)
            ca = CrossAttention(
                dim, context_dim=context_dim, heads=heads, dim_head=dim_head, dropout=dropout
            )
            ff = FeedForward(dim, mlp_dim, dropout=dropout)
            self.layers.append(
                nn.ModuleList(
                    [
                        PreNorm(dim, sa, norm=norm, norm_cond_dim=norm_cond_dim),
                        PreNorm(dim, ca, norm=norm, norm_cond_dim=norm_cond_dim),
                        PreNorm(dim, ff, norm=norm, norm_cond_dim=norm_cond_dim),
                    ]
                )
            )

    def forward(self, x: torch.Tensor, *args, context=None, context_list=None):
        if context_list is None:
            context_list = [context] * len(self.layers)
        if len(context_list) != len(self.layers):
            raise ValueError(f"len(context_list) != len(self.layers) ({len(context_list)} != {len(self.layers)})")

        for i, (self_attn, cross_attn, ff) in enumerate(self.layers):
            x = self_attn(x, *args) + x
            x = cross_attn(x, *args, context=context_list[i]) + x
            x = ff(x, *args) + x
        return x


class DropTokenDropout(nn.Module):
    def __init__(self, p: float = 0.1):
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError(
                "dropout probability has to be between 0 and 1, " "but got {}".format(p)
            )
        self.p = p

    def forward(self, x: torch.Tensor):
        # x: (batch_size, seq_len, dim)
        if self.training and self.p > 0:
            zero_mask = torch.full_like(x[0, :, 0], self.p).bernoulli().bool()
            # TODO: permutation idx for each batch using torch.argsort
            if zero_mask.any():
                x = x[:, ~zero_mask, :]
        return x


class ZeroTokenDropout(nn.Module):
    def __init__(self, p: float = 0.1):
        super().__init__()
        if p < 0 or p > 1:
            raise ValueError(
                "dropout probability has to be between 0 and 1, " "but got {}".format(p)
            )
        self.p = p

    def forward(self, x: torch.Tensor):
        # x: (batch_size, seq_len, dim)
        if self.training and self.p > 0:
            zero_mask = torch.full_like(x[:, :, 0], self.p).bernoulli().bool()
            # Zero-out the masked tokens
            x[zero_mask, :] = 0
        return x


class TransformerEncoder(nn.Module):
    def __init__(
        self,
        num_tokens: int,
        token_dim: int,
        dim: int,
        depth: int,
        heads: int,
        mlp_dim: int,
        dim_head: int = 64,
        dropout: float = 0.0,
        emb_dropout: float = 0.0,
        emb_dropout_type: str = "drop",
        emb_dropout_loc: str = "token",
        norm: str = "layer",
        norm_cond_dim: int = -1,
        token_pe_numfreq: int = -1,
    ):
        super().__init__()
        if token_pe_numfreq > 0:
            token_dim_new = token_dim * (2 * token_pe_numfreq + 1)
            self.to_token_embedding = nn.Sequential(
                Rearrange("b n d -> (b n) d", n=num_tokens, d=token_dim),
                FrequencyEmbedder(token_pe_numfreq, token_pe_numfreq - 1),
                Rearrange("(b n) d -> b n d", n=num_tokens, d=token_dim_new),
                nn.Linear(token_dim_new, dim),
            )
        else:
            self.to_token_embedding = nn.Linear(token_dim, dim)
        self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, dim))
        if emb_dropout_type == "drop":
            self.dropout = DropTokenDropout(emb_dropout)
        elif emb_dropout_type == "zero":
            self.dropout = ZeroTokenDropout(emb_dropout)
        else:
            raise ValueError(f"Unknown emb_dropout_type: {emb_dropout_type}")
        self.emb_dropout_loc = emb_dropout_loc

        self.transformer = Transformer(
            dim, depth, heads, dim_head, mlp_dim, dropout, norm=norm, norm_cond_dim=norm_cond_dim
        )

    def forward(self, inp: torch.Tensor, *args, **kwargs):
        x = inp

        if self.emb_dropout_loc == "input":
            x = self.dropout(x)
        x = self.to_token_embedding(x)

        if self.emb_dropout_loc == "token":
            x = self.dropout(x)
        b, n, _ = x.shape
        x += self.pos_embedding[:, :n]

        if self.emb_dropout_loc == "token_afterpos":
            x = self.dropout(x)
        x = self.transformer(x, *args)
        return x


class TransformerDecoder(nn.Module):
    def __init__(
        self,
        num_tokens: int,
        token_dim: int,
        dim: int,
        depth: int,
        heads: int,
        mlp_dim: int,
        dim_head: int = 64,
        dropout: float = 0.0,
        emb_dropout: float = 0.0,
        emb_dropout_type: str = 'drop',
        norm: str = "layer",
        norm_cond_dim: int = -1,
        context_dim: Optional[int] = None,
        skip_token_embedding: bool = False,
    ):
        super().__init__()
        if not skip_token_embedding:
            self.to_token_embedding = nn.Linear(token_dim, dim)
        else:
            self.to_token_embedding = nn.Identity()
            if token_dim != dim:
                raise ValueError(
                    f"token_dim ({token_dim}) != dim ({dim}) when skip_token_embedding is True"
                )

        self.pos_embedding = nn.Parameter(torch.randn(1, num_tokens, dim))
        if emb_dropout_type == "drop":
            self.dropout = DropTokenDropout(emb_dropout)
        elif emb_dropout_type == "zero":
            self.dropout = ZeroTokenDropout(emb_dropout)
        elif emb_dropout_type == "normal":
            self.dropout = nn.Dropout(emb_dropout)

        self.transformer = TransformerCrossAttn(
            dim,
            depth,
            heads,
            dim_head,
            mlp_dim,
            dropout,
            norm=norm,
            norm_cond_dim=norm_cond_dim,
            context_dim=context_dim,
        )

    def forward(self, inp: torch.Tensor, *args, context=None, context_list=None):
        x = self.to_token_embedding(inp)
        b, n, _ = x.shape

        x = self.dropout(x)
        x += self.pos_embedding[:, :n]

        x = self.transformer(x, *args, context=context, context_list=context_list)
        return x
lib/models/preproc/backbone/smpl_head.py
ADDED
@@ -0,0 +1,128 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import einops

from configs import constants as _C
from lib.utils.transforms import axis_angle_to_matrix
from .pose_transformer import TransformerDecoder

def rot6d_to_rotmat(x: torch.Tensor) -> torch.Tensor:
    """
    Convert 6D rotation representation to 3x3 rotation matrix.
    Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
    Args:
        x (torch.Tensor): (B,6) Batch of 6-D rotation representations.
    Returns:
        torch.Tensor: Batch of corresponding rotation matrices with shape (B,3,3).
    """
    x = x.reshape(-1,2,3).permute(0, 2, 1).contiguous()
    a1 = x[:, :, 0]
    a2 = x[:, :, 1]
    b1 = F.normalize(a1)
    b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
    b3 = torch.cross(b1, b2)
    return torch.stack((b1, b2, b3), dim=-1)

def build_smpl_head(cfg):
    smpl_head_type = 'transformer_decoder'
    if smpl_head_type == 'transformer_decoder':
        return SMPLTransformerDecoderHead(cfg)
    else:
        raise ValueError('Unknown SMPL head type: {}'.format(smpl_head_type))

class SMPLTransformerDecoderHead(nn.Module):
    """ Cross-attention based SMPL Transformer decoder
    """

    def __init__(self):
        super().__init__()
        self.joint_rep_type = '6d'
        self.joint_rep_dim = {'6d': 6, 'aa': 3}[self.joint_rep_type]
        npose = self.joint_rep_dim * 24
        self.npose = npose
        self.input_is_mean_shape = False
        transformer_args = dict(
            num_tokens=1,
            token_dim=(npose + 10 + 3) if self.input_is_mean_shape else 1,
            dim=1024,
        )
        transformer_args_from_cfg = dict(
            depth=6, heads=8, mlp_dim=1024, dim_head=64, dropout=0.0, emb_dropout=0.0, norm='layer', context_dim=1280
        )
        transformer_args = (transformer_args | transformer_args_from_cfg)
        self.transformer = TransformerDecoder(
            **transformer_args
        )
        dim = transformer_args['dim']
        self.decpose = nn.Linear(dim, npose)
        self.decshape = nn.Linear(dim, 10)
        self.deccam = nn.Linear(dim, 3)

        mean_params = np.load(_C.BMODEL.MEAN_PARAMS)
        init_body_pose = torch.from_numpy(mean_params['pose'].astype(np.float32)).unsqueeze(0)
        init_betas = torch.from_numpy(mean_params['shape'].astype('float32')).unsqueeze(0)
        init_cam = torch.from_numpy(mean_params['cam'].astype(np.float32)).unsqueeze(0)
        self.register_buffer('init_body_pose', init_body_pose)
        self.register_buffer('init_betas', init_betas)
        self.register_buffer('init_cam', init_cam)

    def forward(self, x, **kwargs):

        batch_size = x.shape[0]
        # vit pretrained backbone is channel-first. Change to token-first

        init_body_pose = self.init_body_pose.expand(batch_size, -1)
        init_betas = self.init_betas.expand(batch_size, -1)
        init_cam = self.init_cam.expand(batch_size, -1)

        # TODO: Convert init_body_pose to aa rep if needed
        if self.joint_rep_type == 'aa':
            raise NotImplementedError

        pred_body_pose = init_body_pose
        pred_betas = init_betas
        pred_cam = init_cam
        pred_body_pose_list = []
        pred_betas_list = []
        pred_cam_list = []

        # Input token to transformer is zero token
        if len(x.shape) > 2:
            x = einops.rearrange(x, 'b c h w -> b (h w) c')
            if self.input_is_mean_shape:
                token = torch.cat([pred_body_pose, pred_betas, pred_cam], dim=1)[:,None,:]
            else:
                token = torch.zeros(batch_size, 1, 1).to(x.device)

            # Pass through transformer
            token_out = self.transformer(token, context=x)
            token_out = token_out.squeeze(1)  # (B, C)
        else:
            token_out = x

        # Readout from token_out
        pred_body_pose = self.decpose(token_out) + pred_body_pose
        pred_betas = self.decshape(token_out) + pred_betas
        pred_cam = self.deccam(token_out) + pred_cam
        pred_body_pose_list.append(pred_body_pose)
        pred_betas_list.append(pred_betas)
        pred_cam_list.append(pred_cam)

        # Convert self.joint_rep_type -> rotmat
        joint_conversion_fn = {
            '6d': rot6d_to_rotmat,
            'aa': lambda x: axis_angle_to_matrix(x.view(-1, 3).contiguous())
        }[self.joint_rep_type]

        pred_smpl_params_list = {}
        pred_smpl_params_list['body_pose'] = torch.cat([joint_conversion_fn(pbp).view(batch_size, -1, 3, 3)[:, 1:, :, :] for pbp in pred_body_pose_list], dim=0)
        pred_smpl_params_list['betas'] = torch.cat(pred_betas_list, dim=0)
        pred_smpl_params_list['cam'] = torch.cat(pred_cam_list, dim=0)
        pred_body_pose = joint_conversion_fn(pred_body_pose).view(batch_size, 24, 3, 3)

        pred_smpl_params = {'global_orient': pred_body_pose[:, [0]],
                            'body_pose': pred_body_pose[:, 1:],
                            'betas': pred_betas}
        return pred_smpl_params, pred_cam, pred_smpl_params_list
lib/models/preproc/backbone/t_cond_mlp.py
ADDED
@@ -0,0 +1,198 @@
import copy
from typing import List, Optional

import torch


class AdaptiveLayerNorm1D(torch.nn.Module):
    def __init__(self, data_dim: int, norm_cond_dim: int):
        super().__init__()
        if data_dim <= 0:
            raise ValueError(f"data_dim must be positive, but got {data_dim}")
        if norm_cond_dim <= 0:
            raise ValueError(f"norm_cond_dim must be positive, but got {norm_cond_dim}")
        self.norm = torch.nn.LayerNorm(
            data_dim
        )  # TODO: Check if elementwise_affine=True is correct
        self.linear = torch.nn.Linear(norm_cond_dim, 2 * data_dim)
        torch.nn.init.zeros_(self.linear.weight)
        torch.nn.init.zeros_(self.linear.bias)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, ..., data_dim)
        # t: (batch, norm_cond_dim)
        # return: (batch, data_dim)
        x = self.norm(x)
        alpha, beta = self.linear(t).chunk(2, dim=-1)

        # Add singleton dimensions to alpha and beta
        if x.dim() > 2:
            alpha = alpha.view(alpha.shape[0], *([1] * (x.dim() - 2)), alpha.shape[1])
            beta = beta.view(beta.shape[0], *([1] * (x.dim() - 2)), beta.shape[1])

        return x * (1 + alpha) + beta


class SequentialCond(torch.nn.Sequential):
    def forward(self, input, *args, **kwargs):
        for module in self:
            if isinstance(module, (AdaptiveLayerNorm1D, SequentialCond, ResidualMLPBlock)):
                # print(f'Passing on args to {module}', [a.shape for a in args])
                input = module(input, *args, **kwargs)
            else:
                # print(f'Skipping passing args to {module}', [a.shape for a in args])
                input = module(input)
        return input


def normalization_layer(norm: Optional[str], dim: int, norm_cond_dim: int = -1):
    if norm == "batch":
        return torch.nn.BatchNorm1d(dim)
    elif norm == "layer":
        return torch.nn.LayerNorm(dim)
    elif norm == "ada":
        assert norm_cond_dim > 0, f"norm_cond_dim must be positive, got {norm_cond_dim}"
        return AdaptiveLayerNorm1D(dim, norm_cond_dim)
    elif norm is None:
        return torch.nn.Identity()
    else:
        raise ValueError(f"Unknown norm: {norm}")


def linear_norm_activ_dropout(
    input_dim: int,
    output_dim: int,
    activation: torch.nn.Module = torch.nn.ReLU(),
    bias: bool = True,
    norm: Optional[str] = "layer",  # Options: ada/batch/layer
    dropout: float = 0.0,
    norm_cond_dim: int = -1,
) -> SequentialCond:
    layers = []
    layers.append(torch.nn.Linear(input_dim, output_dim, bias=bias))
    if norm is not None:
        layers.append(normalization_layer(norm, output_dim, norm_cond_dim))
    layers.append(copy.deepcopy(activation))
    if dropout > 0.0:
        layers.append(torch.nn.Dropout(dropout))
    return SequentialCond(*layers)


def create_simple_mlp(
    input_dim: int,
    hidden_dims: List[int],
    output_dim: int,
    activation: torch.nn.Module = torch.nn.ReLU(),
    bias: bool = True,
    norm: Optional[str] = "layer",  # Options: ada/batch/layer
    dropout: float = 0.0,
    norm_cond_dim: int = -1,
) -> SequentialCond:
    layers = []
    prev_dim = input_dim
    for hidden_dim in hidden_dims:
        layers.extend(
            linear_norm_activ_dropout(
                prev_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
            )
        )
        prev_dim = hidden_dim
    layers.append(torch.nn.Linear(prev_dim, output_dim, bias=bias))
    return SequentialCond(*layers)


class ResidualMLPBlock(torch.nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int,
        num_hidden_layers: int,
        output_dim: int,
        activation: torch.nn.Module = torch.nn.ReLU(),
        bias: bool = True,
        norm: Optional[str] = "layer",  # Options: ada/batch/layer
        dropout: float = 0.0,
        norm_cond_dim: int = -1,
    ):
        super().__init__()
        if not (input_dim == output_dim == hidden_dim):
            raise NotImplementedError(
                f"input_dim {input_dim} != output_dim {output_dim} is not implemented"
            )

        layers = []
        prev_dim = input_dim
        for i in range(num_hidden_layers):
            layers.append(
                linear_norm_activ_dropout(
                    prev_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
                )
            )
            prev_dim = hidden_dim
        self.model = SequentialCond(*layers)
        self.skip = torch.nn.Identity()

    def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        return x + self.model(x, *args, **kwargs)


class ResidualMLP(torch.nn.Module):
    def __init__(
        self,
        input_dim: int,
        hidden_dim: int,
        num_hidden_layers: int,
        output_dim: int,
        activation: torch.nn.Module = torch.nn.ReLU(),
        bias: bool = True,
        norm: Optional[str] = "layer",  # Options: ada/batch/layer
        dropout: float = 0.0,
        num_blocks: int = 1,
        norm_cond_dim: int = -1,
    ):
        super().__init__()
        self.input_dim = input_dim
        self.model = SequentialCond(
            linear_norm_activ_dropout(
                input_dim, hidden_dim, activation, bias, norm, dropout, norm_cond_dim
            ),
            *[
                ResidualMLPBlock(
                    hidden_dim,
                    hidden_dim,
                    num_hidden_layers,
                    hidden_dim,
                    activation,
                    bias,
                    norm,
                    dropout,
                    norm_cond_dim,
                )
                for _ in range(num_blocks)
            ],
            torch.nn.Linear(hidden_dim, output_dim, bias=bias),
        )

    def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        return self.model(x, *args, **kwargs)


class FrequencyEmbedder(torch.nn.Module):
    def __init__(self, num_frequencies, max_freq_log2):
        super().__init__()
        frequencies = 2 ** torch.linspace(0, max_freq_log2, steps=num_frequencies)
        self.register_buffer("frequencies", frequencies)

    def forward(self, x):
        # x should be of size (N,) or (N, D)
        N = x.size(0)
        if x.dim() == 1:  # (N,)
            x = x.unsqueeze(1)  # (N, D) where D=1
        x_unsqueezed = x.unsqueeze(-1)  # (N, D, 1)
        scaled = self.frequencies.view(1, 1, -1) * x_unsqueezed  # (N, D, num_frequencies)
        s = torch.sin(scaled)
        c = torch.cos(scaled)
        embedded = torch.cat([s, c, x_unsqueezed], dim=-1).view(
            N, -1
        )  # (N, D * 2 * num_frequencies + D)
        return embedded
lib/models/preproc/backbone/utils.py
ADDED
@@ -0,0 +1,115 @@
1 |
+
from __future__ import absolute_import
|
2 |
+
from __future__ import print_function
|
3 |
+
from __future__ import division
|
4 |
+
|
5 |
+
import os
|
6 |
+
import os.path as osp
|
7 |
+
from collections import OrderedDict
|
8 |
+
|
9 |
+
import cv2
|
10 |
+
import numpy as np
|
11 |
+
from skimage.filters import gaussian
|
12 |
+
|
13 |
+
|
14 |
+
def get_transform(center, scale, res, rot=0):
|
15 |
+
"""Generate transformation matrix."""
|
16 |
+
# res: (height, width), (rows, cols)
|
17 |
+
crop_aspect_ratio = res[0] / float(res[1])
|
18 |
+
h = 200 * scale
|
19 |
+
w = h / crop_aspect_ratio
|
20 |
+
t = np.zeros((3, 3))
|
21 |
+
t[0, 0] = float(res[1]) / w
|
22 |
+
t[1, 1] = float(res[0]) / h
|
23 |
+
t[0, 2] = res[1] * (-float(center[0]) / w + .5)
|
24 |
+
t[1, 2] = res[0] * (-float(center[1]) / h + .5)
|
25 |
+
t[2, 2] = 1
|
26 |
+
if not rot == 0:
|
27 |
+
rot = -rot # To match direction of rotation from cropping
|
28 |
+
rot_mat = np.zeros((3, 3))
|
29 |
+
rot_rad = rot * np.pi / 180
|
30 |
+
sn, cs = np.sin(rot_rad), np.cos(rot_rad)
|
31 |
+
rot_mat[0, :2] = [cs, -sn]
|
32 |
+
rot_mat[1, :2] = [sn, cs]
|
33 |
+
rot_mat[2, 2] = 1
|
34 |
+
# Need to rotate around center
|
35 |
+
t_mat = np.eye(3)
|
36 |
+
t_mat[0, 2] = -res[1] / 2
|
37 |
+
t_mat[1, 2] = -res[0] / 2
|
38 |
+
t_inv = t_mat.copy()
|
39 |
+
t_inv[:2, 2] *= -1
|
40 |
+
t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
|
41 |
+
return t
|
42 |
+
|
43 |
+
|
44 |
+
def transform(pt, center, scale, res, invert=0, rot=0):
|
45 |
+
"""Transform pixel location to different reference."""
|
46 |
+
t = get_transform(center, scale, res, rot=rot)
|
47 |
+
if invert:
|
48 |
+
t = np.linalg.inv(t)
|
49 |
+
new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
|
50 |
+
new_pt = np.dot(t, new_pt)
|
51 |
+
return np.array([round(new_pt[0]), round(new_pt[1])], dtype=int) + 1
|
52 |
+
|
53 |
+
|
54 |
+
def crop(img, center, scale, res):
|
55 |
+
"""
|
56 |
+
Crop image according to the supplied bounding box.
|
57 |
+
res: [rows, cols]
|
58 |
+
"""
|
59 |
+
# Upper left point
|
60 |
+
ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
|
61 |
+
# Bottom right point
|
62 |
+
br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
|
63 |
+
|
64 |
+
new_shape = [br[1] - ul[1], br[0] - ul[0]]
|
65 |
+
if len(img.shape) > 2:
|
66 |
+
new_shape += [img.shape[2]]
|
67 |
+
new_img = np.zeros(new_shape, dtype=np.float32)
|
68 |
+
|
69 |
+
# Range to fill new array
|
70 |
+
new_x = max(0, -ul[0]), min(br[0], len(img[0])) - ul[0]
|
71 |
+
new_y = max(0, -ul[1]), min(br[1], len(img)) - ul[1]
|
72 |
+
# Range to sample from original image
|
73 |
+
old_x = max(0, ul[0]), min(len(img[0]), br[0])
|
74 |
+
old_y = max(0, ul[1]), min(len(img), br[1])
|
75 |
+
try:
|
76 |
+
new_img[new_y[0]:new_y[1], new_x[0]:new_x[1]] = img[old_y[0]:old_y[1], old_x[0]:old_x[1]]
|
77 |
+
except Exception as e:
|
78 |
+
print(e)
|
79 |
+
|
80 |
+
new_img = cv2.resize(new_img, (res[1], res[0])) # (cols, rows)
|
81 |
+
|
82 |
+
return new_img, ul, br
|
83 |
+
|
84 |
+
|
85 |
+
|
86 |
+
def process_image(orig_img_rgb, center, scale, crop_height=256, crop_width=192, blur=False, do_crop=True):
|
87 |
+
"""
|
88 |
+
Read image, do preprocessing and possibly crop it according to the bounding box.
|
89 |
+
If there are bounding box annotations, use them to crop the image.
|
90 |
+
If no bounding box is specified but openpose detections are available, use them to get the bounding box.
|
91 |
+
"""
|
92 |
+
|
93 |
+
if blur:
|
94 |
+
# Blur image to avoid aliasing artifacts
|
95 |
+
downsampling_factor = ((scale * 200 * 1.0) / crop_height)
|
96 |
+
downsampling_factor = downsampling_factor / 2.0
|
97 |
+
if downsampling_factor > 1.1:
|
98 |
+
orig_img_rgb = gaussian(orig_img_rgb, sigma=(downsampling_factor-1)/2, channel_axis=2, preserve_range=True)
|
99 |
+
|
100 |
+
IMG_NORM_MEAN = [0.485, 0.456, 0.406]
|
101 |
+
IMG_NORM_STD = [0.229, 0.224, 0.225]
|
102 |
+
|
103 |
+
if do_crop:
|
104 |
+
img, ul, br = crop(orig_img_rgb, center, scale, (crop_height, crop_width))
|
105 |
+
else:
|
106 |
+
img = orig_img_rgb.copy()
|
107 |
+
crop_img = img.copy()
|
108 |
+
|
109 |
+
img = img / 255.
|
110 |
+
mean = np.array(IMG_NORM_MEAN, dtype=np.float32)
|
111 |
+
std = np.array(IMG_NORM_STD, dtype=np.float32)
|
112 |
+
norm_img = (img - mean) / std
|
113 |
+
norm_img = np.transpose(norm_img, (2, 0, 1))
|
114 |
+
|
115 |
+
return norm_img, crop_img
|
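As a rough usage sketch, `process_image` takes an RGB image plus a bbox center and scale (bbox height / 200, matching `crop`) and returns the normalized, channel-first crop expected by the HMR2 backbone together with the raw crop. The path and bbox values below are placeholders.

```python
import cv2
from lib.models.preproc.backbone.utils import process_image  # path assumed from this commit

img_bgr = cv2.imread('frame.jpg')          # placeholder frame
img_rgb = img_bgr[..., ::-1]

center = [640.0, 360.0]                    # bbox centre (cx, cy) in pixels (made up)
scale = 2.0                                # bbox height / 200 (made up)

norm_img, crop_img = process_image(img_rgb, center, scale,
                                   crop_height=256, crop_width=192)
print(norm_img.shape)   # (3, 256, 192), ImageNet-normalised, channel-first
print(crop_img.shape)   # (256, 192, 3), raw resized crop
```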
lib/models/preproc/backbone/vit.py
ADDED
@@ -0,0 +1,348 @@
1 |
+
# Copyright (c) OpenMMLab. All rights reserved.
|
2 |
+
import math
|
3 |
+
|
4 |
+
import torch
|
5 |
+
from functools import partial
|
6 |
+
import torch.nn as nn
|
7 |
+
import torch.nn.functional as F
|
8 |
+
import torch.utils.checkpoint as checkpoint
|
9 |
+
|
10 |
+
from timm.models.layers import drop_path, to_2tuple, trunc_normal_
|
11 |
+
|
12 |
+
def vit():
|
13 |
+
return ViT(
|
14 |
+
img_size=(256, 192),
|
15 |
+
patch_size=16,
|
16 |
+
embed_dim=1280,
|
17 |
+
depth=32,
|
18 |
+
num_heads=16,
|
19 |
+
ratio=1,
|
20 |
+
use_checkpoint=False,
|
21 |
+
mlp_ratio=4,
|
22 |
+
qkv_bias=True,
|
23 |
+
drop_path_rate=0.55,
|
24 |
+
)
|
25 |
+
|
26 |
+
def get_abs_pos(abs_pos, h, w, ori_h, ori_w, has_cls_token=True):
|
27 |
+
"""
|
28 |
+
Calculate absolute positional embeddings. If needed, resize embeddings and remove cls_token
|
29 |
+
dimension for the original embeddings.
|
30 |
+
Args:
|
31 |
+
abs_pos (Tensor): absolute positional embeddings with (1, num_position, C).
|
32 |
+
has_cls_token (bool): If true, has 1 embedding in abs_pos for cls token.
|
33 |
+
h, w (int): size of the target token grid; ori_h, ori_w (int): size of the original token grid.
|
34 |
+
|
35 |
+
Returns:
|
36 |
+
Absolute positional embeddings after processing with shape (1, H, W, C)
|
37 |
+
"""
|
38 |
+
cls_token = None
|
39 |
+
B, L, C = abs_pos.shape
|
40 |
+
if has_cls_token:
|
41 |
+
cls_token = abs_pos[:, 0:1]
|
42 |
+
abs_pos = abs_pos[:, 1:]
|
43 |
+
|
44 |
+
if ori_h != h or ori_w != w:
|
45 |
+
new_abs_pos = F.interpolate(
|
46 |
+
abs_pos.reshape(1, ori_h, ori_w, -1).permute(0, 3, 1, 2),
|
47 |
+
size=(h, w),
|
48 |
+
mode="bicubic",
|
49 |
+
align_corners=False,
|
50 |
+
).permute(0, 2, 3, 1).reshape(B, -1, C)
|
51 |
+
|
52 |
+
else:
|
53 |
+
new_abs_pos = abs_pos
|
54 |
+
|
55 |
+
if cls_token is not None:
|
56 |
+
new_abs_pos = torch.cat([cls_token, new_abs_pos], dim=1)
|
57 |
+
return new_abs_pos
|
58 |
+
|
59 |
+
class DropPath(nn.Module):
|
60 |
+
"""Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
|
61 |
+
"""
|
62 |
+
def __init__(self, drop_prob=None):
|
63 |
+
super(DropPath, self).__init__()
|
64 |
+
self.drop_prob = drop_prob
|
65 |
+
|
66 |
+
def forward(self, x):
|
67 |
+
return drop_path(x, self.drop_prob, self.training)
|
68 |
+
|
69 |
+
def extra_repr(self):
|
70 |
+
return 'p={}'.format(self.drop_prob)
|
71 |
+
|
72 |
+
class Mlp(nn.Module):
|
73 |
+
def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
|
74 |
+
super().__init__()
|
75 |
+
out_features = out_features or in_features
|
76 |
+
hidden_features = hidden_features or in_features
|
77 |
+
self.fc1 = nn.Linear(in_features, hidden_features)
|
78 |
+
self.act = act_layer()
|
79 |
+
self.fc2 = nn.Linear(hidden_features, out_features)
|
80 |
+
self.drop = nn.Dropout(drop)
|
81 |
+
|
82 |
+
def forward(self, x):
|
83 |
+
x = self.fc1(x)
|
84 |
+
x = self.act(x)
|
85 |
+
x = self.fc2(x)
|
86 |
+
x = self.drop(x)
|
87 |
+
return x
|
88 |
+
|
89 |
+
class Attention(nn.Module):
|
90 |
+
def __init__(
|
91 |
+
self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0.,
|
92 |
+
proj_drop=0., attn_head_dim=None,):
|
93 |
+
super().__init__()
|
94 |
+
self.num_heads = num_heads
|
95 |
+
head_dim = dim // num_heads
|
96 |
+
self.dim = dim
|
97 |
+
|
98 |
+
if attn_head_dim is not None:
|
99 |
+
head_dim = attn_head_dim
|
100 |
+
all_head_dim = head_dim * self.num_heads
|
101 |
+
|
102 |
+
self.scale = qk_scale or head_dim ** -0.5
|
103 |
+
|
104 |
+
self.qkv = nn.Linear(dim, all_head_dim * 3, bias=qkv_bias)
|
105 |
+
|
106 |
+
self.attn_drop = nn.Dropout(attn_drop)
|
107 |
+
self.proj = nn.Linear(all_head_dim, dim)
|
108 |
+
self.proj_drop = nn.Dropout(proj_drop)
|
109 |
+
|
110 |
+
def forward(self, x):
|
111 |
+
B, N, C = x.shape
|
112 |
+
qkv = self.qkv(x)
|
113 |
+
qkv = qkv.reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
|
114 |
+
q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple)
|
115 |
+
|
116 |
+
q = q * self.scale
|
117 |
+
attn = (q @ k.transpose(-2, -1))
|
118 |
+
|
119 |
+
attn = attn.softmax(dim=-1)
|
120 |
+
attn = self.attn_drop(attn)
|
121 |
+
|
122 |
+
x = (attn @ v).transpose(1, 2).reshape(B, N, -1)
|
123 |
+
x = self.proj(x)
|
124 |
+
x = self.proj_drop(x)
|
125 |
+
|
126 |
+
return x
|
127 |
+
|
128 |
+
class Block(nn.Module):
|
129 |
+
|
130 |
+
def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None,
|
131 |
+
drop=0., attn_drop=0., drop_path=0., act_layer=nn.GELU,
|
132 |
+
norm_layer=nn.LayerNorm, attn_head_dim=None
|
133 |
+
):
|
134 |
+
super().__init__()
|
135 |
+
|
136 |
+
self.norm1 = norm_layer(dim)
|
137 |
+
self.attn = Attention(
|
138 |
+
dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
|
139 |
+
attn_drop=attn_drop, proj_drop=drop, attn_head_dim=attn_head_dim
|
140 |
+
)
|
141 |
+
|
142 |
+
# NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
|
143 |
+
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
|
144 |
+
self.norm2 = norm_layer(dim)
|
145 |
+
mlp_hidden_dim = int(dim * mlp_ratio)
|
146 |
+
self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
|
147 |
+
|
148 |
+
def forward(self, x):
|
149 |
+
x = x + self.drop_path(self.attn(self.norm1(x)))
|
150 |
+
x = x + self.drop_path(self.mlp(self.norm2(x)))
|
151 |
+
return x
|
152 |
+
|
153 |
+
|
154 |
+
class PatchEmbed(nn.Module):
|
155 |
+
""" Image to Patch Embedding
|
156 |
+
"""
|
157 |
+
def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, ratio=1):
|
158 |
+
super().__init__()
|
159 |
+
img_size = to_2tuple(img_size)
|
160 |
+
patch_size = to_2tuple(patch_size)
|
161 |
+
num_patches = (img_size[1] // patch_size[1]) * (img_size[0] // patch_size[0]) * (ratio ** 2)
|
162 |
+
self.patch_shape = (int(img_size[0] // patch_size[0] * ratio), int(img_size[1] // patch_size[1] * ratio))
|
163 |
+
self.origin_patch_shape = (int(img_size[0] // patch_size[0]), int(img_size[1] // patch_size[1]))
|
164 |
+
self.img_size = img_size
|
165 |
+
self.patch_size = patch_size
|
166 |
+
self.num_patches = num_patches
|
167 |
+
|
168 |
+
self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=(patch_size[0] // ratio), padding=4 + 2 * (ratio//2-1))
|
169 |
+
|
170 |
+
def forward(self, x, **kwargs):
|
171 |
+
B, C, H, W = x.shape
|
172 |
+
x = self.proj(x)
|
173 |
+
Hp, Wp = x.shape[2], x.shape[3]
|
174 |
+
|
175 |
+
x = x.flatten(2).transpose(1, 2)
|
176 |
+
return x, (Hp, Wp)
|
177 |
+
|
178 |
+
|
179 |
+
class HybridEmbed(nn.Module):
|
180 |
+
""" CNN Feature Map Embedding
|
181 |
+
Extract feature map from CNN, flatten, project to embedding dim.
|
182 |
+
"""
|
183 |
+
def __init__(self, backbone, img_size=224, feature_size=None, in_chans=3, embed_dim=768):
|
184 |
+
super().__init__()
|
185 |
+
assert isinstance(backbone, nn.Module)
|
186 |
+
img_size = to_2tuple(img_size)
|
187 |
+
self.img_size = img_size
|
188 |
+
self.backbone = backbone
|
189 |
+
if feature_size is None:
|
190 |
+
with torch.no_grad():
|
191 |
+
training = backbone.training
|
192 |
+
if training:
|
193 |
+
backbone.eval()
|
194 |
+
o = self.backbone(torch.zeros(1, in_chans, img_size[0], img_size[1]))[-1]
|
195 |
+
feature_size = o.shape[-2:]
|
196 |
+
feature_dim = o.shape[1]
|
197 |
+
backbone.train(training)
|
198 |
+
else:
|
199 |
+
feature_size = to_2tuple(feature_size)
|
200 |
+
feature_dim = self.backbone.feature_info.channels()[-1]
|
201 |
+
self.num_patches = feature_size[0] * feature_size[1]
|
202 |
+
self.proj = nn.Linear(feature_dim, embed_dim)
|
203 |
+
|
204 |
+
def forward(self, x):
|
205 |
+
x = self.backbone(x)[-1]
|
206 |
+
x = x.flatten(2).transpose(1, 2)
|
207 |
+
x = self.proj(x)
|
208 |
+
return x
|
209 |
+
|
210 |
+
|
211 |
+
class ViT(nn.Module):
|
212 |
+
|
213 |
+
def __init__(self,
|
214 |
+
img_size=224, patch_size=16, in_chans=3, num_classes=80, embed_dim=768, depth=12,
|
215 |
+
num_heads=12, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
|
216 |
+
drop_path_rate=0., hybrid_backbone=None, norm_layer=None, use_checkpoint=False,
|
217 |
+
frozen_stages=-1, ratio=1, last_norm=True,
|
218 |
+
patch_padding='pad', freeze_attn=False, freeze_ffn=False,
|
219 |
+
):
|
220 |
+
# Protect mutable default arguments
|
221 |
+
super(ViT, self).__init__()
|
222 |
+
norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
|
223 |
+
self.num_classes = num_classes
|
224 |
+
self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models
|
225 |
+
self.frozen_stages = frozen_stages
|
226 |
+
self.use_checkpoint = use_checkpoint
|
227 |
+
self.patch_padding = patch_padding
|
228 |
+
self.freeze_attn = freeze_attn
|
229 |
+
self.freeze_ffn = freeze_ffn
|
230 |
+
self.depth = depth
|
231 |
+
|
232 |
+
if hybrid_backbone is not None:
|
233 |
+
self.patch_embed = HybridEmbed(
|
234 |
+
hybrid_backbone, img_size=img_size, in_chans=in_chans, embed_dim=embed_dim)
|
235 |
+
else:
|
236 |
+
self.patch_embed = PatchEmbed(
|
237 |
+
img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, ratio=ratio)
|
238 |
+
num_patches = self.patch_embed.num_patches
|
239 |
+
|
240 |
+
# since the pretraining model has class token
|
241 |
+
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
|
242 |
+
|
243 |
+
dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule
|
244 |
+
|
245 |
+
self.blocks = nn.ModuleList([
|
246 |
+
Block(
|
247 |
+
dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
|
248 |
+
drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,
|
249 |
+
)
|
250 |
+
for i in range(depth)])
|
251 |
+
|
252 |
+
self.last_norm = norm_layer(embed_dim) if last_norm else nn.Identity()
|
253 |
+
|
254 |
+
if self.pos_embed is not None:
|
255 |
+
trunc_normal_(self.pos_embed, std=.02)
|
256 |
+
|
257 |
+
self._freeze_stages()
|
258 |
+
|
259 |
+
def _freeze_stages(self):
|
260 |
+
"""Freeze parameters."""
|
261 |
+
if self.frozen_stages >= 0:
|
262 |
+
self.patch_embed.eval()
|
263 |
+
for param in self.patch_embed.parameters():
|
264 |
+
param.requires_grad = False
|
265 |
+
|
266 |
+
for i in range(1, self.frozen_stages + 1):
|
267 |
+
m = self.blocks[i]
|
268 |
+
m.eval()
|
269 |
+
for param in m.parameters():
|
270 |
+
param.requires_grad = False
|
271 |
+
|
272 |
+
if self.freeze_attn:
|
273 |
+
for i in range(0, self.depth):
|
274 |
+
m = self.blocks[i]
|
275 |
+
m.attn.eval()
|
276 |
+
m.norm1.eval()
|
277 |
+
for param in m.attn.parameters():
|
278 |
+
param.requires_grad = False
|
279 |
+
for param in m.norm1.parameters():
|
280 |
+
param.requires_grad = False
|
281 |
+
|
282 |
+
if self.freeze_ffn:
|
283 |
+
self.pos_embed.requires_grad = False
|
284 |
+
self.patch_embed.eval()
|
285 |
+
for param in self.patch_embed.parameters():
|
286 |
+
param.requires_grad = False
|
287 |
+
for i in range(0, self.depth):
|
288 |
+
m = self.blocks[i]
|
289 |
+
m.mlp.eval()
|
290 |
+
m.norm2.eval()
|
291 |
+
for param in m.mlp.parameters():
|
292 |
+
param.requires_grad = False
|
293 |
+
for param in m.norm2.parameters():
|
294 |
+
param.requires_grad = False
|
295 |
+
|
296 |
+
def init_weights(self):
|
297 |
+
"""Initialize the weights in backbone.
|
298 |
+
Args:
|
299 |
+
pretrained (str, optional): Path to pre-trained weights.
|
300 |
+
Defaults to None.
|
301 |
+
"""
|
302 |
+
def _init_weights(m):
|
303 |
+
if isinstance(m, nn.Linear):
|
304 |
+
trunc_normal_(m.weight, std=.02)
|
305 |
+
if isinstance(m, nn.Linear) and m.bias is not None:
|
306 |
+
nn.init.constant_(m.bias, 0)
|
307 |
+
elif isinstance(m, nn.LayerNorm):
|
308 |
+
nn.init.constant_(m.bias, 0)
|
309 |
+
nn.init.constant_(m.weight, 1.0)
|
310 |
+
|
311 |
+
self.apply(_init_weights)
|
312 |
+
|
313 |
+
def get_num_layers(self):
|
314 |
+
return len(self.blocks)
|
315 |
+
|
316 |
+
@torch.jit.ignore
|
317 |
+
def no_weight_decay(self):
|
318 |
+
return {'pos_embed', 'cls_token'}
|
319 |
+
|
320 |
+
def forward_features(self, x):
|
321 |
+
B, C, H, W = x.shape
|
322 |
+
x, (Hp, Wp) = self.patch_embed(x)
|
323 |
+
|
324 |
+
if self.pos_embed is not None:
|
325 |
+
# fit for multiple GPU training
|
326 |
+
# since the first element for pos embed (sin-cos manner) is zero, it will cause no difference
|
327 |
+
x = x + self.pos_embed[:, 1:] + self.pos_embed[:, :1]
|
328 |
+
|
329 |
+
for blk in self.blocks:
|
330 |
+
if self.use_checkpoint:
|
331 |
+
x = checkpoint.checkpoint(blk, x)
|
332 |
+
else:
|
333 |
+
x = blk(x)
|
334 |
+
|
335 |
+
x = self.last_norm(x)
|
336 |
+
|
337 |
+
xp = x.permute(0, 2, 1).reshape(B, -1, Hp, Wp).contiguous()
|
338 |
+
|
339 |
+
return xp
|
340 |
+
|
341 |
+
def forward(self, x):
|
342 |
+
x = self.forward_features(x)
|
343 |
+
return x
|
344 |
+
|
345 |
+
def train(self, mode=True):
|
346 |
+
"""Convert the model into training mode."""
|
347 |
+
super().train(mode)
|
348 |
+
self._freeze_stages()
|
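A quick shape check of the ViT-H backbone built by `vit()` (randomly initialized here; the real pipeline loads pretrained ViTPose/HMR2 weights): a 256x192 crop with 16x16 patches yields a 16x12 feature grid with 1280 channels. Note that this instantiates a very large (~600M-parameter) model.

```python
import torch
from lib.models.preproc.backbone.vit import vit  # path assumed from this commit

model = vit().eval()
x = torch.randn(1, 3, 256, 192)
with torch.no_grad():
    feat = model(x)
print(feat.shape)   # torch.Size([1, 1280, 16, 12])
```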
lib/models/preproc/detector.py
ADDED
@@ -0,0 +1,146 @@
1 |
+
from __future__ import annotations
|
2 |
+
|
3 |
+
import os
|
4 |
+
import os.path as osp
|
5 |
+
from collections import defaultdict
|
6 |
+
|
7 |
+
import numpy as np
|
8 |
+
import torch
|
9 |
+
import torch.nn as nn
|
10 |
+
import scipy.signal as signal
|
11 |
+
from progress.bar import Bar
|
12 |
+
|
13 |
+
from ultralytics import YOLO
|
14 |
+
from mmpose.apis import (
|
15 |
+
inference_top_down_pose_model,
|
16 |
+
init_pose_model,
|
17 |
+
get_track_id,
|
18 |
+
vis_pose_result,
|
19 |
+
)
|
20 |
+
|
21 |
+
ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
|
22 |
+
VIT_DIR = osp.join(ROOT_DIR, "third-party/ViTPose")
|
23 |
+
|
24 |
+
VIS_THRESH = 0.3
|
25 |
+
BBOX_CONF = 0.5
|
26 |
+
TRACKING_THR = 0.1
|
27 |
+
MINIMUM_FRAMES = 30
|
28 |
+
MINIMUM_JOINTS = 6
|
29 |
+
|
30 |
+
class DetectionModel(object):
|
31 |
+
def __init__(self, device):
|
32 |
+
|
33 |
+
# ViTPose
|
34 |
+
pose_model_cfg = osp.join(VIT_DIR, 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_huge_coco_256x192.py')
|
35 |
+
pose_model_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'vitpose-h-multi-coco.pth')
|
36 |
+
self.pose_model = init_pose_model(pose_model_cfg, pose_model_ckpt, device=device.lower())
|
37 |
+
|
38 |
+
# YOLO
|
39 |
+
bbox_model_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'yolov8x.pt')
|
40 |
+
self.bbox_model = YOLO(bbox_model_ckpt)
|
41 |
+
|
42 |
+
self.device = device
|
43 |
+
self.initialize_tracking()
|
44 |
+
|
45 |
+
def initialize_tracking(self, ):
|
46 |
+
self.next_id = 0
|
47 |
+
self.frame_id = 0
|
48 |
+
self.pose_results_last = []
|
49 |
+
self.tracking_results = {
|
50 |
+
'id': [],
|
51 |
+
'frame_id': [],
|
52 |
+
'bbox': [],
|
53 |
+
'keypoints': []
|
54 |
+
}
|
55 |
+
|
56 |
+
def xyxy_to_cxcys(self, bbox, s_factor=1.05):
|
57 |
+
cx, cy = bbox[[0, 2]].mean(), bbox[[1, 3]].mean()
|
58 |
+
scale = max(bbox[2] - bbox[0], bbox[3] - bbox[1]) / 200 * s_factor
|
59 |
+
return np.array([[cx, cy, scale]])
|
60 |
+
|
61 |
+
def compute_bboxes_from_keypoints(self, s_factor=1.2):
|
62 |
+
X = self.tracking_results['keypoints'].copy()
|
63 |
+
mask = X[..., -1] > VIS_THRESH
|
64 |
+
|
65 |
+
bbox = np.zeros((len(X), 3))
|
66 |
+
for i, (kp, m) in enumerate(zip(X, mask)):
|
67 |
+
bb = [kp[m, 0].min(), kp[m, 1].min(),
|
68 |
+
kp[m, 0].max(), kp[m, 1].max()]
|
69 |
+
cx, cy = [(bb[2]+bb[0])/2, (bb[3]+bb[1])/2]
|
70 |
+
bb_w = bb[2] - bb[0]
|
71 |
+
bb_h = bb[3] - bb[1]
|
72 |
+
s = np.stack((bb_w, bb_h)).max()
|
73 |
+
bb = np.array((cx, cy, s))
|
74 |
+
bbox[i] = bb
|
75 |
+
|
76 |
+
bbox[:, 2] = bbox[:, 2] * s_factor / 200.0
|
77 |
+
self.tracking_results['bbox'] = bbox
|
78 |
+
|
79 |
+
def track(self, img, fps, length):
|
80 |
+
|
81 |
+
# bbox detection
|
82 |
+
bboxes = self.bbox_model.predict(
|
83 |
+
img, device=self.device, classes=0, conf=BBOX_CONF, save=False, verbose=False
|
84 |
+
)[0].boxes.xyxy.detach().cpu().numpy()
|
85 |
+
bboxes = [{'bbox': bbox} for bbox in bboxes]
|
86 |
+
|
87 |
+
# keypoints detection
|
88 |
+
pose_results, returned_outputs = inference_top_down_pose_model(
|
89 |
+
self.pose_model,
|
90 |
+
img,
|
91 |
+
person_results=bboxes,
|
92 |
+
format='xyxy',
|
93 |
+
return_heatmap=False,
|
94 |
+
outputs=None)
|
95 |
+
|
96 |
+
# person identification
|
97 |
+
pose_results, self.next_id = get_track_id(
|
98 |
+
pose_results,
|
99 |
+
self.pose_results_last,
|
100 |
+
self.next_id,
|
101 |
+
use_oks=False,
|
102 |
+
tracking_thr=TRACKING_THR,
|
103 |
+
use_one_euro=True,
|
104 |
+
fps=fps)
|
105 |
+
|
106 |
+
for pose_result in pose_results:
|
107 |
+
n_valid = (pose_result['keypoints'][:, -1] > VIS_THRESH).sum()
|
108 |
+
if n_valid < MINIMUM_JOINTS: continue
|
109 |
+
|
110 |
+
_id = pose_result['track_id']
|
111 |
+
xyxy = pose_result['bbox']
|
112 |
+
bbox = self.xyxy_to_cxcys(xyxy)
|
113 |
+
|
114 |
+
self.tracking_results['id'].append(_id)
|
115 |
+
self.tracking_results['frame_id'].append(self.frame_id)
|
116 |
+
self.tracking_results['bbox'].append(bbox)
|
117 |
+
self.tracking_results['keypoints'].append(pose_result['keypoints'])
|
118 |
+
|
119 |
+
self.frame_id += 1
|
120 |
+
self.pose_results_last = pose_results
|
121 |
+
|
122 |
+
def process(self, fps):
|
123 |
+
for key in ['id', 'frame_id', 'keypoints']:
|
124 |
+
self.tracking_results[key] = np.array(self.tracking_results[key])
|
125 |
+
self.compute_bboxes_from_keypoints()
|
126 |
+
|
127 |
+
output = defaultdict(lambda: defaultdict(list))
|
128 |
+
ids = np.unique(self.tracking_results['id'])
|
129 |
+
for _id in ids:
|
130 |
+
idxs = np.where(self.tracking_results['id'] == _id)[0]
|
131 |
+
for key, val in self.tracking_results.items():
|
132 |
+
if key == 'id': continue
|
133 |
+
output[_id][key] = val[idxs]
|
134 |
+
|
135 |
+
# Smooth bounding box detection
|
136 |
+
ids = list(output.keys())
|
137 |
+
for _id in ids:
|
138 |
+
if len(output[_id]['bbox']) < MINIMUM_FRAMES:
|
139 |
+
del output[_id]
|
140 |
+
continue
|
141 |
+
|
142 |
+
kernel = int(int(fps/2) / 2) * 2 + 1
|
143 |
+
smoothed_bbox = np.array([signal.medfilt(param, kernel) for param in output[_id]['bbox'].T]).T
|
144 |
+
output[_id]['bbox'] = smoothed_bbox
|
145 |
+
|
146 |
+
return output
|
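The detector is driven frame by frame: `track()` runs YOLOv8 person detection, ViTPose keypoint estimation and track-ID association on one frame, and `process()` collects per-ID results and median-filters the bounding boxes. A hedged sketch of that loop (the video path is a placeholder):

```python
import cv2
from lib.models.preproc.detector import DetectionModel  # path assumed from this commit

detector = DetectionModel(device='cuda')

cap = cv2.VideoCapture('input.mp4')                      # placeholder video
fps = cap.get(cv2.CAP_PROP_FPS)
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    detector.track(frame, fps, length)

tracking_results = detector.process(fps)   # {track_id: {'frame_id', 'bbox', 'keypoints'}}
```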
lib/models/preproc/extractor.py
ADDED
@@ -0,0 +1,112 @@
1 |
+
from __future__ import annotations
|
2 |
+
|
3 |
+
import os
|
4 |
+
import os.path as osp
|
5 |
+
from collections import defaultdict
|
6 |
+
|
7 |
+
import cv2
|
8 |
+
import torch
|
9 |
+
import numpy as np
|
10 |
+
import scipy.signal as signal
|
11 |
+
from progress.bar import Bar
|
12 |
+
from scipy.ndimage import gaussian_filter1d
|
13 |
+
|
14 |
+
from configs import constants as _C
|
15 |
+
from .backbone.hmr2 import hmr2
|
16 |
+
from .backbone.utils import process_image
|
17 |
+
from ...utils.imutils import flip_kp, flip_bbox
|
18 |
+
|
19 |
+
ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
|
20 |
+
|
21 |
+
class FeatureExtractor(object):
|
22 |
+
def __init__(self, device, flip_eval=False, max_batch_size=64):
|
23 |
+
|
24 |
+
self.device = device
|
25 |
+
self.flip_eval = flip_eval
|
26 |
+
self.max_batch_size = max_batch_size
|
27 |
+
|
28 |
+
ckpt = osp.join(ROOT_DIR, 'checkpoints', 'hmr2a.ckpt')
|
29 |
+
self.model = hmr2(ckpt).to(device).eval()
|
30 |
+
|
31 |
+
def run(self, video, tracking_results, patch_h=256, patch_w=256):
|
32 |
+
|
33 |
+
if osp.isfile(video):
|
34 |
+
cap = cv2.VideoCapture(video)
|
35 |
+
is_video = True
|
36 |
+
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
37 |
+
width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
|
38 |
+
else: # Image list
|
39 |
+
cap = video
|
40 |
+
is_video = False
|
41 |
+
length = len(video)
|
42 |
+
height, width = cv2.imread(video[0]).shape[:2]
|
43 |
+
|
44 |
+
frame_id = 0
|
45 |
+
bar = Bar('Feature extraction ...', fill='#', max=length)
|
46 |
+
while True:
|
47 |
+
if is_video:
|
48 |
+
flag, img = cap.read()
|
49 |
+
if not flag:
|
50 |
+
break
|
51 |
+
else:
|
52 |
+
if frame_id >= len(cap):
|
53 |
+
break
|
54 |
+
img = cv2.imread(cap[frame_id])
|
55 |
+
|
56 |
+
for _id, val in tracking_results.items():
|
57 |
+
if not frame_id in val['frame_id']: continue
|
58 |
+
|
59 |
+
frame_id2 = np.where(val['frame_id'] == frame_id)[0][0]
|
60 |
+
bbox = val['bbox'][frame_id2]
|
61 |
+
cx, cy, scale = bbox
|
62 |
+
|
63 |
+
norm_img, crop_img = process_image(img[..., ::-1], [cx, cy], scale, patch_h, patch_w)
|
64 |
+
norm_img = torch.from_numpy(norm_img).unsqueeze(0).to(self.device)
|
65 |
+
feature = self.model(norm_img, encode=True)
|
66 |
+
tracking_results[_id]['features'].append(feature.cpu())
|
67 |
+
|
68 |
+
if frame_id2 == 0: # First frame of this subject
|
69 |
+
tracking_results = self.predict_init(norm_img, tracking_results, _id, flip_eval=False)
|
70 |
+
|
71 |
+
if self.flip_eval:
|
72 |
+
flipped_bbox = flip_bbox(bbox, width, height)
|
73 |
+
tracking_results[_id]['flipped_bbox'].append(flipped_bbox)
|
74 |
+
|
75 |
+
keypoints = val['keypoints'][frame_id2]
|
76 |
+
flipped_keypoints = flip_kp(keypoints, width)
|
77 |
+
tracking_results[_id]['flipped_keypoints'].append(flipped_keypoints)
|
78 |
+
|
79 |
+
flipped_features = self.model(torch.flip(norm_img, (3, )), encode=True)
|
80 |
+
tracking_results[_id]['flipped_features'].append(flipped_features.cpu())
|
81 |
+
|
82 |
+
if frame_id2 == 0:
|
83 |
+
tracking_results = self.predict_init(torch.flip(norm_img, (3, )), tracking_results, _id, flip_eval=True)
|
84 |
+
|
85 |
+
bar.next()
|
86 |
+
frame_id += 1
|
87 |
+
|
88 |
+
return self.process(tracking_results)
|
89 |
+
|
90 |
+
def predict_init(self, norm_img, tracking_results, _id, flip_eval=False):
|
91 |
+
prefix = 'flipped_' if flip_eval else ''
|
92 |
+
|
93 |
+
pred_global_orient, pred_body_pose, pred_betas, _ = self.model(norm_img, encode=False)
|
94 |
+
tracking_results[_id][prefix + 'init_global_orient'] = pred_global_orient.cpu()
|
95 |
+
tracking_results[_id][prefix + 'init_body_pose'] = pred_body_pose.cpu()
|
96 |
+
tracking_results[_id][prefix + 'init_betas'] = pred_betas.cpu()
|
97 |
+
return tracking_results
|
98 |
+
|
99 |
+
def process(self, tracking_results):
|
100 |
+
output = defaultdict(dict)
|
101 |
+
|
102 |
+
for _id, results in tracking_results.items():
|
103 |
+
|
104 |
+
for key, val in results.items():
|
105 |
+
if isinstance(val, list):
|
106 |
+
if isinstance(val[0], torch.Tensor):
|
107 |
+
val = torch.cat(val)
|
108 |
+
elif isinstance(val[0], np.ndarray):
|
109 |
+
val = np.array(val)
|
110 |
+
output[_id][key] = val
|
111 |
+
|
112 |
+
return output
|
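Downstream, the extractor consumes the detector output and appends per-frame HMR2 features (plus flipped variants when `flip_eval=True`) and first-frame SMPL initializations for each subject. A hedged sketch with placeholder paths; `tracking_results` is the dict returned by `DetectionModel.process()`:

```python
from lib.models.preproc.extractor import FeatureExtractor  # path assumed from this commit

# tracking_results comes from DetectionModel.process() as sketched above
extractor = FeatureExtractor(device='cuda', flip_eval=False)
tracking_results = extractor.run('input.mp4', tracking_results)

for _id, res in tracking_results.items():
    # res['features'] is a stacked tensor of per-frame backbone features;
    # res['init_global_orient'] / 'init_body_pose' / 'init_betas' hold the
    # HMR2 predictions for this subject's first visible frame.
    print(_id, res['features'].shape)
```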
lib/models/preproc/slam.py
ADDED
@@ -0,0 +1,70 @@
1 |
+
import cv2
|
2 |
+
import numpy as np
|
3 |
+
import glob
|
4 |
+
import os.path as osp
|
5 |
+
import os
|
6 |
+
import time
|
7 |
+
import torch
|
8 |
+
from pathlib import Path
|
9 |
+
from multiprocessing import Process, Queue
|
10 |
+
|
11 |
+
from dpvo.utils import Timer
|
12 |
+
from dpvo.dpvo import DPVO
|
13 |
+
from dpvo.config import cfg
|
14 |
+
from dpvo.stream import image_stream, video_stream
|
15 |
+
|
16 |
+
ROOT_DIR = osp.abspath(f"{__file__}/../../../../")
|
17 |
+
DPVO_DIR = osp.join(ROOT_DIR, "third-party/DPVO")
|
18 |
+
|
19 |
+
|
20 |
+
class SLAMModel(object):
|
21 |
+
def __init__(self, video, output_pth, width, height, calib=None, stride=1, skip=0, buffer=2048):
|
22 |
+
|
23 |
+
if calib is None or not osp.exists(calib):
|
24 |
+
calib = osp.join(output_pth, 'calib.txt')
|
25 |
+
if not osp.exists(calib):
|
26 |
+
self.estimate_intrinsics(width, height, calib)
|
27 |
+
|
28 |
+
self.dpvo_cfg = osp.join(DPVO_DIR, 'config/default.yaml')
|
29 |
+
self.dpvo_ckpt = osp.join(ROOT_DIR, 'checkpoints', 'dpvo.pth')
|
30 |
+
|
31 |
+
self.buffer = buffer
|
32 |
+
self.times = []
|
33 |
+
self.slam = None
|
34 |
+
self.queue = Queue(maxsize=8)
|
35 |
+
self.reader = Process(target=video_stream, args=(self.queue, video, calib, stride, skip))
|
36 |
+
self.reader.start()
|
37 |
+
|
38 |
+
def estimate_intrinsics(self, width, height, calib):
|
39 |
+
focal_length = (height ** 2 + width ** 2) ** 0.5
|
40 |
+
center_x = width / 2
|
41 |
+
center_y = height / 2
|
42 |
+
|
43 |
+
with open(calib, 'w') as fopen:
|
44 |
+
line = f'{focal_length} {focal_length} {center_x} {center_y}'
|
45 |
+
fopen.write(line)
|
46 |
+
|
47 |
+
def track(self, ):
|
48 |
+
(t, image, intrinsics) = self.queue.get()
|
49 |
+
|
50 |
+
if t < 0: return
|
51 |
+
|
52 |
+
image = torch.from_numpy(image).permute(2,0,1).cuda()
|
53 |
+
intrinsics = torch.from_numpy(intrinsics).cuda()
|
54 |
+
|
55 |
+
if self.slam is None:
|
56 |
+
cfg.merge_from_file(self.dpvo_cfg)
|
57 |
+
cfg.BUFFER_SIZE = self.buffer
|
58 |
+
self.slam = DPVO(cfg, self.dpvo_ckpt, ht=image.shape[1], wd=image.shape[2], viz=False)
|
59 |
+
|
60 |
+
with Timer("SLAM", enabled=False):
|
61 |
+
t = time.time()
|
62 |
+
self.slam(t, image, intrinsics)
|
63 |
+
self.times.append(time.time() - t)
|
64 |
+
|
65 |
+
def process(self, ):
|
66 |
+
for _ in range(12):
|
67 |
+
self.slam.update()
|
68 |
+
|
69 |
+
self.reader.join()
|
70 |
+
return self.slam.terminate()[0]
|
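The SLAM wrapper streams frames through DPVO in a background process: `track()` pops one frame from the reader queue and feeds it to DPVO, and `process()` runs the final global updates and returns the estimated camera trajectory. A hedged sketch (paths, resolution and frame count are placeholders):

```python
from lib.models.preproc.slam import SLAMModel  # path assumed from this commit

slam = SLAMModel(video='input.mp4', output_pth='output/demo/clip',
                 width=1920, height=1080)

num_frames = 300                    # placeholder; use the real frame count
for _ in range(num_frames):
    slam.track()

slam_results = slam.process()       # per-frame camera poses estimated by DPVO
```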
lib/models/smpl.py
ADDED
@@ -0,0 +1,264 @@
1 |
+
from __future__ import absolute_import
|
2 |
+
from __future__ import print_function
|
3 |
+
from __future__ import division
|
4 |
+
|
5 |
+
import os, sys
|
6 |
+
|
7 |
+
import torch
|
8 |
+
import numpy as np
|
9 |
+
from lib.utils import transforms
|
10 |
+
|
11 |
+
from smplx import SMPL as _SMPL
|
12 |
+
from smplx.utils import SMPLOutput as ModelOutput
|
13 |
+
from smplx.lbs import vertices2joints
|
14 |
+
|
15 |
+
from configs import constants as _C
|
16 |
+
|
17 |
+
class SMPL(_SMPL):
|
18 |
+
""" Extension of the official SMPL implementation to support more joints """
|
19 |
+
|
20 |
+
def __init__(self, *args, **kwargs):
|
21 |
+
sys.stdout = open(os.devnull, 'w')
|
22 |
+
super(SMPL, self).__init__(*args, **kwargs)
|
23 |
+
sys.stdout = sys.__stdout__
|
24 |
+
|
25 |
+
J_regressor_wham = np.load(_C.BMODEL.JOINTS_REGRESSOR_WHAM)
|
26 |
+
J_regressor_eval = np.load(_C.BMODEL.JOINTS_REGRESSOR_H36M)
|
27 |
+
self.register_buffer('J_regressor_wham', torch.tensor(
|
28 |
+
J_regressor_wham, dtype=torch.float32))
|
29 |
+
self.register_buffer('J_regressor_eval', torch.tensor(
|
30 |
+
J_regressor_eval, dtype=torch.float32))
|
31 |
+
self.register_buffer('J_regressor_feet', torch.from_numpy(
|
32 |
+
np.load(_C.BMODEL.JOINTS_REGRESSOR_FEET)
|
33 |
+
).float())
|
34 |
+
|
35 |
+
def get_local_pose_from_reduced_global_pose(self, reduced_pose):
|
36 |
+
full_pose = torch.eye(
|
37 |
+
3, device=reduced_pose.device
|
38 |
+
)[(None, ) * 2].repeat(reduced_pose.shape[0], 24, 1, 1)
|
39 |
+
full_pose[:, _C.BMODEL.MAIN_JOINTS] = reduced_pose
|
40 |
+
return full_pose
|
41 |
+
|
42 |
+
def forward(self,
|
43 |
+
pred_rot6d,
|
44 |
+
betas,
|
45 |
+
cam=None,
|
46 |
+
cam_intrinsics=None,
|
47 |
+
bbox=None,
|
48 |
+
res=None,
|
49 |
+
return_full_pose=False,
|
50 |
+
**kwargs):
|
51 |
+
|
52 |
+
rotmat = transforms.rotation_6d_to_matrix(pred_rot6d.reshape(*pred_rot6d.shape[:2], -1, 6)
|
53 |
+
).reshape(-1, 24, 3, 3)
|
54 |
+
|
55 |
+
output = self.get_output(body_pose=rotmat[:, 1:],
|
56 |
+
global_orient=rotmat[:, :1],
|
57 |
+
betas=betas.view(-1, 10),
|
58 |
+
pose2rot=False,
|
59 |
+
return_full_pose=return_full_pose)
|
60 |
+
|
61 |
+
if cam is not None:
|
62 |
+
joints3d = output.joints.reshape(*cam.shape[:2], -1, 3)
|
63 |
+
|
64 |
+
# Weak perspective projection (for InstaVariety)
|
65 |
+
weak_cam = convert_weak_perspective_to_perspective(cam)
|
66 |
+
|
67 |
+
weak_joints2d = weak_perspective_projection(
|
68 |
+
joints3d,
|
69 |
+
rotation=torch.eye(3, device=cam.device).unsqueeze(0).unsqueeze(0).expand(*cam.shape[:2], -1, -1),
|
70 |
+
translation=weak_cam,
|
71 |
+
focal_length=5000.,
|
72 |
+
camera_center=torch.zeros(*cam.shape[:2], 2, device=cam.device)
|
73 |
+
)
|
74 |
+
output.weak_joints2d = weak_joints2d
|
75 |
+
|
76 |
+
# Full perspective projection
|
77 |
+
full_cam = convert_pare_to_full_img_cam(
|
78 |
+
cam,
|
79 |
+
bbox[:, :, 2] * 200.,
|
80 |
+
bbox[:, :, :2],
|
81 |
+
res[:, 0].unsqueeze(-1),
|
82 |
+
res[:, 1].unsqueeze(-1),
|
83 |
+
focal_length=cam_intrinsics[:, :, 0, 0]
|
84 |
+
)
|
85 |
+
|
86 |
+
full_joints2d = full_perspective_projection(
|
87 |
+
joints3d,
|
88 |
+
translation=full_cam,
|
89 |
+
cam_intrinsics=cam_intrinsics,
|
90 |
+
)
|
91 |
+
output.full_joints2d = full_joints2d
|
92 |
+
output.full_cam = full_cam.reshape(-1, 3)
|
93 |
+
|
94 |
+
return output
|
95 |
+
|
96 |
+
def forward_nd(self,
|
97 |
+
pred_rot6d,
|
98 |
+
root,
|
99 |
+
betas,
|
100 |
+
return_full_pose=False):
|
101 |
+
|
102 |
+
rotmat = transforms.rotation_6d_to_matrix(pred_rot6d.reshape(*pred_rot6d.shape[:2], -1, 6)
|
103 |
+
).reshape(-1, 24, 3, 3)
|
104 |
+
|
105 |
+
output = self.get_output(body_pose=rotmat[:, 1:],
|
106 |
+
global_orient=root.reshape(-1, 1, 3, 3),
|
107 |
+
betas=betas.view(-1, 10),
|
108 |
+
pose2rot=False,
|
109 |
+
return_full_pose=return_full_pose)
|
110 |
+
|
111 |
+
return output
|
112 |
+
|
113 |
+
def get_output(self, *args, **kwargs):
|
114 |
+
kwargs['get_skin'] = True
|
115 |
+
smpl_output = super(SMPL, self).forward(*args, **kwargs)
|
116 |
+
joints = vertices2joints(self.J_regressor_wham, smpl_output.vertices)
|
117 |
+
feet = vertices2joints(self.J_regressor_feet, smpl_output.vertices)
|
118 |
+
|
119 |
+
offset = joints[..., [11, 12], :].mean(-2)
|
120 |
+
if 'transl' in kwargs:
|
121 |
+
offset = offset - kwargs['transl']
|
122 |
+
vertices = smpl_output.vertices - offset.unsqueeze(-2)
|
123 |
+
joints = joints - offset.unsqueeze(-2)
|
124 |
+
feet = feet - offset.unsqueeze(-2)
|
125 |
+
|
126 |
+
output = ModelOutput(vertices=vertices,
|
127 |
+
global_orient=smpl_output.global_orient,
|
128 |
+
body_pose=smpl_output.body_pose,
|
129 |
+
joints=joints,
|
130 |
+
betas=smpl_output.betas,
|
131 |
+
full_pose=smpl_output.full_pose)
|
132 |
+
output.feet = feet
|
133 |
+
output.offset = offset
|
134 |
+
return output
|
135 |
+
|
136 |
+
def get_offset(self, *args, **kwargs):
|
137 |
+
kwargs['get_skin'] = True
|
138 |
+
smpl_output = super(SMPL, self).forward(*args, **kwargs)
|
139 |
+
joints = vertices2joints(self.J_regressor_wham, smpl_output.vertices)
|
140 |
+
|
141 |
+
offset = joints[..., [11, 12], :].mean(-2)
|
142 |
+
return offset
|
143 |
+
|
144 |
+
def get_faces(self):
|
145 |
+
return np.array(self.faces)
|
146 |
+
|
147 |
+
|
148 |
+
def convert_weak_perspective_to_perspective(
|
149 |
+
weak_perspective_camera,
|
150 |
+
focal_length=5000.,
|
151 |
+
img_res=224,
|
152 |
+
):
|
153 |
+
|
154 |
+
perspective_camera = torch.stack(
|
155 |
+
[
|
156 |
+
weak_perspective_camera[..., 1],
|
157 |
+
weak_perspective_camera[..., 2],
|
158 |
+
2 * focal_length / (img_res * weak_perspective_camera[..., 0] + 1e-9)
|
159 |
+
],
|
160 |
+
dim=-1
|
161 |
+
)
|
162 |
+
return perspective_camera
|
163 |
+
|
164 |
+
|
165 |
+
def weak_perspective_projection(
|
166 |
+
points,
|
167 |
+
rotation,
|
168 |
+
translation,
|
169 |
+
focal_length,
|
170 |
+
camera_center,
|
171 |
+
img_res=224,
|
172 |
+
normalize_joints2d=True,
|
173 |
+
):
|
174 |
+
"""
|
175 |
+
This function computes the perspective projection of a set of points.
|
176 |
+
Input:
|
177 |
+
points (b, f, N, 3): 3D points
|
178 |
+
rotation (b, f, 3, 3): Camera rotation
|
179 |
+
translation (b, f, 3): Camera translation
|
180 |
+
focal_length (b, f,) or scalar: Focal length
|
181 |
+
camera_center (b, f, 2): Camera center
|
182 |
+
"""
|
183 |
+
|
184 |
+
K = torch.zeros([*points.shape[:2], 3, 3], device=points.device)
|
185 |
+
K[:,:,0,0] = focal_length
|
186 |
+
K[:,:,1,1] = focal_length
|
187 |
+
K[:,:,2,2] = 1.
|
188 |
+
K[:,:,:-1, -1] = camera_center
|
189 |
+
|
190 |
+
# Transform points
|
191 |
+
points = torch.einsum('bfij,bfkj->bfki', rotation, points)
|
192 |
+
points = points + translation.unsqueeze(-2)
|
193 |
+
|
194 |
+
# Apply perspective distortion
|
195 |
+
projected_points = points / points[...,-1].unsqueeze(-1)
|
196 |
+
|
197 |
+
# Apply camera intrinsics
|
198 |
+
projected_points = torch.einsum('bfij,bfkj->bfki', K, projected_points)
|
199 |
+
|
200 |
+
if normalize_joints2d:
|
201 |
+
projected_points = projected_points / (img_res / 2.)
|
202 |
+
|
203 |
+
return projected_points[..., :-1]
|
204 |
+
|
205 |
+
|
206 |
+
def full_perspective_projection(
|
207 |
+
points,
|
208 |
+
cam_intrinsics,
|
209 |
+
rotation=None,
|
210 |
+
translation=None,
|
211 |
+
):
|
212 |
+
|
213 |
+
K = cam_intrinsics
|
214 |
+
|
215 |
+
if rotation is not None:
|
216 |
+
points = (rotation @ points.transpose(-1, -2)).transpose(-1, -2)
|
217 |
+
if translation is not None:
|
218 |
+
points = points + translation.unsqueeze(-2)
|
219 |
+
projected_points = points / points[..., -1].unsqueeze(-1)
|
220 |
+
projected_points = (K @ projected_points.transpose(-1, -2)).transpose(-1, -2)
|
221 |
+
return projected_points[..., :-1]
|
222 |
+
|
223 |
+
|
224 |
+
def convert_pare_to_full_img_cam(
|
225 |
+
pare_cam,
|
226 |
+
bbox_height,
|
227 |
+
bbox_center,
|
228 |
+
img_w,
|
229 |
+
img_h,
|
230 |
+
focal_length,
|
231 |
+
crop_res=224
|
232 |
+
):
|
233 |
+
|
234 |
+
s, tx, ty = pare_cam[..., 0], pare_cam[..., 1], pare_cam[..., 2]
|
235 |
+
res = crop_res
|
236 |
+
r = bbox_height / res
|
237 |
+
tz = 2 * focal_length / (r * res * s)
|
238 |
+
|
239 |
+
cx = 2 * (bbox_center[..., 0] - (img_w / 2.)) / (s * bbox_height)
|
240 |
+
cy = 2 * (bbox_center[..., 1] - (img_h / 2.)) / (s * bbox_height)
|
241 |
+
|
242 |
+
cam_t = torch.stack([tx + cx, ty + cy, tz], dim=-1)
|
243 |
+
return cam_t
|
244 |
+
|
245 |
+
|
246 |
+
def cam_crop2full(crop_cam, center, scale, full_img_shape, focal_length):
|
247 |
+
"""
|
248 |
+
convert the camera parameters from the crop camera to the full camera
|
249 |
+
:param crop_cam: shape=(N, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
|
250 |
+
:param center: shape=(N, 2) bbox coordinates (c_x, c_y)
|
251 |
+
:param scale: shape=(N) square bbox resolution (b / 200)
|
252 |
+
:param full_img_shape: shape=(N, 2) original image height and width
|
253 |
+
:param focal_length: shape=(N,)
|
254 |
+
:return:
|
255 |
+
"""
|
256 |
+
img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
|
257 |
+
cx, cy, b = center[:, 0], center[:, 1], scale * 200
|
258 |
+
w_2, h_2 = img_w / 2., img_h / 2.
|
259 |
+
bs = b * crop_cam[:, 0] + 1e-9
|
260 |
+
tz = 2 * focal_length / bs
|
261 |
+
tx = (2 * (cx - w_2) / bs) + crop_cam[:, 1]
|
262 |
+
ty = (2 * (cy - h_2) / bs) + crop_cam[:, 2]
|
263 |
+
full_cam = torch.stack([tx, ty, tz], dim=-1)
|
264 |
+
return full_cam
|
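To illustrate the two camera helpers at the bottom of this file, here is a small numeric sketch with made-up values: `convert_pare_to_full_img_cam` lifts a weak-perspective (s, tx, ty) camera to a full-image translation, which `full_perspective_projection` then combines with the intrinsics to project 3D joints to pixel coordinates.

```python
import torch
from lib.models.smpl import convert_pare_to_full_img_cam, full_perspective_projection  # path assumed

b, f = 1, 2                                                     # batch, frames
pare_cam = torch.tensor([[[1.0, 0.1, -0.2], [0.9, 0.0, 0.0]]])  # (b, f, 3): s, tx, ty
bbox_height = torch.full((b, f), 220.0)
bbox_center = torch.tensor([[[640.0, 360.0], [642.0, 361.0]]])  # (b, f, 2)
focal = torch.full((b, f), 1160.0)

cam_t = convert_pare_to_full_img_cam(pare_cam, bbox_height, bbox_center,
                                     img_w=torch.full((b, f), 1280.0),
                                     img_h=torch.full((b, f), 720.0),
                                     focal_length=focal)        # (b, f, 3)

K = torch.zeros(b, f, 3, 3)
K[..., 0, 0] = focal
K[..., 1, 1] = focal
K[..., 0, 2] = 640.0
K[..., 1, 2] = 360.0
K[..., 2, 2] = 1.0

joints3d = torch.randn(b, f, 17, 3) * 0.2
joints3d[..., 2] += 3.0                                         # keep points in front of the camera
uv = full_perspective_projection(joints3d, K, translation=cam_t)
print(cam_t.shape, uv.shape)                                    # (1, 2, 3) (1, 2, 17, 2)
```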
lib/models/smplify/__init__.py
ADDED
@@ -0,0 +1 @@
1 |
+
from .smplify import TemporalSMPLify
|
lib/models/smplify/__pycache__/__init__.cpython-39.pyc
ADDED
Binary file (241 Bytes)
|
|
lib/models/smplify/__pycache__/losses.cpython-39.pyc
ADDED
Binary file (2.63 kB)
|
|
lib/models/smplify/__pycache__/smplify.cpython-39.pyc
ADDED
Binary file (2.09 kB)
|
|
lib/models/smplify/losses.py
ADDED
@@ -0,0 +1,87 @@
1 |
+
import torch
|
2 |
+
|
3 |
+
def gmof(x, sigma):
|
4 |
+
"""
|
5 |
+
Geman-McClure error function
|
6 |
+
"""
|
7 |
+
x_squared = x ** 2
|
8 |
+
sigma_squared = sigma ** 2
|
9 |
+
return (sigma_squared * x_squared) / (sigma_squared + x_squared)
|
10 |
+
|
11 |
+
|
12 |
+
def compute_jitter(x):
|
13 |
+
"""
|
14 |
+
Second-order finite difference (jitter/acceleration) of x along the time axis (dim 1)
|
15 |
+
"""
|
16 |
+
return torch.linalg.norm(x[:, 2:] + x[:, :-2] - 2 * x[:, 1:-1], dim=-1)
|
17 |
+
|
18 |
+
|
19 |
+
class SMPLifyLoss(torch.nn.Module):
|
20 |
+
def __init__(self,
|
21 |
+
res,
|
22 |
+
cam_intrinsics,
|
23 |
+
init_pose,
|
24 |
+
device,
|
25 |
+
**kwargs
|
26 |
+
):
|
27 |
+
|
28 |
+
super().__init__()
|
29 |
+
|
30 |
+
self.res = res
|
31 |
+
self.cam_intrinsics = cam_intrinsics
|
32 |
+
self.init_pose = torch.from_numpy(init_pose).float().to(device)
|
33 |
+
|
34 |
+
def forward(self, output, params, input_keypoints, bbox,
|
35 |
+
reprojection_weight=100., regularize_weight=60.0,
|
36 |
+
consistency_weight=10.0, sprior_weight=0.04,
|
37 |
+
smooth_weight=20.0, sigma=100):
|
38 |
+
|
39 |
+
pose, shape, cam = params
|
40 |
+
scale = bbox[..., 2:].unsqueeze(-1) * 200.
|
41 |
+
|
42 |
+
# Loss 1. Data term
|
43 |
+
pred_keypoints = output.full_joints2d[..., :17, :]
|
44 |
+
joints_conf = input_keypoints[..., -1:]
|
45 |
+
reprojection_error = gmof(pred_keypoints - input_keypoints[..., :-1], sigma)
|
46 |
+
reprojection_error = ((reprojection_error * joints_conf) / scale).mean()
|
47 |
+
|
48 |
+
# Loss 2. Regularization term
|
49 |
+
regularize_error = torch.linalg.norm(pose - self.init_pose, dim=-1).mean()
|
50 |
+
|
51 |
+
# Loss 3. Shape prior and consistency error
|
52 |
+
consistency_error = shape.std(dim=1).mean()
|
53 |
+
sprior_error = torch.linalg.norm(shape, dim=-1).mean()
|
54 |
+
shape_error = sprior_weight * sprior_error + consistency_weight * consistency_error
|
55 |
+
|
56 |
+
# Loss 4. Smooth loss
|
57 |
+
pose_diff = compute_jitter(pose).mean()
|
58 |
+
cam_diff = compute_jitter(cam).mean()
|
59 |
+
smooth_error = pose_diff + cam_diff
|
60 |
+
|
61 |
+
# Sum up losses
|
62 |
+
loss = {
|
63 |
+
'reprojection': reprojection_weight * reprojection_error,
|
64 |
+
'regularize': regularize_weight * regularize_error,
|
65 |
+
'shape': shape_error,
|
66 |
+
'smooth': smooth_weight * smooth_error
|
67 |
+
}
|
68 |
+
|
69 |
+
return loss
|
70 |
+
|
71 |
+
def create_closure(self,
|
72 |
+
optimizer,
|
73 |
+
smpl,
|
74 |
+
params,
|
75 |
+
bbox,
|
76 |
+
input_keypoints):
|
77 |
+
|
78 |
+
def closure():
|
79 |
+
optimizer.zero_grad()
|
80 |
+
output = smpl(*params, cam_intrinsics=self.cam_intrinsics, bbox=bbox, res=self.res)
|
81 |
+
|
82 |
+
loss_dict = self.forward(output, params, input_keypoints, bbox)
|
83 |
+
loss = sum(loss_dict.values())
|
84 |
+
loss.backward()
|
85 |
+
return loss
|
86 |
+
|
87 |
+
return closure
|
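A quick numeric illustration of the two helpers at the top of this file (made-up tensors): `gmof` is a robust penalty that saturates large residuals toward `sigma**2`, and `compute_jitter` is the norm of the second-order finite difference along the time axis.

```python
import torch
from lib.models.smplify.losses import gmof, compute_jitter  # path assumed from this commit

residual = torch.tensor([0.0, 10.0, 1000.0])
print(gmof(residual, sigma=100))   # tensor([0.00, 99.01, 9900.99]) -- bounded by sigma**2 = 10000

traj = torch.zeros(1, 5, 3)        # (batch, time, dim): perfectly smooth trajectory
traj[:, 3] += 1.0                  # introduce a kink at t = 3
print(compute_jitter(traj))        # shape (1, 3); non-zero only around the kink
```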
lib/models/smplify/smplify.py
ADDED
@@ -0,0 +1,83 @@
1 |
+
import os
|
2 |
+
import torch
|
3 |
+
from tqdm import tqdm
|
4 |
+
|
5 |
+
from lib.models import build_body_model
|
6 |
+
from .losses import SMPLifyLoss
|
7 |
+
|
8 |
+
class TemporalSMPLify():
|
9 |
+
|
10 |
+
def __init__(self,
|
11 |
+
smpl=None,
|
12 |
+
lr=1e-2,
|
13 |
+
num_iters=5,
|
14 |
+
num_steps=10,
|
15 |
+
img_w=None,
|
16 |
+
img_h=None,
|
17 |
+
device=None
|
18 |
+
):
|
19 |
+
|
20 |
+
self.smpl = smpl
|
21 |
+
self.lr = lr
|
22 |
+
self.num_iters = num_iters
|
23 |
+
self.num_steps = num_steps
|
24 |
+
self.img_w = img_w
|
25 |
+
self.img_h = img_h
|
26 |
+
self.device = device
|
27 |
+
|
28 |
+
def fit(self, init_pred, keypoints, bbox, **kwargs):
|
29 |
+
|
30 |
+
def to_params(param):
|
31 |
+
return torch.from_numpy(param).float().to(self.device).requires_grad_(True)
|
32 |
+
|
33 |
+
pose = init_pred['pose'].detach().cpu().numpy()
|
34 |
+
betas = init_pred['betas'].detach().cpu().numpy()
|
35 |
+
cam = init_pred['cam'].detach().cpu().numpy()
|
36 |
+
keypoints = torch.from_numpy(keypoints).float().unsqueeze(0).to(self.device)
|
37 |
+
|
38 |
+
BN = pose.shape[1]
|
39 |
+
lr = self.lr
|
40 |
+
|
41 |
+
# Stage 1. Optimize translation
|
42 |
+
params = [to_params(pose), to_params(betas), to_params(cam)]
|
43 |
+
optim_params = [params[2]]
|
44 |
+
|
45 |
+
optimizer = torch.optim.LBFGS(
|
46 |
+
optim_params,
|
47 |
+
lr=lr,
|
48 |
+
max_iter=self.num_iters,
|
49 |
+
line_search_fn='strong_wolfe')
|
50 |
+
|
51 |
+
loss_fn = SMPLifyLoss(init_pose=pose, device=self.device, **kwargs)
|
52 |
+
|
53 |
+
closure = loss_fn.create_closure(optimizer,
|
54 |
+
self.smpl,
|
55 |
+
params,
|
56 |
+
bbox,
|
57 |
+
keypoints)
|
58 |
+
|
59 |
+
for j in (j_bar := tqdm(range(self.num_steps), leave=False)):
|
60 |
+
optimizer.zero_grad()
|
61 |
+
loss = optimizer.step(closure)
|
62 |
+
msg = f'Loss: {loss.item():.1f}'
|
63 |
+
j_bar.set_postfix_str(msg)
|
64 |
+
|
65 |
+
|
66 |
+
# Stage 2. Optimize all params
|
67 |
+
optimizer = torch.optim.LBFGS(
|
68 |
+
params,
|
69 |
+
lr=lr * BN,
|
70 |
+
max_iter=self.num_iters,
|
71 |
+
line_search_fn='strong_wolfe')
|
72 |
+
|
73 |
+
for j in (j_bar := tqdm(range(self.num_steps), leave=False)):
|
74 |
+
optimizer.zero_grad()
|
75 |
+
loss = optimizer.step(closure)
|
76 |
+
msg = f'Loss: {loss.item():.1f}'
|
77 |
+
j_bar.set_postfix_str(msg)
|
78 |
+
|
79 |
+
init_pred['pose'] = params[0].detach()
|
80 |
+
init_pred['betas'] = params[1].detach()
|
81 |
+
init_pred['cam'] = params[2].detach()
|
82 |
+
|
83 |
+
return init_pred
|
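A hedged sketch of how this optimizer is wired into the demo pipeline. The variables below (`smpl_model`, `width`, `height`, `init_pred`, `keypoints`, `bbox`, `res`, `K`) are placeholders produced by the WHAM network and the preprocessing stages, not part of this file; `res` and `cam_intrinsics` are forwarded to `SMPLifyLoss` via `**kwargs`.

```python
from lib.models.smplify import TemporalSMPLify  # path assumed from this commit

smplify = TemporalSMPLify(smpl=smpl_model, img_w=width, img_h=height, device='cuda')
refined = smplify.fit(init_pred,          # dict with 'pose', 'betas', 'cam' tensors
                      keypoints,          # numpy (T, 17, 3): 2D keypoints + confidence
                      bbox,               # (1, T, 3) tensor: cx, cy, scale
                      res=res,            # image resolution, as used by SMPL.forward
                      cam_intrinsics=K)   # (1, T, 3, 3) camera intrinsics
# refined['pose'], refined['betas'], refined['cam'] hold the optimised parameters.
```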
lib/models/wham.py
ADDED
@@ -0,0 +1,210 @@
1 |
+
from __future__ import absolute_import
|
2 |
+
from __future__ import print_function
|
3 |
+
from __future__ import division
|
4 |
+
|
5 |
+
import torch
|
6 |
+
from torch import nn
|
7 |
+
import numpy as np
|
8 |
+
|
9 |
+
from configs import constants as _C
|
10 |
+
from lib.models.layers import (MotionEncoder, MotionDecoder, TrajectoryDecoder, TrajectoryRefiner, Integrator,
|
11 |
+
rollout_global_motion, reset_root_velocity, compute_camera_motion)
|
12 |
+
from lib.utils.transforms import axis_angle_to_matrix
|
13 |
+
|
14 |
+
|
15 |
+
class Network(nn.Module):
|
16 |
+
def __init__(self,
|
17 |
+
smpl,
|
18 |
+
pose_dr=0.1,
|
19 |
+
d_embed=512,
|
20 |
+
n_layers=3,
|
21 |
+
d_feat=2048,
|
22 |
+
rnn_type='LSTM',
|
23 |
+
**kwargs
|
24 |
+
):
|
25 |
+
super().__init__()
|
26 |
+
|
27 |
+
n_joints = _C.KEYPOINTS.NUM_JOINTS
|
28 |
+
self.smpl = smpl
|
29 |
+
in_dim = n_joints * 2 + 3
|
30 |
+
d_context = d_embed + n_joints * 3
|
31 |
+
|
32 |
+
self.mask_embedding = nn.Parameter(torch.zeros(1, 1, n_joints, 2))
|
33 |
+
|
34 |
+
# Module 1. Motion Encoder
|
35 |
+
self.motion_encoder = MotionEncoder(in_dim=in_dim,
|
36 |
+
d_embed=d_embed,
|
37 |
+
pose_dr=pose_dr,
|
38 |
+
rnn_type=rnn_type,
|
39 |
+
n_layers=n_layers,
|
40 |
+
n_joints=n_joints)
|
41 |
+
|
42 |
+
self.trajectory_decoder = TrajectoryDecoder(d_embed=d_context,
|
43 |
+
rnn_type=rnn_type,
|
44 |
+
n_layers=n_layers)
|
45 |
+
|
46 |
+
# Module 3. Feature Integrator
|
47 |
+
self.integrator = Integrator(in_channel=d_feat + d_context,
|
48 |
+
out_channel=d_context)
|
49 |
+
|
50 |
+
# Module 4. Motion Decoder
|
51 |
+
self.motion_decoder = MotionDecoder(d_embed=d_context,
|
52 |
+
rnn_type=rnn_type,
|
53 |
+
n_layers=n_layers)
|
54 |
+
|
55 |
+
# Module 5. Trajectory Refiner
|
56 |
+
self.trajectory_refiner = TrajectoryRefiner(d_embed=d_context,
|
57 |
+
d_hidden=d_embed,
|
58 |
+
rnn_type=rnn_type,
|
59 |
+
n_layers=2)
|
60 |
+
|
61 |
+
def compute_global_feet(self, root_world, trans):
|
62 |
+
# # Compute world-coordinate motion
|
63 |
+
cam_R, cam_T = compute_camera_motion(self.output, self.pred_pose[:, :, :6], root_world, trans, self.pred_cam)
|
64 |
+
feet_cam = self.output.feet.reshape(self.b, self.f, -1, 3) + self.output.full_cam.reshape(self.b, self.f, 1, 3)
|
65 |
+
feet_world = (cam_R.mT @ (feet_cam - cam_T.unsqueeze(-2)).mT).mT
|
66 |
+
|
67 |
+
return feet_world, cam_R
|
68 |
+
|
69 |
+
def forward_smpl(self, **kwargs):
|
70 |
+
self.output = self.smpl(self.pred_pose,
|
71 |
+
self.pred_shape,
|
72 |
+
cam=self.pred_cam,
|
73 |
+
return_full_pose=not self.training,
|
74 |
+
**kwargs,
|
75 |
+
)
|
76 |
+
|
77 |
+
from loguru import logger
|
78 |
+
logger.info(f"Output Joints: {self.output.joints}")
|
79 |
+
logger.info(f"Output Vertices: {self.output.vertices}")
|
        # Save joints and vertices as .npy arrays
        np.save('joints.npy', self.output.joints.cpu().numpy())
        np.save('vertices.npy', self.output.vertices.cpu().numpy())

        # Feet location in global coordinate
        root_world, trans = rollout_global_motion(self.pred_root, self.pred_vel)
        feet_world, cam_R = self.compute_global_feet(root_world, trans)

        # Return output
        output = {'feet': feet_world,
                  'contact': self.pred_contact,
                  'pose': self.pred_pose,
                  'betas': self.pred_shape,
                  'cam': self.pred_cam,
                  'poses_root_cam': self.output.global_orient,
                  'poses_root_r6d': self.pred_root,
                  'vel_root': self.pred_vel,
                  'pose_root': self.pred_root,
                  'verts_cam': self.output.vertices}

        if self.training:
            output.update({
                'kp3d': self.output.joints,
                'kp3d_nn': self.pred_kp3d,
                'full_kp2d': self.output.full_joints2d,
                'weak_kp2d': self.output.weak_joints2d,
                'R': cam_R,
            })
        else:
            output.update({
                'poses_root_r6d': self.pred_root,
                'trans_cam': self.output.full_cam,
                'poses_body': self.output.body_pose})

        return output


    def preprocess(self, x, mask):
        self.b, self.f = x.shape[:2]

        # Treat masked keypoints
        mask_embedding = mask.unsqueeze(-1) * self.mask_embedding
        _mask = mask.unsqueeze(-1).repeat(1, 1, 1, 2).reshape(self.b, self.f, -1)
        _mask = torch.cat((_mask, torch.zeros_like(_mask[..., :3])), dim=-1)
        _mask_embedding = mask_embedding.reshape(self.b, self.f, -1)
        _mask_embedding = torch.cat((_mask_embedding, torch.zeros_like(_mask_embedding[..., :3])), dim=-1)
        x[_mask] = 0.0
        x = x + _mask_embedding
        return x


    def rollout(self, output, pred_root, pred_vel, return_y_up):
        root_world, trans_world = rollout_global_motion(pred_root, pred_vel)

        if return_y_up:
            yup2ydown = axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float().to(root_world.device)
            root_world = yup2ydown.mT @ root_world
            trans_world = (yup2ydown.mT @ trans_world.unsqueeze(-1)).squeeze(-1)

        output.update({
            'poses_root_world': root_world,
            'trans_world': trans_world,
        })

        return output


    def refine_trajectory(self, output, cam_angvel, return_y_up, **kwargs):

        # --------- Refine trajectory --------- #
        update_vel = reset_root_velocity(self.smpl, self.output, self.pred_contact, self.pred_root, self.pred_vel, thr=0.5)
        output = self.trajectory_refiner(self.old_motion_context, update_vel, output, cam_angvel, return_y_up=return_y_up)
        # --------- #

        # Do rollout
        output = self.rollout(output, output['poses_root_r6d_refined'], output['vel_root_refined'], return_y_up)

        # --------- Compute refined feet --------- #
        if self.training:
            feet_world, cam_R = self.compute_global_feet(output['poses_root_world'], output['trans_world'])
            output.update({'feet_refined': feet_world})

        return output


    def forward(self, x, inits, img_features=None, mask=None, init_root=None, cam_angvel=None,
                cam_intrinsics=None, bbox=None, res=None, return_y_up=False, refine_traj=True, **kwargs):

        x = self.preprocess(x, mask)
        init_kp, init_smpl = inits

        # --------- Inference --------- #
        # Stage 1. Encode motion
        pred_kp3d, motion_context = self.motion_encoder(x, init_kp)
        self.old_motion_context = motion_context.detach().clone()

        # Stage 2. Decode global trajectory
        pred_root, pred_vel = self.trajectory_decoder(motion_context, init_root, cam_angvel)

        # Stage 3. Integrate features
        if img_features is not None and self.integrator is not None:
            motion_context = self.integrator(motion_context, img_features)

        # Stage 4. Decode SMPL motion
        pred_pose, pred_shape, pred_cam, pred_contact = self.motion_decoder(motion_context, init_smpl)
        # --------- #

        # --------- Register predictions --------- #
        self.pred_kp3d = pred_kp3d
        self.pred_root = pred_root
        self.pred_vel = pred_vel
        self.pred_pose = pred_pose
        self.pred_shape = pred_shape
        self.pred_cam = pred_cam
        self.pred_contact = pred_contact
        # --------- #

        # --------- Build SMPL --------- #
        output = self.forward_smpl(cam_intrinsics=cam_intrinsics, bbox=bbox, res=res)
        # --------- #

        # --------- Refine trajectory --------- #
        if refine_traj:
            output = self.refine_trajectory(output, cam_angvel, return_y_up)
        else:
            output = self.rollout(output, self.pred_root, self.pred_vel, return_y_up)
        # --------- #

        return output
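The `rollout` step above converts the predicted world trajectory from the y-up convention to y-down by rotating 180 degrees about the x axis. A minimal, self-contained sketch of that conversion, assuming `axis_angle_to_matrix` from lib/utils/transforms.py (added later in this commit) and using made-up trajectory values purely for illustration:

import torch
import numpy as np
from lib.utils.transforms import axis_angle_to_matrix

T = 5
root_world = torch.eye(3).repeat(T, 1, 1)      # per-frame root orientations (illustrative)
trans_world = torch.randn(T, 3)                # per-frame root translations (illustrative)

# A 180-degree rotation about x maps y-up world coordinates to y-down.
yup2ydown = axis_angle_to_matrix(torch.tensor([[np.pi, 0, 0]])).float()

root_ydown = yup2ydown.mT @ root_world
trans_ydown = (yup2ydown.mT @ trans_world.unsqueeze(-1)).squeeze(-1)
print(root_ydown.shape, trans_ydown.shape)     # torch.Size([5, 3, 3]) torch.Size([5, 3])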
lib/utils/__pycache__/data_utils.cpython-39.pyc
ADDED
Binary file (3.57 kB)

lib/utils/__pycache__/imutils.cpython-39.pyc
ADDED
Binary file (10.6 kB)

lib/utils/__pycache__/kp_utils.cpython-39.pyc
ADDED
Binary file (9.99 kB)

lib/utils/__pycache__/transforms.cpython-39.pyc
ADDED
Binary file (23.2 kB)
lib/utils/data_utils.py
ADDED
@@ -0,0 +1,113 @@
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division

import torch
import numpy as np

from lib.utils import transforms


def make_collate_fn():
    def collate_fn(items):
        items = list(filter(lambda x: x is not None, items))
        batch = dict()
        try: batch['vid'] = [item['vid'] for item in items]
        except: pass
        try: batch['gender'] = [item['gender'] for item in items]
        except: pass
        for key in items[0].keys():
            try: batch[key] = torch.stack([item[key] for item in items])
            except: pass
        return batch

    return collate_fn


def prepare_keypoints_data(target):
    """Prepare keypoints data."""

    # Prepare 2D keypoints
    target['init_kp2d'] = target['kp2d'][:1]
    target['kp2d'] = target['kp2d'][1:]
    if 'kp3d' in target:
        target['kp3d'] = target['kp3d'][1:]

    return target


def prepare_smpl_data(target):
    if 'pose' in target.keys():
        # Use only the main joints
        pose = target['pose'][:]
        # 6-D rotation representation
        pose6d = transforms.matrix_to_rotation_6d(pose)
        target['pose'] = pose6d[1:]

    if 'betas' in target.keys():
        target['betas'] = target['betas'][1:]

    # Translation and shape parameters
    if 'transl' in target.keys():
        target['cam'] = target['transl'][1:]

    # Initial pose and translation
    target['init_pose'] = transforms.matrix_to_rotation_6d(target['init_pose'])

    return target


def append_target(target, label, key_list, idx1, idx2=None, pad=True):
    for key in key_list:
        if idx2 is None: data = label[key][idx1]
        else: data = label[key][idx1:idx2+1]
        if not pad: data = data[2:]
        target[key] = data

    return target


def map_dmpl_to_smpl(pose):
    """Map the AMASS DMPL pose representation to the SMPL pose representation.

    Args:
        pose - tensor / array with shape (n_frames, 156)

    Return:
        pose - tensor / array with shape (n_frames, 24, 3)
    """

    pose = pose.reshape(pose.shape[0], -1, 3)
    pose[:, 23] = pose[:, 37]  # right hand
    if isinstance(pose, np.ndarray): pose = pose[:, :24].copy()
    else: pose = pose[:, :24].clone()
    return pose


def transform_global_coordinate(pose, T, transl=None):
    """Transform the global coordinate of a dataset with respect to the given matrix.
    Different datasets use different global coordinate systems,
    so we unify all datasets to a canonical coordinate system.

    Args:
        pose - SMPL pose; tensor / array
        T - transformation matrix
        transl - SMPL translation
    """

    return_to_numpy = False
    if isinstance(pose, np.ndarray):
        return_to_numpy = True
        pose = torch.from_numpy(pose).float()
        if transl is not None: transl = torch.from_numpy(transl).float()

    pose = transforms.axis_angle_to_matrix(pose)
    pose[:, 0] = T @ pose[:, 0]
    pose = transforms.matrix_to_axis_angle(pose)
    if transl is not None:
        transl = (T @ transl.T).squeeze().T

    if return_to_numpy:
        pose = pose.detach().numpy()
        if transl is not None: transl = transl.detach().numpy()
    return pose, transl
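A quick usage sketch for `make_collate_fn`: it silently drops `None` items and stacks whatever keys can be stacked, while passing string-valued keys such as `vid` through as lists. The toy dataset below is a made-up stand-in for illustration, not part of this commit:

import torch
from torch.utils.data import DataLoader, Dataset
from lib.utils.data_utils import make_collate_fn

class ToyDataset(Dataset):
    """Minimal stand-in dataset; real datasets in this repo return richer dicts."""
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        if idx == 3:
            return None                      # corrupt sample: filtered out by collate_fn
        return {'vid': f'clip_{idx}',
                'kp2d': torch.randn(81, 17, 3),
                'betas': torch.randn(81, 10)}

loader = DataLoader(ToyDataset(), batch_size=3, collate_fn=make_collate_fn())
batch = next(iter(loader))
print(batch['vid'])            # ['clip_0', 'clip_1', 'clip_2']
print(batch['kp2d'].shape)     # torch.Size([3, 81, 17, 3])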
lib/utils/imutils.py
ADDED
@@ -0,0 +1,363 @@
1 |
+
import cv2
|
2 |
+
import torch
|
3 |
+
import random
|
4 |
+
import numpy as np
|
5 |
+
from . import transforms
|
6 |
+
|
7 |
+
def do_augmentation(scale_factor=0.2, trans_factor=0.1):
|
8 |
+
scale = random.uniform(1.2 - scale_factor, 1.2 + scale_factor)
|
9 |
+
trans_x = random.uniform(-trans_factor, trans_factor)
|
10 |
+
trans_y = random.uniform(-trans_factor, trans_factor)
|
11 |
+
|
12 |
+
return scale, trans_x, trans_y
|
13 |
+
|
14 |
+
def get_transform(center, scale, res, rot=0):
|
15 |
+
"""Generate transformation matrix."""
|
16 |
+
# res: (height, width), (rows, cols)
|
17 |
+
crop_aspect_ratio = res[0] / float(res[1])
|
18 |
+
h = 200 * scale
|
19 |
+
w = h / crop_aspect_ratio
|
20 |
+
t = np.zeros((3, 3))
|
21 |
+
t[0, 0] = float(res[1]) / w
|
22 |
+
t[1, 1] = float(res[0]) / h
|
23 |
+
t[0, 2] = res[1] * (-float(center[0]) / w + .5)
|
24 |
+
t[1, 2] = res[0] * (-float(center[1]) / h + .5)
|
25 |
+
t[2, 2] = 1
|
26 |
+
if not rot == 0:
|
27 |
+
rot = -rot # To match direction of rotation from cropping
|
28 |
+
rot_mat = np.zeros((3, 3))
|
29 |
+
rot_rad = rot * np.pi / 180
|
30 |
+
sn, cs = np.sin(rot_rad), np.cos(rot_rad)
|
31 |
+
rot_mat[0, :2] = [cs, -sn]
|
32 |
+
rot_mat[1, :2] = [sn, cs]
|
33 |
+
rot_mat[2, 2] = 1
|
34 |
+
# Need to rotate around center
|
35 |
+
t_mat = np.eye(3)
|
36 |
+
t_mat[0, 2] = -res[1] / 2
|
37 |
+
t_mat[1, 2] = -res[0] / 2
|
38 |
+
t_inv = t_mat.copy()
|
39 |
+
t_inv[:2, 2] *= -1
|
40 |
+
t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
|
41 |
+
return t
|
42 |
+
|
43 |
+
|
44 |
+
def transform(pt, center, scale, res, invert=0, rot=0):
|
45 |
+
"""Transform pixel location to different reference."""
|
46 |
+
t = get_transform(center, scale, res, rot=rot)
|
47 |
+
if invert:
|
48 |
+
t = np.linalg.inv(t)
|
49 |
+
new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
|
50 |
+
new_pt = np.dot(t, new_pt)
|
51 |
+
return np.array([round(new_pt[0]), round(new_pt[1])], dtype=int) + 1
|
52 |
+
|
53 |
+
|
54 |
+
def crop_cliff(img, center, scale, res):
|
55 |
+
"""
|
56 |
+
Crop image according to the supplied bounding box.
|
57 |
+
res: [rows, cols]
|
58 |
+
"""
|
59 |
+
# Upper left point
|
60 |
+
ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
|
61 |
+
# Bottom right point
|
62 |
+
br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
|
63 |
+
|
64 |
+
# Padding so that when rotated proper amount of context is included
|
65 |
+
pad = int(np.linalg.norm(br - ul) / 2 - float(br[1] - ul[1]) / 2)
|
66 |
+
|
67 |
+
new_shape = [br[1] - ul[1], br[0] - ul[0]]
|
68 |
+
if len(img.shape) > 2:
|
69 |
+
new_shape += [img.shape[2]]
|
70 |
+
new_img = np.zeros(new_shape, dtype=np.float32)
|
71 |
+
|
72 |
+
# Range to fill new array
|
73 |
+
new_x = max(0, -ul[0]), min(br[0], len(img[0])) - ul[0]
|
74 |
+
new_y = max(0, -ul[1]), min(br[1], len(img)) - ul[1]
|
75 |
+
# Range to sample from original image
|
76 |
+
old_x = max(0, ul[0]), min(len(img[0]), br[0])
|
77 |
+
old_y = max(0, ul[1]), min(len(img), br[1])
|
78 |
+
|
79 |
+
try:
|
80 |
+
new_img[new_y[0]:new_y[1], new_x[0]:new_x[1]] = img[old_y[0]:old_y[1], old_x[0]:old_x[1]]
|
81 |
+
except Exception as e:
|
82 |
+
print(e)
|
83 |
+
|
84 |
+
new_img = cv2.resize(new_img, (res[1], res[0])) # (cols, rows)
|
85 |
+
|
86 |
+
return new_img, ul, br
|
87 |
+
|
88 |
+
|
89 |
+
def obtain_bbox(center, scale, res, org_res):
|
90 |
+
# Upper left point
|
91 |
+
ul = np.array(transform([1, 1], center, scale, res, invert=1)) - 1
|
92 |
+
# Bottom right point
|
93 |
+
br = np.array(transform([res[1] + 1, res[0] + 1], center, scale, res, invert=1)) - 1
|
94 |
+
|
95 |
+
# Padding so that when rotated proper amount of context is included
|
96 |
+
pad = int(np.linalg.norm(br - ul) / 2 - float(br[1] - ul[1]) / 2)
|
97 |
+
|
98 |
+
# Range to sample from original image
|
99 |
+
old_x = max(0, ul[0]), min(org_res[0], br[0])
|
100 |
+
old_y = max(0, ul[1]), min(org_res[1], br[1])
|
101 |
+
|
102 |
+
return old_x, old_y
|
103 |
+
|
104 |
+
|
105 |
+
def cam_crop2full(crop_cam, bbox, full_img_shape, focal_length=None):
|
106 |
+
"""
|
107 |
+
convert the camera parameters from the crop camera to the full camera
|
108 |
+
:param crop_cam: shape=(N, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
|
109 |
+
:param center: shape=(N, 2) bbox coordinates (c_x, c_y)
|
110 |
+
:param scale: shape=(N, 1) square bbox resolution (b / 200)
|
111 |
+
:param full_img_shape: shape=(N, 2) original image height and width
|
112 |
+
:param focal_length: shape=(N,)
|
113 |
+
:return:
|
114 |
+
"""
|
115 |
+
|
116 |
+
cx = bbox[..., 0].clone(); cy = bbox[..., 1].clone(); b = bbox[..., 2].clone() * 200
|
117 |
+
img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
|
118 |
+
w_2, h_2 = img_w / 2., img_h / 2.
|
119 |
+
bs = b * crop_cam[:, :, 0] + 1e-9
|
120 |
+
|
121 |
+
if focal_length is None:
|
122 |
+
focal_length = (img_w * img_w + img_h * img_h) ** 0.5
|
123 |
+
|
124 |
+
tz = 2 * focal_length.unsqueeze(-1) / bs
|
125 |
+
tx = (2 * (cx - w_2.unsqueeze(-1)) / bs) + crop_cam[:, :, 1]
|
126 |
+
ty = (2 * (cy - h_2.unsqueeze(-1)) / bs) + crop_cam[:, :, 2]
|
127 |
+
full_cam = torch.stack([tx, ty, tz], dim=-1)
|
128 |
+
return full_cam
|
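`cam_crop2full` above lifts the weak-perspective camera predicted inside the cropped box (s, tx, ty) into a full-image translation, defaulting the focal length to the image diagonal sqrt(w^2 + h^2) when no intrinsics are given. A small self-contained sketch; the bbox values, frame count, and resolution are assumptions for illustration only:

import torch
from lib.utils.imutils import cam_crop2full

crop_cam = torch.tensor([[[0.9, 0.05, -0.10],
                          [0.9, 0.04, -0.12]]])   # (B, T, 3) weak-perspective (s, tx, ty)
bbox = torch.tensor([[[640.0, 360.0, 1.5],
                      [642.0, 358.0, 1.5]]])      # (cx, cy, scale); box size = scale * 200 px
res = torch.tensor([[720.0, 1280.0]])             # (img_h, img_w)

full_cam = cam_crop2full(crop_cam, bbox, res)     # focal length defaults to the image diagonal
print(full_cam.shape)                             # torch.Size([1, 2, 3]) -> (tx, ty, tz) in the camera frame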
129 |
+
|
130 |
+
|
131 |
+
def cam_pred2full(crop_cam, center, scale, full_img_shape, focal_length=2000.,):
|
132 |
+
"""
|
133 |
+
Reference CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation
|
134 |
+
|
135 |
+
convert the camera parameters from the crop camera to the full camera
|
136 |
+
:param crop_cam: shape=(N, 3) weak perspective camera in cropped img coordinates (s, tx, ty)
|
137 |
+
:param center: shape=(N, 2) bbox coordinates (c_x, c_y)
|
138 |
+
:param scale: shape=(N, ) square bbox resolution (b / 200)
|
139 |
+
:param full_img_shape: shape=(N, 2) original image height and width
|
140 |
+
:param focal_length: shape=(N,)
|
141 |
+
:return:
|
142 |
+
"""
|
143 |
+
|
144 |
+
# img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
|
145 |
+
img_w, img_h = full_img_shape[:, 0], full_img_shape[:, 1]
|
146 |
+
cx, cy, b = center[:, 0], center[:, 1], scale * 200
|
147 |
+
w_2, h_2 = img_w / 2., img_h / 2.
|
148 |
+
bs = b * crop_cam[:, 0] + 1e-9
|
149 |
+
tz = 2 * focal_length / bs
|
150 |
+
tx = (2 * (cx - w_2) / bs) + crop_cam[:, 1]
|
151 |
+
ty = (2 * (cy - h_2) / bs) + crop_cam[:, 2]
|
152 |
+
full_cam = torch.stack([tx, ty, tz], dim=-1)
|
153 |
+
return full_cam
|
154 |
+
|
155 |
+
|
156 |
+
def cam_full2pred(full_cam, center, scale, full_img_shape, focal_length=2000.):
|
157 |
+
# img_h, img_w = full_img_shape[:, 0], full_img_shape[:, 1]
|
158 |
+
img_w, img_h = full_img_shape[:, 0], full_img_shape[:, 1]
|
159 |
+
cx, cy, b = center[:, 0], center[:, 1], scale * 200
|
160 |
+
w_2, h_2 = img_w / 2., img_h / 2.
|
161 |
+
|
162 |
+
bs = (2 * focal_length / full_cam[:, 2])
|
163 |
+
_s = bs / b
|
164 |
+
_tx = full_cam[:, 0] - (2 * (cx - w_2) / bs)
|
165 |
+
_ty = full_cam[:, 1] - (2 * (cy - h_2) / bs)
|
166 |
+
crop_cam = torch.stack([_s, _tx, _ty], dim=-1)
|
167 |
+
return crop_cam
|
168 |
+
|
169 |
+
|
170 |
+
def obtain_camera_intrinsics(image_shape, focal_length):
|
171 |
+
res_w = image_shape[..., 0].clone()
|
172 |
+
res_h = image_shape[..., 1].clone()
|
173 |
+
K = torch.eye(3).unsqueeze(0).expand(focal_length.shape[0], -1, -1).to(focal_length.device)
|
174 |
+
K[..., 0, 0] = focal_length.clone()
|
175 |
+
K[..., 1, 1] = focal_length.clone()
|
176 |
+
K[..., 0, 2] = res_w / 2
|
177 |
+
K[..., 1, 2] = res_h / 2
|
178 |
+
|
179 |
+
return K.unsqueeze(1)
|
180 |
+
|
181 |
+
|
182 |
+
def trans_point2d(pt_2d, trans):
|
183 |
+
src_pt = np.array([pt_2d[0], pt_2d[1], 1.]).T
|
184 |
+
dst_pt = np.dot(trans, src_pt)
|
185 |
+
return dst_pt[0:2]
|
186 |
+
|
187 |
+
def rotate_2d(pt_2d, rot_rad):
|
188 |
+
x = pt_2d[0]
|
189 |
+
y = pt_2d[1]
|
190 |
+
sn, cs = np.sin(rot_rad), np.cos(rot_rad)
|
191 |
+
xx = x * cs - y * sn
|
192 |
+
yy = x * sn + y * cs
|
193 |
+
return np.array([xx, yy], dtype=np.float32)
|
194 |
+
|
195 |
+
def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False):
|
196 |
+
# augment size with scale
|
197 |
+
src_w = src_width * scale
|
198 |
+
src_h = src_height * scale
|
199 |
+
src_center = np.zeros(2)
|
200 |
+
src_center[0] = c_x
|
201 |
+
src_center[1] = c_y # np.array([c_x, c_y], dtype=np.float32)
|
202 |
+
# augment rotation
|
203 |
+
rot_rad = np.pi * rot / 180
|
204 |
+
src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
|
205 |
+
src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)
|
206 |
+
|
207 |
+
dst_w = dst_width
|
208 |
+
dst_h = dst_height
|
209 |
+
dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
|
210 |
+
dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
|
211 |
+
dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)
|
212 |
+
|
213 |
+
src = np.zeros((3, 2), dtype=np.float32)
|
214 |
+
src[0, :] = src_center
|
215 |
+
src[1, :] = src_center + src_downdir
|
216 |
+
src[2, :] = src_center + src_rightdir
|
217 |
+
|
218 |
+
dst = np.zeros((3, 2), dtype=np.float32)
|
219 |
+
dst[0, :] = dst_center
|
220 |
+
dst[1, :] = dst_center + dst_downdir
|
221 |
+
dst[2, :] = dst_center + dst_rightdir
|
222 |
+
|
223 |
+
if inv:
|
224 |
+
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
|
225 |
+
else:
|
226 |
+
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
|
227 |
+
|
228 |
+
return trans
|
229 |
+
|
230 |
+
def transform_keypoints(kp_2d, bbox, patch_width, patch_height):
|
231 |
+
|
232 |
+
center_x, center_y, scale = bbox[:3]
|
233 |
+
width = height = scale * 200
|
234 |
+
# scale, rot = 1.2, 0
|
235 |
+
scale, rot = 1.0, 0
|
236 |
+
|
237 |
+
# generate transformation
|
238 |
+
trans = gen_trans_from_patch_cv(
|
239 |
+
center_x,
|
240 |
+
center_y,
|
241 |
+
width,
|
242 |
+
height,
|
243 |
+
patch_width,
|
244 |
+
patch_height,
|
245 |
+
scale,
|
246 |
+
rot,
|
247 |
+
inv=False,
|
248 |
+
)
|
249 |
+
|
250 |
+
for n_jt in range(kp_2d.shape[0]):
|
251 |
+
kp_2d[n_jt] = trans_point2d(kp_2d[n_jt], trans)
|
252 |
+
|
253 |
+
return kp_2d, trans
|
254 |
+
|
255 |
+
|
256 |
+
def transform(pt, center, scale, res, invert=0, rot=0):
|
257 |
+
"""Transform pixel location to different reference."""
|
258 |
+
t = get_transform(center, scale, res, rot=rot)
|
259 |
+
if invert:
|
260 |
+
t = np.linalg.inv(t)
|
261 |
+
new_pt = np.array([pt[0] - 1, pt[1] - 1, 1.]).T
|
262 |
+
new_pt = np.dot(t, new_pt)
|
263 |
+
return new_pt[:2].astype(int) + 1
|
264 |
+
|
265 |
+
|
266 |
+
def compute_cam_intrinsics(res):
|
267 |
+
img_w, img_h = res
|
268 |
+
focal_length = (img_w * img_w + img_h * img_h) ** 0.5
|
269 |
+
cam_intrinsics = torch.eye(3).repeat(1, 1, 1).float()
|
270 |
+
cam_intrinsics[:, 0, 0] = focal_length
|
271 |
+
cam_intrinsics[:, 1, 1] = focal_length
|
272 |
+
cam_intrinsics[:, 0, 2] = img_w/2.
|
273 |
+
cam_intrinsics[:, 1, 2] = img_h/2.
|
274 |
+
return cam_intrinsics
|
275 |
+
|
276 |
+
|
277 |
+
def flip_kp(kp, img_w=None):
|
278 |
+
"""Flip keypoints."""
|
279 |
+
|
280 |
+
flipped_parts = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
|
281 |
+
kp = kp[..., flipped_parts, :]
|
282 |
+
|
283 |
+
if img_w is not None:
|
284 |
+
# Assume 2D keypoints
|
285 |
+
kp[...,0] = img_w - kp[...,0]
|
286 |
+
return kp
|
287 |
+
|
288 |
+
|
289 |
+
def flip_bbox(bbox, img_w, img_h):
|
290 |
+
center = bbox[..., :2]
|
291 |
+
scale = bbox[..., -1:]
|
292 |
+
|
293 |
+
WH = np.ones_like(center)
|
294 |
+
WH[..., 0] *= img_w
|
295 |
+
WH[..., 1] *= img_h
|
296 |
+
|
297 |
+
center = center - WH/2
|
298 |
+
center[...,0] = - center[...,0]
|
299 |
+
center = center + WH/2
|
300 |
+
|
301 |
+
flipped_bbox = np.concatenate((center, scale), axis=-1)
|
302 |
+
return flipped_bbox
|
303 |
+
|
304 |
+
|
305 |
+
def flip_pose(rotation, representation='rotation_6d'):
|
306 |
+
"""Flip pose.
|
307 |
+
The flipping is based on SMPL parameters.
|
308 |
+
"""
|
309 |
+
|
310 |
+
BN = rotation.shape[0]
|
311 |
+
|
312 |
+
if representation == 'axis_angle':
|
313 |
+
pose = rotation.reshape(BN, -1).transpose(0, 1)
|
314 |
+
elif representation == 'matrix':
|
315 |
+
pose = transforms.matrix_to_axis_angle(rotation).reshape(BN, -1).transpose(0, 1)
|
316 |
+
elif representation == 'rotation_6d':
|
317 |
+
pose = transforms.matrix_to_axis_angle(
|
318 |
+
transforms.rotation_6d_to_matrix(rotation)
|
319 |
+
).reshape(BN, -1).transpose(0, 1)
|
320 |
+
else:
|
321 |
+
raise ValueError(f"Unknown representation: {representation}")
|
322 |
+
|
323 |
+
SMPL_JOINTS_FLIP_PERM = [0, 2, 1, 3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 14, 13, 15, 17, 16, 19, 18, 21, 20, 23, 22]
|
324 |
+
SMPL_POSE_FLIP_PERM = []
|
325 |
+
for i in SMPL_JOINTS_FLIP_PERM:
|
326 |
+
SMPL_POSE_FLIP_PERM.append(3*i)
|
327 |
+
SMPL_POSE_FLIP_PERM.append(3*i+1)
|
328 |
+
SMPL_POSE_FLIP_PERM.append(3*i+2)
|
329 |
+
|
330 |
+
pose = pose[SMPL_POSE_FLIP_PERM]
|
331 |
+
|
332 |
+
# we also negate the second and the third dimension of the axis-angle
|
333 |
+
pose[1::3] = -pose[1::3]
|
334 |
+
pose[2::3] = -pose[2::3]
|
335 |
+
pose = pose.transpose(0, 1).reshape(BN, -1, 3)
|
336 |
+
|
337 |
+
if representation == 'aa':
|
338 |
+
return pose
|
339 |
+
elif representation == 'rotmat':
|
340 |
+
return transforms.axis_angle_to_matrix(pose)
|
341 |
+
else:
|
342 |
+
return transforms.matrix_to_rotation_6d(
|
343 |
+
transforms.axis_angle_to_matrix(pose)
|
344 |
+
)
|
345 |
+
|
346 |
+
def avg_preds(rotation, shape, flipped_rotation, flipped_shape, representation='rotation_6d'):
|
347 |
+
# Rotation
|
348 |
+
flipped_rotation = flip_pose(flipped_rotation, representation=representation)
|
349 |
+
|
350 |
+
if representation != 'matrix':
|
351 |
+
flipped_rotation = eval(f'transforms.{representation}_to_matrix')(flipped_rotation)
|
352 |
+
rotation = eval(f'transforms.{representation}_to_matrix')(rotation)
|
353 |
+
|
354 |
+
avg_rotation = torch.stack([rotation, flipped_rotation])
|
355 |
+
avg_rotation = transforms.avg_rot(avg_rotation)
|
356 |
+
|
357 |
+
if representation != 'matrix':
|
358 |
+
avg_rotation = eval(f'transforms.matrix_to_{representation}')(avg_rotation)
|
359 |
+
|
360 |
+
# Shape
|
361 |
+
avg_shape = (shape + flipped_shape) / 2.0
|
362 |
+
|
363 |
+
return avg_rotation, avg_shape
|
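`flip_pose` and `avg_preds` implement flip-based test-time augmentation: the prediction made on a horizontally mirrored image is un-mirrored (left/right joint permutation plus negation of the y and z axis-angle components) and then averaged with the original via `transforms.avg_rot`. A usage sketch, assuming the repo is on the import path and using random tensors purely to show the expected shapes:

import torch
from lib.utils.imutils import avg_preds

N = 16                                     # e.g. frames in a window (illustrative)
rot = torch.randn(N, 24, 6)                # SMPL pose in 6D rotation, original image
rot_flip = torch.randn(N, 24, 6)           # prediction on the horizontally flipped image
betas = torch.randn(N, 10)
betas_flip = torch.randn(N, 10)

avg_rot, avg_betas = avg_preds(rot, betas, rot_flip, betas_flip, representation='rotation_6d')
print(avg_rot.shape, avg_betas.shape)      # torch.Size([16, 24, 6]) torch.Size([16, 10])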
lib/utils/kp_utils.py
ADDED
@@ -0,0 +1,761 @@
1 |
+
from __future__ import absolute_import
|
2 |
+
from __future__ import print_function
|
3 |
+
from __future__ import division
|
4 |
+
|
5 |
+
import torch
|
6 |
+
import numpy as np
|
7 |
+
|
8 |
+
from configs import constants as _C
|
9 |
+
|
10 |
+
def root_centering(X, joint_type='coco'):
|
11 |
+
"""Center the root joint to the pelvis."""
|
12 |
+
if joint_type != 'common' and X.shape[-2] == 14: return X
|
13 |
+
|
14 |
+
conf = None
|
15 |
+
if X.shape[-1] == 4:
|
16 |
+
conf = X[..., -1:]
|
17 |
+
X = X[..., :-1]
|
18 |
+
|
19 |
+
if X.shape[-2] == 31:
|
20 |
+
X[..., :17, :] = X[..., :17, :] - X[..., [12, 11], :].mean(-2, keepdims=True)
|
21 |
+
X[..., 17:, :] = X[..., 17:, :] - X[..., [19, 20], :].mean(-2, keepdims=True)
|
22 |
+
|
23 |
+
elif joint_type == 'coco':
|
24 |
+
X = X - X[..., [12, 11], :].mean(-2, keepdims=True)
|
25 |
+
|
26 |
+
elif joint_type == 'common':
|
27 |
+
X = X - X[..., [2, 3], :].mean(-2, keepdims=True)
|
28 |
+
|
29 |
+
if conf is not None:
|
30 |
+
X = torch.cat((X, conf), dim=-1)
|
31 |
+
|
32 |
+
return X
|
33 |
+
|
34 |
+
|
35 |
+
def convert_kps(joints2d, src, dst):
|
36 |
+
src_names = eval(f'get_{src}_joint_names')()
|
37 |
+
dst_names = eval(f'get_{dst}_joint_names')()
|
38 |
+
|
39 |
+
if isinstance(joints2d, np.ndarray):
|
40 |
+
out_joints2d = np.zeros((*joints2d.shape[:-2], len(dst_names), joints2d.shape[-1]))
|
41 |
+
else:
|
42 |
+
out_joints2d = torch.zeros((*joints2d.shape[:-2], len(dst_names), joints2d.shape[-1]), device=joints2d.device)
|
43 |
+
|
44 |
+
for idx, jn in enumerate(dst_names):
|
45 |
+
if jn in src_names:
|
46 |
+
out_joints2d[..., idx, :] = joints2d[..., src_names.index(jn), :]
|
47 |
+
|
48 |
+
return out_joints2d
|
49 |
+
|
50 |
+
def get_perm_idxs(src, dst):
|
51 |
+
src_names = eval(f'get_{src}_joint_names')()
|
52 |
+
dst_names = eval(f'get_{dst}_joint_names')()
|
53 |
+
idxs = [src_names.index(h) for h in dst_names if h in src_names]
|
54 |
+
return idxs
|
55 |
+
|
56 |
+
def get_mpii3d_test_joint_names():
|
57 |
+
return [
|
58 |
+
'headtop', # 'head_top',
|
59 |
+
'neck',
|
60 |
+
'rshoulder',# 'right_shoulder',
|
61 |
+
'relbow',# 'right_elbow',
|
62 |
+
'rwrist',# 'right_wrist',
|
63 |
+
'lshoulder',# 'left_shoulder',
|
64 |
+
'lelbow', # 'left_elbow',
|
65 |
+
'lwrist', # 'left_wrist',
|
66 |
+
'rhip', # 'right_hip',
|
67 |
+
'rknee', # 'right_knee',
|
68 |
+
'rankle',# 'right_ankle',
|
69 |
+
'lhip',# 'left_hip',
|
70 |
+
'lknee',# 'left_knee',
|
71 |
+
'lankle',# 'left_ankle'
|
72 |
+
'hip',# 'pelvis',
|
73 |
+
'Spine (H36M)',# 'spine',
|
74 |
+
'Head (H36M)',# 'head'
|
75 |
+
]
|
76 |
+
|
77 |
+
def get_mpii3d_joint_names():
|
78 |
+
return [
|
79 |
+
'spine3', # 0,
|
80 |
+
'spine4', # 1,
|
81 |
+
'spine2', # 2,
|
82 |
+
'Spine (H36M)', #'spine', # 3,
|
83 |
+
'hip', # 'pelvis', # 4,
|
84 |
+
'neck', # 5,
|
85 |
+
'Head (H36M)', # 'head', # 6,
|
86 |
+
"headtop", # 'head_top', # 7,
|
87 |
+
'left_clavicle', # 8,
|
88 |
+
"lshoulder", # 'left_shoulder', # 9,
|
89 |
+
"lelbow", # 'left_elbow',# 10,
|
90 |
+
"lwrist", # 'left_wrist',# 11,
|
91 |
+
'left_hand',# 12,
|
92 |
+
'right_clavicle',# 13,
|
93 |
+
'rshoulder',# 'right_shoulder',# 14,
|
94 |
+
'relbow',# 'right_elbow',# 15,
|
95 |
+
'rwrist',# 'right_wrist',# 16,
|
96 |
+
'right_hand',# 17,
|
97 |
+
'lhip', # left_hip',# 18,
|
98 |
+
'lknee', # 'left_knee',# 19,
|
99 |
+
'lankle', #left ankle # 20
|
100 |
+
'left_foot', # 21
|
101 |
+
'left_toe', # 22
|
102 |
+
"rhip", # 'right_hip',# 23
|
103 |
+
"rknee", # 'right_knee',# 24
|
104 |
+
"rankle", #'right_ankle', # 25
|
105 |
+
'right_foot',# 26
|
106 |
+
'right_toe' # 27
|
107 |
+
]
|
108 |
+
|
109 |
+
def get_insta_joint_names():
|
110 |
+
return [
|
111 |
+
'OP RHeel',
|
112 |
+
'OP RKnee',
|
113 |
+
'OP RHip',
|
114 |
+
'OP LHip',
|
115 |
+
'OP LKnee',
|
116 |
+
'OP LHeel',
|
117 |
+
'OP RWrist',
|
118 |
+
'OP RElbow',
|
119 |
+
'OP RShoulder',
|
120 |
+
'OP LShoulder',
|
121 |
+
'OP LElbow',
|
122 |
+
'OP LWrist',
|
123 |
+
'OP Neck',
|
124 |
+
'headtop',
|
125 |
+
'OP Nose',
|
126 |
+
'OP LEye',
|
127 |
+
'OP REye',
|
128 |
+
'OP LEar',
|
129 |
+
'OP REar',
|
130 |
+
'OP LBigToe',
|
131 |
+
'OP RBigToe',
|
132 |
+
'OP LSmallToe',
|
133 |
+
'OP RSmallToe',
|
134 |
+
'OP LAnkle',
|
135 |
+
'OP RAnkle',
|
136 |
+
]
|
137 |
+
|
138 |
+
def get_insta_skeleton():
|
139 |
+
return np.array(
|
140 |
+
[
|
141 |
+
[0 , 1],
|
142 |
+
[1 , 2],
|
143 |
+
[2 , 3],
|
144 |
+
[3 , 4],
|
145 |
+
[4 , 5],
|
146 |
+
[6 , 7],
|
147 |
+
[7 , 8],
|
148 |
+
[8 , 9],
|
149 |
+
[9 ,10],
|
150 |
+
[2 , 8],
|
151 |
+
[3 , 9],
|
152 |
+
[10,11],
|
153 |
+
[8 ,12],
|
154 |
+
[9 ,12],
|
155 |
+
[12,13],
|
156 |
+
[12,14],
|
157 |
+
[14,15],
|
158 |
+
[14,16],
|
159 |
+
[15,17],
|
160 |
+
[16,18],
|
161 |
+
[0 ,20],
|
162 |
+
[20,22],
|
163 |
+
[5 ,19],
|
164 |
+
[19,21],
|
165 |
+
[5 ,23],
|
166 |
+
[0 ,24],
|
167 |
+
])
|
168 |
+
|
169 |
+
def get_staf_skeleton():
|
170 |
+
return np.array(
|
171 |
+
[
|
172 |
+
[0, 1],
|
173 |
+
[1, 2],
|
174 |
+
[2, 3],
|
175 |
+
[3, 4],
|
176 |
+
[1, 5],
|
177 |
+
[5, 6],
|
178 |
+
[6, 7],
|
179 |
+
[1, 8],
|
180 |
+
[8, 9],
|
181 |
+
[9, 10],
|
182 |
+
[10, 11],
|
183 |
+
[8, 12],
|
184 |
+
[12, 13],
|
185 |
+
[13, 14],
|
186 |
+
[0, 15],
|
187 |
+
[0, 16],
|
188 |
+
[15, 17],
|
189 |
+
[16, 18],
|
190 |
+
[2, 9],
|
191 |
+
[5, 12],
|
192 |
+
[1, 19],
|
193 |
+
[20, 19],
|
194 |
+
]
|
195 |
+
)
|
196 |
+
|
197 |
+
def get_staf_joint_names():
|
198 |
+
return [
|
199 |
+
'OP Nose', # 0,
|
200 |
+
'OP Neck', # 1,
|
201 |
+
'OP RShoulder', # 2,
|
202 |
+
'OP RElbow', # 3,
|
203 |
+
'OP RWrist', # 4,
|
204 |
+
'OP LShoulder', # 5,
|
205 |
+
'OP LElbow', # 6,
|
206 |
+
'OP LWrist', # 7,
|
207 |
+
'OP MidHip', # 8,
|
208 |
+
'OP RHip', # 9,
|
209 |
+
'OP RKnee', # 10,
|
210 |
+
'OP RAnkle', # 11,
|
211 |
+
'OP LHip', # 12,
|
212 |
+
'OP LKnee', # 13,
|
213 |
+
'OP LAnkle', # 14,
|
214 |
+
'OP REye', # 15,
|
215 |
+
'OP LEye', # 16,
|
216 |
+
'OP REar', # 17,
|
217 |
+
'OP LEar', # 18,
|
218 |
+
'Neck (LSP)', # 19,
|
219 |
+
'Top of Head (LSP)', # 20,
|
220 |
+
]
|
221 |
+
|
222 |
+
def get_spin_joint_names():
|
223 |
+
return [
|
224 |
+
'OP Nose', # 0
|
225 |
+
'OP Neck', # 1
|
226 |
+
'OP RShoulder', # 2
|
227 |
+
'OP RElbow', # 3
|
228 |
+
'OP RWrist', # 4
|
229 |
+
'OP LShoulder', # 5
|
230 |
+
'OP LElbow', # 6
|
231 |
+
'OP LWrist', # 7
|
232 |
+
'OP MidHip', # 8
|
233 |
+
'OP RHip', # 9
|
234 |
+
'OP RKnee', # 10
|
235 |
+
'OP RAnkle', # 11
|
236 |
+
'OP LHip', # 12
|
237 |
+
'OP LKnee', # 13
|
238 |
+
'OP LAnkle', # 14
|
239 |
+
'OP REye', # 15
|
240 |
+
'OP LEye', # 16
|
241 |
+
'OP REar', # 17
|
242 |
+
'OP LEar', # 18
|
243 |
+
'OP LBigToe', # 19
|
244 |
+
'OP LSmallToe', # 20
|
245 |
+
'OP LHeel', # 21
|
246 |
+
'OP RBigToe', # 22
|
247 |
+
'OP RSmallToe', # 23
|
248 |
+
'OP RHeel', # 24
|
249 |
+
'rankle', # 25
|
250 |
+
'rknee', # 26
|
251 |
+
'rhip', # 27
|
252 |
+
'lhip', # 28
|
253 |
+
'lknee', # 29
|
254 |
+
'lankle', # 30
|
255 |
+
'rwrist', # 31
|
256 |
+
'relbow', # 32
|
257 |
+
'rshoulder', # 33
|
258 |
+
'lshoulder', # 34
|
259 |
+
'lelbow', # 35
|
260 |
+
'lwrist', # 36
|
261 |
+
'neck', # 37
|
262 |
+
'headtop', # 38
|
263 |
+
'hip', # 39 'Pelvis (MPII)', # 39
|
264 |
+
'thorax', # 40 'Thorax (MPII)', # 40
|
265 |
+
'Spine (H36M)', # 41
|
266 |
+
'Jaw (H36M)', # 42
|
267 |
+
'Head (H36M)', # 43
|
268 |
+
'nose', # 44
|
269 |
+
'leye', # 45 'Left Eye', # 45
|
270 |
+
'reye', # 46 'Right Eye', # 46
|
271 |
+
'lear', # 47 'Left Ear', # 47
|
272 |
+
'rear', # 48 'Right Ear', # 48
|
273 |
+
]
|
274 |
+
|
275 |
+
def get_h36m_joint_names():
|
276 |
+
return [
|
277 |
+
'hip', # 0
|
278 |
+
'lhip', # 1
|
279 |
+
'lknee', # 2
|
280 |
+
'lankle', # 3
|
281 |
+
'rhip', # 4
|
282 |
+
'rknee', # 5
|
283 |
+
'rankle', # 6
|
284 |
+
'Spine (H36M)', # 7
|
285 |
+
'neck', # 8
|
286 |
+
'Head (H36M)', # 9
|
287 |
+
'headtop', # 10
|
288 |
+
'lshoulder', # 11
|
289 |
+
'lelbow', # 12
|
290 |
+
'lwrist', # 13
|
291 |
+
'rshoulder', # 14
|
292 |
+
'relbow', # 15
|
293 |
+
'rwrist', # 16
|
294 |
+
]
|
295 |
+
|
296 |
+
'Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head_top', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist'
|
297 |
+
|
298 |
+
def get_spin_skeleton():
|
299 |
+
return np.array(
|
300 |
+
[
|
301 |
+
[0 , 1],
|
302 |
+
[1 , 2],
|
303 |
+
[2 , 3],
|
304 |
+
[3 , 4],
|
305 |
+
[1 , 5],
|
306 |
+
[5 , 6],
|
307 |
+
[6 , 7],
|
308 |
+
[1 , 8],
|
309 |
+
[8 , 9],
|
310 |
+
[9 ,10],
|
311 |
+
[10,11],
|
312 |
+
[8 ,12],
|
313 |
+
[12,13],
|
314 |
+
[13,14],
|
315 |
+
[0 ,15],
|
316 |
+
[0 ,16],
|
317 |
+
[15,17],
|
318 |
+
[16,18],
|
319 |
+
[21,19],
|
320 |
+
[19,20],
|
321 |
+
[14,21],
|
322 |
+
[11,24],
|
323 |
+
[24,22],
|
324 |
+
[22,23],
|
325 |
+
[0 ,38],
|
326 |
+
]
|
327 |
+
)
|
328 |
+
|
329 |
+
def get_posetrack_joint_names():
|
330 |
+
return [
|
331 |
+
"nose",
|
332 |
+
"neck",
|
333 |
+
"headtop",
|
334 |
+
"lear",
|
335 |
+
"rear",
|
336 |
+
"lshoulder",
|
337 |
+
"rshoulder",
|
338 |
+
"lelbow",
|
339 |
+
"relbow",
|
340 |
+
"lwrist",
|
341 |
+
"rwrist",
|
342 |
+
"lhip",
|
343 |
+
"rhip",
|
344 |
+
"lknee",
|
345 |
+
"rknee",
|
346 |
+
"lankle",
|
347 |
+
"rankle"
|
348 |
+
]
|
349 |
+
|
350 |
+
def get_posetrack_original_kp_names():
|
351 |
+
return [
|
352 |
+
'nose',
|
353 |
+
'head_bottom',
|
354 |
+
'head_top',
|
355 |
+
'left_ear',
|
356 |
+
'right_ear',
|
357 |
+
'left_shoulder',
|
358 |
+
'right_shoulder',
|
359 |
+
'left_elbow',
|
360 |
+
'right_elbow',
|
361 |
+
'left_wrist',
|
362 |
+
'right_wrist',
|
363 |
+
'left_hip',
|
364 |
+
'right_hip',
|
365 |
+
'left_knee',
|
366 |
+
'right_knee',
|
367 |
+
'left_ankle',
|
368 |
+
'right_ankle'
|
369 |
+
]
|
370 |
+
|
371 |
+
def get_pennaction_joint_names():
|
372 |
+
return [
|
373 |
+
"headtop", # 0
|
374 |
+
"lshoulder", # 1
|
375 |
+
"rshoulder", # 2
|
376 |
+
"lelbow", # 3
|
377 |
+
"relbow", # 4
|
378 |
+
"lwrist", # 5
|
379 |
+
"rwrist", # 6
|
380 |
+
"lhip" , # 7
|
381 |
+
"rhip" , # 8
|
382 |
+
"lknee", # 9
|
383 |
+
"rknee" , # 10
|
384 |
+
"lankle", # 11
|
385 |
+
"rankle" # 12
|
386 |
+
]
|
387 |
+
|
388 |
+
def get_common_joint_names():
|
389 |
+
return [
|
390 |
+
"rankle", # 0 "lankle", # 0
|
391 |
+
"rknee", # 1 "lknee", # 1
|
392 |
+
"rhip", # 2 "lhip", # 2
|
393 |
+
"lhip", # 3 "rhip", # 3
|
394 |
+
"lknee", # 4 "rknee", # 4
|
395 |
+
"lankle", # 5 "rankle", # 5
|
396 |
+
"rwrist", # 6 "lwrist", # 6
|
397 |
+
"relbow", # 7 "lelbow", # 7
|
398 |
+
"rshoulder", # 8 "lshoulder", # 8
|
399 |
+
"lshoulder", # 9 "rshoulder", # 9
|
400 |
+
"lelbow", # 10 "relbow", # 10
|
401 |
+
"lwrist", # 11 "rwrist", # 11
|
402 |
+
"neck", # 12 "neck", # 12
|
403 |
+
"headtop", # 13 "headtop", # 13
|
404 |
+
]
|
405 |
+
|
406 |
+
def get_coco_common_joint_names():
|
407 |
+
return [
|
408 |
+
"nose", # 0
|
409 |
+
"leye", # 1
|
410 |
+
"reye", # 2
|
411 |
+
"lear", # 3
|
412 |
+
"rear", # 4
|
413 |
+
"lshoulder", # 5
|
414 |
+
"rshoulder", # 6
|
415 |
+
"lelbow", # 7
|
416 |
+
"relbow", # 8
|
417 |
+
"lwrist", # 9
|
418 |
+
"rwrist", # 10
|
419 |
+
"lhip", # 11
|
420 |
+
"rhip", # 12
|
421 |
+
"lknee", # 13
|
422 |
+
"rknee", # 14
|
423 |
+
"lankle", # 15
|
424 |
+
"rankle", # 16
|
425 |
+
"neck", # 17 "neck", # 12
|
426 |
+
"headtop", # 18 "headtop", # 13
|
427 |
+
]
|
428 |
+
|
429 |
+
def get_common_skeleton():
|
430 |
+
return np.array(
|
431 |
+
[
|
432 |
+
[ 0, 1 ],
|
433 |
+
[ 1, 2 ],
|
434 |
+
[ 3, 4 ],
|
435 |
+
[ 4, 5 ],
|
436 |
+
[ 6, 7 ],
|
437 |
+
[ 7, 8 ],
|
438 |
+
[ 8, 2 ],
|
439 |
+
[ 8, 9 ],
|
440 |
+
[ 9, 3 ],
|
441 |
+
[ 2, 3 ],
|
442 |
+
[ 8, 12],
|
443 |
+
[ 9, 10],
|
444 |
+
[12, 9 ],
|
445 |
+
[10, 11],
|
446 |
+
[12, 13],
|
447 |
+
]
|
448 |
+
)
|
449 |
+
|
450 |
+
def get_coco_joint_names():
|
451 |
+
return [
|
452 |
+
"nose", # 0
|
453 |
+
"leye", # 1
|
454 |
+
"reye", # 2
|
455 |
+
"lear", # 3
|
456 |
+
"rear", # 4
|
457 |
+
"lshoulder", # 5
|
458 |
+
"rshoulder", # 6
|
459 |
+
"lelbow", # 7
|
460 |
+
"relbow", # 8
|
461 |
+
"lwrist", # 9
|
462 |
+
"rwrist", # 10
|
463 |
+
"lhip", # 11
|
464 |
+
"rhip", # 12
|
465 |
+
"lknee", # 13
|
466 |
+
"rknee", # 14
|
467 |
+
"lankle", # 15
|
468 |
+
"rankle", # 16
|
469 |
+
]
|
470 |
+
|
471 |
+
def get_coco_skeleton():
|
472 |
+
# 0 - nose,
|
473 |
+
# 1 - leye,
|
474 |
+
# 2 - reye,
|
475 |
+
# 3 - lear,
|
476 |
+
# 4 - rear,
|
477 |
+
# 5 - lshoulder,
|
478 |
+
# 6 - rshoulder,
|
479 |
+
# 7 - lelbow,
|
480 |
+
# 8 - relbow,
|
481 |
+
# 9 - lwrist,
|
482 |
+
# 10 - rwrist,
|
483 |
+
# 11 - lhip,
|
484 |
+
# 12 - rhip,
|
485 |
+
# 13 - lknee,
|
486 |
+
# 14 - rknee,
|
487 |
+
# 15 - lankle,
|
488 |
+
# 16 - rankle,
|
489 |
+
return np.array(
|
490 |
+
[
|
491 |
+
[15, 13],
|
492 |
+
[13, 11],
|
493 |
+
[16, 14],
|
494 |
+
[14, 12],
|
495 |
+
[11, 12],
|
496 |
+
[ 5, 11],
|
497 |
+
[ 6, 12],
|
498 |
+
[ 5, 6 ],
|
499 |
+
[ 5, 7 ],
|
500 |
+
[ 6, 8 ],
|
501 |
+
[ 7, 9 ],
|
502 |
+
[ 8, 10],
|
503 |
+
[ 1, 2 ],
|
504 |
+
[ 0, 1 ],
|
505 |
+
[ 0, 2 ],
|
506 |
+
[ 1, 3 ],
|
507 |
+
[ 2, 4 ],
|
508 |
+
[ 3, 5 ],
|
509 |
+
[ 4, 6 ]
|
510 |
+
]
|
511 |
+
)
|
512 |
+
|
513 |
+
def get_mpii_joint_names():
|
514 |
+
return [
|
515 |
+
"rankle", # 0
|
516 |
+
"rknee", # 1
|
517 |
+
"rhip", # 2
|
518 |
+
"lhip", # 3
|
519 |
+
"lknee", # 4
|
520 |
+
"lankle", # 5
|
521 |
+
"hip", # 6
|
522 |
+
"thorax", # 7
|
523 |
+
"neck", # 8
|
524 |
+
"headtop", # 9
|
525 |
+
"rwrist", # 10
|
526 |
+
"relbow", # 11
|
527 |
+
"rshoulder", # 12
|
528 |
+
"lshoulder", # 13
|
529 |
+
"lelbow", # 14
|
530 |
+
"lwrist", # 15
|
531 |
+
]
|
532 |
+
|
533 |
+
def get_mpii_skeleton():
|
534 |
+
# 0 - rankle,
|
535 |
+
# 1 - rknee,
|
536 |
+
# 2 - rhip,
|
537 |
+
# 3 - lhip,
|
538 |
+
# 4 - lknee,
|
539 |
+
# 5 - lankle,
|
540 |
+
# 6 - hip,
|
541 |
+
# 7 - thorax,
|
542 |
+
# 8 - neck,
|
543 |
+
# 9 - headtop,
|
544 |
+
# 10 - rwrist,
|
545 |
+
# 11 - relbow,
|
546 |
+
# 12 - rshoulder,
|
547 |
+
# 13 - lshoulder,
|
548 |
+
# 14 - lelbow,
|
549 |
+
# 15 - lwrist,
|
550 |
+
return np.array(
|
551 |
+
[
|
552 |
+
[ 0, 1 ],
|
553 |
+
[ 1, 2 ],
|
554 |
+
[ 2, 6 ],
|
555 |
+
[ 6, 3 ],
|
556 |
+
[ 3, 4 ],
|
557 |
+
[ 4, 5 ],
|
558 |
+
[ 6, 7 ],
|
559 |
+
[ 7, 8 ],
|
560 |
+
[ 8, 9 ],
|
561 |
+
[ 7, 12],
|
562 |
+
[12, 11],
|
563 |
+
[11, 10],
|
564 |
+
[ 7, 13],
|
565 |
+
[13, 14],
|
566 |
+
[14, 15]
|
567 |
+
]
|
568 |
+
)
|
569 |
+
|
570 |
+
def get_aich_joint_names():
|
571 |
+
return [
|
572 |
+
"rshoulder", # 0
|
573 |
+
"relbow", # 1
|
574 |
+
"rwrist", # 2
|
575 |
+
"lshoulder", # 3
|
576 |
+
"lelbow", # 4
|
577 |
+
"lwrist", # 5
|
578 |
+
"rhip", # 6
|
579 |
+
"rknee", # 7
|
580 |
+
"rankle", # 8
|
581 |
+
"lhip", # 9
|
582 |
+
"lknee", # 10
|
583 |
+
"lankle", # 11
|
584 |
+
"headtop", # 12
|
585 |
+
"neck", # 13
|
586 |
+
]
|
587 |
+
|
588 |
+
def get_aich_skeleton():
|
589 |
+
# 0 - rshoulder,
|
590 |
+
# 1 - relbow,
|
591 |
+
# 2 - rwrist,
|
592 |
+
# 3 - lshoulder,
|
593 |
+
# 4 - lelbow,
|
594 |
+
# 5 - lwrist,
|
595 |
+
# 6 - rhip,
|
596 |
+
# 7 - rknee,
|
597 |
+
# 8 - rankle,
|
598 |
+
# 9 - lhip,
|
599 |
+
# 10 - lknee,
|
600 |
+
# 11 - lankle,
|
601 |
+
# 12 - headtop,
|
602 |
+
# 13 - neck,
|
603 |
+
return np.array(
|
604 |
+
[
|
605 |
+
[ 0, 1 ],
|
606 |
+
[ 1, 2 ],
|
607 |
+
[ 3, 4 ],
|
608 |
+
[ 4, 5 ],
|
609 |
+
[ 6, 7 ],
|
610 |
+
[ 7, 8 ],
|
611 |
+
[ 9, 10],
|
612 |
+
[10, 11],
|
613 |
+
[12, 13],
|
614 |
+
[13, 0 ],
|
615 |
+
[13, 3 ],
|
616 |
+
[ 0, 6 ],
|
617 |
+
[ 3, 9 ]
|
618 |
+
]
|
619 |
+
)
|
620 |
+
|
621 |
+
def get_3dpw_joint_names():
|
622 |
+
return [
|
623 |
+
"nose", # 0
|
624 |
+
"thorax", # 1
|
625 |
+
"rshoulder", # 2
|
626 |
+
"relbow", # 3
|
627 |
+
"rwrist", # 4
|
628 |
+
"lshoulder", # 5
|
629 |
+
"lelbow", # 6
|
630 |
+
"lwrist", # 7
|
631 |
+
"rhip", # 8
|
632 |
+
"rknee", # 9
|
633 |
+
"rankle", # 10
|
634 |
+
"lhip", # 11
|
635 |
+
"lknee", # 12
|
636 |
+
"lankle", # 13
|
637 |
+
]
|
638 |
+
|
639 |
+
def get_3dpw_skeleton():
|
640 |
+
return np.array(
|
641 |
+
[
|
642 |
+
[ 0, 1 ],
|
643 |
+
[ 1, 2 ],
|
644 |
+
[ 2, 3 ],
|
645 |
+
[ 3, 4 ],
|
646 |
+
[ 1, 5 ],
|
647 |
+
[ 5, 6 ],
|
648 |
+
[ 6, 7 ],
|
649 |
+
[ 2, 8 ],
|
650 |
+
[ 5, 11],
|
651 |
+
[ 8, 11],
|
652 |
+
[ 8, 9 ],
|
653 |
+
[ 9, 10],
|
654 |
+
[11, 12],
|
655 |
+
[12, 13]
|
656 |
+
]
|
657 |
+
)
|
658 |
+
|
659 |
+
def get_smplcoco_joint_names():
|
660 |
+
return [
|
661 |
+
"rankle", # 0
|
662 |
+
"rknee", # 1
|
663 |
+
"rhip", # 2
|
664 |
+
"lhip", # 3
|
665 |
+
"lknee", # 4
|
666 |
+
"lankle", # 5
|
667 |
+
"rwrist", # 6
|
668 |
+
"relbow", # 7
|
669 |
+
"rshoulder", # 8
|
670 |
+
"lshoulder", # 9
|
671 |
+
"lelbow", # 10
|
672 |
+
"lwrist", # 11
|
673 |
+
"neck", # 12
|
674 |
+
"headtop", # 13
|
675 |
+
"nose", # 14
|
676 |
+
"leye", # 15
|
677 |
+
"reye", # 16
|
678 |
+
"lear", # 17
|
679 |
+
"rear", # 18
|
680 |
+
]
|
681 |
+
|
682 |
+
def get_smplcoco_skeleton():
|
683 |
+
return np.array(
|
684 |
+
[
|
685 |
+
[ 0, 1 ],
|
686 |
+
[ 1, 2 ],
|
687 |
+
[ 3, 4 ],
|
688 |
+
[ 4, 5 ],
|
689 |
+
[ 6, 7 ],
|
690 |
+
[ 7, 8 ],
|
691 |
+
[ 8, 12],
|
692 |
+
[12, 9 ],
|
693 |
+
[ 9, 10],
|
694 |
+
[10, 11],
|
695 |
+
[12, 13],
|
696 |
+
[14, 15],
|
697 |
+
[15, 17],
|
698 |
+
[16, 18],
|
699 |
+
[14, 16],
|
700 |
+
[ 8, 2 ],
|
701 |
+
[ 9, 3 ],
|
702 |
+
[ 2, 3 ],
|
703 |
+
]
|
704 |
+
)
|
705 |
+
|
706 |
+
def get_smpl_joint_names():
|
707 |
+
return [
|
708 |
+
'hips', # 0
|
709 |
+
'leftUpLeg', # 1
|
710 |
+
'rightUpLeg', # 2
|
711 |
+
'spine', # 3
|
712 |
+
'leftLeg', # 4
|
713 |
+
'rightLeg', # 5
|
714 |
+
'spine1', # 6
|
715 |
+
'leftFoot', # 7
|
716 |
+
'rightFoot', # 8
|
717 |
+
'spine2', # 9
|
718 |
+
'leftToeBase', # 10
|
719 |
+
'rightToeBase', # 11
|
720 |
+
'neck', # 12
|
721 |
+
'leftShoulder', # 13
|
722 |
+
'rightShoulder', # 14
|
723 |
+
'head', # 15
|
724 |
+
'leftArm', # 16
|
725 |
+
'rightArm', # 17
|
726 |
+
'leftForeArm', # 18
|
727 |
+
'rightForeArm', # 19
|
728 |
+
'leftHand', # 20
|
729 |
+
'rightHand', # 21
|
730 |
+
'leftHandIndex1', # 22
|
731 |
+
'rightHandIndex1', # 23
|
732 |
+
]
|
733 |
+
|
734 |
+
def get_smpl_skeleton():
|
735 |
+
return np.array(
|
736 |
+
[
|
737 |
+
[ 0, 1 ],
|
738 |
+
[ 0, 2 ],
|
739 |
+
[ 0, 3 ],
|
740 |
+
[ 1, 4 ],
|
741 |
+
[ 2, 5 ],
|
742 |
+
[ 3, 6 ],
|
743 |
+
[ 4, 7 ],
|
744 |
+
[ 5, 8 ],
|
745 |
+
[ 6, 9 ],
|
746 |
+
[ 7, 10],
|
747 |
+
[ 8, 11],
|
748 |
+
[ 9, 12],
|
749 |
+
[ 9, 13],
|
750 |
+
[ 9, 14],
|
751 |
+
[12, 15],
|
752 |
+
[13, 16],
|
753 |
+
[14, 17],
|
754 |
+
[16, 18],
|
755 |
+
[17, 19],
|
756 |
+
[18, 20],
|
757 |
+
[19, 21],
|
758 |
+
[20, 22],
|
759 |
+
[21, 23],
|
760 |
+
]
|
761 |
+
)
|
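`convert_kps` remaps keypoints between the many joint orderings defined in this file by matching joint names, zero-filling any target joint that the source format lacks. For example, lifting COCO-17 detections into the 49-joint SPIN layout (random values used only to show shapes):

import numpy as np
from lib.utils.kp_utils import convert_kps

# 2D keypoints for 81 frames in COCO order, each row (x, y, confidence).
coco_kp2d = np.random.rand(81, 17, 3)

# Re-index into the SPIN convention; joints absent from COCO stay zero.
spin_kp2d = convert_kps(coco_kp2d, src='coco', dst='spin')
print(spin_kp2d.shape)        # (81, 49, 3)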
lib/utils/transforms.py
ADDED
@@ -0,0 +1,828 @@
1 |
+
"""This transforms function is mainly borrowed from PyTorch3D"""
|
2 |
+
|
3 |
+
# Copyright (c) Meta Platforms, Inc. and affiliates.
|
4 |
+
# All rights reserved.
|
5 |
+
#
|
6 |
+
# This source code is licensed under the BSD-style license found in the
|
7 |
+
# LICENSE file in the root directory of this source tree.
|
8 |
+
|
9 |
+
from typing import Optional, Union
|
10 |
+
|
11 |
+
import torch
|
12 |
+
import torch.nn.functional as F
|
13 |
+
|
14 |
+
Device = Union[str, torch.device]
|
15 |
+
|
16 |
+
"""
|
17 |
+
The transformation matrices returned from the functions in this file assume
|
18 |
+
the points on which the transformation will be applied are column vectors.
|
19 |
+
i.e. the R matrix is structured as
|
20 |
+
|
21 |
+
R = [
|
22 |
+
[Rxx, Rxy, Rxz],
|
23 |
+
[Ryx, Ryy, Ryz],
|
24 |
+
[Rzx, Rzy, Rzz],
|
25 |
+
] # (3, 3)
|
26 |
+
|
27 |
+
This matrix can be applied to column vectors by post multiplication
|
28 |
+
by the points e.g.
|
29 |
+
|
30 |
+
points = [[0], [1], [2]] # (3 x 1) xyz coordinates of a point
|
31 |
+
transformed_points = R * points
|
32 |
+
|
33 |
+
To apply the same matrix to points which are row vectors, the R matrix
|
34 |
+
can be transposed and pre multiplied by the points:
|
35 |
+
|
36 |
+
e.g.
|
37 |
+
points = [[0, 1, 2]] # (1 x 3) xyz coordinates of a point
|
38 |
+
transformed_points = points * R.transpose(1, 0)
|
39 |
+
"""
|
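The module docstring above fixes the convention: matrices returned by this file rotate column vectors, so arrays of row-vector points must be multiplied by the transpose instead. A small self-contained check of the two equivalent forms:

import torch

R = torch.tensor([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])            # 90-degree rotation about z
points = torch.tensor([[1., 0., 0.],
                       [0., 1., 0.]])        # two points as row vectors, shape (N, 3)

rotated_cols = (R @ points.T).T              # column-vector convention from the docstring
rotated_rows = points @ R.T                  # equivalent row-vector form
assert torch.allclose(rotated_cols, rotated_rows)
print(rotated_rows)                          # [[0., 1., 0.], [-1., 0., 0.]]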
40 |
+
|
41 |
+
|
42 |
+
def quaternion_to_matrix(quaternions: torch.Tensor) -> torch.Tensor:
|
43 |
+
"""
|
44 |
+
Convert rotations given as quaternions to rotation matrices.
|
45 |
+
|
46 |
+
Args:
|
47 |
+
quaternions: quaternions with real part first,
|
48 |
+
as tensor of shape (..., 4).
|
49 |
+
|
50 |
+
Returns:
|
51 |
+
Rotation matrices as tensor of shape (..., 3, 3).
|
52 |
+
"""
|
53 |
+
r, i, j, k = torch.unbind(quaternions, -1)
|
54 |
+
# pyre-fixme[58]: `/` is not supported for operand types `float` and `Tensor`.
|
55 |
+
two_s = 2.0 / (quaternions * quaternions).sum(-1)
|
56 |
+
|
57 |
+
o = torch.stack(
|
58 |
+
(
|
59 |
+
1 - two_s * (j * j + k * k),
|
60 |
+
two_s * (i * j - k * r),
|
61 |
+
two_s * (i * k + j * r),
|
62 |
+
two_s * (i * j + k * r),
|
63 |
+
1 - two_s * (i * i + k * k),
|
64 |
+
two_s * (j * k - i * r),
|
65 |
+
two_s * (i * k - j * r),
|
66 |
+
two_s * (j * k + i * r),
|
67 |
+
1 - two_s * (i * i + j * j),
|
68 |
+
),
|
69 |
+
-1,
|
70 |
+
)
|
71 |
+
return o.reshape(quaternions.shape[:-1] + (3, 3))
|
72 |
+
|
73 |
+
|
74 |
+
|
75 |
+
def _copysign(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
|
76 |
+
"""
|
77 |
+
Return a tensor where each element has the absolute value taken from the,
|
78 |
+
corresponding element of a, with sign taken from the corresponding
|
79 |
+
element of b. This is like the standard copysign floating-point operation,
|
80 |
+
but is not careful about negative 0 and NaN.
|
81 |
+
|
82 |
+
Args:
|
83 |
+
a: source tensor.
|
84 |
+
b: tensor whose signs will be used, of the same shape as a.
|
85 |
+
|
86 |
+
Returns:
|
87 |
+
Tensor of the same shape as a with the signs of b.
|
88 |
+
"""
|
89 |
+
signs_differ = (a < 0) != (b < 0)
|
90 |
+
return torch.where(signs_differ, -a, a)
|
91 |
+
|
92 |
+
|
93 |
+
def _sqrt_positive_part(x: torch.Tensor) -> torch.Tensor:
|
94 |
+
"""
|
95 |
+
Returns torch.sqrt(torch.max(0, x))
|
96 |
+
but with a zero subgradient where x is 0.
|
97 |
+
"""
|
98 |
+
ret = torch.zeros_like(x)
|
99 |
+
positive_mask = x > 0
|
100 |
+
ret[positive_mask] = torch.sqrt(x[positive_mask])
|
101 |
+
return ret
|
102 |
+
|
103 |
+
|
104 |
+
def matrix_to_quaternion(matrix: torch.Tensor) -> torch.Tensor:
|
105 |
+
"""
|
106 |
+
Convert rotations given as rotation matrices to quaternions.
|
107 |
+
|
108 |
+
Args:
|
109 |
+
matrix: Rotation matrices as tensor of shape (..., 3, 3).
|
110 |
+
|
111 |
+
Returns:
|
112 |
+
quaternions with real part first, as tensor of shape (..., 4).
|
113 |
+
"""
|
114 |
+
if matrix.size(-1) != 3 or matrix.size(-2) != 3:
|
115 |
+
raise ValueError(f"Invalid rotation matrix shape {matrix.shape}.")
|
116 |
+
|
117 |
+
batch_dim = matrix.shape[:-2]
|
118 |
+
m00, m01, m02, m10, m11, m12, m20, m21, m22 = torch.unbind(
|
119 |
+
matrix.reshape(batch_dim + (9,)), dim=-1
|
120 |
+
)
|
121 |
+
|
122 |
+
q_abs = _sqrt_positive_part(
|
123 |
+
torch.stack(
|
124 |
+
[
|
125 |
+
1.0 + m00 + m11 + m22,
|
126 |
+
1.0 + m00 - m11 - m22,
|
127 |
+
1.0 - m00 + m11 - m22,
|
128 |
+
1.0 - m00 - m11 + m22,
|
129 |
+
],
|
130 |
+
dim=-1,
|
131 |
+
)
|
132 |
+
)
|
133 |
+
|
134 |
+
# we produce the desired quaternion multiplied by each of r, i, j, k
|
135 |
+
quat_by_rijk = torch.stack(
|
136 |
+
[
|
137 |
+
# pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
|
138 |
+
# `int`.
|
139 |
+
torch.stack([q_abs[..., 0] ** 2, m21 - m12, m02 - m20, m10 - m01], dim=-1),
|
140 |
+
# pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
|
141 |
+
# `int`.
|
142 |
+
torch.stack([m21 - m12, q_abs[..., 1] ** 2, m10 + m01, m02 + m20], dim=-1),
|
143 |
+
# pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
|
144 |
+
# `int`.
|
145 |
+
torch.stack([m02 - m20, m10 + m01, q_abs[..., 2] ** 2, m12 + m21], dim=-1),
|
146 |
+
# pyre-fixme[58]: `**` is not supported for operand types `Tensor` and
|
147 |
+
# `int`.
|
148 |
+
torch.stack([m10 - m01, m20 + m02, m21 + m12, q_abs[..., 3] ** 2], dim=-1),
|
149 |
+
],
|
150 |
+
dim=-2,
|
151 |
+
)
|
152 |
+
|
153 |
+
# We floor here at 0.1 but the exact level is not important; if q_abs is small,
|
154 |
+
# the candidate won't be picked.
|
155 |
+
flr = torch.tensor(0.1).to(dtype=q_abs.dtype, device=q_abs.device)
|
156 |
+
quat_candidates = quat_by_rijk / (2.0 * q_abs[..., None].max(flr))
|
157 |
+
|
158 |
+
# if not for numerical problems, quat_candidates[i] should be same (up to a sign),
|
159 |
+
# forall i; we pick the best-conditioned one (with the largest denominator)
|
160 |
+
|
161 |
+
    # (continuation of the quaternion-selection return from the preceding hunk)
    return quat_candidates[
        F.one_hot(q_abs.argmax(dim=-1), num_classes=4) > 0.5, :
    ].reshape(batch_dim + (4,))


def _axis_angle_rotation(axis: str, angle: torch.Tensor) -> torch.Tensor:
    """
    Return the rotation matrices for one of the rotations about an axis
    of which Euler angles describe, for each value of the angle given.

    Args:
        axis: Axis label "X" or "Y" or "Z".
        angle: any shape tensor of Euler angles in radians

    Returns:
        Rotation matrices as tensor of shape (..., 3, 3).
    """

    cos = torch.cos(angle)
    sin = torch.sin(angle)
    one = torch.ones_like(angle)
    zero = torch.zeros_like(angle)

    if axis == "X":
        R_flat = (one, zero, zero, zero, cos, -sin, zero, sin, cos)
    elif axis == "Y":
        R_flat = (cos, zero, sin, zero, one, zero, -sin, zero, cos)
    elif axis == "Z":
        R_flat = (cos, -sin, zero, sin, cos, zero, zero, zero, one)
    else:
        raise ValueError("letter must be either X, Y or Z.")

    return torch.stack(R_flat, -1).reshape(angle.shape + (3, 3))


def euler_angles_to_matrix(euler_angles: torch.Tensor, convention: str) -> torch.Tensor:
    """
    Convert rotations given as Euler angles in radians to rotation matrices.

    Args:
        euler_angles: Euler angles in radians as tensor of shape (..., 3).
        convention: Convention string of three uppercase letters from
            {"X", "Y", and "Z"}.

    Returns:
        Rotation matrices as tensor of shape (..., 3, 3).
    """
    if euler_angles.dim() == 0 or euler_angles.shape[-1] != 3:
        raise ValueError("Invalid input euler angles.")
    if len(convention) != 3:
        raise ValueError("Convention must have 3 letters.")
    if convention[1] in (convention[0], convention[2]):
        raise ValueError(f"Invalid convention {convention}.")
    for letter in convention:
        if letter not in ("X", "Y", "Z"):
            raise ValueError(f"Invalid letter {letter} in convention string.")
    matrices = [
        _axis_angle_rotation(c, e)
        for c, e in zip(convention, torch.unbind(euler_angles, -1))
    ]
    # return functools.reduce(torch.matmul, matrices)
    return torch.matmul(torch.matmul(matrices[0], matrices[1]), matrices[2])


def _angle_from_tan(
    axis: str, other_axis: str, data, horizontal: bool, tait_bryan: bool
) -> torch.Tensor:
    """
    Extract the first or third Euler angle from the two members of
    the matrix which are positive constant times its sine and cosine.

    Args:
        axis: Axis label "X" or "Y" or "Z" for the angle we are finding.
        other_axis: Axis label "X" or "Y" or "Z" for the middle axis in the
            convention.
        data: Rotation matrices as tensor of shape (..., 3, 3).
        horizontal: Whether we are looking for the angle for the third axis,
            which means the relevant entries are in the same row of the
            rotation matrix. If not, they are in the same column.
        tait_bryan: Whether the first and third axes in the convention differ.

    Returns:
        Euler Angles in radians for each matrix in data as a tensor
        of shape (...).
    """

    i1, i2 = {"X": (2, 1), "Y": (0, 2), "Z": (1, 0)}[axis]
    if horizontal:
        i2, i1 = i1, i2
    even = (axis + other_axis) in ["XY", "YZ", "ZX"]
    if horizontal == even:
        return torch.atan2(data[..., i1], data[..., i2])
    if tait_bryan:
        return torch.atan2(-data[..., i2], data[..., i1])
    return torch.atan2(data[..., i2], -data[..., i1])


def _index_from_letter(letter: str) -> int:
    if letter == "X":
        return 0
    if letter == "Y":
        return 1
    if letter == "Z":
        return 2
    raise ValueError("letter must be either X, Y or Z.")


def matrix_to_euler_angles(matrix: torch.Tensor, convention: str) -> torch.Tensor:
    """
    Convert rotations given as rotation matrices to Euler angles in radians.

    Args:
        matrix: Rotation matrices as tensor of shape (..., 3, 3).
        convention: Convention string of three uppercase letters.

    Returns:
        Euler angles in radians as tensor of shape (..., 3).
    """
    if len(convention) != 3:
        raise ValueError("Convention must have 3 letters.")
    if convention[1] in (convention[0], convention[2]):
        raise ValueError(f"Invalid convention {convention}.")
    for letter in convention:
        if letter not in ("X", "Y", "Z"):
            raise ValueError(f"Invalid letter {letter} in convention string.")
    if matrix.size(-1) != 3 or matrix.size(-2) != 3:
        raise ValueError(f"Invalid rotation matrix shape {matrix.shape}.")
    i0 = _index_from_letter(convention[0])
    i2 = _index_from_letter(convention[2])
    tait_bryan = i0 != i2
    if tait_bryan:
        central_angle = torch.asin(
            matrix[..., i0, i2] * (-1.0 if i0 - i2 in [-1, 2] else 1.0)
        )
    else:
        central_angle = torch.acos(matrix[..., i0, i0])

    o = (
        _angle_from_tan(
            convention[0], convention[1], matrix[..., i2], False, tait_bryan
        ),
        central_angle,
        _angle_from_tan(
            convention[2], convention[1], matrix[..., i0, :], True, tait_bryan
        ),
    )
    return torch.stack(o, -1)


def random_quaternions(
    n: int, dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
) -> torch.Tensor:
    """
    Generate random quaternions representing rotations,
    i.e. versors with nonnegative real part.

    Args:
        n: Number of quaternions in a batch to return.
        dtype: Type to return.
        device: Desired device of returned tensor. Default:
            uses the current device for the default tensor type.

    Returns:
        Quaternions as tensor of shape (N, 4).
    """
    if isinstance(device, str):
        device = torch.device(device)
    o = torch.randn((n, 4), dtype=dtype, device=device)
    s = (o * o).sum(1)
    o = o / _copysign(torch.sqrt(s), o[:, 0])[:, None]
    return o


def random_rotations(
    n: int, dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
) -> torch.Tensor:
    """
    Generate random rotations as 3x3 rotation matrices.

    Args:
        n: Number of rotation matrices in a batch to return.
        dtype: Type to return.
        device: Device of returned tensor. Default: if None,
            uses the current device for the default tensor type.

    Returns:
        Rotation matrices as tensor of shape (n, 3, 3).
    """
    quaternions = random_quaternions(n, dtype=dtype, device=device)
    return quaternion_to_matrix(quaternions)


def random_rotation(
    dtype: Optional[torch.dtype] = None, device: Optional[Device] = None
) -> torch.Tensor:
    """
    Generate a single random 3x3 rotation matrix.

    Args:
        dtype: Type to return
        device: Device of returned tensor. Default: if None,
            uses the current device for the default tensor type

    Returns:
        Rotation matrix as tensor of shape (3, 3).
    """
    return random_rotations(1, dtype, device)[0]


def standardize_quaternion(quaternions: torch.Tensor) -> torch.Tensor:
    """
    Convert a unit quaternion to a standard form: one in which the real
    part is non negative.

    Args:
        quaternions: Quaternions with real part first,
            as tensor of shape (..., 4).

    Returns:
        Standardized quaternions as tensor of shape (..., 4).
    """
    return torch.where(quaternions[..., 0:1] < 0, -quaternions, quaternions)


def quaternion_raw_multiply(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """
    Multiply two quaternions.
    Usual torch rules for broadcasting apply.

    Args:
        a: Quaternions as tensor of shape (..., 4), real part first.
        b: Quaternions as tensor of shape (..., 4), real part first.

    Returns:
        The product of a and b, a tensor of quaternions of shape (..., 4).
    """
    aw, ax, ay, az = torch.unbind(a, -1)
    bw, bx, by, bz = torch.unbind(b, -1)
    ow = aw * bw - ax * bx - ay * by - az * bz
    ox = aw * bx + ax * bw + ay * bz - az * by
    oy = aw * by - ax * bz + ay * bw + az * bx
    oz = aw * bz + ax * by - ay * bx + az * bw
    return torch.stack((ow, ox, oy, oz), -1)


def quaternion_multiply(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """
    Multiply two quaternions representing rotations, returning the quaternion
    representing their composition, i.e. the versor with nonnegative real part.
    Usual torch rules for broadcasting apply.

    Args:
        a: Quaternions as tensor of shape (..., 4), real part first.
        b: Quaternions as tensor of shape (..., 4), real part first.

    Returns:
        The product of a and b, a tensor of quaternions of shape (..., 4).
    """
    ab = quaternion_raw_multiply(a, b)
    return standardize_quaternion(ab)


def quaternion_invert(quaternion: torch.Tensor) -> torch.Tensor:
    """
    Given a quaternion representing rotation, get the quaternion representing
    its inverse.

    Args:
        quaternion: Quaternions as tensor of shape (..., 4), with real part
            first, which must be versors (unit quaternions).

    Returns:
        The inverse, a tensor of quaternions of shape (..., 4).
    """

    scaling = torch.tensor([1, -1, -1, -1], device=quaternion.device)
    return quaternion * scaling


def quaternion_apply(quaternion: torch.Tensor, point: torch.Tensor) -> torch.Tensor:
    """
    Apply the rotation given by a quaternion to a 3D point.
    Usual torch rules for broadcasting apply.

    Args:
        quaternion: Tensor of quaternions, real part first, of shape (..., 4).
        point: Tensor of 3D points of shape (..., 3).

    Returns:
        Tensor of rotated points of shape (..., 3).
    """
    if point.size(-1) != 3:
        raise ValueError(f"Points are not in 3D, {point.shape}.")
    real_parts = point.new_zeros(point.shape[:-1] + (1,))
    point_as_quaternion = torch.cat((real_parts, point), -1)
    out = quaternion_raw_multiply(
        quaternion_raw_multiply(quaternion, point_as_quaternion),
        quaternion_invert(quaternion),
    )
    return out[..., 1:]


def axis_angle_to_matrix(axis_angle: torch.Tensor) -> torch.Tensor:
    """
    Convert rotations given as axis/angle to rotation matrices.

    Args:
        axis_angle: Rotations given as a vector in axis angle form,
            as a tensor of shape (..., 3), where the magnitude is
            the angle turned anticlockwise in radians around the
            vector's direction.

    Returns:
        Rotation matrices as tensor of shape (..., 3, 3).
    """
    return quaternion_to_matrix(axis_angle_to_quaternion(axis_angle))


def matrix_to_axis_angle(matrix: torch.Tensor) -> torch.Tensor:
    """
    Convert rotations given as rotation matrices to axis/angle.

    Args:
        matrix: Rotation matrices as tensor of shape (..., 3, 3).

    Returns:
        Rotations given as a vector in axis angle form, as a tensor
        of shape (..., 3), where the magnitude is the angle
        turned anticlockwise in radians around the vector's
        direction.
    """
    return quaternion_to_axis_angle(matrix_to_quaternion(matrix))


def axis_angle_to_quaternion(axis_angle: torch.Tensor) -> torch.Tensor:
    """
    Convert rotations given as axis/angle to quaternions.

    Args:
        axis_angle: Rotations given as a vector in axis angle form,
            as a tensor of shape (..., 3), where the magnitude is
            the angle turned anticlockwise in radians around the
            vector's direction.

    Returns:
        quaternions with real part first, as tensor of shape (..., 4).
    """
    angles = torch.norm(axis_angle, p=2, dim=-1, keepdim=True)
    half_angles = angles * 0.5
    eps = 1e-6
    small_angles = angles.abs() < eps
    sin_half_angles_over_angles = torch.empty_like(angles)
    sin_half_angles_over_angles[~small_angles] = (
        torch.sin(half_angles[~small_angles]) / angles[~small_angles]
    )
    # for x small, sin(x/2) is about x/2 - (x/2)^3/6
    # so sin(x/2)/x is about 1/2 - (x*x)/48
    sin_half_angles_over_angles[small_angles] = (
        0.5 - (angles[small_angles] * angles[small_angles]) / 48
    )
    quaternions = torch.cat(
        [torch.cos(half_angles), axis_angle * sin_half_angles_over_angles], dim=-1
    )
    return quaternions


def quaternion_to_axis_angle(quaternions: torch.Tensor) -> torch.Tensor:
    """
    Convert rotations given as quaternions to axis/angle.

    Args:
        quaternions: quaternions with real part first,
            as tensor of shape (..., 4).

    Returns:
        Rotations given as a vector in axis angle form, as a tensor
        of shape (..., 3), where the magnitude is the angle
        turned anticlockwise in radians around the vector's
        direction.
    """
    norms = torch.norm(quaternions[..., 1:], p=2, dim=-1, keepdim=True)
    half_angles = torch.atan2(norms, quaternions[..., :1])
    angles = 2 * half_angles
    eps = 1e-6
    small_angles = angles.abs() < eps
    sin_half_angles_over_angles = torch.empty_like(angles)
    sin_half_angles_over_angles[~small_angles] = (
        torch.sin(half_angles[~small_angles]) / angles[~small_angles]
    )
    # for x small, sin(x/2) is about x/2 - (x/2)^3/6
    # so sin(x/2)/x is about 1/2 - (x*x)/48
    sin_half_angles_over_angles[small_angles] = (
        0.5 - (angles[small_angles] * angles[small_angles]) / 48
    )
    return quaternions[..., 1:] / sin_half_angles_over_angles


def rotation_6d_to_matrix(d6: torch.Tensor) -> torch.Tensor:
    """
    Converts 6D rotation representation by Zhou et al. [1] to rotation matrix
    using Gram--Schmidt orthogonalization per Section B of [1].
    Args:
        d6: 6D rotation representation, of size (*, 6)

    Returns:
        batch of rotation matrices of size (*, 3, 3)

    [1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H.
    On the Continuity of Rotation Representations in Neural Networks.
    IEEE Conference on Computer Vision and Pattern Recognition, 2019.
    Retrieved from http://arxiv.org/abs/1812.07035
    """

    a1, a2 = d6[..., :3], d6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    b2 = a2 - (b1 * a2).sum(-1, keepdim=True) * b1
    b2 = F.normalize(b2, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack((b1, b2, b3), dim=-2)


def matrix_to_rotation_6d(matrix: torch.Tensor) -> torch.Tensor:
    """
    Converts rotation matrices to 6D rotation representation by Zhou et al. [1]
    by dropping the last row. Note that 6D representation is not unique.
    Args:
        matrix: batch of rotation matrices of size (*, 3, 3)

    Returns:
        6D rotation representation, of size (*, 6)

    [1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H.
    On the Continuity of Rotation Representations in Neural Networks.
    IEEE Conference on Computer Vision and Pattern Recognition, 2019.
    Retrieved from http://arxiv.org/abs/1812.07035
    """
    batch_dim = matrix.size()[:-2]
    return matrix[..., :2, :].clone().reshape(batch_dim + (6,))


def clean_rotation_6d(d6d: torch.Tensor) -> torch.Tensor:
    """
    Clean rotation 6d by converting it to matrix and then reconvert to d6
    """
    matrix = rotation_6d_to_matrix(d6d)
    d6d = matrix_to_rotation_6d(matrix)
    return d6d


def rot6d_to_rotmat(x):
    """Convert 6D rotation representation to 3x3 rotation matrix.
    Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
    Input:
        (B,6) Batch of 6-D rotation representations
    Output:
        (B,3,3) Batch of corresponding rotation matrices
    """
    if x.shape[-1] == 6:
        batch_dim = x.size()[:-1]
    else:
        x = x.reshape(*x.shape[:-1], -1, 6)
        batch_dim = x.size()[:-1]

    x = x.reshape(*batch_dim, 3, 2)
    a1, a2 = x[..., 0], x[..., 1]

    b1 = F.normalize(a1, dim=-1)
    b2 = a2 - (b1 * a2).sum(-1, keepdim=True) * b1
    b2 = F.normalize(b2, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)

    return torch.stack((b1, b2, b3), dim=-1)


def rotmat_to_rot6d(x):
    """Inverse computation of rot6d_to_rotmat."""
    batch_dim = x.size()[:-2]
    return x[..., :2].clone().reshape(batch_dim + (6,))


def convert_rotation_matrix_to_homogeneous(rotation_matrix):
    """Add empty translation vector to Rotation matrix"""

    transl = torch.zeros_like(rotation_matrix[..., :1])
    rotation_matrix_hom = torch.cat((rotation_matrix, transl), dim=-1)

    return rotation_matrix_hom


def rotation_matrix_to_angle_axis(rotation_matrix):
    """Convert 3x4 rotation matrix to Rodrigues vector

    Args:
        rotation_matrix (Tensor): rotation matrix.

    Returns:
        Tensor: Rodrigues vector transformation.

    Shape:
        - Input: :math:`(N, 3, 4)`
        - Output: :math:`(N, 3)`

    Example:
        >>> input = torch.rand(2, 3, 4)  # Nx3x4
        >>> output = tgm.rotation_matrix_to_angle_axis(input)  # Nx3
    """

    if rotation_matrix.size(-1) == 3:
        rotation_matrix = convert_rotation_matrix_to_homogeneous(rotation_matrix)

    quaternion = rotation_matrix_to_quaternion(rotation_matrix)
    return quaternion_to_angle_axis(quaternion)


def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
    """Convert 3x4 rotation matrix to 4d quaternion vector

    This algorithm is based on algorithm described in
    https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201

    Args:
        rotation_matrix (Tensor): the rotation matrix to convert.

    Return:
        Tensor: the rotation in quaternion

    Shape:
        - Input: :math:`(N, 3, 4)`
        - Output: :math:`(N, 4)`

    Example:
        >>> input = torch.rand(4, 3, 4)  # Nx3x4
        >>> output = tgm.rotation_matrix_to_quaternion(input)  # Nx4
    """
    if not torch.is_tensor(rotation_matrix):
        raise TypeError("Input type is not a torch.Tensor. Got {}".format(
            type(rotation_matrix)))

    if len(rotation_matrix.shape) > 3:
        raise ValueError(
            "Input size must be a three dimensional tensor. Got {}".format(
                rotation_matrix.shape))
    if not rotation_matrix.shape[-2:] == (3, 4):
        raise ValueError(
            "Input size must be a N x 3 x 4 tensor. Got {}".format(
                rotation_matrix.shape))

    rmat_t = torch.transpose(rotation_matrix, 1, 2)

    mask_d2 = rmat_t[:, 2, 2] < eps

    mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
    mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]

    t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
    q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
                      t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
                      rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
    t0_rep = t0.repeat(4, 1).t()

    t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
    q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
                      rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
                      t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
    t1_rep = t1.repeat(4, 1).t()

    t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
    q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
                      rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
                      rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
    t2_rep = t2.repeat(4, 1).t()

    t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
    q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
                      rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
                      rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
    t3_rep = t3.repeat(4, 1).t()

    mask_c0 = mask_d2 * mask_d0_d1
    # mask_c1 = mask_d2 * (1 - mask_d0_d1)
    mask_c1 = mask_d2 * ~mask_d0_d1
    # mask_c2 = (1 - mask_d2) * mask_d0_nd1
    mask_c2 = ~mask_d2 * mask_d0_nd1
    # mask_c3 = (1 - mask_d2) * (1 - mask_d0_nd1)
    mask_c3 = ~mask_d2 * ~mask_d0_nd1
    mask_c0 = mask_c0.view(-1, 1).type_as(q0)
    mask_c1 = mask_c1.view(-1, 1).type_as(q1)
    mask_c2 = mask_c2.view(-1, 1).type_as(q2)
    mask_c3 = mask_c3.view(-1, 1).type_as(q3)

    q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
    q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 +  # noqa
                    t2_rep * mask_c2 + t3_rep * mask_c3)  # noqa
    q *= 0.5
    return q


def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
    """Convert quaternion vector to angle axis of rotation.

    Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h

    Args:
        quaternion (torch.Tensor): tensor with quaternions.

    Return:
        torch.Tensor: tensor with angle axis of rotation.

    Shape:
        - Input: :math:`(*, 4)` where `*` means, any number of dimensions
        - Output: :math:`(*, 3)`

    Example:
        >>> quaternion = torch.rand(2, 4)  # Nx4
        >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion)  # Nx3
    """
    if not torch.is_tensor(quaternion):
        raise TypeError("Input type is not a torch.Tensor. Got {}".format(
            type(quaternion)))

    if not quaternion.shape[-1] == 4:
        raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
                         .format(quaternion.shape))
    # unpack input and compute conversion
    q1: torch.Tensor = quaternion[..., 1]
    q2: torch.Tensor = quaternion[..., 2]
    q3: torch.Tensor = quaternion[..., 3]
    sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3

    sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
    cos_theta: torch.Tensor = quaternion[..., 0]
    two_theta: torch.Tensor = 2.0 * torch.where(
        cos_theta < 0.0,
        torch.atan2(-sin_theta, -cos_theta),
        torch.atan2(sin_theta, cos_theta))

    k_pos: torch.Tensor = two_theta / sin_theta
    k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
    k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)

    angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
    angle_axis[..., 0] += q1 * k
    angle_axis[..., 1] += q2 * k
    angle_axis[..., 2] += q3 * k
    return angle_axis


def avg_rot(rot):
    # input [B,...,3,3] --> output [...,3,3]
    rot = rot.mean(dim=0)
    U, _, V = torch.svd(rot)
    rot = U @ V.transpose(-1, -2)
    return rot
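A minimal usage sketch (not part of the diff above), assuming the repository root is on PYTHONPATH so that lib.utils.transforms is importable: it round-trips a rotation through the axis-angle, matrix, and 6D representations defined in this file to show that the conversions are mutually consistent.

import torch
from lib.utils.transforms import (
    axis_angle_to_matrix, matrix_to_axis_angle,
    matrix_to_rotation_6d, rotation_6d_to_matrix,
)

aa = torch.tensor([[0.0, 1.2, 0.0]])      # (B, 3) axis-angle, ~69 deg about Y
R = axis_angle_to_matrix(aa)              # (B, 3, 3)
d6 = matrix_to_rotation_6d(R)             # (B, 6) continuous representation
R_rec = rotation_6d_to_matrix(d6)         # Gram-Schmidt reconstruction
aa_rec = matrix_to_axis_angle(R_rec)      # back to (B, 3)

assert torch.allclose(R, R_rec, atol=1e-5)
assert torch.allclose(aa, aa_rec, atol=1e-4)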
lib/utils/utils.py
ADDED
@@ -0,0 +1,265 @@
# -*- coding: utf-8 -*-

# Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG) is
# holder of all proprietary rights on this computer program.
# You can only use this computer program if you have closed
# a license agreement with MPG or you get the right to use the computer
# program from someone who is authorized to grant you that right.
# Any use of the computer program without a valid license is prohibited and
# liable to prosecution.
#
# Copyright©2019 Max-Planck-Gesellschaft zur Förderung
# der Wissenschaften e.V. (MPG). acting on behalf of its Max Planck Institute
# for Intelligent Systems. All rights reserved.
#
# Contact: [email protected]

import os
import yaml
import torch
import shutil
import logging
import operator
from tqdm import tqdm
from os import path as osp
from functools import reduce
from typing import List, Union
from collections import OrderedDict
from torch.optim.lr_scheduler import _LRScheduler


class CustomScheduler(_LRScheduler):
    def __init__(self, optimizer, lr_lambda):
        self.lr_lambda = lr_lambda
        super(CustomScheduler, self).__init__(optimizer)

    def get_lr(self):
        return [base_lr * self.lr_lambda(self.last_epoch)
                for base_lr in self.base_lrs]


def lr_decay_fn(epoch):
    if epoch == 0: return 1.0
    if epoch % big_epoch == 0:
        return big_decay
    else:
        return small_decay


def save_obj(v, f, file_name='output.obj'):
    obj_file = open(file_name, 'w')
    for i in range(len(v)):
        obj_file.write('v ' + str(v[i][0]) + ' ' + str(v[i][1]) + ' ' + str(v[i][2]) + '\n')
    for i in range(len(f)):
        obj_file.write('f ' + str(f[i][0]+1) + '/' + str(f[i][0]+1) + ' ' + str(f[i][1]+1) + '/' + str(f[i][1]+1) + ' ' + str(f[i][2]+1) + '/' + str(f[i][2]+1) + '\n')
    obj_file.close()


def check_data_pararell(train_weight):
    new_state_dict = OrderedDict()
    for k, v in train_weight.items():
        name = k[7:] if k.startswith('module') else k  # remove `module.`
        new_state_dict[name] = v
    return new_state_dict


def get_from_dict(dict, keys):
    return reduce(operator.getitem, keys, dict)


def tqdm_enumerate(iter):
    i = 0
    for y in tqdm(iter):
        yield i, y
        i += 1


def iterdict(d):
    for k, v in d.items():
        if isinstance(v, dict):
            d[k] = dict(v)
            iterdict(v)
    return d


def accuracy(output, target):
    _, pred = output.topk(1)
    pred = pred.view(-1)

    correct = pred.eq(target).sum()

    return correct.item(), target.size(0) - correct.item()


def lr_decay(optimizer, step, lr, decay_step, gamma):
    lr = lr * gamma ** (step/decay_step)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr


def step_decay(optimizer, step, lr, decay_step, gamma):
    lr = lr * gamma ** (step / decay_step)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    return lr


def read_yaml(filename):
    return yaml.load(open(filename, 'r'))


def write_yaml(filename, object):
    with open(filename, 'w') as f:
        yaml.dump(object, f)


def save_dict_to_yaml(obj, filename, mode='w'):
    with open(filename, mode) as f:
        yaml.dump(obj, f, default_flow_style=False)


def save_to_file(obj, filename, mode='w'):
    with open(filename, mode) as f:
        f.write(obj)


def concatenate_dicts(dict_list, dim=0):
    rdict = dict.fromkeys(dict_list[0].keys())
    for k in rdict.keys():
        rdict[k] = torch.cat([d[k] for d in dict_list], dim=dim)
    return rdict


def bool_to_string(x: Union[List[bool], bool]) -> Union[List[str], str]:
    """
    boolean to string conversion
    :param x: list or bool to be converted
    :return: string converted thing
    """
    if isinstance(x, bool):
        return [str(x)]
    for i, j in enumerate(x):
        x[i] = str(j)
    return x


def checkpoint2model(checkpoint, key='gen_state_dict'):
    state_dict = checkpoint[key]
    print(f'Performance of loaded model on 3DPW is {checkpoint["performance"]:.2f}mm')
    # del state_dict['regressor.mean_theta']
    return state_dict


def get_optimizer(cfg, model, optim_type, momentum, stage):
    if stage == 'stage2':
        param_list = [{'params': model.integrator.parameters()}]
        for name, param in model.named_parameters():
            # if 'integrator' not in name and 'motion_encoder' not in name and 'trajectory_decoder' not in name:
            if 'integrator' not in name:
                param_list.append({'params': param, 'lr': cfg.TRAIN.LR_FINETUNE})
    else:
        param_list = [{'params': model.parameters()}]

    if optim_type in ['sgd', 'SGD']:
        opt = torch.optim.SGD(lr=cfg.TRAIN.LR, params=param_list, momentum=momentum)
    elif optim_type in ['Adam', 'adam', 'ADAM']:
        opt = torch.optim.Adam(lr=cfg.TRAIN.LR, params=param_list, weight_decay=cfg.TRAIN.WD, betas=(0.9, 0.999))
    else:
        raise ModuleNotFoundError

    return opt


def create_logger(logdir, phase='train'):
    os.makedirs(logdir, exist_ok=True)

    log_file = osp.join(logdir, f'{phase}_log.txt')

    head = '%(asctime)-15s %(message)s'
    logging.basicConfig(filename=log_file,
                        format=head)
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    console = logging.StreamHandler()
    logging.getLogger('').addHandler(console)

    return logger


class AverageMeter(object):
    def __init__(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def prepare_output_dir(cfg, cfg_file):

    # ==== create logdir
    logdir = osp.join(cfg.OUTPUT_DIR, cfg.EXP_NAME)
    os.makedirs(logdir, exist_ok=True)
    shutil.copy(src=cfg_file, dst=osp.join(cfg.OUTPUT_DIR, 'config.yaml'))

    cfg.LOGDIR = logdir

    # save config
    save_dict_to_yaml(cfg, osp.join(cfg.LOGDIR, 'config.yaml'))

    return cfg


def prepare_groundtruth(batch, device):
    groundtruths = dict()
    gt_keys = ['pose', 'cam', 'betas', 'kp3d', 'bbox']          # Evaluation
    gt_keys += ['pose_root', 'vel_root', 'weak_kp2d', 'verts',  # Training
                'full_kp2d', 'contact', 'R', 'cam_angvel',
                'has_smpl', 'has_traj', 'has_full_screen', 'has_verts']
    for gt_key in gt_keys:
        if gt_key in batch.keys():
            dtype = torch.float32 if batch[gt_key].dtype == torch.float64 else batch[gt_key].dtype
            groundtruths[gt_key] = batch[gt_key].to(dtype=dtype, device=device)

    return groundtruths


def prepare_auxiliary(batch, device):
    aux = dict()
    aux_keys = ['mask', 'bbox', 'res', 'cam_intrinsics', 'init_root', 'cam_angvel']
    for key in aux_keys:
        if key in batch.keys():
            dtype = torch.float32 if batch[key].dtype == torch.float64 else batch[key].dtype
            aux[key] = batch[key].to(dtype=dtype, device=device)

    return aux


def prepare_input(batch, device, use_features):
    # Input keypoints data
    kp2d = batch['kp2d'].to(device).float()

    # Input features
    if use_features and 'features' in batch.keys():
        features = batch['features'].to(device).float()
    else:
        features = None

    # Initial SMPL parameters
    init_smpl = batch['init_pose'].to(device).float()

    # Initial keypoints
    init_kp = torch.cat((
        batch['init_kp3d'], batch['init_kp2d']
    ), dim=-1).to(device).float()

    return kp2d, (init_kp, init_smpl), features


def prepare_batch(batch, device, use_features=True):
    x, inits, features = prepare_input(batch, device, use_features)
    aux = prepare_auxiliary(batch, device)
    groundtruths = prepare_groundtruth(batch, device)

    return x, inits, features, aux, groundtruths
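A small illustrative sketch (the toy model, optimizer, and decay factor below are hypothetical, not part of this repo) of how AverageMeter and CustomScheduler from lib/utils/utils.py are typically wired into a training loop.

import torch
from lib.utils.utils import AverageMeter, CustomScheduler

model = torch.nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = CustomScheduler(opt, lr_lambda=lambda epoch: 0.95 ** epoch)  # per-epoch decay

loss_meter = AverageMeter()
for step in range(5):
    loss = model(torch.randn(4, 10)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    loss_meter.update(loss.item(), n=4)   # n = batch size
sched.step()                              # advances last_epoch, scales base LRs
print(f'avg loss: {loss_meter.avg:.4f}')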
lib/vis/__pycache__/renderer.cpython-39.pyc
ADDED
Binary file (9.31 kB)

lib/vis/__pycache__/run_vis.cpython-39.pyc
ADDED
Binary file (3.08 kB)

lib/vis/__pycache__/tools.cpython-39.pyc
ADDED
Binary file (15.9 kB)
lib/vis/renderer.py
ADDED
@@ -0,0 +1,313 @@
import cv2
import torch
import numpy as np

from pytorch3d.renderer import (
    PerspectiveCameras,
    TexturesVertex,
    PointLights,
    Materials,
    RasterizationSettings,
    MeshRenderer,
    MeshRasterizer,
    SoftPhongShader,
)
from pytorch3d.structures import Meshes
from pytorch3d.structures.meshes import join_meshes_as_scene
from pytorch3d.renderer.cameras import look_at_rotation

from .tools import get_colors, checkerboard_geometry


def overlay_image_onto_background(image, mask, bbox, background):
    if isinstance(image, torch.Tensor):
        image = image.detach().cpu().numpy()
    if isinstance(mask, torch.Tensor):
        mask = mask.detach().cpu().numpy()

    out_image = background.copy()
    bbox = bbox[0].int().cpu().numpy().copy()
    roi_image = out_image[bbox[1]:bbox[3], bbox[0]:bbox[2]]

    roi_image[mask] = image[mask]
    out_image[bbox[1]:bbox[3], bbox[0]:bbox[2]] = roi_image

    return out_image


def update_intrinsics_from_bbox(K_org, bbox):
    device, dtype = K_org.device, K_org.dtype

    K = torch.zeros((K_org.shape[0], 4, 4)).to(device=device, dtype=dtype)
    K[:, :3, :3] = K_org.clone()
    K[:, 2, 2] = 0
    K[:, 2, -1] = 1
    K[:, -1, 2] = 1

    image_sizes = []
    for idx, bbox in enumerate(bbox):
        left, upper, right, lower = bbox
        cx, cy = K[idx, 0, 2], K[idx, 1, 2]

        new_cx = cx - left
        new_cy = cy - upper
        new_height = max(lower - upper, 1)
        new_width = max(right - left, 1)
        new_cx = new_width - new_cx
        new_cy = new_height - new_cy

        K[idx, 0, 2] = new_cx
        K[idx, 1, 2] = new_cy
        image_sizes.append((int(new_height), int(new_width)))

    return K, image_sizes


def perspective_projection(x3d, K, R=None, T=None):
    if R is not None:
        x3d = torch.matmul(R, x3d.transpose(1, 2)).transpose(1, 2)
    if T is not None:
        x3d = x3d + T.transpose(1, 2)

    x2d = torch.div(x3d, x3d[..., 2:])
    x2d = torch.matmul(K, x2d.transpose(-1, -2)).transpose(-1, -2)[..., :2]
    return x2d


def compute_bbox_from_points(X, img_w, img_h, scaleFactor=1.2):
    left = torch.clamp(X.min(1)[0][:, 0], min=0, max=img_w)
    right = torch.clamp(X.max(1)[0][:, 0], min=0, max=img_w)
    top = torch.clamp(X.min(1)[0][:, 1], min=0, max=img_h)
    bottom = torch.clamp(X.max(1)[0][:, 1], min=0, max=img_h)

    cx = (left + right) / 2
    cy = (top + bottom) / 2
    width = (right - left)
    height = (bottom - top)

    new_left = torch.clamp(cx - width/2 * scaleFactor, min=0, max=img_w-1)
    new_right = torch.clamp(cx + width/2 * scaleFactor, min=1, max=img_w)
    new_top = torch.clamp(cy - height / 2 * scaleFactor, min=0, max=img_h-1)
    new_bottom = torch.clamp(cy + height / 2 * scaleFactor, min=1, max=img_h)

    bbox = torch.stack((new_left.detach(), new_top.detach(),
                        new_right.detach(), new_bottom.detach())).int().float().T

    return bbox


class Renderer():
    def __init__(self, width, height, focal_length, device, faces=None):

        self.width = width
        self.height = height
        self.focal_length = focal_length

        self.device = device
        if faces is not None:
            self.faces = torch.from_numpy(
                (faces).astype('int')
            ).unsqueeze(0).to(self.device)

        self.initialize_camera_params()
        self.lights = PointLights(device=device, location=[[0.0, 0.0, -10.0]])
        self.create_renderer()

    def create_renderer(self):
        self.renderer = MeshRenderer(
            rasterizer=MeshRasterizer(
                raster_settings=RasterizationSettings(
                    image_size=self.image_sizes[0],
                    blur_radius=1e-5),
            ),
            shader=SoftPhongShader(
                device=self.device,
                lights=self.lights,
            )
        )

    def create_camera(self, R=None, T=None):
        if R is not None:
            self.R = R.clone().view(1, 3, 3).to(self.device)
        if T is not None:
            self.T = T.clone().view(1, 3).to(self.device)

        return PerspectiveCameras(
            device=self.device,
            R=self.R.mT,
            T=self.T,
            K=self.K_full,
            image_size=self.image_sizes,
            in_ndc=False)

    def initialize_camera_params(self):
        """Hard coding for camera parameters
        TODO: Do some soft coding"""

        # Extrinsics
        self.R = torch.diag(
            torch.tensor([1, 1, 1])
        ).float().to(self.device).unsqueeze(0)

        self.T = torch.tensor(
            [0, 0, 0]
        ).unsqueeze(0).float().to(self.device)

        # Intrinsics
        self.K = torch.tensor(
            [[self.focal_length, 0, self.width/2],
             [0, self.focal_length, self.height/2],
             [0, 0, 1]]
        ).unsqueeze(0).float().to(self.device)
        self.bboxes = torch.tensor([[0, 0, self.width, self.height]]).float()
        self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, self.bboxes)
        self.cameras = self.create_camera()

    def set_ground(self, length, center_x, center_z):
        device = self.device
        v, f, vc, fc = map(torch.from_numpy, checkerboard_geometry(length=length, c1=center_x, c2=center_z, up="y"))
        v, f, vc = v.to(device), f.to(device), vc.to(device)
        self.ground_geometry = [v, f, vc]

    def update_bbox(self, x3d, scale=2.0, mask=None):
        """ Update bbox of cameras from the given 3d points

        x3d: input 3D keypoints (or vertices), (num_frames, num_points, 3)
        """

        if x3d.size(-1) != 3:
            x2d = x3d.unsqueeze(0)
        else:
            x2d = perspective_projection(x3d.unsqueeze(0), self.K, self.R, self.T.reshape(1, 3, 1))

        if mask is not None:
            x2d = x2d[:, ~mask]

        bbox = compute_bbox_from_points(x2d, self.width, self.height, scale)
        self.bboxes = bbox

        self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, bbox)
        self.cameras = self.create_camera()
        self.create_renderer()

    def reset_bbox(self):
        bbox = torch.zeros((1, 4)).float().to(self.device)
        bbox[0, 2] = self.width
        bbox[0, 3] = self.height
        self.bboxes = bbox

        self.K_full, self.image_sizes = update_intrinsics_from_bbox(self.K, bbox)
        self.cameras = self.create_camera()
        self.create_renderer()

    def render_mesh(self, vertices, background, colors=[0.8, 0.8, 0.8]):
        self.update_bbox(vertices[::50], scale=1.2)
        vertices = vertices.unsqueeze(0)

        if colors[0] > 1: colors = [c / 255. for c in colors]
        verts_features = torch.tensor(colors).reshape(1, 1, 3).to(device=vertices.device, dtype=vertices.dtype)
        verts_features = verts_features.repeat(1, vertices.shape[1], 1)
        textures = TexturesVertex(verts_features=verts_features)

        mesh = Meshes(verts=vertices,
                      faces=self.faces,
                      textures=textures,)

        materials = Materials(
            device=self.device,
            specular_color=(colors, ),
            shininess=0
        )

        results = torch.flip(
            self.renderer(mesh, materials=materials, cameras=self.cameras, lights=self.lights),
            [1, 2]
        )
        image = results[0, ..., :3] * 255
        mask = results[0, ..., -1] > 1e-3

        image = overlay_image_onto_background(image, mask, self.bboxes, background.copy())
        self.reset_bbox()
        return image

    def render_with_ground(self, verts, faces, colors, cameras, lights):
        """
        :param verts (B, V, 3)
        :param faces (F, 3)
        :param colors (B, 3)
        """

        # (B, V, 3), (B, F, 3), (B, V, 3)
        verts, faces, colors = prep_shared_geometry(verts, faces, colors)
        # (V, 3), (F, 3), (V, 3)
        gv, gf, gc = self.ground_geometry
        verts = list(torch.unbind(verts, dim=0)) + [gv]
        faces = list(torch.unbind(faces, dim=0)) + [gf]
        colors = list(torch.unbind(colors, dim=0)) + [gc[..., :3]]
        mesh = create_meshes(verts, faces, colors)

        materials = Materials(
            device=self.device,
            shininess=0
        )

        results = self.renderer(mesh, cameras=cameras, lights=lights, materials=materials)
        image = (results[0, ..., :3].cpu().numpy() * 255).astype(np.uint8)

        return image


def prep_shared_geometry(verts, faces, colors):
    """
    :param verts (B, V, 3)
    :param faces (F, 3)
    :param colors (B, 4)
    """
    B, V, _ = verts.shape
    F, _ = faces.shape
    colors = colors.unsqueeze(1).expand(B, V, -1)[..., :3]
    faces = faces.unsqueeze(0).expand(B, F, -1)
    return verts, faces, colors


def create_meshes(verts, faces, colors):
    """
    :param verts (B, V, 3)
    :param faces (B, F, 3)
    :param colors (B, V, 3)
    """
    textures = TexturesVertex(verts_features=colors)
    meshes = Meshes(verts=verts, faces=faces, textures=textures)
    return join_meshes_as_scene(meshes)


def get_global_cameras(verts, device, distance=5, position=(-5.0, 5.0, 0.0)):
    positions = torch.tensor([position]).repeat(len(verts), 1)
    targets = verts.mean(1)

    directions = targets - positions
    directions = directions / torch.norm(directions, dim=-1).unsqueeze(-1) * distance
    positions = targets - directions

    rotation = look_at_rotation(positions, targets).mT
    translation = -(rotation @ positions.unsqueeze(-1)).squeeze(-1)

    lights = PointLights(device=device, location=[position])
    return rotation, translation, lights


def _get_global_cameras(verts, device, min_distance=3, chunk_size=100):

    # split into smaller chunks to visualize
    start_idxs = list(range(0, len(verts), chunk_size))
    end_idxs = [min(start_idx + chunk_size, len(verts)) for start_idx in start_idxs]

    Rs, Ts = [], []
    for start_idx, end_idx in zip(start_idxs, end_idxs):
        vert = verts[start_idx:end_idx].clone()
        import pdb; pdb.set_trace()
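A short sketch (illustration only; the image size, focal length, and point cloud below are made up) of the projection and bbox-cropping helpers defined above, showing the pinhole intrinsics convention that this Renderer relies on.

import torch
from lib.vis.renderer import (
    perspective_projection, compute_bbox_from_points, update_intrinsics_from_bbox,
)

W, H, f = 640, 480, 800.0
K = torch.tensor([[[f, 0, W / 2], [0, f, H / 2], [0, 0, 1]]]).float()   # (1, 3, 3)
x3d = torch.randn(1, 100, 3) * 0.3 + torch.tensor([0.0, 0.0, 3.0])      # points ~3 m ahead

x2d = perspective_projection(x3d, K)                           # (1, 100, 2) pixel coords
bbox = compute_bbox_from_points(x2d, W, H, scaleFactor=1.2)    # (1, 4): left, top, right, bottom
K_crop, image_sizes = update_intrinsics_from_bbox(K, bbox)     # 4x4 intrinsics per crop
print(bbox, image_sizes)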
lib/vis/run_vis.py
ADDED
@@ -0,0 +1,92 @@
import os
import os.path as osp

import cv2
import torch
import imageio
import numpy as np
from progress.bar import Bar

from lib.vis.renderer import Renderer, get_global_cameras


def run_vis_on_demo(cfg, video, results, output_pth, smpl, vis_global=True):
    # to torch tensor
    tt = lambda x: torch.from_numpy(x).float().to(cfg.DEVICE)

    cap = cv2.VideoCapture(video)
    fps = cap.get(cv2.CAP_PROP_FPS)
    length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    width, height = cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

    # create renderer with cliff focal length estimation
    focal_length = (width ** 2 + height ** 2) ** 0.5
    renderer = Renderer(width, height, focal_length, cfg.DEVICE, smpl.faces)

    if vis_global:
        # setup global coordinate subject
        # current implementation only visualize the subject appeared longest
        n_frames = {k: len(results[k]['frame_ids']) for k in results.keys()}
        sid = max(n_frames, key=n_frames.get)
        global_output = smpl.get_output(
            body_pose=tt(results[sid]['pose_world'][:, 3:]),
            global_orient=tt(results[sid]['pose_world'][:, :3]),
            betas=tt(results[sid]['betas']),
            transl=tt(results[sid]['trans_world']))
        verts_glob = global_output.vertices.cpu()
        verts_glob[..., 1] = verts_glob[..., 1] - verts_glob[..., 1].min()
        cx, cz = (verts_glob.mean(1).max(0)[0] + verts_glob.mean(1).min(0)[0])[[0, 2]] / 2.0
        sx, sz = (verts_glob.mean(1).max(0)[0] - verts_glob.mean(1).min(0)[0])[[0, 2]]
        scale = max(sx.item(), sz.item()) * 1.5

        # set default ground
        renderer.set_ground(scale, cx.item(), cz.item())

        # build global camera
        global_R, global_T, global_lights = get_global_cameras(verts_glob, cfg.DEVICE)

    # build default camera
    default_R, default_T = torch.eye(3), torch.zeros(3)

    writer = imageio.get_writer(
        osp.join(output_pth, 'output.mp4'),
        fps=fps, mode='I', format='FFMPEG', macro_block_size=1
    )
    bar = Bar('Rendering results ...', fill='#', max=length)

    frame_i = 0
    _global_R, _global_T = None, None
    # run rendering
    while (cap.isOpened()):
        flag, org_img = cap.read()
        if not flag: break
        img = org_img[..., ::-1].copy()

        # render onto the input video
        renderer.create_camera(default_R, default_T)
        for _id, val in results.items():
            # render onto the image
            frame_i2 = np.where(val['frame_ids'] == frame_i)[0]
            if len(frame_i2) == 0: continue
            frame_i2 = frame_i2[0]
            img = renderer.render_mesh(torch.from_numpy(val['verts'][frame_i2]).to(cfg.DEVICE), img)

        if vis_global:
            # render the global coordinate
            if frame_i in results[sid]['frame_ids']:
                frame_i3 = np.where(results[sid]['frame_ids'] == frame_i)[0]
                verts = verts_glob[[frame_i3]].to(cfg.DEVICE)
                faces = renderer.faces.clone().squeeze(0)
                colors = torch.ones((1, 4)).float().to(cfg.DEVICE); colors[..., :3] *= 0.9

                if _global_R is None:
                    _global_R = global_R[frame_i3].clone(); _global_T = global_T[frame_i3].clone()
                cameras = renderer.create_camera(global_R[frame_i3], global_T[frame_i3])
                img_glob = renderer.render_with_ground(verts, faces, colors, cameras, global_lights)

            try: img = np.concatenate((img, img_glob), axis=1)
            except: img = np.concatenate((img, np.ones_like(img) * 255), axis=1)

        writer.append_data(img)
        bar.next()
        frame_i += 1
    writer.close()
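For reference, a minimal sketch of the focal-length heuristic used by run_vis_on_demo above: when the true camera intrinsics are unknown, the focal length is approximated by the image diagonal in pixels (the CLIFF-style estimate) before being passed to the Renderer. The helper name and example path below are hypothetical, not part of this repo.

import cv2

def estimate_focal_length(video_path: str) -> float:
    cap = cv2.VideoCapture(video_path)
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    cap.release()
    return (width ** 2 + height ** 2) ** 0.5  # image diagonal in pixels

# e.g. focal = estimate_focal_length('examples/demo.mp4')  # hypothetical path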
lib/vis/tools.py
ADDED
@@ -0,0 +1,822 @@
1 |
+
import os
|
2 |
+
import cv2
|
3 |
+
import numpy as np
|
4 |
+
import torch
|
5 |
+
from PIL import Image
|
6 |
+
|
7 |
+
|
8 |
+
def read_image(path, scale=1):
|
9 |
+
im = Image.open(path)
|
10 |
+
if scale == 1:
|
11 |
+
return np.array(im)
|
12 |
+
W, H = im.size
|
13 |
+
w, h = int(scale * W), int(scale * H)
|
14 |
+
return np.array(im.resize((w, h), Image.ANTIALIAS))
|
15 |
+
|
16 |
+
|
17 |
+
def transform_torch3d(T_c2w):
|
18 |
+
"""
|
19 |
+
:param T_c2w (*, 4, 4)
|
20 |
+
returns (*, 3, 3), (*, 3)
|
21 |
+
"""
|
22 |
+
R1 = torch.tensor(
|
23 |
+
[[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0],], device=T_c2w.device,
|
24 |
+
)
|
25 |
+
R2 = torch.tensor(
|
26 |
+
[[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0],], device=T_c2w.device,
|
27 |
+
)
|
28 |
+
cam_R, cam_t = T_c2w[..., :3, :3], T_c2w[..., :3, 3]
|
29 |
+
cam_R = torch.einsum("...ij,jk->...ik", cam_R, R1)
|
30 |
+
cam_t = torch.einsum("ij,...j->...i", R2, cam_t)
|
31 |
+
return cam_R, cam_t
|
32 |
+
|
33 |
+
|
34 |
+
def transform_pyrender(T_c2w):
|
35 |
+
"""
|
36 |
+
:param T_c2w (*, 4, 4)
|
37 |
+
"""
|
38 |
+
T_vis = torch.tensor(
|
39 |
+
[
|
40 |
+
[1.0, 0.0, 0.0, 0.0],
|
41 |
+
[0.0, -1.0, 0.0, 0.0],
|
42 |
+
[0.0, 0.0, -1.0, 0.0],
|
43 |
+
[0.0, 0.0, 0.0, 1.0],
|
44 |
+
],
|
45 |
+
device=T_c2w.device,
|
46 |
+
)
|
47 |
+
return torch.einsum(
|
48 |
+
"...ij,jk->...ik", torch.einsum("ij,...jk->...ik", T_vis, T_c2w), T_vis
|
49 |
+
)
|
50 |
+
|
51 |
+
|


def smpl_to_geometry(verts, faces, vis_mask=None, track_ids=None):
    """
    :param verts (B, T, V, 3)
    :param faces (F, 3)
    :param vis_mask (optional) (B, T) visibility of each person
    :param track_ids (optional) (B,)
    returns list of T verts (B, V, 3), faces (F, 3), colors (B, 3)
    where B is different depending on the visibility of the people
    """
    B, T = verts.shape[:2]
    device = verts.device

    # (B, 3)
    colors = (
        track_to_colors(track_ids)
        if track_ids is not None
        else torch.ones(B, 3, device=device) * 0.5
    )

    # list T (B, V, 3), T (B, 3), T (F, 3)
    return filter_visible_meshes(verts, colors, faces, vis_mask)


def filter_visible_meshes(verts, colors, faces, vis_mask=None, vis_opacity=False):
    """
    :param verts (B, T, V, 3)
    :param colors (B, 3)
    :param faces (F, 3)
    :param vis_mask (optional tensor, default None) (B, T) ternary mask
        -1 if not in frame
        0 if temporarily occluded
        1 if visible
    :param vis_opacity (optional bool, default False)
        if True, make occluded people alpha=0.5, otherwise alpha=1
    returns a list of T lists verts (Bi, V, 3), colors (Bi, 4), faces (F, 3)
    """
    B, T = verts.shape[:2]
    faces = [faces for t in range(T)]
    if vis_mask is None:
        verts = [verts[:, t] for t in range(T)]
        colors = [colors for t in range(T)]
        return verts, colors, faces

    # render occluded and visible, but not removed
    vis_mask = vis_mask >= 0
    if vis_opacity:
        alpha = 0.5 * (vis_mask[..., None] + 1)
    else:
        # vis_mask is already boolean here, so this gives alpha=1 everywhere
        alpha = (vis_mask[..., None] >= 0).float()
    vert_list = [verts[vis_mask[:, t], t] for t in range(T)]
    colors = [
        torch.cat([colors[vis_mask[:, t]], alpha[vis_mask[:, t], t]], dim=-1)
        for t in range(T)
    ]
    bounds = get_bboxes(verts, vis_mask)
    return vert_list, colors, faces, bounds
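

# Illustrative shape check for the two functions above; the SMPL mesh sizes
# (6890 vertices, 13776 faces) are only example values here.
def _demo_smpl_to_geometry():
    B, T, V, F = 2, 4, 6890, 13776
    verts = torch.zeros(B, T, V, 3)
    faces = torch.zeros(F, 3, dtype=torch.long)
    vis_mask = torch.ones(B, T)  # every person visible in every frame
    vert_list, color_list, face_list, bounds = smpl_to_geometry(verts, faces, vis_mask)
    # one entry per frame: verts (Bt, V, 3), colors (Bt, 4), faces (F, 3)
    return len(vert_list), color_list[0].shape, face_list[0].shape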


def get_bboxes(verts, vis_mask):
    """
    return bb_min, bb_max, and mean for each track (B, 3) over entire trajectory
    :param verts (B, T, V, 3)
    :param vis_mask (B, T)
    """
    B, T, *_ = verts.shape
    bb_min, bb_max, mean = [], [], []
    for b in range(B):
        v = verts[b, vis_mask[b, :T]]  # (Tb, V, 3)
        bb_min.append(v.amin(dim=(0, 1)))
        bb_max.append(v.amax(dim=(0, 1)))
        mean.append(v.mean(dim=(0, 1)))
    bb_min = torch.stack(bb_min, dim=0)
    bb_max = torch.stack(bb_max, dim=0)
    mean = torch.stack(mean, dim=0)
    # point to a track that's long and close to the camera
    zs = mean[:, 2]
    counts = vis_mask[:, :T].sum(dim=-1)  # (B,)
    mask = counts < 0.8 * T
    zs[mask] = torch.inf
    sel = torch.argmin(zs)
    return bb_min.amin(dim=0), bb_max.amax(dim=0), mean[sel]


def track_to_colors(track_ids):
    """
    :param track_ids (B)
    """
    color_map = torch.from_numpy(get_colors()).to(track_ids)
    return color_map[track_ids] / 255  # (B, 3)


def get_colors():
    # color_file = os.path.abspath(os.path.join(__file__, "../colors_phalp.txt"))
    color_file = os.path.abspath(os.path.join(__file__, "../colors.txt"))
    RGB_tuples = np.vstack(
        [
            np.loadtxt(color_file, skiprows=0),
            # np.loadtxt(color_file, skiprows=1),
            np.random.uniform(0, 255, size=(10000, 3)),
            [[0, 0, 0]],
        ]
    )
    b = np.where(RGB_tuples == 0)
    RGB_tuples[b] = 1
    return RGB_tuples.astype(np.float32)


def checkerboard_geometry(
    length=12.0,
    color0=[0.8, 0.9, 0.9],
    color1=[0.6, 0.7, 0.7],
    tile_width=0.5,
    alpha=1.0,
    up="y",
    c1=0.0,
    c2=0.0,
):
    assert up == "y" or up == "z"
    color0 = np.array(color0 + [alpha])
    color1 = np.array(color1 + [alpha])
    radius = length / 2.0
    num_rows = num_cols = max(2, int(length / tile_width))
    vertices = []
    vert_colors = []
    faces = []
    face_colors = []
    for i in range(num_rows):
        for j in range(num_cols):
            u0, v0 = j * tile_width - radius, i * tile_width - radius
            us = np.array([u0, u0, u0 + tile_width, u0 + tile_width])
            vs = np.array([v0, v0 + tile_width, v0 + tile_width, v0])
            zs = np.zeros(4)
            if up == "y":
                cur_verts = np.stack([us, zs, vs], axis=-1)  # (4, 3)
                cur_verts[:, 0] += c1
                cur_verts[:, 2] += c2
            else:
                cur_verts = np.stack([us, vs, zs], axis=-1)  # (4, 3)
                cur_verts[:, 0] += c1
                cur_verts[:, 1] += c2

            cur_faces = np.array(
                [[0, 1, 3], [1, 2, 3], [0, 3, 1], [1, 3, 2]], dtype=np.int64
            )
            cur_faces += 4 * (i * num_cols + j)  # the number of previously added verts
            use_color0 = (i % 2 == 0 and j % 2 == 0) or (i % 2 == 1 and j % 2 == 1)
            cur_color = color0 if use_color0 else color1
            cur_colors = np.array([cur_color, cur_color, cur_color, cur_color])

            vertices.append(cur_verts)
            faces.append(cur_faces)
            vert_colors.append(cur_colors)
            face_colors.append(cur_colors)

    vertices = np.concatenate(vertices, axis=0).astype(np.float32)
    vert_colors = np.concatenate(vert_colors, axis=0).astype(np.float32)
    faces = np.concatenate(faces, axis=0).astype(np.float32)
    face_colors = np.concatenate(face_colors, axis=0).astype(np.float32)

    return vertices, faces, vert_colors, face_colors
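

# Illustrative sketch: a 4 m ground plane with 0.25 m tiles is a 16x16 grid,
# i.e. 16*16*4 = 1024 vertices and 1024 (double-sided) triangles. The numbers
# are arbitrary example values.
def _demo_checkerboard():
    verts, faces, vert_colors, face_colors = checkerboard_geometry(
        length=4.0, tile_width=0.25, up="z"
    )
    return verts.shape, faces.shape  # (1024, 3), (1024, 3)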


def camera_marker_geometry(radius, height, up):
    assert up == "y" or up == "z"
    if up == "y":
        vertices = np.array(
            [
                [-radius, -radius, 0],
                [radius, -radius, 0],
                [radius, radius, 0],
                [-radius, radius, 0],
                [0, 0, height],
            ]
        )
    else:
        vertices = np.array(
            [
                [-radius, 0, -radius],
                [radius, 0, -radius],
                [radius, 0, radius],
                [-radius, 0, radius],
                [0, -height, 0],
            ]
        )

    faces = np.array(
        [[0, 3, 1], [1, 3, 2], [0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]]
    )

    face_colors = np.array(
        [
            [1.0, 1.0, 1.0, 1.0],
            [1.0, 1.0, 1.0, 1.0],
            [0.0, 1.0, 0.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
            [0.0, 1.0, 0.0, 1.0],
            [1.0, 0.0, 0.0, 1.0],
        ]
    )
    return vertices, faces, face_colors
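

# Minimal sketch: the marker is a square pyramid with 5 vertices and 6 triangles
# regardless of size; radius and height here are arbitrary example values.
def _demo_camera_marker():
    verts, faces, face_colors = camera_marker_geometry(radius=0.1, height=0.2, up="y")
    return verts.shape, faces.shape, face_colors.shape  # (5, 3), (6, 3), (6, 4)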


def vis_keypoints(
    keypts_list,
    img_size,
    radius=6,
    thickness=3,
    kpt_score_thr=0.3,
    dataset="TopDownCocoDataset",
):
    """
    Visualize keypoints
    From ViTPose/mmpose/apis/inference.py
    """
    palette = np.array(
        [
            [255, 128, 0], [255, 153, 51], [255, 178, 102], [230, 230, 0],
            [255, 153, 255], [153, 204, 255], [255, 102, 255], [255, 51, 255],
            [102, 178, 255], [51, 153, 255], [255, 153, 153], [255, 102, 102],
            [255, 51, 51], [153, 255, 153], [102, 255, 102], [51, 255, 51],
            [0, 255, 0], [0, 0, 255], [255, 0, 0], [255, 255, 255],
        ]
    )

    if dataset in (
        "TopDownCocoDataset",
        "BottomUpCocoDataset",
        "TopDownOCHumanDataset",
        "AnimalMacaqueDataset",
    ):
        # show the results
        skeleton = [
            [15, 13], [13, 11], [16, 14], [14, 12], [11, 12], [5, 11], [6, 12],
            [5, 6], [5, 7], [6, 8], [7, 9], [8, 10], [1, 2], [0, 1], [0, 2],
            [1, 3], [2, 4], [3, 5], [4, 6],
        ]

        pose_link_color = palette[
            [0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]
        ]
        pose_kpt_color = palette[
            [16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]
        ]

    elif dataset == "TopDownCocoWholeBodyDataset":
        # show the results
        skeleton = [
            [15, 13], [13, 11], [16, 14], [14, 12], [11, 12], [5, 11], [6, 12],
            [5, 6], [5, 7], [6, 8], [7, 9], [8, 10], [1, 2], [0, 1], [0, 2],
            [1, 3], [2, 4], [3, 5], [4, 6], [15, 17], [15, 18], [15, 19],
            [16, 20], [16, 21], [16, 22], [91, 92], [92, 93], [93, 94], [94, 95],
            [91, 96], [96, 97], [97, 98], [98, 99], [91, 100], [100, 101],
            [101, 102], [102, 103], [91, 104], [104, 105], [105, 106], [106, 107],
            [91, 108], [108, 109], [109, 110], [110, 111], [112, 113], [113, 114],
            [114, 115], [115, 116], [112, 117], [117, 118], [118, 119], [119, 120],
            [112, 121], [121, 122], [122, 123], [123, 124], [112, 125], [125, 126],
            [126, 127], [127, 128], [112, 129], [129, 130], [130, 131], [131, 132],
        ]

        pose_link_color = palette[
            [0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]
            + [16, 16, 16, 16, 16, 16]
            + [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
            + [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
        ]
        pose_kpt_color = palette[
            [16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]
            + [0, 0, 0, 0, 0, 0]
            + [19] * (68 + 42)
        ]

    elif dataset == "TopDownAicDataset":
        skeleton = [
            [2, 1], [1, 0], [0, 13], [13, 3], [3, 4], [4, 5], [8, 7], [7, 6],
            [6, 9], [9, 10], [10, 11], [12, 13], [0, 6], [3, 9],
        ]

        pose_link_color = palette[[9, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 0, 7, 7]]
        pose_kpt_color = palette[[9, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 0, 0]]

    elif dataset == "TopDownMpiiDataset":
        skeleton = [
            [0, 1], [1, 2], [2, 6], [6, 3], [3, 4], [4, 5], [6, 7], [7, 8],
            [8, 9], [8, 12], [12, 11], [11, 10], [8, 13], [13, 14], [14, 15],
        ]

        pose_link_color = palette[[16, 16, 16, 16, 16, 16, 7, 7, 0, 9, 9, 9, 9, 9, 9]]
        pose_kpt_color = palette[[16, 16, 16, 16, 16, 16, 7, 7, 0, 0, 9, 9, 9, 9, 9, 9]]

    elif dataset == "TopDownMpiiTrbDataset":
        skeleton = [
            [12, 13], [13, 0], [13, 1], [0, 2], [1, 3], [2, 4], [3, 5], [0, 6],
            [1, 7], [6, 7], [6, 8], [7, 9], [8, 10], [9, 11], [14, 15], [16, 17],
            [18, 19], [20, 21], [22, 23], [24, 25], [26, 27], [28, 29], [30, 31],
            [32, 33], [34, 35], [36, 37], [38, 39],
        ]

        pose_link_color = palette[[16] * 14 + [19] * 13]
        pose_kpt_color = palette[[16] * 14 + [0] * 26]

    elif dataset in ("OneHand10KDataset", "FreiHandDataset", "PanopticDataset"):
        skeleton = [
            [0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8],
            [0, 9], [9, 10], [10, 11], [11, 12], [0, 13], [13, 14], [14, 15],
            [15, 16], [0, 17], [17, 18], [18, 19], [19, 20],
        ]

        pose_link_color = palette[
            [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
        ]
        pose_kpt_color = palette[
            [0, 0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16]
        ]

    elif dataset == "InterHand2DDataset":
        skeleton = [
            [0, 1], [1, 2], [2, 3], [4, 5], [5, 6], [6, 7], [8, 9], [9, 10],
            [10, 11], [12, 13], [13, 14], [14, 15], [16, 17], [17, 18], [18, 19],
            [3, 20], [7, 20], [11, 20], [15, 20], [19, 20],
        ]

        pose_link_color = palette[
            [0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12, 16, 16, 16, 0, 4, 8, 12, 16]
        ]
        pose_kpt_color = palette[
            [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16, 0]
        ]

    elif dataset == "Face300WDataset":
        # show the results
        skeleton = []

        pose_link_color = palette[[]]
        pose_kpt_color = palette[[19] * 68]
        kpt_score_thr = 0

    elif dataset == "FaceAFLWDataset":
        # show the results
        skeleton = []

        pose_link_color = palette[[]]
        pose_kpt_color = palette[[19] * 19]
        kpt_score_thr = 0

    elif dataset == "FaceCOFWDataset":
        # show the results
        skeleton = []

        pose_link_color = palette[[]]
        pose_kpt_color = palette[[19] * 29]
        kpt_score_thr = 0

    elif dataset == "FaceWFLWDataset":
        # show the results
        skeleton = []

        pose_link_color = palette[[]]
        pose_kpt_color = palette[[19] * 98]
        kpt_score_thr = 0

    elif dataset == "AnimalHorse10Dataset":
        skeleton = [
            [0, 1], [1, 12], [12, 16], [16, 21], [21, 17], [17, 11], [11, 10],
            [10, 8], [8, 9], [9, 12], [2, 3], [3, 4], [5, 6], [6, 7], [13, 14],
            [14, 15], [18, 19], [19, 20],
        ]

        pose_link_color = palette[[4] * 10 + [6] * 2 + [6] * 2 + [7] * 2 + [7] * 2]
        pose_kpt_color = palette[
            [4, 4, 6, 6, 6, 6, 6, 6, 4, 4, 4, 4, 4, 7, 7, 7, 4, 4, 7, 7, 7, 4]
        ]

    elif dataset == "AnimalFlyDataset":
        skeleton = [
            [1, 0], [2, 0], [3, 0], [4, 3], [5, 4], [7, 6], [8, 7], [9, 8],
            [11, 10], [12, 11], [13, 12], [15, 14], [16, 15], [17, 16], [19, 18],
            [20, 19], [21, 20], [23, 22], [24, 23], [25, 24], [27, 26], [28, 27],
            [29, 28], [30, 3], [31, 3],
        ]

        pose_link_color = palette[[0] * 25]
        pose_kpt_color = palette[[0] * 32]

    elif dataset == "AnimalLocustDataset":
        skeleton = [
            [1, 0], [2, 1], [3, 2], [4, 3], [6, 5], [7, 6], [9, 8], [10, 9],
            [11, 10], [13, 12], [14, 13], [15, 14], [17, 16], [18, 17], [19, 18],
            [21, 20], [22, 21], [24, 23], [25, 24], [26, 25], [28, 27], [29, 28],
            [30, 29], [32, 31], [33, 32], [34, 33],
        ]

        pose_link_color = palette[[0] * 26]
        pose_kpt_color = palette[[0] * 35]

    elif dataset == "AnimalZebraDataset":
        skeleton = [[1, 0], [2, 1], [3, 2], [4, 2], [5, 7], [6, 7], [7, 2], [8, 7]]

        pose_link_color = palette[[0] * 8]
        pose_kpt_color = palette[[0] * 9]

    elif dataset == "AnimalPoseDataset":
        skeleton = [
            [0, 1], [0, 2], [1, 3], [0, 4], [1, 4], [4, 5], [5, 7], [6, 7],
            [5, 8], [8, 12], [12, 16], [5, 9], [9, 13], [13, 17], [6, 10],
            [10, 14], [14, 18], [6, 11], [11, 15], [15, 19],
        ]

        pose_link_color = palette[[0] * 20]
        pose_kpt_color = palette[[0] * 20]
    else:
        raise NotImplementedError(f"Unsupported dataset: {dataset}")

    img_w, img_h = img_size
    img = 255 * np.ones((img_h, img_w, 3), dtype=np.uint8)
    img = imshow_keypoints(
        img,
        keypts_list,
        skeleton,
        kpt_score_thr,
        pose_kpt_color,
        pose_link_color,
        radius,
        thickness,
    )
    # pixels left pure white are made transparent in the returned RGBA image
    alpha = 255 * (img != 255).any(axis=-1, keepdims=True).astype(np.uint8)
    return np.concatenate([img, alpha], axis=-1)
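

# Illustrative sketch: draw one set of keypoints on a transparent 640x480 canvas.
# imshow_keypoints() below remaps joints with an index list whose largest entry is
# 18, so the input needs at least 19 (x, y, score) rows; the random coordinates
# are placeholders for real detector output.
def _demo_vis_keypoints():
    kpts = np.concatenate(
        [np.random.uniform(50, 400, size=(19, 2)), np.ones((19, 1))], axis=-1
    )
    rgba = vis_keypoints([kpts], img_size=(640, 480))
    return rgba.shape  # (480, 640, 4)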


def imshow_keypoints(
    img,
    pose_result,
    skeleton=None,
    kpt_score_thr=0.3,
    pose_kpt_color=None,
    pose_link_color=None,
    radius=4,
    thickness=1,
    show_keypoint_weight=False,
):
    """Draw keypoints and links on an image.
    From ViTPose/mmpose/core/visualization/image.py

    Args:
        img (H, W, 3) array
        pose_result (list[kpts]): The poses to draw. Each element kpts is
            a set of K keypoints as a Kx3 numpy.ndarray, where each
            keypoint is represented as x, y, score.
        kpt_score_thr (float, optional): Minimum score of keypoints
            to be shown. Default: 0.3.
        pose_kpt_color (np.array[Nx3]): Color of N keypoints. If None,
            the keypoint will not be drawn.
        pose_link_color (np.array[Mx3]): Color of M links. If None, the
            links will not be drawn.
        thickness (int): Thickness of lines.
        show_keypoint_weight (bool): If True, opacity indicates keypoint score
    """
    img_h, img_w, _ = img.shape
    # reorder the detected joints into the COCO-17 order used by the skeleton
    idcs = [0, 16, 15, 18, 17, 5, 2, 6, 3, 7, 4, 12, 9, 13, 10, 14, 11]
    for kpts in pose_result:
        kpts = np.array(kpts, copy=False)[idcs]

        # draw each point on image
        if pose_kpt_color is not None:
            assert len(pose_kpt_color) == len(kpts)
            for kid, kpt in enumerate(kpts):
                x_coord, y_coord, kpt_score = int(kpt[0]), int(kpt[1]), kpt[2]
                if kpt_score > kpt_score_thr:
                    color = tuple(int(c) for c in pose_kpt_color[kid])
                    if show_keypoint_weight:
                        img_copy = img.copy()
                        cv2.circle(
                            img_copy, (int(x_coord), int(y_coord)), radius, color, -1
                        )
                        transparency = max(0, min(1, kpt_score))
                        cv2.addWeighted(
                            img_copy, transparency, img, 1 - transparency, 0, dst=img
                        )
                    else:
                        cv2.circle(img, (int(x_coord), int(y_coord)), radius, color, -1)

        # draw links
        if skeleton is not None and pose_link_color is not None:
            assert len(pose_link_color) == len(skeleton)
            for sk_id, sk in enumerate(skeleton):
                pos1 = (int(kpts[sk[0], 0]), int(kpts[sk[0], 1]))
                pos2 = (int(kpts[sk[1], 0]), int(kpts[sk[1], 1]))
                if (
                    pos1[0] > 0
                    and pos1[0] < img_w
                    and pos1[1] > 0
                    and pos1[1] < img_h
                    and pos2[0] > 0
                    and pos2[0] < img_w
                    and pos2[1] > 0
                    and pos2[1] < img_h
                    and kpts[sk[0], 2] > kpt_score_thr
                    and kpts[sk[1], 2] > kpt_score_thr
                ):
                    color = tuple(int(c) for c in pose_link_color[sk_id])
                    if show_keypoint_weight:
                        # draw the link as a filled ellipse whose opacity follows
                        # the mean score of its two endpoints
                        img_copy = img.copy()
                        X = (pos1[0], pos2[0])
                        Y = (pos1[1], pos2[1])
                        mX = np.mean(X)
                        mY = np.mean(Y)
                        length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5
                        angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1]))
                        stickwidth = 2
                        polygon = cv2.ellipse2Poly(
                            (int(mX), int(mY)),
                            (int(length / 2), int(stickwidth)),
                            int(angle),
                            0,
                            360,
                            1,
                        )
                        cv2.fillConvexPoly(img_copy, polygon, color)
                        transparency = max(
                            0, min(1, 0.5 * (kpts[sk[0], 2] + kpts[sk[1], 2]))
                        )
                        cv2.addWeighted(
                            img_copy, transparency, img, 1 - transparency, 0, dst=img
                        )
                    else:
                        cv2.line(img, pos1, pos2, color, thickness=thickness)

    return img

output/demo/test19/output.mp4
ADDED
Binary file (602 kB)
output/demo/test19/slam_results.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eb6e0b47809fe94bdc26bc99318f9eb9beccb005ec81c407887a5bd7223b5b81
size 2353
output/demo/test19/tracking_results.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d1b3d6e23597e07daaa1b124cba63a1ea91d3909fe2c903c3c9b2b2819ce140
size 333898
output/demo/test19/wham_output.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1484cca6cf2774c3c0cbefa3d47ed7f2dd04db96b05ebdd14e5a11610a415b3e
size 3167067