JubSteven/POEM-v2: (TPAMI 2025) Generalized Multi-view Hand Mesh Reconstruction
Generalizable Multi-view Hand Mesh Recovery
Multi-view Hand Reconstruction with a Point-Embedded Transformer
Lixin Yang · Licheng Zhong · Pengxiang Zhu · Xinyu Zhan · Junxiao Kong · Jian Xu · Cewu Lu
- Supports reconstruction of both left and right hands.
- Absolute metric output, robust to occlusion.
- Supports human-hand teleoperation.
What's POEM-v2?
POEM (POint-EMbed Multi-view Transformer) v2 is a generalizable multi-view hand mesh recovery model designed for seamless use in real-world hand MoCap & teleoperation.
What are POEM-v2's advantages?
**It is flexible:** it works with any number, order, or arrangement of cameras, as long as:
- the cameras share overlapping views,
- the hand is visible in at least some of the cameras,
- the camera extrinsics are calibrated.
**It is robust to occlusion:** it handles occlusion and partial visibility by leveraging the views in which the hand remains visible.
**It produces absolute hand positions:** it directly recovers hand-surface vertices in real-world (meter) units, expressed in the first camera's coordinate system.
**It supports both left and right hands:** although trained on right-hand data, it also handles left hands through a world-mirroring process: all images are flipped horizontally and the camera extrinsics are mirrored across the first camera's Y-Z plane.
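The world-mirroring step can be sketched in NumPy as below. This is our own derivation, not the repo's code, and the function names are hypothetical. Assumptions: T_cw is an (N, 4, 4) stack of world-to-camera extrinsics; a horizontal flip of an image of width W maps pixel column u to W - 1 - u (pixel centers at integer coordinates); and the world is reflected across the first camera's Y-Z plane. Composing the per-camera flip with the world reflection keeps every rotation proper, so the mirrored extrinsics stay in SE(3), and the first camera's extrinsic is unchanged.

```python
import numpy as np

# Reflection x -> -x as a 4x4 homogeneous matrix.
D = np.diag([-1.0, 1.0, 1.0, 1.0])


def mirror_extrinsics(T_cw: np.ndarray) -> np.ndarray:
    """Mirror (N, 4, 4) world-to-camera extrinsics (hypothetical helper).

    Left factor D: a horizontal image flip negates x in every camera frame.
    Right factor inv(T_0) @ D @ T_0: reflection of the world across the
    first camera's Y-Z plane. The two reflections cancel in the determinant,
    so each mirrored rotation still has det = +1.
    """
    T0 = T_cw[0]
    world_reflection = np.linalg.inv(T0) @ D @ T0
    return D @ T_cw @ world_reflection  # broadcasts over the N cameras


def mirror_intrinsics(K: np.ndarray, width: int) -> np.ndarray:
    """Update (N, 3, 3) intrinsics for horizontally flipped images.

    Assumes the u -> width - 1 - u convention; the skew term changes sign.
    """
    K_m = K.copy()
    K_m[:, 0, 2] = width - 1 - K[:, 0, 2]
    K_m[:, 0, 1] = -K[:, 0, 1]
    return K_m


def flip_images(images: np.ndarray) -> np.ndarray:
    """Flip an (N, H, W, C) image stack along the width axis."""
    return images[:, :, ::-1]


def unmirror_points(P_c0: np.ndarray) -> np.ndarray:
    """Map points predicted in the mirrored first-camera frame back (negate x)."""
    return P_c0 * np.array([-1.0, 1.0, 1.0])
```

After running the model on the mirrored inputs, unmirror_points maps the predicted first-camera-frame vertices back to the original (left-hand) frame. Because a reflection reverses triangle orientation, the mesh faces may also need their winding reversed (for example faces[:, ::-1]) before rendering.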
🕹️ Try me
We provide a real-world demonstration for running our model.
Download the example data from Hugging Face. The tarball contains multi-view videos of manipulation captured in a laboratory setting, along with the corresponding camera intrinsics and extrinsics, hand poses, and hand-side (left/right) information.
In tool/infer_hand.py, replace the path prefix (/prefix/data/) with the full path of the directory where the data was extracted:
DATA_FILEDIR = "/prefix/data/data"  # Modify /prefix/data to where the example data is extracted
MASK_FILEDIR = "/prefix/data/human_mask_hand"
CALIB_FILEDIR = "/prefix/data/calib/calib__2025_0319_1534_41"
HAND_SIDE_FILEPATH = "/prefix/data/hand_labels.json"
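If you prefer to set the prefix once, the four paths can be derived from a single extraction root and checked before launching the demo. This is a hypothetical convenience sketch: demo_paths and missing_paths are not part of the repo, and the relative layout is simply copied from the variables above.

```python
from pathlib import Path


def demo_paths(root: str) -> dict:
    """Hypothetical helper: derive the four demo paths from one extraction root."""
    base = Path(root)
    return {
        "DATA_FILEDIR": base / "data",
        "MASK_FILEDIR": base / "human_mask_hand",
        "CALIB_FILEDIR": base / "calib" / "calib__2025_0319_1534_41",
        "HAND_SIDE_FILEPATH": base / "hand_labels.json",
    }


def missing_paths(paths: dict) -> list:
    """Return the names of entries that do not exist on disk."""
    return [name for name, p in paths.items() if not p.exists()]
```

For example, missing_paths(demo_paths("/path/to/extracted")) returns an empty list once the tarball has been unpacked in the expected layout.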
Run the visualization command (install our environment first):
python -m tool.infer_hand -c config/release/eval_single.yaml --reload ./checkpoints/medium.pth.tar -g 0
As a multi-view method, POEM-v2 relies on accurate camera extrinsic matrices to make its predictions. In tool/infer_hand.py, we require the N extrinsic matrices $T_{cw}$ in SE(3) form:
$$
\mathbf{P}_c = T_{cw} \cdot \mathbf{P}_w
$$
where $c$ denotes the camera coordinate system, $w$ denotes the world coordinate system, and $\mathbf{P}_c$ and $\mathbf{P}_w$ are 3D points expressed in the camera and world coordinate systems, respectively.
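To make the convention concrete, the NumPy sketch below packs a rotation and translation into a 4x4 $T_{cw}$, applies it to world points, and inverts it in closed form. If your calibration tool exports camera-to-world poses $T_{wc}$ instead, invert them before passing them in. make_T, invert_T, and world_to_camera are hypothetical helpers, not functions from the repo.

```python
import numpy as np


def make_T(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def invert_T(T: np.ndarray) -> np.ndarray:
    """Closed-form SE(3) inverse: (R, t) -> (R^T, -R^T t), e.g. T_wc -> T_cw."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)


def world_to_camera(T_cw: np.ndarray, P_w: np.ndarray) -> np.ndarray:
    """Apply P_c = T_cw · P_w to an (M, 3) array of world points."""
    P_h = np.concatenate([P_w, np.ones((len(P_w), 1))], axis=1)  # homogeneous coords
    return (P_h @ T_cw.T)[:, :3]
```

The closed-form inverse avoids a general 4x4 matrix inversion and keeps the result exactly in SE(3) up to floating-point error in R.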
📓 Instructions
- See docs/installation.md to setup the environment and install all the required packages.
- See docs/datasets.md to download all the datasets and additional assets required.
🏃 Training and Evaluation
Available models
We provide four models with different configurations for training and evaluation. We have evaluated the models on multiple datasets.
- Set ${MODEL} to one of [small, medium, medium_MANO, large].
- Set ${DATASET} to one of [HO3D, DexYCB, Arctic, Interhand, Oakink, Freihand].
Download the pretrained checkpoints from 🔗ckpt_release and move the contents to ./checkpoints.
Command line arguments
- -g, --gpu_id: visible GPUs for training, e.g. -g 0,1,2,3. Evaluation only supports a single GPU.
- -w, --workers: num_workers for data loading, e.g. -w 4.
- -p, --dist_master_port: port for distributed training, e.g. -p 60011. Set a different -p for each concurrent training process.
- -b, --batch_size: e.g. -b 32. The default is specified in the config file but is overridden if -b is provided.
- --cfg: config file for this experiment, e.g. --cfg config/release/train_${MODEL}.yaml.
- --exp_id: name of the experiment, e.g. --exp_id ${EXP_ID}. When --exp_id is provided, the code requires that there are no uncommitted changes in the git repo. Otherwise, it defaults to 'default' for training and 'eval_{cfg}' for evaluation. All results will be saved in exp/${EXP_ID}_{timestamp}.
- --reload: path to the checkpoint (.pth.tar) to be loaded.
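The experiment-naming rules for --exp_id can be summarized in the sketch below. It is not the repo's code: the helper names, the use of git status --porcelain to detect uncommitted changes, reading {cfg} as the config file's stem, and the timestamp format are all assumptions made for illustration.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Optional


def git_porcelain(repo: str = ".") -> str:
    """Run `git status --porcelain` in `repo`; empty output means a clean tree."""
    return subprocess.run(
        ["git", "status", "--porcelain"], cwd=repo,
        capture_output=True, text=True, check=True,
    ).stdout


def resolve_exp_dir(exp_id: Optional[str], training: bool, cfg_path: str,
                    porcelain_output: str, now: datetime) -> Path:
    """Hypothetical re-statement of the documented exp/${EXP_ID}_{timestamp} rules."""
    if exp_id is not None:
        # An explicit experiment name requires no uncommitted changes.
        if porcelain_output.strip():
            raise RuntimeError("--exp_id requires a clean git working tree")
    else:
        # Assumption: {cfg} is the config file name without its extension.
        exp_id = "default" if training else f"eval_{Path(cfg_path).stem}"
    stamp = now.strftime("%Y_%m%d_%H%M_%S")  # assumed timestamp format
    return Path("exp") / f"{exp_id}_{stamp}"
```

Passing the porcelain output in as a string keeps the naming logic testable without a git checkout; in practice it would come from git_porcelain().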
Compare POEM-v2 vs Single-view methods on HO3D
To provide a holistic benchmark, we compare POEM-v2 with state-of-the-art single-view 3D hand reconstruction frameworks. Since the absolute position of the hand is ambiguous in a single-view setting, we only report MPJPE and MPVPE after Procrustes alignment (PA-MPJPE and PA-MPVPE).
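Procrustes alignment finds the similarity transform (scale, rotation, translation) that best maps a prediction onto the ground truth in the least-squares sense, removing global position, orientation, and scale before errors are measured. A minimal NumPy sketch using the standard SVD-based (Umeyama) solution follows; it is not the repo's evaluation script, and the function names are hypothetical. The same function gives PA-MPJPE on joints and PA-MPVPE on vertices.

```python
import numpy as np


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred (K, 3) to gt (K, 3) with the least-squares similarity transform."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance X^T Y yields the optimal rotation (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(X.T @ Y)
    # Reflection guard: force a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])
    R_t = U @ D @ Vt  # transpose of the optimal rotation, for row-vector points
    scale = np.sum(S * np.diag(D)) / np.sum(X ** 2)
    return scale * X @ R_t + mu_g


def pa_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-point Euclidean error after alignment (same units as the input)."""
    aligned = procrustes_align(pred, gt)
    return float(np.linalg.norm(aligned - gt, axis=-1).mean())
```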
We perform this comparison on the official HO3D v2 and v3 test sets. The test-set ground truth can now be downloaded from the official HO3D repo (update: Nov 3rd, 2024). The expected layout is:
├── HO3D_v2
├── HO3D_v2_official_gt
│   ├── evaluation_verts.json
│   └── evaluation_xyz.json
├── HO3D_v3
└── HO3D_v3_manual_test_gt
    ├── evaluation_verts.json
    └── evaluation_xyz.json
Then run the following command to get the results, where ${HO3D_VERSION} can be set to 2 or 3:
$ python scripts/eval_ho3d_official.py --ho3d-v ${HO3D_VERSION} \
    --cfg config/release/eval_single.yaml \
    --model large \
    --reload ${PATH_TO_POEM_LARGE_CKPT} \
    --eval_extra ho3d_offi
You should then obtain the results reported in the paper.
Evaluation
Set ${PATH_TO_CKPT} to ./checkpoints/${MODEL}.pth.tar, then run the following command. Note that we essentially modify the config file in place to suit the different settings given on the command line. --view_min and --view_max specify the range of views fed into the model. Use the --draw option to render the results; note that it is incompatible with computing the AUC metric.
$ python scripts/eval_single.py --cfg config/release/eval_single.yaml -g ${gpu_id} --reload ${PATH_TO_CKPT} --dataset ${DATASET} --view_min ${MIN_VIEW} --view_max ${MAX_VIEW} --model ${MODEL}
The evaluation results will be saved at exp/${EXP_ID}_{timestamp}/evaluations.
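For reference, the AUC metric mentioned above is usually the area under a PCK curve: the fraction of points whose error falls below a threshold, integrated over a range of thresholds and normalized to [0, 1]. The sketch below is a generic implementation, not the repo's; the threshold range and number of steps are assumptions, and the function name is hypothetical.

```python
import numpy as np


def pck_auc(errors, max_threshold: float, num_steps: int = 100) -> float:
    """Normalized area under the PCK curve over thresholds in [0, max_threshold].

    errors: per-joint (or per-vertex) Euclidean errors of any shape, in the
    same unit as max_threshold.
    """
    errors = np.asarray(errors, dtype=float).ravel()
    thresholds = np.linspace(0.0, max_threshold, num_steps)
    # PCK(th): fraction of errors at or below the threshold.
    pck = np.array([np.mean(errors <= th) for th in thresholds])
    # Trapezoidal rule, divided by the range so a perfect curve scores 1.
    area = np.sum((pck[1:] + pck[:-1]) * np.diff(thresholds)) / 2.0
    return float(area / max_threshold)
```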
Training
We train on a mixture of multiple datasets packed with WebDataset. Execute the following command to train a specific model on the provided datasets.
$ python scripts/train_ddp_wds.py --cfg config/release/train_${MODEL}.yaml -g 0,1,2,3 -w 4
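The repo's mixing logic and per-dataset weights are defined by its training code and config, which are not reproduced here. As a framework-agnostic illustration (not the WebDataset API), weighted interleaving of several sample streams can be sketched as follows; mix_streams and its weights are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Sequence


def mix_streams(streams: Sequence[Iterable], weights: Sequence[float],
                seed: int = 0) -> Iterator:
    """Interleave samples from several datasets, choosing a source by weight.

    Runs until every stream is exhausted; exhausted sources are dropped and
    random.choices renormalizes the remaining weights implicitly.
    """
    rng = random.Random(seed)
    iters: List[Iterator] = [iter(s) for s in streams]
    w = list(weights)
    while iters:
        k = rng.choices(range(len(iters)), weights=w)[0]
        try:
            yield next(iters[k])
        except StopIteration:
            del iters[k], w[k]
```

Sampling the source per item, rather than concatenating datasets, keeps each batch a blend of datasets in roughly the chosen proportions while preserving the order within each source.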
TensorBoard
$ cd exp/${EXP_ID}_{timestamp}/runs/
$ tensorboard --logdir .
Checkpoint
All checkpoints saved during training are stored in exp/${EXP_ID}_{timestamp}/checkpoints/, where ../checkpoints/checkpoint records the most recent checkpoint.
License
This code and model are available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using the code and model you agree to the terms in the LICENSE.
Citation
@misc{yang2024multiviewhandreconstructionpointembedded,
  title={Multi-view Hand Reconstruction with a Point-Embedded Transformer},
  author={Lixin Yang and Licheng Zhong and Pengxiang Zhu and Xinyu Zhan and Junxiao Kong and Jian Xu and Cewu Lu},
  year={2024},
  eprint={2408.10581},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.10581},
}
For more questions, please contact Lixin Yang: siriusyang@sjtu.edu.cn



