GitHub - physical-superintelligence-lab/Psi0: [RSS26'] Welcome to Psi-Zero, a Humanoid VLA towards Universal Humanoid Intelligence.

[RSS26'] Ψ₀: An Open Foundation Model

Towards Universal Humanoid Loco-Manipulation

Psi0 teaser image


Contributors: Songlin Wei, Hongyi Jing, Boqian Li, Zhenyu Zhao, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Weiduo Yuan, Marco Pavone, Di Huang, Yue Wang


Ψ₀ is an open vision-language-action (VLA) model for dexterous humanoid loco-manipulation. Our model first learns task semantics and visual representations from large-scale human egocentric videos, and is then post-trained on a smaller amount of real-world teleoperated robot data to learn the general dynamics of the embodiment.

[Optional] Expand to know more about Ψ₀.

Our foundation model can acquire new long-horizon dexterous loco-manipulation skills by fine-tuning on as few as 80 trajectories. Our key finding is the importance of scaling the right data in the right way.

At the top level, the Ψ₀ model consists of two end-to-end trained components: a vision–language backbone (System-2) and a multimodal diffusion transformer action expert (System-1). The backbone is based on Qwen's Qwen3-VL-2B-Instruct and extracts vision–language features from observations and instructions. These features condition a flow-based multimodal diffusion transformer inspired by Stable Diffusion 3. The action expert (≈500M parameters) predicts future whole-body action chunks, enabling efficient fusion of visual, linguistic, and action representations. At the lowest level (System-0), an RL-based tracking controller executes the predicted lower-body action commands, ensuring stable and precise physical control.
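To make the idea of a flow-based action expert more concrete, here is a minimal sketch of how an action chunk can be sampled by integrating a velocity field from noise to actions with Euler steps. It is not the Ψ₀ implementation: the time convention (t = 0 is noise, t = 1 is the action chunk), the Euler solver, the toy dimensions, and the helper names `sample_action_chunk` and `make_toy_velocity` are all assumptions for illustration. In the real model the velocity would be predicted by the diffusion transformer conditioned on the backbone features; here a hand-written stand-in is used.

```python
import random

def sample_action_chunk(velocity_fn, horizon, action_dim, num_steps=10, seed=0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 with Euler steps."""
    rng = random.Random(seed)
    # Start from Gaussian noise shaped like an action chunk: horizon x action_dim.
    x = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [[xi + dt * vi for xi, vi in zip(row_x, row_v)]
             for row_x, row_v in zip(x, v)]
    return x

def make_toy_velocity(target):
    """Hypothetical stand-in for the learned action expert.

    It points straight from the current sample to a fixed target chunk,
    which is the exact velocity of a straight-line (rectified) flow.
    """
    def velocity_fn(x, t):
        return [[(ti - xi) / (1.0 - t) for xi, ti in zip(row_x, row_t)]
                for row_x, row_t in zip(x, target)]
    return velocity_fn

# Toy target: 3 future steps, 4 action dimensions (real chunks are much larger).
target = [[0.1 * step + 0.01 * joint for joint in range(4)] for step in range(3)]
chunk = sample_action_chunk(make_toy_velocity(target), horizon=3, action_dim=4)
```

With this oracle velocity the integration lands on the target chunk; a learned velocity field would only approximate it, and the number of integration steps trades accuracy against latency.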

Psi0 model

📢 News & Updates

Table of Contents

Finetune Ψ₀ on Unitree G1 Humanoid Robot

Installation

Clone the project and change directory to the project root:

git clone git@github.com:physical-superintelligence-lab/Psi0.git
cd Psi0

We use uv to manage Python dependencies. Install uv if not already installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Set up the Ψ₀ environment:

ℹ️ We manage the Ψ₀ environment and all the baselines through uv, and they all share the same src/ code. See Environment Management for more details.

uv venv .venv-psi --python 3.10
source .venv-psi/bin/activate
GIT_LFS_SKIP_SMUDGE=1 uv sync \
  --group serve \
  --group viz \
  --group psi \
  --index-strategy unsafe-best-match \
  --active
uv pip install flash_attn==2.7.4.post1 --no-build-isolation

If you want to support SIMPLE evaluation, you can use the following commands to install SIMPLE along with Psi0. See also quickstart.

git submodule update --init --recursive
GIT_LFS_SKIP_SMUDGE=1 uv sync --all-groups --index-strategy unsafe-best-match --active
uv pip install flash_attn==2.7.4.post1 --no-build-isolation
UV_PROJECT_ENVIRONMENT=$(pwd)/.venv-psi ./scripts/install_curobo.sh

Test the installation; a version number should be displayed.

python -c "import psi; print(psi.__version__)"

Verify SIMPLE installation

python -c "import simple; print(simple.__version__)"

Verify the shared lerobot stack is importable.

python -c "from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)"

Data Collection

📂 We have open-sourced all 9 real-world tasks. You can download the data directly and jump to Fine-Tuning.

See the detailed teleoperation guide here:
Real-World Deployment Guide

Pre-Processing: Convert Raw Data to LeRobot Format

export task=Hug_box_and_move

hf download USC-PSI-Lab/psi-data \
  g1_real_raw/$task.zip \
  --local-dir=$PSI_HOME/data/real_teleop_g1 \
  --repo-type=dataset

unzip $PSI_HOME/data/real_teleop_g1/g1_real_raw/$task.zip -d $PSI_HOME/data/real_teleop_g1/g1_real_raw/$task

You should see a folder structure similar to this:

g1_real_raw
└── Hug_box_and_move
    ├── episode_0
    │   ├── color
    │   │   ├── frame_000000.jpg
    │   │   └── ...
    │   └── data.json
    └── ...
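Before running the conversion, it can help to confirm that the extracted data matches the layout above. The following is a small sketch, not part of the repository: the helper name `check_raw_task_dir` is hypothetical, and it only checks what the tree shows (each `episode_*` folder has a `data.json` and at least one `color/frame_*.jpg`).

```python
import tempfile
from pathlib import Path

def check_raw_task_dir(task_dir):
    """Return a list of problems found in a raw task folder (empty list means OK)."""
    task_dir = Path(task_dir)
    problems = []
    episodes = sorted(p for p in task_dir.glob("episode_*") if p.is_dir())
    if not episodes:
        problems.append(f"no episode_* folders under {task_dir}")
    for ep in episodes:
        if not (ep / "data.json").is_file():
            problems.append(f"{ep.name}: missing data.json")
        color_dir = ep / "color"
        if not (color_dir.is_dir() and any(color_dir.glob("frame_*.jpg"))):
            problems.append(f"{ep.name}: no color/frame_*.jpg images")
    return problems

# Demo on a throwaway folder that mimics the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    ep = Path(tmp) / "Hug_box_and_move" / "episode_0"
    (ep / "color").mkdir(parents=True)
    (ep / "color" / "frame_000000.jpg").touch()
    (ep / "data.json").write_text("{}")
    demo_problems = check_raw_task_dir(ep.parent)
```

On real data you would point it at the extracted task folder, for example the `Hug_box_and_move` directory under `g1_real_raw`, and inspect the returned list.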

Edit the task description file using the following format, e.g.:

vim scripts/data/task_description_dict.json
{
  "Hug_box_and_move": "Hug box and move."
}
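If you prefer not to edit the file by hand, the same entry can be added with a few lines of Python. This is a sketch under the assumption that the file is a flat JSON object mapping task names to instruction strings, as in the example above; the helper name `add_task_description` is hypothetical and not part of the repository.

```python
import json
import tempfile
from pathlib import Path

def add_task_description(json_path, task, description):
    """Insert or update one task entry, keeping existing entries intact."""
    json_path = Path(json_path)
    mapping = json.loads(json_path.read_text()) if json_path.exists() else {}
    mapping[task] = description
    json_path.write_text(json.dumps(mapping, indent=2, ensure_ascii=False) + "\n")
    return mapping

# Demo on a temporary file; in the repo the path would be
# scripts/data/task_description_dict.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "task_description_dict.json"
    add_task_description(demo_path, "Hug_box_and_move", "Hug box and move.")
    demo_mapping = json.loads(demo_path.read_text())
```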

Run conversion script

python scripts/data/raw_to_lerobot.py \
  --data-root=$PWD/data/real_teleop_g1/g1_real_raw \
  --work-dir=$PWD/data/real \
  --repo-id=psi0-real-g1 \
  --robot-type=g1 \
  --task=$task

Calculate stats

python scripts/data/calc_modality_stats.py \
  --work-dir=$PSI_HOME/data/real \
  --task=$task

Create $\Psi_0$ format stats (simply a copy for now)

cp $PSI_HOME/data/real/$task/meta/stats.json $PSI_HOME/data/real/$task/meta/stats_psi0.json

Now everything is ready to fine-tune Ψ₀.

✈️ If training env is already configured, directly launch training via scripts/train/psi0/finetune-real-psi0.sh $task

Fine-Tuning

✔️ Suppose the data is already collected and processed. Now we can proceed to fine-tune the Ψ₀ model.

There is a known issue with loading our real data. Apply this fix first: python scripts/data/patch_lerobot_meta.py $PSI_HOME/data/real/$task

📝 Here we illustrate by using the pre-collected data from Huggingface psi-data.

Set up the environment variables following .env.sample. They are loaded in Python by dotenv.load_dotenv().

cp .env.sample .env
# and edit the following env variables 
# HF_TOKEN=<YOUR HF READ TOKEN>
# WANDB_API_KEY=<API KEY for wandb logging>
# WANDB_ENTITY=<wandb entity>
# PSI_HOME=<Path where PSI cache/checkpoint/data are located by convention>

source .env
echo $PSI_HOME
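A quick way to catch a forgotten variable before launching a long job is to check the environment from Python. The sketch below treats all four variables listed in the comments above as required, which is an assumption (some may be optional depending on your setup); the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the .env comments above; treating all of them
# as mandatory is an assumption of this sketch.
REQUIRED_VARS = ("HF_TOKEN", "WANDB_API_KEY", "WANDB_ENTITY", "PSI_HOME")

def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Demo with an explicit mapping instead of the real process environment.
demo_missing = missing_env_vars({"PSI_HOME": "/data/psi", "HF_TOKEN": "hf_example"})
```

Calling `missing_env_vars()` with no arguments checks the current process environment instead.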

Download the collected real-world data and extract it:

export task=Pick_bottle_and_turn_and_pour_into_cup

hf download USC-PSI-Lab/psi-data \
  real/$task.zip \
  --local-dir=$PSI_HOME/data \
  --repo-type=dataset

unzip $PSI_HOME/data/real/$task.zip -d $PSI_HOME/data/real

👀 If you want to visualize the episode please refer to the Data Visualization in the examples.

Launch the training script:

scripts/train/psi0/finetune-real-psi0.sh $task

🖥️ You can always change the GPUs, e.g., CUDA_VISIBLE_DEVICES=0,1,2,3 scripts/train/....

⚠️ Please try to maintain a reasonable global batch size = device batch size x number of GPUs x gradient accumulation step. We use global batch size 128 throughout all the real-world and simulation experiments.
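As a worked example of the formula above, the snippet below derives the gradient accumulation steps needed to reach a target global batch size. The helper name `grad_accum_steps` and the example device batch sizes are illustrative assumptions; only the global batch size of 128 comes from this README.

```python
def grad_accum_steps(global_batch, device_batch, num_gpus):
    """Solve global = device_batch * num_gpus * accumulation for accumulation."""
    per_step = device_batch * num_gpus
    if global_batch % per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not a multiple of "
            f"device_batch * num_gpus = {per_step}"
        )
    return global_batch // per_step

# 4 GPUs with 8 samples each need 4 accumulation steps to reach 128.
steps_4gpu = grad_accum_steps(128, device_batch=8, num_gpus=4)
# 8 GPUs with 16 samples each already reach 128 in a single step.
steps_8gpu = grad_accum_steps(128, device_batch=16, num_gpus=8)
```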

Open-Loop Evaluation

Follow the steps in examples/simple/openloop_eval.ipynb

Load the training dataset and run model inference to see how well the model fits the training data.

Deployment

Serve Ψ₀ (RTC mode)

bash ./scripts/deploy/serve_psi0-rtc.sh

Start Ψ₀ Client (RTC mode)

bash ./real/scripts/deploy_psi0-rtc.sh

For detailed real-world deployment environment setup, please also refer to the dedicated documentation:

Real-World Teleoperation Guide

Ψ₀ with SONIC

SONIC is a powerful whole-body controller for humanoid robots. Ψ₀ now supports data collection, fine-tuning, and deployment with SONIC. Please use our fork to avoid any compatibility issues.

Initialize the SONIC submodule first:

git submodule update --init --recursive third_party/GR00T-WholeBodyControl

For the full environment setup — workstation venvs, TensorRT + C++ build, PICO/XRoboToolkit, and the robot-side camera server — see the SONIC real-world teleoperation guide.

Data collection

Please follow the SONIC real-world teleoperation guide to record demonstrations.

Datasets are saved locally under third_party/GR00T-WholeBodyControl/outputs/<dataset-name>/ in LeRobot format.

Pre-Processing: Convert to Ψ₀ LeRobot Format

Convert the SONIC-collected dataset into the Ψ₀ LeRobot format:

export task=

python scripts/data/raw_sonic_to_psi_lerobot.py \
  --data-root=third_party/GR00T-WholeBodyControl/outputs/$task \
  --work-dir=$PSI_HOME/data/sonic/lerobot \
  --repo-id=$task \
  --robot-type=g1

Calculate stats

python scripts/data/calc_modality_stats.py \
  --work-dir=$PSI_HOME/data/sonic/lerobot \
  --task=$task

Create $\Psi_0$ format stats (simply a copy for now)

cp $PSI_HOME/data/sonic/lerobot/$task/meta/stats.json $PSI_HOME/data/sonic/lerobot/$task/meta/stats_psi0.json

Now it's ready to fine-tune.

Finetune Ψ₀ with SONIC

bash ./scripts/train/psi0/finetune-real-sonic-psi0.sh $task

Deploy Ψ₀ with SONIC

Please follow the SONIC real-world deployment guide for detailed instructions.

Serve Policy Server of Ψ₀ with SONIC (RTC mode)

bash ./scripts/deploy/serve_psi0-rtc-sonic.sh

Start whole-body controller on robot for Ψ₀ with SONIC (RTC mode)

bash ./real/scripts/deploy_psi0-sonic-rtc-robot.sh

Start Policy Client of Ψ₀ with SONIC (RTC mode)

bash ./real/scripts/deploy_psi0-sonic-rtc-client.sh

Baselines

GR00T

Install the env

  1. Training

cd src/gr00t
./scripts/train_gr00t.sh --dataset-path /your/lerobot/dataset

  2. Serving a checkpoint

cd src/gr00t
./scripts/deploy_gr00t.sh

  3. Open-loop eval on a trained checkpoint using ground truth

cd src/gr00t
./scripts/openloop_eval.sh

OpenPI $\pi_{0.5}$

Please see more detailed instructions here: baselines/pi05.

InternVLA-M1

Install the env

cd src/InternVLA-M1; uv sync --python 3.10

  1. Training

cd src/InternVLA-M1
bash scripts/train_internvla.sh

  2. Serving a checkpoint

cd src/InternVLA-M1
./scripts/deploy_internvla.sh

H-RDT

See quick-start doc for baseline/hrdt.

EgoVLA

See quick-start doc for baseline/egovla.

Diffusion Policy

See dedicated doc here baseline/dp

ACT

See dedicated doc here baseline/act

Simulation

We use SIMPLE to benchmark Ψ₀ and all the baselines.

📢 SIMPLE is an easy-to-use humanoid benchmarking simulator built on the MuJoCo physics engine and Isaac Sim rendering.

Install SIMPLE

Currently, there are two options to integrate SIMPLE and Psi-0.

[Option 1] Install stand-alone SIMPLE (Best for collecting data through teleoperation)

We recommend installing SIMPLE on a standalone desktop with an NVIDIA GPU (3090/4090/5090).

Please refer to the SIMPLE repo here

[Option 2] Install SIMPLE as third-party dependency (Best for evaluating Psi-0 and all baselines)

Please refer to the more detailed steps here.

Data Generation

📂 We also provide 6 pre-collected whole-body humanoid loco-manipulation tasks at Huggingface psi-data. If you want to use the existing simulation data, jump to Fine-Tuning.

Motion-Planning Based Data Generation

Please refer to the SIMPLE docs.

Teleoperation in Simulator

Please refer to the SIMPLE docs.

Fine-Tuning

👉 You can skip fine-tuning and download our released checkpoints for SIMPLE.

Download SIMPLE task data and extract it:

💡 Don't forget to source .env before running the commands below.

export task=G1WholebodyXMovePickTeleop-v0

hf download USC-PSI-Lab/psi-data \
  simple/$task.zip \
  --local-dir=$PSI_HOME/data \
  --repo-type=dataset

unzip $PSI_HOME/data/simple/$task.zip -d $PSI_HOME/data/simple

👀 If you want to visualize the episode please refer to the Data Visualization in the examples.

Start training:

Please set up the environment variables if you have not done so yet.

bash scripts/train/psi0/finetune-simple-psi0.sh $task

The training will create a run dir which is located under .runs in the project root. If your GPU has limited VRAM, set --train.optimizer-foreach=false to reduce optimizer-step memory usage at the cost of some speed.

Evaluation in SIMPLE

Serve Ψ₀

export run_dir=<the run dir here under folder .runs>
export ckpt_step=<checkpoint step>
uv run --active --group psi --group serve serve_psi0 \
  --host 0.0.0.0 \
  --port 22085 \
  --run-dir=$run_dir \
  --ckpt-step=$ckpt_step \
  --action-exec-horizon=24 \
  --rtc

Run open-loop evaluation (offline)

examples/simple/openloop_eval.ipynb

Run the Evaluation in SIMPLE

This quick-start guide assumes running SIMPLE on a Stand-alone workstation with NVIDIA GPU.

We recommend serving the VLA models on a remote server rather than locally, since IsaacSim is also resource-demanding.

If the server is started on a remote machine, set up SSH port forwarding, e.g., ssh -L 22085:localhost:22085 songlin@nebula100.

Once port forwarding is set up, open a new terminal to test whether the server is up: curl -i http://localhost:22085/health
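The same check can be scripted, for example to wait for the server before starting an evaluation. The sketch below uses only the Python standard library and assumes the `/health` endpoint answers a plain GET with a 2xx status when the server is ready; the helper name `server_is_up` and the timeout value are illustrative, not part of the repository.

```python
import http.client
import urllib.error
import urllib.request

def server_is_up(host="localhost", port=22085, path="/health", timeout=2.0):
    """Return True if GET http://host:port/path answers with a 2xx status."""
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or malformed reply.
        return False

if __name__ == "__main__":
    print("policy server reachable:", server_is_up())
```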

Download eval tasks from USC-PSI-Lab/psi-data.

cd /path/to/SIMPLE
export task=G1WholebodyXMovePickTeleop-v0

Download eval data and extract it:

hf download USC-PSI-Lab/psi-data \
    simple-eval/$task.zip \
    --local-dir=data/evals \
    --repo-type=dataset

unzip data/evals/simple-eval/$task.zip -d data/evals/simple-eval

Now start SIMPLE eval in the SIMPLE environment:

We provide three domain randomization levels: level-0, level-1, level-2 for each task

We use two different entrypoints for evaluating different tasks:

Set entrypoint and agent to eval_decoupled_wbc.py and psi0_decoupled_wbc if the task name ends with Teleop, which means the task data was collected using teleoperation:

export entry=eval_decoupled_wbc.py
export agent=psi0_decoupled_wbc

Set entrypoint and agent to eval.py and psi0 if the task name ends with MP, which means the task data was generated using CuRobo motion planning:

export entry=eval.py
export agent=psi0
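The selection rule above can be captured in a small helper. This is a sketch, not repository code: the function name `select_entry_and_agent` is hypothetical, and it assumes that "ends with Teleop/MP" refers to the task name before a trailing version tag such as `-v0`. The MP task name in the demo is made up for illustration.

```python
import re

def select_entry_and_agent(task):
    """Map a SIMPLE task name to the (entrypoint, agent) pair described above."""
    # Assumption: ignore a trailing version tag like "-v0" before matching.
    base = re.sub(r"-v\d+$", "", task)
    if base.endswith("Teleop"):
        return "eval_decoupled_wbc.py", "psi0_decoupled_wbc"
    if base.endswith("MP"):
        return "eval.py", "psi0"
    raise ValueError(f"cannot infer entrypoint for task {task!r}")

teleop_choice = select_entry_and_agent("G1WholebodyXMovePickTeleop-v0")
# "ExampleTaskMP-v0" is a hypothetical name, not a released task.
mp_choice = select_entry_and_agent("ExampleTaskMP-v0")
```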

Launch the evaluation script:

python src/simple/cli/$entry \
    simple/$task \
    $agent \
    $dr \
    --host=localhost \
    --port=9000 \
    --sim-mode=mujoco_isaac \
    --no-headless \
    --data-format=lerobot \
    --data-dir=data/evals/simple-eval/$task/$dr
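To run all three domain randomization levels in sequence, the command above can be generated programmatically. The sketch below only builds the argument lists; it assumes that `$dr` takes the literal values `level-0`, `level-1`, and `level-2`, and the helper name `build_eval_command` is hypothetical. The flags and defaults mirror the command shown above.

```python
import shlex

DR_LEVELS = ("level-0", "level-1", "level-2")  # assumed literal values for $dr

def build_eval_command(task, entry, agent, dr, host="localhost", port=9000):
    """Assemble the evaluation command for one task and one DR level."""
    return [
        "python", f"src/simple/cli/{entry}",
        f"simple/{task}", agent, dr,
        f"--host={host}", f"--port={port}",
        "--sim-mode=mujoco_isaac", "--no-headless",
        "--data-format=lerobot",
        f"--data-dir=data/evals/simple-eval/{task}/{dr}",
    ]

task = "G1WholebodyXMovePickTeleop-v0"
commands = [
    build_eval_command(task, "eval_decoupled_wbc.py", "psi0_decoupled_wbc", dr)
    for dr in DR_LEVELS
]
for cmd in commands:
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```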

The policy rollout videos are saved to the folder third_party/SIMPLE/data/evals/psi0.

Evaluating a single episode can take 6 to 10 minutes because SIMPLE uses a synchronous rendering API in IsaacSim. See here for more explanation.

Reproduce Ψ₀: Pre-Training and Post-Training

Pre-Train VLM

Download and cache the official Qwen/Qwen3-VL-2B-Instruct weights.

scripts/predownload_qwen3vl.py

Pre-train on the EgoDex dataset

Pre-compute 48 DoF EgoDex action:

We reuse the preprocessing code from H-RDT EgoDex Pre-Processing.

  1. Change the paths in src/h_rdt/datasets/pretrain/setup_pretrain.sh.
  2. Increase NUM_PROCESSES on a powerful server (values up to 64 have been tried).
  3. Set FORCE_OVERWRITE=True if the processing script was interrupted.
source src/h_rdt/datasets/pretrain/setup_pretrain.sh
source .venv-psi/bin/activate
bash src/h_rdt/datasets/pretrain/run_pretrain_pipeline.sh

[Optional] If you also want to train the FAST tokenizer, please refer to training FAST.

bash scripts/train/psi0/pretrain-egodex-psi0-fast.sh 

Pre-train on humanoid everyday dataset

Please download the pre-processed HE data here: hf download USC-PSI-Lab/psi-data HE_RAW.zip --repo-type=dataset

bash scripts/train/psi0/pretrain-he-psi0-fast.sh

Save the pretrained checkpoints once training is done:

python scripts/save_pretrain_qwen3vl_backbone.py

Post-Train Action Expert

Download the pre-trained Ψ₀ VLM backbone

python scripts/data/download.py \
  --repo-id=USC-PSI-Lab/psi-model \
  --remote-dir=psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k \
  --local-dir=$PSI_HOME/cache/checkpoints/psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k \
  --repo-type=model

Post-train on humanoid everyday (HE) dataset

bash scripts/train/psi0/posttrain-he-psi0.sh

Save the post-trained action expert once training is done

python scripts/save_posttrain_action_expert.py

Checkpoints

The released checkpoints on HuggingFace Psi-Model are listed below:

| Checkpoint | Description | Remote Directory |
| --- | --- | --- |
| Ψ₀ VLM (Baseline) | Pre-trained VLM backbone (EgoDex 200K steps + HE 30K steps) | psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k |
| Ψ₀ Action Expert (Baseline) | Post-trained Action Expert on HE | psi0/postpre.1by1.pad36.2601131206.ckpt.he30k |

and more variants for ablation studies:

| Checkpoint | Description | Remote Directory |
| --- | --- | --- |
| Ψ₀ VLM (Ablation Study) | Pre-trained VLM backbone only on EgoDex 200K steps | psi0/pre.fast.egodex.2512241941.ckpt200k |
| Ψ₀ VLM (Ablation Study) | Pre-trained VLM backbone only on HE 48K steps | psi0/pre.abl.only.he.2512311516.48k |
| Ψ₀ VLM (Ablation Study) | Pre-trained VLM backbone only on 10% EgoDex | psi0/pre.abl.ego.10per.2602021632.46k |
| Ψ₀ Action Expert (Ablation Study) | Post-train on HE by picking pre-trained variant psi0/pre.abl.only.he.2512311516.48k | psi0/postpre.abl.only.he.2602050012 |
| Ψ₀ Action Expert (Ablation Study) | Post-train on HE by picking pre-trained variant psi0/pre.abl.ego.10per.2602021632.46k | psi0/postpre.abl.ego.10per.2602050006 |

Download the selected models

Edit .env to use HF_ENDPOINT=https://hf-mirror.com if needed.

python scripts/data/download.py \
  --repo-id=USC-PSI-Lab/psi-model \
  --remote-dir=<Remote Directory> \
  --local-dir=$PSI_HOME/cache/checkpoints/<Remote Directory> \
  --repo-type=model
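
When several checkpoints are needed, the command above can be generated for each remote directory. This is a minimal sketch that only prints the commands. The helper build_download_command is hypothetical, and falling back to the current directory when PSI_HOME is unset is an assumption made for illustration. The remote directories are the two baseline entries from the checkpoint table.

```python
import os
import shlex

REPO_ID = "USC-PSI-Lab/psi-model"


def build_download_command(remote_dir: str, psi_home: str) -> list[str]:
    """Return the argv list that downloads one checkpoint directory."""
    # Mirrors --local-dir=$PSI_HOME/cache/checkpoints/<Remote Directory>.
    local_dir = f"{psi_home}/cache/checkpoints/{remote_dir}"
    return [
        "python", "scripts/data/download.py",
        f"--repo-id={REPO_ID}",
        f"--remote-dir={remote_dir}",
        f"--local-dir={local_dir}",
        "--repo-type=model",
    ]


baseline = [
    "psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k",
    "psi0/postpre.1by1.pad36.2601131206.ckpt.he30k",
]
# Assumption: fall back to "." if PSI_HOME is not set in the environment.
psi_home = os.environ.get("PSI_HOME", ".")
for remote_dir in baseline:
    print(shlex.join(build_download_command(remote_dir, psi_home)))
```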

Troubleshooting

  1. Lerobot dataset issues: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column

This usually means the environment is still on the legacy PSI lerobot stack. Resync the PSI env so it uses the same lerobot and datasets versions as SIMPLE, then verify the import layout:

source .venv-psi/bin/activate
uv sync --group psi --active
python -c "from psi.data.lerobot.compat import LEROBOT_LAYOUT; print(LEROBOT_LAYOUT)"

  2. Failure to install evdev: src/evdev/input.c:10:10: fatal error: Python.h: No such file or directory

Install the Python development headers and build tools:

sudo apt update
sudo apt install -y python3-dev python3-venv build-essential \
    linux-headers-$(uname -r)

  3. RuntimeError: Could not load libtorchcodec. Likely causes ...

Install ffmpeg:

sudo apt-get install ffmpeg

  4. ImportError: cannot import name 'Deprecated' from 'wandb.proto.wandb_telemetry_pb2'

Re-install wandb:

source .venv-pusht/bin/activate
uv pip uninstall wandb
uv pip install wandb==0.18.0

  5. Support for sm_120 on newer GPUs like 5090 or RTX 6000: UserWarning: Ignoring invalid value for boolean flag CUDA_LAUNCH_BLOCKING: truevalid values are 0 or 1.

Update torch and flash-attn:

uv pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128
uv pip install flash-attn --no-build-isolation

  6. Failed to download and build lerobot ... , Use git lfs logs last to view the log.

Prefix the uv command with GIT_LFS_SKIP_SMUDGE=1:

GIT_LFS_SKIP_SMUDGE=1 uv ...
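
For the sm_120 item above, it can help to confirm whether the installed torch build ships kernels for the GPU before and after upgrading. The sketch below is a coarse check, assuming that a build is usable when its architecture list contains an sm_ entry with the same major compute capability as the device. The helper has_kernels_for is hypothetical. torch.cuda.get_device_capability and torch.cuda.get_arch_list are existing PyTorch calls, and the torch import is optional so the snippet also runs where torch is not installed.

```python
import re


def has_kernels_for(capability, arch_list):
    """Coarse check: does any sm_XY entry share the device's major compute capability?"""
    major, _minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match and int(match.group(1)) // 10 == major:
            return True
    return False


try:
    import torch
except ImportError:
    torch = None  # the helper above still works without torch

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability()  # e.g. (12, 0) for sm_120
    arch_list = torch.cuda.get_arch_list()
    if has_kernels_for(capability, arch_list):
        print(f"torch build covers compute capability {capability}")
    else:
        print(f"torch build lacks kernels for {capability}; built for {arch_list}")
```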

Citation

@article{wei2026psi0,
  title={{$\Psi_0$}: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},
  author={Wei, Songlin and Jing, Hongyi and Li, Boqian and Zhao, Zhenyu and Mao, Jiageng and Ni, Zhenhao and He, Sicheng and Liu, Jie and Liu, Xiawei and Kang, Kaidi and others},
  journal={arXiv preprint arXiv:2603.12263},
  year={2026}
}

License

This project is licensed under the Apache License 2.0.

See the LICENSE file for details.