GitHub - unified-force/UniFP: CoRL2025 UniFP: Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation (original) (raw)

Overview

This project implements a reinforcement learning-based whole body control framework for B2Z1 robots, supporting unified policy learning for both position and force control. The framework uses Isaac Gym for simulation training and supports deployment from simulation to real robots.

Key Features:

Support for B2Z1 robot whole body control
Unified policy learning for position and force control
Reinforcement learning training based on PPO algorithm
Support for multiple robot configurations (B2Z1, G1, etc.)
Complete simulation-to-real deployment pipeline

TODO

Release UniFP training pipeline
Release sim2real with ROS2
Release sim2sim in MuJoCo
Release imitation learing data collection pipeline

Installation

System Requirements

Ubuntu 20.04/22.04
Python 3.8
CUDA 11.2+
Isaac Gym Preview 4 (requires NVIDIA developer account)

Installation Steps

Clone this project
git clone https://github.com/deathpoker/UniFP.git
cd UniFP
Set up the environment
conda create -n unifp python=3.8

isaacgym requires python <=3.8

conda activate unifp

Download the Isaac Gym binaries from https://developer.nvidia.com/isaac-gym

wget https://developer.nvidia.com/isaac-gym-preview-4
tar -xvzf isaac-gym-preview-4
cd isaacgym/python && pip install -e .
For libpython error:

Set LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=</path/to/conda/envs/your_env/lib>:$LD_LIBRARY_PATH

Install Python dependencies

Install PyTorch

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

Install other dependencies

pip install numpy matplotlib wandb

Usage

Policy Training

B2Z1 Position-Force Control Training

cd legged_gym/scripts python train_b2z1posforce.py --task=b2z1_pos_force --headless

Policy Evaluation and Testing

Run Trained Policies

B2Z1 position-force control testing

python play_b2z1posforce.py --task=b2z1_pos_force --load_run=

Parameter Configuration

Training Parameters

--task: Task name (b2z1_pos_force, b2z1_force_realrobot, h1, g1_humanoidgym, etc.)
--headless: Run in headless mode
--num_envs: Number of parallel environments
--max_iterations: Maximum training iterations

Environment Parameters

--flat_terrain: Use flat terrain
--physics_engine: Physics engine (physx)
--sim_device: Simulation device (cuda:0)

Core Components

Environment Configuration (legged_gym/envs/b2/b2z1_pos_force_config.py)
- Robot initial state configuration
- Reward function parameters
- Observation space definition
- Action space definition
Environment Implementation (legged_gym/envs/b2/legged_robot_b2z1_pos_force.py)
- Simulation environment logic
- Reward calculation
- Observation space construction
- Action execution
Training Algorithm (legged_gym/b2_gym_learn/ppo_cse_pf/)
- PPO algorithm implementation
- Policy network structure
- Value network structure
Task Registration (legged_gym/utils/task_registry_b2z1posforce.py)
- Task registration management
- Environment creation
- Trainer creation