HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning

Zhi Jing1,2 Siyuan Yang1,3 Jicong Ao1 Ting Xiao4 Yugang Jiang2 Chenjia Bai1✉
1Institute of Artificial Intelligence (TeleAI), China Telecom 2Fudan University 3University of Science and Technology of China 4East China University of Science and Technology
Equally leading organizations  Corresponding authors 

Abstract

Existing robotics datasets and simulation benchmarks for manipulation predominantly cater to robot-arm platforms. For humanoid robots equipped with dual arms and dexterous hands, however, simulation tasks and high-quality demonstrations are notably lacking. Bimanual dexterous manipulation is inherently more complex, as it requires coordinated arm movements and hand operations, which makes autonomous data collection challenging. This paper presents HumanoidGen, an automated task creation and demonstration collection framework that leverages atomic dexterous operations and LLM reasoning to generate relational constraints. Specifically, we provide spatial annotations for both assets and dexterous hands based on the atomic operations, and use an LLM planner to generate a chain of actionable spatial constraints for arm movements based on object affordances and scenes. To further improve planning ability, we employ a variant of Monte Carlo tree search (MCTS) to enhance LLM reasoning for long-horizon tasks and for tasks with insufficient annotations. In experiments, we create a novel benchmark with augmented scenarios to evaluate the quality of the collected data. The results show that the performance of 2D and 3D diffusion policies scales with the size of the generated dataset.

Method


HumanoidGen is an automated simulation framework for scene generation, demonstration collection, and data generalization in bimanual dexterous manipulation. It aims to provide high-quality demonstrations over diverse scenarios to facilitate data scaling and policy learning. (i) As preparation, the assets and dexterous hands are annotated with spatial information. In scene generation, the LLM planner generates an environment setup as code-form configuration based on asset, scene, and task descriptions. (ii) Based on the generated scenes and pre-defined hand atomic operations, the LLM then generates planning code that specifies a chain of spatial constraints for subsequent data collection. (iii) For tasks that require long-horizon planning or have insufficient annotations, we employ MCTS with introspective exploration to enhance the reasoning performance of the LLM. (iv) Finally, we collect demonstrations by executing the generated code plan with scene scaling to enhance data diversity. These demonstrations are used to construct a humanoid manipulation benchmark for policy evaluation.
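To make the idea of a "chain of spatial constraints" in step (ii) more concrete, the following is a minimal illustrative sketch, not the HumanoidGen implementation. It assumes that hand and object annotations are stored as named points and unit axes, and that each constraint relates one hand annotation to one object annotation. All class names, annotation names, and tolerance values here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Annotations = Dict[str, Vec3]  # annotation name -> point or unit axis (hypothetical layout)


@dataclass(frozen=True)
class PointConstraint:
    """A hand annotation point must coincide with an object annotation point."""
    hand_point: str
    object_point: str
    tolerance: float = 0.01  # metres; illustrative value only

    def satisfied(self, hand: Annotations, obj: Annotations) -> bool:
        return math.dist(hand[self.hand_point], obj[self.object_point]) <= self.tolerance


@dataclass(frozen=True)
class AxisConstraint:
    """A hand axis must be parallel to an object axis (both unit vectors)."""
    hand_axis: str
    object_axis: str
    max_angle_deg: float = 5.0  # illustrative value only

    def satisfied(self, hand: Annotations, obj: Annotations) -> bool:
        a, b = hand[self.hand_axis], obj[self.object_axis]
        cos = sum(x * y for x, y in zip(a, b))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        return math.degrees(math.acos(cos)) <= self.max_angle_deg


def first_unsatisfied(chain: List, hand: Annotations, obj: Annotations) -> Optional[int]:
    """Return the index of the first violated constraint, or None if all hold."""
    for index, constraint in enumerate(chain):
        if not constraint.satisfied(hand, obj):
            return index
    return None


# Hypothetical grasp: the palm centre sits on the grasp point and the palm
# normal is aligned with the object's approach axis.
chain = [
    PointConstraint("palm_center", "grasp_point"),
    AxisConstraint("palm_normal", "approach_axis"),
]
obj = {"grasp_point": (0.0, 0.0, 0.1), "approach_axis": (0.0, 0.0, -1.0)}
aligned_hand = {"palm_center": (0.0, 0.0, 0.1), "palm_normal": (0.0, 0.0, -1.0)}
tilted_hand = {"palm_center": (0.0, 0.0, 0.1), "palm_normal": (1.0, 0.0, 0.0)}
```

In this sketch, a planner or motion solver could use `first_unsatisfied` to check which constraint in the chain a candidate hand pose violates. How the actual framework encodes and solves its constraints is not described on this page.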


Experiments


Demonstration Generation


We designed 20 diverse tabletop manipulation tasks and used HumanoidGen to generate demonstrations for them. These tasks cover a wide range of dexterous manipulation scenarios, including single-arm and bimanual operations, long-horizon tasks, articulated object manipulation, and complex collision scenarios.

block_handover
blocks_stack_easy
blocks_stack_hard
close_box_easy
close_box_hard
close_drawer
close_laptop_easy
close_laptop_hard
cup_pour_easy
dual_bottles_pick_easy
dual_bottles_pick_hard
empty_cup_place
handover_and_storage
handover_and_storage_coopration
open_box_easy
open_box_hard
open_drawer
open_laptop_easy
open_laptop_hard
pyramid_stack

MCTS Enhanced Demonstration Generation


We enhanced HumanoidGen with MCTS to improve its demonstration generation capabilities. Three of the tasks above, along with an additional single-arm task, were selected for the experiment. We increased task complexity by (i) removing all operation annotations for cubes and (ii) simplifying task descriptions to provide minimal guidance.
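For readers unfamiliar with MCTS, the sketch below shows a plain UCT-style search over action sequences on a toy stacking problem. It is not the variant with introspective exploration used in HumanoidGen. The `propose` function is a hypothetical stand-in for an LLM suggesting candidate next steps, and `reward` is a hypothetical stand-in for checking success in simulation.

```python
import math
import random


class Node:
    """One partial plan (a tuple of actions) in the search tree."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node


def mcts(root_state, propose, is_terminal, reward, iterations=300, c=1.4, seed=0):
    """Plain UCT search; returns the most-visited plan found."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection and expansion: descend by UCB1 until a node has an untried action.
        while not is_terminal(node.state):
            untried = [a for a in propose(node.state) if a not in node.children]
            if untried:
                action = rng.choice(untried)
                child = Node(node.state + (action,), parent=node)
                node.children[action] = child
                node = child
                break
            parent_visits = node.visits
            node = max(
                node.children.values(),
                key=lambda ch: ch.value / ch.visits
                + c * math.sqrt(math.log(parent_visits) / ch.visits),
            )
        # Rollout: complete the plan with random proposals.
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(propose(state)),)
        result = reward(state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent
    # Read out the plan by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.state


# Toy problem (hypothetical): find the stacking order that "succeeds".
CUBES = ("red", "green", "blue")
TARGET = ("red", "green", "blue")


def propose(state):
    return [cube for cube in CUBES if cube not in state]


def is_terminal(state):
    return len(state) == len(CUBES)


def reward(state):
    return 1.0 if state == TARGET else 0.0


best_plan = mcts((), propose, is_terminal, reward)
```

The design point the sketch illustrates is that search can compensate for missing guidance: even when no annotation says which cube goes first, repeated rollouts with execution feedback concentrate visits on the plans that succeed.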



Various implementations:

blocks_stack_hard(1)
blocks_stack_hard(2)
blocks_stack_hard(3)
blocks_stack_hard(4)
pyramid_stack(1)
pyramid_stack(2)
pyramid_stack(3)
pyramid_stack(4)

DP & DP3 Deployment



DP3 Performance Videos

cup_pour_easy (success)
cup_pour_easy (fail)
dual_bottles_pick_easy (success)
dual_bottles_pick_easy (fail)
dual_bottles_pick_hard (success)
dual_bottles_pick_hard (fail)
empty_cup_place (success)
empty_cup_place (fail)
open_box_easy (success)
open_box_easy (fail)
open_drawer (success)
open_drawer (fail)
open_laptop_easy (success)
open_laptop_easy (fail)
