Diffusion models have proven effective at generating complex distributions, from natural images to motion trajectories. Recent diffusion-based methods achieve impressive performance on 3D robotic manipulation tasks, but they suffer from severe runtime inefficiency due to the many denoising steps required, especially with high-dimensional observations. To this end, we propose ManiCM, a real-time robotic manipulation model that imposes a consistency constraint on the diffusion process so that robot actions can be generated in a single inference step.
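For intuition, the minimal PyTorch sketch below shows what single-step action generation looks like at inference time: noise is mapped to a clean action sequence by one network call instead of an iterative denoising loop. The function name, the `consistency_net` signature, and the tensor shapes are assumptions for illustration, not the released implementation.

```python
import torch

@torch.no_grad()
def one_step_inference(consistency_net, obs, action_horizon, action_dim, t_max):
    """Hypothetical single-call action generation with a distilled consistency model.

    obs: conditioning features of the 3D observation, shape (B, obs_dim)
    t_max: terminal diffusion timestep the network was trained to start from
    """
    batch_size = obs.shape[0]
    # Start from pure Gaussian noise at the terminal diffusion step ...
    a_noise = torch.randn(batch_size, action_horizon, action_dim, device=obs.device)
    # ... and map it to a clean action sequence with a single forward pass,
    # instead of iterating over many denoising steps.
    actions = consistency_net(a_noise, t_max, obs)
    return actions
```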
Given a raw action sequence a_0, we first perform forward diffusion to add noise over n + k steps. The resulting noisy sequence a_{n+k} is fed into both the online network and the teacher network to predict the clean action sequence. The target network then predicts the clean action sequence from the teacher network's k-step estimate. To enforce self-consistency, a loss constrains the outputs of the online network and the target network to agree.
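As a rough illustration, the sketch below shows how such a self-consistency loss could be computed in PyTorch. The module names (`online_net`, `teacher_net`, `target_net`), the epsilon-prediction parameterization, and the single DDIM-style jump standing in for the teacher's k-step estimation are simplifying assumptions for this sketch, not the actual ManiCM implementation.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(online_net, teacher_net, target_net,
                                  a0, obs, alphas_cumprod, n, k):
    """Hypothetical self-consistency loss for one training step.

    a0:  clean action sequence, shape (B, horizon, action_dim)
    obs: conditioning features of the 3D observation, shape (B, obs_dim)
    alphas_cumprod: cumulative products of the noise schedule, shape (num_steps,)
    n, k: diffusion timestep and skip interval (a_{n+k} -> a_n)
    """
    # Forward diffusion: corrupt a0 to the noise level of step n + k.
    noise = torch.randn_like(a0)
    ab_nk = alphas_cumprod[n + k]
    a_nk = ab_nk.sqrt() * a0 + (1.0 - ab_nk).sqrt() * noise

    # Online network maps the noisy sequence directly to a clean action estimate.
    pred_online = online_net(a_nk, n + k, obs)

    with torch.no_grad():
        # Teacher estimates the less-noisy sample a_n; a single DDIM-style
        # jump stands in here for its k-step solver.
        eps = teacher_net(a_nk, n + k, obs)
        a0_hat = (a_nk - (1.0 - ab_nk).sqrt() * eps) / ab_nk.sqrt()
        ab_n = alphas_cumprod[n]
        a_n = ab_n.sqrt() * a0_hat + (1.0 - ab_n).sqrt() * eps

        # Target network (an EMA copy of the online network) predicts the
        # clean action from the teacher's estimate.
        pred_target = target_net(a_n, n, obs)

    # Both predictions should agree (self-consistency).
    return F.mse_loss(pred_online, pred_target)
```

In a typical training loop, this loss would be backpropagated through the online network only, after which the target network is refreshed as an exponential moving average of the online weights.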
The author team would like to thank Zhixuan Liang and Yao Mu from the University of Hong Kong for their helpful technical discussions and suggestions.
Our code is built upon 3D Diffusion Policy, MotionLCM, Latent Consistency Model, Diffusion Policy, VRL3, Metaworld, and ManiGaussian. We would like to thank the authors for their excellent work.
@article{lu2024manicm,
  title={ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation},
  author={Guanxing Lu and Zifeng Gao and Tianxing Chen and Wenxun Dai and Ziwei Wang and Yansong Tang},
  journal={arXiv preprint arXiv:2406.01586},
  year={2024}
}