
Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation

Guanxing Lu¹*, Zifeng Gao¹*, Tianxing Chen², Wenxun Dai², Ziwei Wang³, and Yansong Tang¹†
¹Tsinghua Shenzhen International Graduate School, Tsinghua University
²Shanghai AI Laboratory
³Carnegie Mellon University

*Equal contribution, †Corresponding author


Diffusion models have proven effective at modeling complex distributions, from natural images to motion trajectories. Recent diffusion-based methods show impressive performance in 3D robotic manipulation tasks, but they suffer from severe runtime inefficiency because they require multiple denoising steps, especially with high-dimensional observations. To address this, we propose ManiCM, a real-time robotic manipulation model that imposes a consistency constraint on the diffusion process so that the model can generate robot actions in a single inference step.

Framework of ManiCM

Given a raw action sequence a_0, we first perform forward diffusion to add noise over n + k steps. The resulting noisy sequence a_{n+k} is then fed into both the online network and the teacher network, each of which predicts the clean action sequence. The target network in turn predicts the action sequence from the teacher network's k-step estimate. To enforce self-consistency, a loss function penalizes any discrepancy between the outputs of the online network and the target network.
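The training step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released ManiCM implementation: the linear noise schedule, the single deterministic DDIM-style jump that spans k timesteps, the mean-squared-error loss, and all function names (`make_alpha_bars`, `forward_diffuse`, `ddim_jump`, `consistency_loss`) are hypothetical choices made for the example. The three networks are passed in as callables `(noisy_actions, timestep) -> predicted_clean_actions`.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule.

    Index 0 corresponds to the clean sample (no noise), so alpha_bars[0] == 1.
    """
    alpha_bars = [1.0]
    for t in range(1, num_steps + 1):
        frac = (t - 1) / max(num_steps - 1, 1)
        beta = beta_start + (beta_end - beta_start) * frac
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars


def forward_diffuse(a0, t, alpha_bars, rng):
    """Sample a_t from q(a_t | a_0) by mixing clean actions with Gaussian noise."""
    s = math.sqrt(alpha_bars[t])
    c = math.sqrt(1.0 - alpha_bars[t])
    return [s * x + c * rng.gauss(0.0, 1.0) for x in a0]


def ddim_jump(a_t, a0_hat, t, t_prev, alpha_bars):
    """Deterministic DDIM-style update from step t to t_prev (requires t >= 1).

    a0_hat is the teacher's clean-action estimate; the implied noise is
    recovered from it and re-scaled to the noise level of step t_prev.
    """
    s_t, c_t = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    s_p, c_p = math.sqrt(alpha_bars[t_prev]), math.sqrt(1.0 - alpha_bars[t_prev])
    out = []
    for x, x0 in zip(a_t, a0_hat):
        eps_hat = (x - s_t * x0) / c_t
        out.append(s_p * x0 + c_p * eps_hat)
    return out


def consistency_loss(a0, n, k, online, target, teacher, alpha_bars, rng):
    """Self-consistency loss for one action sequence (assumes k >= 1)."""
    # 1. Forward diffusion: add noise to a_0 up to step n + k.
    a_nk = forward_diffuse(a0, n + k, alpha_bars, rng)
    # 2. The online network predicts the clean actions directly from a_{n+k}.
    pred_online = online(a_nk, n + k)
    # 3. The teacher estimates the clean actions; a k-step jump gives a_n.
    a_n_hat = ddim_jump(a_nk, teacher(a_nk, n + k), n + k, n, alpha_bars)
    # 4. The target network predicts the clean actions from the estimate of a_n.
    #    In a real training loop this output would be treated as a constant.
    pred_target = target(a_n_hat, n)
    # 5. Penalize disagreement between the online and target predictions.
    return sum((p - q) ** 2 for p, q in zip(pred_online, pred_target)) / len(a0)
```

In a real training loop the loss would be backpropagated through the online network only. At deployment, the one-step inference described in the abstract would correspond to a single call of the trained online network on pure noise at the largest timestep.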


31 tasks (Adroit and MetaWorld)
We conduct our experiments on the widely used MetaWorld and Adroit benchmarks, for a total of 31 tasks. These tasks range from simple pick-and-place to more challenging scenarios such as dexterous manipulation, so the evaluation covers a wide variety of settings.
Comparisons on Runtime
We evaluate 100 episodes on 31 challenging tasks from Adroit and MetaWorld across 3 random seeds and report the time consumption per step (s) with standard deviation. The second-best results are underlined and the best results are in bold. ‘∗’ denotes the reproduced version. ManiCM with one-step inference outperforms all state-of-the-art models, providing ample evidence for the effectiveness of consistency distillation.
Comparisons on Success Rate
We evaluate 100 episodes on 31 challenging tasks from Adroit and MetaWorld across 3 random seeds and report the success rates (%) with standard deviation. The second-best results are underlined and the best results are in bold. ‘∗’ denotes the reproduced version. ManiCM with one-step inference outperforms all state-of-the-art models, providing ample evidence for the effectiveness of consistency distillation.
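The reporting protocol used in both comparisons (one value per random seed, summarized as a mean with standard deviation) can be sketched as follows. This is an illustrative sketch only: the helper names `success_rate` and `summarize_across_seeds` are hypothetical, and the use of the sample standard deviation is an assumption, since the page does not state which estimator is used.

```python
import statistics


def success_rate(outcomes):
    """Percentage of successful episodes, given one boolean per episode."""
    return 100.0 * sum(outcomes) / len(outcomes)


def summarize_across_seeds(per_seed_values):
    """Mean and sample standard deviation of one metric across random seeds.

    Works for any per-seed scalar, e.g. success rate (%) or time per step (s).
    """
    mean = statistics.mean(per_seed_values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(per_seed_values) if len(per_seed_values) > 1 else 0.0
    return mean, std
```

For a single task, each seed's episode outcomes would first be passed through `success_rate`, and the resulting per-seed values would then be passed to `summarize_across_seeds`. The same summary applies to per-step runtimes.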


The author team would like to thank Zhixuan Liang and Yao Mu from the University of Hong Kong for their helpful technical discussions and suggestions.

Our code is built upon 3D Diffusion Policy, MotionLCM, Latent Consistency Model, Diffusion Policy, VRL3, Metaworld, and ManiGaussian. We would like to thank the authors for their excellent work.


      title={ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation}, 
      author={Guanxing Lu and Zifeng Gao and Tianxing Chen and Wenxun Dai and Ziwei Wang and Yansong Tang},
      journal={arXiv preprint arXiv:2406.01586},