---
license: mit
language:
  - en
---

# RDT-1B

RDT-1B is a 1B-parameter imitation-learning Diffusion Transformer pre-trained on over 1M multi-robot episodes. Given a language instruction and 3-view RGB image observations, RDT predicts the next 64 robot actions. RDT is inherently compatible with almost all modern mobile manipulators: single-arm or dual-arm, joint-space or end-effector (EEF) control, position or velocity control, and even robots with a mobile chassis.

All code and pre-trained model weights are licensed under the MIT license.

Please refer to our project page and paper for more information.

## Model Details

- Developed by: the RDT team from Tsinghua University
- License: MIT
- Language(s) (NLP): en
- Model Architecture: Diffusion Transformer
- Pre-training dataset: a curated dataset collected from 46 datasets. Please see here for details.
- Repository: [repo_url]
- Paper: [paper_url]
- Project Page: https://rdt-robotics.github.io/rdt-robotics/

## Uses

RDT takes a language instruction, image observations, and proprioception as input, and predicts the next 64 robot actions in the form of a unified action-space vector. This vector covers all the main physical quantities of a robot, including end-effector and joint positions and velocities, base movement, and so on.
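To make this input/output contract concrete, below is a minimal sketch of a single inference call. Everything here is illustrative: `predict_action_chunk` is a stand-in for the real inference entry point, and the image resolution and 128-dimensional action width are assumptions; the actual interface and action-space layout are defined in the repository.

```python
import numpy as np

# Hypothetical sketch of RDT's input/output contract, NOT the real API.
# The actual inference code and interface live in the RDT repository.

N_VIEWS, H, W = 3, 384, 384   # three RGB views; resolution is an assumption
ACTION_DIM = 128              # assumed width of the unified action-space vector
CHUNK = 64                    # RDT predicts the next 64 actions per call

def predict_action_chunk(instruction: str,
                         images: np.ndarray,
                         proprio: np.ndarray) -> np.ndarray:
    """Placeholder standing in for one RDT diffusion rollout."""
    assert images.shape == (N_VIEWS, H, W, 3)
    assert proprio.shape == (ACTION_DIM,)
    # A real call would condition the Diffusion Transformer on the
    # instruction, images, and proprioception, then denoise an action chunk.
    return np.zeros((CHUNK, ACTION_DIM), dtype=np.float32)

actions = predict_action_chunk(
    "pick up the red block and place it in the bowl",
    np.zeros((N_VIEWS, H, W, 3), dtype=np.uint8),  # dummy camera frames
    np.zeros((ACTION_DIM,), dtype=np.float32),     # dummy robot state
)
print(actions.shape)  # -> (64, 128)
```

In a real control loop, the predicted 64-step chunk would typically be executed (fully or in part) before re-querying the policy with fresh observations.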

## Getting Started

RDT-1B supports fine-tuning on custom datasets, deployment and inference on real robots, as well as pre-training the model.

Please refer to our repository for guides on all of the above.

## Citation

BibTeX:

[More Information Needed]