Results
We demonstrate how RCS supports VLA research by investigating VLA generalization across multiple embodiments and assessing the benefit of simulated data for robotic foundation models.
Fig. 2: We fine-tune π0 on four datasets from different setups.
Each dataset contains fewer than 150 episodes.
The fine-tuned models are deployed on the corresponding setups.
The robots that are more prominent in the base model's data mix achieve better results.
Fig. 3: We investigate the impact of simulated data on VLA performance.
Our setup is replicated in simulation, where a scripted policy generates 500 trajectories; these complement our manually collected dataset of 143 trajectories.
The plots show the success rate of the policy, both in the simulated scene and on the hardware, as training progresses.
Success rates in simulation correlate with success rates on the physical robot, as one would expect of a good evaluation metric.
Adding simulated data to the training mix improves performance in both settings.
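As a rough illustration of the data mix described in this caption, the sketch below combines 143 real and 500 simulated episodes into one shuffled training set and reports the simulated share. The helper names `build_training_mix` and `sim_fraction`, the integer placeholder episodes, and the uniform concatenation are assumptions made for illustration; the text does not state how the two sources are weighted during training.

```python
import random


def build_training_mix(real_episodes, sim_episodes, seed=0):
    """Tag each episode with its source, concatenate, and shuffle.

    Assumption: uniform mixing without per-source weighting.
    """
    mix = [("real", ep) for ep in real_episodes]
    mix += [("sim", ep) for ep in sim_episodes]
    random.Random(seed).shuffle(mix)  # seeded for reproducibility
    return mix


def sim_fraction(mix):
    """Share of episodes in the mix that come from simulation."""
    return sum(1 for source, _ in mix if source == "sim") / len(mix)


# Placeholder episode ids standing in for recorded trajectories.
real_episodes = list(range(143))
sim_episodes = list(range(500))

mix = build_training_mix(real_episodes, sim_episodes)
print(f"{len(mix)} episodes, {sim_fraction(mix):.1%} simulated")
```

With these counts, simulated trajectories make up roughly three quarters of the combined set, so a weighted sampler may be worth considering if the real data should not be underrepresented.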
System Performance
To support modern, high-frequency VLA architectures (such as π0), RCS must process data efficiently. We evaluated the recording frequency across multiple camera and tactile sensor configurations. RCS scales reliably up to 90 to 120 Hz before encountering hardware I/O bottlenecks. This indicates that it is suitable for high-frequency RL and VLA deployments despite the synchronous nature of Gymnasium environments.
Fig. 4: Configured control frequency vs. measured data frequency during teleoperation, averaged over more than 1000 steps. The shaded area denotes the standard deviation. FR3 2 Cams: two RealSense cameras. FR3 4+2 Cams: four RealSense cameras and two DIGIT sensors. Dual FR3 4+2 Cams: same as FR3 4+2 Cams, but with two FR3 robots.
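One way to obtain a measured data frequency and its standard deviation, as plotted in Fig. 4, is to derive per-step frequencies from the timestamps of consecutive recorded steps. The following is a minimal sketch under that assumption; the function `measured_frequency` and the synthetic timestamps are hypothetical and not part of RCS, and the exact statistic used for the figure may differ.

```python
import statistics


def measured_frequency(timestamps):
    """Return (mean_hz, std_hz) from step timestamps given in seconds.

    Assumption: timestamps are strictly increasing, one per recorded step.
    """
    if len(timestamps) < 3:
        raise ValueError("need at least three timestamps")
    # Period between consecutive steps, then its reciprocal as frequency.
    periods = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    if any(p <= 0 for p in periods):
        raise ValueError("timestamps must be strictly increasing")
    freqs = [1.0 / p for p in periods]
    return statistics.fmean(freqs), statistics.stdev(freqs)


# Synthetic example: 1001 timestamps spaced 10 ms apart (1000 steps).
timestamps = [i * 0.01 for i in range(1001)]
mean_hz, std_hz = measured_frequency(timestamps)
print(f"{mean_hz:.1f} Hz +/- {std_hz:.3f} Hz")
```

In a real recording, the timestamps would come from a monotonic clock such as `time.perf_counter()` sampled once per environment step, and the gap between configured and measured frequency would reveal where the I/O bottleneck sets in.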