T-Rex Dataset — Tactile-Reactive Dexterous Manipulation

Overview

Existing robot manipulation datasets are mostly built around parallel-gripper embodiments or grasp-centric dexterous hands, offering limited coverage of tactile-rich dexterous interactions beyond grasping. To train tactile-reactive policies, we collect a large-scale robot play dataset with synchronized tactile observations on our bimanual dexterous embodiment.

The open-source release of the T-Rex mid-training dataset comprises 50 hours of teleoperated bimanual dexterous manipulation episodes spanning more than 200 daily objects and 22 motion primitives. Episodes are designed around object × motion-primitive combinations for compact coverage of contact-rich behaviors, and include randomly selected distractor objects to encourage instruction-following under visual ambiguity.

Data is collected on a bimanual Dexmate Vega-1 robot with a fixed base and two Sharpa Wave dexterous hands.

Robot Platform

Hardware setup. The T-Rex dataset is collected on a bimanual Dexmate Vega-1 robot with two Sharpa Wave dexterous hands. The on-board sensing rig combines a ZED X Mini head camera and two wide-view ZED X One S wrist cameras with per-fingertip tactile sensors on all ten fingertips. Teleoperation uses Manus gloves for finger retargeting and VIVE trackers for wrist SE(3) pose, with a Windows laptop running the teleoperation interface.

The Vega-1 platform has a fixed base, torso, and head, with two 7-DoF arms actuated during data collection. Each Sharpa Wave hand contributes an additional 22 DoF for in-hand dexterous control and carries five fingertip tactile sensors that report estimated deformation depth and a 6-axis net wrench.

All RGB streams are captured at 640×360, time-synchronized with robot states and tactile signals at 30 Hz. Camera placements are tuned so the head camera covers the full reachable workspace while the wrist cameras retain unoccluded views of the fingers during contact-rich behavior.
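For a rough sense of scale, the figures above fix the number of synchronized timesteps and the uncompressed camera bandwidth. This back-of-the-envelope sketch assumes 8-bit, 3-channel RGB frames and treats the 50 hours as continuous 30 Hz recording; these are uncompressed numbers, and actual storage depends on how the released files are encoded.

```python
# Back-of-the-envelope scale estimates for T-Rex.
# Assumptions: 8-bit RGB frames, continuous 30 Hz recording over all 50 hours.
HOURS = 50
RATE_HZ = 30
WIDTH, HEIGHT, CHANNELS = 640, 360, 3  # RGB resolution from the text
NUM_CAMERAS = 3  # one head camera + two wrist cameras


def total_timesteps(hours: float, rate_hz: int) -> int:
    """Number of synchronized timesteps recorded at rate_hz."""
    return int(hours * 3600 * rate_hz)


def raw_rgb_bytes_per_second(num_cameras: int, rate_hz: int) -> int:
    """Uncompressed RGB bandwidth across all cameras, in bytes per second."""
    bytes_per_frame = WIDTH * HEIGHT * CHANNELS
    return num_cameras * bytes_per_frame * rate_hz


if __name__ == "__main__":
    steps = total_timesteps(HOURS, RATE_HZ)
    bw = raw_rgb_bytes_per_second(NUM_CAMERAS, RATE_HZ)
    print(f"{steps:,} timesteps")  # 5,400,000 timesteps
    print(f"{bw / 1e6:.1f} MB/s uncompressed RGB")  # 62.2 MB/s uncompressed RGB
```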

During teleoperation, VIVE trackers stream SE(3) wrist targets and Manus gloves stream fingertip targets to the same inverse-kinematics pipeline used at deployment. A high-level thread runs at 30 Hz, while a low-level controller tracks the resulting joint targets at 300 Hz.
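With these two rates, each 30 Hz joint target is tracked over 300 / 30 = 10 low-level ticks. The sketch below is a hypothetical illustration, not the T-Rex controller: it assumes the low-level loop linearly interpolates from the previous joint target to the new one across those ticks, which is one common way to smooth a slower command stream. The real controller may use PD tracking or another scheme, and the function name is invented for the example.

```python
import numpy as np

HIGH_RATE_HZ = 30   # high-level thread producing joint targets
LOW_RATE_HZ = 300   # low-level tracking loop
SUBSTEPS = LOW_RATE_HZ // HIGH_RATE_HZ  # 10 low-level ticks per target


def interpolate_targets(prev_q: np.ndarray, next_q: np.ndarray,
                        substeps: int = SUBSTEPS) -> np.ndarray:
    """Hypothetical upsampling of one 30 Hz joint target into 300 Hz setpoints.

    Returns an array of shape (substeps, dof) moving linearly from prev_q
    (exclusive) to next_q (inclusive).
    """
    alphas = np.arange(1, substeps + 1) / substeps  # 0.1, 0.2, ..., 1.0
    return prev_q[None, :] + alphas[:, None] * (next_q - prev_q)[None, :]


# Example: one 7-DoF arm moving from zeros toward a new target.
prev_q = np.zeros(7)
next_q = np.full(7, 0.5)
setpoints = interpolate_targets(prev_q, next_q)
print(setpoints.shape)  # (10, 7)
```

Interpolating rather than jumping straight to each new target avoids step changes in the commanded joint positions every 33 ms.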

What's in Each Episode

Multi-View Vision

Time-synchronized RGB streams from one head camera and two wrist cameras, aligned with all robot states and tactile signals at 30 Hz.
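The page does not describe how the streams are synchronized. One common approach, sketched here as an assumption rather than the T-Rex pipeline, is to resample each sensor onto a shared 30 Hz clock by taking, for every grid tick, the nearest recorded sample. The function name and the sample timestamps are hypothetical.

```python
import numpy as np


def nearest_indices(sample_times: np.ndarray, grid_times: np.ndarray) -> np.ndarray:
    """For each grid time, return the index of the closest sample time.

    sample_times must be sorted ascending (seconds) and hold at least two samples.
    Ties are resolved toward the earlier sample.
    """
    idx = np.searchsorted(sample_times, grid_times)  # insertion points
    idx = np.clip(idx, 1, len(sample_times) - 1)
    left = sample_times[idx - 1]
    right = sample_times[idx]
    use_left = (grid_times - left) <= (right - grid_times)
    return np.where(use_left, idx - 1, idx)


# Irregular sensor timestamps mapped onto a 30 Hz grid.
sample_t = np.array([0.000, 0.031, 0.068, 0.099, 0.135])
grid_t = np.arange(4) / 30.0  # 0.0, 0.033, 0.067, 0.1 s
print(nearest_indices(sample_t, grid_t))  # [0 1 2 3]
```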

Robot States & Actions

Joint positions and end-effector poses at 30 Hz, with current states paired with target actions, for both 7-DoF arms and both 22-DoF dexterous hands.
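Counting the actuated joints from the platform description gives the size of a per-frame joint vector: 2 × 7 arm joints plus 2 × 22 hand joints, or 58 values. The dataclass below is a hypothetical container illustrating that layout; the field names, the left/right ordering, and the 7-number end-effector pose (position plus quaternion) are assumptions, not the dataset's actual schema. A target action could plausibly reuse the same layout.

```python
from dataclasses import dataclass, field

import numpy as np

ARM_DOF = 7
HAND_DOF = 22
NUM_SIDES = 2  # row 0 = left, row 1 = right (assumed ordering)


@dataclass
class BimanualFrame:
    """Hypothetical per-timestep record; not the official T-Rex schema."""
    arm_q: np.ndarray = field(default_factory=lambda: np.zeros((NUM_SIDES, ARM_DOF)))
    hand_q: np.ndarray = field(default_factory=lambda: np.zeros((NUM_SIDES, HAND_DOF)))
    # Assumed end-effector pose encoding: xyz position + xyzw quaternion (placeholder zeros).
    ee_pose: np.ndarray = field(default_factory=lambda: np.zeros((NUM_SIDES, 7)))

    def joint_vector(self) -> np.ndarray:
        """Flatten to [left arm, right arm, left hand, right hand] joint values."""
        return np.concatenate([self.arm_q.ravel(), self.hand_q.ravel()])


frame = BimanualFrame()
print(frame.joint_vector().shape)  # (58,)
```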

Per-Fingertip Tactile

Three synchronized modalities across 10 fingertips: 240×320 raw tactile images, 240×240 deformation maps, and 6-axis force/torque signals.
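A typical first step with the 6-axis signals is to reduce each fingertip's wrench to a force magnitude and flag which fingertips are in contact. This sketch assumes the force array has shape (T, 10, 6) with columns ordered (Fx, Fy, Fz, Tx, Ty, Tz) in newtons and uses an arbitrary 0.5 N contact threshold; check the dataset's metadata for the real layout and units before relying on either.

```python
import numpy as np

NUM_FINGERTIPS = 10  # five per hand


def force_magnitude(wrench: np.ndarray) -> np.ndarray:
    """Euclidean norm of the force part of each wrench.

    wrench: shape (T, NUM_FINGERTIPS, 6), assumed column order
    (Fx, Fy, Fz, Tx, Ty, Tz). Returns shape (T, NUM_FINGERTIPS).
    """
    return np.linalg.norm(wrench[..., :3], axis=-1)


def contact_mask(wrench: np.ndarray, threshold_n: float = 0.5) -> np.ndarray:
    """Boolean (T, NUM_FINGERTIPS) mask of fingertips whose force exceeds threshold_n.

    The 0.5 N default is illustrative, not a calibrated value.
    """
    return force_magnitude(wrench) > threshold_n


# Synthetic example: one timestep, only fingertip 0 pressing with 5 N.
w = np.zeros((1, NUM_FINGERTIPS, 6))
w[0, 0, :3] = [3.0, 4.0, 0.0]
print(force_magnitude(w)[0, 0])         # 5.0
print(contact_mask(w)[0].nonzero()[0])  # [0]
```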

Language Instructions

VLM-generated natural-language task descriptions paired with every episode, intended to support instruction-following under visual ambiguity and distractor clutter.

Object × Primitive Coverage

Episodes are structured around object × motion-primitive combinations spanning 200+ daily objects and 22 motion primitives, giving compact coverage of contact-rich behaviors.
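Because episodes are organized by object × primitive pairs, a natural coverage check is to count episodes per pair and list the pairs with no data. The record format is an assumption: the quickstart's metadata exposes a `motor_primitive` column, but the `object` key and the example values below are hypothetical.

```python
from collections import Counter
from itertools import product


def coverage(episodes, objects, primitives):
    """Count episodes per (object, primitive) pair and report missing pairs.

    episodes: iterable of dicts with keys "object" (hypothetical) and
    "motor_primitive". Returns (counts, missing): a Counter keyed by
    (object, primitive), and the candidate pairs with zero episodes.
    """
    counts = Counter((ep["object"], ep["motor_primitive"]) for ep in episodes)
    missing = [pair for pair in product(objects, primitives) if counts[pair] == 0]
    return counts, missing


episodes = [
    {"object": "carafe", "motor_primitive": "pour"},
    {"object": "carafe", "motor_primitive": "pour"},
    {"object": "matchbox", "motor_primitive": "shake"},
]
counts, missing = coverage(episodes, ["carafe", "matchbox"], ["pour", "shake"])
print(counts[("carafe", "pour")])  # 2
print(missing)  # [('carafe', 'shake'), ('matchbox', 'pour')]
```

Many pairs in a full Cartesian product are physically meaningless (pouring from a matchbox, for instance), so in practice the candidate list should come from the task taxonomy rather than `product` over everything.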

Scene Diversification

Each scene randomizes both the distractor objects and the patterned-cloth background, producing visually ambiguous and varied contexts that push the policy to rely on the language instruction and discourage visual overfitting.
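Per-episode scene randomization amounts to sampling a few distractors and a background. The sketch below is hypothetical: the object pool, background names, and the 1 to 3 distractor range are invented for illustration, and the page does not state how many distractors are used or how they are chosen.

```python
import random


def sample_scene(target: str, object_pool: list[str], backgrounds: list[str],
                 rng: random.Random, max_distractors: int = 3) -> dict:
    """Draw a hypothetical scene: target object, random distractors, cloth background.

    Distractors are sampled without replacement and never include the target.
    """
    candidates = [o for o in object_pool if o != target]
    k = rng.randint(1, min(max_distractors, len(candidates)))
    return {
        "target": target,
        "distractors": rng.sample(candidates, k),
        "background": rng.choice(backgrounds),
    }


rng = random.Random(0)  # seeded so the draw is reproducible
pool = ["carafe", "matchbox", "battery", "air blower", "book"]
backgrounds = ["striped cloth", "floral cloth"]
scene = sample_scene("carafe", pool, backgrounds, rng)
print(scene)
```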

Dataset Composition

T-Rex spans a long-tail distribution over both objects and motion primitives. The three charts below summarize this distribution: the share of demonstration time per high-level task category, the hours of data per motion primitive, and the number of demonstrations per household object.

Figure: Composition of the T-Rex dataset. Top-left (pie chart): share of total demonstration time across the high-level task categories covered by the T-Rex taxonomy. Top-right (bar chart): hours of demonstration data per motion primitive across the 22 verbs in the T-Rex taxonomy. Bottom: number of demonstrations collected for each of the 200+ daily household objects, ordered by frequency.
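Statistics like those in the charts can be recomputed from per-episode metadata: sum episode lengths per primitive and convert frames to hours at 30 Hz, and count episodes per object. This sketch assumes each record carries a frame count under a hypothetical `num_frames` key alongside a hypothetical `object` field and the `motor_primitive` field; the records themselves are toy values.

```python
from collections import Counter, defaultdict

RATE_HZ = 30


def hours_per_primitive(episodes) -> dict:
    """Total demonstration hours per motion primitive, from frame counts at 30 Hz."""
    frames = defaultdict(int)
    for ep in episodes:
        frames[ep["motor_primitive"]] += ep["num_frames"]
    return {prim: n / RATE_HZ / 3600 for prim, n in frames.items()}


def demos_per_object(episodes) -> list:
    """(object, count) pairs sorted by descending frequency."""
    return Counter(ep["object"] for ep in episodes).most_common()


# Toy records, not real T-Rex statistics.
episodes = [
    {"object": "carafe", "motor_primitive": "pour", "num_frames": 54_000},
    {"object": "carafe", "motor_primitive": "screw", "num_frames": 27_000},
    {"object": "battery", "motor_primitive": "insert", "num_frames": 108_000},
]
print(hours_per_primitive(episodes))  # {'pour': 0.5, 'screw': 0.25, 'insert': 1.0}
print(demos_per_object(episodes))     # [('carafe', 2), ('battery', 1)]
```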

Sample Trajectories

A small selection of demonstrations is shown below, covering deformable-object manipulation, force-sensitive contact, bimanual coordination, and precision insertion. The full dataset spans many more tasks, objects, and scene configurations.

Fold aluminum foil

Wrap cable

Peel bandaid

Press bumper sticker

Squeeze air blower

Wipe book cover

Open clamshell container

Cut origami paper

Pour from carafe

Insert battery

Screw on carafe lid

Shake matchbox

Getting Started

🔍 Explore Dataset on GitHub

Interactive Dataset Visualizer

Open visualizer →

Browse a 500-trajectory random subset, filter by object and motion primitive, and resample on demand — all directly in your browser.


Dataset Quickstart (Colab)

quickstart.ipynb

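# !pip install -q huggingface_hub pandas pyarrow av numpy matplotlib
# meta_root, load_episodes, fetch_episode and load_array are helpers provided by quickstart.ipynb
SOURCE = "zekaiwang/trex_dataset"  # streamed on demand from the Hub

# 1. Browse & filter: only meta/ (a few MB) is downloaded
eps = load_episodes(meta_root(SOURCE))
reach_eps = eps[eps["motor_primitive"] == "reach"]
reach_eps.head()  # preview in the notebook

# 2. Inspect one episode: pulls just that episode's files
ep_row = reach_eps.iloc[0]  # assumes eps is a pandas DataFrame with one row per episode
droot = fetch_episode(SOURCE, ep_row)
f6 = load_array(droot, "observation.tactile_force", ep_row)  # per-fingertip force/torque signals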

A single-notebook quickstart that streams the dataset straight from Hugging Face: browse and filter every episode from metadata alone, then pull individual episodes to inspect the three cameras, the 20 tactile image streams (a raw image and a deformation map per fingertip), and per-finger forces, with no full download required.

Open in Colab → View on GitHub
