The data layer for physical AI

Action-aligned human data for robot learning.

Manipulation policies are bottlenecked on diverse, real-world, action-labeled demonstrations. We capture them from real human work, using egocentric video and UMI handheld grippers, and deliver training-ready trajectories in your format and to your spec, with the precise boundaries, contact labels, and sync that decide whether a policy actually learns.

  • Egocentric + UMI capture
  • Temporal action segmentation
  • Your format: LeRobot v3 · RLDS · custom
  • Verified to your spec

Egocentric video · UMI handheld-gripper · Temporal action segmentation · Contact & grasp labeling · Frame-accurate sync · 6-DoF end-effector pose · Language grounding · LeRobotDataset v3.0 · RLDS export · Cross-embodiment retargeting

What we do

Real human work, turned into trainable robot data.

Two capture modes feed one curation pipeline, filling the "empty corner" of robot data: demonstrations that are at once cheap to collect, diverse across real-world settings, and directly action-aligned.

Capture

First-person video and handheld-gripper demos from real, varied tasks and environments — not a single lab bench.

Curate

Calibration, tight sensor sync, temporal action segmentation, contact/grasp labeling, language grounding — automated first pass, human-verified.

Deliver

Trajectories ready for vision-language-action (VLA) training, delivered to your schema with a per-episode QA report. We can also validate them on your stack.

  • 2 modes: egocentric + UMI capture
  • Frame-accurate: action↔video synchronization
  • Your format: LeRobot v3 · RLDS · HDF5 · custom
  • To spec: your schema & taxonomy

The data we collect

Two complementary capture modes.

[Image: egocentric head-mounted camera capturing a first-person field of view]

Egocentric human video

Head- or chest-mounted capture of people doing real tasks: long-horizon, naturally diverse, and hard to reproduce in simulation.

  • Streams: wide-FoV RGB, IMU, optional eye-gaze & wrist IMUs/gloves.
  • Derived: 3D hand pose, object tracking, contact events, camera trajectory.
  • Best for: long-tail, bimanual, tool-use; human-data co-training.

[Image: UMI handheld gripper holding a Rubik's cube]

UMI handheld-gripper demos

A low-cost handheld gripper + fisheye camera collects robot-shaped, directly action-aligned demos — no robot required.

  • Streams: 155° fisheye RGB + IMU, gripper-width, mirror stereo cues.
  • Derived: 6-DoF end-effector trajectory + gripper width via visual-inertial pose tracking (gripper camera + IMU).
  • Best for: contact-rich manipulation that transfers to parallel-jaw grippers & arms.

We work in the field's standard methods and interfaces.

We use handheld capture in the UMI style, visual-inertial pose recovery, and established hand/object tracking, and we deliver into the LeRobot / RLDS ecosystem. Our edge is operational: doing it reliably, at real-world diversity, with rigorous verification.

UMI-style capture · Visual-inertial pose · Action segmentation · LeRobot · RLDS

How it works

Raw capture → trainable trajectories.

Off-the-shelf models give primitives, not trainable labels. The value-determining parts — exact boundaries, action↔observation sync, contact instants, sub-task structure — are what general tools miss, and what we specialize in.

01

Calibrate & sync

Intrinsics/extrinsics + frame-accurate time-alignment across every sensor.

02

Track

Visual-inertial pose tracking for camera/end-effector; hand-pose + object segmentation.

03

Segment

Temporal action segmentation: precise boundaries + grasp/release events.

04

Ground

Per-segment language, mapped to your task taxonomy.

05

Verify & export

Human QA vs a gold set; export to LeRobot v3 / RLDS, ready to train.
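
To make the sync step concrete, here is a minimal sketch of nearest-timestamp alignment between video frames and a faster sensor stream. It assumes both streams are already on one shared clock; the function name `align_nearest` and the `max_skew_s` budget are illustrative assumptions, not part of our pipeline's actual interface.

```python
from bisect import bisect_left

def align_nearest(frame_ts, sensor_ts, max_skew_s):
    """For each frame timestamp, return the index of the nearest sensor sample,
    or None if the nearest sample is further away than max_skew_s.
    Both lists must be sorted ascending and expressed in seconds on one clock."""
    matches = []
    for t in frame_ts:
        i = bisect_left(sensor_ts, t)
        # Candidates: the sample just before and the sample at/after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_ts)]
        best = min(candidates, key=lambda j: abs(sensor_ts[j] - t), default=None)
        if best is not None and abs(sensor_ts[best] - t) > max_skew_s:
            best = None  # outside the (hypothetical) sync budget
        matches.append(best)
    return matches

# Synthetic timestamps: 30 fps video against a 200 Hz sensor stream.
frames = [k / 30 for k in range(4)]
imu = [k / 200 for k in range(40)]
idx = align_nearest(frames, imu, max_skew_s=0.005)
```

Frames that come back as `None` have no sample inside the budget, which is the kind of case a per-episode QA report would flag rather than silently interpolate.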

[Figure: temporal action segmentation. A raw demonstration split into segments: reach, grasp, transport, place.]

A long demonstration, automatically split into labeled action segments — then human-verified.
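
As a rough illustration of how grasp/release events can anchor segment boundaries, the sketch below applies simple hysteresis thresholds to a gripper-width signal and cuts the episode at each event. This is a deliberately simple heuristic, not the segmentation model itself; the thresholds and function names are assumptions chosen for the example.

```python
def grasp_release_events(widths, close_below, open_above):
    """Return (frame_index, event) pairs from a gripper-width series.
    Hysteresis: 'grasp' fires when the width drops below close_below while open,
    'release' fires when it rises above open_above while closed."""
    events = []
    closed = False
    for i, w in enumerate(widths):
        if not closed and w < close_below:
            events.append((i, "grasp"))
            closed = True
        elif closed and w > open_above:
            events.append((i, "release"))
            closed = False
    return events

def split_at_events(n_frames, events):
    """Turn event frames into contiguous [start, end) segments covering the episode."""
    cuts = [0] + [i for i, _ in events if 0 < i < n_frames] + [n_frames]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]

# Synthetic gripper widths in meters, one value per frame.
widths_m = [0.08, 0.08, 0.05, 0.02, 0.02, 0.02, 0.03, 0.07, 0.08]
events = grasp_release_events(widths_m, close_below=0.03, open_above=0.06)
segments = split_at_events(len(widths_m), events)
```

Using two thresholds instead of one keeps a noisy width signal from producing a burst of spurious grasp/release pairs near the boundary.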

A concise technical overview (rigs, sync budget, models, schema) is available to teams in discussion with us.

What you get

Drops straight into your training stack.

Each delivered episode is a time-synchronized, labeled trajectory in your format. Example below: a LeRobotDataset v3.0 episode (RLDS, HDF5, or custom on request).

episode_0007 · LeRobotDataset v3.0
# one episode = one task demonstration
observation.images.*   : mp4   # wrist + scene cams
observation.state      : [ee_pose, gripper]
action                 : [next ee target, gripper]
task                   : "bag the groceries"   # language instruction
episode_index          : which episode a frame belongs to
frame_index, timestamp : position within the episode
fps                    : 30
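
A quick structural check on a delivered episode might look like the sketch below. It works on a plain in-memory list of per-frame dicts rather than the LeRobot loader, and the `timestamp` key, the 7-value state/action layout (6-DoF pose plus gripper), and the `tol_s` tolerance are assumptions made for the example, not a fixed schema.

```python
def check_episode(frames, fps, state_dim, action_dim, tol_s=1e-3):
    """Return a list of problems found in one episode; an empty list means it passed.
    `frames` is a list of dicts with 'timestamp', 'observation.state', 'action'."""
    if not frames:
        return ["episode has no frames"]
    problems = []
    t0 = frames[0]["timestamp"]
    for i, f in enumerate(frames):
        if len(f["observation.state"]) != state_dim:
            problems.append(f"frame {i}: state has wrong length")
        if len(f["action"]) != action_dim:
            problems.append(f"frame {i}: action has wrong length")
        # Each frame should sit on the regular grid implied by fps.
        if abs(f["timestamp"] - (t0 + i / fps)) > tol_s:
            problems.append(f"frame {i}: timestamp off the {fps} fps grid")
    return problems

# Three synthetic frames at 30 fps with 7-value state and action vectors.
episode = [
    {"timestamp": k / 30, "observation.state": [0.0] * 7, "action": [0.0] * 7}
    for k in range(3)
]
problems = check_episode(episode, fps=30, state_dim=7, action_dim=7)
```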

Every delivery includes

  • Trajectories to your spec — your fields, frame rate, taxonomy.
  • A QA report — per-episode quality scores vs a gold set.
  • Capture context — task, environment, rig metadata per episode.
  • Optional — cross-embodiment retargeting to your robot.

We learn your exact data spec first, then deliver a small paid pilot before any scale-up.

How we prove it's trainable

"Looks labeled" isn't the bar. "Trains a policy" is.

We treat quality as something to measure, not assert — acceptance criteria are set up front and reported against.

Gold-set agreement

A held-out gold standard + inter-annotator agreement on boundaries, contacts, and labels — quality as a number, not an opinion.

Tolerance to your spec

Boundary & contact tolerances, action↔video sync, and pose accuracy set to your thresholds and verified per episode.

Policy-level validation

On request, we train a baseline VLA (via LeRobot) on a sample to confirm the data moves task success — not just that it parses.
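
One way to turn boundary agreement into a number is precision, recall, and F1 at a frame tolerance, sketched below. This is one common choice of metric, shown under the assumption that boundaries are frame indices; the function name and the greedy matching are illustrative, and the actual acceptance metric is whatever the spec agrees on.

```python
def boundary_f1(predicted, gold, tol_frames):
    """Greedy one-to-one matching of predicted to gold boundaries within tol_frames.
    Returns (precision, recall, f1). Greedy matching is simple but not guaranteed
    optimal when boundaries are closer together than the tolerance."""
    gold_left = sorted(gold)
    hits = 0
    for p in sorted(predicted):
        match = next((g for g in gold_left if abs(g - p) <= tol_frames), None)
        if match is not None:
            gold_left.remove(match)  # each gold boundary can be matched once
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Synthetic example: three predicted boundaries, four gold, tolerance of 2 frames.
p, r, f1 = boundary_f1(predicted=[30, 92, 150], gold=[31, 90, 120, 151], tol_frames=2)
```

Here every predicted boundary lands within tolerance of a gold one, but the gold boundary at frame 120 is missed, so precision is perfect while recall is not.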

How we work

A low-risk path from spec to scale.

1

Spec & pilot

A short call to capture your exact data spec, then a fixed-price pilot — a small, defined batch to that spec, typically in a few weeks.

2

Evaluate

You assess the pilot against the agreed criteria (and train on it if you like). We iterate on the spec based on your feedback.

3

Scale

Once the data clears your bar, we scale collection across tasks and environments on a recurring basis.

Collaborative by default — we work inside your formats and conventions, and we're happy to sign an NDA.

FAQ

The questions labs ask first.

What format do you deliver in?

LeRobotDataset v3.0 by default; RLDS/TFDS, HDF5, or a custom layout on request. We conform to your field names, frame rate, camera layout, and label taxonomy rather than imposing a fixed schema of our own.

Can we specify the exact tasks and environments?

Yes. Task distribution, objects, environments, and edge cases (including failures and recoveries) are part of the spec we agree on before collection.

How do you guarantee quality?

Acceptance criteria are defined up front (boundary/contact tolerances, sync, pose accuracy). We report per-episode QA against a gold set with inter-annotator agreement, and can validate a sample by training a baseline policy on it.

Why not just collect this ourselves, or use a generic labeling vendor?

You can — many teams do for a while. We exist for the part that's annoying to scale: diverse real-world capture plus robotics-specific curation (boundaries, contacts, sync) at a cost basis that's hard to staff internally. If self-collection is working for you, we're not a fit.

Who owns the data, and can you sign an NDA?

Data ownership and licensing follow your program's terms — typically you own what we deliver. We're happy to work under an NDA. Any specific privacy or compliance requirements are something we'll scope with you.

Can the data target our specific robot?

The base deliverable is end-effector / hand-action trajectories. Cross-embodiment retargeting to your robot is an optional add-on; we're explicit about where it's reliable and where it isn't.

What's the smallest way to start?

A scoped, fixed-price pilot on one task to your spec — cheap enough to be a quick "yes," concrete enough to judge.

Work with us

Tell us what your models are hungry for.

We're partnering with a small number of robot-learning teams on initial datasets. If you have raw capture you haven't curated, or a task distribution you can't get data for, a 30-minute call to understand your data spec is the best place to start.

SF Bay Area · collection across diverse real-world environments · happy to sign an NDA