Manipulation policies are bottlenecked on diverse, real-world, action-labeled demonstrations. We capture them from real human work (egocentric video and UMI-style handheld-gripper demos) and deliver training-ready trajectories in your format and to your spec, with the boundaries, contacts, and sync that decide whether a policy actually learns.
Two capture modes feed one curation pipeline, filling the "empty corner" of robot data: cheap, diverse across real-world settings, and directly action-aligned.
First-person video and handheld-gripper demos from real, varied tasks and environments — not a single lab bench.
Calibration, tight sensor sync, temporal action segmentation, contact/grasp labeling, language grounding — automated first pass, human-verified.
VLA-ready trajectories to your schema with a per-episode QA report — and we can validate them on your stack.

Head/chest-mounted capture of people doing real tasks: long-horizon, naturally diverse, and hard to reproduce faithfully in simulation.

A low-cost handheld gripper + fisheye camera collects robot-shaped, directly action-aligned demos — no robot required.
We build on established methods: UMI-style handheld capture, visual-inertial pose recovery, hand/object tracking, and delivery into the LeRobot / RLDS ecosystem. Our edge is operational: doing this reliably, at real-world diversity, with rigorous verification.
Off-the-shelf models give primitives, not trainable labels. The parts that determine a dataset's value (exact boundaries, action↔observation sync, contact instants, sub-task structure) are what general tools miss, and what we specialize in.
Intrinsics/extrinsics + frame-accurate time-alignment across every sensor.
Visual-inertial pose tracking for camera/end-effector; hand-pose + object segmentation.
Temporal action segmentation: precise boundaries + grasp/release events.
Per-segment language, mapped to your task taxonomy.
Human QA vs a gold set; export to LeRobot v3 / RLDS, ready to train.
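To make the calibration-and-sync step more concrete, here is a minimal Python sketch of two common alignment operations. It rests on illustrative assumptions rather than a description of our production pipeline: a constant clock offset between two streams sampled at the same rate is estimated by normalized cross-correlation over integer sample lags, and each camera frame is then matched to the nearest sensor sample within a tolerance. The function names, the signal pairing, and the tolerance are hypothetical; a real rig may instead rely on hardware triggers or continuous-time interpolation.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def estimate_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Hypothetical helper: integer lag (samples) that best aligns `other` to `ref`.

    Positive lag means `other` runs late: other[i + lag] ~ ref[i].
    Assumes both signals share one sample rate (e.g. a hypothetical pairing
    of gyro magnitude vs. camera-derived angular speed).
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref[i], other[i + lag])
                 for i in range(len(ref)) if 0 <= i + lag < len(other)]
        if len(pairs) < 2:
            continue
        ma = sum(a for a, _ in pairs) / len(pairs)
        mb = sum(b for _, b in pairs) / len(pairs)
        num = sum((a - ma) * (b - mb) for a, b in pairs)
        den = (sum((a - ma) ** 2 for a, _ in pairs)
               * sum((b - mb) ** 2 for _, b in pairs)) ** 0.5
        # Pearson correlation over the overlapping window; flat windows are skipped.
        if den > 0 and num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag


def match_to_frames(frame_ts: Sequence[float], sensor_ts: Sequence[float],
                    tol: float) -> List[Optional[int]]:
    """Hypothetical helper: index of the nearest sensor sample for each frame
    time, or None if that sample is more than `tol` seconds away.
    `sensor_ts` must be sorted ascending."""
    out: List[Optional[int]] = []
    for t in frame_ts:
        j = bisect_left(sensor_ts, t)
        candidates = [k for k in (j - 1, j) if 0 <= k < len(sensor_ts)]
        if not candidates:
            out.append(None)
            continue
        best = min(candidates, key=lambda k: abs(sensor_ts[k] - t))
        out.append(best if abs(sensor_ts[best] - t) <= tol else None)
    return out
```

In this sketch, the lag from `estimate_lag` would be converted to seconds (divided by the sample rate) and subtracted from the sensor timestamps before matching; the share of `None` matches and the residual time differences are natural per-episode sync metrics.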
A long demonstration, automatically split into labeled action segments — then human-verified.
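For handheld-gripper demos, grasp and release instants often show up directly in the gripper aperture signal. The sketch below is a deliberately simple, hypothetical heuristic, not a description of the actual segmentation pipeline: it detects grasp/release events with a two-threshold (hysteresis) rule and cuts the episode at those frames. The thresholds and aperture units are placeholder assumptions; in practice such events would seed candidate boundaries that a human then verifies.

```python
from typing import List, Tuple


def grasp_events(width: List[float], close_thr: float,
                 open_thr: float) -> List[Tuple[int, str]]:
    """Hypothetical heuristic: grasp/release frames from gripper aperture.

    'grasp' fires when width drops below close_thr while open; 'release'
    fires when it rises above open_thr while closed. Using open_thr >
    close_thr (hysteresis) avoids chatter around a single threshold.
    """
    if open_thr <= close_thr:
        raise ValueError("open_thr must exceed close_thr")
    events: List[Tuple[int, str]] = []
    closed = False
    for i, w in enumerate(width):
        if not closed and w < close_thr:
            events.append((i, "grasp"))
            closed = True
        elif closed and w > open_thr:
            events.append((i, "release"))
            closed = False
    return events


def split_segments(n_frames: int, events: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Cut [0, n_frames) at each event frame into half-open (start, end) segments."""
    cuts = [0] + [i for i, _ in events if 0 < i < n_frames] + [n_frames]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]
```

With a single threshold, a brief aperture wobble while holding an object would register as a spurious release and regrasp; the hysteresis band suppresses that.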
A concise technical overview (rigs, sync budget, models, schema) is available to teams under discussion. See the technical overview →
Each delivered episode is a time-synchronized, labeled trajectory in your format. Example below: a simplified LeRobotDataset v3.0 episode with illustrative field names (RLDS, HDF5, or custom on request).
# one episode = one task demonstration
observation.images.*   : mp4   # wrist + scene cams
observation.state      : [ee_pose, gripper]
action                 : [next ee target, gripper]
language_instruction   : "bag the groceries"
is_first / is_last     : episode boundaries
fps                    : 30
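One detail in the example episode is that `action` at frame t is the end-effector target for the next frame. A minimal Python sketch of deriving such actions from recorded states, and of renaming fields to a customer's conventions, might look like this. It assumes absolute next-state targets (many pipelines use deltas instead); the helper names, the row layout, and the field map are hypothetical, and the code does not call the LeRobot library.

```python
from typing import Dict, List, Sequence


def next_state_actions(states: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed convention: action[t] = state[t + 1]; the last frame holds its own state."""
    if not states:
        return []
    return [list(states[t + 1]) for t in range(len(states) - 1)] + [list(states[-1])]


def episode_rows(states: Sequence[Sequence[float]], fps: float,
                 instruction: str) -> List[Dict]:
    """Hypothetical per-frame rows using the field names from the example episode."""
    actions = next_state_actions(states)
    last = len(states) - 1
    return [
        {
            "timestamp": t / fps,
            "observation.state": list(s),  # [ee_pose..., gripper]
            "action": a,                   # [next ee target..., gripper]
            "language_instruction": instruction,
            "is_first": t == 0,
            "is_last": t == last,
        }
        for t, (s, a) in enumerate(zip(states, actions))
    ]


def rename_fields(row: Dict, field_map: Dict[str, str]) -> Dict:
    """Map our field names onto a customer schema; unmapped keys pass through."""
    return {field_map.get(k, k): v for k, v in row.items()}
```

Keeping the action derivation in one function makes it straightforward to switch to delta actions or a different gripper encoding if the agreed spec calls for it.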
We learn your exact data spec first, then deliver a small paid pilot before any scale-up.
We treat quality as something to measure, not assert — acceptance criteria are set up front and reported against.
A held-out gold standard + inter-annotator agreement on boundaries, contacts, and labels — quality as a number, not an opinion.
Boundary & contact tolerances, action↔video sync, and pose accuracy set to your thresholds and verified per episode.
On request, we train a baseline VLA (via LeRobot) on a sample to confirm the data improves task success, not just that it parses.
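Measuring quality as a number can be sketched as follows, under illustrative assumptions. Boundary agreement is scored as an F1 in which a boundary counts as a hit if it matches a gold boundary one-to-one within a time tolerance; the same function can compare two annotators to give an inter-annotator agreement figure. A per-episode report then checks each metric against an agreed minimum or maximum. The greedy matching rule, metric names, and threshold values are hypothetical placeholders, not actual acceptance criteria, which are set per customer.

```python
from typing import Dict, Sequence, Tuple


def boundary_f1(pred: Sequence[float], gold: Sequence[float], tol: float) -> float:
    """Hypothetical metric: F1 over boundary times (seconds).

    Greedy one-to-one matching: each predicted boundary may claim the nearest
    still-unmatched gold boundary if it lies within `tol`.
    """
    if not pred and not gold:
        return 1.0
    unmatched = sorted(gold)
    hits = 0
    for p in sorted(pred):
        best = min(unmatched, key=lambda g: abs(g - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            hits += 1
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred), hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def qa_report(metrics: Dict[str, float],
              criteria: Dict[str, Tuple[str, float]]) -> Dict[str, bool]:
    """Pass/fail per metric; criteria map name -> ("min" | "max", threshold)."""
    return {
        name: (metrics[name] >= thr) if kind == "min" else (metrics[name] <= thr)
        for name, (kind, thr) in criteria.items()
    }
```

Under this sketch, an episode would be accepted only if every entry in its report is `True`.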
A short call to capture your exact data spec, then a fixed-price pilot — a small, defined batch to that spec, typically in a few weeks.
You assess the pilot against the agreed criteria (and train on it if you like). We iterate the spec from your feedback.
Once the data clears your bar, we scale collection across tasks and environments on a recurring basis.
Collaborative by default — we work inside your formats and conventions, and we're happy to sign an NDA.
LeRobotDataset v3.0 by default; RLDS/TFDS on request. We conform to your field names, frame rate, camera layout, and label taxonomy — not a fixed schema of ours.
Yes — task distribution, objects, environments, and edge cases (including failures and recoveries) are part of the spec we agree before collection.
Acceptance criteria are defined up front (boundary/contact tolerances, sync, pose accuracy). We report per-episode QA against a gold set with inter-annotator agreement, and can validate a sample by training a baseline policy on it.
You can, and many teams do for a while. We exist for the part that's annoying to scale: diverse real-world capture plus robotics-specific curation (boundaries, contacts, sync) at a cost that's hard to match with an internal team. If self-collection is working for you, we're not a fit.
Data ownership and licensing follow your program's terms; typically you own what we deliver. We're happy to work under an NDA, and we'll scope any specific privacy or compliance requirements with you.
The base deliverable is end-effector / hand-action trajectories. Cross-embodiment retargeting to your robot is an optional add-on; we're explicit about where it's reliable and where it isn't.
A scoped, fixed-price pilot on one task to your spec — cheap enough to be a quick "yes," concrete enough to judge.
We're partnering with a small number of robot-learning teams on initial datasets. If you have raw capture you haven't curated, or a task distribution you can't get data for, a 30-minute call to understand your data spec is the best place to start.
SF Bay Area · collection across diverse real-world environments · happy to sign an NDA