
Dataset/Code Release V1


High-level summary



  1. Training data (train_synthetic/). You can access it after signing up for the challenge.
  2. Example test data (test_synthetic/) where tracking loss segments were already chosen. The example is included in the


  1. Generates simulated tracking failures, creates initial conditions with an error model, and adds noise to simulated IMU data
  2. Performs evaluation/analysis (by computing VMAE and comparing to pure IMU integration) and visualizes results
  3. Includes an Arcturus pre-trained model

Available at:

Data Details

Data format

We share our data as a set of hdf5 files (one file per recording). The DIPr GitHub repo contains sample code for reading and using the data. Unpacked data has the format below:

  1. A numpy array of shape (N, 7) holding the IMU samples
  2. A numpy array of shape (N, 32) holding the states

Helper classes were added in the code to access the data by name.

imu, states, segments = load_hdf5_data()

# time | gyro xyz | acc xyz | END
# 0      1          4         7
imu: np.ndarray # shape = (N_samples, 7)

# time | isometry 1x16 | vel xyz | w xyz | gb xyz | ab xyz | gravity xyz | END
# 0      1               17        20      23       26       29            32
states: np.ndarray # shape = (N_samples, 32)

# Test only: contains N_segments of (beg_time, end_time) tuples in seconds
segments: np.ndarray # shape = (N_segments, 2)

# where
# time, seconds
# gyro rad/s, acc m/s2 are measurements in IMU frame, acc is inverseOfBodyAccel convention
# isometry is a transform from IMU frame to world
# vel m/s, w rad/s are linear and angular velocities of IMU in world
# gb rad/s, ab m/s2 are gyro and accel biases, that are zeros for synthetic data
# gravity m/s2 is gravity in world and equal to [0, 9.81, 0] for synthetic dataset
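
The column layout above can be turned into named slices. The sketch below is illustrative only: `IMU_COLS`, `STATE_COLS`, and `split_columns` are hypothetical names (not the helper classes from the repo), the arrays are stand-ins rather than real recordings, and the 1x16 isometry is assumed to be a row-major flattened 4x4 matrix.

```python
import numpy as np

# Column ranges copied from the layout comments above (start inclusive, end exclusive).
IMU_COLS = {"time": (0, 1), "gyro": (1, 4), "acc": (4, 7)}
STATE_COLS = {
    "time": (0, 1), "isometry": (1, 17), "vel": (17, 20), "w": (20, 23),
    "gb": (23, 26), "ab": (26, 29), "gravity": (29, 32),
}

def split_columns(array, layout):
    """Return a dict of named column views into a 2D array."""
    return {name: array[:, beg:end] for name, (beg, end) in layout.items()}

# Stand-in arrays with the documented shapes (not real data).
n_samples = 5
imu = np.zeros((n_samples, 7))
states = np.zeros((n_samples, 32))
states[:, 1:17] = np.eye(4).reshape(16)   # identity pose in every row
states[:, 29:32] = [0.0, 9.81, 0.0]       # gravity of the synthetic dataset

imu_fields = split_columns(imu, IMU_COLS)
state_fields = split_columns(states, STATE_COLS)

# Assumption: the 16 isometry values form a row-major 4x4 matrix.
poses = state_fields["isometry"].reshape(-1, 4, 4)
rotations = poses[:, :3, :3]   # IMU-to-world rotation
positions = poses[:, :3, 3]    # IMU position in world
```

If the stored isometry turns out to be column-major, transposing each 4x4 matrix after the reshape would be needed; the sample code in the repo is the authority on that.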

This format is used in case we want to release live recorded tracker data in the future, which may have different biases and gravity estimates per frame.

Dataset Recording / Creation

The data in this release is synthetic only and was generated as follows:

  • We recorded head-mounted VR headset trajectories while a user was in VR (playing games, watching Google Earth, etc). Those trajectories were recorded using a Lighthouse tracking system providing data at ~200 Hz (the rate can vary). Around a dozen people participated in the recording for data diversity.
  • We then fit B-splines to the trajectories and use them to generate synthetic IMU data and synthetic ground truth positions and velocities at 1000 Hz. The synthetic IMU data is generated from the first derivative of the rotation and the second derivative of the translation.
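
The idea of differentiating a fitted curve can be sketched as follows. This is not the generation code used for the dataset: a single polynomial stands in for the B-spline, only one translation axis is shown (the rotational part is omitted), and the input trajectory is an invented sine wave.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Stand-in for one recorded position axis: samples at 200 Hz over one second.
t_rec = np.linspace(0.0, 1.0, 201)
x_rec = np.sin(2.0 * t_rec)

# Fit a smooth curve (a degree-9 polynomial here, not a B-spline).
curve = Polynomial.fit(t_rec, x_rec, deg=9)

# Resample the fitted curve and its derivatives at 1000 Hz.
t_syn = np.linspace(0.0, 1.0, 1001)
pos = curve(t_syn)
vel = curve.deriv(1)(t_syn)   # first derivative: velocity
acc = curve.deriv(2)(t_syn)   # second derivative: acceleration
```

Because the derivatives are taken analytically from the fitted curve, the resampled velocity and acceleration are consistent with the resampled positions, which is what makes the synthetic IMU data "perfect".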

This release includes a total duration of about 6.54 hours of human interaction with VR applications. The provided model was trained on similar data.

Note that our synthetic IMU samples are perfect and have zero biases, unlike IMU data recorded in a real-life setup. We don’t consider this a problem: when running live with a SLAM tracker, a current estimate of the biases is always available and can be subtracted from the signal (up to the bias estimation error).

Example Test Data Details

We provide a dataset named OpenVR_2021-09-02_17-40-34-synthetic.hdf5 inside the DIPr GitHub repo ('shared/test_synthetic'). It is a synthetic dataset containing perfect IMU data, initial positions, and velocities.

For such data, a simple IMU-only integration can produce a trajectory that closely matches the ground truth without any deep learning. Indeed, there are no initial errors, IMU noise, or bias errors that could accumulate and create drift.
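
A minimal IMU-only integration might look like the sketch below. The function names are hypothetical, the biases are taken as zero (as in the synthetic data), and the sign convention `acc_world = R @ acc - gravity` is an assumption that should be checked against the sample code.

```python
import numpy as np

def so3_exp(phi):
    """Rotation matrix for a rotation vector phi (Rodrigues formula)."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3)
    x, y, z = phi / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def integrate_imu(times, gyro, acc, rot0, pos0, vel0, gravity):
    """Dead-reckon position, velocity, and orientation from IMU samples."""
    rot, pos, vel = rot0.copy(), pos0.copy(), vel0.copy()
    positions = [pos.copy()]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Assumed convention: world acceleration = R @ acc - gravity.
        acc_world = rot @ acc[k - 1] - gravity
        pos = pos + vel * dt + 0.5 * acc_world * dt**2
        vel = vel + acc_world * dt
        # Gyro is measured in the IMU frame, so the increment multiplies on the right.
        rot = rot @ so3_exp(gyro[k - 1] * dt)
        positions.append(pos.copy())
    return np.array(positions), vel, rot

# Invented example: 1 s at 1000 Hz, no rotation, 1 m/s2 along world x.
gravity = np.array([0.0, 9.81, 0.0])
times = np.arange(1001) * 1e-3
gyro = np.zeros((1001, 3))
acc = np.tile(gravity + np.array([1.0, 0.0, 0.0]), (1001, 1))
positions, vel_end, rot_end = integrate_imu(
    times, gyro, acc, np.eye(3), np.zeros(3), np.zeros(3), gravity
)
```

With perfect inputs this stays on the true trajectory; any error in the initial velocity or gravity grows linearly or quadratically in position over time, which is why the noise model matters.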

Code Details


Solution example

We provide a simple solution with a single test dataset, sample code, and a pretrained CNN model. For the challenge, you should create your own solution based on this example.

Real Data simulation

To evaluate the model in conditions close to real life, we add errors to the initial states and noise to the IMU signal. The parameters defining the error and noise magnitudes were obtained from real-life experiments running Arcturus Industries SLAM on a VR device.

Our noise/error model includes:

  • For initial conditions: each time tracking falls back from SLAM to inertial-only prediction, we may add errors to the SLAM initial condition vector as follows:
    • The timestamp will not contain any error
    • The headset pose is error-free since we will be testing the prediction quality relative to initial pose
    • The linear velocity will have an error sampled from N(0, 0.01 m/s)
    • The gravity will have an error sampled from N(0, 0.01 m/s2)
    • The IMU biases will have errors sampled from N(0, 0.01 m/s2) for the accelerometer and N(0, 0.2 º/s) for the gyroscope
  • For IMU data:
    • Random IMU intrinsics (per fallback simulation): scale errors sampled from U(0.5%) and cross-axis rotation errors sampled from U(0.02º)
    • IMU Gaussian noise (per IMU sample) is generated using N(0, 0.12 º/s) for the gyroscope and N(0, 0.03 m/s2) for the accelerometer.
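
The error model above could be sampled roughly as in the following sketch. It is not the challenge implementation: the function names are hypothetical, the second argument of N(.,.) is assumed to be a standard deviation, U(0.5%) is assumed to mean a uniform draw in [-0.5%, +0.5%] per axis, and the cross-axis rotation error is left out for brevity.

```python
import numpy as np

DEG = np.pi / 180.0  # degrees to radians

def sample_initial_errors(rng):
    """Draw one set of initial-condition errors (one draw per simulated fallback)."""
    return {
        "vel": rng.normal(0.0, 0.01, 3),        # linear velocity error
        "gravity": rng.normal(0.0, 0.01, 3),    # gravity error, m/s2
        "ab": rng.normal(0.0, 0.01, 3),         # accel bias error, m/s2
        "gb": rng.normal(0.0, 0.2 * DEG, 3),    # gyro bias error, rad/s
    }

def add_imu_noise(gyro, acc, rng):
    """Apply a per-fallback scale error, then per-sample Gaussian noise."""
    # Assumption: one scale factor per axis, uniform in [-0.5%, +0.5%].
    gyro_scale = 1.0 + rng.uniform(-0.005, 0.005, 3)
    acc_scale = 1.0 + rng.uniform(-0.005, 0.005, 3)
    gyro_noisy = gyro * gyro_scale + rng.normal(0.0, 0.12 * DEG, gyro.shape)
    acc_noisy = acc * acc_scale + rng.normal(0.0, 0.03, acc.shape)
    return gyro_noisy, acc_noisy

# Invented example: a stationary IMU, 5000 samples, fixed seed for repeatability.
rng = np.random.default_rng(seed=0)
n = 5000
gyro_clean = np.zeros((n, 3))
acc_clean = np.tile([0.0, 9.81, 0.0], (n, 1))
errors = sample_initial_errors(rng)
gyro_noisy, acc_noisy = add_imu_noise(gyro_clean, acc_clean, rng)
```

Passing the generator explicitly keeps each trial reproducible from its seed, so several trials with different seeds can be run and their results aggregated.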

As the synthetic dataset contains zero-biased IMU, we add errors on the biases in the initial condition vector in order to simulate a SLAM tracker bias estimation error.

Please refer to our sample code for the numerical value of each distortion parameter. We will use exactly the same parameters to evaluate submissions, but on more datasets. We may adjust the noise parameters in future dataset releases. To disentangle the prediction errors from the randomly drawn initial errors, we will run multiple trials with different seeds and then aggregate the results.

Sample CNN Implementation Details

In our solution, the CNN predicts the IMU linear velocity and a confidence for this estimate. The CNN output is then fused with an IMU-only integration process using an EKF [1].
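
To illustrate the fusion step, the sketch below shows a plain linear Kalman update on a 3D velocity alone. It is a simplification, not the filter from the sample code: the real EKF carries a larger state, `fuse_velocity` is a hypothetical name, and the CNN confidence is assumed to be a per-axis standard deviation.

```python
import numpy as np

def fuse_velocity(vel_pred, cov_pred, vel_cnn, sigma_cnn):
    """Kalman update of a predicted velocity with a direct velocity measurement."""
    H = np.eye(3)                            # the measurement is the velocity itself
    R = np.diag(np.square(sigma_cnn))        # CNN confidence as measurement noise
    S = H @ cov_pred @ H.T + R               # innovation covariance
    K = cov_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    vel_new = vel_pred + K @ (vel_cnn - H @ vel_pred)
    cov_new = (np.eye(3) - K @ H) @ cov_pred
    return vel_new, cov_new

# Invented numbers: equal uncertainty on both sides gives the midpoint.
vel_pred = np.array([1.0, 0.0, 0.0])         # from IMU integration
cov_pred = 0.04 * np.eye(3)
vel_cnn = np.array([0.0, 0.0, 0.0])          # from the network
sigma_cnn = np.array([0.2, 0.2, 0.2])
vel_fused, cov_fused = fuse_velocity(vel_pred, cov_pred, vel_cnn, sigma_cnn)
```

A low-confidence CNN output (large sigma) leaves the integrated velocity almost unchanged, while a high-confidence output pulls the estimate strongly toward the network prediction.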

Three implementation points are worth noting:

  1. The CNN IMU inputs are expressed in a gravity-aligned frame with zero yaw heading. That frame is different from the world or IMU frame and implicitly encodes roll and pitch in the CNN inputs.
  2. The CNN velocity estimates are computed in the same gravity-aligned frame with zero-yaw.
  3. The EKF measurement function (measurement_fn) corresponds to the velocity in the same gravity-aligned frame with zero-yaw.

We follow the same approach as [2] to transform the velocity from the world frame to a gravity-aligned frame with zero yaw. The rotation between these two frames corresponds to the current orientation pre-multiplied by the transpose of the yaw matrix.
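
The frame change can be sketched as below. Several choices here are assumptions rather than facts from the sample code: the vertical is taken as the world y axis (since gravity is [0, 9.81, 0]), the heading is read from the direction of the IMU z axis, and all function names are hypothetical.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def yaw_matrix(rot_world_imu):
    """Rotation about world y (assumed vertical) that carries the heading."""
    fwd = rot_world_imu[:, 2]            # IMU z axis expressed in world
    yaw = np.arctan2(fwd[0], fwd[2])     # heading in the horizontal x-z plane
    return rot_y(yaw)

def to_gravity_aligned(rot_world_imu, vel_world):
    """Remove yaw from the orientation and express velocity in the same frame."""
    r_yaw = yaw_matrix(rot_world_imu)
    rot_ga = r_yaw.T @ rot_world_imu     # orientation pre-multiplied by yaw transpose
    vel_ga = r_yaw.T @ vel_world
    return rot_ga, vel_ga

# Invented example: 0.7 rad of yaw on top of 0.2 rad of pitch.
rot = rot_y(0.7) @ rot_x(0.2)
vel_world = rot_y(0.7) @ np.array([0.0, 0.0, 1.0])   # moving along the heading
rot_ga, vel_ga = to_gravity_aligned(rot, vel_world)
```

After the transform only roll and pitch remain in `rot_ga`, and gravity keeps its direction because the yaw rotation is about the vertical axis. The heading is undefined when the chosen IMU axis points straight up or down, so a real implementation would need to handle that case.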

[1] Sola, J., 2017. Quaternion kinematics for the error-state Kalman filter. arXiv preprint arXiv:1711.02508.

[2] Liu, W., Caruso, D., Ilg, E., Dong, J., Mourikis, A.I., Daniilidis, K., Kumar, V. and Engel, J., 2020. TLIO: Tight learned inertial odometry. IEEE Robotics and