Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions

CVPR, 2022

Abstract

The estimation of the relative pose of two camera views is a fundamental problem in computer vision. Kneip et al. proposed to solve this problem by introducing the normal epipolar constraint (NEC). However, their approach does not take into account uncertainties, so that the accuracy of the estimated relative pose is highly dependent on accurate feature positions in the target frame. In this work, we introduce the probabilistic normal epipolar constraint (PNEC) that overcomes this limitation by accounting for anisotropic and inhomogeneous uncertainties in the feature positions. To this end, we propose a novel objective function, along with an efficient optimization scheme that effectively minimizes our objective while maintaining real-time performance. In experiments on synthetic data, we demonstrate that the novel PNEC yields more accurate rotation estimates than the original NEC and several popular relative rotation estimation algorithms. Furthermore, we integrate the proposed method into a state-of-the-art monocular rotation-only odometry system and achieve consistently improved results for the real-world KITTI dataset.

Incorporating Uncertainty into Pose Estimation

The NEC allows for relative pose estimation by enforcing the coplanarity of \emph{epipolar plane normal vectors} constructed from feature correspondences. The corresponding NEC energy function can be written as: $$ E(\boldsymbol{R}, \boldsymbol{t}) = \sum_i e_i^2 = \sum_i | \boldsymbol{t}^\top (\boldsymbol{f}_i \times \boldsymbol{R} \boldsymbol{f}^\prime_i) |^2 $$ The PNEC extends the NEC by incorporating uncertainty. More specifically, the PNEC incorporates the anisotropic and inhomogeneous nature of the feature position uncertainty into the energy function. We assume that the position error follows a 2D Gaussian distribution in the image plane with a known covariance matrix $\boldsymbol{\Sigma}_{2\text{D},i}$ per feature.
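As a concrete illustration, here is a minimal numerical sketch (not the authors' implementation) of the NEC energy, assuming unit bearing vectors $\boldsymbol{f}_i$ and $\boldsymbol{f}^\prime_i$ stored as columns of 3xN numpy arrays; all names are illustrative.

```python
import numpy as np

def nec_energy(R, t, f, f_prime):
    """Sum of squared epipolar errors e_i = t^T (f_i x R f'_i)."""
    normals = np.cross(f.T, (R @ f_prime).T)  # epipolar plane normals n_i, one per row
    residuals = normals @ t                   # e_i = t^T n_i
    return np.sum(residuals**2)
```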
Given the 2D covariance matrix $\boldsymbol{\Sigma}_{2\text{D},i}$ of the feature position in the target frame, we propagate it through the unprojection function using the unscented transform in order to obtain the 3D covariance matrix $\boldsymbol{\Sigma}_i$ of the bearing vector $\boldsymbol{f}^\prime_i$. Using the unscented transform ensures full-rank covariance matrices after the transform. We derive the details of the unscented transform in the supplementary material and show qualitative examples. Propagating this distribution to the epipolar error gives the probability distribution of the residual. Due to the linearity of the transformations, the distribution of the residual is a univariate Gaussian distribution $\mathcal{N}(0, \sigma_i^2)$ with variance $$ \sigma_i^2(\boldsymbol{R}, \boldsymbol{t}) = \boldsymbol{t}^\top \hat{\boldsymbol{f}_i} \boldsymbol{R} \boldsymbol{\Sigma}_i \boldsymbol{R}^\top \hat{\boldsymbol{f}_i}{}^\top \boldsymbol{t}, $$ where $\hat{\boldsymbol{f}_i}$ denotes the skew-symmetric matrix of $\boldsymbol{f}_i$, i.e. $\hat{\boldsymbol{f}_i} \boldsymbol{v} = \boldsymbol{f}_i \times \boldsymbol{v}$. Weighting each squared residual by its variance yields the PNEC energy function $$ E_P(\boldsymbol{R}, \boldsymbol{t}) = \sum_i \frac{e_i^2}{\sigma_i^2} = \sum_i \frac{| \boldsymbol{t}^\top (\boldsymbol{f}_i \times \boldsymbol{R} \boldsymbol{f}^\prime_i) |^2}{\boldsymbol{t}^\top \hat{\boldsymbol{f}_i} \boldsymbol{R} \boldsymbol{\Sigma}_i \boldsymbol{R}^\top \hat{\boldsymbol{f}_i}{}^\top \boldsymbol{t}} $$
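A minimal sketch of the PNEC energy under the same assumptions as above; Sigmas is a hypothetical (N, 3, 3) array holding the propagated 3D covariances $\boldsymbol{\Sigma}_i$ of the target bearing vectors, and the small eps in the denominator is only there to guard against a vanishing variance.

```python
import numpy as np

def skew(v):
    """Skew-symmetric cross-product matrix, skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def pnec_energy(R, t, f, f_prime, Sigmas, eps=1e-12):
    """Sum of squared epipolar errors, each weighted by its variance sigma_i^2."""
    energy = 0.0
    for f_i, fp_i, Sigma_i in zip(f.T, f_prime.T, Sigmas):
        f_hat = skew(f_i)
        e_i = t @ (f_hat @ (R @ fp_i))                        # e_i = t^T (f_i x R f'_i)
        var_i = t @ f_hat @ R @ Sigma_i @ R.T @ f_hat.T @ t   # sigma_i^2(R, t)
        energy += e_i**2 / (var_i + eps)
    return energy
```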

Synthetic Experiments

The synthetic experiments evaluate the performance of the PNEC in a frame-to-frame setting. They consist of randomly generated two-frame problems with known correspondences and demonstrate the benefit of incorporating uncertainty into the optimization.
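The sketch below illustrates how such a problem could be generated (an assumed protocol, not necessarily the paper's exact setup): random 3D points are observed from two frames with a known relative pose, and each target feature is perturbed with its own anisotropic 2D Gaussian noise. For simplicity the noise is added directly in normalized image coordinates, so no camera intrinsics or unscented transform are involved.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
points = rng.uniform(-2.0, 2.0, (N, 3)) + np.array([0.0, 0.0, 6.0])    # 3D points in front of the host frame

axis, angle = np.array([0.0, 1.0, 0.0]), 0.1                           # small ground-truth rotation about y
K = np.array([[0.0, -axis[2], axis[1]],
              [axis[2], 0.0, -axis[0]],
              [-axis[1], axis[0], 0.0]])
R_gt = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K   # Rodrigues' formula
t_gt = np.array([1.0, 0.0, 0.2]); t_gt /= np.linalg.norm(t_gt)         # ground-truth unit translation

f = (points / np.linalg.norm(points, axis=1, keepdims=True)).T         # host bearing vectors, 3xN
p_tgt = R_gt.T @ (points - t_gt).T                                     # the same points in the target frame
uv = p_tgt[:2] / p_tgt[2]                                              # normalized image coordinates

covs_2d = np.empty((N, 2, 2))
for i in range(N):
    theta = rng.uniform(0.0, np.pi)                                    # random orientation of the noise ellipse
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    covs_2d[i] = U @ np.diag(rng.uniform(0.5, 2.0, 2) * 1e-5) @ U.T    # anisotropic, inhomogeneous covariance
    uv[:, i] += rng.multivariate_normal(np.zeros(2), covs_2d[i])       # perturb the target feature

f_prime = np.vstack([uv, np.ones((1, N))])
f_prime /= np.linalg.norm(f_prime, axis=0)                             # noisy target bearing vectors
```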

Omnidirectional Cameras. Rotation and translation error for anisotropic and inhomogeneous noise

The figures show the rotational and translational errors for both experiments with omnidirectional camera setups under anisotropic and inhomogeneous noise. The PNEC achieves consistently better rotation estimates across all noise levels.

Pinhole Cameras. Rotation and translation error for anisotropic and inhomogeneous noise

The figures show the rotational and translational errors for both experiments with pinhole camera setups under anisotropic and inhomogeneous noise. The PNEC achieves consistently better rotation estimates across all noise levels.
Please refer to the paper and supplementary material for a more detailed evaluation of the synthetic experiments.

KITTI Evaluation

Trajectory. KITTI seq. 07 and 08. The visualization uses the ground-truth translation, since we perform monocular pose estimation and focus on rotation estimation.

The PNEC also improves on non-probabilistic monocular visual odometry methods that use the NEC. Please refer to the paper for a more detailed evaluation of the PNEC on the visual odometry task on KITTI.

Citation