| Deep Rigid Instance Scene Flow (Apr 2019) |
4.73% |
| Wei-Chiu Ma, Shenlong Wang, Rui Hu, Yuwen Xiong, Raquel Urtasun In this paper we tackle the problem of scene flow estimation in the context of self-driving. We leverage deep learning techniques as well as strong priors, since in our application domain the motion of the scene can be composed of the motion of the robot and the 3D motion of the actors in the scene. We formulate the problem as energy minimization in a deep structured model, which can be solved efficiently on the GPU by unrolling a Gauss-Newton solver. Our experiments on the challenging KITTI scene flow dataset show that we outperform the state-of-the-art by a very large margin, while being 800 times faster. |
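The key computational idea is running a fixed, small number of Gauss-Newton updates so that the solver becomes a differentiable, GPU-friendly block. Below is a minimal numpy sketch of such an unrolled loop on a toy least-squares problem; the function names and the exponential-fit residual are illustrative stand-ins, not the paper's energy terms (which couple photometric, rigid-motion and flow evidence).

```python
# Unrolled Gauss-Newton for a nonlinear least-squares energy E(x) = ||r(x)||^2.
# The fixed iteration count is what makes the loop "unrollable" inside a network.
import numpy as np

def unrolled_gauss_newton(x0, residual_fn, jacobian_fn, num_steps=5, damping=1e-6):
    x = x0.astype(float).copy()
    for _ in range(num_steps):
        r = residual_fn(x)                          # residual vector, shape (m,)
        J = jacobian_fn(x)                          # Jacobian, shape (m, n)
        H = J.T @ J + damping * np.eye(x.size)      # damped normal equations
        x = x - np.linalg.solve(H, J.T @ r)         # Gauss-Newton update
    return x

# Toy usage: fit y = a * exp(b * t) to noisy samples.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.randn(50)
res = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)], axis=1)
print(unrolled_gauss_newton(np.array([1.0, 1.0]), res, jac))
```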
|
| Probabilistic Pixel-Adaptive Refinement Networks (Mar 2020) |
6.06% |
| Anne S. Wannenwetsch, Stefan Roth Encoder-decoder networks have found widespread use in various dense prediction tasks. However, the strong reduction of spatial resolution in the encoder leads to a loss of location information as well as boundary artifacts. To address this, image-adaptive post-processing methods have proven beneficial by leveraging the high-resolution input image(s) as guidance data. We extend such approaches by considering an important orthogonal source of information: the network's confidence in its own predictions. We introduce probabilistic pixel-adaptive convolutions (PPACs), which not only depend on image guidance data for filtering, but also respect the reliability of per-pixel predictions. As such, PPACs allow for image-adaptive smoothing and simultaneously propagate high-confidence predictions into less reliable regions, while respecting object boundaries. We demonstrate their utility in refinement networks for optical flow and semantic segmentation, where PPACs lead to a clear reduction in boundary artifacts. Moreover, our proposed refinement step is able to substantially improve the accuracy on various widely used benchmarks. |
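A rough, single-channel sketch of the idea: each output pixel becomes a normalized average of its neighbours, where a neighbour's weight combines guidance-image similarity with that neighbour's confidence. The Gaussian kernel, the normalization and all names below are illustrative assumptions, not the paper's exact PPAC layer.

```python
# Confidence-weighted, guidance-adaptive filtering in the spirit of PPACs.
import numpy as np

def ppac_like_filter(pred, guide, conf, radius=2, sigma_g=0.1):
    """pred, guide, conf: (H, W) float arrays; conf values lie in [0, 1]."""
    pad = radius
    p = np.pad(pred, pad, mode='edge')
    g = np.pad(guide, pad, mode='edge')
    c = np.pad(conf, pad, mode='edge')
    H, W = pred.shape
    num = np.zeros((H, W))
    den = np.zeros((H, W))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ps = p[pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            gs = g[pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            cs = c[pad + dy:pad + dy + H, pad + dx:pad + dx + W]
            w = np.exp(-(guide - gs) ** 2 / (2 * sigma_g ** 2)) * cs  # adaptivity x confidence
            num += w * ps
            den += w
    return num / np.maximum(den, 1e-8)

# Toy usage: smooth a noisy prediction along a guidance edge, trusting
# the right half (conf = 1.0) far more than the left half (conf = 0.1).
H, W = 16, 16
guide = np.zeros((H, W)); guide[:, W // 2:] = 1.0
pred = guide + 0.2 * np.random.randn(H, W)
conf = np.full((H, W), 0.1); conf[:, W // 2:] = 1.0
refined = ppac_like_filter(pred, guide, conf)
```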
|
| MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask (Mar 2020) |
6.11% |
| Shengyu Zhao, Yilun Sheng, Yue Dong, Eric I-Chao Chang, Yan Xu Feature warping is a core technique in optical flow estimation; however, the ambiguity caused by occluded areas during warping is a major problem that remains unsolved. In this paper, we propose an asymmetric occlusion-aware feature matching module, which can learn a rough occlusion mask that filters useless (occluded) areas immediately after feature warping without any explicit supervision. The proposed module can be easily integrated into end-to-end network architectures and enjoys performance gains while introducing negligible computational cost. The learned occlusion mask can be further fed into a subsequent network cascade with dual feature pyramids with which we achieve state-of-the-art performance. At the time of submission, our method, called MaskFlownet, surpasses all published optical flow methods on the MPI Sintel, KITTI 2012 and 2015 benchmarks. Code is available at https://github.com/microsoft/MaskFlownet. |
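A small sketch of the operation this module is built around: warp second-frame features toward the first frame with the current flow, then attenuate them with an occlusion mask before matching. In MaskFlownet the mask is predicted by the network without supervision; the hand-made mask below is only a stand-in for illustration.

```python
# Feature warping followed by occlusion masking (illustrative stand-in).
import numpy as np
from scipy.ndimage import map_coordinates

def warp_features(feat, flow):
    """feat: (C, H, W); flow: (2, H, W) holding (u, v) in pixels."""
    C, H, W = feat.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    coords = [yy + flow[1], xx + flow[0]]                  # sampling positions in frame 2
    return np.stack([map_coordinates(feat[c], coords, order=1, mode='nearest')
                     for c in range(C)])

C, H, W = 8, 32, 32
feat2 = np.random.rand(C, H, W).astype(np.float32)
flow = np.full((2, H, W), 2.0, dtype=np.float32)           # toy flow: shift by 2 px
occ_mask = np.ones((1, H, W), dtype=np.float32)            # 1 = visible, 0 = occluded
occ_mask[:, :, :4] = 0.0                                   # pretend the left border is occluded
masked_warped_feat2 = warp_features(feat2, flow) * occ_mask  # occluded responses filtered out
```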
|
| Bounding Boxes, Segmentations and Object Coordinates: How Important Is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios? (ICCV 2017) |
6.22% |
| Aseem Behl, Omid Hosseini Jafari, Siva Karthik Mustikovela, Hassan Abu Alhaija, Carsten Rother, Andreas Geiger
Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which is the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates through 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark where we achieve state-of-the-art performance at the time of submission. |
|
| Volumetric Correspondence Networks for Optical Flow (NeurIPS 2019) |
6.30% |
| Gengshan Yang, Deva Ramanan
Many classic tasks in vision, such as the estimation of optical flow or stereo disparities, can be cast as dense correspondence matching. Well-known techniques for doing so make use of a cost volume, typically a 4D tensor of match costs between all pixels in a 2D image and their potential matches in a 2D search window. State-of-the-art (SOTA) deep networks for flow/stereo make use of such volumetric representations as internal layers. However, such layers require significant amounts of memory and compute, making them cumbersome to use in practice. As a result, SOTA networks also employ various heuristics designed to limit volumetric processing, leading to limited accuracy and overfitting. Instead, we introduce several simple modifications that dramatically simplify the use of volumetric layers: (1) volumetric encoder-decoder architectures that efficiently capture large receptive fields, (2) multi-channel cost volumes that capture multi-dimensional notions of pixel similarities, and finally, (3) separable volumetric filtering that significantly reduces computation and parameters while preserving accuracy. Our innovations dramatically improve accuracy over SOTA on standard benchmarks while being significantly easier to work with: training converges in 7X fewer iterations, and most importantly, our networks generalize across correspondence tasks. On-the-fly adaptation of search windows allows us to repurpose optical flow networks for stereo (and vice versa), and can also be used to implement adaptive networks that increase search window sizes on-demand. |
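The sketch below shows the two ingredients in their plainest form: building a correlation cost volume over a small search window, and then smoothing it "separably", with one small kernel over the displacement dimensions and one over the spatial dimensions rather than a single dense 4D kernel. The box kernels are hand-crafted stand-ins for the learned volumetric filters.

```python
# Correlation cost volume + separable volumetric smoothing (illustrative).
import numpy as np
from scipy.ndimage import convolve

def cost_volume(f1, f2, radius=3):
    """f1, f2: (C, H, W) features; returns costs of shape (2r+1, 2r+1, H, W)."""
    C, H, W = f1.shape
    d = 2 * radius + 1
    f2p = np.pad(f2, ((0, 0), (radius, radius), (radius, radius)))
    vol = np.zeros((d, d, H, W), dtype=f1.dtype)
    for i in range(d):
        for j in range(d):
            vol[i, j] = (f1 * f2p[:, i:i + H, j:j + W]).sum(axis=0) / C
    return vol

f1 = np.random.rand(16, 24, 24).astype(np.float32)
f2 = np.random.rand(16, 24, 24).astype(np.float32)
vol = cost_volume(f1, f2, radius=3)                        # shape (7, 7, 24, 24)

k = 3
disp_kernel = np.ones((k, k, 1, 1)) / k ** 2               # filters only the (u, v) dimensions
spat_kernel = np.ones((1, 1, k, k)) / k ** 2               # filters only the (x, y) dimensions
smoothed = convolve(convolve(vol, disp_kernel), spat_kernel)  # 2 * k^2 taps instead of k^4
```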
|
| Upgrading Optical Flow to 3D Scene Flow through Optical Expansion (CVPR 2020) |
6.56% |
| Gengshan Yang, Deva Ramanan
We describe an approach for upgrading 2D optical flow to 3D scene flow. Our key insight is that dense optical expansion, which can be reliably inferred from monocular frame pairs, reveals changes in depth of scene elements, e.g., things moving closer will get bigger. When integrated with camera intrinsics, optical expansion can be converted into normalized 3D scene flow vectors that provide meaningful directions of 3D movement, but not their magnitude (due to an underlying scale ambiguity). Normalized scene flow can be further “upgraded” to the true 3D scene flow given depth in one frame. We show that dense optical expansion between two views can be learned from annotated optical flow maps or unlabeled video sequences, and applied to a variety of dynamic 3D perception tasks including optical scene flow, LiDAR scene flow, time-to-collision estimation and depth estimation, often demonstrating significant improvement over the prior art. |
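The "upgrade" can be written out for a single pixel: given the 2D flow, a per-pixel expansion ratio, camera intrinsics and first-frame depth, back-project both endpoints and take the difference. The relation Z2 = Z1 / expansion assumed below (objects moving closer appear larger, so expansion > 1 implies decreasing depth) is an illustrative simplification; without the depth Z1, the same expression yields only the direction of motion.

```python
# Single-pixel sketch of upgrading 2D flow + optical expansion to 3D scene flow.
import numpy as np

def upgrade_to_scene_flow(x1, flow, expansion, Z1, K):
    """x1, flow: (2,) pixel position and flow; expansion, Z1: scalars; K: (3, 3) intrinsics."""
    Kinv = np.linalg.inv(K)
    x2 = x1 + flow
    Z2 = Z1 / expansion                                    # depth change implied by the scale change
    P1 = Z1 * (Kinv @ np.array([x1[0], x1[1], 1.0]))       # back-project the frame-1 point
    P2 = Z2 * (Kinv @ np.array([x2[0], x2[1], 1.0]))       # back-project the frame-2 point
    return P2 - P1                                         # 3D scene flow in camera coordinates

# Toy usage with KITTI-like intrinsics.
K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
print(upgrade_to_scene_flow(np.array([300.0, 200.0]), np.array([5.0, -2.0]),
                            expansion=1.05, Z1=12.0, K=K))
```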
|
| ScopeFlow: Dynamic Scene Scoping for Optical Flow (Feb 2020) |
6.82% |
| Aviram Bar-Haim, Lior Wolf We propose to modify the common training protocols of optical flow, leading to sizable accuracy improvements without adding to the computational complexity of the training process. The improvement is based on observing a bias in how challenging data is sampled under the current training protocol, and on improving the sampling process. In addition, we find that both regularization and augmentation should decrease during the training protocol. Using an existing low-parameter architecture, the method is ranked first on the MPI Sintel benchmark among all other methods, improving the accuracy of the best two-frame method by more than 10%. The method also surpasses all similar architecture variants by more than 12% and 19.7% on the KITTI benchmarks, achieving the lowest Average End-Point Error on KITTI 2012 among two-frame methods, without using extra datasets. |
|
| Cascaded Scene Flow Prediction using Semantic Segmentation (Jul 2017) |
7.14% |
| Zhile Ren, Deqing Sun, Jan Kautz, Erik B. Sudderth Given two consecutive frames from a pair of stereo cameras, 3D scene flow methods simultaneously estimate the 3D geometry and motion of the observed scene. Many existing approaches use superpixels for regularization, but may predict inconsistent shapes and motions inside rigidly moving objects. We instead assume that scenes consist of foreground objects rigidly moving in front of a static background, and use semantic cues to produce pixel-accurate scene flow estimates. Our cascaded classification framework accurately models 3D scenes by iteratively refining semantic segmentation masks, stereo correspondences, 3D rigid motion estimates, and optical flow fields. We evaluate our method on the challenging KITTI autonomous driving benchmark, and show that accounting for the motion of segmented vehicles leads to state-of-the-art performance. |
|
| A Fusion Approach for Multi-Frame Optical Flow Estimation (Oct 2018) |
7.17% |
| Zhile Ren, Orazio Gallo, Deqing Sun, Ming-Hsuan Yang, Erik B. Sudderth, Jan Kautz To date, top-performing optical flow estimation methods only take pairs of consecutive frames into account. While elegant and appealing, the idea of using more than two frames has not yet produced state-of-the-art results. We present a simple, yet effective fusion approach for multi-frame optical flow that benefits from longer-term temporal cues. Our method first warps the optical flow from previous frames to the current, thereby yielding multiple plausible estimates. It then fuses the complementary information carried by these estimates into a new optical flow field. At the time of writing, our method ranks first among published results in the MPI Sintel and KITTI 2015 benchmarks. Our models will be available on https://github.com/NVlabs/PWC-Net. |
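As a rough illustration of the fusion step, the sketch below keeps, at each pixel, whichever of two candidate flows (e.g. the two-frame estimate and the estimate propagated from the previous frame) better satisfies brightness constancy. The paper learns the fusion with a network; this photometric winner-takes-all rule is only a hand-crafted stand-in.

```python
# Per-pixel fusion of two candidate flow fields by photometric error (illustrative).
import numpy as np
from scipy.ndimage import map_coordinates

def photometric_error(im1, im2, flow):
    """im1, im2: (H, W) grayscale images; flow: (2, H, W)."""
    H, W = im1.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    warped = map_coordinates(im2, [yy + flow[1], xx + flow[0]], order=1, mode='nearest')
    return np.abs(im1 - warped)

def fuse_flows(im1, im2, flow_a, flow_b):
    err_a = photometric_error(im1, im2, flow_a)
    err_b = photometric_error(im1, im2, flow_b)
    pick_a = (err_a <= err_b)[None]                        # (1, H, W) per-pixel selection mask
    return np.where(pick_a, flow_a, flow_b)

# Toy usage: the candidate with the correct 3 px shift wins almost everywhere.
H, W = 32, 32
im1 = np.random.rand(H, W)
im2 = np.roll(im1, 3, axis=1)                              # frame 2 = frame 1 shifted right by 3 px
flow_a = np.zeros((2, H, W)); flow_a[0] = 3.0              # correct candidate
flow_b = np.zeros((2, H, W))                               # zero-flow candidate
fused = fuse_flows(im1, im2, flow_a, flow_b)
```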
|
| Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation (Sep 2018) |
7.90% |
| Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training. First, we design a compact but effective CNN model, called PWC-Net, according to simple and well-established principles: pyramidal processing, warping, and cost volume processing. PWC-Net is 17 times smaller in size, 2 times faster in inference, and 11% more accurate on Sintel final than the recent FlowNet2 model. It is the winning entry in the optical flow competition of the Robust Vision Challenge. Next, we experimentally analyze the sources of our performance gains. In particular, we use the training procedure of PWC-Net to retrain FlowNetC, a sub-network of FlowNet2. The retrained FlowNetC is 56% more accurate on Sintel final than the previously trained one and even 5% more accurate than the FlowNet2 model. We further improve the training procedure and increase the accuracy of PWC-Net on Sintel by 10% and on KITTI 2012 and 2015 by 20%. Our newly trained model parameters and training protocols will be available on https://github.com/NVlabs/PWC-Net. |
|
| SelFlow: Self-Supervised Learning of Optical Flow (Apr 2019) |
8.42% |
| Pengpeng Liu, Michael Lyu, Irwin King, Jia Xu We present a self-supervised learning approach for optical flow. Our method distills reliable flow estimations from non-occluded pixels, and uses these predictions as ground truth to learn optical flow for hallucinated occlusions. We further design a simple CNN to utilize temporal information from multiple frames for better flow estimation. These two principles lead to an approach that yields the best performance for unsupervised optical flow learning on the challenging benchmarks including MPI Sintel, KITTI 2012 and 2015. More notably, our self-supervised pre-trained model provides an excellent initialization for supervised fine-tuning. Our fine-tuned models achieve state-of-the-art results on all three datasets. At the time of writing, we achieve EPE=4.26 on the Sintel benchmark, outperforming all submitted methods. |
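The distillation signal can be phrased as a masked loss: a 'teacher' pass on the clean frames supervises a 'student' pass on frames with artificially injected occlusions, but only at pixels that were non-occluded for the teacher and newly occluded for the student. The names and the L1 penalty below are illustrative assumptions, not the paper's exact loss.

```python
# Masked self-supervision loss for hallucinated occlusions (illustrative).
import numpy as np

def self_supervised_flow_loss(flow_student, flow_teacher, teacher_noc_mask, new_occ_mask):
    """flow_*: (2, H, W); masks: (H, W) arrays with values in {0, 1}."""
    valid = teacher_noc_mask * new_occ_mask                # reliable for teacher, occluded for student
    per_pixel = np.abs(flow_student - flow_teacher).sum(axis=0)
    return (per_pixel * valid).sum() / max(valid.sum(), 1.0)

# Toy usage.
H, W = 8, 8
flow_teacher = np.random.randn(2, H, W)
flow_student = flow_teacher + 0.1 * np.random.randn(2, H, W)
teacher_noc = np.ones((H, W))
new_occ = np.zeros((H, W)); new_occ[2:5, 2:5] = 1.0        # region covered by injected noise
print(self_supervised_flow_loss(flow_student, flow_teacher, teacher_noc, new_occ))
```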
|
| LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation (May 2018) |
9.38% |
| Tak-Wai Hui, Xiaoou Tang, Chen Change Loy FlowNet2, the state-of-the-art convolutional neural network (CNN) for optical flow estimation, requires over 160M parameters to achieve accurate flow estimation. In this paper we present an alternative network that outperforms FlowNet2 on the challenging Sintel final pass and KITTI benchmarks, while being 30 times smaller in model size and 1.36 times faster in running speed. This is made possible by drilling down to architectural details that might have been missed in current frameworks: (1) We present a more effective flow inference approach at each pyramid level through a lightweight cascaded network. It not only improves flow estimation accuracy through early correction, but also permits seamless incorporation of descriptor matching in our network. (2) We present a novel flow regularization layer to ameliorate the issue of outliers and vague flow boundaries by using a feature-driven local convolution. (3) Our network has an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2. Our code and trained models are available at https://github.com/twhui/LiteFlowNet. |
|
| PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume (Sep 2017) |
9.60% |
| Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024x436) images. Our models are available on https://github.com/NVlabs/PWC-Net. |
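A compact sketch of the warp-then-match data flow at one pyramid level: warp the second image's features with the incoming flow estimate, correlate with the first image's features over a small window, and read out a residual flow. The learned decoder is replaced here by a per-pixel argmax over the correlation volume, so this illustrates only the structure, not PWC-Net itself.

```python
# One pyramid level of a warp / cost-volume / estimate loop (illustrative).
import numpy as np
from scipy.ndimage import map_coordinates

def warp(feat, flow):
    C, H, W = feat.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    coords = [yy + flow[1], xx + flow[0]]
    return np.stack([map_coordinates(feat[c], coords, order=1, mode='nearest')
                     for c in range(C)])

def level_step(f1, f2, flow_in, radius=2):
    f2w = warp(f2, flow_in)                                # warp with the current flow estimate
    C, H, W = f1.shape
    d = 2 * radius + 1
    f2p = np.pad(f2w, ((0, 0), (radius, radius), (radius, radius)))
    corr = np.zeros((d, d, H, W))
    for i in range(d):                                     # build the local cost volume
        for j in range(d):
            corr[i, j] = (f1 * f2p[:, i:i + H, j:j + W]).sum(axis=0)
    best = corr.reshape(d * d, H, W).argmax(axis=0)        # argmax stands in for the CNN decoder
    iy, ix = np.unravel_index(best, (d, d))
    residual = np.stack([ix - radius, iy - radius]).astype(float)
    return flow_in + residual                              # refined flow at this level

# Toy usage: recovers the 2 px horizontal shift at most pixels (borders excepted).
f1 = np.random.rand(32, 32, 32)
f2 = np.roll(f1, 2, axis=2)
flow = level_step(f1, f2, np.zeros((2, 32, 32)))
```

Stacking such steps from coarse to fine, with the flow upsampled between levels, gives the overall pyramidal scheme the abstract describes.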
|
| Continual Occlusions and Optical Flow Estimation (Nov 2018) |
10.03% |
| Michal Neoral, Jan Šochman, Jiří Matas Two optical flow estimation problems are addressed: i) occlusion estimation and handling, and ii) estimation from image sequences longer than two frames. The proposed ContinualFlow method estimates occlusions before flow, avoiding the use of flow corrupted by occlusions for their estimation. We show that providing occlusion masks as an additional input to flow estimation improves the standard performance metric by more than 25% on both KITTI and Sintel. As a second contribution, a novel method for incorporating information from past frames into flow estimation is introduced. The previous frame flow serves as an input to occlusion estimation and as a prior in occluded regions, i.e. those without visual correspondences. By continually using the previous frame flow, ContinualFlow performance improves further by 18% on KITTI and 7% on Sintel, achieving top performance on KITTI and Sintel. |
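A small sketch of the temporal part: the previous frame's flow, sampled at corresponding pixels via a flow from the current frame back to the previous one, serves as a fill-in prior wherever the current estimate is occluded. The hard blending rule below is a hand-crafted illustration; the paper instead feeds these quantities to the network as additional inputs.

```python
# Previous-frame flow as a prior in occluded regions (illustrative blend).
import numpy as np
from scipy.ndimage import map_coordinates

def warp_field(field, flow):
    """Backward-warp a (2, H, W) field with a (2, H, W) flow."""
    _, H, W = field.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    coords = [yy + flow[1], xx + flow[0]]
    return np.stack([map_coordinates(field[c], coords, order=1, mode='nearest')
                     for c in range(2)])

def combine_with_prior(flow_curr, flow_prev, flow_back, occ_mask):
    """flow_back: flow from the current frame to the previous one;
    occ_mask: (H, W), 1 where the current estimate is unreliable (occluded)."""
    prior = warp_field(flow_prev, flow_back)               # previous flow aligned to the current frame
    return occ_mask[None] * prior + (1.0 - occ_mask[None]) * flow_curr
```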
|