
Welcome




Welcome to the Computer Vision Group at RWTH Aachen University!

The Computer Vision group was established at RWTH Aachen University in the context of the Cluster of Excellence "UMIC - Ultra High-Speed Mobile Information and Communication" and is associated with the Chair Computer Sciences 8 - Computer Graphics, Computer Vision, and Multimedia. The group focuses on computer vision applications for mobile devices and robotic or automotive platforms. Our main research areas are visual object recognition, tracking, self-localization, and 3D reconstruction, and in particular combinations of these topics.

We offer lectures and seminars about computer vision and machine learning.

You can browse through all our publications and the projects we are working on.

News

CVPR'22

We have two papers accepted at the Conference on Computer Vision and Pattern Recognition (CVPR) 2022, and both were selected for oral presentation!

March 30, 2022

3DV'21

We have one paper accepted at the International Conference on 3D Vision (3DV) 2021.

Oct. 11, 2021

CVPR'21

Our work on 3D multi-object reconstruction from a single image was accepted at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021.

June 12, 2021

IJCV'20

We are excited to share that our paper HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking has been accepted for publication in the International Journal of Computer Vision (IJCV'20).

Nov. 3, 2020

WACV'21

We have one paper accepted at the 2021 Winter Conference on Applications of Computer Vision (WACV'21).

Nov. 2, 2020

We won the ECCV 2020 "3D Poses in the Wild" Challenge!

Our approach is described in our MeTRAbs paper, accepted for publication in the IEEE T-BIOM special issue "Selected Best Works on Automatic Face and Gesture Recognition 2020". The code is available on GitHub.

Aug. 23, 2020

Recent Publications

HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Oral)

Existing state-of-the-art methods for Video Object Segmentation (VOS) learn low-level pixel-to-pixel correspondences between frames to propagate object masks across a video. This requires a large amount of densely annotated video data, which is costly to produce and largely redundant, since frames within a video are highly correlated. In light of this, we propose HODOR, a novel method that tackles VOS by effectively leveraging annotated static images to understand object appearance and scene context. We encode object instances and scene information from an image frame into robust high-level descriptors, which can then be used to re-segment those objects in different frames. As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks among methods trained without video annotations. Without any architectural modification, HODOR can also learn from the video context around single annotated video frames by utilizing cyclic consistency, whereas other methods rely on dense, temporally consistent annotations.

Opening up Open World Tracking

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 (Oral)

Tracking and detecting any object, including ones never seen before during model training, is a crucial but elusive capability of autonomous systems. An autonomous agent that is blind to never-seen-before objects poses a safety hazard when operating in the real world, and yet this is how almost all current systems work. One of the main obstacles to advancing the tracking of any object is that this task is notoriously difficult to evaluate. A benchmark that allows an apples-to-apples comparison of existing efforts is a crucial first step towards advancing this important research field. This paper addresses this evaluation deficit and lays out the landscape and evaluation methodology for detecting and tracking both known and unknown objects in the open-world setting. We propose a new benchmark, TAO-OW: Tracking Any Object in an Open World, analyze existing efforts in multi-object tracking, and construct a baseline for this task while highlighting future challenges. We hope to open a new front in multi-object tracking research that will bring us a step closer to intelligent systems that can operate safely in the real world.

Mix3D: Out-of-Context Data Augmentation for 3D Scenes

International Conference on 3D Vision (3DV) 2021 (Oral)

Mix3D is a data augmentation technique for segmenting large-scale 3D scenes. Since scene context helps in reasoning about object semantics, current works focus on models with large capacity and receptive fields that can fully capture the global context of an input 3D scene. However, strong contextual priors can have detrimental effects, such as mistaking a pedestrian crossing the street for a car. In this work, we focus on the importance of balancing global scene context and local geometry, with the goal of generalizing beyond the contextual priors in the training set. In particular, we propose a "mixing" technique that creates new training samples by combining two augmented scenes. By doing so, object instances are implicitly placed into novel out-of-context environments, which makes it harder for models to rely on scene context alone and encourages them to infer semantics from local structure as well. In the paper, we perform a detailed analysis to understand the importance of global context, local structures, and the effect of mixing scenes. In experiments, we show that models trained with Mix3D achieve a significant performance boost on indoor (ScanNet, S3DIS) and outdoor (SemanticKITTI) datasets. Mix3D can be trivially used with any existing method; for example, when trained with Mix3D, MinkowskiNet outperforms all prior state-of-the-art methods by a significant margin on the ScanNet test benchmark, reaching 78.1 mIoU.
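The mixing step described above can be illustrated with a short sketch. This is not the authors' implementation: it assumes each scene is a list of (x, y, z) points with one semantic label per point, and the per-scene augmentation (centering, a random rotation about the vertical axis, uniform scaling) is a hypothetical choice made only for this example. The helper names `augment` and `mix_scenes` are likewise hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def augment(points: Sequence[Point], rng: random.Random) -> List[Point]:
    """Hypothetical per-scene augmentation: center the scene at the origin,
    rotate it randomly about the z-axis, and scale it uniformly."""
    if not points:
        return []
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    theta = rng.uniform(0.0, 2.0 * math.pi)
    scale = rng.uniform(0.9, 1.1)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: List[Point] = []
    for x, y, z in points:
        x, y, z = x - cx, y - cy, z - cz
        out.append((scale * (cos_t * x - sin_t * y),
                    scale * (sin_t * x + cos_t * y),
                    scale * z))
    return out


def mix_scenes(points_a: Sequence[Point], labels_a: Sequence[int],
               points_b: Sequence[Point], labels_b: Sequence[int],
               rng: random.Random) -> Tuple[List[Point], List[int]]:
    """Create one training sample from two scenes: augment each scene
    independently, then take the union of their points and labels, so
    objects from one scene end up next to the other scene's context."""
    if len(points_a) != len(labels_a) or len(points_b) != len(labels_b):
        raise ValueError("each point needs exactly one label")
    mixed_points = augment(points_a, rng) + augment(points_b, rng)
    mixed_labels = list(labels_a) + list(labels_b)
    return mixed_points, mixed_labels


if __name__ == "__main__":
    rng = random.Random(0)
    room = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    street = [(5.0, 5.0, 0.0), (6.0, 5.0, 1.0), (5.0, 6.0, 0.0)]
    pts, lbls = mix_scenes(room, [1, 1], street, [2, 2, 3], rng)
    print(len(pts), lbls)  # 5 [1, 1, 2, 2, 3]
```

In this sketch, both augmented scenes are centered at the origin before their union, so they overlap spatially; that overlap is what places objects from one scene in unfamiliar surroundings. The labels are simply concatenated in the same order as the points, so any per-point segmentation loss can be applied to the mixed sample unchanged.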
