Profile

M.Sc. Markus Knoche
Room 127
Email: knoche (at) vision.rwth-aachen.de

[GitHub] [Google Scholar]

Publications

Towards Metric-Agnostic Trajectory Forecasting

Markus Knoche, Daan de Geus, Bastian Leibe

European Conference on Computer Vision (ECCV) 2026

Accurate trajectory forecasting of surrounding traffic participants is a core capability for autonomous driving, enabling vehicles to anticipate behavior and plan safe maneuvers. We observe that current state-of-the-art forecasting models on Argoverse 2 and the Waymo Open Motion Dataset tailor their training objectives to the different benchmark metrics. Because these metrics encourage conflicting behavior, we propose a paradigm change for trajectory forecasting: training models with metric-agnostic probabilistic objectives and treating metric optimization as a downstream task applied to the predictive distribution. Concretely, we introduce Trajectory Distribution Evaluation (TraDiE) policies, metric-specific policies that map a predictive distribution to the set of K trajectories and confidences required by trajectory forecasting metrics. We evaluate this framework by introducing DONUT-NLL, which adapts the training objective of the state-of-the-art trajectory forecasting model DONUT to directly optimize the predictive distribution. Using our policies, DONUT-NLL achieves state-of-the-art results on all metrics of the Waymo motion prediction benchmark.
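To illustrate what "a policy that maps a predictive distribution to K trajectories and confidences" means, here is a minimal sketch. It assumes the predictive distribution is a mixture whose components each have a mean trajectory and a weight. The policy shown, `top_k_policy` (a hypothetical name), simply keeps the K highest-weight modes. It is a naive stand-in and not one of the TraDiE policies from the paper, whose point is that different metrics call for different policies.

```python
import numpy as np

def top_k_policy(means, weights, k):
    """Hypothetical policy: keep the K most likely mixture modes.

    means:   (M, T, 2) array, one mean trajectory per mixture component
    weights: (M,) non-negative mixture weights summing to 1
    Returns K trajectories (K, T, 2) and confidences (K,) summing to 1.
    """
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if k > len(weights):
        raise ValueError("k cannot exceed the number of mixture components")
    # Indices of the K largest weights, highest first.
    order = np.argsort(weights)[::-1][:k]
    trajectories = means[order]
    # Renormalize the selected weights so they form valid confidences.
    confidences = weights[order] / weights[order].sum()
    return trajectories, confidences

# Usage with a made-up 6-component distribution over 80 future steps.
rng = np.random.default_rng(0)
means = rng.normal(size=(6, 80, 2))
weights = np.array([0.05, 0.30, 0.10, 0.25, 0.20, 0.10])
trajs, conf = top_k_policy(means, weights, k=3)
print(trajs.shape)  # (3, 80, 2)
```

A metric that rewards coverage of several plausible futures would likely prefer a different selection rule than one that rewards calibrated confidences. Keeping the distribution fixed and swapping only the policy is the design the abstract argues for.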


@inproceedings{knoche2026tradie,
title = {{Towards Metric-Agnostic Trajectory Forecasting}},
author = {Knoche, Markus and de Geus, Daan and Leibe, Bastian},
booktitle = {ECCV},
year = {2026}
}

Downloads: arXiv Project Page Github

DONUT: A Decoder-Only Model for Trajectory Prediction

Markus Knoche, Daan de Geus, Bastian Leibe

International Conference on Computer Vision (ICCV) 2025

Predicting the motion of other agents in a scene is highly relevant for autonomous driving, as it allows a self-driving car to anticipate their behavior. Inspired by the success of decoder-only models for language modeling, we propose DONUT, a Decoder-Only Network for Unrolling Trajectories. Unlike existing encoder-decoder forecasting models, we encode historical trajectories and predict future trajectories with a single autoregressive model. This allows the model to make iterative predictions in a consistent manner and ensures that it is always provided with up-to-date information, which improves performance. Furthermore, inspired by multi-token prediction for language modeling, we introduce an 'overprediction' strategy that gives the network the auxiliary task of predicting trajectories over longer temporal horizons. This allows the model to better anticipate the future and further improves performance. Our experiments demonstrate that the decoder-only approach outperforms the encoder-decoder baseline and achieves new state-of-the-art results on the Argoverse 2 single-agent motion forecasting benchmark.
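The autoregressive unrolling described above can be sketched as a loop in which each predicted chunk is appended to the trajectory and fed back to the model. This is an illustrative sketch only. The "network" is a hypothetical constant-velocity stand-in (`constant_velocity_model`), and the assumption that the model emits fixed-length chunks plus an auxiliary overprediction of later steps is ours. The actual DONUT architecture and its use of overprediction may differ.

```python
import numpy as np

def constant_velocity_model(history, chunk_len, overpredict_len):
    """Hypothetical stand-in for a decoder-only network.

    Given the trajectory so far (N, 2), returns the next `chunk_len`
    positions plus an auxiliary 'overprediction' of the following
    `overpredict_len` positions, both extrapolating the last velocity.
    """
    velocity = history[-1] - history[-2]
    steps = np.arange(1, chunk_len + overpredict_len + 1)[:, None]
    future = history[-1] + steps * velocity
    return future[:chunk_len], future[chunk_len:]

def unroll(history, model, num_chunks, chunk_len, overpredict_len):
    """Autoregressively predict num_chunks * chunk_len future positions."""
    trajectory = np.asarray(history, dtype=float)
    for _ in range(num_chunks):
        # The model always sees the full, up-to-date trajectory,
        # including its own previous predictions.
        next_chunk, _overprediction = model(trajectory, chunk_len, overpredict_len)
        # In this sketch the overprediction serves only as an auxiliary
        # training target, so it is ignored during unrolling.
        trajectory = np.concatenate([trajectory, next_chunk], axis=0)
    return trajectory[len(history):]

history = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
future = unroll(history, constant_velocity_model,
                num_chunks=3, chunk_len=2, overpredict_len=4)
print(future.shape)  # (6, 2)
print(future[-1])    # [8. 4.]
```

Because the loop re-invokes the same model on the growing trajectory, encoding the past and predicting the future are handled by one network, which is the property the abstract contrasts with encoder-decoder designs.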


@article{knoche2025donut,
title = {{DONUT: A Decoder-Only Model for Trajectory Prediction}},
author = {Knoche, Markus and de Geus, Daan and Leibe, Bastian},
journal = {arXiv preprint arXiv:2506.06854},
year = {2025}
}

Downloads: arXiv Project Page Github YouTube

Reposing Humans by Warping 3D Features

Markus Knoche, István Sárándi, Bastian Leibe

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2020

We address the problem of reposing an image of a human into any desired novel pose. This conditional image-generation task requires reasoning about the 3D structure of the human, including self-occluded body parts. Most prior works are either based on 2D representations or require fitting and manipulating an explicit 3D body mesh. Building on the recent success of deep learning-based volumetric representations, we propose to implicitly learn a dense feature volume from human images, which lends itself to simple and intuitive manipulation through explicit geometric warping. Once the latent feature volume is warped according to the desired pose change, it is mapped back to RGB space by a convolutional decoder. Our state-of-the-art results on the DeepFashion and iPER benchmarks indicate that dense volumetric human representations are worth investigating in more detail.
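Explicit geometric warping of a latent feature volume can be illustrated with a backward warp. Each output voxel looks up the input voxel it maps to under a transform. The following is a minimal sketch. It assumes a single global 3D affine transform and nearest-neighbour sampling, and `warp_volume` is a hypothetical helper. The paper derives its warp from the desired pose change, and a practical version would more likely use smooth (e.g. trilinear) interpolation.

```python
import numpy as np

def warp_volume(volume, matrix, offset):
    """Backward-warp a feature volume with a 3D affine transform.

    volume: (C, D, H, W) latent features
    matrix: (3, 3) and offset: (3,) map each *output* voxel coordinate
            to the *input* coordinate it samples from.
    Uses nearest-neighbour sampling; voxels mapping outside stay zero.
    """
    c, d, h, w = volume.shape
    # Integer coordinates of every output voxel, shape (3, D*H*W).
    grid = np.stack(np.meshgrid(np.arange(d), np.arange(h), np.arange(w),
                                indexing="ij")).reshape(3, -1)
    # Source coordinate for each output voxel, rounded to the nearest voxel.
    src = np.rint(matrix @ grid + offset[:, None]).astype(int)
    valid = ((src >= 0) & (src < np.array([d, h, w])[:, None])).all(axis=0)
    out = np.zeros_like(volume).reshape(c, -1)
    out[:, valid] = volume[:, src[0, valid], src[1, valid], src[2, valid]]
    return out.reshape(c, d, h, w)

# Usage: shift a single active voxel by one step along the last axis.
vol = np.zeros((1, 4, 4, 4))
vol[0, 1, 1, 1] = 1.0
shifted = warp_volume(vol, np.eye(3), np.array([0.0, 0.0, -1.0]))
# Output voxel (1, 1, 2) samples input voxel (1, 1, 1).
```

Backward warping is used here because every output voxel receives exactly one value, which avoids the holes that forward splatting can leave.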


@inproceedings{Knoche20reposing,
author = {Markus Knoche and Istv\'an S\'ar\'andi and Bastian Leibe},
title = {Reposing Humans by Warping {3D} Features},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
year = {2020}
}

Downloads: Code (GitHub) Paper (PDF) Supplementary (PDF) arXiv