Long Term Motion Prediction Using Keyposes

Sena Kiciroglu¹, Wei Wang¹,², Mathieu Salzmann¹,³, Pascal Fua¹

¹ CVLab, EPFL, Switzerland
² MHUG, University of Trento, Italy
³ Clearspace, Switzerland

Abstract

Long-term human motion prediction is essential in safety-critical applications such as human-robot interaction and autonomous driving. In this paper, we show that long-term forecasting does not require predicting the human pose at every time instant. Instead, it is more effective to predict a few keyposes and approximate the intermediate poses by interpolating between the keyposes.

We demonstrate that our approach enables us to predict realistic motions for up to 5 seconds into the future, which is far longer than the typical 1 second encountered in the literature. Furthermore, because we model future keyposes probabilistically, we can generate multiple plausible future motions by sampling at inference time. Over this extended time period, our predictions are more realistic, more diverse, and better preserve the motion dynamics than those produced by state-of-the-art methods.

Long Term Motion Prediction

Motion prediction is an essential component in safety-critical applications, such as human-robot interaction and autonomous driving. Most existing methods produce accurate pose predictions only up to about 1 second into the future. Our work extends this time horizon to 5 seconds by following a different approach. Instead of regressing a pose at every future timestep, we aim to predict the next "keypose" in the sequence via a classification scheme.

Keyposes

Figure: Distribution of keyposes in sequence. Plots show the x, y, and z coordinates of a single joint across time.

Human motion follows patterns that are well represented by a few essential poses in the sequence. We call such poses "keyposes". By interpolating between the keyposes, we can approximately reconstruct the original sequence. Therefore, it is sufficient to predict only the keyposes in the sequence. We extract the keyposes by determining the poses that yield the minimum L2 error when used to reconstruct the original sequence.
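The idea of reconstructing a sequence from keyposes can be illustrated with a minimal sketch. It assumes linear interpolation between keyposes and a greedy selection rule that repeatedly adds the frame with the largest L2 reconstruction error. Both choices, as well as the function names `reconstruct` and `select_keyposes`, are assumptions made for illustration and are not necessarily the procedure used in the paper.

```python
import numpy as np

def reconstruct(poses, key_idx):
    """Rebuild a full sequence by linear interpolation between keyposes.

    poses:   (T, D) array, one flattened pose per frame.
    key_idx: sorted frame indices of the keyposes (should include 0 and T-1).
    """
    T, D = poses.shape
    frames = np.arange(T)
    out = np.empty((T, D), dtype=float)
    for d in range(D):
        # Interpolate each pose dimension independently over time.
        out[:, d] = np.interp(frames, key_idx, poses[key_idx, d])
    return out

def select_keyposes(poses, num_keyposes):
    """Greedy selection (an illustrative assumption, not the paper's exact method).

    Starts from the first and last frames, then repeatedly adds the frame
    whose pose is worst reconstructed in the L2 sense.
    """
    T = poses.shape[0]
    key_idx = [0, T - 1]
    while len(key_idx) < num_keyposes:
        recon = reconstruct(poses, key_idx)
        err = np.linalg.norm(poses - recon, axis=1)  # per-frame L2 error
        worst = int(np.argmax(err))
        if err[worst] == 0.0:
            break  # sequence is already reconstructed exactly
        key_idx = sorted(key_idx + [worst])
    return key_idx
```

For a toy sequence that rises linearly and then falls linearly, three keyposes (start, peak, end) are enough for this sketch to reconstruct every frame exactly. Real motion would need more keyposes and would leave a nonzero residual.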

Predicting Keyposes

We cluster and label the keyposes, which turns motion prediction into a classification problem. This shifts the focus to the transition from one keypose label to another and avoids accumulating errors. We have designed a GRU-based framework for keypose prediction. Moreover, by sampling from the predicted logits during inference, we can generate diverse future motions.
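Sampling from predicted logits can be sketched as follows. The snippet assumes the network outputs one logit per keypose cluster label and that labels are drawn from the corresponding softmax distribution. The `temperature` parameter and the function names are hypothetical additions for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_keypose_labels(logits, num_samples, temperature=1.0, seed=None):
    """Draw keypose cluster labels from the softmax of the predicted logits.

    Each sample is one plausible choice of the next keypose label.
    Repeating this at every prediction step yields diverse future motions.
    """
    rng = np.random.default_rng(seed)
    probs = softmax(logits, temperature)
    return rng.choice(len(probs), size=num_samples, p=probs)
```

Taking the argmax of the logits instead would always return the same single future. Sampling trades some of that determinism for diversity.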

BibTeX

If you find our work useful, please cite it as:

@inproceedings{kiciroglu2022keyposes,
  author = {Kiciroglu, Sena and Wang, Wei and Salzmann, Mathieu and Fua, Pascal},
  booktitle = {3DV},
  title = {Long Term Motion Prediction Using Keyposes},
  year = {2022}
}