1st Workshop, Benchmark and Challenge on Human Trajectory and Pose Dynamics Forecasting in the Wild (ICCV 2021)

Human motion forecasting, i.e., forecasting humans' joint locations (pose dynamics) and global locations (trajectories), is an important topic because of strong demand from artificial intelligence applications such as self-driving cars, healthcare, assistant robots, and detection of perilous behavioral patterns in surveillance systems. The problem is extremely challenging in real-world scenes because of the many factors involved. Humans are naturally social agents, able to effortlessly extract detailed semantics from a scene and use them to make swift decisions about their next movements. To accurately forecast their trajectory and pose dynamics, a model must learn how their movements are influenced by surrounding people and objects and how small changes in the body's pose relate to the person's movement in global space.

In the past few years, many methods have been developed that model some of these challenges to address part of the problem, e.g., considering social interactions to better predict humans' future paths, or human-object interactions for more accurate pose forecasting. However, these methods have mainly focused on a single aspect of the problem, either trajectory forecasting or human pose prediction, and do not consider all the practical challenges, mainly because there has been no real-world dataset and benchmark tailored to both problems jointly. A benchmark that bridges this gap and unifies these two research directions would therefore benefit the research community.

In this workshop, we aim to promote this new line of work, i.e., predicting future human trajectories and skeleton pose dynamics jointly. We also aim to introduce our new benchmark dataset and challenge, which include new practical settings for the problem, built by re-purposing existing datasets and introducing sensible metrics. We provide a unified, extensive evaluation system to ensure a fair comparison between different approaches, enabling participants to compete on the real-world challenges of the problem.


The goal of the SoMoF benchmark is to predict future human trajectories and skeleton poses using information about the surrounding scene and the other humans involved. Participants are given video of the scene and labeled trajectories and poses up to some time t and must predict trajectories and poses for all individuals through some time t+T. Our data contains various challenging real-world scenarios, such as different levels of social (human-human) interactions, environmental (human-object) interactions, occluded joints, and people leaving the scene.
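To make the input/output structure of the task concrete, the following is a minimal sketch of a zero-velocity baseline that repeats each person's last observed pose for every future frame. It is not the benchmark's official data format or a method from the workshop: the nested-list layout (frames of joints of `[x, y, z]` coordinates), the person identifiers, and the function names `repeat_last_pose` and `forecast_scene` are all assumptions made for illustration, and the sketch ignores the video input entirely.

```python
from typing import Dict, List

# Assumed layout (hypothetical, not the official SoMoF format):
# one frame is a list of joints, each joint a list [x, y, z].
Pose = List[List[float]]


def repeat_last_pose(observed: List[Pose], horizon: int) -> List[Pose]:
    """Zero-velocity baseline: copy the last observed frame `horizon` times."""
    if not observed:
        raise ValueError("need at least one observed frame")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    last = observed[-1]
    # Copy every joint so the forecast frames do not alias the input
    # or one another.
    return [[list(joint) for joint in last] for _ in range(horizon)]


def forecast_scene(scene: Dict[str, List[Pose]], horizon: int) -> Dict[str, List[Pose]]:
    """Forecast every person in the scene independently.

    `scene` maps a (hypothetical) person id to that person's observed
    frames up to time t; the result holds `horizon` future frames each.
    """
    return {pid: repeat_last_pose(frames, horizon) for pid, frames in scene.items()}


if __name__ == "__main__":
    # Two people, two observed frames each, a single joint per pose.
    scene = {
        "person_a": [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]],
        "person_b": [[[5.0, 5.0, 0.0]], [[5.0, 4.0, 0.0]]],
    }
    future = forecast_scene(scene, horizon=3)
    print(len(future["person_a"]), future["person_a"][0])
```

A baseline like this treats each person in isolation, so it uses none of the social or environmental cues the benchmark is designed to exercise; it only serves as a reference point for the shape of a prediction.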


Challenge entries are evaluated by submitting them to the SoMoF benchmark website. The top submissions will be selected based on their average performance on the SoMoF leaderboards, using the VIM metric on the 3DPW dataset and the VAM metric on the PoseTrack dataset, as described on the website. The winners of the challenge will have the opportunity to present their work as a spotlight and poster presentation during the workshop. Each challenge submission should be accompanied by an extended abstract or full paper submitted via our CMT webpage (see details below), or by a link to an existing preprint or publication.

Important dates:
Start of the public benchmark: May 1 00:00 PST
Close of the public benchmark: July 29 23:59 PST
(The benchmark will continue to be available for evaluation after this date, but the results will not be considered for the workshop.)

Call for Papers

We invite researchers to submit their papers addressing topics related to human motion and pose dynamics. Relevant topics include, but are not limited to:

  • Human body skeleton pose forecasting
  • Human trajectory forecasting
  • Social motion and body joint prediction
  • Human intention prediction
  • Individual, group and social activity recognition and forecasting from skeletal data
  • Visual surveillance and abnormal activity recognition and forecasting
  • Human-robot interaction considering predictions
  • Visual scene prediction
  • Human walking behaviour analysis
  • Visual and social navigation in crowded scenes
  • Comprehensive video understanding and prediction in-the-wild
  • Action anticipation
  • Early action prediction
  • Prediction in multiple levels of abstraction
  • New metrics for motion forecasting

Submissions can be either full papers in the ICCV format (4-8 double-column pages, excluding references), with a submission deadline of August 2, or extended abstracts (1 double-column page, excluding references), with a submission deadline of August 30. Accepted papers have the opportunity to be presented as a poster during the workshop. However, only full papers in ICCV format will appear in the proceedings. By submitting to this workshop, the authors agree to the review process and understand that we will do our best to match papers to the best possible reviewers. The reviewing process is double-blind. Submission to the challenge is independent of the paper submission, but we encourage authors to submit to the challenge as well.

Important dates:
Paper submission deadline: August 2 23:59 PST
Paper decisions announced: August 10 23:59 PST
Camera-ready submission deadline: August 16 23:59 PST
Extended abstract submission deadline: August 30 23:59 PST
Extended abstract decisions announced: September 10 23:59 PST

Submissions can be made here. If you have any questions about submitting, please contact us here.

Program (October 16, all times are US Eastern Time)

Start Time   End Time   Description
8:30         8:40       Introduction
8:40         9:10       [Recording] Rita Cucchiara, From synthetic to real data in motion prediction
9:10         9:40       [Recording] Siyu Tang, Neural bodies and hands
9:40         10:00      [Recording] Challenge winners presentation
10:00        10:30      Coffee break and poster session
10:30        11:00      [Recording] Kris Kitani, Diverse and physically-plausible motion prediction
11:00        11:20      [Recording] Dataset and challenge
11:20        11:50      [Recording] Marco Pavone, Safe, interaction-aware decision making and control
11:50        12:20      [Recording] Angjoo Kanazawa, Learning to dance! Music conditioned 3D human motion generation
12:20        12:30      Closing remarks


Angjoo Kanazawa

University of California, Berkeley

Kris Kitani

Carnegie Mellon University

Siyu Tang

ETH Zürich

Rita Cucchiara


Marco Pavone

Stanford University, NVIDIA

Accepted Papers

  • Chenxi Wang, Yunfeng Wang, Zixuan Huang, Zhiwen Chen, "Simple Baseline for Single Human Motion Forecasting", ICCV SoMoF Workshop, 2021. [Paper] [Poster]
  • Daiheng Gao, Bang Zhang, Qi Wang, Xindi Zhang, Pan Pan, Yinghui Xu, "SCAT: Stride Consistency with Auto-regressive regressor and Transformer for hand pose estimation", ICCV SoMoF Workshop, 2021. [Paper] [Poster] [Video]
  • Angel Martinez Gonzalez, Michael Villamizar, Jean-Marc Odobez, "Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers", ICCV SoMoF Workshop, 2021. [Paper] [Poster] [Video]
  • Yusheng Peng, Gaofeng Zhang, Xiangyu Li, Liping Zheng, "STIRNet: A Spatial-temporal Interaction-aware Recursive Network for Human Trajectory Prediction", ICCV SoMoF Workshop, 2021. [Paper] [Poster] [Video]
  • Behnam Parsaeifard, Saeed Saadatnejad, Yuejiang Liu, Taylor Mordan, Alexandre Alahi, "Learning Decoupled Representations for Human Pose Forecasting", ICCV SoMoF Workshop, 2021. [Paper] [Poster] [Video]
  • Ankur Singh, Upendra Suddamalla, "Multi-Input Fusion for Practical Pedestrian Intention Prediction", ICCV SoMoF Workshop, 2021. [Paper] [Poster] [Video]

Accepted Extended Abstracts

  • Jiashun Wang, Huazhe Xu, Medhini Narasimhan, Xiaolong Wang, "Multi-Person 3D Motion Prediction with Multi-Range Transformers", ICCV SoMoF Workshop, 2021 (Extended Abstract). [Paper] [Poster] [Video]
  • Armin Saadat, Nima Fathi, Saeed Saadatnejad, Alexandre Alahi, "Towards Human Pose Prediction using the Encoder-Decoder LSTM", ICCV SoMoF Workshop, 2021 (Extended Abstract). [Paper] [Poster] [Video]

Challenge Winners

The challenge winners are Chenxi Wang, Yunfeng Wang, Zixuan Huang, and Zhiwen Chen with the paper "Simple Baseline for Single Human Motion Forecasting." Congratulations!

Program Committee

Name                  Affiliation
Nikos Athanasiou      Max Planck Institute for Intelligent Systems
Shyamal Buch          Stanford University
Hsu-Kuang Chiu        Waymo
Mahsa Ehsanpour       University of Adelaide
Mohsen Fayyaz         Microsoft
Harshayu Girase       University of California, Berkeley
De-An Huang           Stanford University
Boris Ivanovic        Stanford University
Jingwei Ji            Stanford University
Vineet Kosaraju       OpenAI
Karttikeya Mangalam   University of California, Berkeley
Edwin Pan             Stanford University
Nathan Tsoi           Yale University
Michael Wray          University of Bristol
Ye Yuan               Carnegie Mellon University


Andrew Sharp

Stanford University

Vida Adeli

University of Toronto

Juan Carlos Niebles

Stanford University

Ehsan Adeli

Stanford University

Silvio Savarese

Stanford University

Hamid Rezatofighi

Monash University, Stanford University