Human motion forecasting, i.e., forecasting humans' joint locations (pose dynamics) and global locations (trajectories), is an important topic because of strong demand from artificial intelligence applications such as self-driving cars, healthcare, assistive robots, and the detection of dangerous behavioral patterns in surveillance systems. The problem is extremely challenging in real-world scenes because of the many factors involved. Humans are social agents who effortlessly extract detailed semantics from a scene and use them to make swift decisions about their next movements. To accurately forecast a person's trajectory and pose dynamics, a model must learn how their movements are influenced by surrounding people and objects, and how small changes in body pose relate to the person's movement in global space.
In the past few years, many methods have been developed that address part of the problem, e.g., modeling social interactions to better predict humans' future paths, or human-object interactions for more accurate pose forecasting. However, these methods have mainly focused on a single aspect, either trajectory forecasting or human pose prediction, and do not consider all the practical challenges, mainly because no real-world dataset and benchmark had been tailored to both problems jointly. A benchmark that bridges this gap and unifies these two research directions would therefore benefit the research community.
In this workshop, we aim to promote this new line of work, i.e., jointly predicting future human trajectories and skeleton pose dynamics. We also introduce our new benchmark dataset and challenge, which define new practical settings for the problem by re-purposing existing datasets and introducing sensible metrics. We provide a unified, extensive evaluation system to ensure a fair comparison between different approaches, enabling participants to compete in solving the real-world challenges of the problem.
The goal of the SoMoF benchmark is to predict future human trajectories and skeleton poses using information about the surrounding scene and the other humans involved. Participants are given video of the scene and labeled trajectories and poses up to some time t and must predict trajectories and poses for all individuals through some time t+T. Our data contains various challenging real-world scenarios, such as different levels of social (human-human) interactions, environmental (human-object) interactions, occluded joints, and people leaving the scene.
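To make the input and output of the task concrete, the sketch below forecasts T future frames for every person by repeating the last observed pose (a zero-velocity baseline). It is only an illustration under stated assumptions: the nested-list layout (people, frames, joints, xyz), the function name `zero_velocity_forecast`, and the toy coordinates are hypothetical and are not the benchmark's actual data format or any participant's method.

```python
from typing import List

Joint = List[float]      # [x, y, z] coordinates of one joint (assumed layout)
Pose = List[Joint]       # all joints of one person in one frame
Sequence = List[Pose]    # one person's poses over consecutive frames


def zero_velocity_forecast(observed: List[Sequence], horizon: int) -> List[Sequence]:
    """Predict `horizon` future frames per person by repeating the last observed pose."""
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    predictions: List[Sequence] = []
    for person in observed:
        if not person:
            raise ValueError("each person needs at least one observed frame")
        last_pose = person[-1]
        # Copy every joint so the prediction does not alias the observed data.
        future = [[list(joint) for joint in last_pose] for _ in range(horizon)]
        predictions.append(future)
    return predictions


# Toy input: 2 people, 2 observed frames (up to time t), 2 joints each.
observed = [
    [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.1, 0.0, 0.0], [0.1, 1.0, 0.0]]],
    [[[2.0, 0.0, 0.0], [2.0, 1.0, 0.0]], [[2.0, 0.0, 0.1], [2.0, 1.0, 0.1]]],
]

# Forecast T = 3 frames (through time t + T) for all people.
future = zero_velocity_forecast(observed, horizon=3)
```

Because the global position is carried by the joint coordinates in this toy layout, repeating the last pose also freezes the trajectory; a learned model would replace this function while keeping the same inputs and outputs.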
Challenge entries are evaluated by submitting predictions to the SoMoF benchmark website. The top submissions will be selected based on their average performance on the SoMoF leaderboards, using the VIM metric on the 3DPW dataset and the VAM metric on the PoseTrack dataset, as described on the website. The winners of the challenge will have the opportunity to present their work as a spotlight and poster presentation during the workshop. Each challenge submission should be accompanied by an extended abstract or full paper submitted via our CMT webpage (see details below), or by a link to an existing preprint or publication.
Start of the public benchmark: May 1 00:00 PST
Close of the public benchmark: July 29 23:59 PST
(The benchmark will continue to be available for evaluation after this date, but the results will not be considered for the workshop.)
We invite researchers to submit their papers addressing topics related to human motion and pose dynamics. Relevant topics include, but are not limited to:
Submissions can be either full papers in the ICCV format (4-8 double-column pages, excluding references), with a submission deadline of August 2, or extended abstracts (1 double-column page, excluding references), with a submission deadline of August 30. Accepted papers will have the opportunity to be presented as posters during the workshop. However, only full papers in ICCV format will appear in the proceedings. By submitting to this workshop, the authors agree to the review process and understand that we will do our best to match papers to the best possible reviewers. The reviewing process is double-blind. Submission to the challenge is independent of paper submission, but we encourage authors to submit to the challenge.
Paper submission deadline: August 2 23:59 PST
Paper decisions announced: August 10 23:59 PST
Camera-ready submission deadline: August 16 23:59 PST
Extended abstract submission deadline: August 30 23:59 PST
Extended abstract decisions announced: September 10 23:59 PST
| Start Time | End Time | Description |
|---|---|---|
| 8:40 | 9:10 | [Recording] Rita Cucchiara, From synthetic to real data in motion prediction |
| 9:10 | 9:40 | [Recording] Siyu Tang, Neural bodies and hands |
| 9:40 | 10:00 | [Recording] Challenge winners presentation |
| 10:00 | 10:30 | Coffee break and poster session |
| 10:30 | 11:00 | [Recording] Kris Kitani, Diverse and physically-plausible motion prediction |
| 11:00 | 11:20 | [Recording] Dataset and challenge |
| 11:20 | 11:50 | [Recording] Marco Pavone, Safe, interaction-aware decision making and control |
| 11:50 | 12:20 | [Recording] Angjoo Kanazawa, Learning to dance! Music conditioned 3D human motion generation |
The challenge winners are Chenxi Wang, Yunfeng Wang, Zixuan Huang, and Zhiwen Chen with the paper "Simple Baseline for Single Human Motion Forecasting." Congratulations!
| Nikos Athanasiou | Max Planck Institute for Intelligent Systems |
| Shyamal Buch | Stanford University |
| Mahsa Ehsanpour | University of Adelaide |
| Harshayu Girase | University of California, Berkeley |
| De-An Huang | Stanford University |
| Boris Ivanovic | Stanford University |
| Jingwei Ji | Stanford University |
| Karttikeya Mangalam | University of California, Berkeley |
| Edwin Pan | Stanford University |
| Nathan Tsoi | Yale University |
| Michael Wray | University of Bristol |
| Ye Yuan | Carnegie Mellon University |