The goal of the SoMoF benchmark is to predict future human trajectories and skeleton poses using information about the surrounding scene and the other humans involved. Participants are given a video of the scene, together with labeled trajectories and poses up to a time t, and must predict the trajectories and poses of all individuals through time t+T.
The above videos show examples of the problem input and output. For each 16-frame input video, the locations of all body joints of each person in each frame are provided. The video itself is also provided as input so that models can incorporate scene context.
The model output should be the predicted locations of all body joints for each person in each of the next 14 video frames, which are not seen by the model.
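As a rough illustration of the input and output shapes described above, the following Python sketch implements a zero-velocity forecast, which simply repeats each person's last observed pose for all 14 future frames. This is not the benchmark's official code or data format: the array layout (people, frames, joints, dims), the number of joints, the coordinate dimensionality, and the function name `zero_velocity_forecast` are all assumptions made for the example.

```python
import numpy as np

INPUT_FRAMES = 16   # observed frames per sequence (from the description above)
OUTPUT_FRAMES = 14  # future frames to predict (from the description above)


def zero_velocity_forecast(observed, output_frames=OUTPUT_FRAMES):
    """Repeat each person's last observed pose for every future frame.

    observed: array of shape (people, frames, joints, dims); this layout
              is an assumption, not the benchmark's documented format.
    returns:  array of shape (people, output_frames, joints, dims).
    """
    observed = np.asarray(observed, dtype=float)
    if observed.ndim != 4:
        raise ValueError("expected shape (people, frames, joints, dims)")
    last_pose = observed[:, -1:, :, :]  # shape (people, 1, joints, dims)
    return np.repeat(last_pose, output_frames, axis=1)


# Synthetic example: 2 people, 13 joints, 3-D coordinates (all assumed values).
rng = np.random.default_rng(0)
observed = rng.normal(size=(2, INPUT_FRAMES, 13, 3))
prediction = zero_velocity_forecast(observed)
print(prediction.shape)  # (2, 14, 13, 3)
```

A learned model would replace `zero_velocity_forecast` with something that also uses the video frames and the other people's motion, but it would consume and produce arrays of the same general shape.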
To participate in the SoMoF benchmark:
If you use the SoMoF benchmark in your work, please cite the following publications:
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild
Adeli, V. and Ehsanpour, M. and Reid, I. and Niebles, J.C. and Savarese, S. and Adeli, E. and Rezatofighi, H.
IEEE International Conference on Computer Vision (ICCV 2021)
Socially and Contextually Aware Human Motion and Pose Forecasting
Adeli, V. and Adeli, E. and Reid, I. and Niebles, J.C. and Rezatofighi, H.
RA-L and IROS 2020