CVPR 2020 Workshop On Towards Human-Centric Image/Video Synthesis, and the 4th Look Into Person (LIP) Challenge


Time Schedule
Date: Friday, 19 June 2020, from 13:20 to 18:30 PDT. (All times are Pacific Daylight Time, Seattle time.)
13:20-13:40 Opening remarks and best paper talk [YouTube Video1] [Bilibili Video1] [YouTube Video2]
13:40-14:20 Invited talk 1: Ira Kemelmacher-Shlizerman, Associate Professor, University of Washington [YouTube] [Bilibili]
Talk title: Human Modeling and Synthesis
14:20-15:00 Invited talk 2: William T. Freeman, Professor, MIT [YouTube] [Bilibili]
Talk title: Learning from videos playing forwards, backwards, fast, and slow
Abstract: How can we tell that a video is playing backwards? People's motions look wrong when the video is played backwards--can we develop an algorithm to distinguish forward from backward video? Similarly, can we tell if a video is sped-up? We have developed algorithms to distinguish forwards from backwards video, and fast from slow. Training algorithms for these tasks provides a self-supervised task that facilitates human activity recognition. We'll show these results, and applications of these unsupervised video learning tasks. Joint work with: Donglai Wei, Joseph Lim, Andrew Zisserman, Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, Michael Rubinstein, Michal Irani, Tali Dekel
15:00-15:15 Winner talk 1: Winner of the Multi-Person Human Parsing Challenge [YouTube] [Bilibili] [Slide]
15:15-15:30 Winner talk 2: Winner of the Video Multi-Person Human Parsing Challenge [YouTube] [Bilibili] [Slide]
15:30-16:10 Invited talk 3: Ming-Hsuan Yang, Professor, University of California at Merced [YouTube] [Bilibili]
Talk title: Synthesizing Human Images in 2D and 3D Scenes
Abstract: In this talk, I will present our recent results on synthesizing human images in 2D and 3D scenes. In the first part, I will present a context-aware approach to synthesize and place object instances in an image with semantically coherent contents. In the second part, I will describe a method for synthesizing 3D humans with varying poses in indoor scenes by inferring 3D layout and context from an image. Time permitting, I will also present an algorithm that models the music-to-dance generation process to synthesize realistic, diverse, style-consistent, and beat-matching dances from music.
16:10-16:50 Invited talk 4: Jun-Yan Zhu, Assistant Professor, Carnegie Mellon University [YouTube] [Bilibili]
Talk title: Visualizing and Understanding GANs
Abstract: Generative Adversarial Networks (GANs) have recently achieved impressive results for a wide range of real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this talk, I will present several analytic tools to visualize and understand GANs at the unit, object, and scene level. Collectively, these tools highlight what a GAN has and has not learned. We show several practical applications enabled by our method, from comparing internal representations across different layers, models, and datasets, to improving GANs by locating and removing artifact-causing units, to interactively manipulating objects in a real image.
16:50-17:05 Winner talk 3: Winner of the Image-based Multi-pose Virtual Try-on Challenge [YouTube] [Bilibili] [Slide]
17:05-17:20 Winner talk 4: Winner of the Video Virtual Try-on Challenge [YouTube] [Bilibili] [Slide]
17:20-17:35 Winner talk 5: Winner of the Dark Complexion Portrait Segmentation Challenge [YouTube] [Bilibili] [Slide]
17:35-18:30 Oral: Epipolar Transformer for Multi-view Human Pose Estimation.
17:35-18:30 Oral: Yoga-82: A New Dataset for Fine-grained Classification of Human Poses.
17:35-18:30 Oral: The MTA Dataset for Multi Target Multi Camera Pedestrian Tracking by Weighted Distance Aggregation.
17:35-18:30 Poster: LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking.
17:35-18:30 Poster: Fine-grained Pointing Recognition for Natural Drone Guidance.
17:35-18:30 Poster: Reposing Humans by Warping 3D Features.







Contact: Zhenyu Xie (xiezhy6@mail2.sysu.edu.cn)