Xavier Alameda-Pineda (LJK & INRIA Grenoble, France)

Multi-speaker tracking with robotic platforms by exploiting audio, video and robot servoing

Abstract: Multi-person tracking is one of the cornerstone tasks in human-robot interaction. Although thoroughly investigated in computer vision and signal processing, tracking a time-varying number of persons remains a challenging open problem. In this context, exploiting auditory and visual information is gratifying and challenging at the same time. Gratifying because the complementary nature of auditory and visual information makes tracking more robust to noise and outliers than uni-modal approaches. Challenging because how to properly fuse auditory and visual information for multi-speaker tracking is far from a solved question. Furthermore, in the particular case of robotic platforms (or, more generally, autonomous moving systems), it is unclear how to servo the robot and perform multi-speaker tracking at the same time. In this talk, I will first describe a recent framework for multi-person tracking based on variational Bayes techniques. Second, I will describe how to exploit this framework for visual servoing and multiple-speaker tracking. Finally, I will present a recent study on audio-visual multi-person tracking. Current limitations and future research directions will be discussed.

Bio: Xavier Alameda-Pineda received M.Sc. degrees in mathematics ('08), telecommunications ('09), and computer science ('10). He worked towards his Ph.D. in mathematics and computer science in the Perception Team at INRIA, and obtained it from Université Joseph Fourier in 2013. He was a post-doctoral fellow at the University of Trento and currently holds a research scientist position at INRIA. His research interests are multimodal machine learning and signal processing for behavior and scene analysis, with applications to human-robot interaction.