[cs-talks] TIME CHANGE: Ronghang Hu, IVC Group Talk, 12/2 at 1pm in MCS 148
Harrington, Jacob Walter
jwharrin at bu.edu
Thu Dec 1 14:50:38 EST 2016
Please be aware that the following talk has been changed to 1pm from its previous time of 3pm. Location remains the same at MCS 148.
Object Localization from Natural Language Referential Expressions
Ronghang Hu, Second Year PhD Student, University of California, Berkeley
Friday, December 2nd at 1pm, in MCS 148
Abstract: Great progress has been made on object detection and semantic image segmentation, the tasks of localizing visual entities belonging to a pre-defined set of object categories like "airplane" or "bus". However, the more general and challenging task of localizing entities based on arbitrary natural language expressions remains far from solved. Can we find the region in an image that corresponds to "a young man wearing a blue shirt riding a bicycle" from a crowd of people based on its natural language description, and localize it with a bounding box or a segmentation mask? Or, more generally, can we perform object localization based on natural language instead of fixed classes?
We provide several approaches to address this problem. (1) Sequence-to-sequence models have been successful on image and video description, and we show that these models can also be adapted to language-based object localization by region-level retrieval. (2) Language-based localization and segmentation can be accomplished end-to-end, either in a supervised way with encoding and classification or in an unsupervised or semi-supervised way by reconstruction. (3) By looking into the components of the natural language expressions instead of treating them holistically, we can model the relationships between objects and match the textual components to the visual entities.
This is joint work with Marcus Rohrbach, Trevor Darrell, Kate Saenko and other collaborators.
Bio: Ronghang Hu is currently a 2nd-year PhD student at UC Berkeley, working with Prof. Trevor Darrell. He obtained his B.E. degree from Tsinghua University in 2015. His research interests include computer vision and natural language processing, and in particular how linguistic information can help visual comprehension. Previously, he was a research intern at the Institute of Computing Technology, Chinese Academy of Sciences (ICTCAS), where he was advised by Prof. Shiguang Shan and Prof. Ruiping Wang.