Abstract: In this talk I will provide an overview of how the field of computer vision has been impacted by the recent success of large multimodal models, and of some of the opportunities to build more sophisticated models that can work for open-ended tasks. In particular, I will discuss some of our recent work on adapting multimodal models for open-vocabulary visual grounding through limited extra annotated data, self-consistency regularization and synthetic data. I will also describe our work on adapting large multimodal models for fine-grained question answering, step-by-step reasoning and text-to-image generation. Fine-grained question answering requires optimizing for correctness in a setup where incorrect answers are very similar to correct answers. Our work demonstrates the importance of verifying and reinforcing correct answers during both training and inference. Text-to-image generation is a complementary capability that is typically optimized independently of models trained for image-to-text generation. Our work shows how some large pre-trained models can be used as composable modules to achieve a single framework for both generation and understanding.

Bio: Vicente Ordóñez-Román is an Associate Professor in the Department of Computer Science at Rice University and Visiting Academic at the Amazon AGI Foundations team. He is also affiliated with the Ken Kennedy Institute at Rice University. His research interests lie at the intersection of computer vision, natural language processing and machine learning. He is a recipient of a Best Paper Award at the conference on Empirical Methods in Natural Language Processing (EMNLP) in 2017 and the Best Paper Award -- Marr Prize at the International Conference on Computer Vision (ICCV) in 2013. He has also been the recipient of an NSF CAREER Award, an IBM Faculty Award, a Google Faculty Research Award, a Google Award for Inclusion Research and a Facebook Research Award. Vicente obtained his PhD from the University of North Carolina at Chapel Hill, and has also been a visiting researcher at the Allen Institute for Artificial Intelligence and a visiting professor at Adobe Research.

Website:
https://www.cs.rice.edu/~vo9/

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend requesting accommodations at least 5 days in advance.

University of Pittsburgh