Abstract
As large language models (LLMs) such as ChatGPT and Gemini become increasingly integrated into research and operational workflows, a critical question arises: Can these systems be trusted to behave safely, predictably, and reliably under real-world conditions? This talk explores that question through recent findings, including our own, on how AI models behave when confronted with unexpected or adversarial inputs. We begin with a brief introduction to modern language models and their emergent capabilities, followed by an overview of the types of vulnerabilities observed at both the surface and mechanistic levels. We then look more closely at model architectures to examine failure modes from a mechanistic perspective, highlighting how internal representations and attention pathways can contribute to security risks. Most of the discussion will focus on attacks, including prompt injection, multimodal jailbreaks, and targeted perturbations, to reveal the diverse and often subtle vulnerabilities that persist across model families.


Bio
Yue Dong (yuedong.us) is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, Riverside. Her research focuses on natural language processing and machine learning, with a particular emphasis on trustworthy AI and the safety of large language models (LLMs). She is recognized for her work on red teaming vision-language models, which won a Best Paper Award at SoCal NLP 2024 and was spotlighted at ICLR. Dr. Dong has authored over 40 peer-reviewed papers in top-tier venues including ACL, EMNLP, ICML, ICLR, AAAI, ICRA, and CIKM. She co-led a widely attended ACL 2024 tutorial on LLM safety and contributes actively to the research community. She has served as Senior Area Chair for ACL 2025, NAACL 2025 & 2024, EMNLP 2025 & 2024, and IJCNLP-AACL 2023, and as Area Chair for ICLR 2025, ACL 2024 & 2023, and EMNLP 2022 & 2023. She also co-organizes workshops and tutorials on summarization and efficient LLMs at EMNLP, NAACL, and NeurIPS (2021–2024).

Website: https://yuedong.us/

Event Details

Please let us know if you require an accommodation in order to participate in this event. Accommodations may include live captioning, ASL interpreters, and/or captioned media and accessible documents from recorded events. We recommend submitting requests at least 5 days in advance.

University of Pittsburgh