Yige Li

Research Scientist
Singapore Management University

Email / Google Scholar / GitHub / LinkedIn

About Me

I welcome collaborations with researchers and students working on Trustworthy AI, LLM security, and agentic AI safety. Feel free to contact me for potential research discussions.

I am a research fellow at the School of Computing and Information Systems at Singapore Management University, supervised by Prof. Jun Sun. I also work closely with Prof. Xingjun Ma at Fudan University. I completed my Ph.D. at Xidian University under the supervision of Prof. Xixiang Lyu.

I pursue research in Trustworthy AI, aiming to build secure, robust, and interpretable systems that align with human values and cognition. I’m especially interested in generative models (LLMs, diffusion models, and AI agents) and AI safety, and I seek simple yet insightful solutions grounded in theory. Guided by the philosophy "Everything should be made as simple as possible, but not simpler," I approach research with both rigor and curiosity. Outside of work, I enjoy rock climbing 🧗 and swimming 🏊.

My research centers on securing modern learning systems and large models, including backdoor behaviors, robust training and evaluation, and controllable/safe deployment of LLMs and agentic systems.

Selected Publications

† indicates corresponding author. For a full list of publications, see my Google Scholar page.

Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs. Yige Li, Wei Zhao, Zhe Li, Nay Myat Min, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Yu-Gang Jiang, Jun Sun. arXiv, 2026.
AutoBackdoor: Automating Backdoor Attacks via LLM Agents. Yige Li, Zhe Li, Wei Zhao, Nay Myat Min, Hanxun Huang, Xingjun Ma, Jun Sun. arXiv, 2025.
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks. Yige Li, Jiabo He, Hanxun Huang, Jun Sun, Xingjun Ma. IEEE TDSC, 2025.
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models. Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun. NeurIPS, 2025 | First Prize in SafetyBench Competition.
Adaptive Content Restriction for Large Language Models via Suffix Optimization. Yige Li, Peihai Jiang, Jun Sun, Peng Shu, Tianming Liu, Zhen Xiang. arXiv, 2025.
Expose Before You Defend: Better Backdoor Defense With Exposed Models. Yige Li, Hanxun Huang, Jiaming Zhang, Xingjun Ma, Yu-Gang Jiang. arXiv, 2025.
Reconstructive Neuron Pruning for Backdoor Defense. Yige Li, Xixiang Lyu, Xingjun Ma, Nodens Koren, Lingjuan Lyu, Bo Li, Yu-Gang Jiang. ICML, 2023.
Anti-Backdoor Learning: Training Clean Models on Poisoned Data. Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma. NeurIPS, 2021.
Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, Xingjun Ma. ICLR, 2021.
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization. Nay Myat Min, Long H Pham, Yige Li†, Jun Sun. ICML, 2025.
Propaganda AI: An Analysis of Semantic Divergence in Large Language Models. Nay Myat Min, Long H Pham, Yige Li†, Jun Sun. ICLR, 2026.
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security. Wei Zhao, Zhe Li, Yige Li†, Jun Sun. NDSS, 2026.
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP. Hanxun Huang, Sarah Erfani, Yige Li†, Xingjun Ma, James Bailey. ICML, 2025.

Awards and Honors

We're honored to share that our BackdoorLLM has won the First Prize (only three worldwide) in the SafetyBench competition, organized by the Center for AI Safety. Huge thanks to the organizers and reviewers for recognizing our work.

Professional Activities

Program Committee Member

I’m very glad to share that I’ve been invited to serve as an Area Chair (AC) for ACL 2026, one of the top conferences in NLP.
Top Reviewer, NeurIPS 2025
Reviewer: ICLR, ICML, NeurIPS, CVPR, ICCV, AAAI, ACL, EMNLP

Journal Reviewer

IEEE TPAMI, IEEE TIFS, IEEE TDSC, IEEE TKDE
