* indicates corresponding author

2026

Teach to Reason Safely: Policy-Guided Safety Tuning for MLRMs
Jingyu Zhang, Kun Yang*, Ming Wen, Zhuoer Xu, Zeyang Sha*, Shiwen Cui, Zhaohui Yang ICLR 2026 arXiv

2025

Single AI Agent Runtime Security Testing Standards
Ant Group World Digital Technology Academy WDTA
Agent Safety Alignment via Reinforcement Learning
Zeyang Sha, Hanling Tian, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weiqiang Wang arXiv
A Survey of LLM-Driven AI Agent Communication: Protocols, Security Risks, and Defense Countermeasures
Dezhang Kong, Shi Lin, Zhenhua Xu, Zhebo Wang, Minghao Li, Yufeng Li, Yilun Zhang, Zeyang Sha, Yuyuan Li, Changting Lin, Xun Wang, Xuan Liu, Muhammad Khurram Khan, Ningyu Zhang, Chaochao Chen, Meng Han arXiv
SEM: Reinforcement Learning for Search-Efficient Large Language Models
Zeyang Sha, Shiwen Cui, Weiqiang Wang arXiv
FragFake: A Dataset for Fine-Grained Detection of Edited Images with Vision Language Models
Zhen Sun, Ziyi Zhang, Zeren Luo, Zeyang Sha, Tianshuo Cong, Zheng Li, Shiwen Cui, Weiqiang Wang, Jiaheng Wei, Xinlei He, Qi Li, Qian Wang arXiv
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models
Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He arXiv

2024

Conversation Reconstruction Attack Against GPT Models
Junjie Chu, Zeyang Sha*, Michael Backes, Yang Zhang* EMNLP 2024 arXiv Code
ZeroFake: Zero-Shot Detection of Fake Images Generated and Edited by Text-to-Image Generation Models
Zeyang Sha, Yicong Tan, Mingjie Li, Michael Backes, Yang Zhang CCS 2024 arXiv Code
Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming
Yukun Jiang, Xinyue Shen, Rui Wen, Zeyang Sha, Junjie Chu, Yugeng Liu, Michael Backes, Yang Zhang ICWSM 2024 arXiv
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha, Yang Zhang arXiv
Comprehensive Assessment of Toxicity in ChatGPT
Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang arXiv

2023

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang CCS 2023 arXiv Code Best Paper Finalist · CSAW Europe 2024
Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders
Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang CVPR 2023 arXiv Code
From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need
Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang arXiv

2022

Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
Zeyang Sha, Xinlei He, Pascal Berrang, Mathias Humbert, Yang Zhang arXiv