The List of All Adversarial Example Papers appears to have been down for the past few days. Without that valuable resource, staying up to date with the latest research in this field has become difficult, so I created this repository to aggregate and maintain the most recent papers in the area. It may not cover every paper, but I have tried to be thorough; if you find any papers we have missed, just drop me an email. We have included the data from the List of All Adversarial Example Papers up to 2023-09-01. We also provide a list of papers about transfer-based attacks here.
-
Model Medicine: A Clinical Framework for Understanding, Diagnosing, and Treating AI Models
Jihoon Jeong
-
G. Madan Mohan, Veena Kiran Nambiar, Kiranmayee Janardhan
-
Differentially Private Multimodal In-Context Learning
Ivoline C. Ngong, Zarreen Reza, Joseph P. Near
-
Hiroki Fukui
-
AegisUI: Behavioral Anomaly Detection for Structured User Interface Protocols in AI Agent Systems
Mohd Safwan Uddin, Saba Hajira
-
Judge Reliability Harness: Stress Testing the Reliability of LLM Judges
Sunishchal Dev, Andrew Sloan, Joshua Kavner, Nicholas Kong, Morgan Sandler
-
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
Benjamin Feuer, Lucas Rosenblatt, Oussama Elachqar
-
FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation
Min Tan, Junchao Ma, Yinfu Feng, Jiajun Ding, Wenwen Pan, Tingting Han, Qian Zheng, Zhenzhong Kuang, Zhou Yu
-
AgentSCOPE: Evaluating Contextual Privacy Across Agentic Workflows
Ivoline C. Ngong, Keerthiram Murugesan, Swanand Kadhe, Justin D. Weisz, Amit Dhurandhar, Karthikeyan Natesan Ramamurthy
-
Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks
Yuxiang Zhang, Bin Ma, Enyan Dai
-
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
Helena Casademunt, Bartosz Cywiński, Khoi Tran, Arya Jakkli, Samuel Marks, Neel Nanda
-
Ruiqi Zhang, Lingxiang Wang, Hainan Zhang, Zhiming Zheng, Yanyan Lan
-
Panagiotis Alexios Spanakis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou
-
ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts
Trapoom Ukarapol, Nut Chukamphaeng, Kunat Pipatanakul, Pakhapoom Sarapat
-
NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance
Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim
-
Privacy-Aware Camera 2.0 Technical Report
Huan Song, Shuyu Tian, Ting Long, Jiang Liu, Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li
-
Knowledge Divergence and the Value of Debate for Scalable Oversight
Robin Young
-
Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler
-
Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models
Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler
-
SRasP: Self-Reorientation Adversarial Style Perturbation for Cross-Domain Few-Shot Learning
Wenqian Li, Pengfei Fang, Hui Xue
-
Chanmi Lee, Minsung Yoon, Woojae Kim, Sebin Lee, Sung-eui Yoon
-
When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining
Zhihao Li, Gezheng Xu, Jiale Cai, Ruiyi Fang, Di Wu, Qicheng Lao, Charles Ling, Boyu Wang
-
Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness
Ruichen Xu, Kexin Chen
-
Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller
-
Francisco M. Calatrava-Nicolás, Shoko Miyauchi, Vitor Fortes Rey, Paul Lukowicz, Todor Stoyanov, Oscar Martinez Mozos
-
Latent Wasserstein Adversarial Imitation Learning
Siqi Yang, Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
-
Osmosis Distillation: Model Hijacking with the Fewest Samples
Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou
-
Good-Enough LLM Obfuscation (GELO)
Anatoly Belikov, Ilya Fedotov
-
Fai Gu, Qiyu Tang, Te Wen, Emily Davis, Finn Carter
-
Efficient Privacy-Preserving Sparse Matrix-Vector Multiplication Using Homomorphic Encryption
Yang Gao, Gang Quan, Wujie Wen, Scott Piersall, Qian Lou, Liqiang Wang
-
ShieldBypass: On the Persistence of Impedance Leakage Beyond EM Shielding
Md Sadik Awal, Md Tauhidur Rahman
-
Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation
Xiaoguang Li, Hanyi Wang, Yaowei Huang, Jungang Yang, Qingqing Ye, Haonan Yan, Ke Pan, Zhe Sun, Hui Li
-
When Agents Persuade: Propaganda Generation and Mitigation in LLMs
Julia Jose, Ritik Roongta, Rachel Greenstadt
-
From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration
Yizhe Xie, Congcong Zhu, Xinyue Zhang, Tianqing Zhu, Dayong Ye, Minfeng Qi, Huajie Chen, Wanlei Zhou
-
Pedram Agand
-
DMD-augmented Unpaired Neural Schrödinger Bridge for Ultra-Low Field MRI Enhancement
Youngmin Kim, Jaeyun Shin, Jeongchan Kim, Taehoon Lee, Jaemin Kim, Peter Hsu, Jelle Veraart, Jong Chul Ye
-
Oracle-efficient Hybrid Learning with Constrained Adversaries
Princewill Okoroafor, Robert Kleinberg, Michael P. Kim
-
Yangyang Wei, Yijie Xu, Zhenyuan Li, Xiangmin Shen, Shouling Ji
-
Measuring Privacy vs. Fidelity in Synthetic Social Media Datasets
Henry Tari, Adriana Iamnitchi
-
Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
Hongliu Cao, Ilias Driouich, Eoin Thomas
-
AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
Zihang Zeng, Jiaquan Zhang, Pengze Li, Yuan Qi, Xi Chen
-
Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals
Achyutha Menon, Magnus Saebo, Tyler Crosse, Spencer Gibson, Eyon Jang, Diogo Cruz
-
Scores Know Bob's Voice: Speaker Impersonation Attack
Chanwoo Hwang, Sunpill Kim, Yong Kiam Tan, Tianchi Liu, Seunghun Paik, Dongsoo Kim, Mondal Soumik, Khin Mi Mi Aung, Jae Hong Seo
-
StegaFFD: Privacy-Preserving Face Forgery Detection via Fine-Grained Steganographic Domain Lifting
Guoqing Ma, Xun Lin, Hui Ma, Ajian Liu, Yizhong Liu, Wenzhong Tang, Shan Yu, Chenqi Kong, Yi Yu
-
Contextualized Privacy Defense for LLM Agents
Yule Wen, Yanzhe Zhang, Jianxun Lian, Xiaoyuan Yi, Xing Xie, Diyi Yang
-
Robin Young
-
Conditioned Activation Transport for T2I Safety Steering
Maciej Chrabąszcz, Aleksander Szymczyk, Jan Dubiński, Tomasz Trzciński, Franziska Boenisch, Adam Dziedzic
-
Understanding and Mitigating Dataset Corruption in LLM Steering
Cullen Anderson, Narmeen Oozeer, Foad Namjoo, Remy Ogasawara, Amirali Abdullah, Jeff M. Phillips
-
ExpGuard: LLM Content Moderation in Specialized Domains
Minseok Choi, Dongjin Kim, Seungbin Yang, Subin Kim, Youngjun Kwak, Juyoung Oh, Jaegul Choo, Jungmin Son
-
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
Sudip Bhujel
-
TAO-Attack: Toward Advanced Optimization-Based Jailbreak Attacks for Large Language Models
Zhi Xu, Jiaqi Li, Xiaotong Zhang, Hong Yu, Han Liu
-
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
Aradhye Agarwal, Gurdit Siyan, Yash Pandya, Joykirat Singh, Akshay Nambi, Ahmed Awadallah
-
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
Zhongxi Wang, Yueqian Lin, Jingyang Zhang, Hai Helen Li, Yiran Chen
-
From Shallow to Deep: Pinning Semantic Intent via Causal GRPO
Shuyi Zhou, Zeen Song, Wenwen Qiang, Jiyan Sun, Yao Zhou, Yinlong Liu, Wei Ma
-
The Price of Robustness: Stable Classifiers Need Overparameterization
Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok
-
Integrating Homomorphic Encryption and Synthetic Data in FL for Privacy and Learning Quality
Yenan Wang, Carla Fabiana Chiasserini, Elad Michael Schiller
-
Patrick Inoue, Florian Röhrbein, Andreas Knoblauch
-
Exploiting PendingIntent Provenance Confusion to Spoof Android SDK Authentication
Ramanpreet Singh Khinda
-
Extending the Formalism and Theoretical Foundations of Cryptography to AI
Federico Villa, F. Betül Durak, Tadayoshi Kohno, Tapdig Maharramli, Franziska Roesner
-
DSBA: Dynamic Stealthy Backdoor Attack with Collaborative Optimization in Self-Supervised Learning
Jiayao Wang, Mohammad Maruf Hasan, Yiping Zhang, Xiaoying Lei, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao
-
Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field
Peter Horvath, Ilia Shumailov, Lukasz Chmielewski, Lejla Batina, Yuval Yarom
-
RAIN: Secure and Robust Aggregation under Shuffle Model of Differential Privacy
Yuhang Li, Yajie Wang, Xiangyun Tang, Peng Jiang, Yu-an Tan, Liehuang Zhu
-
Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks
Junjie Chu, Xinyue Shen, Ye Leng, Michael Backes, Yun Shen, Yang Zhang
-
VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models
Duoxun Tang, Dasen Dai, Jiyao Wang, Xiao Yang, Jianyu Wang, Siqi Cai
-
Towards Privacy-Preserving LLM Inference via Collaborative Obfuscation (Technical Report)
Yu Lin, Qizhi Zhang, Wenqiang Ruan, Daode Zhang, Jue Hong, Ye Wu, Hanning Xia, Yunlong Mao, Sheng Zhong
-
Extracting Training Dialogue Data from Large Language Model based Task Bots
Shuo Zhang, Junzhou Zhao, Junji Hou, Pinghui Wang, Chenxu Wang, Jing Tao
-
Xiaoyi Pang, Xuanyi Hao, Pengyu Liu, Qi Luo, Song Guo, Zhibo Wang
-
What Helps -- and What Hurts: Bidirectional Explanations for Vision Transformers
Qin Su, Tie Luo
-
Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution
Guoxin Shi, Haoyu Wang, Zaihui Yang, Yuxing Wang, Yongzhe Chang
-
ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs
Xunlei Chen, Jinyu Guo, Yuang Li, Zhaokun Wang, Yi Gong, Jie Zou, Jiwei Wei, Wenhong Tian
-
Tailai Song, Pedro Casas, Michela Meo
-
Real Money, Fake Models: Deceptive Model Claims in Shadow APIs
Yage Zhang, Yukun Jiang, Zeyuan Chen, Michael Backes, Xinyue Shen, Yang Zhang
-
Explanation-Guided Adversarial Training for Robust and Interpretable Models
Chao Chen, Yanhui Chen, Shanshan Lin, Dongsheng Hong, Shu Wu, Xiangwen Liao, Chuanyi Liu
-
Huw Day, Adrianna Jezierska, Jessica Woodgate
-
Guy Smorodinsky, Sveta Gimpleson, Itay Safran
-
Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection
Jianfeng Liao, Yichen Wei, Raymond Chan Ching Bon, Shulan Wang, Kam-Pui Chow, Kwok-Yan Lam
-
Bo Ma, Jinsong Wu, Weiqi Yan, Catherine Shi, Minh Nguyen
-
CoopDiff: A Diffusion-Guided Approach for Cooperation under Corruptions
Gong Chen, Chaokun Zhang, Pengcheng Lv
-
Frontier Models Can Take Actions at Low Probabilities
Alex Serrano, Wen Xing, David Lindner, Erik Jenner
-
From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions
Zhihang Deng, Jiaping Gui, Weinan Zhang
-
Protection against Source Inference Attacks in Federated Learning
Andreas Athanasiou, Kangsoo Jung, Catuscia Palamidessi
-
Yizhi Liu, Balaji Padmanabhan, Siva Viswanathan
-
AutoFFS: Adversarial Deformations for Facial Feminization Surgery Planning
Paul Friedrich, Florentin Bieder, Florian M. Thieringer, Philippe C. Cattin
-
TraceGuard: Process-Guided Firewall against Reasoning Backdoors in Large Language Models
Zhen Guo, Shanghao Shi, Hao Li, Shamim Yazdani, Ning Zhang, Reza Tourani
-
Oluseyi Olukola, Nick Rahimi
-
Haoyuan Zhang, Keyao Wang, Guosheng Zhang, Haixiao Yue, Zhiwen Tan, Siran Peng, Tianshuo Zhang, Xiao Tan, Kunbin Chen, Wei He, Jingdong Wang, Ajian Liu, Xiangyu Zhu, Zhen Lei
-
Turning Black Box into White Box: Dataset Distillation Leaks
Huajie Chen, Tianqing Zhu, Yuchen Zhong, Yang Zhang, Shang Wang, Feng He, Lefeng Zhang, Jialiang Shen, Minghao Wang, Wanlei Zhou
-
Hide&Seek: Remove Image Watermarks with Negligible Cost via Pixel-wise Reconstruction
Huajie Chen, Tianqing Zhu, Hailin Yang, Yuchen Zhong, Yang Zhang, Hui Sun, Heng Xu, Zuobin Ying, Lihua Yin, Wanlei Zhou
-
Token-level Data Selection for Safe LLM Fine-tuning
Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang
-
Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight
-
JailNewsBench: Multi-Lingual and Regional Benchmark for Fake News Generation under Jailbreak Attacks
Masahiro Kaneko, Ayana Niwa, Timothy Baldwin
-
I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift
Subramanyam Sahoo, Vinija Jain, Divya Chaudhary, Aman Chadha
-
Mobile-VTON: High-Fidelity On-Device Virtual Try-On
Zhenchen Wan, Ce Chen, Runqi Lin, Jiaxin Huang, Tianxi Chen, Yanwu Xu, Tongliang Liu, Mingming Gong
-
Towards Policy-Adaptive Image Guardrail: Benchmark and Method
Caiyong Piao, Zhiyuan Yan, Haoming Xu, Yunzhen Zhao, Kaiqing Lin, Feiyang Xu, Shuigeng Zhou
-
Xinwen Cheng, Jingyuan Zhang, Zhehao Huang, Yingwen Wu, Xiaolin Huang
-
Subliminal Signals in Preference Labels
Isotta Magistrali, Frédéric Berdoz, Sam Dauncey, Roger Wattenhofer
-
S2O: Enhancing Adversarial Training with Second-Order Statistics of Weights
Gaojie Jin, Xinping Yi, Wei Huang, Sven Schewe, Xiaowei Huang
-
Shengbo Wang, Nian Si
-
BadRSSD: Backdoor Attacks on Regularized Self-Supervised Diffusion Models
Jiayao Wang, Yiping Zhang, Mohammad Maruf Hasan, Xiaoying Lei, Jiale Zhang, Junwu Zhu, Qilin Wu, Dongfang Zhao
-
Frédéric Berdoz, Leonardo Rugli, Roger Wattenhofer
-
Clawdrain: Exploiting Tool-Calling Chains for Stealthy Token Exhaustion in OpenClaw Agents
Ben Dong, Hui Feng, Qian Wang
-
A Systematic Study of LLM-Based Architectures for Automated Patching
Qingxiao Xu, Ze Sheng, Zhicheng Chen, Jeff Huang
-
Shrey Shah, Levent Ozgur
-
Exact and Asymptotically Complete Robust Verifications of Neural Networks via Quantum Optimization
Wenxin Li, Wenchao Liu, Chuan Wang, Qi Gao, Yin Ma, Hai Wei, Kai Wen
-
ROKA: Robust Knowledge Unlearning against Adversaries
Jinmyeong Shin, Joshua Tapia, Nicholas Ferreira, Gabriel Diaz, Moayed Daneshyari, Hyeran Jeon
-
CaptionFool: Universal Image Captioning Model Attacks
Swapnil Parekh
-
MIDAS: Multi-Image Dispersion and Semantic Reconstruction for Jailbreaking MLLMs
Yilian Liu, Xiaojun Jia, Guoshun Nan, Jiuyang Lyu, Zhican Chen, Tao Guan, Shuyuan Luo, Zhongyi Zhai, Yang Liu
-
Learning to Attack: A Bandit Approach to Adversarial Context Poisoning
Ray Telikani, Amir H. Gandomi
-
A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
Ruihao Pan, Suhang Wang
-
BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages
Jason Lucas, Matt Murtagh-White, Adaku Uchendu, Ali Al-Lawati, Michiharu Yamashita, Dominik Macko, Ivan Srba, Robert Moro, Dongwon Lee
-
Improved Adversarial Diffusion Compression for Real-World Video Super-Resolution
Bin Chen, Weiqi Li, Shijie Zhao, Xuanyu Zhang, Junlin Li, Li Zhang, Jian Zhang
-
Analyzing Physical Adversarial Example Threats to Machine Learning in Election Systems
Khaleque Md Aashiq Kamal, Surya Eada, Aayushi Verma, Subek Acharya, Adrian Yemin, Benjamin Fuller, Kaleel Mahmood
-
IU: Imperceptible Universal Backdoor Attack
Hsin Lin, Yan-Lun Chen, Ren-Hung Hwang, Chia-Mu Yu
-
Weight Updates as Activation Shifts: A Principled Framework for Steering
Dyah Adila, John Cooper, Alexander Yun, Avi Trost, Frederic Sala
-
Quoc Minh Nguyen, Trung Le, Jing Wu, Anh Tuan Bui, Mehrtash Harandi
-
Atsuki Sato, Martin Aumüller, Yusuke Matsui
-
Curation Leaks: Membership Inference Attacks against Data Curation for Machine Learning
Dariush Wahdany, Matthew Jagielski, Adam Dziedzic, Franziska Boenisch (CISPA Helmholtz Center for Information Security; Anthropic)
-
ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data
Haodong Zhao, Jinming Hu, Zhaomin Wu, Zongru Wu, Wei Du, Junyi Hou, Caibei Zhao, Zhuosheng Zhang, Bingsheng He, Gongshen Liu
-
Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs
Jingyuan Xie, Wenjie Wang, Ji Wu, Jiandong Gao
-
CIRCLE: A Framework for Evaluating AI from a Real-World Lens
Reva Schwartz, Carina Westling, Morgan Briggs, Marzieh Fadaee, Isar Nejadgholi, Matthew Holmes, Fariza Rashid, Maya Carlyle, Afaf Taïk, Kyra Wilson, Peter Douglas, Theodora Skeadas, Gabriella Waters, Rumman Chowdhury, Thiago Lacerda
-
Gregory Kang Ruey Lau, Hieu Dao, Nicole Kan Hui Lin, Bryan Kian Hsiang Low
-
LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering
Rafid Ishrak Jahan, Fahmid Shahriar Iqbal, Sagnik Ray Choudhury
-
Seungdong Yoa, Sanghyu Yoon, Suhee Yoon, Dongmin Kim, Ye Seul Sim, Junhyun Lee, Woohyung Lim
-
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
Zhicheng Fang, Jingjie Zheng, Chenxu Fu, Wei Xu
-
Controllable Reasoning Models Are Private Thinkers
Haritz Puerto, Haonan Li, Xudong Han, Timothy Baldwin, Iryna Gurevych
-
Half-Truths Break Similarity-Based Retrieval
Bora Kargi, Arnas Uselis, Seong Joon Oh
-
GuardAlign: Test-time Safety Alignment in Multimodal Large Language Models
Xingyu Zhu, Beier Zhu, Junfeng Fang, Shuo Wang, Yin Zhang, Xiang Wang, Xiangnan He
-
TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?
Xiaochuang Yuan, Hui Xu, Silvia Xu, Cui Zou, Jing Xiong
-
Chung-ju Huang, Huiqiang Zhao, Yuanpeng He, Lijian Li, Wenpin Jiao, Zhi Jin, Peixuan Chen, Leye Wang
-
LiaisonAgent: A Multi-Agent Framework for Autonomous Risk Investigation and Governance
Chuanming Tang, Ling Qing, Shifeng Chen
-
Physical Evaluation of Naturalistic Adversarial Patches for Camera-Based Traffic-Sign Detection
Brianna D'Urso, Tahmid Hasan Sakib, Syed Rafay Hasan, Terry N. Guo
-
Challenges in Enabling Private Data Valuation
Yiwei Fu, Tianhao Wang, Varun Chandrasekaran
-
Verifier-Bound Communication for LLM Agents: Certified Bounds on Covert Signaling
Om Tailor
-
He Li, Wenyue He, Weihang Kong, Xingchen Zhang
-
Induced Numerical Instability: Hidden Costs in Multimodal Large Language Models
Wai Tuck Wong, Jun Sun, Arunesh Sinha
-
CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety
Umid Suleymanov, Rufiz Bayramov, Suad Gafarli, Seljan Musayeva, Taghi Mammadov, Aynur Akhundlu, Murat Kantarcioglu
-
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search
Xun Huang, Simeng Qin, Xiaoshuang Jia, Ranjie Duan, Huanqian Yan, Zhitao Zeng, Fei Yang, Yang Liu, Xiaojun Jia
-
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar, Julianna Piskorz, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger
-
Tian Zhang, Yiwei Xu, Juan Wang, Keyan Guo, Xiaoyang Xu, Bowen Xiao, Quanlong Guan, Jinlin Fan, Jiawei Liu, Zhiquan Liu, Hongxin Hu
-
Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang
-
Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
Xiaosen Wang, Zhijin Ge, Bohan Liu, Zheng Fang, Fengfan Zhou, Ruixuan Zhang, Shaokang Wang, Yuyang Luo
-
AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
Abhay Sheshadri, Aidan Ewart, Kai Fronsdal, Isha Gupta, Samuel R. Bowman, Sara Price, Samuel Marks, Rowan Wang
-
Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
Boyang Zhang, Yang Zhang
-
No Caption, No Problem: Caption-Free Membership Inference via Model-Fitted Embeddings
Joonsung Jeon, Woo Jae Kim, Suhyeon Ha, Sooel Son, Sung-Eui Yoon
-
Decomposing Private Image Generation via Coarse-to-Fine Wavelet Modeling
Jasmine Bayrooti, Weiwei Kong, Natalia Ponomareva, Carlos Esteves, Ameesh Makadia, Amanda Prorok
-
Tao Huang, Jiayang Meng, Xu Yang, Chen Hou, Hong Chen
-
Multilingual Safety Alignment Via Sparse Weight Editing
Jiaming Liang, Zhaoxin Wang, Handing Wang
-
Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD
Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen
-
Tackling Privacy Heterogeneity in Differentially Private Federated Learning
Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang
-
DPSQL+: A Differentially Private SQL Library with a Minimum Frequency Rule
Tomoya Matsumoto, Shokichi Takakura, Shun Takagi, Satoshi Hasegawa
-
Q-Tag: Watermarking Quantum Circuit Generative Models
Yang Yang, Yuzhu Long, Han Fang, Zhaoyun Chen, Zhonghui Li, Weiming Zhang, Guoping Guo
-
Layer-Targeted Multilingual Knowledge Erasure in Large Language Models
Taoran Li, Varun Chandrasekaran, Zhiyuan Yu
-
Lifecycle-Integrated Security for AI-Cloud Convergence in Cyber-Physical Infrastructure
S M Zia Ur Rashid, Deepa Gurung, Sonam Raj Gupta, Suman Rath
-
Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection
Marcus Graves
-
Hidden in the Metadata: Stealth Poisoning Attacks on Multimodal Retrieval-Augmented Generation
Kennedy Edemacu, Mohammad Mahdi Shokri
-
Srikumar Nayak
-
Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification
David Condrey
-
Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation
David Condrey
-
Beyond Refusal: Probing the Limits of Agentic Self-Correction for Semantic Sensitive Information
Umid Suleymanov, Zaur Rajabov, Emil Mirzazada, Murat Kantarcioglu
-
Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound
Nicholas Dietrich, David McShannon
-
Mario García-Márquez, Nuria Rodríguez-Barroso, M. Victoria Luzón, Francisco Herrera
-
Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes
Xavier Pleimling, Sifat Muhammad Abdullah, Gunjan Balde, Peng Gao, Mainack Mondal, Murtuza Jadliwala, Bimal Viswanath
-
When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng
-
Haoyuan He, Yu Zheng, Jie Zhou, Jiwen Lu
-
Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
Zheng Gao, Xiaoyu Li, Zhicheng Bao, Xiaoyan Feng, Jiaojiao Jiang
-
Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias
JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Yoonji Lee, Seunghoon Lee, YoungBin Kim
-
Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang
-
Robustness in sparse artificial neural networks trained with adaptive topology
Bendegúz Sulyok, Gergely Palla, Filippo Radicchi, Santo Fortunato
-
Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination
Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Sihan Liu
-
Private and Robust Contribution Evaluation in Federated Learning
Delio Jaramillo Velez, Gergely Biczok, Alexandre Graell i Amat, Johan Ostman, Balazs Pejo
-
RAMSeS: Robust and Adaptive Model Selection for Time-Series Anomaly Detection Algorithms
Mohamed Abdelmaksoud, Sheng Ding, Andrey Morozov, Ziawasch Abedjan
-
Fatemeh Shoaei, Mohammad Pishdar, Mozafar Bag-Mohammadi, Mojtaba Karami
-
The Silent Spill: Measuring Sensitive Data Leaks Across Public URL Repositories
Tarek Ramadan, AbdelRahman Abdou, Mohammad Mannan, Amr Youssef
-
Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions
Lan Zhang, Chengsi Liang, Zeming Zhuang, Yao Sun, Fang Fang, Xiaoyong Yuan, Dusit Niyato
-
Harrison Dahme
-
Manifold of Failure: Behavioral Attraction Basins in Language Models
Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto
-
Dhiraj Neupane, Richard Dazeley, Mohamed Reda Bouadjenek, Sunil Aryal
-
Training Agents to Self-Report Misbehavior
Bruce W. Lee, Chen Yueh-Han, Tomek Korbak
-
HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems
Idan Habler, Vineeth Sai Narajala, Stav Koren, Amy Chang, Tiffany Saade
-
Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace
Qianlong Lan, Anuj Kaul, Shaun Jones, Stephanie Westrum
-
Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
Niamul Hassan Samin, Md Arifur Rahman, Abdullah Ibne Hanif, Juena Ahmed Noshin, Md Ashikur Rahman
-
CQSA: Byzantine-robust Clustered Quantum Secure Aggregation in Federated Learning
Arnab Nath, Harsh Kasyap
-
Beyond performance-wise Contribution Evaluation in Federated Learning
Balazs Pejo
-
Differentially Private Truncation of Unbounded Data via Public Second Moments
Zilong Cao, Xuan Bi, Hai Zhang
-
Attention to Neural Plagiarism: Diffusion Models Can Plagiarize Your Copyrighted Images!
Zihang Zou, Boqing Gong, Liqiang Wang
-
Online Algorithms with Unreliable Guidance
Julien Dallot, Yuval Emek, Yuval Gil, Maciej Pacut, Stefan Schmid
-
ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction
Che Wang, Fuyao Zhang, Jiaming Zhang, Ziqi Zhang, Yinghui Wang, Longtao Huang, Jianbo Gao, Zhong Chen, Wei Yang Bryan Lim
-
Personal Information Parroting in Language Models
Nishant Subramani, Kshitish Ghate, Mona Diab
-
OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
Longxiang Wang, Xiang Zheng, Xuhao Zhang, Yao Zhang, Ye Wu, Cong Wang
-
AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs
Che Wang, Jiaming Zhang, Ziqi Zhang, Zijie Wang, Yinghui Wang, Jianbo Gao, Tao Wei, Zhong Chen, Wei Yang Bryan Lim
-
SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
Yifei Xu, Guilherme Potje, Shivam Shandilya, Tiancheng Yuan, Leonardo de Oliveira Nunes, Rakshanda Agarwal, Saeid Asgari, Adam Atkinson, Emre Kıcıman, Songwu Lu, Ranveer Chandra, Tusher Chakraborty
-
Does Order Matter: Connecting The Law of Robustness to Robust Generalization
Himadri Mandal, Vishnu Varadarajan, Jaee Ponde, Aritra Das, Mihir More, Debayan Gupta
-
"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang
-
Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization
Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi
-
Robust Spiking Neural Networks Against Adversarial Attacks
Shuai Wang, Malu Zhang, Yulin Jiang, Dehao Zhang, Ammar Belatreche, Yu Liang, Yimeng Shan, Zijian Zhou, Yang Yang, Haizhou Li
-
AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents
Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, Jingheng Huan
-
RecoverMark: Robust Watermarking for Localization and Recovery of Manipulated Faces
Haonan An, Xiaohui Ye, Guang Hua, Yihang Tao, Hangcheng Cao, Xiangyu Yu, Yuguang Fang
-
VII: Visual Instruction Injection for Jailbreaking Image-to-Video Generation Models
Bowen Zheng, Yongli Xiang, Ziming Hong, Zerong Lin, Chaojian Yu, Tongliang Liu, Xinge You
-
Oracle-Robust Online Alignment for Large Language Models
Zimeng Li, Mudit Gaur, Vaneet Aggarwal
-
Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning
Yige Liu, Yiwei Lou, Che Wang, Yongzhi Cao, Hanpin Wang
-
High-Dimensional Robust Mean Estimation with Untrusted Batches
Maryam Aliakbarpour, Vladimir Braverman, Yuhan Liu, Junze Yin
-
Assessing the Impact of Speaker Identity in Speech Spoofing Detection
Anh-Tuan Dao, Driss Matrouf, Nicholas Evans
-
Vanishing Watermarks: Diffusion-Based Image Editing Undermines Robust Invisible Watermarking
Fan Guo, Jiyu Kang, Qi Ming, Emily Davis, Finn Carter
-
Shruti Srivastava, Kiranmayee Janardhan, Shaurya Jauhari
-
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
Mengxuan Hu, Vivek V. Datla, Anoop Kumar, Zihan Guan, Sheng Li, Alfy Samuel, Daben Liu
-
Inderjeet Singh, Vikas Pahuja, Aishvariya Priya Rathina Sabapathy, Chiara Picardi, Amit Giloni, Roman Vainshtein, Andrés Murillo, Hisashi Kojima, Motoyoshi Sekiya, Yuki Unno, Junichi Suga
-
Minhui Yu, Yongheng Sun, David S. Lalush, Jason P Mihalik, Pew-Thian Yap, Mingxia Liu
-
Robust AI Evaluation through Maximal Lotteries
Hadi Khalaf, Serena L. Wang, Daniel Halpern, Itai Shapira, Flavio du Pin Calmon, Ariel D. Procaccia
-
Efficient Opportunistic Approachability
Teodor Vanislavov Marinov, Mehryar Mohri, Princewill Okoroafor, Jon Schneider, Julian Zimmert
-
Gabriele Farina, Juan Carlos Perdomo
-
ConformalHDC: Uncertainty-Aware Hyperdimensional Computing with Application to Neural Decoding
Ziyi Liang, Hamed Poursiami, Zhishun Yang, Keiland Cooper, Akhilesh Jaiswal, Maryam Parsa, Norbert Fortin, Babak Shahbaba
-
TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI
Kyeongpil Min, Sangmin Jeon, Jae-Jin Lee, Woojoo Lee
-
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy
-
Self-Purification Mitigates Backdoors in Multimodal Diffusion Language Models
Guangnian Wan, Qi Li, Gongfan Fang, Xinyin Ma, Xinchao Wang
-
Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao
-
BiRQA: Bidirectional Robust Quality Assessment for Images
Aleksandr Gushchin, Dmitriy S. Vatolin, Anastasia Antsiferova
-
SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images
Aayush Dhakal, Subash Khanal, Srikumar Sastry, Jacob Arndt, Philipe Ambrozio Dias, Dalton Lunga, Nathan Jacobs
-
Wasserstein Distributionally Robust Online Learning
Guixian Chen, Salar Fattahi, Soroosh Shafiee
-
CITED: A Decision Boundary-Aware Signature for GNNs Towards Model Extraction Defense
Bolin Shen, Md Shamim Seraj, Zhan Cheng, Shayok Chakraborty, Yushun Dong
-
CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks
Bolin Shen, Zhan Cheng, Neil Zhenqiang Gong, Fan Yao, Yushun Dong
-
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko
-
Latent Introspection: Models Can Detect Prior Concept Injections
Theia Pearson-Vogel, Martin Vanek, Raymond Douglas, Jan Kulveit
-
Victor Morel, Cristiana Santos, Pontus Carlsson, Joel Ahlinder, Romaric Duvignau
-
Graph-theoretic Agreement Framework for Multi-agent LLM Systems
Muhammad Umar Javed
-
Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems
Moritz Weckbecker, Jonas Müller, Ben Hagag, Michael Mulet
-
Kairan Zhao, Eleni Triantafillou, Peter Triantafillou
-
Hong Wang, Xuwei Fan, Zhipeng Cheng, Yachao Yuan, Minghui Min, Minghui Liwang, Xiaoyu Xia
-
OpenClaw, Moltbook, and ClawdLab: From Agent-Only Social Networks to Autonomous Scientific Research
Lukas Weidener, Marko Brkić, Mihailo Jovanović, Ritvik Singh, Emre Ulgac, Aakaash Meduri
-
KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models
Ce Fang, Zhikun Zhang, Min Chen, Qing Liu, Lu Zhou, Zhe Liu, Yunjun Gao
-
Wei Tao, Yang Dai, Jincai Huang, Qing Tao
-
Gurjot Singh, Prabhjot Singh, Aashima Sharma, Maninder Singh, Ryan Ko
-
When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
Shenyang Chen, Liuwan Zhu
-
Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan, Luong Ha Tien, Ngo Duc Hoang Son, Van-Hau Pham
-
Johannes Ackermann, Michael Noukhovitch, Takashi Ishida, Masashi Sugiyama
-
Agentic Adversarial QA for Improving Domain-Specific LLMs
Vincent Grari, Ciprian Tomoiaga, Sylvain Lamprier, Tatsunori Hashimoto, Marcin Detyniecki
-
FENCE: A Financial and Multimodal Jailbreak Detection Dataset
Mirae Kim, Seonghun Jeong, Youngjun Kwak
-
On the Adversarial Robustness of Discrete Image Tokenizers
Rishika Bhagwatkar, Irina Rish, Nicolas Flammarion, Francesco Croce
-
RoEL: Robust Event-based 3D Line Reconstruction
Gwangtak Bae, Jaeho Shin, Seunggu Kang, Junho Kim, Ayoung Kim, Young Min Kim
-
Distribution-Free Sequential Prediction with Abstentions
Jialin Yu, Moïse Blanchard
-
Yu Bai, Zhe Wang, Jiarui Zhang, Dong-Xiao Zhang, Yinjun Gao, Jun-Jie Zhang
-
Generating adversarial inputs for a graph neural network model of AC power flow
Robert Parker
-
PRISM-FCP: Byzantine-Resilient Federated Conformal Prediction via Partial Sharing
Ehsan Lari, Reza Arablouei, Stefan Werner
-
Interactions that reshape the interfaces of the interacting parties
David I. Spivak
-
On the Generalization and Robustness in Conditional Value-at-Risk
Dinesh Karthik Mulumudi, Piyushi Manupriya, Gholamali Aminian, Anant Raj
-
Dynamic Deception: When Pedestrians Team Up to Fool Autonomous Cars
Masoud Jamshidiyan Tehrani, Marco Gabriel, Jinhan Kim, Paolo Tonella
-
AndroWasm: an Empirical Study on Android Malware Obfuscation through WebAssembly
Diego Soi, Silvia Lucia Sanna, Lorenzo Pisu, Leonardo Regano, Giorgio Giacinto
-
FeatureBleed: Inferring Private Enriched Attributes From Sparsity-Optimized AI Accelerators
Darsh Asher, Farshad Dizani, Joshua Kalyanapu, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz
-
Vishal Srivastava
-
All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
Zeyu Zhang, Ryan Chen, Bradly C. Stadie
-
Rong Fu, Muge Qi, Chunlei Meng, Shuo Yin, Kun Liu, Zhaolu Kang, Simon Fong
-
Robustness and Reasoning Fidelity of Large Language Models in Long-Context Code Question Answering
Kishan Maharaj, Nandakishore Menon, Ashita Saxena, Srikanth Tamilselvam
-
Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation
Bogdan Kostić, Conor Fallon, Julian Risch, Alexander Löser
-
What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else?
Boyang Ma, Hechuan Guo, Peizhuo Lv, Minghui Xu, Xuelong Dai, YeChao Zhang, Yijun Yang, Yue Zhang
-
What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data
Dimitri Staufer, Kirsten Morehouse
-
Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Xiaohan Zhao, Zhaoyi Li, Yaxin Luo, Jiacheng Cui, Zhiqiang Shen
-
AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue
Adib Sakhawat, Fardeen Sadab, Rakin Shahriar
-
ABCD: All Biases Come Disguised
Mateusz Nowak, Xavier Cadet, Peter Chin
-
Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning
Jyotin Goel, Souvik Maji, Pratik Mazumder
-
DAVE: A Policy-Enforcing LLM Spokesperson for Secure Multi-Document Data Sharing
René Brinkhege, Prahlad Menon
-
BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning
Siyuan Liang, Yongcheng Jing, Yingjie Wang, Jiaxing Huang, Ee-chien Chang, Dacheng Tao
-
When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
Yu Fang, Yuchun Feng, Dong Jing, Jiaqi Liu, Yue Yang, Zhenyu Wei, Daniel Szafir, Mingyu Ding
-
Fail-Closed Alignment for Large Language Models
Zachary Coalson, Beth Sohler, Aiden Gabriel, Sanghyun Hong
-
Discovering Universal Activation Directions for PII Leakage in Language Models
Leo Marchyok, Zachary Coalson, Sungho Keum, Sooel Son, Sanghyun Hong
-
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu
-
Efficient privacy loss accounting for subsampling and random allocation
Vitaly Feldman, Moshe Shenfeld
-
Canonicalizing Multimodal Contrastive Representation Learning
Sharut Gupta, Sanyam Kansal, Stefanie Jegelka, Phillip Isola, Vikas Garg
-
Guarding the Middle: Protecting Intermediate Representations in Federated Split Learning
Obaidullah Zaland, Sajib Mistry, Monowar Bhuyan
-
Privacy-Preserving Mechanisms Enable Cheap Verifiable Inference of LLMs
Arka Pal, Louai Zahran, William Gvozdjak, Akilesh Potti, Micah Goldblum
-
Privacy in Theory, Bugs in Practice: Grey-Box Auditing of Differential Privacy Libraries
Tudor Cebere, David Erb, Damien Desfontaines, Aurélien Bellet, Jack Fitzsimons
-
TFL: Targeted Bit-Flip Attack on Large Language Model
Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
-
Provable Adversarial Robustness in In-Context Learning
Di Zhang
-
Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
Zachary Coalson, Bo Fang, Sanghyun Hong
-
Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models
Nick Dodson, Xinyu Gao, Qingsong Wang, Yusu Wang, Zhengchao Wan
-
A Theoretical Framework for Modular Learning of Robust Generative Models
Corinna Cortes, Mehryar Mohri, Yutao Zhong
-
Retrieval Collapses When AI Pollutes the Web
Hongyeon Yu, Dongchan Kim, Young-Bum Kim
-
The Weight of a Bit: EMFI Sensitivity Analysis of Embedded Deep Learning Models
Jakub Breier, Štefan Kučerák, Xiaolu Hou
-
Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents
Doron Shavit
-
Adib Sakhawat, Fardeen Sadab
-
Benchmarking Adversarial Robustness and Adversarial Training Strategies for Object Detection
Alexis Winter, Jean-Vincent Martini, Romaric Audigier, Angelique Loesch, Bertrand Luvison
-
Arc2Morph: Identity-Preserving Facial Morphing with Arc2Face
Nicolò Di Domenico, Annalisa Franco, Matteo Ferrara, Davide Maltoni
-
Differentially Private Non-convex Distributionally Robust Optimization
Difei Xu, Meng Ding, Zebin Ma, Huanyi Xie, Youming Tao, Aicha Slaitane, Di Wang
-
Vulnerability Analysis of Safe Reinforcement Learning via Inverse Constrained Reinforcement Learning
Jialiang Fan, Shixiong Jiang, Mengyu Liu, Fanxin Kong
-
Sequential Membership Inference Attacks
Thomas Michel, Debabrota Basu, Emilie Kaufmann
-
Protecting the Undeleted in Machine Unlearning
Aloni Cohen, Refael Kohen, Kobbi Nissim, Uri Stemmer
-
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
Yixuan Xiao, Florian Lux, Alejandro Pérez-González-de-Martos, Ngoc Thang Vu
-
Multi-Channel Replay Speech Detection using Acoustic Maps
Michael Neri, Tuomas Virtanen
-
SRFed: Mitigating Poisoning Attacks in Privacy-Preserving Federated Learning with Heterogeneous Data
Yiwen Lu
-
IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
Priyaranjan Pattnayak, Sanchari Chowdhuri
-
AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang
-
Narrow fine-tuning erodes safety alignment in vision-language agents
Idhant Gulati, Shivam Raval
-
DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs
Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar
-
Automating Agent Hijacking via Structural Template Injection
Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li
-
Scott Thornton
-
Large-scale online deanonymization with LLMs
Simon Lermen, Daniel Paleka, Joshua Swanson, Michael Aerni, Nicholas Carlini, Florian Tramèr
-
Xray-Visual Models: Scaling Vision models on Industry Scale Data
Shlok Mishra, Tsung-Yu Lin, Linda Wang, Hongli Xu, Yimin Liu, Michael Hsu, Chaitanya Ahuja, Hao Yuan, Jianpeng Cheng, Hong-You Chen, Haoyuan Xu, Chao Li, Abhijeet Awasthi, Jihye Moon, Don Husa, Michael Ge, Sumedha Singla, Arkabandhu Chowdhury, Phong Dingh, Satya Narayan Shukla, Yonghuan Yang, David Jacobs, Qi Guo, Jun Xiao, Xiangjun Fan, Aashu Singh
-
Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents
Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut
-
Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming
Philip Sosnin, Jodie Knapp, Fraser Kennedy, Josh Collyer, Calvin Tsay
-
NeST: Neuron Selective Tuning for LLM Safety
Sasha Behrouzi, Lichao Wu, Mohamadreza Rostami, Ahmad-Reza Sadeghi
-
The Vulnerability of LLM Rankers to Prompt Injection Attacks
Yu Yin, Shuai Wang, Bevan Koopman, Guido Zuccon
-
Michael Cunningham
-
Visual Persuasion: What Influences Decisions of Vision-Language Models?
Manuel Cherep, Pranav M R, Pattie Maes, Nikhil Singh
-
Farzana Akter, Rakib Hossain, Deb Kanna Roy Toushi, Mahmood Menon Khan, Sultana Amin, Lisan Al Amin
-
Unforgeable Watermarks for Language Models via Robust Signatures
Huijia Lin, Kameron Shahabi, Min Jae Song
-
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes
Mohammad Taufeeque, Stefan Heimersheim, Adam Gleave, Chris Cundy
-
Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections
Xianglin Yang, Yufei He, Shuo Ji, Bryan Hooi, Jin Song Dong
-
The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
Max Springer, Chung Peng Lee, Blossom Metevier, Jane Castleman, Bohdan Turbal, Hayoung Jung, Zeyu Shen, Aleksandra Korolova
-
Effective and Robust Multimodal Medical Image Analysis
Joy Dhar, Nayyar Zaidi, Maryam Haghighat
-
Emergent Morphing Attack Detection in Open Multi-modal Large Language Models
Marija Ivanovska, Vitomir Štruc
-
Guangtao Lyu, Qi Liu, Chenghao Xu, Jiexi Yan, Muli Yang, Xueting Li, Fen Fang, Cheng Deng
-
Mitchell Piehl, Zhaohan Xi, Zuobin Xiong, Pan He, Muchao Ye
-
ExLipBaB: Exact Lipschitz Constant Computation for Piecewise Linear Neural Networks
Tom A. Splittgerber
-
CEPAE: Conditional Entropy-Penalized Autoencoders for Time Series Counterfactuals
Tomàs Garriga, Gerard Sanz, Eduard Serrahima de Cambra, Axel Brando
-
A Note on Non-Composability of Layerwise Approximate Verification for Neural Inference
Or Zamir
-
Intellicise Wireless Networks Meet Agentic AI: A Security and Privacy Perspective
Rui Meng, Zhidi Zhang, Song Gao, Yaheng Wang, Xiaodong Xu, Yijing Lin, Yiming Liu, Chenyuan Feng, Lexi Xu, Yi Ma, Ping Zhang, Rahim Tafazolli
-
Jie Cao, Zelin Zhang, Qi Li, Jianbing Ni
-
Onto-DP: Constructing Neighborhoods for Differential Privacy on Ontological Databases
Yasmine Hayder, Adrien Boiret, Cédric Eichler, Benjamin Nguyen (PETSCRAFT)
-
Revisiting Backdoor Threat in Federated Instruction Tuning from a Signal Aggregation Perspective
Haodong Zhao, Jinming Hu, Gongshen Liu
-
Natural Privacy Filters Are Not Always Free: A Characterization of Free Natural Filters
Matthew Regehr, Bingshan Hu, Ethan Leeman, Pasin Manurangsi, Pierre Tholoniat, Mathias Lécuyer
-
Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Valentin Dorseuil, Jamal Atif, Olivier Cappé
-
From Tool Orchestration to Code Execution: A Study of MCP Design Choices
Yuval Felendler, Parth A. Gandhi, Idan Habler, Yuval Elovici, Asaf Shabtai
-
Visual Memory Injection Attacks for Multi-Turn Conversations
Christian Schlarmann, Matthias Hein
-
Intent Laundering: AI Safety Datasets Are Not What They Seem
Shahriar Golchin, Marc Wetter
-
Varun Pratap Bhardwaj
-
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)
Tianyu Chen, Dongrui Liu, Xia Hu, Jingyi Yu, Wenjie Wang
-
Differentially Private Retrieval-Augmented Generation
Tingting Tang, James Flemings, Yongqin Wang, Murali Annavaram
-
Towards Selection as Power: Bounding Decision Authority in Autonomous Agents
Jose Manuel de la Chica Rodriguez, Juan Manuel Vera Díaz
-
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
Lukas Struppek, Adam Gleave, Kellin Pelrine
-
Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization
Shangding Gu
-
Overthinking Loops in Agents: A Structural Risk via MCP Tools
Yohan Lee, Jisoo Jang, Seoyeon Choi, Sangyeop Kim, Seungtaek Choi
-
Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models
In Chong Choi, Jiacheng Zhang, Feng Liu, Yiliao Song
-
Universal Image Immunization against Diffusion-based Image Editing via Semantic Injection
Chanhui Lee, Seunghyun Shin, Donggyu Choi, Hae-gon Jeon, Jeany Son
-
Truly Adapting to Adversarial Constraints in Constrained MABs
Francesco Emanuele Stradi, Kalana Kalupahana, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
-
Boundary Point Jailbreaking of Black-Box LLMs
Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, Geoffrey Irving, Yarin Gal
-
Protecting Language Models Against Unauthorized Distillation through Trace Rewriting
Xinhang Ma, William Yeoh, Ning Zhang, Yevgeniy Vorobeychik
-
Exploiting Layer-Specific Vulnerabilities to Backdoor Attack in Federated Learning
Mohammad Hadi Foroughi, Seyed Hamed Rastegar, Mohammad Sabokrou, Ahmad Khonsari
-
Weight space Detection of Backdoors in LoRA Adapters
David Puertolas Merenciano, Ekaterina Vasyagina, Raghav Dixit, Kevin Zhu, Ruizhe Li, Javier Ferrando, Maheep Chaudhary
-
Closing the Distribution Gap in Adversarial Training for LLMs
Chengzhi Hu, Jonas Dornbusch, David Lüdke, Stephan Günnemann, Leo Schwinn
-
Is Mamba Reliable for Medical Imaging?
Banafsheh Saber Latibari, Najmeh Nazari, Daniel Brignac, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis
-
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu
-
Benchmarking at the Edge of Comprehension
Samuele Marro, Jialin Yu, Emanuele La Malfa, Oishi Deb, Jiawei Li, Yibo Yang, Ebey Abraham, Sunando Sengupta, Eric Sommerlade, Michael Wooldridge, Philip Torr
-
Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, Philip Torr
-
Zhenhong Zhou, Yuanhe Zhang, Hongwei Cai, Moayad Aloqaily, Ouns Bouachir, Linsey Pang, Prakhar Mehrotra, Kun Wang, Qingsong Wen
-
When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift
Max Fomin
-
In Transformer We Trust? A Perspective on Transformer Architecture Failure Modes
Trishit Mondal, Ameya D. Jagtap
-
ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI
Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng
-
Hamza Reguieg, Mohamed El Kamili, Essaid Sabir
-
AISA: Awakening Intrinsic Safety Awareness in Large Language Models against Jailbreak Attacks
Weiming Song, Xuan Xie, Ruiping Yin
-
Privacy-Concealing Cooperative Perception for BEV Scene Segmentation
Song Wang, Lingling Li, Marcus Santos, Guanghui Wang
-
Mitigating the Safety-utility Trade-off in LLM Alignment via Adaptive Safe Context Learning
Yanbo Wang, Minzheng Wang, Jian Liang, Lu Wang, Yongcan Yu, Ran He
-
Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges
Ruomeng Ding, Yifei Pang, He Sun, Yizhong Wang, Zhiwei Steven Wu, Zhun Deng
-
MOTIF: Learning Action Motifs for Few-shot Cross-Embodiment Transfer
Heng Zhi, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li, Guoli Yang, Heng Tao Shen
-
PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training
Yuhan Cheng, Hancheng Ye, Hai Helen Li, Jingwei Sun, Yiran Chen
-
Tutoring Large Language Models to be Domain-adaptive, Precise, and Safe
Somnath Banerjee
-
AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks
Yuqi Jia, Ruiqi Wang, Xilong Wang, Chong Xiang, Neil Gong
-
Detecting Deepfakes with Multivariate Soft Blending and CLIP-based Image-Text Alignment
Jingwei Li, Jiaxin Tong, Pengfei Wu
-
Evaluating Robustness of Reasoning Models on Parameterized Logical Problems
Naïm Es-sebbani, Esteban Marquer, Yakoub Salhi, Zied Bouraoui
-
Consistency of Large Reasoning Models Under Multi-Turn Attacks
Yubo Li, Ramayya Krishnan, Rema Padman
-
TensorCommitments: A Lightweight Verifiable Inference for Language Models
Oguzhan Baser, Elahe Sadeghi, Eric Wang, David Ribeiro Alves, Sam Kazemian, Hong Kang, Sandeep P. Chinchali, Sriram Vishwanath
-
RAT-Bench: A Comprehensive Benchmark for Text Anonymization
Nataša Krčo, Zexi Yao, Matthieu Meeus, Yves-Alexandre de Montjoye
-
Quantization-Robust LLM Unlearning via Low-Rank Adaptation
João Vitor Boer Abitante, Joana Meneguzzo Pasquali, Luan Fonseca Garcia, Ewerton de Oliveira, Thomas da Silva Paula, Rodrigo C. Barros, Lucas S. Kupssinskü
-
A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models
Yash Deo, Yan Jia, Toni Lassila, Victoria J Hodge, Alejandro F Frang, Chenghao Qian, Siyuan Kang, Ibrahim Habli
-
Realistic Face Reconstruction from Facial Embeddings via Diffusion Models
Dong Han, Yong Li, Joachim Denzler
-
Jiyong Uhm, Minseok Kim, Michalis Polychronakis, Hyungjoon Koo
-
Neighborhood Blending: A Lightweight Inference-Time Defense Against Membership Inference Attacks
Osama Zafar, Shaojie Zhan, Tianxi Ji, Erman Ayday
-
Backdoor Attacks on Contrastive Continual Learning for IoT Systems
Alfous Tim, Kuniyilh Simi D
-
OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage
Akshat Naik, Jay Culligan, Yarin Gal, Philip Torr, Rahaf Aljundi, Alasdair Paren, Adel Bibi
-
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
Xu Li, Simon Yu, Minzhou Pan, Yiyou Sun, Bo Li, Dawn Song, Xue Lin, Weiyan Shi
-
Backdooring Bias in Large Language Models
Anudeep Das, Prach Chantasantitam, Gurjot Singh, Lipeng He, Mariia Ponomarenko, Florian Kerschbaum
-
SecureGate: Learning When to Reveal PII Safely via Token-Gated Dual-Adapters for Federated LLMs
Mohamed Shaaban, Mohamed Elmahallawy
-
CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation
Tengjie Zhu, Guanyu Cai, Yang Zhaohui, Guanzhu Ren, Haohui Xie, ZiRui Wang, Junsong Wu, Jingbo Wang, Xiaokang Yang, Yao Mu, Yichao Yan
-
Soft Contamination Means Benchmarks Test Shallow Generalization
Ari Spiesberger, Juan J. Vazquez, Nicky Pochinkov, Tomáš Gavenčiak, Peli Grietzer, Gavin Leech, Nandi Schoots
-
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
Renjun Xu, Yang Yan
-
Sparse Autoencoders are Capable LLM Jailbreak Mitigators
Yannick Assogba, Jacopo Cortellazzi, Javier Abad, Pau Rodriguez, Xavier Suau, Arno Blaas
-
Semantic-aware Adversarial Fine-tuning for CLIP
Jiacheng Zhang, Jinhao Li, Hanxun Huang, Sarah M. Erfani, Benjamin I.P. Rubinstein, Feng Liu
-
BlackCATT: Black-box Collusion Aware Traitor Tracing in Federated Learning
Elena Rodríguez-Lois, Fabio Brau, Maura Pintor, Battista Biggio, Fernando Pérez-González
-
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
-
MalTool: Malicious Tool Attacks on LLM Agents
Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, Neil Gong
-
Peak + Accumulation: A Proxy-Level Scoring Formula for Multi-Turn LLM Attack Detection
J Alex Corll
-
Blind Gods and Broken Screens: Architecting a Secure, Intent-Centric Mobile Agent Operating System
Zhenhua Zou, Sheng Guo, Qiuyang Zhan, Lepeng Zhao, Shuo Li, Qi Li, Ke Xu, Mingwei Xu, Zhuotao Liu
-
Théo Lasnier, Wissam Antoun, Francis Kulumba, Djamé Seddah
-
Frank Xiao, Santiago Aranguri
-
Mitigating Gradient Inversion Risks in Language Models via Token Obfuscation
Xinguo Feng, Zhongkui Ma, Zihan Wang, Alsharif Abuadbba, Guangdong Bai
-
Embedding Inversion via Conditional Masked Diffusion Language Models
Han Xiao
-
Soham Bakshi, Sunrit Chakraborty
-
When Fusion Helps and When It Breaks: View-Aligned Robustness in Same-Source Financial Imaging
Rui Ma
-
Not-in-Perspective: Towards Shielding Google's Perspective API Against Adversarial Negation Attacks
Michail S. Alexiou, J. Sukarno Mertoguno
-
Hayfa Dhabhi, Kashyap Thimmaraju
-
Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions
J Rosser, Robert Kirk, Edward Grefenstette, Jakob Foerster, Laura Ruis
-
Steer2Edit: From Activation Steering to Component-Level Editing
Chung-En Sun, Ge Yan, Zimo Wang, Tsui-Wei Weng
-
The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
Chenxu Wang, Chaozhuo Li, Songyang Liu, Zejian Chen, Jinyu Hou, Ji Qi, Rui Li, Litian Zhang, Qiwei Ye, Zheng Liu, Xu Chen, Xi Zhang, Philip S. Yu
-
Robust Vision Systems for Connected and Autonomous Vehicles: Security Challenges and Attack Vectors
Sandeep Gupta, Roberto Passerone
-
Perception with Guarantees: Certified Pose Estimation via Reachability Analysis
Tobias Ladner, Yasser Shoukry, Matthias Althoff
-
Gaurang Sharma, Harri Polonen, Juha Pajula, Jutta Suksi, Jussi Tohka
-
Xinwei Zhang, Li Bai, Tianwei Zhang, Youqian Zhang, Qingqing Ye, Yingnan Zhao, Ruochen Du, Haibo Hu
-
Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation
Michael Zuo, Inwon Kang, Stacy Patterson, Oshani Seneviratne
-
Online Learning in MDPs with Partially Adversarial Transitions and Losses
Ofir Schlisselberg, Tal Lancewicki, Yishay Mansour
-
Towards Poisoning Robustness Certification for Natural Language Generation
Mihnea Ghitu, Matthew Wicker
-
Tracking Finite-Time Lyapunov Exponents to Robustify Neural ODEs
Tobias Wöhrer, Christian Kuehn
-
Linear Model Extraction via Factual and Counterfactual Queries
Daan Otto, Jannis Kurtz, Dick den Hertog, Ilker Birbil
-
Robust Processing and Learning: Principles, Methods, and Wireless Applications
Shixiong Wang, Wei Dai, Li-Chun Wang, Geoffrey Ye Li
-
Evaluating Disentangled Representations for Controllable Music Generation
Laura Ibáñez-Martínez, Chukwuemeka Nkama, Andrea Poltronieri, Xavier Serra, Martín Rocamora
-
Benchmarking Knowledge-Extraction Attack and Defense on Retrieval-Augmented Generation
Zhisheng Qi, Utkarsh Sahu, Li Ma, Haoyu Han, Ryan Rossi, Franck Dernoncourt, Mahantesh Halappanavar, Nesreen Ahmed, Yushun Dong, Yue Zhao, Yu Zhang, Yu Wang
-
Privacy Amplification for BandMF via $b$-Min-Sep Subsampling
Andy Dong, Arun Ganesh
-
Parallel Composition for Statistical Privacy
Dennis Breutigam, Rüdiger Reischuk
-
Trustworthy Agentic AI Requires Deterministic Architectural Boundaries
Manish Bhattarai, Minh Vu
-
AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
-
Hybrid Responsible AI-Stochastic Approach for SLA Compliance in Multivendor 6G Networks
Emanuel Figetakis, Ahmed Refaey Hussein
-
Statistical Roughness-Informed Machine Unlearning
Mohammad Partohaghighi, Roummel Marcia, Bruce J. West, YangQuan Chen
-
Data Sharing with Endogenous Choices over Differential Privacy Levels
Raef Bassily, Kate Donahue, Diptangshu Sen, Annuo Zhao, Juba Ziani
-
The Hidden Costs of Domain Fine-Tuning: PII-Bearing Data Degrades Safety and Increases Leakage
Jayesh Choudhari, Piyush Kumar Singh
-
NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels
Junfeng Fang, Nachuan Chen, Houcheng Jiang, Dan Zhang, Fei Shen, Xiang Wang, Xiangnan He, Tat-Seng Chua
-
What do Geometric Hallucination Detection Metrics Actually Measure?
Eric Yeats, John Buckheit, Sarah Scullen, Brendan Kennedy, Loc Truong, Davis Brown, Bill Kay, Cliff Joslyn, Tegan Emerson, Michael J. Henry, John Emanuello, Henry Kvinge
-
MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks
Georgios Syros, Evan Rose, Brian Grinstead, Christoph Kerschbaumer, William Robertson, Cristina Nita-Rotaru, Alina Oprea
-
Feature salience -- not task-informativeness -- drives machine learning model explanations
Benedict Clark, Marta Oliveira, Rick Wilming, Stefan Haufe
-
One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning
Kotekar Annapoorna Prabhu, Andrew Gan, Zahra Ghodsi
-
Scott Thornton
-
RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
Matthias Templ, Oscar Thees, Roman Müller
-
Generalizing GNNs with Tokenized Mixture of Experts
Xiaoguang Guo, Zehong Wang, Jiazheng Li, Shawn Spitzel, Qi Yang, Kaize Ding, Jundong Li, Chuxu Zhang
-
Ziwei Wang, Yuanhe Zhang, Jing Chen, Zhenhong Zhou, Ruichao Liang, Ruiying Du, Ju Jia, Cong Wu, Yang Liu
-
Yuhang Wang, Feiming Xu, Zheng Lin, Guangyu He, Yuzhe Huang, Haichang Gao, Zhenxing Niu, Shiguo Lian, Zhaoxiang Liu
-
Igor Santos-Grueiro
-
Debate is efficient with your time
Jonah Brown-Cohen, Geoffrey Irving, Simon C. Marshall, Ilan Newman, Georgios Piliouras, Mario Szegedy
-
Longling Geng, Andy Ouyang, Theodore Wu, Daphne Barretto, Matthew John Hayes, Rachael Cooper, Yuqiao Zeng, Sameer Vijay, Gia Ancone, Ankit Rai, Matthew Wolfman, Patrick Flanagan, Edward Y. Chang
-
Generating Adversarial Events: A Motion-Aware Point Cloud Framework
Hongwei Ren, Youxin Jiang, Qifei Gu, Xiangqian Wu
-
Jaylen Jones, Zhehao Zhang, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun
-
Grokking in Linear Models for Logistic Regression
Nataraj Das, Atreya Vedantam, Chandrashekar Lakshminarayanan
-
Reinforcement Learning with Backtracking Feedback
Bilgehan Sel, Vaishakh Keshava, Phillip Wallis, Lukas Rutishauser, Ming Jin, Dingcheng Li
-
Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
Ahmed Salem, Andrew Paverd, Sahar Abdelnabi
-
Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs
Yukun Jiang, Hai Huang, Mingjie Li, Yage Zhang, Michael Backes, Yang Zhang
-
StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors
Suraj Ranganath, Atharv Ramesh
-
Is Reasoning Capability Enough for Safety in Long-Context Language Models?
Yu Fu, Haz Sameen Shahgir, Huanli Gong, Zhipeng Wei, N. Benjamin Erichson, Yue Dong
-
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
Yuting Ning, Jaylen Jones, Zhehao Zhang, Chentao Ye, Weitong Ruan, Junyi Li, Rahul Gupta, Huan Sun
-
Paradox of De-identification: A Critique of HIPAA Safe Harbour in the Age of LLMs
Lavender Y. Jiang, Xujin Chris Liu, Kyunghyun Cho, Eric K. Oermann
-
Distribution-Free Robust Functional Predict-Then-Optimize
Yash Patel, Ambuj Tewari
-
RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks
Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel
-
Learning Credal Ensembles via Distributionally Robust Optimization
Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez
-
Projected Gradient Ascent for Efficient Reward-Guided Updates with One-Step Generative Models
Jisung Hwang, Minhyuk Sung
-
Xiaotong Liu, Shao-Bo Lin, Jun Fan, Ding-Xuan Zhou
-
Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks
Yanzhang Fu, Zizheng Guo, Jizhou Luo
-
Data Reconstruction: Identifiability and Optimization with Sample Splitting
Yujie Shen, Zihan Wang, Jian Qian, Qi Lei
-
Stress-Testing Alignment Audits With Prompt-Level Strategic Deception
Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner
-
Distributionally Robust Optimization via Generative Ambiguity Modeling
Jiaqi Wen, Jianyi Yang
-
Evasion of IoT Malware Detection via Dummy Code Injection
Sahar Zargarzadeh, Mohammad Islam
-
Large Language Lobotomy: Jailbreaking Mixture-of-Experts via Expert Silencing
Jona te Lintelo, Lichao Wu, Stjepan Picek
-
Cyclic Adaptive Private Synthesis for Sharing Real-World Data in Education
Hibiki Ito, Chia-Yu Hsu, Hiroaki Ogata
-
Xiaoxu Peng, Dong Zhou, Jianwen Zhang, Guanghui Sun, Anh Tu Ngo, Anupam Chattopadhyay
-
CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
Chaeyun Kim, YongTaek Lim, Kihyun Kim, Junghwan Kim, Minwoo Kim
-
What Is the Geometry of the Alignment Tax?
Robin Young
-
Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving
Jungwon Seo, Ferhat Ozgur Catak, Chunming Rong, Jaeyeon Jang
-
Selective Fine-Tuning for Targeted and Robust Concept Unlearning
Mansi, Avinash Kori, Francesca Toni, Soteris Demetriou
-
Rui Li, Zeyu Zhang, Xiaohe Bo, Quanyu Dai, Chaozhuo Li, Feng Wen, Xu Chen
-
Structure-Aware Robust Counterfactual Explanations via Conditional Gaussian Network Classifiers
Zhan-Yi Liao, Jaewon Yoo, Hao-Tsung Yang, Po-An Chen
-
Majid Ghasemi, Mark Crowley
-
Liying Wang, Madison Lee, Yunzhang Jiang, Steven Chen, Kewei Sha, Yunhe Feng, Frank Wong, Lisa Hightow-Weidman, Weichao Yuwen
-
Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model
Tianyi Wang, Huawei Fan, Yuanchao Shu, Peng Cheng, Cong Wang
-
Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms
Vaibhav Shukla, Hardik Sharma, Adith N Reganti, Soham Wasmatkar, Bagesh Kumar, Vrijendra Singh
-
Boyang Xia, Weiyou Tian, Qingnan Ren, Jiaqi Huang, Jie Xiao, Shuo Lu, Kai Wang, Lynn Ai, Eric Yang, Bill Shi
-
Robustness of Vision Language Models Against Split-Image Harmful Input Attacks
Md Rafi Ur Rashid, MD Sadik Hossain Shanto, Vishnu Asutosh Dasu, Shagufta Mehnaz
-
The Judge Who Never Admits: Hidden Shortcuts in LLM-based Evaluation
Arash Marioriyad, Omid Ghahroodi, Ehsaneddin Asgari, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah
-
Deepfake Synthesis vs. Detection: An Uneven Contest
Md. Tarek Hasan, Sanjay Saha, Shaojing Fan, Swakkhar Shatabda, Terence Sim
-
Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation
Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi, Marco Canini
-
Liisa Janssens, Laura Middeldorp
-
CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution
Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, Tomas Pfister
-
Privacy-Preserving Covert Communication Using Encrypted Wearable Gesture Recognition
Tasnia Ashrafi Heya, Sayed Erfan Arefin
-
Anonymization-Enhanced Privacy Protection for Mobile GUI Agents: Available but Invisible
Lepeng Zhao, Zhenhua Zou, Shuo Li, Zhuotao Liu
-
UTOPIA: Unlearnable Tabular Data via Decoupled Shortcut Embedding
Jiaming He, Fuming Luo, Hongwei Li, Wenbo Jiang, Wenshu Fan, Zhenbo Shi, Xudong Jiang, Yi Yu
-
NAAMSE: Framework for Evolutionary Security Evaluation of Agents
Kunal Pai, Parth Shah, Harshil Patel
-
Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought?
Alexander von Recum, Leander Girrbach, Zeynep Akata
-
AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management
Ruoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang
-
MemPot: Defending Against Memory Extraction Attack with Optimized Honeypots
Yuhao Wang, Shengfang Zhai, Guanghao Jin, Yinpeng Dong, Linyi Yang, Jiaheng Zhang
-
Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents
Sai Puppala, Ismail Hossain, Md Jahangir Alam, Yoonpyo Lee, Jay Yoo, Tanzim Ahad, Syed Bahauddin Alam, Sajedul Talukder
-
Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation
Jiangnan Fang, Cheng-Tse Liu, Hanieh Deilamsalehy, Nesreen K. Ahmed, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi
-
Incorruptible Neural Networks: Training Models that can Generalize to Large Internal Perturbations
Philip Jacobson, Ben Feinberg, Suhas Kumar, Sapan Agarwal, T. Patrick Xiao, Christopher Bennett
-
Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control
Yonghui Yang, Wenjian Tao, Jilong Liu, Xingyu Zhu, Junfeng Fang, Weibiao Huang, Le Wu, Richang Hong, Tat-Seng Chua
-
ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets
Bohdan Turbal, Iryna Voitsitska, Lesia Semenova
-
On Generation in Metric Spaces
Jiaxun Li, Vinod Raman, Ambuj Tewari
-
Aegis: Towards Governance, Integrity, and Security of AI Voice Agents
Xiang Li, Pin-Yu Chen, Wenqi Wei
-
Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook
Yunbei Zhang, Kai Mei, Ming Liu, Janet Wang, Dimitris N. Metaxas, Xiao Wang, Jihun Hamm, Yingqiang Ge
-
Cheol Woo Kim, Davin Choo, Tzeh Yuan Neoh, Milind Tambe
-
Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks
Yu-Che Tsai, Hsiang Hsiao, Kuan-Yu Chen, Shou-De Lin
-
Extended to Reality: Prompt Injection in 3D Environments
Zhuoheng Li, Ying Chen
-
ShallowJail: Steering Jailbreaks against Large Language Models
Shang Liu, Hanyu Pei, Zeyan Liu
-
BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron
Abdullah Arafat Miah, Kevin Vu, Yu Bi
-
The Double-Edged Sword of Data-Driven Super-Resolution: Adversarial Super-Resolution Models
Haley Duba-Sullivan, Steven R. Young, Emma J. Reid
-
Temperature Scaling Attack Disrupting Model Confidence in Federated Learning
Kichang Lee, Jaeho Jin, JaeYeon Park, Songkuk Kim, JeongGil Ko
-
Privacy in Image Datasets: A Case Study on Pregnancy Ultrasounds
Rawisara Lohanimit, Yankun Wu, Amelia Katirai, Yuta Nakashima, Noa Garcia
-
Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting
Joshua Ward, Chi-Hua Wang, Guang Cheng
-
Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit
Qi Sun, Ahmed Abdo, Luis Burbano, Ziyang Li, Yaxing Yao, Alvaro Cardenas, Yinzhi Cao
-
Refining the Information Bottleneck via Adversarial Information Separation
Shuai Ning, Zhenpeng Wang, Lin Wang, Bing Chen, Shuangrong Liu, Xu Wu, Jin Zhou, Bo Yang
-
Robustness Beyond Known Groups with Low-rank Adaptation
Abinitha Gourabathina, Hyewon Jeong, Teya Bergamaschi, Marzyeh Ghassemi, Collin Stultz
-
Trojans in Artificial Intelligence (TrojAI) Final Report
Kristopher W. Reese, Taylor Kulp-McDowall, Michael Majurski, Tim Blattner, Derek Juba, Peter Bajcsy, Antonio Cardone, Philippe Dessauw, Alden Dima, Anthony J. Kearsley, Melinda Kleczynski, Joel Vasanth, Walid Keyrouz, Chace Ashcraft, Neil Fendley, Ted Staley, Trevor Stout, Josh Carney, Greg Canal, Will Redman, Aurora Schmidt, Cameron Hickert, William Paul, Jared Markowitz, Nathan Drenkow, David Shriver, Marissa Connor, Keltin Grimes, Marco Christiani, Hayden Moore, Jordan Widjaja, Kasimir Gabert, Uma Balakrishnan, Satyanadh Gundimada, John Jacobellis, Sandya Lakkur, Vitus Leung, Jon Roose, Casey Battaglino, Farinaz Koushanfar, Greg Fields, Xihe Gu, Yaman Jandali, Xinqiao Zhang, Akash Vartak, Tim Oates, Ben Erichson, Michael Mahoney, Rauf Izmailov, Xiangyu Zhang, Guangyu Shen, Siyuan Cheng, Shiqing Ma, XiaoFeng Wang, Haixu Tang, Di Tang, Xiaoyi Chen, Zihao Wang, Rui Zhu, Susmit Jha, Xiao Lin, Manoj Acharya, Wenchao Li, Chao Chen
-
Lite-BD: A Lightweight Black-box Backdoor Defense via Reviving Multi-Stage Image Transformations
Abdullah Arafat Miah, Yu Bi
-
Generating High-quality Privacy-preserving Synthetic Data
David Yavo, Richard Khoury, Christophe Pere, Sadoune Ait Kaci Azzou
-
Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers
Mona Rajhans, Vishal Khawarey
-
TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
Sung-Hoon Yoon, Ruizhi Qian, Minda Zhao, Weiyue Li, Mengyu Wang
-
Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study
Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Leo Yu Zhang
-
LIBERO-X: Robustness Litmus for Vision-Language-Action Models
Guodong Wang, Chenkai Zhang, Qingjie Liu, Jinjin Zhang, Jiancheng Cai, Junjie Liu, Xinmin Liu
-
Perturbing the Phase: Analyzing Adversarial Robustness of Complex-Valued Neural Networks
Florian Eilers, Christof Duhme, Xiaoyi Jiang
-
Exploring Sparsity and Smoothness of Arbitrary $\ell_p$ Norms in Adversarial Attacks
Christof Duhme, Florian Eilers, Xiaoyi Jiang
-
AEGIS: Adversarial Target-Guided Retention-Data-Free Robust Concept Erasure from Diffusion Models
Fengpeng Li, Kemou Li, Qizhou Wang, Bo Han, Jiantao Zhou
-
TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla
-
Endogenous Resistance to Activation Steering in Language Models
Alex McKenzie, Keenan Pepper, Stijn Servaes, Martin Leitgab, Murat Cubuktepe, Mike Vaiana, Diogo de Lucena, Judd Rosenblatt, Michael S. A. Graziano
-
MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs
Junhyeok Lee, Han Jang, Kyu Sung Choi
-
Do Prompts Guarantee Safety? Mitigating Toxicity from LLM Generations through Subspace Intervention
Himanshu Singh, Ziwei Xu, A. V. Subramanyam, Mohan Kankanhalli
-
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
Mingqian Feng, Xiaodong Liu, Weiwei Yang, Jialin Song, Xuekai Zhu, Chenliang Xu, Jianfeng Gao
-
Exploring Specular Reflection Inconsistency for Generalizable Face Forgery Detection
Hongyan Fei, Zexi Jia, Chuanwei Huang, Jinchao Zhang, Jie Zhou
-
Universal Anti-forensics Attack against Image Forgery Detection via Multi-modal Guidance
Haipeng Li, Rongxuan Peng, Anwei Luo, Shunquan Tan, Changsheng Chen, Anastasia Antsiferova
-
Vinh Hoang, Sebastian Krumscheid, Holger Rauhut, Raúl Tempone
-
Adversarial Learning in Games with Bandit Feedback: Logarithmic Pure-Strategy Maximin Regret
Shinji Ito, Haipeng Luo, Arnab Maiti, Taira Tsuchiya, Yue Wu
-
Sajad Ashkezari
-
Confundo: Learning to Generate Robust Poison for Practical RAG Systems
Haoyang Hu, Zhejun Jiang, Yueming Lyu, Junyuan Zhang, Yi Liu, Ka-Ho Chow
-
Ying Song, Balaji Palanisamy
-
Identifying Adversary Tactics and Techniques in Malware Binaries with an LLM Agent
Zhou Xuan, Xiangzhe Xu, Mingwei Zheng, Louis Zheng-Hua Tan, Jinyao Guo, Tiantai Zhang, Le Yu, Chengpeng Wang, Xiangyu Zhang
-
Guowei Guan, Yurong Hao, Jiaming Zhang, Tiantong Wu, Fuyao Zhang, Tianxiang Chen, Longtao Huang, Cyril Leung, Wei Yang Bryan Lim
-
Subgraph Reconstruction Attacks on Graph RAG Deployments with Practical Defenses
Minkyoo Song, Jaehan Kim, Myungchul Kang, Hanna Kim, Seungwon Shin, Sooel Son
-
Hema Karnam Surendrababu (1), Nithin Nagaraj (1) ((1) National Institute of Advanced Studies, Indian Institute of Science Campus, Bengaluru, India)
-
TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking
Mengyao Du, Han Fang, Haokai Ma, Gang Yang, Quanjun Yin, Shouling Ji, Ee-Chien Chang
-
"Tab, Tab, Bug": Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs
Yunlong Lyu, Yixuan Tang, Peng Chen, Tian Dong, Xinyu Wang, Zhiqiang Dong, Hao Chen
-
Plato's Form: Toward Backdoor Defense-as-a-Service for LLMs with Prototype Representations
Chen Chen, Yuchen Sun, Jiaxin Gao, Yanwen Jia, Xueluan Gong, Qian Wang, Kwok-Yan Lam
-
An Adaptive Differentially Private Federated Learning Framework with Bi-level Optimization
Jin Wang, Hui Ma, Fei Xing, Ming Yan
-
Can LLM Safety Be Ensured by Constraining Parameter Regions?
Zongmin Li, Jian Su, Farah Benamara, Aixin Sun
-
ATEX-CF: Attack-Informed Counterfactual Explanations for Graph Neural Networks
Yu Zhang, Sean Bin Yang, Arijit Khan, Cuneyt Gurcan Akcora
-
REBEL: Hidden Knowledge Recovery via Evolutionary-Based Evaluation Loop
Patryk Rybak, Paweł Batorski, Paul Swoboda, Przemysław Spurek
-
Navita Goyal, Hal Daumé III
-
GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt
Mark Russinovich, Yanan Cai, Keegan Hines, Giorgio Severi, Blake Bullwinkel, Ahmed Salem
-
Private and interpretable clinical prediction with quantum-inspired tensor train models
José Ramón Pareja Monturiol, Juliette Sinnott, Roger G. Melko, Mohammad Kohandel
-
$f$-FUM: Federated Unlearning via min–max and $f$-divergence
Radmehr Karimian, Amirhossein Bagheri, Meghdad Kurmanji, Nicholas D. Lane, Gholamali Aminian
-
Algebraic Robustness Verification of Neural Networks
Yulia Alexandr, Hao Duan, Guido Montúfar
-
Copyright Detective: A Forensic System to Evidence LLMs Flickering Copyright Leakage Risks
Guangwei Zhang, Jianing Zhu, Cheng Qian, Neil Gong, Rada Mihalcea, Zhaozhuo Xu, Jingrui He, Jiaqi Ma, Yun Huang, Chaowei Xiao, Bo Li, Ahmed Abbasi, Dongwon Lee, Heng Ji, Denghui Zhang
-
Learning Where It Matters: Geometric Anchoring for Robust Preference Alignment
Youngjae Cho, Jongsuk Kim, Ji-Hoon Kim
-
Reliable and Responsible Foundation Models: A Comprehensive Survey
Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, Shengbang Tong, Lingfeng Shen, Rafael Rafailov, Runjia Li, Zhaoyang Wang, Yiyang Zhou, Chenhang Cui, Yu Wang, Wenhao Zheng, Huichi Zhou, Jindong Gu, Zhaorun Chen, Peng Xia, Tony Lee, Thomas Zollo, Vikash Sehwag, Jixuan Leng, Jiuhai Chen, Yuxin Wen, Huan Zhang, Zhun Deng, Linjun Zhang, Pavel Izmailov, Pang Wei Koh, Yulia Tsvetkov, Andrew Wilson, Jiaheng Zhang, James Zou, Cihang Xie, Hao Wang, Philip Torr, Julian McAuley, David Alvarez-Melis, Florian Tramèr, Kaidi Xu, Suman Jana, Chris Callison-Burch, Rene Vidal, Filippos Kokkinos, Mohit Bansal, Beidi Chen, Huaxiu Yao
-
From Helpfulness to Toxic Proactivity: Diagnosing Behavioral Misalignment in LLM Agents
Xinyue Wang, Yuanhe Zhang, Zhengshuo Gong, Haoran Gao, Fanyu Meng, Zhenhong Zhou, Li Sun, Yang Liu, Sen Su
-
RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
Zeming Wei, Qiaosheng Zhang, Xia Hu, Xingcheng Xu
-
How Few-shot Demonstrations Affect Prompt-based Defenses Against LLM Jailbreak Attacks
Yanshu Wang, Shuaishuai Yang, Jingjing He, Tong Yang
-
RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
Jiacheng Liang, Yuhui Wang, Tanqiu Jiang, Ting Wang
-
A Human-Centered Privacy Approach (HCP) to AI
Luyi Sun, Wei Xu, Zaifeng Gao
-
Casey Ford, Madison Van Doren, Emily Dix
-
Vishruti Kakkad (1), Paul Chung (2), Hanan Hibshi (1 and 3), Maverick Woo (1) ((1) Carnegie Mellon University, (2) University of California, San Diego, (3) King Abdulaziz University)
-
Farzia Hossain, Samanta Ghosh, Shahida Begum, B. M. Shahria Alam, Mohammad Tahmid Noor, Md Parvez Mia, Nishat Tasnim Niloy
-
Youngji Roh, Hyunjin Cho, Jaehyung Kim
-
Tianyu Chen, Chujia Hu, Ge Gao, Dongrui Liu, Xia Hu, Wenjie Wang
-
CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs
Yuxuan Liu, Yuntian Shi, Kun Wang, Haoting Shen, Kun Yang
-
Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility
Mengxuan Wang, Yuxin Chen, Gang Xu, Tao He, Hongjie Jiang, Ming Li
-
TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent System
Wenzhe Fan, Tommaso Tognoli, Henry Peng Zou, Chunyu Miao, Yibo Wang, Xinhua Zhang
-
The Trigger in the Haystack: Extracting and Reconstructing LLM Backdoor Triggers
Blake Bullwinkel, Giorgio Severi, Keegan Hines, Amanda Minnich, Ram Shankar Siva Kumar, Yonatan Zunger
-
R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?
Jingyi Zhang, Tianyi Lin, Huanjin Yao, Xiang Lan, Shunyu Liu, Jiaxing Huang
-
Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
Sangyeon Yoon, Hyesoo Hong, Wonje Jeung, Albert No
-
Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Luoyu Chen, Shui Yu
-
APEX: Probing Neural Networks via Activation Perturbation
Tao Ren, Xiaoyu Luo, Qiongxiu Li
-
WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents
Xilong Wang, Yinuo Liu, Zhun Wang, Dawn Song, Neil Gong
-
Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective
Hao Fang, Tianyi Zhang, Tianqu Zhuang, Jiawei Kong, Kuofeng Gao, Bin Chen, Leqi Liang, Shu-Tao Xia, Ke Xu
-
Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation
Ting Xiang, Jinhui Zhao, Changjian Chen, Zhuo Tang
-
Haoran Li, Renyang Liu, Hongjia Liu, Chen Wang, Long Yin, Jian Xu
-
Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks
Yi Yu, Qixin Zhang, Shuhan Ye, Xun Lin, Qianshan Wei, Kun Wang, Wenhan Yang, Dacheng Tao, Xudong Jiang
-
Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity
Jiao Sun
-
Prakhar Godara, Frederick Callaway, Marcelo G. Mattar
-
Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models
Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan
-
BlockRR: A Unified Framework of RR-type Algorithms for Label Differential Privacy
Haixia Liu, Yi Ding
-
From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity
Renaud Gaucher, Aymeric Dieuleveut, Hadrien Hendrikx
-
Most Convolutional Networks Suffer from Small Adversarial Perturbations
Amit Daniely, Idan Mehalel
-
Bixing Wu, Yuhong Zhao, Zongli Ye, Jiachen Lian, Xiangyu Yue, Gopala Anumanchipalli
-
SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core Network
Cristian Manca, Christian Scano, Giorgio Piras, Fabio Brau, Maura Pintor, Battista Biggio
-
Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense
Fatima Ezzeddine, Osama Zammar, Silvia Giordano, Omran Ayoub
-
Robust Intervention Learning from Emergency Stop Interventions
Ethan Pronovost, Khimya Khetarpal, Siddhartha Srinivasa
-
DF-LoGiT: Data-Free Logic-Gated Backdoor Attacks in Vision Transformers
Xiaozuo Shen, Yifei Cai, Rui Ning, Chunsheng Xin, Hongyi Wu
-
Hao Li, Ruoyao Wen, Shanghao Shi, Ning Zhang, Chaowei Xiao
-
SEW: Strengthening Robustness of Black-box DNN Watermarking via Specificity Enhancement
Huming Qiu, Mi Zhang, Junjie Sun, Peiyi Chen, Xiaohan Zhang, Min Yang
-
When AI Persuades: Adversarial Explanation Attacks on Human Trust in AI-Assisted Decision Making
Shutong Fan, Lan Zhang, Xiaoyong Yuan
-
Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi
-
Principles of Lipschitz continuity in neural networks
Róisín Luo
-
MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety
Xiaoyu Wen, Zhida He, Han Qi, Ziyu Wan, Zhongtian Ma, Ying Wen, Tianhang Zheng, Xingcheng Xu, Chaochao Lu, Qiaosheng Zhang
-
Adversarial Reward Auditing for Active Detection and Mitigation of Reward Hacking
Mohammad Beigi, Ming Jin, Junshan Zhang, Qifan Wang, Lifu Huang
-
Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron
Sicheng Shen, Mingyang Lv, Han Shen, Jialin Wu, Binghao Wang, Zhou Yang, Guobin Shen, Dongcheng Zhao, Feifei Zhao, Yi Zeng
-
On the Fragility of AI-Based Channel Decoders under Small Channel Perturbations
Haoyu Lei, Mohammad Jalali, Chin Wa Lau, Farzan Farnia
-
Provable Defense Framework for LLM Jailbreaks via Noise-Augmented Alignment
Zehua Cheng, Jianwei Yang, Wei Dai, Jiahao Sun
-
Efficient Adversarial Attacks on High-dimensional Offline Bandits
Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah
-
Game of Thought: Robust Information Seeking with Large Language Models Using Game Theory
Langyuan Cui, Chun Kai Ling, Hwee Tou Ng
-
Bingzheng Wang, Xiaoyan Gu, Hongbo Xu, Hongcheng Li, Zimo Yu, Jiang Zhou, Weiping Wang
-
RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok-Yan Lam
-
Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework
Alsharif Abuadbba, Nazatul Sultan, Surya Nepal, Sanjay Jha
-
FiLoRA: Focus-and-Ignore LoRA for Controllable Feature Reliance
Hyunsuk Chung, Caren Han, Yerin Choi, Seungyeon Ji, Jinwoo Kim, Eun-Jung Holden, Kyungreem Han
-
RACA: Representation-Aware Coverage Criteria for LLM Safety Testing
Zeming Wei, Zhixin Zhang, Chengcan Wu, Yihao Zhang, Xiaokun Luan, Meng Sun
-
Decoupling Generalizability and Membership Privacy Risks in Neural Networks
Xingli Fang, Jung-Eun Kim
-
David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning
Samuel Nellessen, Tal Kachman
-
Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings
Doohyun Kim, Donghwa Kang, Kyungjae Lee, Hyeongboo Baek, Brent Byunghoon Kang
-
Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs
Yen-Shan Chen, Zhi Rui Tam, Cheng-Kuang Wu, Yun-Nung Chen
-
Pengyu Li, Lingling Zhang, Zhitao Gao, Yanrui Wu, Yuxuan Dong, Huan Liu, Bifan Wei, Jun Liu
-
Haobo Wang, Weiqi Luo, Xiaojun Jia, Xiaochun Cao
-
Zeyan Wang, Zhengmao Liu, Yongxin Cai, Chi Li, Xiaoying Tang, Jingchao Chen, Zibin Pan, Jing Qiu
-
On Stability and Robustness of Diffusion Posterior Sampling for Bayesian Inverse Problems
Yiming Yang, Xiaoyuan Cheng, Yi He, Kaiyu Li, Wenxuan Yuan, Zhuo Sun
-
AICD Bench: A Challenging Benchmark for AI-Generated Code Detection
Daniil Orel, Dilshod Azizov, Indraneil Paul, Yuxia Wang, Iryna Gurevych, Preslav Nakov
-
Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents
Pengfei He, Ash Fox, Lesly Miculicich, Stefan Friedli, Daniel Fabian, Burak Gokturk, Jiliang Tang, Chen-Yu Lee, Tomas Pfister, Long T. Le
-
Alignment-Aware Model Adaptation via Feedback-Guided Optimization
Gaurav Bhatt, Aditya Chinchure, Jiawei Zhou, Leonid Sigal
-
Embedding Perturbation may Better Reflect the Uncertainty in LLM Reasoning
Qihao Wen, Jiahao Wang, Yang Nan, Pengfei He, Ravi Tandon, Han Xu
-
Making Bias Non-Predictive: Training Robust LLM Judges via Reinforcement Learning
Qian Wang, Xuandong Zhao, Zirui Zhang, Zhanzhi Lou, Nuo Chen, Dawn Song, Bingsheng He
-
Privacy Amplification by Missing Data
Simon Roburin (LPSM), Rafaël Pinot (LPSM ), Erwan Scornet (LPSM )
-
Witnessd: Proof-of-process via Adversarial Collapse
David Condrey
-
HPE: Hallucinated Positive Entanglement for Backdoor Attacks in Federated Self-Supervised Learning
Jiayao Wang, Yang Song, Zhendong Zhao, Jiale Zhang, Qilin Wu, Wenliang Yuan, Junwu Zhu, Dongfang Zhao
-
Guaranteeing Privacy in Hybrid Quantum Learning through Theoretical Mechanisms
Hoang M. Ngo, Tre' R. Jeter, Incheol Shin, Wanli Xing, Tamer Kahveci, My T. Thai
-
Ali Mahdavi, Santa Aghapour, Azadeh Zamanifar, Amirfarhad Farhadi
-
Recommender system in X inadvertently profiles ideological positions of users
Paul Bouchaud, Pedro Ramaciotti
-
Monotonicity as an Architectural Bias for Robust Language Models
Patrick Cooper, Alireza Nadali, Ashutosh Trivedi, Alvaro Velasquez
-
Evaluating False Alarm and Missing Attacks in CAN IDS
Nirab Hossain, Pablo Moriano
-
Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space
Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, Talal Rahwan
-
From Task Solving to Robust Real-World Adaptation in LLM Agents
Pouya Pezeshkpour, Estevam Hruschka
-
CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment
Zhengbang Yang, Yisheng Zhong, Junyuan Hong, Zhuangdi Zhu
-
Wenqi Guo, Shan Du
-
Learning Better Certified Models from Empirically-Robust Teachers
Alessandro De Palma
-
Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks
Bohan Wang, Zewen Liu, Lu Lin, Hui Liu, Li Xiong, Ming Jin, Wei Jin
-
Membership Inference Attacks from Causal Principles
Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet
-
Composition for Pufferfish Privacy
Jiamu Bai, Guanlin He, Xin Gu, Daniel Kifer, Kiwan Maeng
-
A Comparative Study of Adversarial Robustness in CNN and CNN-ANFIS Architectures
Kaaustaaub Shankar, Bharadwaj Dogga, Kelly Cohen
-
Semantic Containment as a Fundamental Property of Emergent Misalignment
Rohan Saxena
-
MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support
António Farinhas, Nuno M. Guerreiro, José Pombal, Pedro Henrique Martins, Laura Melton, Alex Conway, Cara Dochat, Maya D'Eon, Ricardo Rei
-
Building Better Deception Probes Using Targeted Instruction Pairs
Vikram Natarajan, Devina Jain, Shivam Arora, Satvik Golechha, Joseph Bloom
-
GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability
Xueyi Li, Zhuoneng Zhou, Zitao Liu, Yongdong Wu, Weiqi Luo
-
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
Kaiyuan Cui, Yige Li, Yutao Wu, Xingjun Ma, Sarah Erfani, Christopher Leckie, Hanxun Huang
-
HierCon: Hierarchical Contrastive Attention for Audio Deepfake Detection
Zhili Nicholas Liang, Soyeon Caren Han, Qizhou Wang, Christopher Leckie
-
Statistical MIA: Rethinking Membership Inference Attack for Reliable Unlearning Auditing
Jialong Sun, Zeming Wei, Jiaxuan Zou, Jiacheng Gong, Guanheng Wang, Chengyang Dong, Jialong Li, Bo Liu
-
Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation
Chengran Yang, Zichao Wei, Heminghao Deng, Jinfeng Jiang, Zhensu Sun, Ting Zhang, Tianyi Wu, Ming Wen, David Lo
-
Don't Judge a Book by its Cover: Testing LLMs' Robustness Under Logical Obfuscation
Abhilekh Borah, Shubhra Ghosh, Kedar Joshi, Aditya Kumar Guru, Kripabandhu Ghosh
-
Improving Robustness of Vision-Language-Action Models by Restoring Corrupted Visual Inputs
Daniel Yezid Guarnizo Orjuela, Leonardo Scappatura, Veronica Di Gennaro, Riccardo Andrea Izzo, Gianluca Bardaro, Matteo Matteucci
-
Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
Songping Wang, Qinglong Liu, Yueming Lyu, Ning Li, Ziwen He, Caifeng Shan
-
Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection
Chen Chen, Dion Hoe-Lian Goh
-
Single-Edge Node Injection Threats to GNN-Based Security Monitoring in Industrial Graph Systems
Wenjie Liang, Ranhui Yan, Jia Cai, You-Gan Wang
-
Self-Generative Adversarial Fine-Tuning for Large Language Models
Shiguang Wu, Yaqing Wang, Quanming Yao
-
Key Principles of Graph Machine Learning: Representation, Robustness, and Generalization
Yassine Abbahaddou
-
Shangzhe Li, Xuchao Zhang, Chetan Bansal, Weitong Zhang
-
Equivalence of Privacy and Stability with Generalization Guarantees in Quantum Learning
Ayanava Dasgupta, Naqueeb Ahmad Warsi, Masahito Hayashi
-
To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack
Terry Yue Zhuo, Yangruibo Ding, Wenbo Guo, Ruijie Meng
-
Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models
Eliron Rahimi, Elad Hirshel, Rom Himelstein, Amit LeVi, Avi Mendelson, Chaim Baskin
-
Multi-Agent Teams Hold Experts Back
Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, James Zou
-
Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model Explicit
Yangfan Deng, Anirudh Nakra, Min Wu
-
Inference-Only Prompt Projection for Safe Text-to-Image Generation with TV Guarantees
Minhyuk Lee, Hyekyung Yoon, Myungjoo Kang
-
Self-Guard: Defending Large Reasoning Models via enhanced self-reflection
Jingnan Zheng, Jingjun Xu, Yanzhen Luo, Chenhang Cui, Gelei Deng, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua
-
Text is All You Need for Vision-Language Model Jailbreaking
Yihang Chen, Zhao Xu, Youyuan Jiang, Tianle Zheng, Cho-Jui Hsieh
-
Naen Xu, Hengyu An, Shuo Shi, Jinghuai Zhang, Chunyi Zhou, Changjiang Li, Tianyu Du, Zhihui Fu, Jun Wang, Shouling Ji
-
Quality-Diversity Optimization as Multi-Objective Optimization
Xi Lin, Ping Guo, Yilu Liu, Qingfu Zhang, Jianyong Sun
-
Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models
Wenbin Xing, Quanxing Zha, Lizheng Zu, Mengran Li, Ming Li, Junchi Yan
-
Jailbreaking LLMs via Calibration
Yuxuan Lu, Yongkang Guo, Yuqing Kong
-
Bypassing Prompt Injection Detectors through Evasive Injections
Md Jahedur Rahman, Ihsen Alouani
-
Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
Anxin Guo, Jingwei Li
-
Unifying Adversarial Robustness and Training Across Text Scoring Models
Manveer Singh Tamber, Hosna Oyarhoseini, Jimmy Lin
-
Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering
Guangtao Lyu, Xinyi Cheng, Qi Liu, Chenghao Xu, Jiexi Yan, Muli Yang, Fen Fang, Cheng Deng
-
Konstantinos Moutselos, Ilias Maglogiannis
-
Towards Building Non-Fine-Tunable Foundation Models
Ziyao Wang, Nizhang Li, Pingzhi Li, Guoheng Sun, Tianlong Chen, Ang Li
-
Dongbin Jiao, Zisheng Chen, Xianyi Wang, Jintao Shi, Shengcai Liu, Shi Yan
-
Sparsity-Aware Unlearning for Large Language Models
Yuze Wang, Yujia Tong, Ke Xu, Jingling Yuan, Jiawei Jiang, Chuang Hu
-
Riemannian Flow Matching for Disentangled Graph Domain Adaptation
Yingxu Wang, Xinwang Liu, Mengzhu Wang, Siyang Gao, Nan Yin
-
Provably Protecting Fine-Tuned LLMs from Training Data Extraction
Tom Segal, Asaf Shabtai, Yuval Elovici
-
Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation
Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng
-
Safety-Efficacy Trade Off: Robustness against Data-Poisoning
Diego Granziol
-
Multivariate Time Series Data Imputation via Distributionally Robust Regularization
Che-Yi Liao, Zheng Dong, Gian-Gabriel Garcia, Kamran Paynabar
-
RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
Amitesh Vatsa, Zhixian Xie, Wanxin Jin
-
DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems
Haoran Ou, Kangjie Chen, Gelei Deng, Hangcheng Liu, Jie Zhang, Tianwei Zhang, Kwok-Yan Lam
-
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
Mingqian Feng, Xiaodong Liu, Weiwei Yang, Chenliang Xu, Christopher White, Jianfeng Gao
-
Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory
Yuhao Zhan, Tianyu Fan, Linxuan Huang, Zirui Guo, Chao Huang
-
Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks
Nathaniel Mitrani Hadida, Sassan Bhanji, Cameron Tice, Puria Radmard
-
THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
Seanie Lee, Sangwoo Park, Yumin Choi, Gyeongman Kim, Minki Kang, Jihun Yun, Dongmin Park, Jongho Park, Sung Ju Hwang
-
Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Xueyi Ke, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang
-
Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPs
Ali Asadi, Krishnendu Chatterjee, Ehsan Goharshady, Mehrdad Karrabi, Alipasha Montaseri, Carlo Pagano
-
FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks
Naen Xu, Jinghuai Zhang, Ping He, Chunyi Zhou, Jun Wang, Zhihui Fu, Tianyu Du, Zhaoxiang Wang, Shouling Ji
-
Whispers of Wealth: Red-Teaming Google's Agent Payments Protocol via Prompt Injection
Tanusree Debi, Wentian Zhu
-
Charles Westphal, Keivan Navaie, Fernando E. Rosas
-
Protecting Private Code in IDE Autocomplete using Differential Privacy
Evgeny Grigorenko, David Stanojević, David Ilić, Egor Bogomolov, Kostadin Cvejoski
-
A Real-Time Privacy-Preserving Behavior Recognition System via Edge-Cloud Collaboration
Huan Song, Shuyu Tian, Junyi Hao, Cheng Yuan, Zhenyu Jia, Jiawei Shao, Xuelong Li
-
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
Xiaoxuan Guo, Yuankun Xie, Haonan Cheng, Jiayi Zhou, Jian Liu, Hengyan Huang, Long Ye, Qin Zhang
-
Yanghao Su, Wenbo Zhou, Tianwei Zhang, Qiu Han, Weiming Zhang, Nenghai Yu, Jie Zhang
-
From Similarity to Vulnerability: Key Collision Attack on LLM Semantic Caching
Zhixiang Zhang, Zesen Liu, Yuchong Xie, Quanfeng Huang, Dongdong She
-
Saeid Jamshidi, Omar Abdul Wahab, Rolando Herrero, Foutse Khomh
-
Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models
Ye Yu, Haibo Jin, Yaoning Yu, Jun Zhuang, Haohan Wang
-
Layer-wise Swapping for Generalizable Multilingual Safety
Hyunseo Shin, Wonseok Hwang
-
Safer Policy Compliance with Dynamic Epistemic Fallback
Joseph Marvin Imperial, Harish Tayyar Madabushi
-
AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs
Jaehee Kim, Pilsung Kang
-
Semantic Leakage from Image Embeddings
Yiyi Chen, Qiongkai Xu, Desmond Elliott, Qiongxiu Li, Johannes Bjerva
-
Lingua-SafetyBench: A Benchmark for Safety Evaluation of Multilingual Vision-Language Models
Enyi Shi, Pengyang Shao, Yanxin Zhang, Chenhang Cui, Jiayi Lyu, Xu Xie, Xiaobo Xia, Fei Shen, Tat-Seng Chua
-
Yilong Huang, Songze Li
-
Rethinking Transferable Adversarial Attacks on Point Clouds from a Compact Subspace Perspective
Keke Tang, Xianheng Liu, Weilong Peng, Xiaofei Wang, Daizong Liu, Peican Zhu, Can Lu, Zhihong Tian
-
Zhiyuan Cao, Zeyu Ma, Chenhao Yang, Han Zheng, Mingang Chen
-
Rethinking Anonymity Claims in Synthetic Data Generation: A Model-Centric Privacy Attack Perspective
Georgi Ganev, Emiliano De Cristofaro
-
RPWithPrior: Label Differential Privacy in Regression
Haixia Liu, Ruifan Huang
-
VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection
Weizhi Liu, Yue Li, Zhaoxia Yin
-
The Semantic Trap: Do Fine-tuned LLMs Learn Vulnerability Root Cause or Just Functional Pattern?
Feiyang Huang, Yuqiang Sun, Fan Zhang, Ziqi Yang, Han Liu, Yang Liu
-
Trojan-Resilient NTT: Protecting Against Control Flow and Timing Faults on Reconfigurable Platforms
Rourab Paul, Krishnendu Guha, Amlan Chakrabarti
-
Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning
Abhishek Mishra, Mugilan Arulvanan, Reshma Ashok, Polina Petrova, Deepesh Suranjandass, Donnie Winkelmann
-
RobustDebias: Debiasing Language Models using Distributionally Robust Optimization
Deep Gandhi, Katyani Singh, Nidhi Hegde
-
Learning Robust Reasoning through Guided Adversarial Self-Play
Shuozhe Li, Vaishnav Tadiparthi, Kwonjoon Lee, Nakul Agarwal, Hossein Nourkhiz Mahjoub, Ehsan Moradi Pari, Lizhang Chen, Amy Zhang, Liu Leqi
-
The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization
Manyi Li, Yufan Liu, Lai Jiang, Bing Li, Yuming Li, Weiming Hu
-
AI-Generated Image Detectors Overrely on Global Artifacts: Evidence from Inpainting Exchange
Elif Nebioglu, Emirhan Bilgiç, Adrian Popescu
-
Semantics-Preserving Evasion of LLM Vulnerability Detectors
Luze Sun, Alina Oprea, Eric Wong
-
Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot Detection
Kunal Mukherjee, Zulfikar Alom, Tran Gia Bao Ngo, Cuneyt Gurcan Akcora, Murat Kantarcioglu
-
On the Assessment of Sensitivity of Autonomous Vehicle Perception
Apostol Vassilev, Munawar Hasan, Edward Griffor, Honglan Jin, Pavel Piliptchak, Mahima Arora, Thoshitha Gamage
-
Ignacy Kolton, Kacper Marzol, Paweł Batorski, Marcin Mazur, Paul Swoboda, Przemysław Spurek
-
RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance
Miao Lin, Feng Yu, Rui Ning, Lusi Li, Jiawei Chen, Qian Lou, Mengxin Zheng, Chunsheng Xin, Hongyi Wu
-
A Fragile Guardrail: Diffusion LLM's Safety Blessing and Its Failure Mode
Zeyuan He, Yupeng Chen, Lang Lin, Yihan Wang, Shenxu Chang, Eric Sommerlade, Philip Torr, Junchi Yu, Adel Bibi, Jialin Yu
-
Fed-Listing: Federated Label Distribution Inference in Graph Neural Networks
Suprim Nakarmi, Junggab Son, Yue Zhao, Zuobin Xiong
-
"Someone Hid It": Query-Agnostic Black-Box Attacks on LLM-Based Retrieval
Jiate Li, Defu Cao, Li Li, Wei Yang, Yuehan Qin, Chenxiao Yu, Tiannuo Yang, Ryan A. Rossi, Yan Liu, Xiyang Hu, Yue Zhao
-
The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models
Yupeng Chen, Junchi Yu, Aoxi Liu, Philip Torr, Adel Bibi
-
Non-Intrusive Graph-Based Bot Detection for E-Commerce Using Inductive Graph Neural Networks
Sichen Zhao, Zhiming Xue, Yalun Qi, Xianling Zeng, Zihan Yu
-
AST-PAC: AST-guided Membership Inference for Code
Roham Koohestani, Ali Al-Kaswan, Jonathan Katzy, Maliheh Izadi
-
Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs
Xiang Zheng, Yutao Wu, Hanxun Huang, Yige Li, Xingjun Ma, Bo Li, Yu-Gang Jiang, Cong Wang
-
TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning
Mingzu Liu, Hao Fang, Runmin Cong
-
Making Models Unmergeable via Scaling-Sensitive Loss Landscape
Minwoo Jang, Hoyoung Kim, Jabin Koo, Jungseul Ok
-
Output-Space Search: Targeting LLM Generations in a Frozen Encoder-Defined Output Space
Tobias Materzok
-
Adaptive and Robust Cost-Aware Proof of Quality for Decentralized LLM Inference Networks
Arther Tian, Alex Ding, Frank Chen, Simon Wu, Aaron Chan
-
Namkyung Yoon, Sanghong Kim, Hwangnam Kim
-
DUET: Distilled LLM Unlearning from an Efficiently Contextualized Teacher
Yisheng Zhong, Zhengbang Yang, Zhuangdi Zhu
-
Achraf Hsain, Ahmed Abdelkader, Emmanuel Baldwin Mbaya, Hamoud Aljamaan
-
The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation
Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Vinay Chamola, Dhruv Kumar
-
On the Adversarial Robustness of Large Vision-Language Models under Visual Token Compression
Xinwei Zhang, Hangcheng Liu, Li Bai, Hao Wang, Qingqing Ye, Tianwei Zhang, Haibo Hu
-
Gauge-invariant representation holonomy
Vasileios Sevetlidis, George Pavlidis
-
FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning
Xiaoyu Xu, Minxin Du, Kun Fang, Zi Liang, Yaxin Xiao, Zhicong Huang, Cheng Hong, Qingqing Ye, Haibo Hu
-
TraceRouter: Robust Safety for Large Foundation Models via Path-Level Intervention
Chuancheng Shi, Shangze Li, Wenjun Lu, Wenhua Wu, Cong Wang, Zifeng Cheng, Fei Shen, Tat-Seng Chua
-
Alexander Loth, Martin Kappes, Marc-Oliver Pahl
-
Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models
Archer Wang, Emile Anand, Yilun Du, Marin Soljačić
-
Latent Adversarial Regularization for Offline Preference Optimization
Enyi Jiang, Yibo Jacky Zhang, Yinglun Xu, Andreas Haupt, Nancy Amato, Sanmi Koyejo
-
Enhancing Conversational Agents via Task-Oriented Adversarial Memory Adaptation
Yimin Deng, Yuqing Fu, Derong Xu, Yejing Wang, Wei Ni, Jingtong Gao, Xiaopeng Li, Chengxu Liu, Xiao Han, Guoshuai Zhao, Xiangyu Zhao, Li Zhu, Xueming Qian
-
Quantifying Noise in Language Generation
Aaron Li, Ian Zhang
-
Beyond Forgetting: Machine Unlearning Elicits Controllable Side Behaviors and Capabilities
Tien Dang, The-Hai Nguyen, Dinh Mai Phuong, Nguyen Minh Phuong, Hoang Thanh-Tung, Le-Minh Nguyen, Naoya Inoue
-
LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models
Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Chris Thomas
-
Xinan He, Kaiqing Lin, Yue Zhou, Jiaming Zhong, Wei Ye, Wenhui Yi, Bing Fan, Feng Ding, Haodong Li, Bo Cao, Bin Li
-
Visual-Guided Key-Token Regularization for Multimodal Large Language Model Unlearning
Chengyi Cai, Zesheng Ye, Peike Li, Bo Han, Jianzhong Qi, Feng Liu
-
Rethinking Self-Training Based Cross-Subject Domain Adaptation for SSVEP Classification
Weiguang Wang, Yong Liu, Yingjie Gao, Guangyuan Xu
-
Factored Causal Representation Learning for Robust Reward Modeling in RLHF
Yupei Yang, Lin Yang, Wanxi Deng, Lin Qu, Fan Feng, Biwei Huang, Shikui Tu, Lei Xu
-
Representation Unlearning: Forgetting through Information Compression
Antonio Almudévar, Alfonso Ortega
-
Sampling-Free Privacy Accounting for Matrix Mechanisms under Random Allocation
Jan Schuchardt, Nikita Kalinin
-
LoRA and Privacy: When Random Projections Help (and When They Don't)
Yaxi Hu, Johanna Düngler, Bernhard Schölkopf, Amartya Sanyal
-
Knowledge Vector Weakening: Efficient Training-free Unlearning for Large Vision-Language Models
Yejin Kim, Dongjun Hwang, Sungmin Cha, Junsuk Choe
-
Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models
Sidney Bender, Marco Morik
-
Jonas Möller, Erik Imgrund, Thorsten Eisenhofer, Konrad Rieck
-
Mengqi Chen, Thomas B. Berrett, Theodoros Damoulas, Michele Caprio
-
Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise
Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao
-
RerouteGuard: Understanding and Mitigating Adversarial Risks for LLM Routing
Wenhui Zhang, Huiyu Xu, Zhibo Wang, Zhichao Li, Zeqing He, Xuelin Wei, Kui Ren
-
ICL-EVADER: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses
Ningyuan He, Ronghong Huang, Qianqian Tang, Hongyu Wang, Xianghang Mi, Shanqing Guo
-
Pedro H. Barcha Correia, Ryan W. Achjian, Diego E. G. Caetano de Oliveira, Ygor Acacio Maria, Victor Takashi Hayashi, Marcos Lopes, Charles Christian Miers, Marcos A. Simplicio Jr
-
Stealthy Poisoning Attacks Bypass Defenses in Regression Settings
Javier Carnerero-Cano, Luis Muñoz-González, Phillippa Spencer, Emil C. Lupu
-
The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples
Hsiang Hsu, Pradeep Niroula, Zichang He, Ivan Brugere, Freddy Lecue, Chun-Fu Chen
-
Jailbreaks on Vision Language Model via Multimodal Reasoning
Aarush Noheria, Yuguang Yao
-
Puyu Wang, Junyu Zhou, Philipp Liznerski, Marius Kloft
-
ZK-HybridFL: Zero-Knowledge Proof-Enhanced Hybrid Ledger for Federated Learning
Amirhossein Taherpour, Xiaodong Wang
-
Hair-Trigger Alignment: Black-Box Evaluation Cannot Guarantee Post-Update Alignment
Yavuz Bakman, Duygu Nur Yaldiz, Salman Avestimehr, Sai Praneeth Karimireddy
-
Chanwoo Park, Chanwoo Kim
-
Xiaogeng Liu, Xinyan Wang, Yechao Zhang, Sanjay Kariyappa, Chong Xiang, Muhao Chen, G. Edward Suh, Chaowei Xiao
-
Shaping capabilities with token-level data filtering
Neil Rathi, Alec Radford
-
Stay in Character, Stay Safe: Dual-Cycle Adversarial Self-Evolution for Safety Role-Playing Agents
Mingyang Liao, Yichen Wan, Shuchen Wu, Chenxi Miao, Xin Shen, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang
-
Tianwei Lin, Zuyi Zhou, Xinda Zhao, Chenke Wang, Xiaohong Li, Yu Chen, Chuanrui Hu, Jian Pei, Yafeng Deng
-
Towards Compact and Robust DNNs via Compression-aware Sharpness Minimization
Jialuo He, Huangxun Chen
-
Self Voice Conversion as an Attack against Neural Audio Watermarking
Yigitcan Özer, Wanying Ge, Zhe Zhang, Xin Wang, Junichi Yamagishi
-
Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
Robin Singh, Aditya Yogesh Nair, Fabio Palumbo, Florian Barbaro, Anna Dyka, Lohith Rachakonda
-
Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability
Rohan Asthana, Vasileios Belagiannis
-
GNN Explanations that do not Explain and How to find Them
Steve Azzolin, Stefano Teso, Bruno Lepri, Andrea Passerini, Sagar Malhotra
-
Post-Training Fairness Control: A Single-Train Framework for Dynamic Fairness in Recommendation
Weixin Chen, Li Chen, Yuhan Zhao
-
A Dialectic Pipeline for Improving LLM Robustness
Sara Candussio
-
One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking
Tanmay Karmakar, Sourav Saha, Debapriyo Majumdar, Surjyanee Halder
-
SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks
Xin Zhang, Zijin Yang, Kejiang Chen, Linfeng Ma, Weiming Zhang, Nenghai Yu
-
UnlearnShield: Shielding Forgotten Privacy against Unlearning Inversion
Lulu Xue, Shengshan Hu, Wei Lu, Ziqi Zhou, Yufei Song, Jianhong Cheng, Minghui Li, Yanjun Zhang, Leo Yu Zhang
-
Reinforcement Unlearning via Group Relative Policy Optimization
Efstratios Zaradoukas, Bardh Prenkaj, Gjergji Kasneci
-
Reference-Free Spectral Analysis of EM Side-Channels for Always-on Hardware Trojan Detection
Mahsa Tahghigh, Hassan Salmani
-
Mohsen Hatami, Van Tuan Pham, Hozefa Lakadawala, Yu Chen
-
LIFT: Byzantine Resilient Hub-Sampling
Mohamed Amine Legheraba (NPA), Nour Rachdi (NPA), Maria Gradinariu Potop-Butucaru (NPA), Sébastien Tixeuil (NPA, IUF)
-
OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Jarrod Barnes
-
How does information access affect LLM monitors' ability to detect sabotage?
Rauno Arike, Raja Mehta Moreno, Rohan Subramani, Shubhorup Biswas, Francis Rhys Ward
-
ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack
Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu
-
SIGMA-PPG: Statistical-prior Informed Generative Masking Architecture for PPG Foundation Model
Zongheng Guo, Tao Chen, Yang Jiao, Yi Pan, Xiao Hu, Manuela Ferrario
-
BadDet+: Robust Backdoor Attacks for Object Detection
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
-
Towards Sensitivity-Aware Language Models
Dren Fazlija, Iyiola E. Olatunji, Daniel Kudenko, Sandipan Sikdar
-
Robust Federated Learning for Malicious Clients using Loss Trend Deviation Detection
Deepthy K Bhaskar, Minimol B, Binu V P
-
Yizhong Ding
-
Reducing Text Bias in Synthetically Generated MCQAs for VLMs in Autonomous Driving
Sutej Kulgod, Sean Ye, Sanchit Tanwar, Christoffer Heckman
-
Weiran Guo, Bing Bo, Shaoxiang Wu, Jingsheng Yang
-
Privacy-Preserving Model Transcription with Differentially Private Synthetic Distillation
Bochao Liu, Shiming Ge, Pengju Wang, Shikun Li, Tongliang Liu
-
SHIELD: An Auto-Healing Agentic Defense Framework for LLM Resource Exhaustion Attacks
Nirhoshan Sivaroopan, Kanchana Thilakarathna, Albert Zomaya, Manu, Yi Guo, Jo Plested, Tim Lynar, Jack Yang, Wangli Yang
-
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
Kishan Panaganti, Zhenwen Liang, Wenhao Yu, Haitao Mi, Dong Yu
-
When Benchmarks Leak: Inference-Time Decontamination for LLMs
Jianzhe Chai, Yu Zhe, Jun Sakuma
-
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
Quy-Anh Dang, Chris Ngo
-
LLM-VA: Resolving the Jailbreak-Overrefusal Trade-off via Vector Alignment
Haonan Zhang, Dongxia Wang, Yi Liu, Kexin Chen, Wenhai Wang
-
RvB: Automating AI System Hardening via Iterative Red-Blue Games
Lige Huang, Zicheng Liu, Jie Zhang, Lewen Yan, Dongrui Liu, Jing Shao
-
Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs
Chi Zhang, Wenxuan Ding, Jiale Liu, Mingrui Wu, Qingyun Wu, Ray Mooney
-
Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing?
Ahrii Kim, Seong-heum Kim
-
Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs
Xiangyang Zhu, Yuan Tian, Zicheng Zhang, Qi Jia, Chunyi Li, Renrui Zhang, Heng Li, Zongrui Wang, Wei Sun
-
Implicit Non-Causal Factors are Out via Dataset Splitting for Domain Generalization Object Detection
Zhilong Zhang, Lei Zhang, Qing He, Shuyin Xia, Guoyin Wang, Fuxiang Huang
-
Sen Nie, Jie Zhang, Zhuo Wang, Shiguang Shan, Xilin Chen
-
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
-
GraphDLG: Exploring Deep Leakage from Gradients in Federated Graph Learning
Shuyue Wei, Wantong Chen, Tongyu Wei, Chen Gong, Yongxin Tong, Lizhen Cui
-
Bandits in Flux: Adversarial Constraints in Dynamic Environments
Tareq Si Salem
-
Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models
Harsh Chaudhari, Ethan Rathbum, Hanna Foerster, Jamie Hayes, Matthew Jagielski, Milad Nasr, Ilia Shumailov, Alina Oprea
-
Task-Centric Policy Optimization from Misaligned Motion Priors
Ziang Zheng, Kai Feng, Yi Nie, Shentao Qin
-
Proactive Hardening of LLM Defenses with HASTE
Henry Chen, Victor Aranda, Samarth Keshari, Ryan Heartfield, Nicole Nichols
-
Evaluating Nova 2.0 Lite model under Amazon's Frontier Model Safety Framework
Satyapriya Krishna, Matteo Memelli, Tong Wang, Abhinav Mohanty, Claire O'Brien Rajkumar, Payal Motwani, Rahul Gupta, Spyros Matsoukas
-
LLMs Can Unlearn Refusal with Only 1,000 Benign Samples
Yangyang Guo, Ziwei Xu, Si Liu, Zhiming Zheng, Mohan Kankanhalli
-
Shuning Zhang, Qucheng Zang, Yongquan 'Owen' Hu, Jiachen Du, Xueyang Wang, Yan Kong, Xinyi Fu, Suranga Nanayakkara, Xin Yi, Hewu Li
-
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
Yuxiang Wang, Hongyu Liu, Dekun Chen, Xueyao Zhang, Zhizheng Wu
-
Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
Jinlin Liu, Wei Chen, Xiaojin Zhang
-
Membership Inference Attacks Against Fine-tuned Diffusion Language Models
Yuetian Chen, Kaiyuan Zhang, Yuntao Du, Edoardo Stoppa, Charles Fleming, Ashish Kundu, Bruno Ribeiro, Ninghui Li
-
Benchmarking LLAMA Model Security Against OWASP Top 10 For LLM Applications
Nourin Shahin, Izzat Alsmadi
-
Quickest Change Detection in Discrete-Time in Presence of a Covert Adversary
Amir Reza Ramtin, Philippe Nain, Don Towsley
-
VoxMorph: Scalable Zero-shot Voice Identity Morphing via Disentangled Embeddings
Bharath Krishnamurthy, Ajita Rattani
-
Md Tasnim Jawad, Mingyan Xiao, Yanzhao Wu
-
MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs
Dezhang Kong, Zhuxi Wu, Shiqi Liu, Zhicheng Tan, Kuichen Lu, Minghao Li, Qichen Liu, Shengyu Chu, Zhenhua Xu, Xuan Liu, Meng Han
-
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
Zhewen Tan, Wenhan Yu, Jianfeng Si, Tongxin Liu, Kaiqi Guan, Huiyan Jin, Jiawen Tao, Xiaokun Yuan, Duohe Ma, Xiangzheng Zhang, Tong Yang, Lin Sun
-
Ignacio Antequera-Sánchez, Juan Luis Suárez-Díaz, Rosana Montes, Francisco Herrera
-
Seyed Amir Hosseini, Maryam Abdolali, Amirhosein Tavakkoli, Fardin Ayar, Ehsan Javanmardi, Manabu Tsukada, Mahdi Javanmardi
-
Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
-
Unknown Unknowns: Why Hidden Intentions in LLMs Evade Detection
Devansh Srivastav, David Pape, Lea Schönherr
-
Gabriel Lee Jun Rong, Christos Korgialas, Dion Jia Xu Ho, Pai Chet Ng, Xiaoxiao Miao, Konstantinos N. Plataniotis
-
Differentiable Architecture Search for Adversarially Robust Quantum Computer Vision
Mohamed Afane, Quanjiang Long, Haoting Shen, Ying Mao, Junaid Farooq, Ying Wang, Juntao Chen
-
Multimodal Privacy-Preserving Entity Resolution with Fully Homomorphic Encryption
Susim Roy, Nalini Ratha
-
Counterfactual Explanations on Robust Perceptual Geodesics
Eslam Zaher, Maciej Trzaskowski, Quan Nguyen, Fred Roosta
-
Comparison requires valid measurement: Rethinking attack success rate comparisons in AI red teaming
Alexandra Chouldechova, A. Feder Cooper, Solon Barocas, Abhinav Palia, Dan Vann, Hanna Wallach
-
AttenMIA: LLM Membership Inference Attack through Attention Signals
Pedram Zaree, Md Abdullah Al Mamun, Yue Dong, Ihsen Alouani, Nael Abu-Ghazaleh
-
Robust Learning of a Group DRO Neuron
Guyang Cao, Shuyao Li, Sushrut Karmalkar, Jelena Diakonikolas
-
Santanu Das, Jatin Batra
-
Structural Gender Bias in Credit Scoring: Proxy Leakage
Navya SD, Sreekanth D, SS Uma Sankari
-
LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models
Kai Hu, Haoqi Hu, Matt Fredrikson
-
Information Hidden in Gradients of Regression with Target Noise
Arash Jamshidi, Katsiaryna Haitsiukevich, Kai Puolamäki
-
GUIGuard: Toward a General Framework for Privacy-Preserving GUI Agents
Yanxi Wang, Zhiling Zhang, Wenbo Zhou, Weiming Zhang, Jie Zhang, Qiannan Zhu, Yu Shi, Shuxin Zheng, Jiyan He
-
Malicious Repurposing of Open Science Artefacts by Using Large Language Models
Zahra Hashemi, Zhiqiang Zhong, Jun Pang, Wei Zhao
-
Dynamic Mask-Based Backdoor Attack Against Vision AI Models: A Case Study on Mushroom Detection
Zeineb Dridi, Jihen Bennaceur, Amine Ben Hassouna
-
A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy
Claire O'Brien, Jessica Seto, Dristi Roy, Aditya Dwivedi, Sunishchal Dev, Kevin Zhu, Sean O'Brien, Ashwinee Panda, Ryan Lagasse
-
NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey
Dhiman Goswami, Jai Kruthunz Naveen Kumar, Sanchari Das
-
Krittin Pachtrachai, Petmongkon Pornpichitsuwan, Wachiravit Modecrua, Touchapon Kraisingkorn
-
Jiahe Guo, Xiangran Guo, Yulin Hu, Zimo Long, Xingyu Sui, Xuda Zhi, Yongbo Huang, Hao He, Weixiang Zhao, Yanyan Zhao, Bing Qin
-
A Systemic Evaluation of Multimodal RAG Privacy
Ali Al-Lawati, Suhang Wang
-
Coding-Enforced Resilient and Secure Aggregation for Hierarchical Federated Learning
Shudi Weng, Ming Xiao, Mikael Skoglund
-
Daniel Commey, Matilda Nkoom, Yousef Alsenani, Sena G. Hounsinou, Garth V. Crosby
-
Adriana Watson
-
Thomas Heverin
-
CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries
Deep Mehta
-
Lattice: Generative Guardrails for Conversational Agents
Emily Broadhurst, Tawab Safi, Joseph Edell, Vashisht Ganesh, Karime Maamari
-
David Condrey
-
Robust Privacy: Inference-Time Privacy through Certified Robustness
Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Deyue Zhang, Quanchen Zou
-
Res-MIA: A Training-Free Resolution-Based Membership Inference Attack on Federated Learning Models
Mohammad Zare, Pirooz Shamsinejadbabaki
-
Prompt and Circumstances: Evaluating the Efficacy of Human Prompt Inference in AI-Generated Art
Khoi Trinh, Scott Seidenberger, Joseph Spracklen, Raveen Wijewickrama, Bimal Viswanath, Murtuza Jadliwala, Anindya Maiti
-
Physical Prompt Injection Attacks on Large Vision-Language Models
Chen Ling, Kai Hu, Hangcheng Liu, Xingshuo Han, Tianwei Zhang, Changhai Ou
-
Unintended Memorization of Sensitive Information in Fine-Tuned Language Models
Marton Szep, Jorge Marin Ruiz, Georgios Kaissis, Paulina Seidl, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer, Daniel Rueckert
-
Reconstructing Training Data from Adapter-based Federated Large Language Models
Silong Chen, Yuchuan Luo, Guilin Deng, Yi Liu, Min Xu, Shaojing Fu, Xiaohua Jia
-
Narek Maloyan, Dmitry Namiot
-
Alireza Salemi, Hamed Zamani
-
The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents
Chen Chen, Kim Young Il, Yuan Yang, Wenhao Su, Yilin Zhang, Xueluan Gong, Qian Wang, Yongsen Zheng, Ziyao Liu, Kwok-Yan Lam
-
Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes
Gautam Siddharth Kashyap, Harsh Joshi, Niharika Jain, Ebad Shabbir, Jiechao Gao, Nipun Joshi, Usman Naseem
-
OTI: A Model-free and Visually Interpretable Measure of Image Attackability
Jiaming Liang, Haowei Liu, Chi-Man Pun
-
Sponge Tool Attack: Stealthy Denial-of-Efficiency against Tool-Augmented Agentic Reasoning
Qi Li, Xinchao Wang
-
Narek Maloyan, Dmitry Namiot
-
Reconstructing Protected Biometric Templates from Binary Authentication Results
Eliron Rahimi, Margarita Osadchy, Orr Dunkelman
-
EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding
Luca Cerovaz, Michele Mancusi, Emanuele Rodolà
-
To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
Yicheng Bao, Xuhong Wang, Qiaosheng Zhang, Chaochao Lu, Xia Hu, Xin Tan
-
Dongshen Peng, Yi Wang, Carl Preiksaitis, Christian Rose
-
Preventing the Collapse of Peer Review Requires Verification-First AI
Lei You, Lele Cao, Iryna Gurevych
-
DeMark: A Query-Free Black-Box Attack on Deepfake Watermarking Defenses
Wei Song, Zhenchang Xing, Liming Zhu, Yulei Sui, Jingling Xue
-
SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment
Xianya Fang, Xianying Luo, Yadong Wang, Xiang Chen, Yu Tian, Zequn Sun, Rui Liu, Jun Fang, Naiqiang Tan, Yuanning Cui, Sheng-Jun Huang
-
Beyond Superficial Unlearning: Sharpness-Aware Robust Erasure of Hallucinations in Multimodal LLMs
Xianya Fang, Feiyang Ren, Xiang Chen, Yu Tian, Zhen Bi, Haiyang Yu, Sheng-Jun Huang
-
Provably Robust Bayesian Counterfactual Explanations under Model Changes
Jamie Duell, Xiuyi Fan
-
LLM-Based Adversarial Persuasion Attacks on Fact-Checking Systems
João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton
-
Persona Jailbreaking in Large Language Models
Jivnesh Sandhan, Fei Cheng, Tushar Sandhan, Yugo Murawaki
-
Is Length Really A Liability? An Evaluation of Multi-turn LLM Conversations using BoolQ
Karl Neergaard, Le Qiu, Emmanuele Chersoni
-
EMemBench: Interactive Benchmarking of Episodic Memory for VLM Agents
Xinze Li, Ziyue Zhu, Siyuan Liu, Yubo Ma, Yuhang Zang, Yixin Cao, Aixin Sun
-
White-Box Sensitivity Auditing with Steering Vectors
Hannah Cyberey, Yangfeng Ji, David Evans
-
Soumitri Chattopadhyay, Basar Demir, Marc Niethammer
-
Maxence Noble, Gonzalo Iñaki Quintana, Benjamin Aubin, Clément Chadebec
-
On the Effects of Adversarial Perturbations on Distribution Robustness
Yipei Wang, Zhaoying Pan, Xiaoqian Wang
-
Henri Nikoleit, Ankit Anand, Anurag Murty Naredla, Heiko Röglin
-
Bethan Evans, Jared Tanner
-
I Guess That's Why They Call it the Blues: Causal Analysis for Audio Classifiers
David A. Kelly, Hana Chockler
-
Secure Intellicise Wireless Network: Agentic AI for Coverless Semantic Steganography Communication
Rui Meng, Song Gao, Bingxuan Xu, Xiaodong Xu, Jianqiao Chen, Nan Ma, Pei Xiao, Ping Zhang, Rahim Tafazolli
-
Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models
Zhining Liu, Tianyi Wang, Xiao Lin, Penghao Ouyang, Gaotang Li, Ze Yang, Hui Liu, Sumit Keswani, Vishwa Pardeshi, Huijun Zhao, Wei Fan, Hanghang Tong
-
Learning to Collaborate: An Orchestrated-Decentralized Framework for Peer-to-Peer LLM Federation
Inderjeet Singh, Eleonore Vissol-Gaudin, Andikan Otung, Motoyoshi Sekiya
-
TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion
Saideep Sreekumar, Zeng Wang, Akashdeep Saha, Weihua Xiao, Minghao Shao, Muhammad Shafique, Ozgur Sinanoglu, Ramesh Karri, Johann Knechtel
-
Qinkai Yu, Chong Zhang, Gaojie Jin, Tianjin Huang, Wei Zhou, Wenhui Li, Xiaobo Jin, Bo Huang, Yitian Zhao, Guang Yang, Gregory Y.H. Lip, Yalin Zheng, Aline Villavicencio, Yanda Meng
-
How does Graph Structure Modulate Membership-Inference Risk for Graph Neural Networks?
Megha Khosla
-
Falsifying Predictive Algorithm
Amanda Coston
-
Ee Wei Seah, Yongsen Zheng, Naga Nikshith, Mahran Morsidi, Gabriel Waikin Loh Matienzo, Nigel Gay, Akriti Vij, Benjamin Chua, En Qi Ng, Sharmini Johnson, Vanessa Wilfred, Wan Sie Lee, Anna Davidson, Catherine Devine, Erin Zorer, Gareth Holvey, Harry Coppock, James Walpole, Jerome Wynee, Magda Dubois, Michael Schmatz, Patrick Keane, Sam Deverett, Bill Black, Bo Yan, Bushra Sabir, Frank Sun, Hao Zhang, Harriet Farlow, Helen Zhou, Lingming Dong, Qinghua Lu, Seung Jang, Sharif Abuadbba, Simon O'Callaghan, Suyu Ma, Tom Howroyd, Cyrus Fung, Fatemeh Azadi, Isar Nejadgholi, Krishnapriya Vishnubhotla, Pulei Xiong, Saeedeh Lohrasbi, Scott Buffett, Shahrear Iqbal, Sowmya Vajjala, Anna Safont-Andreu, Luca Massarelli, Oskar van der Wal, Simon Möller, Agnes Delaborde, Joris Duguépéroux, Nicolas Rolin, Romane Gallienne, Sarah Behanzin, Tom Seimandi, Akiko Murakami, Takayuki Semitsu, Teresa Tsukiji, Angela Kinuthia, Michael Michie, Stephanie Kasaon, Jean Wangari, Hankyul Baek, Jaewon Noh, Kihyuk Nam, Sang Seo, Sungpil Shin, Taewhi Lee, Yongsu Kim
-
Data-Free Privacy-Preserving for LLMs via Model Inversion and Selective Unlearning
Xinjie Zhou, Zhihui Yang, Lechao Cheng, Sai Wu, Gang Chen
-
Connect the Dots: Knowledge Graph-Guided Crawler Attack on Retrieval-Augmented Generation Systems
Mengyu Yao, Ziqi Zhang, Ning Luo, Shaofei Li, Yifeng Cai, Xiangqun Chen, Yao Guo, Ding Li
-
Mingyu Yu, Lana Liu, Zhehao Zhao, Wei Wang, Sujuan Qin
-
Counterfactual Training: Teaching Models Plausible and Actionable Explanations
Patrick Altmeyer, Aleksander Buszydlik, Arie van Deursen, Cynthia C. S. Liem
-
Uncertainty-guided Generation of Dark-field Radiographs
Lina Felsner, Henriette Bast, Tina Dorosti, Florian Schaff, Franz Pfeiffer, Daniela Pfeiffer, Julia Schnabel
-
Provable Robustness in Multimodal Large Language Models via Feature Space Smoothing
Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang
-
Attributing and Exploiting Safety Vectors through Global Optimization in Large Language Models
Fengheng Chu, Jiahao Chen, Yuhong Wang, Jun Wang, Zhihui Fu, Shouling Ji, Songze Li
-
SoK: Challenges in Tabular Membership Inference Attacks
Cristina Pêra, Tânia Carvalho, Maxime Cordy, Luís Antunes
-
On damage of interpolation to adversarial robustness in regression
Jingfu Peng, Yuhong Yang
-
Nazmul Islam, Mohammad Zulkernine
-
A New Paradigm for Trusted Respiratory Monitoring Via Consumer Electronics-grade Radar Signals
Xinyu Li, Jinyang Huang, Feng-Qi Cui, Meng Wang, Peng Zhao, Meng Li, Dan Guo, Meng Wang
-
NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs
Khoa Nguyen, Khiem Ton, NhatHai Phan, Issa Khalil, Khang Tran, Cristian Borcea, Ruoming Jin, Abdallah Khreishah, My T. Thai
-
CodeGuard: Improving LLM Guardrails in CS Education
Nishat Raihan, Noah Erdachew, Jayoti Devi, Joanna C. S. Santos, Marcos Zampieri
-
Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems
Shuhua Yang, Jiahao Zhang, Yilong Wang, Dongwon Lee, Suhang Wang
-
Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation
Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Yunxiang Zhang, Moontae Lee, Hao Peng, Lu Wang, Honglak Lee
-
Semantic-Guided Unsupervised Video Summarization
Haizhou Liu, Haodong Jin, Yiming Wang, Hui Yu
-
NeuroFilter: Privacy Guardrails for Conversational LLM Agents
Saswat Das, Ferdinando Fioretto
-
Yijin Zhou, Xiaoya Lu, Dongrui Liu, Junchi Yan, Jing Shao
-
Transfer Learning from One Cancer to Another via Deep Learning Domain Adaptation
Justin Cheung, Samuel Savine, Calvin Nguyen, Lin Lu, Alhassan S. Yasin
-
Andrea Protani, Riccardo Taiello, Marc Molina Van Den Bosch, Luigi Serio
-
Auditing Language Model Unlearning via Information Decomposition
Anmol Goel, Alan Ritter, Iryna Gurevych
-
BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation
Andrey Moskalenko, Danil Kuznetsov, Irina Dudko, Anastasiia Iasakova, Nikita Boldyrev, Denis Shepelev, Andrei Spiridonov, Andrey Kuznetsov, Vlad Shakhuro
-
Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
Anmol Goel, Cornelius Emde, Sangdoo Yun, Seong Joon Oh, Martin Gubri
-
Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks
Sahar Tahmasebi, Eric Müller-Budack, Ralph Ewerth
-
Safeguarding Facial Identity against Diffusion-based Face Swapping via Cascading Pathway Disruption
Liqin Wang, Qianyue Hu, Wei Lu, Xiangyang Luo
-
Erosion Attack for Adversarial Training to Enhance Semantic Segmentation Robustness
Yufei Song, Ziqi Zhou, Menghao Deng, Yifan Hu, Shengshan Hu, Minghui Li, Leo Yu Zhang
-
Deep Leakage with Generative Flow Matching Denoiser
Isaac Baglin, Xiatian Zhu, Simon Hadfield
-
Training-Free and Interpretable Hateful Video Detection via Multi-stage Adversarial Reasoning
Shuonan Yang, Yuchen Zhang, Zeyu Fu
-
SpooFL: Spoofing Federated Learning
Isaac Baglin, Xiatian Zhu, Simon Hadfield
-
Zhihao Chen, Zirui Gong, Jianting Ning, Yanjun Zhang, Leo Yu Zhang
-
HyperNet-Adaptation for Diffusion-Based Test Case Generation
Oliver Weißl, Vincenzo Riccio, Severin Kacianka, Andrea Stocco
-
STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model
Yuang Qi, Na Zhao, Qiyi Yao, Benlong Wu, Weiming Zhang, Nenghai Yu, Kejiang Chen
-
Multi-Targeted Graph Backdoor Attack
Md Nabi Newaz Khan, Abdullah Arafat Miah, Yu Bi
-
QUAIL: Quantization Aware Unlearning for Mitigating Misinformation in LLMs
Himanshu Mishra, Kanwal Mehreen
-
AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains
Adam Szelestey, Sofie van Engelen, Tianhao Huang, Justin Snelders, Qintao Zeng, Songgaojun Deng
-
Hybrid Vision Transformer-GAN Attribute Neutralizer for Mitigating Bias in Chest X-Ray Diagnosis
Jobeal Solomon, Ali Mohammed Mansoor Alsahag, Seyed Sahand Mohammadi Ziabari
-
Jiazhu Xie, Bowen Li, Heyu Fu, Chong Gao, Ziqi Xu, Fengling Han
-
Single-Pixel Vision-Language Model for Intrinsic Privacy-Preserving Behavioral Intelligence
Hongjun An, Yiliang Song, Jiawei Shao, Zhe Sun, Xuelong Li
-
Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique
Joyjit Roy, Samaresh Kumar Singh
-
AgenticRed: Optimizing Agentic Systems for Automated Red-teaming
Jiayi Yuan, Jonathan Nöther, Natasha Jaques, Goran Radanović
-
Christopher Kao, Vanshika Vats, James Davis
-
Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs
Jackson Kaunismaa, Avery Griffin, John Hughes, Christina Q. Knight, Mrinank Sharma, Erik Jones
-
Fan Huang, Haewoon Kwak, Jisun An
-
Quadratic Upper Bound for Boosting Robustness
Euijin You, Hyang-Won Lee
-
Asymmetric regularization mechanism for GAN training with Variational Inequalities
Spyridon C. Giagtzoglou, Mark H.M. Winands, Barbara Franci
-
Nikita Kuzmin, Songting Liu, Kong Aik Lee, Eng Siong Chng
-
Kyung Ho Lim, Byung-Hoon Kim
-
Zhaopeng Zhang, Pengcheng Sun, Lan Zhang, Chen Tang, Jiewei Lai, Yunhao Wang, Hui Jin
-
The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning
Renmiao Chen, Yida Lu, Shiyao Cui, Xuan Ouyang, Victor Shea-Jay Huang, Shumin Zhang, Chengwei Pan, Han Qiu, Minlie Huang
-
Feng Ding, Wenhui Yi, Xinan He, Mengyao Xiao, Jianfeng Xu, Jianqiang Du
-
VTONGuard: Automatic Detection and Authentication of AI-Generated Virtual Try-On Content
Shengyi Wu, Yan Hong, Shengyao Chen, Zheng Wang, Xianbing Sun, Jiahui Zhan, Jun Lan, Jianfu Zhang
-
Equivariant Learning for Unsupervised Image Dehazing
Zhang Wen, Jiangwei Xie, Dongdong Chen
-
Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning
Tairan Huang, Qingqing Ye, Yulin Jin, Jiawei Lian, Yi Wang, Haibo Hu
-
DRGW: Learning Disentangled Representations for Robust Graph Watermarking
Jiasen Li, Yanwei Liu, Zhuoyi Shang, Xiaoyan Gu, Weiping Wang
-
Orthogonium: A Unified, Efficient Library of Orthogonal and 1-Lipschitz Building Blocks
Thibaut Boissin (IRIT-MISFIT), Franck Mamalet, Valentin Lafargue (ANITI, IMT), Mathieu Serrurier (IRIT-MISFIT)
-
Xiaohong Yang, Tong Xie, Minghui Liwang, Chikai Shang, Yang Lu, Zhenzhen Jiao, Liqun Fu, Seyyedali Hosseinalipour
-
Inverting Self-Organizing Maps: A Unified Activation-Based Framework
Alessandro Londei, Matteo Benati, Denise Lanzieri, Vittorio Loreto
-
PAC-Private Responses with Adversarial Composition
Xiaochen Zhu, Mayuri Sridhar, Srinivas Devadas
-
SecureSplit: Mitigating Backdoor Attacks in Split Learning
Zhihao Dou, Dongfei Cui, Weida Wang, Anjun Gao, Yueyang Quan, Mengyao Ma, Viet Vo, Guangdong Bai, Zhuqing Liu, Minghong Fang
-
When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models
Ruihan Hu, Yu-Ming Shang, Wei Luo, Ye Tao, Xi Zhang
-
PINA: Prompt Injection Attack against Navigation Agents
Jiani Liu, Yixin He, Lanlan Fan, Qidi Zhong, Yushi Cheng, Meng Zhang, Yanjiao Chen, Wenyuan Xu
-
Huadi Zheng, Li Cheng, Yan Ding
-
Bingxin Xu, Yuzhang Shang, Binghui Wang, Emilio Ferrara
-
Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum
Mohammed Salah Al-Radhi, Riad Larbi, Mátyás Bartalis, Géza Németh
-
How Worst-Case Are Adversarial Attacks? Linking Adversarial and Statistical Robustness
Giulio Rossolini
-
LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models
Mengyu Sun, Ziyuan Yang, Andrew Beng Jin Teoh, Junxu Liu, Haibo Hu, Yi Zhang
-
Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs
Yiyang Lu, Jinwen He, Yue Zhao, Kai Chen, Ruigang Liang
-
Uncovering and Understanding FPR Manipulation Attack in Industrial IoT Networks
Mohammad Shamim Ahsan, Peng Liu
-
LLM Security and Safety: Insights from Homotopy-Inspired Prompt Obfuscation
Luis Lazo, Hamed Jelodar, Roozbeh Razavi-Far
-
RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models
Rishit Chugh
-
SoundBreak: A Systematic Study of Audio-Only Adversarial Attacks on Trimodal Models
Aafiya Hussain, Gaurav Srivastava, Alvi Ishmam, Zaber Hakim, Chris Thomas
-
SPGCL: Effective Graph Contrastive Learning via SVD-Guided Structural Perturbation
Hao Deng, Yingping Li, Shuiping Gou, Bo Liu
-
MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction
Wenqi Zhang, Yulin Shen, Changyue Jiang, Jiarun Dai, Geng Hong, Xudong Pan
-
Diego Gosmar, Deborah A. Dahl
-
Context and Transcripts Improve Detection of Deepfake Audios of Public Figures
Chongyang Gao, Marco Postiglione, Julian Baldwin, Natalia Denisenko, Isabel Gortner, Luke Fosdick, Chiara Pulice, Sarit Kraus, V.S. Subrahmanian
-
Your Privacy Depends on Others: Collusion Vulnerabilities in Individual Differential Privacy
Johannes Kaiser, Alexander Ziller, Eleni Triantafillou, Daniel Rückert, Georgios Kaissis
-
Membership Inference Test: Auditing Training Data in Object Classification Models
Gonzalo Mancera, Daniel DeAlcala, Aythami Morales, Ruben Tolosana, Julian Fierrez
-
On the Evidentiary Limits of Membership Inference for Copyright Auditing
Murat Bilgehan Ertan, Emirhan Böge, Min Chen, Kaleel Mahmood, Marten van Dijk
-
Chengyin Hu, Xiang Chen, Zhe Jia, Weiwen Shi, Fengyu Zhang, Jiujiang Guo, Yiwei Wei
-
Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift
Daniel Vennemeyer, Punya Syon Pandey, Phan Anh Duong, Michael Umeokoli, Samuel Ratnam
-
Race, Ethnicity and Their Implication on Bias in Large Language Models
Shiyue Hu, Ruizhe Li, Yanjun Gao
-
ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation
Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta, Iryna Gurevych
-
Adversarial Alignment: Ensuring Value Consistency in Large Language Models for Sensitive Domains
Yuan Gao, Zhigang Liu, Xinyu Yao, Bo Chen, Xiaobing Zhao
-
Unlearning in LLMs: Methods, Evaluation, and Open Challenges
Tyler Lizzo, Larry Heck
-
OI-Bench: An Option Injection Benchmark for Evaluating LLM Susceptibility to Directive Interference
Yow-Fu Liou, Yu-Chien Tang, Yu-Hsiang Liu, An-Zi Yen
-
Sockpuppetting: Jailbreaking LLMs Without Optimization Through Output Prefix Injection
Asen Dotsinski, Panagiotis Eustratiadis
-
Trust Me, I'm an Expert: Decoding and Steering Authority Bias in Large Language Models
Priyanka Mary Mammen, Emil Joswin, Shankar Venkitachalam
-
VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness
Qimao Chen, Fang Li, Shaoqing Xu, Zhiyi Lai, Zixun Xie, Yuechen Luo, Shengyin Jiang, Hanbing Li, Long Chen, Bing Wang, Yi Zhang, Zhi-Xin Yang
-
Proxy Robustness in Vision Language Models is Effortlessly Transferable
Xiaowei Fu, Fuxiang Huang, Lei Zhang
-
Chan Naseeb, Adeel Ashraf Cheema, Hassan Sami, Tayyab Afzal, Muhammad Omair, Usman Habib
-
MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomics
Wei Wang, Quoc-Toan Ly, Chong Yu, Jun Bai
-
NeuroShield: A Neuro-Symbolic Framework for Adversarial Robustness
Ali Shafiee Sarvestani, Jason Schmidt, Arman Roohi
-
Verifying Local Robustness of Pruned Safety-Critical Networks
Minh Le, Phuong Cao
-
Mohoshin Ara Tahera, Sabbir Rahman, Shuvalaxmi Dass, Sharif Ullah, Mahmoud Abouyessef
-
Adversarial News and Lost Profits: Manipulating Headlines in LLM-Driven Algorithmic Trading
Advije Rizvani, Giovanni Apruzzese, Pavel Laskov
-
DUAP: Dual-task Universal Adversarial Perturbations Against Voice Control Systems
Suyang Sun, Weifei Jin, Yuxin Cao, Wei Song, Jie Hao
-
KinGuard: Hierarchical Kinship-Aware Fingerprinting to Defend Against Large Language Model Stealing
Zhenhua Xu, Xiaoning Tian, Wenjun Zeng, Wenpeng Xing, Tianliang Lu, Gaolei Li, Chaochao Chen, Meng Han
-
PrivFly: A Privacy-Preserving Self-Supervised Framework for Rare Attack Detection in IoFT
Safaa Menssouri, El Mehdi Amhoud
-
Xiaolei Zhang, Xiaojun Jia, Liquan Chen, Songze Li
-
CORVUS: Red-Teaming Hallucination Detectors via Internal Signal Camouflage in Large Language Models
Nay Myat Min, Long H. Pham, Hongyu Zhang, Jun Sun
-
In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
Anudeex Shetty, Aditya Joshi, Salil S. Kanhere
-
Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers
Yixuan Du, Chenxiao Yu, Haoyan Xu, Ziyi Wang, Yue Zhao, Xiyang Hu
-
A Two-Stage Globally-Diverse Adversarial Attack for Vision-Language Pre-training Models
Wutao Chen, Huaqin Zou, Chen Wan, Lifeng Huang
-
Efficient Privacy-Preserving Retrieval Augmented Generation with Distance-Preserving Encryption
Huanyi Ye, Jiale Guo, Ziyao Liu, Kwok-Yan Lam
-
Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
Yi Qian, Kunwei Qian, Xingbang He, Ligeng Chen, Jikang Zhang, Tiantai Zhang, Haiyang Wei, Linzhang Wang, Hao Wu, Bing Mao
-
Adversarial Defense in Vision-Language Models: An Overview
Xiaowei Fu, Lei Zhang
-
Ashish Raj Shekhar, Shiven Agarwal, Priyanuj Bordoloi, Yash Shah, Tejas Anvekar, Vivek Gupta
-
Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs
Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu
-
De-Anonymization at Scale via Tournament-Style Attribution
Lirui Zhang, Huishuai Zhang
-
Privacy-Preserving Federated Learning with Verifiable Fairness Guarantees
Mohammed Himayath Ali, Mohammed Aqib Abdullah, Syed Muneer Hussin, Mohammed Mudassir Uddin, Shahnawaz Alam
-
Towards Robust Universal Perturbation Attacks: A Float-Coded, Penalty-Driven Evolutionary Approach
Shiqi Wang, Mahdi Khosravy, Neeraj Gupta, Olaf Witkowski
-
Xiaofeng Luo, Jiayi He, Jiawen Kang, Ruichen Zhang, Zhaoshui He, Ekram Hossain, Dong In Kim
-
TrojanPraise: Jailbreak LLMs via Benign Fine-Tuning
Zhixin Xie, Xurui Song, Jun Luo
-
ASAS-BridgeAMM: Trust-Minimized Cross-Chain Bridge AMM with Failure Containment
Shengwei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski
-
DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing
Jinwei Hu, Shiyuan Meng, Yi Dong, Xiaowei Huang
-
Stablecoin Design with Adversarial-Robust Multi-Agent Systems via Trust-Weighted Signal Aggregation
Shengwei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski
-
Xiaomei Zhang, Zhaoxi Zhang, Leo Yu Zhang, Yanjun Zhang, Guanhong Tao, Shirui Pan
-
FocaLogic: Logic-Based Interpretation of Visual Model Decisions
Chenchen Zhao, Muxi Chen, Qiang Xu
-
Powerful Training-Free Membership Inference Against Autoregressive Language Models
David Ilić, David Stanojević, Kostadin Cvejoski
-
SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data
Bing Hu, Yixin Li, Asma Bahamyirou, Helen Chen
-
Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Michele Panariello, Xin Wang, Nicholas Evans, Emmanuel Vincent, Junichi Yamagishi, Massimiliano Todisco
-
Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
Kaijie Mo, Siddhartha Venkatayogi, Chantal Shaib, Ramez Kouzy, Wei Xu, Byron C. Wallace, Junyi Jessy Li
-
Haonan An, Guang Hua, Wei Du, Hangcheng Cao, Yihang Tao, Guowen Xu, Susanto Rahardja, Yuguang Fang
-
A Unified Masked Jigsaw Puzzle Framework for Vision and Language Models
Weixin Ye, Wei Wang, Yahui Liu, Yue Song, Bin Ren, Wei Bi, Rita Cucchiara, Nicu Sebe
-
EmoLat: Text-driven Image Sentiment Transfer via Emotion Latent Space
Jing Zhang, Bingjie Fan, Jixiang Zhu, Zhe Wang
-
RCDN: Real-Centered Detection Network for Robust Face Forgery Identification
Wyatt McCurdy, Xin Zhang, Yuqi Song, Min Gao
-
Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits
Ming Shi
-
Adversarial Drift-Aware Predictive Transfer: Toward Durable Clinical AI
Xin Xiong, Zijian Guo, Haobo Zhu, Chuan Hong, Jordan W Smoller, Tianxi Cai, Molei Liu
-
Richik Chakraborty, Lawrence Liu, Syed Hasnain
-
Gradient Structure Estimation under Label-Only Oracles via Spectral Sensitivity
Jun Liu, Leo Yu Zhang, Fengpeng Li, Isao Echizen, Jiantao Zhou
-
Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents
Kaiyu Zhou, Yongsen Zheng, Yicheng He, Meng Xue, Xueluan Gong, Yuji Wang, Kwok-Yan Lam
-
Spectral Characterization and Mitigation of Sequential Knowledge Editing Collapse
Chi Zhang, Mengqi Zhang, Xiaotian Ye, Runxi Cheng, Zisheng Zhou, Ying Zhou, Pengjie Ren, Zhumin Chen
-
Aiman Al Masoud, Marco Arazzi, Antonino Nocera
-
Marco Arazzi, Antonino Nocera
-
Hierarchical Orthogonal Residual Spread for Precise Massive Editing in Large Language Models
Xiaojie Gu, Guangxu Chen, Yuheng Yang, Jingxin Han, Andi Zhang
-
Eilam Shapira, Roi Reichart, Moshe Tennenholtz
-
Building Production-Ready Probes For Gemini
János Kramár, Joshua Engels, Zheng Wang, Bilal Chughtai, Rohin Shah, Neel Nanda, Arthur Conmy
-
Membership Inference on LLMs in the Wild
Jiatong Yi, Yanyang Li
-
AJAR: Adaptive Jailbreak Architecture for Red-teaming
Yipu Dou, Wang Yang
-
VidLeaks: Membership Inference Attacks Against Text-to-Video Models
Li Wang, Wenyu Chen, Ning Yu, Zheng Li, Shanqing Guo
-
Backdoor Attacks on Multi-modal Contrastive Learning
Simi D Kuniyilh, Rita Machacy
-
QUPID: A Partitioned Quantum Neural Network for Anomaly Detection in Smart Grid
Hoang M. Ngo, Tre' R. Jeter, Jung Taek Seo, My T. Thai
-
Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG
Haoze Guo, Ziqi Wei
-
Risk-Aware Human-in-the-Loop Framework with Adaptive Intrusion Response for Autonomous Vehicles
Dawood Wasif, Terrence J. Moore, Seunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Frederica F. Nelson, Jin-Hee Cho
-
Attesting Model Lineage by Consisted Knowledge Evolution with Fine-Tuning Trajectory
Zhuoyi Shang, Jiasen Li, Pengzhen Chen, Yanwei Liu, Xiaoyan Gu, Weiping Wang
-
Telling Human and Machine Handwriting Apart
Luis A. Leiva, Moises Diaz, Nuwan T. Attygalle, Miguel A. Ferrer, Rejean Plamondon
-
Zhikang Shen, Jianrong Lu, Haiyuan Wan, Jianhai Chen
-
Measurement for Opaque Systems: Multi-source Triangulation with Interpretable Machine Learning
Margaret Foster
-
Xingjun Ma, Yixu Wang, Hengyuan Xu, Yutao Wu, Yifan Ding, Yunhan Zhao, Zilong Wang, Jiabin Hua, Ming Wen, Jianan Liu, Ranjie Duan, Yifeng Gao, Yingshui Tan, Yunhao Chen, Hui Xue, Xin Wang, Wei Cheng, Jingjing Chen, Zuxuan Wu, Bo Li, Yu-Gang Jiang
-
Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing
Yinzhi Zhao, Ming Wang, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang
-
Ziyi Ding, Chenfei Ye-Hao, Zheyuan Wang, Xiao-Ping Zhang
-
Understanding and Preserving Safety in Fine-Tuned LLMs
Jiawen Zhang, Yangfan Hu, Kejia Chen, Lipeng He, Jiachen Ma, Jian Lou, Dan Li, Jian Liu, Xiaohu Yang, Ruoxi Jia
-
ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack
Hao Li, Yankai Yang, G. Edward Suh, Ning Zhang, Chaowei Xiao
-
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, Leo Zhang
-
Contextual StereoSet: Stress-Testing Bias Alignment Robustness in Large Language Models
Abhinaba Basu, Pavan Chakraborty
-
Adversarial Evasion Attacks on Computer Vision using SHAP Values
Frank Mollard, Marcus Becker, Florian Roehrbein
-
Yutao Mou, Zhangchi Xue, Lijun Li, Peiyang Liu, Shikun Zhang, Wei Ye, Jing Shao
-
The Straight and Narrow: Do LLMs Possess an Internal Moral Path?
Luoming Hu, Jingjie Zeng, Liang Yang, Hongfei Lin
-
The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, Jack Lindsey
-
Syed Naveed Mahmood, Md. Rezaur Rahman Bhuiyan, Tasfia Zaman, Jareen Tasneem Khondaker, Md. Sameer Sakib, Nazia Tasnim, Farig Sadeque
-
Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay
Hao Wang, Yanting Wang, Hao Li, Rui Li, Lei Sha
-
Hierarchical Refinement of Universal Multimodal Attacks on Vision-Language Models
Peng-Fei Zhang, Zi Huang
-
SRAW-Attack: Space-Reweighted Adversarial Warping Attack for SAR Target Recognition
Yiming Zhang, Weibo Qin, Yuntian Liu, Feng Wang
-
Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD
Murat Bilgehan Ertan, Marten van Dijk
-
CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning
Yuanjie Zhao, Junnan Qiu, Yue Ding, Jie Li
-
Mohoshin Ara Tahera, Karamveer Singh Sidhu, Shuvalaxmi Dass, Sajal Saha
-
Privacy Enhanced PEFT: Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD
Pradip Kunwar, Minh Vu, Maanak Gupta, Manish Bhattarai
-
Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection
Yuansen Liu, Yixuan Tang, Anthony Kum Hoe Tun
-
Differentially Private Inference for Longitudinal Linear Regression
Getoar Sopa, Marco Avella Medina, Cynthia Rush
-
SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle
Xinrui Zhang, Pincan Zhao, Jason Jaskolka, Heng Li, Rongxing Lu
-
Yuting Liang, Ke Yi
-
DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference
Parisa Rabbani, Priyam Sahoo, Ruben Mathew, Aishee Mondal, Harshita Ketharaman, Nimet Beyza Bozdag, Dilek Hakkani-Tür
-
Chetan Pathade, Vinod Dhimam, Sheheryar Ahmad, Ilsa Lareb
-
STaR: Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models
Jingjing Zhou, Gaoxiang Cong, Li Su, Liang Li
-
Blue Teaming Function-Calling Agents
Greta Dolcetti, Giulio Zizzo, Sergio Maffeis
-
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware
Ben Nassi, Bruce Schneier, Oleg Brodt
-
UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning
Feng Zhang, Shijia Li, Chunmao Zhang, Zhanyu Ma, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He, Jingwen Xu, Han Liu
-
StegoStylo: Squelching Stylometric Scrutiny through Steganographic Stitching
Robert Dilworth
-
From Snow to Rain: Evaluating Robustness, Calibration, and Complexity of Model-Based Robust Training
Josué Martínez-Martínez, Olivia Brown, Giselle Zeno, Pooya Khorrami, Rajmonda Caceres
-
Identifying Models Behind Text-to-Image Leaderboards
Ali Naseh, Yuefeng Peng, Anshuman Suri, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
-
BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning
Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, Meng Wang
-
Merged Bitcoin: Proof of Work Blockchains with Multiple Hash Types
Christopher Blake, Chen Feng, Xuachao Wang, Qianyu Yu
-
SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails
Zhiyi Mou, Jingyuan Yang, Zeheng Qian, Wangze Ni, Tianfang Xiao, Ning Liu, Chen Zhang, Zhan Qin, Kui Ren
-
CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents
Hanna Foerster, Robert Mullins, Tom Blanchard, Nicolas Papernot, Kristina Nikolić, Florian Tramèr, Ilia Shumailov, Cheng Zhang, Yiren Zhao
-
Shahrzad Sayyafzadeh, Hongmei Chi, Shonda Bernadin
-
Malware Classification using Diluted Convolutional Neural Network with Fast Gradient Sign Method
Ashish Anand, Bhupendra Singh, Sunil Khemka, Bireswar Banerjee, Vishi Singh Bhatia, Piyush Ranjan
-
Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents
Fengchao Chen, Tingmin Wu, Van Nguyen, Carsten Rudolph
-
Beyond Simulations: What 20,000 Real Conversations Reveal About Mental Health AI Safety
Caitlin A. Stamatis, Jonah Meyerhoff, Richard Zhang, Olivier Tieleman, Matteo Malgaroli, Thomas D. Hull
-
Bayesian Robust Financial Trading with Adversarial Synthetic Market Data
Haochong Xia, Simin Li, Ruixiao Xu, Zhixia Zhang, Hongxiang Wang, Zhiqian Liu, Teng Yao Long, Molei Qin, Chuqiao Zong, Bo An
-
Test-Time Detoxification without Training or Learning Anything
Baturay Saglam, Dionysis Kalogerias
-
Oleg Brodt, Elad Feldman, Bruce Schneier, Ben Nassi
-
Evaluating the Ability of Explanations to Disambiguate Models in a Rashomon Set
Kaivalya Rawal, Eoin Delaney, Zihao Fu, Sandra Wachter, Chris Russell
-
Q-realign: Piggybacking Realignment on Quantization for Safe and Efficient LLM Deployment
Qitao Tan, Xiaoying Song, Ningxi Cheng, Ninghao Liu, Xiaoming Zhai, Lingzi Hong, Yanzhi Wang, Zhen Xiang, Geng Yuan
-
DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection
Zhenhua Xu, Yiran Zhao, Mengting Zhong, Dezhang Kong, Changting Lin, Tong Qiao, Meng Han
-
SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models
Renyang Liu, Kangjie Chen, Han Qiu, Jie Zhang, Kwok-Yan Lam, Tianwei Zhang, See-Kiong Ng
-
RULERS: Locked Rubrics and Evidence-Anchored Scoring for Robust LLM Evaluation
Yihan Hong, Huaiyuan Yao, Bolin Shen, Wanpeng Xu, Hua Wei, Yushun Dong
-
STAR: Detecting Inference-time Backdoors in LLM Reasoning via State-Transition Amplification Ratio
Seong-Gyu Park, Sohee Park, Jisu Lee, Hyunsik Na, Daeseon Choi
-
RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis
Zhengwei Tao, Bo Li, Jialong Wu, Guochen Yan, Huanyao Zhang, Jiahao Xu, Haitao Mi, Wentao Zhang
-
Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, Liming Zhu, Wenjie Zhang
-
Além do Desempenho: Um Estudo da Confiabilidade de Detectores de Deepfakes (Beyond Performance: A Study of the Reliability of Deepfake Detectors)
Lucas Lopes, Rayson Laroca, André Grégio
-
RAVEN: Erasing Invisible Watermarks via Novel View Synthesis
Fahad Shamshad, Nils Lukas, Karthik Nandakumar
-
Baiting AI: Deceptive Adversary Against AI-Protected Industrial Infrastructures
Aryan Pasikhani, Prosanta Gope, Yang Yang, Shagufta Mehnaz, Biplab Sikdar
-
MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization
Yongtong Gu, Songze Li, Xia Hu
-
Double Strike: Breaking Approximation-Based Side-Channel Countermeasures for DNNs
Lorenzo Casalino, Maria Méndez Real, Jean-Christophe Prévotet, Rubén Salvador
-
Evaluating Role-Consistency in LLMs for Counselor Training
Eric Rudolph, Natalie Engert, Jens Albrecht
-
ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models
Zhenhua Xu, Haobo Zhang, Zhebo Wang, Qichen Liu, Haitao Xu, Wenpeng Xing, Meng Han
-
Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu
-
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Seongyun Lee, Yongrae Jo, Minju Seo, Moontae Lee, Minjoon Seo
-
Yixiao Peng, Hao Hu, Feiyang Li, Xinye Cao, Yingchang Jiang, Jipeng Tang, Guoshun Nan, Yuling Liu
-
Safe-FedLLM: Delving into the Safety of Federated Large Language Models
Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang
-
Defenses Against Prompt Attacks Learn Surface Heuristics
Shawn Li, Chenxiao Yu, Zhiyu Ni, Hao Li, Charith Peris, Chaowei Xiao, Yue Zhao
-
Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment
Haozhong Wang, Zhuo Li, Yibo Yang, He Zhao, Hongyuan Zha, Dandan Guo
-
BlindU: Blind Machine Unlearning without Revealing Erasing Data
Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu
-
Xinyi Wu, Geng Hong, Yueyue Chen, MingXuan Liu, Feier Jin, Xudong Pan, Jiarun Dai, Baojun Liu
-
Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation
Jiao Xu, Xin Chen, Lihe Zhang
-
MCP-ITP: An Automated Framework for Implicit Tool Poisoning in MCP
Ruiqi Li, Zhiqiang Wang, Yunhao Yao, Xiang-Yang Li
-
Graph Inference Towards ICD Coding
Xiaoxiao Deng
-
Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion
Li Zheng, Liangbin Xie, Jiantao Zhou, He YiMin
-
Proof of Reasoning for Privacy Enhanced Federated Blockchain Learning at the Edge
James Calo, Benny Lo
-
SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations
Mohammed Himayath Ali, Mohammed Aqib Abdullah, Mohammed Mudassir Uddin, Shahnawaz Alam
-
Reward-Preserving Attacks For Robust Reinforcement Learning
Lucas Schott, Elies Gherbi, Hatem Hajri, Sylvain Lamprier
-
Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks
Xingran Chen, Parimal Parag, Rohit Bhagat, Salim El Rouayheb
-
MacPrompt: Macaronic-guided Jailbreak against Text-to-Image Models
Xi Ye, Yiwen Liu, Lina Wang, Run Wang, Geying Yang, Yufei Hou, Jiayi Yu
-
Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
Can Jin, Rui Wu, Tong Che, Qixin Zhang, Hongwu Peng, Jiahui Zhao, Zhenting Wang, Wenqi Wei, Ligong Han, Zhao Zhang, Yuan Cao, Ruixiang Tang, Dimitris N. Metaxas
-
Semantic Gravity Wells: Why Negative Constraints Backfire
Shailesh Rana
-
LJ-Spoof: A Generatively Varied Corpus for Audio Anti-Spoofing and Synthesis Source Tracing
Surya Subramani, Hashim Ali, Hafiz Malik
-
SecMoE: Communication-Efficient Secure MoE Inference via Select-Then-Compute
Bowen Shen, Yuyue Chen, Peng Yang, Bin Zhang, Xi Zhang, Zoe L. Jiang
-
Paraphrasing Adversarial Attack on LLM-as-a-Reviewer
Masahiro Kaneko
-
Yunrui Gu, Zhenzhe Gao, Cong Kong, Zhaoxia Yin
-
Hallucinations Live in Variance
Aaron R. Flouro, Shawn P. Chadwick
-
Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild for LLM Systems
Hongyan Chang, Ergute Bao, Xinjian Luo, Ting Yu
-
The AI Cognitive Trojan Horse: How Large Language Models May Bypass Human Epistemic Vigilance
Andrew D. Maynard
-
When Should We Introduce Safety Interventions During Pretraining?
Dylan Sam, Sachin Goyal, Pratyush Maini, Alexander Robey, J. Zico Kolter
-
A Backpropagation-Free Feedback-Hebbian Network for Continual Learning Dynamics
Josh Li
-
Robust Mean Estimation under Quantization
Pedro Abdalla, Junren Chen
-
United We Defend: Collaborative Membership Inference Defenses in Federated Learning
Li Bai, Junxu Liu, Sen Zhang, Xinwei Zhang, Qingqing Ye, Haibo Hu
-
How Secure is Secure Code Generation? Adversarial Prompts Put LLM Defenses to the Test
Melissa Tessa, Iyiola E. Olatunji, Aicha War, Jacques Klein, Tegawendé F. Bissyandé
-
SafePro: Evaluating the Safety of Professional-Level AI Agents
Kaiwen Zhou, Shreedhar Jangam, Ashwin Nagarajan, Tejas Polu, Suhas Oruganti, Chengzhi Liu, Ching-Chen Kuo, Yuting Zheng, Sravana Narayanaraju, Xin Eric Wang
-
SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use
Pratyush Desai, Luoxi Tang, Yuqiao Meng, Zhaohan Xi
-
PRISP: Privacy-Safe Few-Shot Personalization via Lightweight Adaptation
Junho Park, Dohoon Kim, Taesup Moon
-
Hongjun An, Yiliang Song, Jiangan Chen, Jiawei Shao, Chi Zhang, Xuelong Li
-
Qiang Zhang, Elena Emma Wang, Jiaming Li, Xichun Wang
-
Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection
Qingyu Liu, Yitao Zhang, Zhongjie Ba, Chao Shuai, Peng Cheng, Tianhang Zheng, Zhibo Wang
-
On the Adversarial Robustness of 3D Large Vision-Language Models
Chao Liu, Ngai-Man Cheung
-
Chen Gong, Kecen Li, Zinan Lin, Tianhao Wang
-
VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference
Minfeng Qi, Dongyang He, Qin Wang, Lefeng Zhang
-
StablePDENet: Enhancing Stability of Operator Learning for Solving Differential Equations
Chutian Huang, Chang Ma, Kaibo Wang, Yang Xiang
-
Stavros Tsimpoukis, Dimitrios Tyrovolas, Sotiris Ioannidis, Maria Kafesaki, Ian F. Akyildiz, George K. Karagiannidis, Christos K. Liaskos
-
Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo, Trong Nghia Hoang, Hyuk-Yoon Kwon, My T. Thai
-
Mara Pleasure, Ekaterina Redekop, Dhakshina Ilango, Zichen Wang, Vedrana Ivezic, Kimberly Flores, Israa Laklouk, Jitin Makker, Gregory Fishbein, Anthony Sisk, William Speier, Corey W. Arnold
-
UMLoc: Uncertainty-Aware Map-Constrained Inertial Localization with Quantified Bounds
Mohammed S. Alharbi, Shinkyu Park
-
Imtiaz Ali Soomro, Hamood Ur Rehman, S. Jawad Hussain, Adeel Iqbal, Waqas Khalid, Heejung Yu
-
Incentive Mechanism Design for Privacy-Preserving Decentralized Blockchain Relayers
Boutaina Jebari, Khalil Ibrahimi, Hamidou Tembine, Mounir Ghogho
-
Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning
Quan Minh Nguyen, Min-Seon Kim, Hoang M. Ngo, Trong Nghia Hoang, Hyuk-Yoon Kwon, My T. Thai
-
G M Shahariar, Zabir Al Nazi, Md Olid Hasan Bhuiyan, Zhouxing Shi
-
Zhaoqi Wang, Zijian Zhang, Daqing He, Pengtao Kou, Xin Li, Jiamou Liu, Jincheng An, Yong Liu
-
STELP: Secure Transpilation and Execution of LLM-Generated Programs
Swapnil Shinde, Sahil Wadhwa, Andy Luo, Emily Chen
-
HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors
Jingxiao Yang, Ping He, Tianyu Du, Sun Bing, Xuhong Zhang
-
The Echo Chamber Multi-Turn LLM Jailbreak
Ahmad Alobaid (NeuralTrust), Martí Jordà Roca (NeuralTrust), Carlos Castillo (ICREA and UPF), Joan Vendrell (NeuralTrust)
-
VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit
Junda Lin, Zhaomeng Zhou, Zhi Zheng, Shuochen Liu, Tong Xu, Yong Chen, Enhong Chen
-
SAFE: Secure and Accurate Federated Learning for Privacy-Preserving Brain-Computer Interfaces
Tianwang Jia, Xiaoqing Chen, Dongrui Wu
-
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang
-
Tianshi Li
-
The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence
Herun Wan, Jiaying Wu, Minnan Luo, Fanxiao Li, Zhi Zeng, Min-Yen Kan
-
Adrian Serrano, Erwan Umlil, Ronan Thomas
-
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
Songze Li, Ruishi He, Xiaojun Jia, Jun Wang, Zhihui Fu
-
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Malik, Shuchi Mishra
-
Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification
Zenghao Duan, Zhiyi Yin, Zhichao Shi, Liang Pang, Shaoling Jing, Zihe Huang, Jiayi Wu, Yu Yan, Jingcheng Deng, Huawei Shen, Xueqi Cheng
-
Agentic AI Microservice Framework for Deepfake and Document Fraud Detection in KYC Pipelines
Chandra Sekhar Kubam
-
Tara Bogavelli, Oluwanifemi Bamgbose, Gabrielle Gauthier Melançon, Fanny Riols, Roshnee Sharma
-
Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models
Hoang-Chau Luong, Lingwei Chen
-
FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
Zhi Yang, Runguo Li, Qiqi Qiang, Jiashun Wang, Fangqi Lou, Mengping Li, Dongpo Cheng, Rui Xu, Heng Lian, Shuo Zhang, Xiaolong Liang, Xiaoming Huang, Zheng Wei, Zhaowei Liu, Xin Guo, Huacan Wang, Ronghao Chen, Liwen Zhang
-
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
Jua Han, Jaeyoon Seo, Jungbin Min, Jean Oh, Jihie Kim
-
Jivnesh Sandhan, Harshit Jaiswal, Fei Cheng, Yugo Murawaki
-
Falsifying Sparse Autoencoder Reasoning Features in Language Models
George Ma, Zhongyuan Liang, Irene Y. Chen, Somayeh Sojoudi
-
Adversarial Network Imagination: Causal LLMs and Digital Twins for Proactive Telecom Mitigation
Vignesh Sriram, Yuqiao Meng, Luoxi Tang, Zhaohan Xi
-
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents
Yunhao Feng, Yige Li, Yutao Wu, Yingshui Tan, Yanming Guo, Yifan Ding, Kun Zhai, Xingjun Ma, Yugang Jiang
-
Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models
Can Xu, Lingyong Yan, Jiayi Wu, Haosen Wang, Shuaiqiang Wang, Yuchen Li, Jizhou Huang, Dawei Yin, Xiang Li
-
Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang
-
ResMAS: Resilience Optimization in LLM-based Multi-agent Systems
Zhilun Zhou, Zihan Liu, Jiahe Liu, Qingyu Shao, Yihan Wang, Kun Shao, Depeng Jin, Fengli Xu
-
Defense Against Indirect Prompt Injection via Tool Result Parsing
Qiang Yu, Xinran Cheng, Chuanyi Liu
-
An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions
Avik Dutta, Harshit Nigam, Hosein Hasanbeig, Arjun Radhakrishna, Sumit Gulwani
-
Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models
Arghyadeep Das, Sai Sreenivas Chintha, Rishiraj Girmal, Kinjal Pandey, Sharvi Endait
-
Shuliang Liu, Xingyu Li, Hongyi Liu, Yibo Yan, Bingchen Duan, Qi Zheng, Dong Fang, Lingfeng Su, Xuming Hu
-
Advancing Language Models for Code-related Tasks
Zhao Tian
-
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
Hoagy Cunningham, Jerry Wei, Zihan Wang, Andrew Persic, Alwin Peng, Jordan Abderrachid, Raj Agarwal, Bobby Chen, Austin Cohen, Andy Dau, Alek Dimitriev, Rob Gilson, Logan Howard, Yijin Hua, Jared Kaplan, Jan Leike, Mu Lin, Christopher Liu, Vladimir Mikulik, Rohit Mittapalli, Clare O'Hara, Jin Pan, Nikhil Saxena, Alex Silverstein, Yue Song, Xunjie Yu, Giulio Zhou, Ethan Perez, Mrinank Sharma
-
DSC2025 -- ViHallu Challenge: Detecting Hallucination in Vietnamese LLMs
Anh Thi-Hoang Nguyen, Khanh Quoc Tran, Tin Van Huynh, Phuoc Tan-Hoang Nguyen, Cam Tan Nguyen, Kiet Van Nguyen
-
Huawei Zheng, Xinqi Jiang, Sen Yang, Shouling Ji, Yingcai Wu, Dazhen Deng
-
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
Subhadeep Roy, Gagan Bhatia, Steffen Eger
-
MAGA-Bench: Machine-Augment-Generated Text via Alignment Detection Benchmark
Anyang Song, Ying Cheng, Yiqian Xu, Rui Feng
-
AM$^3$Safety: Towards Data Efficient Alignment of Multi-modal Multi-turn Safety for MLLMs
Han Zhu, Jiale Chen, Chengkun Cai, Shengjie Sun, Haoran Li, Yujin Zhou, Chi-Min Chan, Pengcheng Wen, Lei Li, Sirui Han, Yike Guo
-
Lionel Z. Wang, Yusheng Zhao, Jiabin Luo, Xinfeng Li, Lixu Wang, Yinan Peng, Haoyang Li, XiaoFeng Wang, Wei Dong
-
Masatomo Yoshida, Haruto Namura, Nicola Adami, Masahiro Okuda
-
Higher-Order Adversarial Patches for Real-Time Object Detectors
Jens Bayer, Stefan Becker, David Münch, Michael Arens, Jürgen Beyerer
-
Decentralized Privacy-Preserving Federal Learning of Computer Vision Models on Edge Devices
Damian Harenčák, Lukáš Gajdošech, Martin Madaras
-
When Models Manipulate Manifolds: The Geometry of a Counting Task
Wes Gurnee, Emmanuel Ameisen, Isaac Kauvar, Julius Tarng, Adam Pearce, Chris Olah, Joshua Batson
-
Sequential Subspace Noise Injection Prevents Accuracy Collapse in Certified Unlearning
Polina Dolgova, Sebastian U. Stich
-
Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them
Mohamed Nabeel, Oleksii Starov
-
Wonwoo Choi, Minjae Seo, Minkyoo Song, Hwanjo Heo, Seungwon Shin, Myoungsung You
-
A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes
Sahaya Jestus Lazer, Kshitiz Aryal, Maanak Gupta, Elisa Bertino
-
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models
Badhan Chandra Das, Md Tasnim Jawad, Joaquin Molto, M. Hadi Amini, Yanzhao Wu
-
Prasanna Kumar
-
Anh-Kiet Duong, Petra Gomez-Krämer, Hoàng-Ân Lê, Minh-Tan Pham
-
Inverting Non-Injective Functions with Twin Neural Network Regression
Sebastian J. Wetzel
-
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules
Di Wu, Yanyan Zhao, Xin Lu, Mingzhe Li, Bing Qin
-
How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
Su-Hyeon Kim, Hyundong Jin, Yejin Lee, Yo-Sub Han
-
ALERT: Zero-shot LLM Jailbreak Detection via Internal Discrepancy Amplification
Xiao Lin, Philip Li, Zhichen Zeng, Tingwei Li, Tianxin Wei, Xuying Ning, Gaotang Li, Yuzhong Chen, Hanghang Tong
-
Inference Attacks Against Graph Generative Diffusion Models
Xiuling Wang, Xin Huang, Guibo Luo, Jianliang Xu
-
Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva
-
What Matters For Safety Alignment?
Xing Li, Hui-Ling Zhen, Lihao Yin, Xianzhi Yu, Zhenhua Dong, Mingxuan Yuan
-
Siyuan Li, Xi Lin, Jun Wu, Zehao Liu, Haoyu Li, Tianjie Ju, Xiang Chen, Jianhua Li
-
Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts
Zhihao Zhu, Jiafeng Liang, Shixin Jiang, Jinlan Fu, Ming Liu, Guanglu Sun, See-Kiong Ng, Bing Qin
-
Binh Nguyen, Thai Le
-
Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases
Hui Huang, Xuanxin Wu, Muyun Yang, Yuki Arase
-
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
Quy-Anh Dang, Chris Ngo, Truong-Son Hy
-
HearSay Benchmark: Do Audio LLMs Leak What They Hear?
Jin Wang, Liang Lin, Kaiwen Luo, Weiliu Wang, Yitian Chen, Moayad Aloqaily, Xuehai Tang, Zhenhong Zhou, Kun Wang, Li Sun, Qingsong Wen
-
RADAR: Retrieval-Augmented Detector with Adversarial Refinement for Robust Fake News Detection
Song-Duo Ma, Yi-Hung Liu, Hsin-Yu Lin, Pin-Yu Chen, Hong-Yan Huang, Shau-Yung Hsu, Yun-Nung Chen
-
Yu Yan, Sheng Sun, Mingfeng Li, Zheming Yang, Chiwei Zhu, Fei Ma, Benfeng Xu, Min Liu
-
SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems
Andreea-Elena Bodea, Stephen Meisenbacher, Alexandra Klymenko, Florian Matthes
-
Stable Language Guidance for Vision-Language-Action Models
Zhihao Zhan, Yuhao Chen, Jiaying Zhou, Qinhan Lv, Hao Liu, Keze Wang, Liang Lin, Guangrun Wang
-
Towards Real-world Lens Active Alignment with Unlabeled Data via Domain Adaptation
Wenyong Li, Qi Jiang, Weijian Hu, Kailun Yang, Zhanjun Zhang, Wenjun Tian, Kaiwei Wang, Jian Bai
-
Scanner-Induced Domain Shifts Undermine the Robustness of Pathology Foundation Models
Erik Thiringer, Fredrik K. Gustafsson, Kajsa Ledesma Eriksson, Mattias Rantalainen
-
Detecting Semantic Backdoors in a Mystery Shopping Scenario
Arpad Berta, Gabor Danner, Istvan Hegedus, Mark Jelasity
-
M. Yin, K. G. Ravindran, C. Hadjipanayi, A. Bannon, A. Rapeaux, C. Della Monica, T. S. Lande, Derk-Jan Dijk, T. G. Constandinou
-
Beyond Physical Labels: Redefining Domains for Robust WiFi-based Gesture Recognition
Xiang Zhang, Huan Yan, Jinyang Huang, Bin Liu, Yuanhao Feng, Jianchun Liu, Meng Li, Fusang Zhang, Zhi Liu
-
Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense
Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He
-
Eren Kocadag, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag
-
Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs
Dinesh Srivasthav P, Ashok Urlana, Rahul Mishra, Bala Mallikarjunarao Garlapati, Ponnurangam Kumaraguru
-
From Domains to Instances: Dual-Granularity Data Synthesis for LLM Unlearning
Xiaoyu Xu, Minxin Du, Zitong Li, Zi Liang, Zhibiao Guo, Shiyu Zhang, Peizhao Hu, Qingqing Ye, Haibo Hu
-
MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking
Iago Alves Brito, Walcy Santos Rezende Rios, Julia Soares Dollis, Diogo Fernandes Costa Silva, Arlindo Rodrigues Galvão Filho
-
Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models
San Kim, Gary Geunbae Lee
-
ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models
Sharanya Dasgupta, Arkaprabha Basu, Sujoy Nath, Swagatam Das
-
Ji Guo, Wenbo Jiang, Yansong Lin, Yijing Liu, Ruichen Zhang, Guomin Lu, Aiguo Chen, Xinshuo Han, Hongwei Li, Dusit Niyato
-
Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models
Hang Fu, Wanli Peng, Yinghan Zhou, Jiaxuan Wu, Juan Wen, Yiming Xue
-
You Only Anonymize What Is Not Intent-Relevant: Suppressing Non-Intent Privacy Evidence
Weihao Shen, Yaxin Xu, Shuang Li, Wei Chen, Yuqin Lan, Meng Yuan, Fuzhen Zhuang
-
Privacy at Scale in Networked Healthcare
M. Amin Rahimian, Benjamin Panny, James Joshi
-
SearchAttack: Red-Teaming LLMs against Knowledge-to-Action Threats under Online Web Search
Yu Yan, Sheng Sun, Mingfeng Li, Zheming Yang, Chiwei Zhu, Fei Ma, Benfeng Xu, Min Liu, Qi Li
-
Extracting books from production language models
Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang
-
Jie Peng, Weiyu Li, Stefan Vlaski, Qing Ling
-
Adversarial Question Answering Robustness: A Multi-Level Error Analysis and Mitigation Study
Agniv Roy Choudhury, Vignesh Ponselvan Rajasingh
-
Window-based Membership Inference Attacks Against Fine-tuned Large Language Models
Yuetian Chen, Yuntao Du, Kaiyuan Zhang, Ashish Kundu, Charles Fleming, Bruno Ribeiro, Ninghui Li
-
JPU: Bridging Jailbreak Defense and Unlearning via On-Policy Path Rectification
Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Zhaoye Li, Bin Ji, Baosheng Wang, Jie Yu
-
ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
Peiran Li, Jan Fillies, Adrian Paschke
-
LeafLife: An Explainable Deep Learning Framework with Robustness for Grape Leaf Disease Recognition
B. M. Shahria Alam, Md. Nasim Ahmed
-
Multi-Turn Jailbreaking of Aligned LLMs via Lexical Anchor Tree Search
Devang Kulshreshtha, Hang Su, Chinmay Hegde, Haohan Wang
-
Xiangzhe Yuan, Zhenhao Zhang, Haoming Tang, Siying Hu
-
Yihan Wei, Shenghai Yuan, Tianchen Deng, Boyang Lou, Enwen Hu
-
LesionTABE: Equitable AI for Skin Lesion Detection
Rocio Mexia Diaz, Yasmin Greenway, Petru Manescu
-
Hana Yahia (CAS), Bruno Figliuzzi (CMM), Florent Di Meglio (CAS), Laurent Gerbaud (GEOSCIENCES), Stephane Menand, Mohamed Mahjoub
-
Adversarial Contrastive Learning for LLM Quantization Attacks
Dinghong Song, Zhiwei Xu, Hai Wan, Xibin Zhao, Pengfei Su, Dong Li
-
Quality Degradation Attack in Synthetic Data
Qinyi Liu, Dong Liu, Farhad Vadiee, Mohammad Khalil, Pedro P. Vergara Barrios
-
Context-aware Privacy Bounds for Linear Queries
Heng Zhao (1), Sara Saeidian (1 and 2), Tobias J. Oechtering (1) ((1) KTH Royal Institute of Technology, (2) Inria Saclay)
-
Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis
Mengze Hong, Di Jiang, Zeying Xie, Weiwei Zhao, Guan Wang, Chen Jason Zhang
-
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
Akarsh Kumar, Ryan Bahlous-Boldi, Prafull Sharma, Phillip Isola, Sebastian Risi, Yujin Tang, David Ha
-
AI-Driven Cybersecurity Threats: A Survey of Emerging Risks and Defensive Strategies
Sai Teja Erukude, Viswa Chaitanya Marella, Suhasnadh Reddy Veluru
-
Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks
Zhakshylyk Nurlanov, Frank R. Schmidt, Florian Bernard
-
Rendering Data Unlearnable by Exploiting LLM Alignment Mechanisms
Ruihan Zhang, Jun Sun
-
A Novel Unified Approach to Deepfake Detection
Lord Sen, Shyamapada Mukherjee
-
GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models
Xiangdong Hu, Yangyang Jiang, Qin Hu, Xiaojun Jia
-
Scott Thornton
-
DeepLeak: Privacy Enhancing Hardening of Model Explanations Against Membership Leakage
Firas Ben Hmida, Zain Sbeih, Philemon Hailemariam, Birhanu Eshete
-
Gaurav Sarraf, Vibhor Pal
-
Cross-Language Speaker Attribute Prediction Using MIL and RL
Sunny Shu, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag
-
Beyond Immediate Activation: Temporally Decoupled Backdoor Attacks on Time Series Forecasting
Zhixin Liu, Xuanlin Liu, Sihan Xu, Yaqiong Qiao, Ying Zhang, Xiangrui Cai
-
Multi-Agent-Driven Cognitive Secure Communications in Satellite-Terrestrial Networks
Yujie Ling, Zan Li, Lei Guan, Zheng Zhang, Shengyu Zhang, Tony Q.S. Quek
-
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
Dasol Choi, DongGeon Lee, Brigitta Jesica Kartono, Helena Berndt, Taeyoun Kwon, Joonwon Jang, Haon Park, Hwanjo Yu, Minsuk Kahng
-
MindChat: A Privacy-preserving Large Language Model for Mental Health Support
Dong Xue, Jicheng Tu, Ming Wang, Xin Yan, Fangzhou Liu, Jie Hu
-
Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization
Jiwei Guan, Haibo Jin, Haohan Wang
-
Qi Wei, Junchao Fan, Zhao Yang, Jianhua Wang, Jingkai Mao, Xiaolin Chang
-
MORE: Multi-Objective Adversarial Attacks on Speech Recognition
Xiaoxue Gao, Zexin Li, Yiming Chen, Nancy F. Chen
-
Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance
Jiawen Zhang, Lipeng He, Kejia Chen, Jian Lou, Jian Liu, Xiaohu Yang, Ruoxi Jia
-
Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior
Bo Yin, Qi Li, Runpeng Yu, Xinchao Wang
-
Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models
Antonio Colacicco, Vito Guida, Dario Di Palma, Fedelucio Narducci, Tommaso Di Noia
-
Hidden State Poisoning Attacks against Mamba-based Language Models
Alexandre Le Mercier, Chris Develder, Thomas Demeester
-
FMVP: Masked Flow Matching for Adversarial Video Purification
Duoxun Tang, Xueyi Zhang, Chak Hin Wang, Xi Xiao, Dasen Dai, Xinhang Jiang, Wentao Shi, Rui Li, Qing Li
-
UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk
Intae Jeon, Yujeong Kwon, Hyungjoon Koo
-
FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks
Chenyu Hu, Qiming Hu, Sinan Chen, Nianyu Li, Mingyue Zhang, Jialong Li
-
A Differentiable Adversarial Framework for Task-Aware Data Subsampling
Jiacheng Lyu, Bihua Bao
-
Learning with Monotone Adversarial Corruptions
Kasper Green Larsen, Chirag Pabbaraju, Abhishek Shetty
-
Dina El Zein, James Henderson
-
Hanzaleh Akbari Nodehi, Viveck R. Cadambe, Mohammad Ali Maddah-Ali
-
Structural Representations for Cross-Attack Generalization in AI Agent Threat Detection
Vignesh Iyer
-
Local Layer-wise Differential Privacy in Federated Learning
Yunbo Li, Jiaping Gui, Fanchao Meng, Yue Wu
-
From Chat Control to Robot Control: The Backdoors Left Open for the Sake of Safety
Neziha Akalin, Alberto Giaretta
-
Maryam Abbasihafshejani, AHM Nazmus Sakib, Murtuza Jadliwala
-
SWaRL: Safeguard Code Watermarking via Reinforcement Learning
Neusha Javidnia, Ruisi Zhang, Ashish Kundu, Farinaz Koushanfar
-
Fanzhe Fu
-
Junyu Liu, Zirui Li, Qian Niu, Zequn Zhang, Yue Xun, Wenlong Hou, Shujun Wang, Yusuke Iwasawa, Yutaka Matsuo, Kan Hatakeyama-Sato
-
Learning Resilient Elections with Adversarial GNNs
Hao Xiang Li, Yash Shah, Lorenzo Giusti
-
Wei Liu, Yaoxin Wu, Yingqian Zhang, Thomas Bäck, Yingjie Fan
-
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
Jinwei Hu, Xinmiao Huang, Youcheng Sun, Yi Dong, Xiaowei Huang
-
FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation
Abdur R. Fayjie, Pankhi Kashyap, Jutika Borah, Patrick Vandewalle
-
iFlip: Iterative Feedback-driven Counterfactual Example Refinement
Yilong Wang, Qianli Wang, Nils Feldhus
-
Image Synthesis Using Spintronic Deep Convolutional Generative Adversarial Network
Saumya Gupta, Abhinandan, Venkatesh vadde, Bhaskaran Muralidharan, Abhishek Sharma
-
OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
Xin Wang, Yunhao Chen, Juncheng Li, Yixu Wang, Yang Yao, Tianle Gu, Jie Li, Yan Teng, Xingjun Ma, Yingchun Wang, Xia Hu
-
Causal discovery for linear causal model with correlated noise: an Adversarial Learning Approach
Mujin Zhou, Junzhe Zhang
-
DiMEx: Breaking the Cold Start Barrier in Data-Free Model Extraction via Latent Diffusion Priors
Yash Thesia, Meera Suthar
-
Concave Certificates: Geometric Framework for Distributionally Robust Risk and Complexity Analysis
Hong T.M. Chu
-
Chandra Thapa, Surya Nepal
-
Steerability of Instrumental-Convergence Tendencies in LLMs
Jakub Hoscilowicz
-
How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference
Songyang Liu, Chaozhuo Li, Rui Pu, Litian Zhang, Chenxu Wang, Zejian Chen, Yuting Zhang, Yiming Hei
-
RefSR-Adv: Adversarial Attack on Reference-based Image Super-Resolution Models
Jiazhu Dai, Huihui Jiang
-
Aggressive Compression Enables LLM Weight Theft
Davis Brown, Juan-Pablo Rivera, Dan Hendrycks, Mantas Mazeika
-
Out-of-Band Power Side-Channel Detection for Semiconductor Supply Chain Integrity at Scale
Rajiv Thummala, Katherine Winton, Luke Flores, Elizabeth Redmond, Gregory Falco
-
NADD: Amplifying Noise for Effective Diffusion-based Adversarial Purification
David D. Nguyen, The-Anh Ta, Yansong Gao, Alsharif Abuadbba
-
SecureCodeRL: Security-Aware Reinforcement Learning for Code Generation with Partial-Credit Rewards
Suryansh Singh Sijwali, Suman Saha
-
dataRLsec: Safety, Security, and Reliability With Robust Offline Reinforcement Learning for DPAs
Shriram KS Pandian, Naresh Kshetri
-
IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection
Jiajie Zhu, Xia Du, Xiaoyuan Liu, Jizhe Zhou, Qizhen Xu, Zheng Lin, Chi-Man Pun
-
The Impact of Post-training on Data Contamination
Muhammed Yusuf Kocyigit, Caglar Yildirim
-
HyunJun Jeon
-
CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns
Zhenhong Zhou, Shilinlu Yan, Chuanpu Liu, Qiankun Li, Kun Wang, Zhigang Zeng
-
SafeMo: Linguistically Grounded Unlearning for Trustworthy Text-to-Motion Generation
Yiling Wang, Zeyu Zhang, Yiran Wang, Hao Tang
-
Modality Dominance-Aware Optimization for Embodied RGB-Infrared Perception
Xianhui Liu, Siqi Jiang, Yi Xie, Yuqing Lin, Siao Liu
-
Adversarial Samples Are Not Created Equal
Jennifer Crawford, Amol Khanna, Fred Lu, Amy R. Wagoner, Stella Biderman, Andre T. Nguyen, Edward Raff
-
Yueyan Dong, Minghui Xu, Qin Hu, Yinhao Xiao, Qi Luo, Yechao Zhang, Yue Zhang, Xiuzhen Cheng
-
Emoji-Based Jailbreaking of Large Language Models
M P V S Gopinadh, S Mahaboob Hussain
-
WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift
Julian D. Santamaria, Claudia Isaza, Jhony H. Giraldo
-
Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks
Longwei Wang, Mohammad Navid Nayyem, Abdullah Al Rakin, KC Santosh, Chaowei Zhang, Yang Zhou
-
Will LLM-powered Agents Bias Against Humans? Exploring the Belief-Dependent Vulnerability
Zongwei Wang, Bincheng Gu, Hongyu Yu, Junliang Yu, Tao He, Jiayin Feng, Min Gao
-
DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection
Yuxin Li, Xiangyu Zhang, Yifei Li, Zhiwei Guo, Haoyang Zhang, Eng Siong Chng, Cuntai Guan
-
Robust Uncertainty Quantification for Factual Generation of Large Language Models
Yuhao Zhang, Zhongliang Yang, Linna Zhou
-
Mapping Human Anti-collusion Mechanisms to Multi-agent AI
Jamiu Adekunle Idowu, Ahmed Almasoud, Ayman Alfahid
-
PatchBlock: A Lightweight Defense Against Adversarial Patches for Embedded EdgeAI Devices
Nandish Chattopadhyay, Abdul Basit, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
-
Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing
Md Mahbub Hasan, Marcus Sternhagen, Krishna Chandra Roy
-
Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations
Hyunjun Kim
-
Haoran Gu, Handing Wang, Yi Mei, Mengjie Zhang, Yaochu Jin
-
ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching
Yi Sun, Xinhao Zhong, Hongyan Li, Yimin Zhou, Junhao Li, Bin Chen, Xuan Wang
-
Robust Graph Fine-Tuning with Adversarial Graph Prompting
Ziyan Zhang, Bo Jiang, Jin Tang
-
Rectifying Adversarial Examples Using Their Vulnerabilities
Fumiya Morimoto, Ryuto Morita, Satoshi Ono
-
NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion
Muhammad Bilal, Omer Tariq, Hasan Ahmed
-
Weijie Wang, Peizhuo Lv, Yan Wang, Rujie Dai, Guokun Xu, Qiujian Lv, Hangcheng Liu, Weiqing Huang, Wei Dong, Jiaheng Zhang
-
Traffic-MoE: A Sparse Foundation Model for Network Traffic Analysis
Jiajun Zhou, Changhui Sun, Meng Shen, Shanqing Yu, Qi Xuan
-
Enhancing the QA Model through a Multi-domain Debiasing Framework
Yuefeng Wang, ChangJae Lee
-
R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory
Maoyuan Li, Zhongsheng Wang, Haoyuan Li, Jiamou Liu
-
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI
Srija Mukhopadhyay, Sathwik Reddy, Shruthi Muthukumar, Jisun An, Ponnurangam Kumaraguru
-
Muhammad Abdullahi Said, Muhammad Sammani Sani
-
Takeru Kusakabe, Yudai Hirose, Mashiho Mukaida, Satoshi Ono
-
CPR: Causal Physiological Representation Learning for Robust ECG Analysis under Distribution Shifts
Shunbo Jia, Caizhi Liao
-
HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs
Honglin Gao, Lan Zhao, Junhao Ren, Xiang Li, Gaoxi Xiao
-
Sparse Offline Reinforcement Learning with Corruption Robustness
Nam Phuong Tran, Andi Nika, Goran Radanovic, Long Tran-Thanh, Debmalya Mandal
-
Secure Digital Semantic Communications: Fundamentals, Challenges, and Opportunities
Weixuan Chen, Qianqian Yang, Yuanyuan Jia, Junyu Pan, Shuo Shao, Jincheng Dai, Meixia Tao, Ping Zhang
-
Towards Provably Secure Generative AI: Reliable Consensus Sampling
Yu Cui, Hang Fu, Sicheng Pan, Zhuoyu Sun, Yifei Liu, Yuhong Nie, Bo Ran, Baohan Huang, Xufeng Zhang, Haibin Zhang, Cong Zuo, Licheng Wang
-
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts
Hengli Li, Zhaoxin Yu, Qi Shen, Chenxi Li, Mengmeng Wang, Tinglang Wu, Yipeng Kang, Yuxuan Wang, Song-Chun Zhu, Zixia Jia, Zilong Zheng
-
Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing
Manish Bhatt, Adrian Wood, Idan Habler, Ammar Al-Kahfah
-
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
Xiaoze Liu, Weichen Yu, Matt Fredrikson, Xiaoqian Wang, Jing Gao
-
Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition
Yuchao Hou (1, 2), Zixuan Zhang (1), Jie Wang (1), Wenke Huang (3), Lianhui Liang (4), Di Wu (5), Zhiquan Liu (6), Youliang Tian (2), Jianming Zhu (7), Jisheng Dang (8), Junhao Dong (3), Zhongliang Guo (9) ((1) Shanxi Normal University, Taiyuan, China, (2) Guizhou University, Guiyang, China, (3) Nanyang Technological University, Singapore, Singapore, (4) Guangxi University, Nanning, China, (5) La Trobe University, Melbourne, Australia, (6) Jinan University, Guangzhou, China, (7) Central University of Finance and Economics, Beijing, China, (8) Lanzhou University, Lanzhou, China, (9) University of St Andrews, St Andrews, United Kingdom)
-
Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
Prajwal Chinchmalatpure, Suyash Chinchmalatpure, Siddharth Chavan
-
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?
Yuan Xin, Dingfan Chen, Linyi Yang, Michael Backes, Xiao Zhang
-
Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations
Yalin E. Sagduyu, Tugba Erpek, Aylin Yener, Sennur Ulukus
-
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
Changzhen Li, Yuecong Min, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen
-
Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation
Haotang Li, Zhenyu Qi, Hao Qin, Huanrui Yang, Sen He, Kebin Peng
-
Yongtao Chen, Yanbo Wang, Wentao Zhao, Guole Shen, Tianchen Deng, Jingchuan Wang
-
Bayesian Self-Distillation for Image Classification
Anton Adelöw, Matteo Gamba, Atsuto Maki
-
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention
Haijing Liu, Zhiyuan Song, Hefeng Wu, Tao Pu, Keze Wang, Liang Lin
-
Vladimir Frants, Sos Agaian
-
Kacem Khaled, Felipe Gohring de Magalhães, Gabriela Nicolescu
-
Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems
Tinglong Dai, David Simchi-Levi, Michelle Xiao Wu, Yao Xie
-
Ruixuan Huang, Qingyue Wang, Hantao Huang, Yudong Gao, Dong Chen, Shuai Wang, Wei Wang
-
SourceBroken: A large-scale analysis on the (un)reliability of SourceRank in the PyPI ecosystem
Biagio Montaruli, Serena Elisa Ponta, Luca Compagna, Davide Balzarotti
-
Jingyu Zhang
-
Sina Jahromi, Farshid Hajati, Alireza Rezaee, Javaher Nourian
-
The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models
Giuseppe Canale, Kashyap Thimmaraju
-
Osasumwen Cedric Ogiesoba-Eguakun, Suman Rath
-
Ruben Neyroud, Sam Corley
-
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Błaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip H.S. Torr, Adam Mahdi, Adel Bibi
-
Zhen Liang, Hai Huang, Zhengkui Chen
-
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
Jiafeng Liang, Hao Li, Chang Li, Jiaqi Zhou, Shixin Jiang, Zekun Wang, Changkai Ji, Zhihao Zhu, Runxuan Liu, Tao Ren, Jinlan Fu, See-Kiong Ng, Xia Liang, Ming Liu, Bing Qin
-
Toward Trustworthy Agentic AI: A Multimodal Framework for Preventing Prompt Injection Attacks
Toqeer Ali Syed, Mishal Ateeq Almutairi, Mahmoud Abdel Moaty
-
Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing
Panagiotis Theocharopoulos, Ajinkya Kulkarni, Mathew Magimai.-Doss
-
NeXT-IMDL: Build Benchmark for NeXT-Generation Image Manipulation Detection & Localization
Yifei Li, Haoyuan He, Yu Zheng, Bingyao Yu, Wenzhao Zheng, Lei Chen, Jie Zhou, Jiwen Lu
-
ProGuard: Towards Proactive Multimodal Safeguard
Shaohan Yu, Lijun Li, Chenyang Si, Lu Sheng, Jing Shao
-
Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
Armstrong Foundjem, Lionel Nganyewou Tidjon, Leuson Da Silva, Foutse Khomh
-
Yu Jiang, Xindi Tong, Ziyao Liu, Xiaoxi Zhang, Kwok-Yan Lam, Chee Wei Tan
-
Calibrated Multi-Level Quantile Forecasting
Tiffany Ding, Isaac Gibbs, Ryan J. Tibshirani
-
RobustMask: Certified Robustness against Adversarial Neural Ranking Attack via Randomized Masking
Jiawei Liu, Zhuo Chen, Rui Zhu, Miaokun Chen, Yuyang Gong, Wei Lu, Xiaofeng Wang
-
A Privacy Protocol Using Ephemeral Intermediaries and a Rank-Deficient Matrix Power Function (RDMPF)
Eduardo Salazar
-
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark
Manu, Yi Guo, Jo Plested, Tim Lynar, Kanchana Thilakarathna, Nirhoshan Sivaroopan, Jack Yang, Wangli Yang
-
Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems
Samaresh Kumar Singh, Joyjit Roy, Martin So
-
Improved Bounds for Private and Robust Alignment
Wenqian Weng, Yi He, Xingyu Zhou
-
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation
Kaustubh Dhole
-
Roee Ziv, Raz Lapid, Moshe Sipper
-
DECEPTICON: How Dark Patterns Manipulate Web Agents
Phil Cuvin, Hao Zhu, Diyi Yang
-
Ju-Hsuan Weng, Jia-Wei Liao, Cheng-Fu Chou, Jun-Cheng Chen
-
Fundamental Novel Consistency Theory: $H$-Consistency Bounds
Yutao Zhong
-
Soham Padia, Dhananjay Vaidya, Ramchandra Mangrulkar
-
Privacy-Preserving Black-Box Optimization (PBBO): Theory and the Model-Based Algorithm DFOp
Pengcheng Xie
-
Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring
Saisab Sadhu, Ashim Dhor
-
Towards Reliable Evaluation of Adversarial Robustness for Spiking Neural Networks
Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng
-
Verifiable Dropout: Turning Randomness into a Verifiable Claim
Kichang Lee, Sungmin Lee, Jaeho Jin, JeongGil Ko
-
Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation
Yiming Qian, Thorsten Neumann, Xueyining Huang, David Hardoon, Fei Gao, Yong Liu, Siow Mong Rick Goh
-
StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars
Zhiyao Sun, Ziqiao Peng, Yifeng Ma, Yi Chen, Zhengguang Zhou, Zixiang Zhou, Guozhen Zhang, Youliang Zhang, Yuan Zhou, Qinglin Lu, Yong-Jin Liu
-
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models
Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang
-
Dunyuan XU, Xikai Yang, Yaoqian Li, Juzheng Miao, Jinpeng Li, Pheng-Ann Heng
-
Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs
Jiayu Hu, Beibei Li, Jiangwei Xia, Yanjun Qin, Bing Ji, Zhongshi He
-
Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
Zongmin Zhang, Zhen Sun, Yifan Liao, Wenhan Dong, Xinlei He, Xingshuo Han, Shengmin Xu, Xinyi Huang
-
Scaling Adversarial Training via Data Selection
Youran Ye, Dejin Wang, Ajinkya Bhandare
-
Attack-Aware Deepfake Detection under Counter-Forensic Manipulations
Noor Fatima, Hasan Faraz Khan, Muzammil Behzad
-
LLA: Enhancing Security and Privacy for Generative Models with Logic-Locked Accelerators
You Li, Guannan Zhao, Yuhao Ju, Yunqi He, Jie Gu, Hai Zhou
-
Mohammad Zakaria Haider, Amit Kumar Podder, Prabin Mali, Aranya Chakrabortty, Sumit Paudyal, Mohammad Ashiqur Rahman
-
Yunguo Yu
-
Vahideh Zolfaghari
-
Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought
Yuyi Zhang, Boyu Tang, Tianjie Ju, Sufeng Duan, Gongshen Liu
-
First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
Egor Shulgin, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik
-
Dictionary-Transform Generative Adversarial Networks
Angshul Majumdar
-
Assessing the Effectiveness of Membership Inference on Generative Music
Kurtis Chow, Omar Samiullah, Vinesh Sridhar, Hewen Zhang
-
Exploring the Security Threats of Retriever Backdoors in Retrieval-Augmented Code Generation
Tian Li, Bo Lin, Shangwen Wang, Yusong Tan
-
Machine Learning Power Side-Channel Attack on SNOW-V
Deepak, Rahul Balout, Anupam Golder, Suparna Kundu, Angshuman Karmakar, Debayan Das
-
Learning from Negative Examples: Why Warning-Framed Training Data Teaches What It Warns Against
Tsogt-Ochir Enkhbayar
-
Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning
Deep Pankajbhai Mehta
-
Can Generative Models Actually Forge Realistic Identity Documents?
Alexander Vinogradov
-
Israk Hasan Jone, D.M. Rafiun Bin Masud, Promit Sarker, Sayed Fuad Al Labib, Nazmul Islam, Farhad Billah
-
Beyond Context: Large Language Models' Failure to Grasp Users' Intent
Ahmed M. Hussain, Salahuddin Salahuddin, Panos Papadimitratos
-
Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking
Yifan Huang, Xiaojun Jia, Wenbo Guo, Yuqiang Sun, Yihao Huang, Chong Wang, Yang Liu
-
Improving the Convergence Rate of Ray Search Optimization for Query-Efficient Hard-Label Attacks
Xinjie Xu, Shuyu Cheng, Dongwei Xu, Qi Xuan, Chen Ma
-
Efficient and Robust Video Defense Framework against 3D-field Personalized Talking Face
Rui-qing Sun, Xingshan Yao, Tian Lan, Hui-Yang Zhao, Jia-Ling Shi, Chen-Hao Cui, Zhijing Wu, Chen Yang, Xian-Ling Mao
-
Robustness Certificates for Neural Networks against Adversarial Attacks
Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Majid Zamani
-
Time-Efficient Evaluation and Enhancement of Adversarial Robustness in Deep Neural Networks
Runqi Lin
-
Clever Hans in Chemistry: Chemist Style Signals Confound Activity Prediction on Public Benchmarks
Andrew D. Blevins, Ian K. Quigley
-
zkFL-Health: Blockchain-Enabled Zero-Knowledge Federated Learning for Medical AI Privacy
Savvy Sharma, George Petrovic, Sarthak Kaushik
-
Ji Hyuk Jung, Ji Won Yoon
-
AegisAgent: An Autonomous Defense Agent Against Prompt Injection Attacks in LLM-HARs
Yihan Wang, Huanqi Yang, Shantanu Pal, Weitao Xu
-
GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Stjepan Picek, Ahmad-Reza Sadeghi
-
CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents
Haoyang Li, Mingjin Li, Jinxin Zuo, Siqi Li, Xiao Li, Hao Wu, Yueming Lu, Xiaochuan He
-
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Le Wang, Zonghao Ying, Xiao Yang, Quanchen Zou, Zhenfei Yin, Tianlin Li, Jian Yang, Yaodong Yang, Aishan Liu, Xianglong Liu
-
LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
Tianwei Lan, Farid Naït-Abdesselam
-
The Imitation Game: Using Large Language Models as Chatbots to Combat Chat-Based Cybercrimes
Yifan Yao, Baojuan Wang, Jinhao Duan, Kaidi Xu, ChuanKai Guo, Zhibo Eric Sun, Yue Zhang
-
A Reinforcement Learning Approach to Synthetic Data Generation
Natalia Espinosa-Dice, Nicholas J. Jackson, Chao Yan, Aaron Lee, Bradley A. Malin
-
IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense
Rahul Yumlembam, Biju Issac, Seibu Mary Jacob, Longzhi Yang
-
Honglin Mu, Jinghao Liu, Kaiyang Wan, Rui Xing, Xiuying Chen, Timothy Baldwin, Wanxiang Che
-
Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, Dacheng Tao
-
Evasion-Resilient Detection of DNS-over-HTTPS Data Exfiltration: A Practical Evaluation and Toolkit
Adam Elaoumari
-
Jaykumar Kasundra, Anjaneya Praharaj, Sourabh Surana, Lakshmi Sirisha Chodisetty, Sourav Sharma, Abhigya Verma, Abhishek Bhardwaj, Debasish Kanhar, Aakash Bhagat, Khalil Slimi, Seganrasan Subramanian, Sathwik Tejaswi Madhusudhan, Ranga Prasad Chenna, Srinivas Sunkara
-
Jixiao Yang, Jinyu Chen, Zixiao Huang, Chengda Xu, Chi Zhang, Sijia Li
-
Yuanjian Xu, Yuan Shuai, Jianing Hao, Guang Zhang
-
Ipek Sena Yilmaz, Onur G. Tuncer, Zeynep E. Aksoy, Zeynep Yağmur Baydemir
-
ChatGPT: Excellent Paper! Accept It. Editor: Imposter Found! Review Rejected
Kanchon Gharami, Sanjiv Kumar Sarkar, Yongxin Liu, Shafika Showkat Moni
-
Safety Alignment of LMs via Non-cooperative Games
Anselm Paulus, Ilia Kulikov, Brandon Amos, Rémi Munos, Ivan Evtimov, Kamalika Chaudhuri, Arman Zharmagambetov
-
Bridging Efficiency and Safety: Formal Verification of Neural Networks with Early Exits
Yizhak Yisrael Elboher, Avraham Raviv, Amihay Elboher, Zhouxing Shi, Omri Azencot, Hillel Kugler, Guy Katz
-
Adversarial Training for Failure-Sensitive User Simulation in Mental Health Dialogue Optimization
Ziyi Zhu, Olivier Tieleman, Caitlin A. Stamatis, Luka Smyth, Thomas D. Hull, Daniel R. Cahn, Matteo Malgaroli
-
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Zhengyang Shan, Aaron Mueller
-
Semantic Deception: When Reasoning Models Can't Compute an Addition
Nathaniël de Leeuw, Marceau Nahon, Mathis Reymond, Raja Chatila, Mehdi Khamassi
-
Mohammadreza Rostami, Solmaz S. Kia
-
Defending against adversarial attacks using mixture of experts
Mohammad Meymani, Roozbeh Razavi-Far
-
Real-World Adversarial Attacks on RF-Based Drone Detectors
Omer Gazit, Yael Itzhakev, Yuval Elovici, Asaf Shabtai
-
Investigating Model Editing for Unlearning in Large Language Models
Shariqah Hossain, Lalana Kagal
-
SemCovert: Secure and Covert Video Transmission via Deep Semantic-Level Hiding
Zhihan Cao, Xiao Yang, Gaolei Li, Jun Wu, Jianhua Li, Yuchen Liu
-
Failure Analysis of Safety Controllers in Autonomous Vehicles Under Object-Based LiDAR Attacks
Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl
-
Base Station Deployment under EMF Constraints by Deep Reinforcement Learning
Mohammed Mallik, Guillaume Villemaud
-
DIAL: Direct Iterative Adversarial Learning for Realistic Multi-Turn Dialogue Simulation
Ziyi Zhu, Olivier Tieleman, Caitlin A. Stamatis, Luka Smyth, Thomas D. Hull, Daniel R. Cahn, Matteo Malgaroli
-
A.A. Gde Yogi Pramana, Jason Ray, Anthony Jaya, Michael Wijaya
-
Konstantin Kaulen, Tobias Ladner, Stanley Bak, Christopher Brix, Hai Duong, Thomas Flinkow, Taylor T. Johnson, Lukas Koller, Edoardo Manino, ThanhVu H Nguyen, Haoze Wu
-
Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline
Akshaj Prashanth Rao, Advait Singh, Saumya Kumaar Saksena, Dhruv Kumar
-
The Erasure Illusion: Stress-Testing the Generalization of LLM Forgetting Evaluation
Hengrui Jia, Taoran Li, Jonas Guan, Varun Chandrasekaran
-
Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
Linzhi Chen, Yang Sun, Hongru Wei, Yuqi Chen
-
Lorenzo Capelli, Leandro de Souza Rosa, Gianluca Setti, Mauro Mangia, Riccardo Rovatti
-
Decoupled Generative Modeling for Human-Object Interaction Synthesis
Hwanhee Jung, Seunggwan Lee, Jeongyoon Yoon, SeungHyeon Kim, Giljoo Nam, Qixing Huang, Sangpil Kim
-
6DAttack: Backdoor Attacks in the 6DoF Pose Estimation
Jihui Guo, Zongmin Zhang, Zhen Sun, Yuhao Yang, Jinlin Wu, Fu Zhang, Xinlei He
-
Optimizer Dynamics at the Edge of Stability with Differential Privacy
Ayana Hussain, Ricky Fang
-
GShield: Mitigating Poisoning Attacks in Federated Learning
Sameera K. M., Serena Nicolazzo, Antonino Nocera, Vinod P., Rafidha Rehiman K. A.
-
DREAM: Dynamic Red-teaming across Environments for AI Models
Liming Lu, Xiang Gu, Junyu Huang, Jiawei Du, Yunhuai Liu, Yongbin Zhou, Shuchao Pang
-
Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress
Samruddhi Baviskar
-
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
Jiayun Wu, Jiashuo Liu, Zhiyuan Zeng, Tianyang Zhan, Tianle Cai, Wenhao Huang
-
Farjana Yesmin, Romana Akter
-
Naseem Machlovi, Maryam Saleki, Ruhul Amin, Mohamed Rahouti, Shawqi Al-Maliki, Junaid Qadir, Mohamed M. Abdallah, Ala Al-Fuqaha
-
MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking
Jianyi Zhang, Shizhao Liu, Ziyin Zhou, Zhen Li
-
Gökdeniz Gülmez
-
DASH: Deception-Augmented Shared Mental Model for a Human-Machine Teaming System
Zelin Wan, Han Jun Yoon, Nithin Alluru, Terrence J. Moore, Frederica F. Nelson, Seunghyun Yoon, Hyuk Lim, Dan Dongseong Kim, Jin-Hee Cho
-
Junjun Pan, Yixin Liu, Rui Miao, Kaize Ding, Yu Zheng, Quoc Viet Hung Nguyen, Alan Wee-Chung Liew, Shirui Pan
-
FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation
Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, Wei Bao, Kanchana Thilakarathna, Teng Joon Lim
-
Zhiyuan Peng, Zihan Ye, Shreyank N Gowda, Yuping Yan, Haotian Xu, Ling Shao
-
SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
Pengcheng Li, Qiang Fang, Tong Zhao, Yixing Lan, Xin Xu
-
Generating Risky Samples with Conformity Constraints via Diffusion Models
Han Yu, Hao Zou, Xingxuan Zhang, Zhengyi Wang, Yue He, Kehan Li, Peng Cui
-
Ni Ding, Songpei Lu, Wenjing Yang, Zijian Zhang
-
Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Chathura Abeydeera, Benjamin Turnbull, Matthew Warren
-
Zhang Wei, Peilu Hu, Shengning Lang, Hao Yan, Li Mei, Yichao Zhang, Chen Yang, Junfeng Hao, Zhimo Han
-
Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness
Haotian Deng, Chris Farber, Jiyoon Lee, David Tang
-
Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models
Zhang Wei, Peilu Hu, Zhenyuan Wei, Chenwei Liang, Jing Luo, Ziyi Ni, Hao Yan, Li Mei, Shengning Lang, Kuan Lu, Xi Xiao, Zhimo Han, Yijin Wang, Yichao Zhang, Chen Yang, Junfeng Hao, Jiayi Gu, Riyang Bao, Mu-Jiang-Shan Wang
-
Zehao Liu, Xi Lin
-
Who Can See Through You? Adversarial Shielding Against VLM-Based Attribute Inference Attacks
Yucheng Fan, Jiawei Chen, Yu Tian, Zhaoxia Yin
-
AL-GNN: Privacy-Preserving and Replay-Free Continual Graph Learning via Analytic Learning
Xuling Zhang, Jindong Li, Yifei Zhang, Menglin Yang
-
SoK: Understanding (New) Security Issues Across AI4Code Use Cases
Qilong Wu, Taoran Li, Tianyang Zhou, Varun Chandrasekaran
-
Rahul Yumlembam, Biju Issac, Nauman Aslam, Eaby Kollonoor Babu, Josh Collyer, Fraser Kennedy
-
Felipe Biava Cataneo
-
Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track
June Young Yi, Hyeongju Kim, Juheon Lee
-
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Jiaqi Tang, Jianmin Chen, Wei Wei, Xiaogang Xu, Runtao Liu, Xiangyu Wu, Qipeng Xie, Jiafei Wu, Lei Zhang, Qifeng Chen
-
Adversarial Robustness of Vision in Open Foundation Models
Jonathon Fox, William J Buchanan, Pavlos Papadopoulos
-
AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens
Tung-Ling Li, Yuhao Wu, Hongliang Liu
-
EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories
Lu Wei, Yuta Nakashima, Noa Garcia
-
Visually Prompted Benchmarks Are Surprisingly Fragile
Haiwen Feng, Long Lian, Lisa Dunlap, Jiahao Shu, XuDong Wang, Renhao Wang, Trevor Darrell, Alane Suhr, Angjoo Kanazawa
-
Adversarially Robust Detection of Harmful Online Content: A Computational Design Science Approach
Yidong Chai, Yi Liu, Mohammadreza Ebrahimi, Weifeng Li, Balaji Padmanabhan
-
DeepShare: Sharing ReLU Across Channels and Layers for Efficient Private Inference
Yonathan Bornfeld, Shai Avidan
-
Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors
Huixin Zhan
-
Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning
Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu
-
Timely Information Updating for Mobile Devices Without and With ML Advice
Yu-Pin Hsu, Yi-Hsuan Tseng
-
Cryptanalysis of Pseudorandom Error-Correcting Codes
Tianrui Wang, Anyu Wang, Tianshuo Cong, Delong Ran, Jinyuan Liu, Xiaoyun Wang
-
Securing Agentic AI Systems -- A Multilayer Security Framework
Sunil Arora, John Hastings
-
Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models
Wei Qian, Chenxu Zhao, Yangyi Li, Mengdi Huai
-
PermuteV: A Performant Side-channel-Resistant RISC-V Core Securing Edge AI Inference
Nuntipat Narkthong, Xiaolin Xu
-
From Fake Focus to Real Precision: Confusion-Driven Adversarial Attention Learning in Transformers
Yawei Liu
-
Consistency-Aware Editing for Entity-level Unlearning in Language Models
Xiaoqi Han, Víctor Gutiérrez-Basulto, Ru Li, Xiaoli Li, Jiye Liang, Jeff Z. Pan
-
Aniruddha Roy, Jyoti Patel, Aman Chadha, Vinija Jain, Amitava Das
-
StarCraft+: Benchmarking Multi-agent Algorithms in Adversary Paradigm
Yadong Li, Tong Zhang, Bo Huang, Zhen Cui
-
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille
-
Autoencoder-based Denoising Defense against Adversarial Attacks on Object Detection
Min Geun Song, Gang Min Kim, Woonmin Kim, Yongsik Kim, Jeonghyun Sim, Sangbeom Park, Huy Kang Kim
-
C-DGPA: Class-Centric Dual-Alignment Generative Prompt Adaptation
Chao Li, Dasha Hu, Chengyang Li, Yuming Jiang, Yuncheng Shen
-
Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification
Geofrey Owino, Bernard Shibwabo Kasamani, Ahmed M. Abdelmoniem, Edem Wornyo
-
Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks
Safwan Shaheer, G.M. Refatul Islam, Mohammad Rafid Hamid, Tahsin Zaman Jilan
-
Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
Yuxuan Qiao, Dongqin Liu, Hongchang Yang, Wei Zhou, Songlin Hu
-
TTP: Test-Time Padding for Adversarial Detection and Robust Adaptation on Vision-Language Models
Zhiwei Li, Yitian Pang, Weining Wang, Zhenan Sun, Qi Li
-
Ripan Kumar Kundu, Istiak Ahmed, Khaza Anuarul Hoque
-
Pixel Seal: Adversarial-only training for invisible image and video watermarking
Tomáš Souček, Pierre Fernandez, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Tom Sander, Alexandre Mourachko
-
Hacking Neural Evaluation Metrics with Single Hub Text
Hiroyuki Deguchi, Katsuki Chousa, Yusuke Sakai
-
ContextLeak: Auditing Leakage in Private In-Context Learning Methods
Jacob Choi, Shuying Cao, Xingjian Dong, Wang Bill Zhu, Robin Jia, Sai Praneeth Karimireddy
-
Hao Li, Yubing Ren, Yanan Cao, Yingjie Li, Fang Fang, Shi Wang, Li Guo
-
Jiaheng Geng, Jiatong Du, Xinyu Zhang, Ye Li, Panqu Wang, Yanjun Huang
-
Pixel Super-Resolved Fluorescence Lifetime Imaging Using Deep Learning
Paloma Casteleiro Costa, Parnian Ghapandar Kashani, Xuhui Liu, Alexander Chen, Ary Portes, Julien Bec, Laura Marcu, Aydogan Ozcan
-
Adaptive Frequency Domain Alignment Network for Medical Image Segmentation
Zhanwei Li, Liang Li, Jiawan Zhang
-
DeContext as Defense: Safe Image Editing in Diffusion Transformers
Linghui Shen, Mingyue Cui, Xingyi Yang
-
Detecting Localized Deepfakes: How Well Do Synthetic Image Detectors Handle Inpainting?
Serafino Pandolfini, Lorenzo Pellegrini, Matteo Ferrara, Davide Maltoni
-
Dual-View Inference Attack: Machine Unlearning Amplifies Privacy Exposure
Lulu Xue, Shengshan Hu, Linqiang Qian, Peijin Guo, Yechao Zhang, Minghui Li, Yanjun Zhang, Dayong Ye, Leo Yu Zhang
-
Privacy Blur: Quantifying Privacy and Utility for Image Data Release
Saeed Mahloujifar, Narine Kokhlikyan, Chuan Guo, Kamalika Chaudhuri
-
In-Context Probing for Membership Inference in Fine-Tuned Language Models
Zhexi Lu, Hongliang Chi, Nathalie Baracaldo, Swanand Ravindra Kadhe, Yuseok Jeon, Lei Yu
-
A Systematic Study of Code Obfuscation Against LLM-based Vulnerability Detection
Xiao Li, Yue Li, Hao Wu, Yue Zhang, Yechao Zhang, Fengyuan Xu, Sheng Zhong
-
Empirical Evaluation of Structured Synthetic Data Privacy Metrics: Novel experimental framework
Milton Nicolás Plasencia Palacios, Alexander Boudewijn, Sebastiano Saccani, Andrea Filippo Ferraris, Diana Sofronieva, Giuseppe D'Acquisto, Filiberto Brozzetti, Daniele Panfilo, Luca Bortolussi
-
Security Risks of Agentic Vehicles: A Systematic Analysis of Cognitive and Cross-Layer Threats
Ali Eslami, Jiangbo Yu
-
MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
Saksham Sahai Srivastava, Haoyu He
-
Istiak Ahmed, Ripan Kumar Kundu, Khaza Anuarul Hoque
-
Perturb Your Data: Paraphrase-Guided Training Data Watermarking
Pranav Shetty, Mirazul Haque, Petr Babkin, Zhiqiang Ma, Xiaomo Liu, Manuela Veloso
-
Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception
Bangya Liu, Chengpo Yan, Chenghao Jiang, Suman Banerjee, Akarsh Prabhakara
-
BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs
Muhammad Zeeshan Karamat, Sadman Saif, Christiana Chamon Garcia
-
Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
Kai Hu, Abhinav Aggarwal, Mehran Khodabandeh, David Zhang, Eric Hsin, Li Chen, Ankit Jain, Matt Fredrikson, Akash Bharadwaj
-
SGM: Safety Glasses for Multimodal Large Language Models via Neuron-Level Detoxification
Hongbo Wang, MaungMaung AprilPyone, Isao Echizen
-
The Meta-Prompting Protocol: Orchestrating LLMs via Adversarial Feedback Loops
Fanzhe Fu
-
Quantifying Return on Security Controls in LLM Systems
Richard Helder Moulton, Austin O'Brien, John D. Hastings
-
Xuanjun Zong, Zhiqi Shen, Lei Wang, Yunshi Lan, Chao Yang
-
Quantum Machine Learning for Cybersecurity: A Taxonomy and Future Directions
Siva Sai, Ishika Goyal, Shubham Sharma, Sri Harshita Manuri, Vinay Chamola, Rajkumar Buyya
-
Adversarial Versification in Portuguese as a Jailbreak Operator in LLMs
Joao Queiroz
-
How Do Semantically Equivalent Code Transformations Impact Membership Inference on LLMs for Code?
Hua Yang, Alejandro Velasco, Thanh Le-Cong, Md Nazmul Haque, Bowen Xu, Denys Poshyvanyk
-
BashArena: A Control Setting for Highly Privileged AI Agents
Adam Kaufman, James Lucassen, Tyler Tracy, Cody Rushing, Aryan Bhatt
-
Robust and Calibrated Detection of Authentic Multimedia Content
Sarim Hashmi, Abdelrahman Elsayed, Mohammed Talha Alam, Samuele Poppi, Nils Lukas
-
CLIP-FTI: Fine-Grained Face Template Inversion via CLIP-Driven Attribute Conditioning
Longchen Dai, Zixuan Shen, Zhiheng Zhou, Peipeng Yu, Zhihua Xia
-
Mukur Gupta, Niharika Gupta, Saifur Rahman, Shantanu Pal, Chandan Karmakar
-
An Efficient Gradient-Based Inference Attack for Federated Learning
Pablo Montaña-Fernández, Ines Ortega-Fernandez
-
Bits for Privacy: Evaluating Post-Training Quantization via Membership Inference
Chenxiang Zhang, Tongxi Qu, Zhong Li, Tian Zhang, Jun Pang, Sjouke Mauw
-
Xiangrui Xu, Zhize Li, Yufei Han, Bin Wang, Jiqiang Liu, Wei Wang
-
Adrián Detavernier, Jasper De Bock
-
Time-Varying Audio Effect Modeling by End-to-End Adversarial Training
Yann Bourdin, Pierrick Legrand, Fanny Roche
-
Ratang Sedimo, Ivoline C. Ngong, Jami Lashua, Joseph P. Near
-
Vahideh Zolfaghari
-
The Perceptual Observatory: Characterizing Robustness and Grounding in MLLMs
Tejas Anvekar, Fenil Bardoliya, Pavan K. Turaga, Chitta Baral, Vivek Gupta
-
MCR-VQGAN: A Scalable and Cost-Effective Tau PET Synthesis Approach for Alzheimer's Disease Imaging
Jin Young Kim, Jeremy Hudson, Jeongchul Kim, Qing Lyu, Christopher T. Whitlow
-
Unveiling the Attribute Misbinding Threat in Identity-Preserving Models
Junming Fu, Jishen Zeng, Yi Jiang, Peiyu Zhuang, Baoying Chen, Siyu Lu, Jianquan Yang
-
The Semantic Illusion: Certified Limits of Embedding-Based Hallucination Detection in RAG Systems
Debu Sinha
-
Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition
Ellie Zhou, Jihoon Chung, Olga Russakovsky
-
ArcGen: Generalizing Neural Backdoor Detection Across Diverse Architectures
Zhonghao Yang, Cheng Luo, Daojing He, Yiming Li, Yu Li
-
Jiesong Lian, Ruizhe Zhong, Zixiang Zhou, Xiaoyue Mi, Yixue Hao, Yuan Zhou, Qinglin Lu, Long Hu, Junchi Yan
-
IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol
Yunhao Yao, Zhiqiang Wang, Haoran Cheng, Yihang Cheng, Haohua Du, Xiang-Yang Li
-
Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity
Shuai Dong, Jie Zhang, Guoying Zhao, Shiguang Shan, Xilin Chen
-
Dual Attention Guided Defense Against Malicious Edits
Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen
-
Towards Transferable Defense Against Malicious Image Edits
Jie Zhang, Shuai Dong, Shiguang Shan, Xilin Chen
-
Xingfu Zhou, Pengfei Wang
-
Erasing CLIP Memories: Non-Destructive, Data-Free Zero-Shot class Unlearning in CLIP Models
Ashish Mishra, Tarun Kumar, Gyanaranjan Nayak, Arpit Shah, Suparna Bhattacharya, Martin Foltin
-
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World
Shuxin Zhao, Bo Lang, Nan Xiao, Yilang Zhang
-
Mimicking Human Visual Development for Learning Robust Image Representations
Ankita Raj, Kaashika Prajaapat, Tapan Kumar Gandhi, Chetan Arora
-
LCMem: A Universal Model for Robust Image Memorization Detection
Mischa Dombrowski, Felix Nützel, Bernhard Kainz
-
Yiheng Huang, Junhong Chen, Anqi Ning, Zhanhong Liang, Nick Michiels, Luc Claesen, Wenyin Liu
-
On Improving Deep Active Learning with Formal Verification
Jonathan Spiegelman, Guy Amir, Guy Katz
-
Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix
Wei Tao, Sheng Long, Xin Liu, Wei Li, Qing Tao
-
Black-Box Auditing of Quantum Models: Lifted Differential Privacy with Quantum Canaries
Baobao Song, Shiva Raj Pokhrel, Athanasios V. Vasilakos, Tianqing Zhu, Gang Li
-
PerProb: Indirectly Evaluating Memorization in Large Language Models
Yihan Liao, Jacky Keung, Xiaoxue Ma, Jingyu Zhang, Yicheng Sun
-
Unai Laskurain, Aitor Aguirre-Ortuzar, Urko Zurutuza
-
Penetration Testing of Agentic AI: A Comparative Security Analysis Across Models and Frameworks
Viet K. Nguyen, Mohammad I. Husain
-
ComMark: Covert and Robust Black-Box Model Watermarking with Compressed Samples
Yunfei Yang, Xiaojun Chen, Zhendong Zhao, Yu Zhou, Xiaoyan Gu, Juan Cao
-
Cybercrime and Computer Forensics in Epoch of Artificial Intelligence in India
Sahibpreet Singh, Shikha Dhiman
-
Edward Y. Chang
-
From Adversarial Poetry to Adversarial Tales: An Interpretability Research Agenda
Piercosma Bisconti, Marcello Galisai, Matteo Prandi, Federico Pierucci, Olga Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Marcantonio Brancale, Daniele Nardi
-
CTIGuardian: A Few-Shot Framework for Mitigating Privacy Leakage in Fine-Tuned LLMs
Shashie Dilhara Batan Arachchige, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Dinusha Vatsalan, Dali Kaafar
-
Cisco Integrated AI Security and Safety Framework Report
Amy Chang, Tiffany Saade, Sanket Mendapara, Adam Swanda, Ankit Garg
-
Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning
Amin Jalal Aghdasian, Farzaneh Abdollahi, Ali Kamali Iglie
-
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
Wenjing Lu, Zerui Tao, Dongping Zhang, Yuning Qiu, Yang Yang, Qibin Zhao
-
SSAS: Cross-subject EEG-based Emotion Recognition through Source Selection with Adversarial Strategy
Yici Liu, Qi Wei Oung, Hoi Leong Lee
-
Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS
Sabrine Ennaji, Elhadj Benkhelifa, Luigi Vincenzo Mancini
-
Leonard Bereska, Zoe Tzifa-Kratira, Reza Samavi, Efstratios Gavves
-
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation
Richard J. Young
-
On the Effectiveness of Membership Inference in Targeted Data Extraction from Large Language Models
Ali Al Sahili, Ali Chehab, Razane Tajeddine
-
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
Jiaqi Wang, Weijia Wu, Yi Zhan, Rui Zhao, Ming Hu, James Cheng, Wei Liu, Philip Torr, Kevin Qinghong Lin
-
Learning to Generate Cross-Task Unexploitable Examples
Haoxuan Qu, Qiuchi Xiang, Yujun Cai, Yirui Wu, Majid Mirmehdi, Hossein Rahmani, Jun Liu
-
Test-Time Modification: Inverse Domain Transformation for Robust Perception
Arpit Jadon, Joshua Niemeijer, Yuki M. Asano
-
Evaluating Adversarial Attacks on Federated Learning for Temperature Forecasting
Karina Chichifoi, Fabio Merizzi, Michele Colajanni
-
Dual-Phase Federated Deep Unlearning via Weight-Aware Rollback and Reconstruction
Changjun Zhou, Jintao Zheng, Leyou Yang, Pengfei Wang
-
Chethana Prasad Kabgere, Shylaja S S
-
Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
Asa Cooper Stickland, Jan Michelfeit, Arathi Mani, Charlie Griffin, Ollie Matthews, Tomek Korbak, Rogan Inglis, Oliver Makins, Alan Cooney
-
Less Is More: Sparse and Cooperative Perturbation for Point Cloud Attacks
Keke Tang, Tianyu Hao, Xiaofei Wang, Weilong Peng, Denghui Zhang, Peican Zhu, Zhihong Tian
-
MURIM: Multidimensional Reputation-based Incentive Mechanism for Federated Learning
Sindhuja Madabushi, Dawood Wasif, Jin-Hee Cho
-
The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
Subramanyam Sahoo, Jared Junkin
-
Topologically-Stabilized Graph Neural Networks: Empirical Robustness Across Domains
Jelena Losic
-
Stability-Drift Early Warning for Cyber-Physical Systems Under Degradation Attacks
Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl
-
Bilevel Optimization for Covert Memory Tampering in Heterogeneous Multi-Agent Architectures (XAMT)
Akhil Sharma, Shaikh Yaser Arafat, Jai Kumar Sharma, Ken Huang
-
PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility
Md Nahid Hasan Shuvo, Moinul Hossain
-
Saad Alqithami
-
Detecting Prompt Injection Attacks Against Application Using Classifiers
Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur
-
PRIVEE: Privacy-Preserving Vertical Federated Learning Against Feature Inference Attacks
Sindhuja Madabushi, Ahmad Faraz Khan, Haider Ali, Ananthram Swami, Rui Ning, Hongyi Wu, Jin-Hee Cho
-
StegaVAR: Privacy-Preserving Video Action Recognition via Steganographic Domain Analysis
Lixin Chen, Chaomeng Chen, Jiale Zhou, Zhijian Wu, Xun Lin
-
GradID: Adversarial Detection via Intrinsic Dimensionality of Gradients
Mohammad Mahdi Razmjoo, Mohammad Mahdi Sharifian, Saeed Bagheri Shouraki
-
Animesh Mishra
-
Iterative Sampling Methods for Sinkhorn Distributionally Robust Optimization
Jie Wang
-
Ahmed Ryan, Junaid Mansur Ifti, Md Erfan, Akond Ashfaque Ur Rahman, Md Rayhanur Rahman
-
The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models
Md. Hasib Ur Rahman
-
One Leak Away: How Pretrained Model Exposure Amplifies Jailbreak Risks in Finetuned LLMs
Yixin Tan, Zhe Yu, Jun Sakuma
-
Samruddhi Baviskar
-
Auto-Tuning Safety Guardrails for Black-Box Large Language Models
Perry Abdulkadir
-
RAMBO: Reliability Analysis for Mamba through Bit-flip attack Optimization
Sanjay Das, Swastik Bhattacharya, Shamik Kundu, Arnab Raha, Souvik Kundu, Kanad Basu
-
Feeling the Strength but Not the Source: Partial Introspection in LLMs
Ely Hahami, Lavik Jain, Ishaan Sinha
-
Dynamic Homophily with Imperfect Recall: Modeling Resilience in Adversarial Networks
Saad Alqithami
-
Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data
Shubhada Agrawal, Aaditya Ramdas
-
Hua Ma, Ruoxi Sun, Minhui Xue, Xingliang Yuan, Carsten Rudolph, Surya Nepal, Ling Liu
-
Hellinger loss function for Generative Adversarial Networks
Giovanni Saraceno, Anand N. Vidyashankar, Claudio Agostinelli
-
Minfeng Qi, Qin Wang, Ruiqiang Li, Tianqing Zhu, Shiping Chen
-
Sim2Real Reinforcement Learning for Soccer Skills
Jonathan Spraggett
-
Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models
Melih Catal, Pooja Rani, Harald C. Gall
-
Björn Deiseroth, Max Henning Höth, Kristian Kersting, Letitia Parcalabescu
-
Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints
Kai Yao, Marc Juarez
-
Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously
Andrew Adiletta, Kathryn Adiletta, Kemal Derya, Berk Sunar
-
CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare
Akash Ghosh, Srivarshinee Sridhar, Raghav Kaushik Ravi, Muhsin Muhsin, Sriparna Saha, Chirag Agarwal
-
Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models
Divya Kothandaraman, Jaclyn Pytlarz
-
CAT: Can Trust be Predicted with Context-Awareness in Dynamic Heterogeneous Networks?
Jie Wang, Zheng Yan, Jiahe Lan, Xuyan Li, Elisa Bertino
-
Attacking and Securing Community Detection: A Game-Theoretic Framework
Yifan Niu, Aochuan Chen, Tingyang Xu, Jia Li
-
SpectralKrum: A Spectral-Geometric Defense Against Byzantine Attacks in Federated Learning
Aditya Tripathi, Karan Sharma, Rahul Mishra, Tapas Kumar Maiti
-
Peichun Hua, Hao Li, Shanghao Shi, Zhiyuan Yu, Ning Zhang
-
Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors
Max McGuinness, Alex Serrano, Luke Bailey, Scott Emmons
-
Kaichuang Zhang, Wei Yin, Jinghao Yang, Ping Xu
-
CLOAK: Contrastive Guidance for Latent Diffusion-Based Data Obfuscation
Xin Yang, Omid Ardakanian
-
Adversarial Attacks Against Deep Learning-Based Radio Frequency Fingerprint Identification
Jie Ma, Junqing Zhang, Guanxiong Shen, Alan Marshall, Chip-Hong Chang
-
Junling Fan, George Rushevich, Giorgio Rusconi, Mengdi Zhu, Reiner Dizon-Paradis, Domenic Forte
-
Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs
Jing Cui, Yufei Han, Jianbin Jiao, Junge Zhang
-
Factor(U,T): Controlling Untrusted AI by Monitoring their Plans
Edward Lue Chee Lip, Anthony Channg, Diana Kim, Aaron Sandoval, Kevin Zhu
-
PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling
Jamal Al-Karaki, Muhammad Al-Zafar Khan, Rand Derar Mohammad Al Athamneh
-
Data-Chain Backdoor: Do You Trust Diffusion Models as Generative Data Supplier?
Junchi Lu, Xinke Li, Yuheng Liu, Qi Alfred Chen
-
Robust MLLM Unlearning via Visual Knowledge Distillation
Yuhang Wang, Zhenxing Niu, Haoxuan Ji, Guangyu He, Haichang Gao, Gang Hua
-
Wenhan Wu, Zhili He, Huanghuang Liang, Yili Gong, Jiawei Jiang, Chuang Hu, Dazhao Cheng
-
Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention
Yang Yu, Zhuangzhuang Chen, Siqi Wang, Lanqing Li, Xiaomeng Li
-
Targeted Data Protection for Diffusion Model by Matching Training Trajectory
Hojun Lee, Mijin Koo, Yeji Song, Nojun Kwak
-
Devanshu Sahoo, Manish Prasad, Vasudev Majhi, Jahnvi Singh, Vinay Chamola, Yash Sinha, Murari Mandal, Dhruv Kumar
-
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
Tong Zhang, Carlos Hinojosa, Bernard Ghanem
-
FLARE: A Wireless Side-Channel Fingerprinting Attack on Federated Learning
Md Nahid Hasan Shuvo, Moinul Hossain, Anik Mallik, Jeffrey Twigg, Fikadu Dagefu
-
A Privacy-Preserving Cloud Architecture for Distributed Machine Learning at Scale
Vinoth Punniyamoorthy, Ashok Gadi Parthi, Mayilsamy Palanigounder, Ravi Kiran Kodali, Bikesh Kumar, Kabilan Kannan
-
Yash Srivastava, Shalin Jain, Sneha Awathare, Nitin Awathare
-
The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks
Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Tianyu Du, Jinbao Li, Jianhai Chen, Shouling Ji
-
How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation
Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar
-
UACER: An Uncertainty-Aware Critic Ensemble Framework for Robust Adversarial Reinforcement Learning
Jiaxi Wu, Tiantian Zhang, Yuxing Wang, Yongzhe Chang, Xueqian Wang
-
Empirical evaluation of the Frank-Wolfe methods for constructing white-box adversarial attacks
Kristina Korotkova, Aleksandr Katrutsa
-
Agniva Maiti, Prajwal Panth, Suresh Chandra Satapathy
-
Watermarks for Language Models via Probabilistic Automata
Yangkun Wang, Jingbo Shang
-
Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
Hongsin Lee, Hye Won Chung
-
Authority Backdoor: A Certifiable Backdoor Mechanism for Authoring DNNs
Han Yang, Shaofeng Li, Tian Dong, Xiangyu Xu, Guangchi Liu, Zhen Ling
-
Neha, Tarunpreet Bhatia
-
Virtual camera detection: Catching video injection attacks in remote biometric systems
Daniyar Kurmankhojayev, Andrei Shadrikov, Dmitrii Gordin, Mikhail Shkorin, Danijar Gabdullin, Aigerim Kambetbayeva, Kanat Kuatov
-
PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents
Yuqun Zhang, Yuxuan Zhao, Sijia Chen
-
Differential Privacy for Secure Machine Learning in Healthcare IoT-Cloud Systems
N Mangala, Murtaza Rangwala, S Aishwarya, B Eswara Reddy, Rajkumar Buyya, KR Venugopal, SS Iyengar, LM Patnaik
-
FBA$^2$D: Frequency-based Black-box Attack for AI-generated Image Detection
Xiaojing Chen, Dan Li, Lijun Peng, Jun Yan, Zhiqing Guo, Junyang Chen, Xiao Lan, Zhongjie Ba, Yunfeng Diao
-
Privacy-Preserving Computer Vision for Industry: Three Case Studies in Human-Centric Manufacturing
Sander De Coninck, Emilio Gamba, Bart Van Doninck, Abdellatif Bey-Temsamani, Sam Leroux, Pieter Simoens
-
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Jan Betley, Jorio Cocola, Dylan Feng, James Chua, Andy Arditi, Anna Sztyber-Betley, Owain Evans
-
MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI
Fengli Wu, Vaidehi Patil, Jaehong Yoon, Yue Zhang, Mohit Bansal
-
FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning
Khurram Khalil, Khaza Anuarul Hoque
-
Unconsciously Forget: Mitigating Memorization; Without Knowing What is being Memorized
Er Jin, Yang Zhang, Yongli Mou, Yanfei Dong, Stefan Decker, Kenji Kawaguchi, Johannes Stegmaier
-
A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge
Zihao Ding, Mufeng Zhu, Zhongze Tang, Sheng Wei, Yao Liu
-
Goal inference with Rao-Blackwellized Particle Filters
Yixuan Wang, Dan P. Guralnik, Warren E. Dixon
-
Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs
Sohely Jahan, Ruimin Sun
-
Membership and Dataset Inference Attacks on Large Audio Generative Models
Jakub Proboszcz, Paweł Kochanski, Karol Korszun, Donato Crisostomi, Giorgio Strano, Emanuele Rodolà, Kamil Deja, Jan Dubinski
-
Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination
Ryosuke Nagumo, Hironori Fujisawa
-
Weiyi He, Yue Xing
-
Estimation of Stochastic Optimal Transport Maps
Sloan Nietert, Ziv Goldfeld
-
ObliInjection: Order-Oblivious Prompt Injection Attack to LLM Agents with Multi-source Data
Ruiqi Wang, Yuqi Jia, Neil Zhenqiang Gong
-
Reference Recommendation based Membership Inference Attack against Hybrid-based Recommender Systems
Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Hongsheng Hu, Wanchun Dou
-
ByteShield: Adversarially Robust End-to-End Malware Detection through Byte Masking
Daniel Gibert, Felip Manyà
-
Robust AI Security and Alignment: A Sisyphean Endeavor?
Apostol Vassilev
-
Zhongjie Jiang
-
Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning
Lama Alssum, Hani Itani, Hasan Abed Al Kader Hammoud, Philip Torr, Adel Bibi, Bernard Ghanem
-
TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0
Jinyu Chen, Long Shi, Taotao Wang, Jiaheng Wang, Wei Zhang
-
LLM-PEA: Leveraging Large Language Models Against Phishing Email Attacks
Najmul Hassan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang
-
SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models
Mohamed Afane, Abhishek Satyam, Ke Chen, Tao Li, Junaid Farooq, Juntao Chen
-
Futa Waseda, Shojiro Yamabe, Daiki Shiono, Kento Sasaki, Tsubasa Takahashi
-
Yong-Woon Kim
-
Yiming Lu
-
Jinghao Wang, Ping Zhang, Carter Yagemann
-
Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem
Shiva Gaire, Srijan Gyawali, Saroj Mishra, Suman Niroula, Dilip Thakur, Umesh Yadav
-
Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models
Michael R. Martin, Garrick Chan, Kwan-Liu Ma
-
Waleed Razzaq, Yun-Bo Zhao
-
Jiaming Zhang, Che Wang, Yang Cao, Longtao Huang, Wei Yang Bryan Lim
-
A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation
Zhigang Jia, Duan Wang, Hengkai Wang, Yajun Xie, Meixiang Zhao, Xiaoyu Zhao
-
Yi Liu, Weixiang Han, Chengjun Cai, Xingliang Yuan, Cong Wang
-
Differentially Private Synthetic Data Generation Using Context-Aware GANs
Anantaa Kotal, Anupam Joshi
-
Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents
Xiang Chen, Yuling Shi, Qizhen Lan, Yuchao Qiu, Xiaodong Gu
-
When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation
Joshua Ward, Bochao Gu, Chi-Hua Wang, Guang Cheng
-
Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation
Sampriti Soor, Suklav Ghosh, Arijit Sur
-
Sampriti Soor, Suklav Ghosh, Arijit Sur
-
Keito Inoshita
-
Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization
Guangmingmei Yang, David J. Miller, George Kesidis
-
Robust Agents in Open-Ended Worlds
Mikayel Samvelyan
-
Fully Decentralized Certified Unlearning
Hithem Lamri, Michail Maniatakos
-
Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning
Junnan Qiu, Jie Li
-
Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis
-
Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models
Huzaifa Arif, Pin-Yu Chen, Alex Gittens, James Diffenderfer, Bhavya Kailkhura
-
Worst-case generation via minimax optimization in Wasserstein space
Xiuyuan Cheng, Yao Xie, Linglingzhi Zhu, Yunqin Zhu
-
Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks
Thai Duong Nguyen, Ngoc-Tan Nguyen, Thanh-Dao Nguyen, Nguyen Van Huynh, Dinh-Hieu Tran, Symeon Chatzinotas
-
Secure and Privacy-Preserving Federated Learning for Next-Generation Underground Mine Safety
Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong
-
MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks
Tailun Chen, Yu He, Yan Wang, Shuo Shao, Haolun Zheng, Zhihao Liu, Jinfeng Li, Yuefeng Chen, Zhixuan Chu, Zhan Qin
-
Exposing and Defending Membership Leakage in Vulnerability Prediction Models
Yihan Liao, Jacky Keung, Xiaoxue Ma, Jingyu Zhang, Yicheng Sun
-
Developing a Strong CPS Defender: An Evolutionary Approach
Qingyuan Hu, Christopher M. Poskitt, Jun Sun, Yuqi Chen
-
Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs
Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
-
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods
Mrinal Agarwal, Saad Rana, Theo Sundoro, Hermela Berhe, Spencer Kim, Vasu Sharma, Sean O'Brien, Kevin Zhu
-
Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks
Shihao Li, Jiachen Li, Dongmei Chen
-
Anirudh Nakra, Nayeeb Rashid, Chau-Wai Wong, Min Wu
-
ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs
Mohammad M Maheri, Sunil Cotterill, Alex Davidson, Hamed Haddadi
-
How a Bit Becomes a Story: Semantic Steering via Differentiable Fault Injection
Zafaryab Haider, Md Hafizur Rahman, Shane Moeykens, Vijay Devabhaktuni, Prabuddha Chakraborty
-
Hybrid Attribution Priors for Explainable and Robust Model Training
Zhuoran Zhang, Feng Zhang, Shangyuan Li, Yang Shi, Yuanxing Zhang, Wei Chen, Tengjiao Wang, Kam-Fai Wong
-
HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate
Shenzhe Zhu
-
Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach
Hua Yang, Alejandro Velasco, Sen Fang, Bowen Xu, Denys Poshyvanyk
-
Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Chao Shen
-
CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification
Pingchuan Ma, Chengshuai Zhao, Bohan Jiang, Saketh Vishnubhatla, Ujun Jeong, Alimohammad Beigi, Adrienne Raglin, Huan Liu
-
AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration
Harish Karthikeyan, Yue Guo, Leo de Castro, Antigoni Polychroniadou, Leo Ardon, Udari Madhushani Sehwag, Sumitra Ganesh, Manuela Veloso
-
Optimization-Guided Diffusion for Interactive Scene Generation
Shihao Li, Naisheng Ye, Tianyu Li, Kashyap Chitta, Tuo An, Peng Su, Boyang Wang, Haiou Liu, Chen Lv, Hongyang Li
-
Xiqiao Xiong, Ouxiang Li, Zhuo Liu, Moxin Li, Wentao Shi, Fuli Feng, Xiangnan He
-
Auditing Games for Sandbagging
Jordan Taylor, Sid Black, Dillon Bowen, Thomas Read, Satvik Golechha, Alex Zelenka-Martin, Oliver Makins, Connor Kissane, Kola Ayonrinde, Jacob Merizian, Samuel Marks, Chris Cundy, Joseph Bloom
-
ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking
Yunzhe Li, Jianan Wang, Hongzi Zhu, James Lin, Shan Chang, Minyi Guo
-
Towards Robust Protective Perturbation against DeepFake Face Swapping
Hengyang Yao, Lin Li, Ke Sun, Jianing Qiu, Huiping Chen
-
When normalization hallucinates: unseen risks in AI-powered whole slide image processing
Karel Moens, Matthew B. Blaschko, Tinne Tuytelaars, Bart Diricx, Jonas De Vylder, Mustafa Yousif
-
Forget and Explain: Transparent Verification of GNN Unlearning
Imran Ahsan, Hyunwook Yu, Jinsung Kim, Mucheol Kim (Chung-Ang University)
-
Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models
Kassoum Sanogo, Renzo Ardiccioni
-
Richard Young
-
Fenghua Weng, Chaochao Lu, Xia Hu, Wenqi Shao, Wenjie Wang
-
Siyuan Xu, Yibing Liu, Peilin Chen, Yung-Hui Li, Shiqi Wang, Sam Kwong
-
Ziming Hong, Tianyu Huang, Runnan Chen, Shanshan Ye, Mingming Gong, Bo Han, Tongliang Liu
-
How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline
Chunhui Zhang, Li Liu, Zhipeng Zhang, Yong Wang, Hao Wen, Xi Zhou, Shiming Ge, Yanfeng Wang
-
Chih-Chung Hsu, Shao-Ning Chen, Chia-Ming Lee, Yi-Fang Wang, Yi-Shiuan Chou
-
Zhibo Liang, Tianze Hu, Zaiye Chen, Mingjie Tang
-
GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering
Jehyeok Yeon, Federico Cinus, Yifan Wu, Luca Luceri
-
Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods
Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, Wenchao Li
-
RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting
Longjie Zhao, Ziming Hong, Zhenyang Ren, Runnan Chen, Mingming Gong, Tongliang Liu
-
SoK: Trust-Authorization Mismatch in LLM Agent Interactions
Guanquan Shi, Haohua Du, Zhiqiang Wang, Xiaoyu Liang, Weiwenpei Liu, Song Bian, Zhenyu Guan
-
FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations
Mayank Ravishankara
-
George Mikros
-
Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization
Donghang Duan, Xu Zheng, Yuefeng He, Chong Mu, Leyi Cai, Lizong Zhang
-
MATEX: A Multi-Agent Framework for Explaining Ethereum Transactions
Zifan Peng
-
RunawayEvil: Jailbreaking the Image-to-Video Generative Models
Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan
-
Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation
Ali Ebrahimpour-Boroojeny
-
Uniqueness ratio as a predictor of a privacy leakage
Danah A. AlSalem AlKhashti
-
Metaphor-based Jailbreaking Attacks on Text-to-Image Models
Chenyu Zhang, Yiwen Ma, Lanjun Wang, Wenhui Li, Yi Tu, An-An Liu
-
Protecting Bystander Privacy via Selective Hearing in Audio LLMs
Xiao Zhan, Guangzhi Sun, Jose Such, Phil Woodland
-
Delete and Retain: Efficient Unlearning for Document Classification
Aadya Goel, Mayuri Sridhar
-
Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation
Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han
-
Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses
Mehrab Hosain, Sabbir Alom Shuvo, Matthew Ogbe, Md Shah Jalal Mazumder, Yead Rahman, Md Azizul Hakim, Anukul Pandey
-
Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks
Saeid Jamshidi, Kawser Wazed Nafi, Arghavan Moradi Dakhel, Negar Shahabi, Foutse Khomh, Naser Ezzati-Jivan
-
Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack Detection
Jiabao Guo, Yadian Wang, Hui Ma, Yuhao Fu, Ju Jia, Hui Liu, Shengeng Tang, Lechao Cheng, Yunfeng Diao, Ajian Liu
-
AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars
Ramazan Fazylov, Sergey Zagoruyko, Aleksandr Parkin, Stamatis Lefkimmiatis, Ivan Laptev
-
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation
Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang Liu
-
Quantization Blindspots: How Model Compression Breaks Backdoor Defenses
Rohan Pandey, Eric Ye
-
Privacy Loss of Noise Perturbation via Concentration Analysis of A Product Measure
Shuainan Liu, Tianxi Ji, Zhongshuo Fang, Lu Wei, Pan Li
-
Mitigating Self-Preference by Authorship Obfuscation
Taslim Mahbub, Shi Feng
-
Hua Wang, Jinghao Lu, Fan Zhang
-
Matching Ranks Over Probability Yields Truly Deep Safety Alignment
Jason Vega, Gagandeep Singh
-
Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee, Mostafa Musharrat, Sai Vishnu Vamsi
-
Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data
Georgy Perevozchikov, Nancy Mehta, Egor Ershov, Radu Timofte
-
VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack
Shiji Zhao, Shukun Xiong, Yao Huang, Yan Jin, Zhenyu Wu, Jiyang Guan, Ranjie Duan, Jialing Tao, Hui Xue, Xingxing Wei
-
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Fabian Konstantinidis, Moritz Sackmann, Ulrich Hofmann, Christoph Stiller
-
Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models
Mahesh Kumar Nandwana, Youngwan Lim, Joseph Liu, Alex Yang, Varun Notibala, Nishchaie Khanna
-
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
Igor Shilov, Alex Cloud, Aryo Pradipta Gema, Jacob Goldman-Wetzler, Nina Panickssery, Henry Sleight, Erik Jones, Cem Anil
-
LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction
Marius F.R. Juston, Ramavarapu S. Sreenivas, Dustin Nottage, Ahmet Soylemezoglu
-
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates
Neil G. Marchant, Andrew C. Cullen, Feng Liu, Sarah M. Erfani
-
PrivCode: When Code Generation Meets Differential Privacy
Zheng Liu, Chen Gong, Terry Yue Zhuo, Kecen Li, Weichen Yu, Matt Fredrikson, Tianhao Wang
-
TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations
Xiuyuan Chen, Jian Zhao, Yuxiang He, Yuan Xun, Xinwei Liu, Yanshu Li, Huilin Zhou, Wei Cai, Ziyan Shi, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li
-
Ana-Maria Cretu, Klim Kireev, Amro Abdalla, Wisdom Obinna, Raphael Meier, Sarah Adel Bargal, Elissa M. Redmiles, Carmela Troncoso
-
Weikai Lu, Ziqian Zeng, Kehua Zhang, Haoran Li, Huiping Zhuang, Ruidong Wang, Cen Chen, Hao Peng
-
Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
Ju-Young Kim, Ji-Hong Park, Myeongjun Kim, Gun-Woo Kim
-
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models
Fan Yang
-
Auto-SPT: Automating Semantic Preserving Transformations for Code
Ashish Hooda, Mihai Christodorescu, Chuangang Ren, Aaron Wilson, Kassem Fawaz, Somesh Jha
-
When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models
S.M. Mustaqim, Anantaa Kotal, Paul H. Yi
-
Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Zhuo Wang, W.K. Chan
-
Sheng Liu, Panos Papadimitratos
-
SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling
Ankit Gupta, Christoph Adami, Emily Dolson (Michigan State University)
-
The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
Jiale Zhao, Xing Mou, Jinlin Wu, Hongyuan Yu, Mingrui Sun, Yang Shi, Xuanwu Yin, Zhen Chen, Zhen Lei, Yaohua Wang
-
M Zeeshan, Saud Satti
-
Adversarial Limits of Quantum Certification: When Eve Defeats Detection
Davut Emre Tasar
-
RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Cong Wang, Changfeng Gao, Yang Xiang, Zhihao Du, Keyu An, Han Zhao, Qian Chen, Xiangang Li, Yingming Gao, Ya Li
-
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yue Zhao, Xiyang Hu
-
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
Wei Zhao, Zhe Li, Jun Sun
-
The Universal Weight Subspace Hypothesis
Prakhar Kaushik, Shravan Chaudhari, Ankit Vaidya, Rama Chellappa, Alan Yuille
-
L. D. M. S. Sai Teja, N. Siva Gopala Krishna, Ufaq Khan, Muhammad Haris Khan, Partha Pakray, Atul Mishra
-
Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering
Marco Pintore, Maura Pintor, Dimosthenis Karatzas, Battista Biggio
-
Sheng Hang, Chaoxiang He, Hongsheng Hu, Hanqing Hu, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo Wang
-
Guanchen Du, Jianlong Xu, Wei Wei
-
Physics-Guided Deepfake Detection for Voice Authentication Systems
Alireza Mohammadi, Keshav Sood, Dhananjay Thiruvady, Asef Nazari
-
Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan
-
SELF: A Robust Singular Value and Eigenvalue Approach for LLM Fingerprinting
Hanxiu Zhang, Yue Zheng
-
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs
Tengyun Ma, Jiaqi Yao, Daojing He, Shihao Peng, Yu Li, Shaohui Liu, Zhuotao Tian
-
Out-of-the-box: Black-box Causal Attacks on Object Detectors
Melane Navaratnarajah, David A. Kelly, Hana Chockler
-
In-Context Representation Hijacking
Itay Yona, Amir Sarid, Michael Karasik, Yossi Gandelsman
-
TARA Test-by-Adaptive-Ranks for Quantum Anomaly Detection with Conformal Prediction Guarantees
Davut Emre Tasar, Ceren Ocal Tasar
-
Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits
Robert Dilworth
-
Zhigang Yang, Yuan Liu, Jiawei Zhang, Puning Zhang, Xinqiang Ma
-
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Ge-Peng Ji, Jingyi Liu, Deng-Ping Fan, Nick Barnes
-
Towards Irreversible Machine Unlearning for Diffusion Models
Xun Yuan, Zilong Zhao, Jiayu Li, Aryan Pasikhani, Prosanta Gope, Biplab Sikdar
-
Automatic Attack Discovery for Few-Shot Class-Incremental Learning via Large Language Models
Haidong Kang, Wei Wu, Hanling Wang
-
Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs
Oren Rachmil, Roy Betser, Itay Gershon, Omer Hofman, Nitay Yakoby, Yuval Meron, Idan Yankelev, Asaf Shabtai, Yuval Elovici, Roman Vainshtein
-
Immunity memory-based jailbreak detection: multi-agent adaptive guard for large language models
Jun Leng, Litian Zhang, Xi Zhang
-
Rethinking Security in Semantic Communication: Latent Manipulation as a New Threat
Zhiyuan Xi, Kun Zhu
-
Towards Privacy-Preserving Range Queries with Secure Learned Spatial Index over Encrypted Data
Zuan Wang, Juntao Lu, Jiazhuang Wu, Youliang Tian, Wei Song, Qiuxian Li, Duo Zhang
-
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Huy Nghiem, Swetasudha Panda, Devashish Khatwani, Huy V. Nguyen, Krishnaram Kenthapadi, Hal Daumé III
-
Peter B. Walker, Hannah Davidson, Aiden Foster, Matthew Lienert, Thomas Pardue, Dale Russell
-
Leon Mayer, Piotr Kalinowski, Caroline Ebersbach, Marcel Knopp, Tim Rädsch, Evangelia Christodoulou, Annika Reinke, Fiona R. Kolbinger, Lena Maier-Hein
-
Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness
Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Jing Lin
-
One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
Biagio Montaruli, Luca Compagna, Serena Elisa Ponta, Davide Balzarotti
-
Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems
Ruichao Liang, Le Yin, Jing Chen, Cong Wu, Xiaoyu Zhang, Huangpeng Gu, Zijian Zhang, Yang Liu
-
WildCode: An Empirical Analysis of Code Generated by ChatGPT
Kobra Khanmohammadi, Pooria Roy, Raphael Khoury, Abdelwahab Hamou-Lhadj, Wilfried Patrick Konan
-
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin
-
Fast and Flexible Robustness Certificates for Semantic Segmentation
Thomas Massena (IRIT-MISFIT, DTIPG - SNCF, UT3), Corentin Friedrich, Franck Mamalet, Mathieu Serrurier (IRIT-MISFIT)
-
Yubo Hou, Mohamed Ragab, Min Wu, Chee-Keong Kwoh, Xiaoli Li, Zhenghua Chen
-
Invasive Context Engineering to Control Large Language Models
Thomas Rivasseau
-
COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers
Junyu Wang, Changjia Zhu, Yuanbo Zhou, Lingyao Li, Xu He, Junjie Xiong
-
VACoT: Rethinking Visual Data Augmentation with VLMs
Zhengzhuo Xu, Chong Sun, SiNan Du, Chen Li, Jing Lyu, Chun Yuan
-
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
Tsimur Hadeliya, Mohammad Ali Jauhar, Nidhi Sakpal, Diogo Cruz
-
ADORE: Autonomous Domain-Oriented Relevance Engine for E-commerce
Zheng Fang, Donghao Xie, Ming Pang, Chunyuan Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo
-
CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography
Mayar Elfares, Pascal Reisert, Tilman Dietz, Manpa Barman, Ahmed Zaki, Ralf Küsters, Andreas Bulling
-
Reasoning-Aware Multimodal Fusion for Hateful Video Detection
Shuonan Yang, Tailin Chen, Jiangbei Yue, Guangliang Cheng, Jianbo Jiao, Zeyu Fu
-
Defense That Attacks: How Robust Models Become Better Attackers
Mohamed Awad, Mahmoud Akrm, Walid Gomaa
-
GraphMatch: Fusing Language and Graph Representations in a Dynamic Two-Sided Work Marketplace
Mikołaj Sacha, Hammad Jafri, Mattie Terzolo, Ayan Sinha, Andrew Rabinovich
-
Lumos: Let there be Language Model System Certification
Isha Chaudhary, Vedaant Jain, Avaljot Singh, Kavya Sachdeva, Sayan Ranu, Gagandeep Singh
-
LeechHijack: Covert Computational Resource Exploitation in Intelligent Agent Systems
Yuanhe Zhang, Weiliu Wang, Zhenhong Zhou, Kun Wang, Jie Zhang, Li Sun, Yang Liu, Sen Su
-
Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities
Yuan Xiong, Ziqi Miao, Lijun Li, Chen Qian, Jie Li, Jing Shao
-
SAGE: Style-Adaptive Generalization for Privacy-Constrained Semantic Segmentation Across Domains
Qingmei Li, Yang Zhang, Peifeng Zhang, Haohuan Fu, Juepeng Zheng
-
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Zhongjian Qiao, Rui Yang, Jiafei Lyu, Xiu Li, Zhongxiang Dai, Zhuoran Yang, Siyang Gao, Shuang Qiu
-
FGC-Comp: Adaptive Neighbor-Grouped Attribute Completion for Graph-based Anomaly Detection
Junpeng Wu, Pinheng Zong
-
Adversarial Jamming for Autoencoder Distribution Matching
Waleed El-Geresy, Deniz Gündüz
-
FiMMIA: scaling semantic perturbation-based membership inference across modalities
Anton Emelyanov, Sergei Kudriashov, Alena Fenogenova
-
Adaptive Decentralized Federated Learning for Robust Optimization
Shuyuan Wu, Feifei Wang, Yuan Gao, Hansheng Wang
-
Quantum Vanguard: Server Optimized Privacy Fortified Federated Intelligence for Future Vehicles
Dev Gurung, Shiva Raj Pokhrel
-
Ziyi Tong, Feifei Sun, Le Minh Nguyen
-
HydroDCM: Hydrological Domain-Conditioned Modulation for Cross-Reservoir Inflow Prediction
Pengfei Hu, Fan Ming, Xiaoxue Han, Chang Lu, Yue Ning, Dan Lu
-
Robust Tabular Foundation Models
Matthew Peroni, Franck Le, Vadim Sheinin
-
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Songwen Zhao, Danqing Wang, Kexun Zhang, Jiaxuan Luo, Zhuo Li, Lei Li
-
Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs
Kunj Joshi, Jaydeep Borkar, David A. Smith
-
DPAC: Distribution-Preserving Adversarial Control for Diffusion Sampling
Han-Jin Lee, Han-Ju Lee, Jin-Seong Kim, Seok-Hwan Choi
-
Yongxin Zhou, Philippe Mulhem, Didier Schwab
-
Generative Adversarial Gumbel MCTS for Abstract Visual Composition Generation
Zirui Zhao, Boye Niu, David Hsu, Wee Sun Lee
-
EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations
Xinyun Zhou, Xinfeng Li, Yinan Peng, Ming Xu, Xuanwang Zhang, Miao Yu, Yidong Wang, Xiaojun Jia, Kun Wang, Qingsong Wen, XiaoFeng Wang, Wei Dong
-
Dual Randomized Smoothing: Beyond Global Noise Variance
Chenhao Sun, Yuhao Mao, Martin Vechev
-
Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia, Nathalie Baracaldo, Sijia Liu
-
Securing Large Language Models (LLMs) from Prompt Injection Attacks
Omar Farooq Khan Suri, John McCrae
-
Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory
Chenyi Wang, Yanmao Man, Raymond Muller, Ming Li, Z. Berkay Celik, Ryan Gerdes, Jonathan Petit
-
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Haoran Li, Jiayu Lv, Congying Han, Zicheng Zhang, Anqi Li, Yan Liu, Tiande Guo, Nan Jiang
-
Ali Nafisi, Sina Asghari, Mohammad Saeed Arvenaghi, Hossein Shakibania
-
Label Forensics: Interpreting Hard Labels in Black-Box Text Classifier
Mengyao Du, Gang Yang, Han Fang, Quanjun Yin, Ee-chien Chang
-
SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models
Stella Etuk, Ashraf Matrawy
-
On the Unreasonable Effectiveness of Last-layer Retraining
John C. Hill, Tyler LaBonte, Xinchen Zhang, Vidya Muthukumar
-
Jimin Choi, Max Z. Li
-
Differentially Private and Federated Structure Learning in Bayesian Networks
Ghita Fassy El Fehri, Aurélien Bellet, Philippe Bastien
-
Zihao Wang, Kar Wai Fok, Vrizlynn L. L. Thing
-
Rongzhe Wei, Peizhi Niu, Xinjie Shen, Tony Tu, Yifan Li, Ruihan Wu, Eli Chien, Olgica Milenkovic, Pan Li
-
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
Lewen Yan, Jilin Mei, Tianyi Zhou, Lige Huang, Jie Zhang, Dongrui Liu, Jing Shao
-
Factor(T,U): Factored Cognition Strengthens Monitoring of Untrusted AI
Aaron Sandoval, Cody Rushing
-
Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters
Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino
-
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu, Bo Ni, Han Xu, Kunpeng Liu, Dan Lin, Tyler Derr
-
Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare
Adeela Bashir, The Anh Han, Zia Ush Shamszaman
-
CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing
Zixia Wang, Gaojie Jin, Jia Hu, Ronghui Mu
-
Cen Lu, Yung-Chen Tang, Andrea Cavallaro
-
Concept-Guided Backdoor Attack on Vision Language Models
Haoyu Shen, Weimin Lyu, Haotian Xu, Tengfei Ma
-
Graph Data Augmentation with Contrastive Learning on Covariate Distribution Shift
Fanlong Zeng, Wensheng Gan
-
Bias Injection Attacks on RAG Databases and Sanitization Defenses
Hao Wu, Prateek Saxena
-
World Model Robustness via Surprise Recognition
Geigh Zollicoffer, Tanush Chopra, Mingkuan Yan, Xiaoxu Ma, Kenneth Eaton, Mark Riedl
-
Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios
Jianxiang Zang, Yongda Wei, Ruxue Bai, Shiyu Jiang, Nijia Mo, Binhong Li, Qiang Sun, Hui Liu
-
The Outline of Deception: Physical Adversarial Attacks on Traffic Signs Using Edge Patches
Haojie Jia, Te Hu, Haowen Li, Long Jin, Chongshi Xin, Yuchi Yao, Jiarui Xiao
-
Mitigating Indirect Prompt Injection via Instruction-Following Intent Analysis
Mintong Kang, Chong Xiang, Sanjay Kariyappa, Chaowei Xiao, Bo Li, Edward Suh
-
Tao Zhang, Yevgeniy Vorobeychik
-
Chenyi Zhang, Tao Shang, Chao Guo, Ruohan He
-
SEA: Spectral Edge Attacks on Graph Neural Networks
Yongyu Wang
-
When Safety Blocks Sense: Measuring Semantic Confusion in LLM Refusals
Riad Ahmed Anonto, Md Labid Al Nahiyan, Md Tanvir Hassan
-
UMM-RM: An Upcycle-and-Merge MoE Reward Model for Mitigating Reward Hacking
Lingling Fu, Yongfu Xue
-
Tao Zhang, Yevgeniy Vorobeychik
-
Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning
Mohammad M Maheri, Xavier Cadet, Peter Chin, Hamed Haddadi
-
Gradient Inversion in Federated Reinforcement Learning
Shenghong He
-
Adversarial Signed Graph Learning with Differential Privacy
Haobin Ke, Sen Zhang, Qingqing Ye, Xun Ran, Haibo Hu
-
Red Teaming Large Reasoning Models
Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Linghao Li, Hang Su, Zhaoxia Yin
-
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li
-
IslandRun: Privacy-Aware Multi-Objective Orchestration for Distributed AI Inference
Bala Siva Sai Akhil Malepati
-
Goutham Nalagatla
-
Razieh Ghaedi, AmirReza BabaAhmadi, Reyer Zwiggelaar, Xinqi Fan, Nashid Alam
-
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models
Zhongqi Wang, Jie Zhang, Shiguang Shan, Xilin Chen
-
Yongkang Hu, Yu Cheng, Yushuo Zhang, Yuan Xie, Zhaoxia Yin
-
Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning
Na Li, Zewu Zheng, Wei Ni, Hangguan Shan, Wenjie Zhang, Xinyu Li
-
Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics
Deep Patel, Emmanouil-Vasileios Vlatakis-Gkaragkounis
-
Benjamin D. Ballyk, Ankit Gupta, Sujay Konda, Kavitha Subramanian, Chris Landon, Ahmed Ammar Naseer, Georg Maierhofer, Sumanth Swaminathan, Vasudevan Venkateshwaran
-
Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation
Timur Sattarov, Marco Schreyer, Damian Borth
-
RECTor: Robust and Efficient Correlation Attack on Tor
Binghui Wu, Dinil Mon Divakaran, Levente Csikor, Mohan Gurusamy
-
TrojanLoC: LLM-based Framework for RTL Trojan Localization
Weihua Xiao, Zeng Wang, Minghao Shao, Raghu Vamshi Hemadri, Ozgur Sinanoglu, Muhammad Shafique, Johann Knechtel, Siddharth Garg, Ramesh Karri
-
Superpixel Attack: Enhancing Black-box Adversarial Attack with Image-driven Division Areas
Issa Oe, Keiichiro Yamamura, Hiroki Ishikura, Ryo Hamahira, Katsuki Fujisawa
-
Yang Li, Chong Ma, Yuanzheng Li, Sen Li, Yanbo Chen, Zhaoyang Dong
-
Pruning Graphs by Adversarial Robustness Evaluation to Strengthen GNN Defenses
Yongyu Wang
-
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
Mohammad M Maheri, Xavier Cadet, Peter Chin, Hamed Haddadi
-
Adversarial Training for Process Reward Models
Gurusha Juneja, Deepak Nathani, William Yang Wang
-
AgentShield: Make MAS more secure and efficient
Kaixiang Wang, Zhaojiacheng Zhou, Bunyod Suvonov, Jiong Lou, Jie LI
-
Are LLMs Good Safety Agents or a Propaganda Engine?
Neemesh Yadav, Francesco Ortu, Jiarui Liu, Joeun Yook, Bernhard Schölkopf, Rada Mihalcea, Alberto Cazzaniga, Zhijing Jin
-
Fault-Tolerant MARL for CAVs under Observation Perturbations for Highway On-Ramp Merging
Yuchen Shi, Huaxin Pei, Yi Zhang, Danya Yao
-
A Game-Theoretic Approach for Adversarial Information Fusion in Distributed Sensor Networks
Kassem Kallas
-
Hoang Khang Phan, Nhat Tan Le
-
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
Matt MacDermott, Qiyao Wei, Rada Djoneva, Francis Rhys Ward
-
DeFi TrustBoost: Blockchain and AI for Trustworthy Decentralized Financial Decisions
Swati Sachan, Dale S. Fickett
-
Does Self-Evaluation Enable Wireheading in Language Models?
David Demitri Africa, Hans Ethan Ting
-
Pirzada Suhail, Rehna Afroz, Amit Sethi
-
An Empirical Study on the Security Vulnerabilities of GPTs
Tong Wu, Weibin Wu, Zibin Zheng
-
Watermarks for Embeddings-as-a-Service Large Language Models
Anudeex Shetty
-
Quantized-Tinyllava: a new multimodal foundation model enables efficient split learning
Jiajun Guo, Xin Luo, Jiayin Zheng, Yiqun Wang, Kai-Wei Chang, Wei Wang, Jie Liu
-
AI Deception: Risks, Dynamics, and Controls
Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse (Jay) Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Wei Ji, Dong Li, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao
-
A Safety and Security Framework for Real-World Agentic Systems
Shaona Ghosh, Barnaby Simkin, Kyriacos Shiarlis, Soumili Nandi, Dan Zhao, Matthew Fiedler, Julia Bazinska, Nikki Pope, Roopa Prabhu, Daniel Rohrer, Michael Demoret, Bartley Richardson
-
Tianyu Zhang, Zihang Xi, Jingyu Hua, Sheng Zhong
-
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante, Md Mokarram Chowdhury, Yang Li
-
RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks
Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang
-
GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang
-
PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration
Junfei Zhan, Haoxun Shen, Zheng Lin, Tengjiao He
-
Mingzhe Li, Renhao Zhang, Zhiyang Wen, Siqi Pan, Bruno Castro da Silva, Juan Zhai, Shiqing Ma
-
Shoe Style-Invariant and Ground-Aware Learning for Dense Foot Contact Estimation
Daniel Sungho Jung, Kyoung Mu Lee
-
Creating Blank Canvas Against AI-enabled Image Forgery
Qi Song, Ziyuan Luo, Renjie Wan
-
Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?
Wenkai Huang, Yijia Guo, Gaolei Li, Lei Ma, Hang Zhang, Liwen Hu, Jiazheng Wang, Jianhua Li, Tiejun Huang
-
ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection
Runzhi Deng, Yundi Hu, Xinshuang Zhang, Zhao Wang, Xixi Liu, Wang-Zhou Dai, Caifeng Shan, Fang Zhao
-
Shanchuan Lin, Ceyuan Yang, Zhijie Lin, Hao Chen, Haoqi Fan
-
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Yuan Yao, Lixu Wang, Jiaqi Wu, Jin Song, Simin Chen, Zehua Wang, Zijian Tian, Wei Chen, Huixia Li, Xiaoxiao Li
-
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
Guanxi Lu, Hao Mark Chen, Zhiqiang Que, Wayne Luk, Hongxiang Fan
-
Privacy-Utility-Bias Trade-offs for Privacy-Preserving Recommender Systems
Shiva Parsarad, Isabel Wagner
-
Difficulties with Evaluating a Deception Detector for AIs
Lewis Smith, Bilal Chughtai, Neel Nanda
-
An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks
Kanchon Gharami, Shafika Showkat Moni
-
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
Richard J. Young
-
Minghui Min, Yulu Li, Gang Li, Meng Li, Hongliang Zhang, Miao Pan, Dusit Niyato, Zhu Han
-
Exposing Vulnerabilities in RL: A Novel Stealthy Backdoor Attack through Reward Poisoning
Bokang Zhang, Chaojun Lu, Jianhui Li, Junfeng Wu
-
CacheTrap: Injecting Trojans in LLMs without Leaving any Traces in Inputs or Weights
Mohaiminul Al Nahian (1), Abeer Matar A. Almalky (1), Gamana Aragonda (2), Ranyang Zhou (2), Sabbir Ahmed (1), Dmitry Ponomarev (1), Li Yang (3), Shaahin Angizi (2), Adnan Siraj Rakin (1) ((1) SUNY Binghamton, (2) New Jersey Institute of Technology, (3) UNC Charlotte)
-
Ghosting Your LLM: Without The Knowledge of Your Gradient and Data
Abeer Matar A. Almalky (1), Ziyan Wang (2), Mohaiminul Al Nahian (1), Li Yang (2), Adnan Siraj Rakin (1) ((1) Binghamton University, (2) UNC Charlotte)
-
NetDeTox: Adversarial and Efficient Evasion of Hardware-Security GNNs via RL-LLM Orchestration
Zeng Wang, Minghao Shao, Akashdeep Saha, Ramesh Karri, Johann Knechtel, Muhammad Shafique, Ozgur Sinanoglu
-
Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning
Linze Chen, Yufan Cai, Zhe Hou, Jinsong Dong
-
Resilient Charging Infrastructure via Decentralized Coordination of Electric Vehicles at Scale
Chuhao Qin, Alexandru Sorici, Andrei Olaru, Evangelos Pournaras, Adina Magda Florea
-
GuardTrace-VL: Detecting Unsafe Multimodal Reasoning via Iterative Safety Supervision
Yuxiao Xiang, Junchi Chen, Zhenchao Jin, Changtao Miao, Haojie Yuan, Qi Chu, Tao Gong, Nenghai Yu
-
Taehoon Kang, Taeyong Kim
-
Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe
-
CAHS-Attack: CLIP-Aware Heuristic Search Attack Method for Stable Diffusion
Shuhan Xia, Jing Dai, Hui Ouyang, Yadong Shang, Dongxiao Zhao, Peipei Li
-
Privacy in Federated Learning with Spiking Neural Networks
Dogukan Aksu, Jesus Martinez del Rincon, Ihsen Alouani
-
When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Qixin Zhang, Bingquan Shen, Alex C. Kot, Xudong Jiang
-
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
Yuhang Wang, Yanxu Zhu, Dongyuan Lu, Jitao Sang
-
Multimodal Robust Prompt Distillation for 3D Point Cloud Models
Xiang Gu, Liming Lu, Xu Zheng, Anan Du, Yongbin Zhou, Shuchao Pang
-
HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal
Kexin Li, Xiao Hu, Ilya Grishchenko, David Lie
-
Attention-Guided Patch-Wise Sparse Adversarial Attacks on Vision-Language-Action Models
Naifu Zhang, Wei Tao, Xi Xiao, Qianpu Sun, Yuxin Zheng, Wentao Mo, Peiqiang Wang, Nan Zhang
-
Escaping the Verifier: Learning to Reason via Demonstrations
Locke Cai, Ivan Provilkov
-
Al Amin, Kamrul Hasan, Liang Hong, Sharif Ullah
-
TEAR: Temporal-aware Automated Red-teaming for Text-to-Video Models
Jiaming He, Guanyu Hou, Hongwei Li, Zhicong Huang, Kangjie Chen, Yi Yu, Wenbo Jiang, Guowen Xu, Tianwei Zhang
-
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training
Haotian Xue, Qi Chen, Zhonghao Wang, Xun Huang, Eli Shechtman, Jinrong Xie, Yongxin Chen
-
Yaw Osei Adjei (Kwame Nkrumah University of Science and Technology)
-
Dataset Poisoning Attacks on Behavioral Cloning Policies
Akansha Kalra, Soumil Datta, Ethan Gilmore, Duc La, Guanhong Tao, Daniel S. Brown
-
Computing Strategic Responses to Non-Linear Classifiers
Jack Geary, Boyan Gao, Henry Gouk
-
EvilGenie: A Reward Hacking Benchmark
Jonathan Gabor, Jayson Lynch, Jonathan Rosenfeld
-
Data Exfiltration by Compression Attack: Definition and Evaluation on Medical Image Data
Huiyu Li, Nicholas Ayache, Hervé Delingette
-
Illuminating the Black Box: Real-Time Monitoring of Backdoor Unlearning in CNNs via Explainable AI
Tien Dat Hoang
-
Active Learning for GCN-based Action Recognition
Hichem Sahbi
-
Deceptron: Learned Local Inverses for Fast and Stable Physics Inversion
Aaditya L. Kachhadiya
-
Standardized Threat Taxonomy for AI Security, Governance, and Regulatory Compliance
Hernan Huwyler
-
Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
Xinyu Liu, Xu Zhang, Can Chen, Ren Wang
-
ABLE: Using Adversarial Pairs to Construct Local Models for Explaining Model Predictions
Krishna Khadka, Sunny Shree, Pujan Budhathoki, Yu Lei, Raghu Kacker, D. Richard Kuhn
-
The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning
Ethan Hsu, Harry Chen, Chudi Zhong, Lesia Semenova
-
Fatemeh Akbarian, Anahita Baninajjar, Yingyi Zhang, Ananth Balashankar, Amir Aminifar
-
SA^2GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation
Junhua Shi, Qingyun Sun, Haonan Yuan, Xingcheng Fu
-
Self-Transparency Failures in Expert-Persona LLMs: How Instruction-Following Overrides Disclosure
Alex Diep
-
Steering Awareness: Models Can Be Trained to Detect Activation Steering
Joshua Fonseca Rivera, David Demitri Africa
-
Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang
-
Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries
Alexander Beiser, Flavio Martinelli, Wulfram Gerstner, Johanni Brea
-
Quantifying the Privacy Implications of High-Fidelity Synthetic Network Traffic
Van Tran, Shinan Liu, Tian Li, Nick Feamster
-
PaTAS: A Parallel System for Trust Propagation in Neural Networks Using Subjective Logic
Koffi Ismael Ouattara, Ioannis Krontiris, Theo Dimitrakos, Dennis Eisermann, Frank Kargl
-
Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
Arun Chowdary Sanna
-
Zero-Knowledge Proof Based Verifiable Inference of Models
Yunxiao Wang
-
On the Feasibility of Hijacking MLLMs' Decision Chain via One Perturbation
Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He
-
Sidahmed Benabderrahmane, James Cheney, Talal Rahwan
-
BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, Ninghui Li
-
Adversarial Confusion Attack: Disrupting Multimodal Large Language Models
Jakub Hoscilowicz, Artur Janicki
-
GFT-GCN: Privacy-Preserving 3D Face Mesh Recognition with Spectral Diffusion
Hichem Felouat, Hanrui Wang, Isao Echizen
-
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Sen Nie, Jie Zhang, Jianxin Yan, Shiguang Shan, Xilin Chen
-
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
Weijia Mao, Hao Chen, Zhenheng Yang, Mike Zheng Shou
-
TReFT: Taming Rectified Flow Models For One-Step Image Translation
Shengqian Li, Ming Gao, Yi Liu, Zuzeng Lin, Feng Wang, Feng Dai
-
GS-Checker: Tampering Localization for 3D Gaussian Splatting
Haoliang Han, Ziyuan Luo, Jun Qi, Anderson Rocha, Renjie Wan
-
Frequency Bias Matters: Diving into Robust and Generalized Deep Image Forgery Detection
Chi Liu, Tianqing Zhu, Wanlei Zhou, Wei Zhao
-
Jun Jia, Hongyi Miao, Yingjie Zhou, Linhan Cao, Yanwei Jiang, Wangqiu Zhou, Dandan Zhu, Hua Yang, Wei Sun, Xiongkuo Min, Guangtao Zhai
-
Latent Diffusion Inversion Requires Understanding the Latent Space
Mingxing Rao, Bowen Qu, Daniel Moyer
-
Shreevanth Krishnaa Gopalakrishnan, Stephen Hailes
-
Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
Yuhang Wang, Heye Huang, Zhenhua Xu, Kailai Sun, Baoshen Guo, Jinhua Zhao
-
Xiaojiao Xiao, Qinmin Vivian Hu, Tae Hyun Kim, Guanghui Wang
-
Trung Cuong Dang, David Mohaisen
-
Supporting Students in Navigating LLM-Generated Insecure Code
Jaehwan Park, Kyungchan Lim, Seonhye Park, Doowon Kim
-
Securing the Model Context Protocol (MCP): Risks, Controls, and Governance
Herman Errico, Jiquan Ngiam, Shanita Sojan
-
Categorical Framework for Quantum-Resistant Zero-Trust AI Security
I. Cherkaoui, C. Clarke, J. Horgan, I. Dey
-
Jun Jia, Hongyi Miao, Yingjie Zhou, Wangqiu Zhou, Jianbo Zhang, Linhan Cao, Dandan Zhu, Hua Yang, Xiongkuo Min, Wei Sun, Guangtao Zhai
-
FlowSteer: Guiding Few-Step Image Synthesis with Authentic Trajectories
Lei Ke, Hubery Yin, Gongye Liu, Zhengyao Lv, Jingcai Guo, Chen Li, Wenhan Luo, Yujiu Yang, Jing Lyu
-
Defending Large Language Models Against Jailbreak Exploits with Responsible AI Considerations
Ryan Wong (1), Hosea David Yu Fei Ng (1), Dhananjai Sharma (1), Glenn Jun Jie Ng (1), Kavishvaran Srinivasan (1) ((1) National University of Singapore)
-
Learning to Compress Graphs via Dual Agents for Consistent Topological Robustness Evaluation
Qisen Chai, Yansong Wang, Junjie Huang, Tao Jia
-
Xurui Li, Kaisong Song, Rui Zhu, Pin-Yu Chen, Haixu Tang
-
Mohamed Rissal Hedna, Sesugh Samuel Nder
-
Yingjia Shang, Yi Liu, Huimin Wang, Furong Li, Wenfang Sun, Wu Chengyu, Yefeng Zheng
-
Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning
James R. M. Black, Moritz S. Hanke, Aaron Maiwald, Tina Hernandez-Boussard, Oliver M. Crook, Jaspreet Pannu
-
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
Zhaolong Su, Wang Lu, Hao Chen, Sharon Li, Jindong Wang
-
Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Junbo Zhang, Ran Chen, Qianli Zhou, Xinyang Deng, Wen Jiang
-
Now You See It, Now You Don't - Instant Concept Erasure for Safe Text-to-Image and Video Generation
Shristi Das Biswas, Arani Roy, Kaushik Roy
-
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
Juncheng Li, Yige Li, Hanxun Huang, Yunhao Chen, Xin Wang, Yixu Wang, Xingjun Ma, Yu-Gang Jiang
-
Leveraging Adversarial Learning for Pathological Fidelity in Virtual Staining
José Teixeira, Pascal Klöckner, Diana Montezuma, Melis Erdal Cesur, João Fraga, Hugo M. Horlings, Jaime S. Cardoso, Sara P. Oliveira
-
Beilin Chu, Weike You, Mengtao Li, Tingting Zheng, Kehan Zhao, Xuan Xu, Zhigao Lu, Jia Song, Moxuan Xu, Linna Zhou
-
Three-Dimensional Anatomical Data Generation Based on Artificial Neural Networks
Ann-Sophia Müller, Moonkwang Jeong, Meng Zhang, Jiyuan Tian, Arkadiusz Miernik, Stefanie Speidel, Tian Qiu
-
Robust and Generalizable GNN Fine-Tuning via Uncertainty-aware Adapter Learning
Bo Jiang, Weijun Zhao, Beibei Wang, Xiao Wang, Jin Tang
-
FedPoisonTTP: A Threat Model and Poisoning Attack for Federated Test-Time Personalization
Md Akil Raihan Iftee, Syed Md. Ahnaf Hasan, Amin Ahsan Ali, AKM Mahbubur Rahman, Sajib Mistry, Aneesh Krishna
-
Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic
Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri
-
Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
Adarsh Kumarappan, Ayushi Mehrotra
-
Hi-SAFE: Hierarchical Secure Aggregation for Lightweight Federated Learning
Hyeong-Gun Joo, Songnam Hong, Seunghwan Lee, Dong-Joon Shin
-
Targeted Manipulation: Slope-Based Attacks on Financial Time-Series Data
Dominik Luszczynski
-
RoguePrompt: Dual-Layer Ciphering for Self-Reconstruction to Circumvent LLM Moderation
Benyamin Tafreshian
-
Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang
-
AttackPilot: Autonomous Inference Attacks Against ML Services With LLM-Based Agents
Yixin Wu, Rui Wen, Chi Cui, Michael Backes, Yang Zhang
-
Synthetic Data: AI's New Weapon Against Android Malware
Angelo Gaspar Diniz Nogueira, Kayua Oleques Paim, Hendrio Bragança, Rodrigo Brandão Mansilha, Diego Kreutz
-
Steven Peh
-
Maria Thoma, Michalis A. Savelonas, Dimitris K. Iakovidis
-
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
Adarsh Kumarappan, Ananya Mujoo
-
An Invariant Latent Space Perspective on Language Model Inversion
Wentao Ye, Jiaqi Hu, Haobo Wang, Xinpeng Ti, Zhiqing Xiao, Hao Chen, Liyao Li, Lei Feng, Sai Wu, Junbo Zhao
-
DISCO: A Browser-Based Privacy-Preserving Framework for Distributed Collaborative Learning
Julien T. T. Vignoud, Valérian Rousset, Hugo El Guedj, Ignacio Aleman, Walid Bennaceur, Batuhan Faik Derinbay, Eduard Ďurech, Damien Gengler, Lucas Giordano, Felix Grimberg, Franziska Lippoldt, Christina Kopidaki, Jiafan Liu, Lauris Lopata, Nathan Maire, Paul Mansat, Martin Milenkoski, Emmanuel Omont, Güneş Özgün, Mina Petrović, Francesco Posa, Morgan Ridel, Giorgio Savini, Marcel Torne, Lucas Trognon, Alyssa Unell, Olena Zavertiaieva, Sai Praneeth Karimireddy, Tahseen Rabbani, Mary-Anne Hartley, Martin Jaggi
-
EAGER: Edge-Aligned LLM Defense for Robust, Efficient, and Accurate Cybersecurity Question Answering
Onat Gungor, Roshan Sood, Jiasheng Zhou, Tajana Rosing
-
David Amebley, Sayanton Dibbo
-
Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
Andrew Maranhão Ventura D'addario
-
Natural Emergent Misalignment from Reward Hacking in Production RL
Monte MacDiarmid, Benjamin Wright, Jonathan Uesato, Joe Benton, Jon Kutasov, Sara Price, Naia Bouscal, Sam Bowman, Trenton Bricken, Alex Cloud, Carson Denison, Johannes Gasteiger, Ryan Greenblatt, Jan Leike, Jack Lindsey, Vlad Mikulik, Ethan Perez, Alex Rodrigues, Drake Thomas, Albert Webson, Daniel Ziegler, Evan Hubinger
-
Xiaoqing Wang, Keman Huang, Bin Liang, Hongyu Li, Xiaoyong Du
-
Evaluating perturbation robustness of generative systems that use COBOL code inputs
Samuel Ackerman, Wesam Ibraheem, Orna Raz, Marcel Zalmanovici
-
Syed Mohaiminul Hoque, Naimur Rahman, Md Sakhawat Hossain
-
Richard J. Young
-
Hao Shen, Jikang Cheng, Renye Yan, Zhongyuan Wang, Wei Peng, Baojin Huang
-
Robust Physical Adversarial Patches Using Dynamically Optimized Clusters
Harrison Bagley, Will Meakin, Simon Lucey, Yee Wei Law, Tat-Jun Chin
-
Generative Myopia: Why Diffusion Models Fail at Structure
Milad Siami
-
Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks
Xunlei Qian, Yue Xing
-
Differential privacy with dependent data
Valentin Roth, Marco Avella-Medina
-
TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia
-
Building Resilient Information Ecosystems: Large LLM-Generated Dataset of Persuasion Attacks
Hsien-Te Kao, Aleksey Panasyuk, Peter Bautista, William Dupree, Gabriel Ganberg, Jeffrey M. Beaubien, Laura Cassani, Svitlana Volkova
-
Yi Zhang, Tianxiang Xu, Zijian Li, Chao Zhang, Kunyu Zhang, Zhan Gao, Meinuo Li, Xiaohan Zhang, Qichao Qi, Bing Chen
-
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma
Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary
-
Svitlana Volkova, Will Dupree, Hsien-Te Kao, Peter Bautista, Gabe Ganberg, Jeff Beaubien, Laura Cassani
-
Yanxi Li, Ruocheng Shan
-
Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models
Jiayi Luo, Qingyun Sun, Lingjuan Lyu, Ziwei Zhang, Haonan Yuan, Xingcheng Fu, Jianxin Li
-
Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks
Jiayi Luo, Qingyun Sun, Yuecen Wei, Haonan Yuan, Xingcheng Fu, Jianxin Li
-
H. Zhang, L. Zhang, G. Epiphaniou, C. Maple
-
Adversarial Pseudo-replay for Exemplar-free Class-incremental Learning
Hiroto Honda
-
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
Yusong Wu, Stephen Brade, Teng Ma, Tia-Jane Fowler, Enning Yang, Berker Banar, Aaron Courville, Natasha Jaques, Cheng-Zhi Anna Huang
-
Federated Anomaly Detection and Mitigation for EV Charging Forecasting Under Cyberattacks
Oluleke Babayomi, Dong-Seong Kim
-
Understanding Private Learning From Feature Perspective
Meng Ding, Mingxi Lei, Shaopeng Fu, Shaowei Wang, Di Wang, Jinhui Xu
-
Curvature-Aware Safety Restoration In LLMs Fine-Tuning
Thong Bach, Thanh Nguyen-Tang, Dung Nguyen, Thao Minh Le, Truyen Tran
-
Vulnerability-Aware Robust Multimodal Adversarial Training
Junrui Zhang, Xinyu Zhao, Jie Peng, Chenjie Wang, Jianmin Ji, Tianlong Chen
-
Beyond Jailbreak: Unveiling Risks in LLM Applications Arising from Blurred Capability Boundaries
Yunyi Zhang, Shibo Cui, Baojun Liu, Jingkai Yu, Min Zhang, Fan Shi, Han Zheng
-
ASTRA: Agentic Steerability and Risk Assessment Framework
Itay Hazan, Yael Mathov, Guy Shtar, Ron Bitton, Itsik Mantin
-
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Dheeraj Kulshrestha, Rajiv Ramnath
-
Geometric-Disentanglement Unlearning
Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Heng Ji, Huan Zhang
-
Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding
Daniil Ignatev, Ayman Santeer, Albert Gatt, Denis Paperno
-
Vision Language Models are Confused Tourists
Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji
-
MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models
Xiongtao Sun, Hui Li, Jiaming Zhang, Yujie Yang, Kaili Liu, Ruxin Feng, Wen Jun Tan, Wei Yang Bryan Lim
-
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution
Yushun Fang, Yuxiang Chen, Shibo Yin, Qiang Hu, Jiangchao Yao, Ya Zhang, Xiaoyun Zhang, Yanfeng Wang
-
ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP
Linxiang Su, András Balogh
-
Zheng Wang, Yi Zhang, Siddartha Khastgir, Carsten Maple, Xingyu Zhao
-
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models
Yuqi Li, Junhao Dong, Chuanguang Yang, Shiping Wen, Piotr Koniusz, Tingwen Huang, Yingli Tian, Yew-Soon Ong
-
Steering in the Shadows: Causal Amplification for Activation Space Attacks in Large Language Models
Zhiyuan Xu, Stanislav Abaimov, Joseph Gardiner, Sana Belguith
-
Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism
Yinjie Zhao, Heng Zhao, Bihan Wen, Joey Tianyi Zhou
-
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
Tom Perel
-
MURMUR: Using cross-user chatter to break collaborative language agents in groups
Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath
-
Enhancing Adversarial Transferability through Block Stretch and Shrink
Quan Liu, Feng Ye, Chenhao Lu, Shuming Zhen, Guanliang Huang, Lunzhe Chen, Xudong Ke
-
AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations
Dawid Wolkiewicz, Anastasiya Pechko, Przemysław Spurek, Piotr Syga
-
GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization
Hadi Khodaei Jooshin, Inna Partin-Vaisband
-
Jithin Krishnan
-
Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
Shahin Zanbaghi, Ryan Rostampour, Farhan Abid, Salim Al Jarmakani
-
Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming
Strahinja Janjuesvic, Anna Baron Garcia, Sohrob Kazerounian
-
KeFan Li, Mengfei Wang, Hengzhi Zhang, Zhichao Li, Yuan Yuan, Mu Li, Xiang Gao, Hailong Sun, Chunming Hu, Weifeng Lv
-
Physically Realistic Sequence-Level Adversarial Clothing for Robust Human-Detection Evasion
Dingkun Zhou, Patrick P. K. Chan, Hengxu Wu, Shikang Zheng, Ruiqi Huang, Yuanjie Zhao
-
When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
Yuping Yan, Yuhan Xie, Yinxin Zhang, Lingjuan Lyu, Yaochu Jin
-
Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security
Wei Zhao, Zhe Li, Yige Li, Jun Sun
-
"To Survive, I Must Defect": Jailbreaking LLMs via the Game-Theory Scenarios
Zhen Sun, Zongmin Zhang, Deqi Liang, Han Sun, Yule Liu, Yun Shen, Xiangshan Gao, Yilong Yang, Shuai Liu, Yutao Yue, Xinlei He
-
Sayak Mukherjee, Samrat Chatterjee, Emilie Purvine, Ted Fujimoto, Tegan Emerson
-
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
Hussein Jawad, Nicolas Brunel
-
Layer-wise Noise Guided Selective Wavelet Reconstruction for Robust Medical Image Segmentation
Yuting Lu, Ziliang Wang, Weixin Xu, Wei Zhang, Yongqiang Zhao, Yang Yu, Xiaohong Zhang
-
An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs
Zhi Luo, Zenghui Yuan, Wenqi Wei, Daizong Liu, Pan Zhou
-
Erase to Retain: Low Rank Adaptation Guided Selective Unlearning in Medical Segmentation Networks
Nirjhor Datta, Md. Golam Rabiul Alam
-
Loss Functions Robust to the Presence of Label Errors
Nicholas Pellegrino, David Szczecina, Paul Fieguth
-
Rate-optimal community detection near the KS threshold via node-robust algorithms
Jingqiu Ding, Yiding Hua, Kasper Lindberg, David Steurer, Aleksandr Storozhenko
-
Yijun Yang, Lichao Wang, Jianping Zhang, Chi Harold Liu, Lanqing Hong, Qiang Xu
-
Chunyang Li, Zifeng Kang, Junwei Zhang, Zhuo Ma, Anda Cheng, Xinghua Li, Jianfeng Ma
-
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
Yige Li, Zhe Li, Wei Zhao, Nay Myat Min, Hanxun Huang, Xingjun Ma, Jun Sun
-
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
Adeel Yousaf, Joseph Fioresi, James Beetham, Amrit Singh Bedi, Mubarak Shah
-
PEPPER: Perception-Guided Perturbation for Robust Backdoor Defense in Text-to-Image Diffusion Models
Oscar Chew, Po-Yi Lu, Jayden Lin, Kuan-Hao Huang, Hsuan-Tien Lin
-
Membership Inference Attacks Beyond Overfitting
Mona Khalil, Alberto Blanco-Justicia, Najeeb Jebreel, Josep Domingo-Ferrer
-
As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files
Haodong Li, Jingqi Zhang, Xiao Cheng, Peihua Mai, Haoyu Wang, Yang Pan
-
Effective Code Membership Inference for Code Completion Models via Adversarial Prompts
Yuan Jiang, Zehao Li, Shan Huang, Christoph Treude, Xiaohong Su, Tiantian Wang
-
Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang
-
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
Piercosma Bisconti, Matteo Prandi, Federico Pierucci, Francesco Giarrusso, Marcantonio Bracale, Marcello Galisai, Vincenzo Suriani, Olga Sorokoletova, Federico Sartore, Daniele Nardi
-
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Linyin Luo, Yujuan Ding, Yunshan Ma, Wenqi Fan, Hanjiang Lai
-
What Your Features Reveal: Data-Efficient Black-Box Feature Inversion Attack for Split DNNs
Zhihan Ren, Lijun He, Jiaxi Liang, Xinzhu Fu, Haixia Bi, Fan Li
-
Transferable Dual-Domain Feature Importance Attack against AI-Generated Image Detector
Weiheng Zhu, Gang Cao, Jing Liu, Lifang Yu, Shaowei Weng
-
Robust Bayesian Optimisation with Unbounded Corruptions
Abdelhamid Ezzerg, Ilija Bogunovic, Jeremias Knoblauch
-
Critical Evaluation of Quantum Machine Learning for Adversarial Robustness
Saeefa Rubaiyet Nowmi, Jesus Lopez, Md Mahmudul Alam Imon, Shahrooz Pouryouse, Mohammad Saidur Rahman
-
Trustworthy GenAI over 6G: Integrated Applications and Security Frameworks
Bui Duc Son, Trinh Van Chien, Dong In Kim
-
Privacy-Preserving IoT in Connected Aircraft Cabin
Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar
-
Securing AI Agents Against Prompt Injection Attacks
Badrinath Ramakrishnan, Akshaya Balaji
-
TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models
Bhagyesh Kumar, A S Aravinthakashan, Akshat Satyanarayan, Ishaan Gakhar, Ujjwal Verma
-
Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance
Yanxuan Yu, Michael S. Hughes, Julien Lee, Jiacheng Zhou, Andrew F. Laine
-
When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers
Zhaoxin Zhang, Borui Chen, Yiming Hu, Youyang Qu, Tianqing Zhu, Longxiang Gao
-
When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling
Alessio Pellegrino, Jacopo Mauro
-
From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
Erum Mushtaq, Anil Ramakrishna, Satyapriya Krishna, Sattvik Sahai, Prasoon Goyal, Kai-Wei Chang, Tao Zhang, Rahul Gupta
-
Yule Liu, Heyi Zhang, Jinyi Zheng, Zhen Sun, Zifan Peng, Tianshuo Cong, Yilong Yang, Xinlei He, Zhuo Ma
-
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration
Jingren Liu, Shuning Xu, Qirui Yang, Yun Wang, Xiangyu Chen, Zhong Ji
-
Certified Signed Graph Unlearning
Junpeng Zhao, Lin Li, Kaixi Hu, Kaize Shi, Jingling Yuan
-
Kangqiao Zhao, Shuo Huai, Xurui Song, Jun Luo
-
Sigil: Server-Enforced Watermarking in U-Shaped Split Federated Learning via Gradient Injection
Zhengchunmin Dai, Jiaxiong Tang, Peng Sun, Honglong Chen, Liantao Wu
-
Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer
-
Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT
Le Yu, Zhengyue Zhao, Yawen Zheng, Yunhao Liu
-
Unified Defense for Large Language Models against Jailbreak and Fine-Tuning Attacks in Education
Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang, Xiaoling Wang, Liang He
-
Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion
Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie
-
Coffee: Controllable Diffusion Fine-tuning
Ziyao Zeng, Jingcheng Ni, Ruyi Liu, Alex Wong
-
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
Huiyi Chen, Jiawei Peng, Dehai Min, Changchang Sun, Kaijie Chen, Yan Yan, Xu Yang, Lu Cheng
-
Certified but Fooled! Breaking Certified Defences with Ghost Certificates
Quoc Viet Vo, Tashreque M. Haq, Paul Montague, Tamas Abraham, Ehsan Abbasnejad, Damith C. Ranasinghe
-
Observational Auditing of Label Privacy
Iden Kalemaj, Luca Melis, Maxime Boucher, Ilya Mironov, Saeed Mahloujifar
-
N-GLARE: A Non-Generative Latent Representation-Efficient LLM Safety Evaluator
Zheyu Lin, Jirui Yang, Hengqi Guo, Yubing Bao, Yao Guan
-
Watch Out for the Lifespan: Evaluating Backdoor Attacks Against Federated Model Adaptation
Bastien Vuillod, Pierre-Alain Moellic, Jean-Max Dutertre
-
Zifan Wang, Georgios Pantazis, Sergio Grammatico, Michael M. Zavlanos, Karl H. Johansson
-
Dynamic Black-box Backdoor Attacks on IoT Sensory Data
Ajesh Koyatan Chathoth, Stephen Lee
-
Privis: Towards Content-Aware Secure Volumetric Video Delivery
Kaiyuan Hu, Hong Kang, Yili Jin, Junhua Liu, Chengming Hu, Haolun Wu, Xue Liu
-
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
Hajun Kim, Hyunsik Na, Daeseon Choi
-
SecureSign: Bridging Security and UX in Mobile Web3 through Emulated EIP-6963 Sandboxing
Charles Cheng Ji, Brandon Kong
-
Mathieu Dufour, Andrew Duncan
-
Henry Wong, Clement Fung, Weiran Lin, Karen Li, Stanley Chen, Lujo Bauer
-
Yuwen Zhang, Viet Tran, Paul Weng
-
ReflexGrad: A Dual-Process Architecture for Gradient-Free Inference-Time Learning
Ankush Kadu, Ashwanth Krishnan
-
Jailbreaking Large Vision Language Models in Intelligent Transportation Systems
Badhan Chandra Das, Md Tasnim Jawad, Md Jueal Mia, M. Hadi Amini, Yanzhao Wu
-
Find the Leak, Fix the Split: Cluster-Based Method to Prevent Leakage in Video-Derived Datasets
Noam Glazner, Noam Tsfaty, Sharon Shalev, Avishai Weizman
-
Accuracy is Not Enough: Poisoning Interpretability in Federated Learning via Color Skew
Farhin Farhad Riya, Shahinul Hoque, Jinyuan Stella Sun, Olivera Kotevska
-
The Battle of Metasurfaces: Understanding Security in Smart Radio Environments
Paul Staat, Christof Paar, Swarun Kumar
-
What Color Is It? A Text-Interference Multimodal Hallucination Benchmark
Jinkun Zhao, Lei Huang, Haixin Ge, Wenjun Wu
-
Pascal Zimmer, Ghassan Karame
-
VEIL: Jailbreaking Text-to-Video Models via Visual Exploitation from Implicit Language
Zonghao Ying, Moyang Chen, Nizhang Li, Zhiqiang Wang, Wenxin Zhang, Quanchen Zou, Zonglei Jing, Aishan Liu, Xianglong Liu
-
InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference
Ruijun Deng, Zhihui Lu, Qiang Duan
-
RobustGait: Robustness Analysis for Appearance Based Gait Recognition
Reeshoon Sayera, Akash Kumar, Sirshapan Mitra, Prudvi Kamtam, Yogesh S Rawat
-
Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
Samuel Nathanson, Rebecca Williams, Cynthia Matuszek
-
Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks
Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li
-
Falsely Accused: How AI Detectors Misjudge Slightly Polished Arabic Articles
Saleh Almohaimeed, Saad Almohaimeed, Mousa Jari, Khaled A. Alobaid, Fahad Alotaibi
-
Evaluating Autoformalization Robustness via Semantically Similar Paraphrasing
Hayden Moore, Asfahan Shah
-
Backdoor Attacks on Open Vocabulary Object Detectors via Multi-Modal Prompt Tuning
Ankita Raj, Chetan Arora
-
ToxSearch: Evolving Prompts for Toxicity Search in Large Language Models
Onkar Shelar, Travis Desell
-
Shaowei Guan, Yu Zhai, Zhengyu Zhang, Yanze Wang, Hin Chi Kwok
-
Runhao Jiang, Chengzhi Jiang, Rui Yan, Huajin Tang
-
Model Inversion Attack Against Deep Hashing
Dongdong Zhao, Qiben Xu, Ranxin Fang, Baogang Song
-
Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
Sajad U P
-
Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio
Guangke Chen, Yuhui Wang, Shouling Ji, Xiapu Luo, Ting Wang
-
GraphToxin: Reconstructing Full Unlearned Graphs from Graph Unlearning
Ying Song, Balaji Palanisamy
-
Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting
Nirmit Arora, Sathvik Joel, Ishan Kavathekar, Palak, Rohan Gandhi, Yash Pandya, Tanuja Ganu, Aditya Kanade, Akshay Nambi
-
Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis
Farhad Abtahi, Fernando Seoane, Iván Pau, Mario Vega-Barbas
-
HealSplit: Towards Self-Healing through Adversarial Distillation in Split Federated Learning
Yuhan Xie, Chen Lyu
-
AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models
Haokun Chen, Jianing Li, Yao Zhang, Jinhe Bi, Yan Xia, Jindong Gu, Volker Tresp
-
Shaowei Guan, Hin Chi Kwok, Ngai Fong Law, Gregor Stiglic, Vivian Hui
-
Private Frequency Estimation Via Residue Number Systems
Héber H. Arcolezi
-
LANE: Lexical Adversarial Negative Examples for Word Sense Disambiguation
Jader Martins Camboim de Sá, Jooyoung Lee, Cédric Pruski, Marcos Da Silveira
-
Redwan Hussain, Mizanur Rahman, Prithwiraj Bhattacharjee
-
Questioning the Stability of Visual Question Answering
Amir Rosenfeld, Neta Glazer, Ethan Fetaya
-
One-to-N Backdoor Attack in 3D Point Cloud via Spherical Trigger
Dongmei Shan, Wei Lian, Chongxia Wang
-
Parameter-Efficient MoE LoRA for Few-Shot Multi-Style Editing
Cong Cao, Yujie Xu, Xiaodong Xu
-
SimuFreeMark: A Noise-Simulation-Free Robust Watermarking Against Image Editing
Yichao Tang, Mingyang Li, Di Miao, Sheng Li, Zhenxing Qian, Xinpeng Zhang
-
Unsupervised Robust Domain Adaptation: Paradigm, Theory and Algorithm
Fuxiang Huang, Xiaowei Fu, Shiyu Ye, Lina Ma, Wen Li, Xinbo Gao, David Zhang, Lei Zhang
-
Adaptive Symmetrization of the KL Divergence
Omri Ben-Dov, Luiz F.O. Chamon
-
Armadillo: Robust Single-Server Secure Aggregation for Federated Learning with Input Validation
Yiping Ma, Yue Guo, Harish Karthikeyan, Antigoni Polychroniadou
-
On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing
Yunyi Ni, Ziyu Yang, Ze Niu, Emily Davis, Finn Carter
-
SEAL: Subspace-Anchored Watermarks for LLM Ownership
Yanbo Dai, Zongjie Li, Zhenlan Ji, Shuai Wang
-
Robustness of LLM-enabled vehicle trajectory prediction under data security threats
Feilong Wang, Fuqiang Liu
-
MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
Xiao Fan, Jingyan Jiang, Zhaoru Chen, Fanding Huang, Xiao Chen, Qinting Jiang, Bowen Zhang, Xing Tang, Zhi Wang
-
Learning Fair Representations with Kolmogorov-Arnold Networks
Amisha Priyadarshini, Sergio Gago-Masague
-
Better LLM Reasoning via Dual-Play
Zhengxin Zhang, Chengyu Huang, Aochong Oliver Li, Claire Cardie
-
Le Xu, Jiayu Chen
-
NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks
Lama Sleem, Jerome Francois, Lujun Li, Nathan Foucher, Niccolo Gentile, Radu State
-
Defending Unauthorized Model Merging via Dual-Stage Weight Protection
Wei-Jia Chen, Min-Yen Tsai, Cheng-Yi Lee, Chia-Mu Yu
-
Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging
Qinfeng Li, Miao Pan, Jintao Chen, Fu Teng, Zhiqiang Shen, Ge Su, Hao Peng, Xuhong Zhang
-
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
Shuaitong Liu, Renjue Li, Lijia Yu, Lijun Zhang, Zhiming Liu, Gaojie Jin
-
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
Runpeng Geng, Yanting Wang, Chenlong Yin, Minhao Cheng, Ying Chen, Jinyuan Jia
-
Optimal Welfare in Noncooperative Network Formation under Attack
Natan Doubez, Pascal Lenzner, Marcus Wunderlich
-
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
Yudong Yang, Xuezhen Zhang, Zhifeng Han, Siyin Wang, Jimin Zhuang, Zengrui Jin, Jing Shao, Guangzhi Sun, Chao Zhang
-
destroR: Attacking Transfer Models with Obfuscous Examples to Discard Perplexity
Saadat Rafid Ahmed, Rubayet Shareen, Radoan Sharkar, Nazia Hossain, Mansur Mahi, Farig Yousuf Sadeque
-
Nikolaos Tsagkas, Andreas Sochopoulos, Duolikun Danier, Sethu Vijayakumar, Alexandros Kouris, Oisin Mac Aodha, Chris Xiaoxuan Lu
-
DeepDefense: Layer-Wise Gradient-Feature Alignment for Building Robust Neural Networks
Ci Lin, Tet Yeap, Iluju Kiringa, Biwei Zhang
-
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
Francis Rhys Ward, Teun van der Weij, Hanna Gábor, Sam Martin, Raja Mehta Moreno, Harel Lidar, Louis Makower, Thomas Jodrell, Lauren Robson
-
Black-Box On-Policy Distillation of Large Language Models
Tianzhu Ye, Li Dong, Zewen Chi, Xun Wu, Shaohan Huang, Furu Wei
-
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks
Xuancun Lu, Jiaxiang Chen, Shilin Xiao, Zizhi Jin, Zhangrui Chen, Hanwen Yu, Bohan Qian, Ruochen Zhou, Xiaoyu Ji, Wenyuan Xu
-
Consensus Sampling for Safer Generative AI
Adam Tauman Kalai, Yael Tauman Kalai, Or Zamir
-
Robust and Diverse Multi-Agent Learning via Rational Policy Gradient
Niklas Lauffer, Ameesh Shah, Micah Carroll, Sanjit A. Seshia, Stuart Russell, Michael Dennis
-
FAST-CAD: A Fairness-Aware Framework for Non-Contact Stroke Diagnosis
Tianming Sha, Zechuan Chen, Zhan Cheng, Haotian Zhai, Xuwei Ding, Junnan Li, Haixiang Tang, Zaoting Sun, Yanchuan Tang, Yongzhe Yi, Yanjie Huang, Anhao Li, Yuan Gao, Keze Wang
-
iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification
Zixun Xiong, Gaoyi Wu, Qingyang Yu, Mingyu Derek Ma, Lingfeng Yao, Miao Pan, Xiaojiang Du, Hao Wang
-
From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
Hanbo Cheng, Peng Wang, Kaixiang Lei, Qi Li, Zhen Zou, Pengfei Hu, Jun Du
-
Improving Sustainability of Adversarial Examples in Class-Incremental Learning
Taifeng Liu, Xinjing Liu, Liangqiu Dong, Yang Liu, Yilong Yang, Zhuo Ma
-
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
Shigeki Kusaka, Keita Saito, Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto
-
Differentially Private Rankings via Outranking Methods and Performance Data Aggregation
Luis Del Vasto-Terrientes
-
Jian Wang, Hong Shen, Chan-Tong Lam
-
GuardFed: A Trustworthy Federated Learning Framework Against Dual-Facet Attacks
Yanli Li, Yanan Zhou, Zhongliang Guo, Nan Yang, Yuning Zhang, Huaming Chen, Dong Yuan, Weiping Ding, Witold Pedrycz
-
Jiajie Su, Zihan Nan, Yunshan Ma, Xiaobo Xia, Xiaohua Feng, Weiming Liu, Xiaolin Zheng, Chaochao Chen
-
Spatio-Temporal Graph Unlearning
Qiming Guo, Wenbo Sun, Wenlu Wang
-
AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness
Zhuoqun Huang, Neil G. Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein
-
Boosting Adversarial Transferability via Ensemble Non-Attention
Yipeng Zou, Qin Liu, Jie Wu, Yu Peng, Guo Chen, Hui Zhou, Guanghui Ye
-
Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference
Chengze Jiang, Minjing Dong, Xinli Shi, Jie Gui
-
DBINDS - Can Initial Noise from Diffusion Model Inversion Help Reveal AI-Generated Videos?
Yanlin Wu, Xiaogang Yuan, Dezhi An
-
Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation
Kazuki Iwahana, Yusuke Yamasaki, Akira Ito, Takayuki Miura, Toshiki Shibahara
-
Fairness-Aware Few-Shot Learning for Audio-Visual Stress Detection
Anushka Sanjay Shelke, Aditya Sneh, Arya Adyasha, Haroon R. Lone
-
Philipp Dingfelder, Christian Riess
-
Philip Sosnin, Matthew Wicker, Josh Collyer, Calvin Tsay
-
DeepTracer: Tracing Stolen Model via Deep Coupled Watermarks
Yunfei Yang, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao, Xin Zhao, He Li
-
Adversarially and Distributionally Robust Virtual Energy Storage Systems via the Scenario Approach
Georgios Pantazis, Nicola Mignoni, Raffaele Carli, Mariagrazia Dotoli, Sergio Grammatico
-
Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models
Zijian Chen, Wenjun Zhang, Guangtao Zhai
-
Transferable Hypergraph Attack via Injecting Nodes into Pivotal Hyperedges
Meixia He, Peican Zhu, Le Cheng, Yangming Guo, Manman Yuan, Keke Tang
-
Is nasty noise actually harder than malicious noise?
Guy Blanc, Yizhi Huang, Tal Malkin, Rocco A. Servedio
-
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
Romain Cosentino, Sarath Shekkizhar, Adam Earle
-
3D Guard-Layer: An Integrated Agentic AI Safety System for Edge Artificial Intelligence
Eren Kurshan, Yuan Xie, Paul Franzon
-
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
Joseph Fioresi, Ishan Rajendrakumar Dave, Mubarak Shah
-
Jingjie He, Weijie Liang, Zihan Shan, Matthew Caesar
-
Automated Hardware Trojan Insertion in Industrial-Scale Designs
Yaroslav Popryho, Debjit Pal, Inna Partin-Vaisband
-
Jian Wang, Lijun He, Yixing Yong, Haixia Bi, Fan Li
-
A methodological analysis of prompt perturbations and their effect on attack success rates
Tiago Machado, Maysa Malfiza Garcia de Macedo, Rogerio Abreu de Paula, Marcelo Carpinette Grave, Aminat Adebiyi, Luan Soares de Souza, Enrico Santarelli, Claudio Pinhanez
-
SALT: Steering Activations towards Leakage-free Thinking in Chain of Thought
Shourya Batra, Pierce Tillman, Samarth Gaggar, Shashank Kesineni, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Vasu Sharma, Maheep Chaudhary
-
Alignment-Aware Quantization for LLM Safety
Sunghyun Wee, Suyoung Kim, Hyeonjin Kim, Kyomin Hwang, Nojun Kwak
-
Chenhao Dang, Jing Ma
-
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Chloe Li, Mary Phuong, Daniel Tan
-
A Theoretical Analysis of Detecting Large Model-Generated Time Series
Junji Hou, Junzhou Zhao, Shuo Zhang, Pinghui Wang
-
Liang Shan, Kaicheng Shen, Wen Wu, Zhenyu Ying, Chaochao Lu, Guangze Ye, Liang He
-
Chun-Ming Huang, Li-Heng Chang, I-Hsin Chang, An-Sheng Lee, Hao Kuo-Chen
-
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
Kunhao Li, Wenhao Li, Di Wu, Lei Yang, Jun Bai, Ju Jia, Jason Xue
-
Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment
Peng Zhang, Peijie Sun
-
FoCLIP: A Feature-Space Misalignment Framework for CLIP-Based Image Manipulation and Detection
Yulin Chen, Zeyuan Wang, Tianyuan Yu, Yingmei Wei, Liang Bai
-
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Zhisheng Zhang, Derui Wang, Yifan Mi, Zhiyong Wu, Jie Gao, Yuxin Cao, Kai Ye, Minhui Xue, Jie Hao
-
More Agents Helps but Adversarial Robustness Gap Persists
Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi
-
Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization
Sayambhu Sen, Shalabh Bhatnagar
-
Verifying rich robustness properties for neural networks
Mohammad Afzal, S. Akshay, Ashutosh Gupta
-
LoReTTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs
Himanshu Pal, Venkata Sai Pranav Bachina, Ankit Gangwal, Charu Sharma
-
Language Generation with Infinite Contamination
Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou
-
Ruiheng Liu, XiaoBing Chen, Jinyu Zhang, Qiongwen Zhang, Yu Zhang, Bailong Yang
-
Yilin Jiang, Mingzi Zhang, Xuanyu Yin, Sheng Jin, Suyu Lu, Zuocan Ying, Zengyi Yu, Xiangjie Kong
-
HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
Fangqi Dai, Xingjian Jiang, Zizhuang Deng
-
SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
Zhenliang Zhang, Xinyu Hu, Xiaojun Wan
-
Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Ozgur B. Akan
-
Certified L2-Norm Robustness of 3D Point Cloud Recognition in the Frequency Domain
Liang Zhou, Qiming Wang, Tianze Chen
-
3D-ANC: Adaptive Neural Collapse for Robust 3D Point Cloud Recognition
Yuanmin Huang, Wenxuan Li, Mi Zhang, Xiaohan Zhang, Xiaoyu You, Min Yang
-
From Pretrain to Pain: Adversarial Vulnerability of Video Foundation Models Without Task Knowledge
Hui Lu, Yi Yu, Song Xia, Yiming Yang, Deepu Rajan, Boon Poh Ng, Alex Kot, Xudong Jiang
-
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation
Yuxuan Zhou, Tao Yu, Wen Huang, Yuheng Zhang, Tao Dai, Shu-Tao Xia
-
Breaking the Stealth-Potency Trade-off in Clean-Image Backdoors with Generative Trigger Optimization
Binyan Xu, Fan Yang, Di Tang, Xilin Dai, Kehuan Zhang
-
PlanT 2.0: Exposing Biases and Structural Flaws in Closed-Loop Driving
Simon Gerstenecker, Andreas Geiger, Katrin Renz
-
Non-Rival Data as Rival Products: An Encapsulation-Forging Approach for Data Synthesis
Kaidong Wang, Jiale Li, Shao-Bo Lin, Yao Wang
-
Beyond Uniform Deletion: A Data Value-Weighted Framework for Certified Machine Unlearning
Lisong He, Yi Yang, Xiangyu Chang
-
Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations
Guang Yang, Lixia Luo, Qiongxiu Li
-
On Stealing Graph Neural Network Models
Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pręgowska, Tomasz P. Michalak
-
A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube
Gautam Chandrasekaran, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
-
Adam Piaseczny, Eric Ruzomberka, Rohit Parasnis, Christopher G. Brinton
-
Uncovering Pretraining Code in LLMs: A Syntax-Aware Attribution Approach
Yuanheng Li, Zhuoyang Chen, Xiaoyun Liu, Yuhao Wang, Mingwei Liu, Yang Shi, Kaifeng Huang, Shengjie Zhao
-
Privacy on the Fly: A Predictive Adversarial Transformation Network for Mobile Sensor Data
Tianle Song, Chenhao Lin, Yang Cao, Zhengyu Zhao, Jiahao Sun, Chong Zhang, Le Yang, Chao Shen
-
JPRO: Automated Multimodal Jailbreaking via Multi-Agent Collaboration Framework
Yuxuan Zhou, Yang Bai, Kuofeng Gao, Tao Dai, Shu-Tao Xia
-
Private Sketches for Linear Regression
Shrutimoy Das, Debanuj Nayak, Anirban Dasgupta
-
Shuangqing Xu, Yifeng Zheng, Zhongyun Hua
-
Qiang Wang, Liying Yang, Jiayun Song, Yifan Bai, Jingtao Du
-
Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
Asia Belfiore, Jonathan Passerat-Palmbach, Dmitrii Usynin
-
How Generative AI Empowers Attackers and Defenders Across the Trust & Safety Landscape
Patrick Gage Kelley, Steven Rousso-Schindler, Renee Shelby, Kurt Thomas, Allison Woodruff
-
Formal Reasoning About Confidence and Automated Verification of Neural Networks
Mohammad Afzal, S. Akshay, Blaise Genest, Ashutosh Gupta
-
Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents
Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Sai Zou, Ozgur B. Akan
-
Efficient LLM Safety Evaluation through Multi-Agent Debate
Dachuan Lin, Guobin Shen, Zihao Yang, Tianrong Liu, Dongcheng Zhao, Yi Zeng
-
Dilli Prasad Sharma, Liang Xue, Xiaowei Sun, Xiaodong Lin, Pulei Xiong
-
RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework
Seif Ikbarieh, Kshitiz Aryal, Maanak Gupta
-
Mojtaba Noghabaei
-
Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization
Rathin Chandra Shit, Sharmila Subudhi
-
EchoMark: Perceptual Acoustic Environment Transfer with Watermark-Embedded Room Impulse Response
Chenpei Huang, Lingfeng Yao, Kyu In Lee, Lan Emily Zhang, Xun Chen, Miao Pan
-
TriShGAN: Enhancing Sparsity and Robustness in Multivariate Time Series Counterfactuals Explanation
Hongnan Ma, Yiwei Shi, Guanxiong Sun, Mengyue Yang, Weiru Liu
-
Robust Nearest Neighbour Retrieval Using Targeted Manifold Manipulation
B. Ghosh, H. Harikumar, S. Rana
-
Probably Approximately Global Robustness Certification
Peter Blohm, Patrick Indri, Thomas Gärtner, Sagar Malhotra
-
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi, Guoli Wang, Tu Ouyang, An Wang
-
When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins
Yigitcan Kaya, Anton Landerer, Stijn Pletinckx, Michelle Zimmermann, Christopher Kruegel, Giovanni Vigna
-
CGCE: Classifier-Guided Concept Erasure in Generative Models
Viet Nguyen, Vishal M. Patel
-
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
Alina Fastowski, Bardh Prenkaj, Yuxiao Li, Gjergji Kasneci
-
Runtime Safety Monitoring of Deep Neural Networks for Perception: A Survey
Albert Schotschneider, Svetlana Pavlitska, J. Marius Zöllner
-
A Privacy-Preserving Federated Learning Method with Homomorphic Encryption in Omics Data
Yusaku Negoya, Feifei Cui, Zilong Zhang, Miao Pan, Tomoaki Ohtsuki, Aohan Li
-
MCP-RiskCue: Can LLM infer risk information from MCP server System Logs?
Jiayi Fu, Qiyao Sun
-
Identity Card Presentation Attack Detection: A Systematic Review
Esteban M. Ruiz, Juan E. Tapia, Reinel T. Soto, Christoph Busch
-
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
-
CatBack: Universal Backdoor Attacks on Tabular Data via Categorical Encoding
Behrad Tajalli, Stefanos Koffas, Stjepan Picek
-
Enhancing Robustness of Graph Neural Networks through p-Laplacian
Anuj Kumar Sirohi, Subhanu Halder, Kabir Kumar, Sandeep Kumar
-
IndirectAD: Practical Data Poisoning Attacks against Recommender Systems for Item Promotion
Zihao Wang, Tianhao Mao, XiaoFeng Wang, Di Tang, Xiaozhong Liu
-
Perturbation-mitigated USV Navigation with Distributionally Robust Reinforcement Learning
Zhaofan Zhang, Minghao Yang, Sihong Xie, Hui Xiong
-
Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation
Yinjie Cheng, Paul Youssef, Christin Seifert, Jörg Schlötterer, Zhixue Zhao
-
Tharindu Fernando, Clinton Fookes, Sridha Sridharan
-
Learning Fourier shapes to probe the geometric world of deep neural networks
Jian Wang, Yixing Yong, Haixia Bi, Lijun He, Fan Li
-
Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
Prasoon Varshney, Makesh Narsimhan Sreedhar, Liwei Jiang, Traian Rebedea, Christopher Parisien
-
Deep learning models are vulnerable, but adversarial examples are even more vulnerable
Jun Li, Yanwei Xu, Keran Li, Xiaoli Zhang
-
TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems
Ishan Kavathekar, Hemang Jain, Ameya Rathod, Ponnurangam Kumaraguru, Tanuja Ganu
-
Yiting He, Zhishuai Liu, Weixin Wang, Pan Xu
-
Steering Language Models with Weight Arithmetic
Constanza Fierro, Fabien Roger
-
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
-
Quantifying the Risk of Transferred Black Box Attacks
Disesdi Susanna Cox, Niklas Bunzel
-
Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen, Soumyadeep Pal, Sijia Liu, Mingyi Hong
-
Associative Poisoning to Generative Machine Learning
Mathias Lundteigen Mohus, Jingyue Li, Zhirong Yang
-
Marius Fracarolli, Michael Staniek, Stefan Riezler
-
Janet Jenq, Hongda Shen
-
Adversarially Robust Multitask Adaptive Control
Kasra Fallah, Leonardo F. Toso, James Anderson
-
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy
-
TRICK: Time and Range Integrity ChecK using Low Earth Orbiting Satellite for Securing GNSS
Arslan Mumtaz, Mridula Singh
-
A Secured Intent-Based Networking (sIBN) with Data-Driven Time-Aware Intrusion Detection
Urslla Uchechi Izuazu, Mounir Bensalem, Admela Jukan
-
VMDT: Decoding the Trustworthiness of Video Foundation Models
Yujin Potter, Zhun Wang, Nicholas Crispino, Kyle Montgomery, Alexander Xiong, Ethan Y. Chang, Francesco Pinto, Yuqi Chen, Rahul Gupta, Morteza Ziyadi, Christos Christodoulopoulos, Bo Li, Chenguang Wang, Dawn Song
-
Distributionally Robust Self Paced Curriculum Reinforcement Learning
Anirudh Satheesh, Keenan Powell, Vaneet Aggarwal
-
Distributionally Robust Multimodal Machine Learning
Peilin Yang, Yu Ma
-
AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research
Tim Beyer, Jonas Dornbusch, Jakob Steimle, Moritz Ladenburger, Leo Schwinn, Stephan Günnemann
-
DeNoise: Learning Robust Graph Representations for Unsupervised Graph-Level Anomaly Detection
Qingfeng Chen, Haojin Zeng, Jingyi Jie, Shichao Zhang, Debo Cheng
-
On the Brittleness of CLIP Text Encoders
Allie Tran, Luca Rossetto
-
Differentially Private In-Context Learning with Nearest Neighbor Search
Antti Koskela, Tejas Kulkarni, Laith Zumot
-
RIDE: Difficulty Evolving Perturbation with Item Response Theory for Mathematical Reasoning
Xinyuan Li, Murong Xu, Wenbiao Tao, Hanlun Zhu, Yike Zhao, Jipeng Zhang, Yunshi Lan
-
Black-Box Guardrail Reverse-engineering Attack
Hongwei Yao, Yun Xia, Shuo Shao, Haoran Shi, Tong Qiao, Cong Wang
-
PrivacyCD: Hierarchical Unlearning for Protecting Student Privacy in Cognitive Diagnosis
Mingliang Hou, Yinuo Wang, Teng Guo, Zitao Liu, Wenzhou Dou, Jiaqi Zheng, Renqiang Luo, Mi Tian, Weiqi Luo
-
A Parallel Region-Adaptive Differential Privacy Framework for Image Pixelization
Ming Liu
-
Adversarially Robust and Interpretable Magecart Malware Detection
Pedro Pereira, José Gouveia, João Vitorino, Eva Maia, Isabel Praça
-
P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models
Mingliang Hou, Yinuo Wang, Teng Guo, Zitao Liu, Wenzhou Dou, Jiaqi Zheng, Renqiang Luo, Mi Tian, Weiqi Luo
-
Prompt-Based Safety Guidance Is Ineffective for Unlearned Text-to-Image Diffusion Models
Jiwoo Shin, Byeonghu Na, Mina Kang, Wonhyeok Choi, Il-chul Moon
-
Security Evaluation of Quantum Circuit Split Compilation under an Oracle-Guided Attack
Hongyu Zhang, Yuntao Liu
-
Fooling Algorithms in Non-Stationary Bandits using Belief Inertia
Gal Mendelson, Eyal Tadmor
-
Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models
Cristina Pinneri, Christos Louizos
-
Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
Gahyeon Kim, Sohee Kim, Seokju Lee
-
Whisper Leak: a side-channel attack on Large Language Models
Geoff McDonald, Jonathan Bar Or
-
From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
-
Yize Liu, Yunyun Hou, Aina Sui
-
A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential
Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran
-
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson
-
Death by a Thousand Prompts: Open Model Vulnerability Analysis
Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
-
Bayesian Advantage of Re-Identification Attack in the Shuffle Model
Pengcheng Su, Haibo Cheng, Ping Wang
-
Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework
Junhao Li, Jiahao Chen, Zhou Feng, Chunyi Zhou
-
Desert Waste Detection and Classification Using Data-Based and Model-Based Enhanced YOLOv12 DL Model
Abdulmumin Sa'ad, Sulaimon Oyeniyi Adebayo, Abdul Jabbar Siddiqui
-
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
Jaden Park, Mu Cai, Feng Yao, Jingbo Shang, Soochahn Lee, Yong Jae Lee
-
Rishi Rajesh Shah, Chen Henry Wu, Shashwat Saxena, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
-
SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking
Wenyuan Yang, Yichen Sun, Changzheng Chen, Zhixuan Chu, Jiaheng Zhang, Yiming Li, Dacheng Tao
-
Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks
Wenkai Fu, Finn Carter, Yue Wang, Emily Davis, Bo Zhang
-
When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning
Chenyu Zhang, Minsol Kim, Shohreh Ghorbani, Jingyao Wu, Rosalind Picard, Patricia Maes, Paul Pu Liang
-
Optimizing AI Agent Attacks With Synthetic Data
Chloe Loughridge, Paul Colognese, Avery Griffin, Tyler Tracy, Jon Kutasov, Joe Benton
-
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Aashray Reddy, Andrew Zagula, Nicholas Saban
-
On The Dangers of Poisoned LLMs In Security Automation
Patrick Karlsen, Even Eilertsen
-
Ferhat Ozgur Catak, Jungwon Seo, Umit Cali
-
LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
-
Hao Li, Daiwei Lu, Jesse d'Almeida, Dilara Isik, Ehsan Khodapanah Aghdam, Nick DiSanto, Ayberk Acar, Susheela Sharma, Jie Ying Wu, Robert J. Webster III, Ipek Oguz
-
Robust Face Liveness Detection for Biometric Authentication using Single Image
Poulami Raha, Yeongnam Chae
-
A Non-Adversarial Approach to Idempotent Generative Modelling
Mohammed Al-Jaff, Giovanni Luca Marchetti, Michael C Welle, Jens Lundell, Mats G. Gustafsson, Gustav Eje Henter, Hossein Azizpour, Danica Kragic
-
Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries
Lihan Xu, Yanjie Dong, Gang Wang, Runhao Zeng, Xiaoyi Fan, Xiping Hu
-
Enhancing Federated Learning Privacy with QUBO
Andras Ferenczi, Sutapa Samanta, Dagen Wang, Todd Hodges
-
Nicolas Riccieri Gardin Assumpcao, Leandro Villas
-
PrivGNN: High-Performance Secure Inference for Cryptographic Graph Neural Networks
Fuyi Wang, Zekai Chen, Mingyuan Fan, Jianying Zhou, Lei Pan, Leo Yu Zhang
-
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
-
Verifying LLM Inference to Prevent Model Weight Exfiltration
Roy Rinberg, Adam Karvonen, Alex Hoover, Daniel Reuter, Keri Warr
-
Evaluating Control Protocols for Untrusted AI Agents
Jon Kutasov, Chloe Loughridge, Yuqi Sun, Henry Sleight, Buck Shlegeris, Tyler Tracy, Joe Benton
-
W.K.M Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
-
Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, Mohammad Hajiesmaili
-
PrivyWave: Privacy-Aware Wireless Sensing of Heartbeat
Yixuan Gao, Tanvir Ahmed, Zekun Chang, Thijs Roumen, Rajalakshmi Nandakumar
-
Aheer Sravon, Devdyuti Mazumder, Md. Ibrahim
-
Bayesian Evaluation of Large Language Model Behavior
Rachel Longjohn, Shang Wu, Saatvik Kher, Catarina Belém, Padhraic Smyth
-
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi
-
RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients
Duc A. Tran, Dung Truong, Duy Le
-
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Yifan Zhou, Tianshi Xu, Jue Hong, Ye Wu, Meng Li
-
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
Daniyal Ganiuly, Assel Smaiyl
-
Probabilistic Robustness for Free? Revisiting Training via a Benchmark
Yi Zhang, Zheng Wang, Zhen Chen, Wenjie Ruan, Qing Guo, Siddartha Khastgir, Carsten Maple, Xingyu Zhao
-
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Hongyin Zhang, Shuo Zhang, Junxi Jin, Qixin Zeng, Runze Li, Donglin Wang
-
Quantum Information Ordering and Differential Privacy
Naqueeb Ahmad Warsi, Ayanava Dasgupta, Masahito Hayashi
-
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
Abhiram Kusumba, Maitreya Patel, Kyle Min, Changhoon Kim, Chitta Baral, Yezhou Yang
-
Runyu Lu, Peng Zhang, Ruochuan Shi, Yuanheng Zhu, Dongbin Zhao, Yang Liu, Dong Wang, Cesare Alippi
-
T-MLA: A targeted multiscale log-exponential attack framework for neural image compression
Nikolay I. Kalmykov, Razan Dibo, Kaiyu Shen, Xu Zhonghan, Anh-Huy Phan, Yipeng Liu, Ivan Oseledets
-
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
Berk Atil, Rebecca J. Passonneau, Fred Morstatter
-
Ruofan Liu, Yun Lin, Zhiyong Huang, Jin Song Dong
-
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang
-
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah
-
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li
-
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Dacheng Tao
-
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Lei Li, Chun Yuan, Dacheng Tao
-
Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models
Jiasen Zheng, Huajun Zhang, Xu Yan, Ran Hao, Chong Peng
-
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang, Song Fang
-
Alik Pramanick, Mayank Bansal, Utkarsh Srivastava, Suklav Ghosh, Arijit Sur
-
C-LEAD: Contrastive Learning for Enhanced Adversarial Defense
Suklav Ghosh, Sonal Kumar, Arijit Sur
-
Rethinking Robust Adversarial Concept Erasure in Diffusion Models
Qinghong Yin, Yu Tian, Yue Zhang
-
A Hybrid Deep Learning and Forensic Approach for Robust Deepfake Detection
Sales Aribe Jr
-
Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty
-
Chenghao Du, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Yue Xiao
-
Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
Kathrin Grosse, Nico Ebert
-
Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, Ashiqur R. KhudaBukhsh
-
Self-HarmLLM: Can Large Language Model Harm Itself?
Heehwan Kim, Sungjune Park, Daeseon Choi
-
BEAT: Visual Backdoor Attacks on VLM-based Embodied Agents via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang
-
Can SAEs reveal and mitigate racial biases of LLMs in healthcare?
Hiba Ahsan, Byron C. Wallace
-
Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
-
The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
William Overman, Mohsen Bayati
-
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
-
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
Jiali Cheng, Chirag Agarwal, Hadi Amiri
-
Security Risk of Misalignment between Text and Image in Multi-modal Model
Xiaosen Wang, Zhijin Ge, Shaokang Wang
-
SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification
Yingjia Wang, Ting Qiao, Xing Liu, Chongzuo Li, Sixing Wu, Jianbin Li
-
Robust Graph Condensation via Classification Complexity Mitigation
Jiayi Luo, Qingyun Sun, Beining Yang, Haonan Yuan, Xingcheng Fu, Yanbiao Ma, Jianxin Li, Philip S. Yu
-
Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong
-
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
-
On Measuring Localization of Shortcuts in Deep Networks
Nikita Tsoy, Nikola Konstantinov
-
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
-
PEEL: A Poisoning-Exposing Encoding Theoretical Framework for Local Differential Privacy
Lisha Shuai, Jiuling Dong, Nan Zhang, Shaofeng Tan, Haokun Zhang, Zilong Song, Gaoya Dong, Xiaolong Yang
-
A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication
Weixuan Chen, Qianqian Yang
-
Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility
Kangkang Sun, Jun Wu, Minyi Guo, Jianhua Li, Jianwei Huang
-
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token
Shaked Zychlinski, Yuval Kainan
-
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Zhichao Hou, Weizhi Gao, Xiaorui Liu
-
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
Dominik Schwarz
-
Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services
Jayden Serenari, Stephen Lee
-
PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT
Rochak Dhakal, Chen Zhao, Zixin Shi, Joyce H. Keyak, Tadashi S. Kaneko, Kuan-Jui Su, Hui Shen, Hong-Wen Deng, Weihua Zhou
-
Reasoning Up the Instruction Ladder for Controllable Language Models
Zishuo Zheng, Vidhisha Balachandran, Chan Young Park, Faeze Brahman, Sachin Kumar
-
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren, Mark Dras, Usman Naseem
-
Lipschitz-aware Linearity Grafting for Certified Robustness
Yongjin Han, Suhyun Kim
-
Hasan Akgul, Mari Eplik, Javier Rojas, Aina Binti Abdullah, Pieter van der Merwe
-
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li
-
A Unified Bilevel Model for Adversarial Learning and A Case Study
Yutong Zheng, Qingna Li
-
On the Stability of Neural Networks in Deep Learning
Blaise Delattre
-
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
Phuc Tran, Nisheeth K. Vishnoi, Van H. Vu
-
Model Inversion Attacks Meet Cryptographic Fuzzy Extractors
Mallika Prabhakar, Louise Xu, Prateek Saxena
-
NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery
Zheng Zhang, Guanlong Wu, Sen Deng, Shuai Wang, Yinqian Zhang
-
Emily Herron, Junqi Yin, Feiyi Wang
-
FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer
-
Robust GNN Watermarking via Implicit Perception of Topological Invariants
Jipeng Li, Yanning Shen
-
Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers
Quanliang Jing, Xinxin Fan, Yanyan Liu, Jingping Bi
-
Simon Yu, Peilin Yu, Hongbo Zheng, Huajie Shao, Han Zhao, Lui Sha
-
Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning
Svetlana Churina, Niranjan Chebrolu, Kokil Jaidka
-
Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang
-
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong
-
Causal-Aware Generative Adversarial Networks with Reinforcement Learning
Tu Anh Hoang Nguyen, Dang Nguyen, Tri-Nhan Vo, Thuc Duy Le, Sunil Gupta
-
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Yujun Kim, Chaewon Moon, Chulhee Yun
-
Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro
-
Relative Scaling Laws for LLMs
William Held, David Hall, Percy Liang, Diyi Yang
-
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu, Chuanyang Jin, Seungone Kim, Weizhe Yuan, Wenting Zhao, Ilia Kulikov, Xian Li, Sainbayar Sukhbaatar, Jack Lanchantin, Jason Weston
-
Heethanjan Kanagalingam, Thenukan Pathmanathan, Mokeeshan Vathanakumar, Tharmakulasingam Mukunthan
-
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen, Lin Wang, Xiaojun Jia, Zheng Lin, Weiping Wang
-
Enhancing CLIP Robustness via Cross-Modality Alignment
Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang
-
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song, Zijing Li, Shengshan Hu, Leo Yu Zhang, Dezhong Yao, Long Zheng, Hai Jin
-
A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries
Xin Zhang, Yuqi Song, Fei Zuo
-
A Pragmatic Way to Measure Chain-of-Thought Monitorability
Scott Emmons, Roland S. Zimmermann, David K. Elson, Rohin Shah
-
Mitigating Negative Transfer via Reducing Environmental Disagreement
Hui Sun, Zheng Xie, Hao-Yuan He, Ming Li
-
SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning
Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader, Martin Vechev
-
PRIVET: Privacy Metric Based on Extreme Value Theory
Antoine Szatkownik, Aurélien Decelle, Beatriz Seoane, Nicolas Bereux, Léo Planche, Guillaume Charpiat, Burak Yelmen, Flora Jay, Cyril Furtlehner
-
A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport
Yuanyuan Wu, Zhenlin Qin, Zhenliang Ma
-
A Novel XAI-Enhanced Quantum Adversarial Networks for Velocity Dispersion Modeling in MaNGA Galaxies
Sathwik Narkedimilli, N V Saran Kumar, Aswath Babu H, Manjunath K Vanahalli, Manish M, Vinija Jain, Aman Chadha
-
Self-Concordant Perturbations for Linear Bandits
Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini
-
Vishal Halder, Alexandre Reiffers-Masson, Abdeldjalil Aïssa-El-Bey, Gugan Thoppe
-
Attack on a PUF-based Secure Binary Neural Network
Bijeet Basak, Nupur Patil, Kurian Polachan, Srinivas Vivek
-
Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone, Luis Javier Navarrete-Lozano, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres
-
Yan Meng, Jiachun Li, Matthew Pillari, Arjun Deopujari, Liam Brennan, Hafsah Shamsie, Haojin Zhu, Yuan Tian
-
Hammering the Diagnosis: Rowhammer-Induced Stealthy Trojan Attacks on ViT-Based Medical Imaging
Banafsheh Saber Latibari, Najmeh Nazari, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis
-
Najmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini, Fatemeh Movafagh, Chongzhou Fang, Hosein Mohammadi Makrani, Kevin Immanuel Gubbi, Abhijit Mahalanobis, Setareh Rafatirad, Hossein Sayadi, Houman Homayoun
-
Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases
Ziyao Cui, Minxing Zhang, Jian Pei
-
scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye
-
Secure Retrieval-Augmented Generation against Poisoning Attacks
Zirui Cheng, Jikai Sun, Anjun Gao, Yueyang Quan, Zhuqing Liu, Xiaohua Hu, Minghong Fang
-
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, Nils Jansen
-
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, Prasant Mohapatra
-
MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers
Bin Wang, Zexin Liu, Hao Yu, Ao Yang, Yenan Huang, Jing Guo, Huangsheng Cheng, Hui Li, Huiyu Wu
-
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
-
Aryan Mathur, Asaduddin Ahmed, Pushti Amit Vasoya, Simeon Kandan Sonar, Yasir Z, Madesh Kuppusamy
-
Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments
Miguel Fernandez-de-Retana, Unai Zulaika, Rubén Sánchez-Corcuera, Aitor Almeida
-
Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
Gokul Ganesan
-
Hao Liang, Haifeng Wen, Kaishun Wu, Hong Xing
-
CompressionAttack: Exploiting Prompt Compression as a New Attack Surface in LLM-Powered Agents
Zesen Liu, Zhixiang Zhang, Yuchong Xie, Dongdong She
-
Retracing the Past: LLMs Emit Training Data When They Get Lost
Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen, Charles Fleming, Ming Jin, Ruoxi Jia
-
Quantifying Document Impact in RAG-LLMs
Armin Gerami, Kazem Faghih, Ramani Duraiswami
-
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Anshuman Chhabra, Shrestha Datta, Shahriar Kabir Nahin, Prasant Mohapatra
-
SpoofTrackBench: Interpretable AI for Spoof-Aware UAV Tracking and Benchmarking
Van Le, Tan Le
-
OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models
Hao Zheng, Zirui Pang, Ling Li, Zhijie Deng, Yuhan Pu, Zhaowei Zhu, Xiaobo Xia, Jiaheng Wei
-
Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents
Julia Bazinska, Max Mathys, Francesco Casucci, Mateo Rojas-Carulla, Xander Davies, Alexandra Souly, Niklas Pfister
-
T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model
Chenyu Zhang, Tairen Zhang, Lanjun Wang, Ruidong Chen, Wenhui Li, Anan Liu
-
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao
-
Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
-
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse
-
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation
Alec Helbling, Shruti Palaskar, Kundan Krishna, Polo Chau, Leon Gatys, Joseph Yitan Cheng
-
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
Jiaqi Xue, Mayank Kumar, Yuzhang Shang, Shangqian Gao, Rui Ning, Mengxin Zheng, Xiaoqian Jiang, Qian Lou
-
How Hard is it to Confuse a World Model?
Waris Radji, Odalric-Ambrym Maillard
-
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling
Andrea Bonfanti, Ismael Medina, Roman List, Björn Staeves, Roberto Santana, Marco Ellero
-
Probe-based Fine-tuning for Reducing Toxicity
Jan Wehner, Mario Fritz
-
FrameShield: Adversarially Robust Video Anomaly Detection
Mojtaba Nafez, Mobina Poulaei, Nikan Vasei, Bardia Soltani Moakhar, Mohammad Sabokrou, MohammadHossein Rohban
-
Soft Instruction De-escalation Defense
Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes, David Stutz, Ilia Shumailov
-
Doubly-Regressing Approach for Subgroup Fairness
Kyungseon Lee, Kunwoong Kim, Jihu Lee, Dongyoon Yang, Yongdai Kim
-
Jie Zhang, Xiaohong Li, Mengke Zhang, Ruitao Feng, Shanshan Xu, Zhe Hou, Guangdong Bai
-
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
-
The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok Yan Lam
-
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Xingwei Zhong, Kar Wai Fok, Vrizlynn L.L. Thing
-
Spatio-Temporal Attention Network for Epileptic Seizure Prediction
Zan Li, Kyongmin Yeo, Wesley Gifford, Lara Marcuse, Madeline Fields, Bülent Yener
-
Adversarial Déjà Vu: Jailbreak Dictionary Learning for Stronger Generalization to Unseen Attacks
Mahavir Dabas, Tran Huynh, Nikhil Reddy Billa, Jiachen T. Wang, Peng Gao, Charith Peris, Yao Ma, Rahul Gupta, Ming Jin, Prateek Mittal, Ruoxi Jia
-
SAID: Empowering Large Language Models with Self-Activating Internal Defense
Yulong Chen, Yadong Liu, Jiawen Zhang, Mu Li, Chao Huang, Jie Wen
-
Wu Yichao, Wang Yirui, Ding Panpan, Wang Hailong, Zhu Bingqian, Liu Chun
-
Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang
-
Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
-
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda
-
AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN
Wei Shao, Yuhao Wang, Rongguang He, Muhammad Ejaz Ahmed, Seyit Camtepe
-
RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines
Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu
-
Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß
-
BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
Liang Ye, Shengqin Chen, Jiazhu Dai
-
Causal Debiasing for Visual Commonsense Reasoning
Jiayi Zou, Gengyun Jia, Bing-Kun Bao
-
Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking
Zixuan Wu, Hengyuan Zhang, Ting-Hsuan Chen, Yuliang Guo, David Paz, Xinyu Huang, Liu Ren
-
MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
Jan Sobotka, Luca Baroni, Ján Antolík
-
H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition
Lukas Miklautz, Chengzhi Shi, Andrii Shkabrii, Theodoros Thirimachos Davarakis, Prudence Lam, Claudia Plant, Jennifer Dy, Stratis Ioannidis
-
Adversary-Aware Private Inference over Wireless Channels
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith, H. Vincent Poor
-
Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
-
HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge
Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung
-
NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry
Daniel Gilkarov, Ran Dubin
-
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing
Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Mahmoud Nabil Mahmoud, Parham Kebria, Abdollah Homaifar, Mehrdad Saif
-
Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz
-
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
Antônio H. Ribeiro, David Vävinggren, Dave Zachariah, Thomas B. Schön, Francis Bach
-
Can Current Detectors Catch Face-to-Voice Deepfake Attacks?
Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore, Tingming Wu
-
A new measure for dynamic leakage based on quantitative information flow
Luigi D. C. Soares, Mário S. Alvim, Natasha Fernandes
-
A Reinforcement Learning Framework for Robust and Secure LLM Watermarking
Li An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang, Shiyu Chang
-
Adversarially-Aware Architecture Design for Robust Medical AI Systems
Alyssa Gerhart, Balaji Iyangar
-
Brent Winslow, Jacqueline Shreibati, Javier Perez, Hao-Wei Su, Nichole Young-Lin, Nova Hammerquist, Daniel McDuff, Jason Guss, Jenny Vafeiadou, Nick Cain, Alex Lin, Erik Schenck, Shiva Rajagopal, Jia-Ru Chung, Anusha Venkatakrishnan, Amy Armento Lee, Maryam Karimzadehgan, Qingyou Meng, Rythm Agarwal, Aravind Natarajan, Tracy Giest
-
LAPRAD: LLM-Assisted PRotocol Attack Discovery
R.Can Aygun, Yehuda Afek, Anat Bremler-Barr, Leonard Kleinrock
-
Collaborative penetration testing suite for emerging generative AI algorithms
Petar Radanliev
-
A New Type of Adversarial Examples
Xingyang Nie, Guojie Xiao, Su Pan, Biao Wang, Huilin Ge, Tao Fang
-
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Chengcan Wu, Zhixin Zhang, Mingqian Xu, Zeming Wei, Meng Sun
-
Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent
Yangshijie Zhang, Xinda Wang, Jialin Liu, Wenqiang Wang, Zhicong Ma, Xingxing Jia
-
Machine Text Detectors are Membership Inference Attacks
Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
-
Hubble: a Model Suite to Advance the Study of LLM Memorization
Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia
-
OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform
Thomas Wang, Haowen Li
-
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
-
Blackbox Model Provenance via Palimpsestic Membership Inference
Rohith Kuditipudi, Jing Huang, Sally Zhu, Diyi Yang, Christopher Potts, Percy Liang
-
Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-eui Yoon
-
Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection
Ariana Yi, Ce Zhou, Liyang Xiao, Qiben Yan
-
Subliminal Corruption: Mechanisms, Thresholds, and Interpretability
Reya Vir, Sarvesh Bhatnagar
-
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation
Omer Tariq, Muhammad Bilal, Muneeb Ul Hassan, Dongsoo Han, Jon Crowcroft
-
Revisiting the Relation Between Robustness and Universality
M. Klabunde, L. Caspari, F. Lemmerich
-
Euodia Dodd, Nataša Krčo, Igor Shilov, Yves-Alexandre de Montjoye
-
HAMLOCK: HArdware-Model LOgically Combined attacK
Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya
-
Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems
Mohamed ElShehaby, Ashraf Matrawy
-
Defending Against Prompt Injection with DataFilter
Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
-
AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices
Zhonghao Zhan, Amir Al Sadi, Krinos Li, Hamed Haddadi
-
Privacy-Preserving Spiking Neural Networks: A Deep Dive into Encryption Parameter Optimisation
Mahitha Pulivathi
-
CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage
Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar
-
LLMs can hide text in other text of the same length
Antonio Norelli, Michael Bronstein
-
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury
-
Xiang Li, Buxin Su, Chendi Wang, Qi Long, Weijie J. Su
-
Towards Strong Certified Defense with Universal Asymmetric Randomization
Hanbin Hong, Ashish Kundu, Ali Payani, Binghui Wang, Yuan Hong
-
Tushar Nayan, Ziqi Zhang, Ruimin Sun
-
Jia Deng, Jin Li, Zhenhua Zhao, Shaowei Wang
-
Modal Aphasia: Can Unified Multimodal Models Describe Images From Memory?
Michael Aerni, Joshua Swanson, Kristina Nikolić, Florian Tramèr
-
Rectifying Shortcut Behaviors in Preference-based Reward Learning
Wenqian Ye, Guangtao Zheng, Aidong Zhang
-
DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma
-
FeatureFool: Zero-Query Fooling of Video Models via Feature Map
Duoxun Tang, Xi Xiao, Guangwu Hu, Kangkang Sun, Xiao Yang, Dongyang Chen, Qing Li, Yongjie Yin, Jiyao Wang
-
Yifei Sun
-
Kuai Yu, Xiaoyu Wu, Peishen Yan, Qingqian Yang, Linshan Jiang, Hao Wang, Yang Hua, Tao Song, Haibing Guan
-
Thomas Hofweber, Jefrey Bergl, Ian Reyes, Amir Sadovnik
-
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Artur Zolkowski, Wen Xing, David Lindner, Florian Tramèr, Erik Jenner
-
Extracting alignment data in open models
Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, Jamie Hayes
-
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
-
CourtGuard: A Local, Multiagent Prompt Injection Classifier
Isaac Wu, Michael Maslowski
-
GUIDE: Enhancing Gradient Inversion Attacks in Federated Learning with Denoising Models
Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento
-
Structured Debate Improves Corporate Credit Reasoning in Financial AI
Yoonjin Lee, Munhee Kim, Hanbi Choi, Juhyeon Park, Seungho Lyoo, Woojin Park
-
Elias Hossain, Swayamjit Saha, Somshubhra Roy, Ravi Prasad
-
Unified Privacy Guarantees for Decentralized Learning via Matrix Factorization
Aurélien Bellet, Edwige Cyffers, Davide Frey, Romaric Gaudel, Dimitri Lerévérend, François Taïani
-
Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
Rishi Jha, Harold Triedman, Justin Wagle, Vitaly Shmatikov
-
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed, Xingzhi Guo, Daniel Kang, Joo-Kyung Kim
-
Chenxu Li, Zhicai Wang, Yuan Sheng, Xingyu Zhu, Yanbin Hao, Xiang Wang
-
BreakFun: Jailbreaking LLMs via Schema Exploitation
Amirkia Rafiei Oskooei, Mehmet S. Aktas
-
Fit for Purpose? Deepfake Detection in the Real World
Guangyu Lin, Li Lin, Christina P. Walker, Daniel S. Schiff, Shu Hu
-
Ryoto Miyamoto, Xin Fan, Fuyuko Kido, Tsuneo Matsumoto, Hayato Yamana
-
Colliding with Adversaries at ECML-PKDD 2025 Model Robustness Competition 1st Prize Solution
Dimitris Stefanopoulos, Andreas Voskou
-
When Intelligence Fails: An Empirical Study on Why LLMs Struggle with Password Cracking
Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar, Noor Islam, Hamid Khan
-
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Yiyang Huang, Liang Shi, Yitian Zhang, Yi Xu, Yun Fu
-
DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models
Yangyang Li
-
Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li
-
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
-
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
-
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
-
Language Models are Injective and Hence Invertible
Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
-
Unmasking Facial DeepFakes: A Robust Multiview Detection Framework for Natural Images
Sami Belguesmia, Mohand Saïd Allili, Assia Hamadene
-
Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
Gabriel Nixon Raj
-
Yuyuan Feng, Bin Ma, Enyan Dai
-
Adversary-Free Counterfactual Prediction via Information-Regularized Representations
Shiqin Tang, Rong Feng, Shuxin Zhuang, Hongzong Li, Youzhi Zhang
-
Constrained Adversarial Perturbation
Virendra Nishad, Bhaskar Mukhoty, Hilal AlQuabeh, Sandeep K. Shukla, Sayak Ray Chowdhury
-
Blackwell's Approachability for Sequential Conformal Inference
Guillaume Principato, Gilles Stoltz
-
HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment
Yuexiao Liu, Lijun Li, Xingjun Wang, Jing Shao
-
Towards Proactive Defense Against Cyber Cognitive Attacks
Bonnie Rushing, Mac-Rufus Umeokolo, Shouhuai Xu
-
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
Longwei Wang, Ifrat Ikhtear Uddin, KC Santosh, Chaowei Zhang, Xiao Qin, Yang Zhou
-
Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu
-
Bingjie Zhang, Yibo Yan