According to the researchers, three core mechanisms could eliminate the majority of known attack types. The first is a clear separation between instructions and untrusted data
A research paper published on May 20 by teams from Google, Gray Swan AI, EmbraceTheRed, and several universities argues that securing AI agents requires rethinking how the entire system is built, not just how the model itself behaves. The paper contends that treating the AI model as the sole security perimeter leaves too many attack surfaces unaddressed. Researchers said efforts focused only on model robustness are insufficient on their own.
According to the researchers, three core mechanisms could eliminate the majority of known attack types. The first is a clear separation between instructions and untrusted data, so that attackers cannot embed malicious commands inside content the agent is processing. Without this boundary, a bad actor can hijack an agent's behavior by hiding instructions inside what appears to be ordinary input.
The second mechanism limits permissions. The paper argues agents should only hold the minimum access required to complete a task, rather than broad system-level rights. The third transfers control of sensitive data flow away from the agent entirely, placing it at the system level, so the agent cannot be manipulated into routing private information to unauthorized destinations. These three controls, applied together, address what the researchers describe as the structural root of most AI attack scenarios.
Aaron Ratcliff, attributions lead at blockchain intelligence firm Merkle Science, said giving an AI agent access to a wallet introduces a layer of trust into a system designed to be trustless. He said the setup can be safe if built correctly, but listed several conditions, including the ability to catch front-running, apply slippage limits, audit contracts in real time, sandbox prompts, prevent injection, and block man-in-the-middle access. Ratcliff said he would want proof of all those capabilities before the agent executes a trade.
AI agents are currently being used to build Web3 applications, launch tokens, and interact autonomously with services and protocols. Some platforms are also exploring AI for trading, and the combination of autonomous decision-making with on-chain execution is drawing both developer interest and security scrutiny. The researchers said the goal of their framework is to apply the same systematic controls that have protected conventional software systems to this emerging class of autonomous agents.
