In the last year, AI agents have become all the rage. OpenAI, Google, and Anthropic all launched public-facing agents designed to take on multi-step tasks handed to them by humans. In the last month, an open-source AI agent called OpenClaw took the web by storm thanks to its impressive autonomous capabilities (and major security concerns). But we don’t really have a sense of the scale of AI agent operations, and whether all the talk is matched by actual deployment. The MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) set out to fix that with its recently published 2025 AI Agent Index, which provides our first real look at the scale and operations of AI agents in the wild.
Researchers found that interest in AI agents has undoubtedly skyrocketed in the last year or so. Research papers mentioning “AI Agent” or “Agentic AI” in 2025 more than doubled the total from 2020 to 2024 combined, and a McKinsey survey found that 62% of companies reported that their organizations were at least experimenting with AI agents.
With all that interest, the researchers focused on 30 prominent AI agents across three separate categories: chat-based options like ChatGPT Agent and Claude Code; browser-based bots like Perplexity Comet and ChatGPT Atlas; and enterprise options like Microsoft 365 Copilot and ServiceNow Agent. While the researchers didn’t provide exact figures on just how many AI agents are deployed across the web, they did offer a considerable amount of insight into how they are operating, which is largely without a safety net.
Just half of the 30 AI agents that got put under the magnifying glass by MIT CSAIL include published safety or trust frameworks, like Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, or Microsoft’s Responsible AI Standard. One in three agents has no safety framework documentation whatsoever, and five out of 30 have no compliance standards. That is troubling when you consider that 13 of 30 systems reviewed exhibit frontier levels of agency, meaning they can operate largely without human oversight across extended task sequences. Browser agents in particular tend to operate with significantly higher autonomy. This would include things like Google’s recently launched AI “Autobrowse,” which can complete multi-step tasks by navigating different websites and making use of user information to do things like log into sites on your behalf.
One of the troubles with letting agents browse freely and with few guardrails is that their activity is nearly indistinguishable from human behavior, and they do little to dispel any confusion that might occur. The researchers found that 21 out of the 30 agents provide no disclosure to end users or third parties that they are AI agents and not human users. This results in most AI agent activity being mistaken for human traffic. MIT found that just seven agents published stable User-Agent (UA) strings and IP address ranges for verification. Nearly as many explicitly use Chrome-like UA strings and residential/local IP contexts to make their traffic requests appear more human, making it next to impossible for a website to distinguish between authentic traffic and bot behavior.
For some AI agents, that’s actually a marketable feature. The researchers found that BrowserUse, an open-source AI agent, sells itself to users by claiming to bypass anti-bot systems to browse “like a human.” More than half of all the bots tested provide no specific documentation about how they handle robots.txt files (text files that are placed in a website’s root directory to instruct web crawlers on how they can interact with the site), CAPTCHAs that are meant to authenticate human traffic, or site APIs. Perplexity has even made the case that agents acting on behalf of users shouldn’t be subject to scraping restrictions since they function “just like a human assistant.”
The fact that these agents are out in the wild without much protection in place means there is a real threat of exploits. There is a lack of standardization for safety evaluations and disclosures, leaving many agents potentially vulnerable to attacks like prompt injections, in which an AI agent picks up on a hidden malicious prompt that can make it break its safety protocols. Per MIT, nine of 30 agents have no documentation of guardrails against potentially harmful actions. Nearly all of the agents fail to disclose internal safety testing results, and 23 of the 30 offer no third-party testing information on safety.
Just four agents—ChatGPT Agent, OpenAI Codex, Claude Code, and Gemini 2.5—provided agent-specific system cards, meaning the safety evaluations were tailored to how the agent actually operates, not just the underlying model. But frontier labs like OpenAI and Google offer more documentation on “existential and behavioral alignment risks,” they lack details on the type of security vulnerabilities that may arise during day-to-day activities—a habit that the researchers refer to as “safety washing,” which they describe as publishing high-level safety and ethics frameworks while only selectively disclosing the empirical evidence required to rigorously assess risk.
There has at least been some momentum toward addressing the concerns raised by MIT’s researchers. Back in December, OpenAI and Anthropic (among others) joined forces, announcing a foundation to create a development standard for AI agents. But the AI Agent Index shows just how wide the transparency gap is when it comes to agentic AI operation. AI agents are flooding the web and workplace, functioning with a shocking amount of autonomy and minimal oversight. There’s little to indicate at the moment that safety will catch up to scale any time soon.
Trending Products
Zalman P10 Micro ATX Case, MATX PC Case with 120mm ARGB Fan Pre-Put in, Panoramic View Tempered Glass Entrance & Aspect Panel, USB Sort C and USB 3.0, White
Wireless Keyboard and Mouse, Ergonomic Keyboard Mouse – RGB Backlit, Rechargeable, Quiet, with Phone Holder, Wrist Rest, Lighted Mac Keyboard and Mouse Combo, for Mac, Windows, Laptop, PC
Nimo 15.6 FHD Pupil Laptop computer, 16GB RAM, 1TB SSD, Intel Pentium Quad-Core N100 (Beat to i3-1115G4, As much as 3.4GHz), Backlit Keyboard, Fingerprint, 2 Years Guarantee, 90 Days Return, WiFi 6, Win 11
Dell S2722DGM Curved Gaming Monitor – 27-inch QHD (2560 x 1440) 1500R Curved Display, 165Hz Refresh Rate (DisplayPort), HDMI/DisplayPort Connectivity, Height/Tilt Adjustability – Black
GIM Micro ATX PC Case with 2 Tempered Glass Panels Mini Tower Gaming PC Case Micro ATX Case with 2 Magnet Mud Filters, Gaming Pc Case with USB3.0 I/O Port, Black with out Followers