Current cybersecurity benchmarks fail to capture the real-world capabilities of AI agents, leading to significant underestimation of cyber risk. We conducted the first head-to-head comparison of AI agents and professional penetration testers on a live enterprise network to measure the true performance gap. Our findings reveal that purpose-built agents can outperform 90% of human professionals while operating continuously at a fraction of the cost, exposing critical flaws in how the industry evaluates AI capabilities and safety guardrails.
You can find the research at the link below.
Feel free to contact Gray Swan with any questions or comments.