Years of cutting-edge research power Gray Swan’s most advanced protection for your AI systems.
Rigorous protection to keep AI from veering off course.
Finding out what can go wrong before it causes problems.
Enhancing reliability against external threats.
Explore our published research to learn how the latest advances in AI safety and security give Gray Swan the edge against evolving threats.
Current cybersecurity benchmarks fail to capture the real-world capabilities of AI agents, leading to significant underestimation of cyber risk. We conducted the first head-to-head comparison of AI agents and professional penetration testers on a live enterprise network to measure true performance gaps. Our findings reveal that purpose-built agents can outperform 90% of human professionals while operating continuously at a fraction of the cost—exposing critical flaws in how the industry evaluates AI capabilities and safety guardrails.
The safety and alignment of Large Language Models (LLMs) are critical for their responsible deployment. Current evaluation methods predominantly focus on identifying and preventing overtly harmful outputs. However, they often fail to address a more insidious failure mode: models that produce benign-appearing outputs while operating on malicious or deceptive internal reasoning.
The emergence of vision-language-action models (VLAs) for end-to-end control is reshaping the field of robotics by enabling the fusion of multimodal sensory inputs at the billion-parameter scale. The capabilities of VLAs stem primarily from their architectures, which are often based on frontier large language models (LLMs). However, LLMs are known to be susceptible to adversarial misuse, and given the significant physical risks inherent to robotics, questions remain regarding the extent to which VLAs inherit these vulnerabilities.
Recent research on jailbreak attacks has focused almost exclusively on settings where LLMs act as simple chatbots. However, LLMs are now increasingly used in agentic workflows, i.e., equipped with external tools and potentially taking many steps to fulfill a user’s request. To address potential safety and alignment concerns arising from LLM agents, we introduce AgentHarm, a new benchmark for measuring the harmfulness of LLM agents.
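For readers unfamiliar with agentic workflows, the sketch below shows the kind of multi-step, tool-equipped loop such a benchmark targets. The tool names, model interface, and grading notion here are placeholders for illustration only, not AgentHarm's actual tasks or API.

```python
from typing import Callable

# Hypothetical tools an agent might be given; a real harness (and AgentHarm's
# task suite) defines its own, far richer tool sets.
TOOLS = {
    "search_web": lambda query: f"results for {query!r}",
    "send_email": lambda to, body: f"sent to {to}",
}

def run_agent(llm: Callable[[str], dict], task: str, max_steps: int = 5) -> list:
    """Drive an LLM through a multi-step, tool-using workflow.

    llm: returns either {"tool": name, "args": {...}} or {"final": text}.
    The resulting transcript is what a harmfulness grader then scores:
    did the agent refuse, partially comply, or complete the harmful task?
    """
    transcript, context = [], task
    for _ in range(max_steps):
        action = llm(context)
        if "final" in action:
            transcript.append(("final", action["final"]))
            break
        result = TOOLS[action["tool"]](**action["args"])
        transcript.append((action["tool"], result))
        context = f"{context}\nObservation: {result}"
    return transcript

# A stub "model" that refuses immediately, for illustration.
print(run_agent(lambda ctx: {"final": "I can't help with that."}, "task text"))
```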
To address the urgent concerns raised by our attack from last July and the numerous jailbreaks that came after, we introduce Circuit Breaking, a novel approach inspired by representation engineering, designed to robustly prevent AI systems from generating harmful content by directly altering harmful model representations. The family of circuit-breaking methods provides an alternative to refusal and adversarial training, protecting both LLMs and multimodal models from strong, unseen adversarial attacks without compromising model capability.
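To give a flavor of what "directly altering harmful model representations" means in practice, here is a minimal sketch of a rerouting-style training objective, with random tensors standing in for model activations. The function name, loss weighting, and loss terms are illustrative assumptions, not the published training recipe.

```python
import torch
import torch.nn.functional as F

def rerouting_loss(harmful_reps, harmful_reps_orig, retain_reps, retain_reps_orig, alpha=1.0):
    """Toy circuit-breaker-style objective.

    - On harmful inputs, push the fine-tuned model's hidden states away from
      the directions the original (frozen) model used, so the harmful
      "circuit" no longer produces coherent content.
    - On benign (retain) inputs, keep hidden states close to the original
      model's so general capability is preserved.
    """
    # Penalize any remaining alignment with the original harmful representations.
    cos = F.cosine_similarity(harmful_reps, harmful_reps_orig, dim=-1)
    reroute = torch.relu(cos).mean()

    # Anchor benign representations to the frozen model.
    retain = F.mse_loss(retain_reps, retain_reps_orig)

    return reroute + alpha * retain

# Random stand-ins for (batch, seq, hidden) activations.
h, h0 = torch.randn(4, 16, 512, requires_grad=True), torch.randn(4, 16, 512)
r, r0 = torch.randn(4, 16, 512, requires_grad=True), torch.randn(4, 16, 512)
loss = rerouting_loss(h, h0, r, r0)
loss.backward()
```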
Building on our initial findings, we ventured into the realm of AI interpretability and control with the introduction of Representation Engineering (RepE). Drawing inspiration from cognitive neuroscience, we developed techniques that enable researchers to 'read' and 'control' the 'minds' of AI models. This approach represented a monumental advancement in demystifying the inner workings of AI, making it possible to tackle issues such as truthfulness and power-seeking behaviors head-on.
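As a rough illustration of the "reading" and "controlling" idea, the sketch below estimates a concept direction from contrastive activations using a simple mean-difference reader (the published work also explores PCA-based readers) and then nudges activations along it. The arrays are random stand-ins; in practice the activations come from a chosen transformer layer, and the layer choice and steering strength here are assumptions.

```python
import numpy as np

def reading_vector(pos_acts, neg_acts):
    """Estimate a direction separating two behaviors (e.g., honest vs.
    dishonest completions) from paired hidden-state activations.

    pos_acts, neg_acts: arrays of shape (n_examples, hidden_dim), taken from
    the same layer of the model on contrastive prompt pairs.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def read(activation, direction):
    """'Read' how strongly a new activation expresses the concept."""
    return float(activation @ direction)

def control(activation, direction, strength=4.0):
    """'Control' the model by nudging its activation along the concept
    direction before it is passed to the next layer."""
    return activation + strength * direction

# Stand-in activations for demonstration only.
rng = np.random.default_rng(0)
pos, neg = rng.normal(size=(64, 512)), rng.normal(loc=0.2, size=(64, 512))
d = reading_vector(pos, neg)
print(read(rng.normal(size=512), d))
```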
In July 2023, we published the first automated jailbreaking method for large language models (LLMs) and exposed their susceptibility to adversarial attacks. By demonstrating that specific character sequences could bypass sophisticated safeguards, we highlighted a significant vulnerability with urgent implications for widely used AI systems. In its wake, adversarial robustness garnered renewed attention, sparking a gold rush of research dedicated to both jailbreaking and defense.
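To make the idea of an optimized character sequence concrete, here is a deliberately simplified suffix-search loop. The published attack (GCG) ranks token substitutions using gradients through the target model; this sketch swaps that in for random substitution with hill-climbing on a caller-supplied loss, so the function names and toy objective are illustrative only.

```python
import random

def optimize_suffix(loss_fn, vocab, suffix_len=20, iters=500, seed=0):
    """Simplified adversarial-suffix search.

    loss_fn: maps a list of suffix tokens to a scalar, e.g. the negative
             log-probability that the target model begins its reply with
             an affirmative string.
    vocab:   candidate tokens to substitute in.
    """
    rng = random.Random(seed)
    suffix = [rng.choice(vocab) for _ in range(suffix_len)]
    best = loss_fn(suffix)
    for _ in range(iters):
        pos = rng.randrange(suffix_len)        # pick a position to mutate
        candidate = suffix.copy()
        candidate[pos] = rng.choice(vocab)     # try a random substitution
        score = loss_fn(candidate)
        if score < best:                       # keep it only if the loss drops
            suffix, best = candidate, score
    return suffix, best

# Toy objective standing in for a model-based loss: count token mismatches.
target = ["please", "describe", "it"]
toy_vocab = target + ["foo", "bar", "baz"]
best_suffix, best_loss = optimize_suffix(
    lambda s: sum(t != g for t, g in zip(s, target)), toy_vocab, suffix_len=3, iters=200
)
print(best_suffix, best_loss)
```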
Get in touch to discuss your custom research needs.
Keep up to date on all things Gray Swan and AI Security.