Test Safety at Scale

Arena is the world's largest adversarial AI research environment, where thousands of incentivized red teamers discover the vulnerabilities your internal team can't. Frontier labs use it to benchmark safety, stress-test their models, and evaluate them before launching to the public.

Talk to an Expert

A Living Adversarial Network

Structured Challenges

Gray Swan designs and launches targeted adversarial challenges against specific models, risk categories, and capability surfaces. Participants are incentivized to find what's novel.

Diverse Adversarial Perspectives

15,000+ people think differently than any internal team. The Arena surfaces attack techniques that emerge from unexpected angles, cultural contexts, linguistic patterns, and creative approaches that no internal red team can replicate on its own.

Research-Grade Rigor

Arena discoveries are documented with reproductions, severity classifications, and methodological transparency. This is intelligence you can cite in system cards, safety reports, and regulatory submissions.

The Largest. The Most Cited. The Most Current.

15,000+ adversarial researchers, and growing

The largest AI red-teaming network in the world. No one else has this scale or diversity of adversarial perspective.

Novel technique discovery

Arena participants are incentivized to find what's new, not re-run what's known. Your model gets tested against attacks that haven't been published yet.

Continuous operation

The Arena doesn't stop between your release cycles. Intelligence flows when you need it, not on a consulting timeline.

Let’s Talk

Trusted at the Frontier

Our research has directly informed the safety evaluations of some of the most advanced AI models in the world.

New Release

Put Your Model to the Test

The Arena gives you evaluation at a scale, depth, and diversity that no internal team or automated tool can replicate.

Let’s Talk

Trusted at the Frontier

New Release

Claude Sonnet 5

Claude Fable 5 & Claude Mythos 5

Claude Opus 4.8

Muse Spark

Claude Mythos Preview

GPT 5

Claude Sonnet 4.6

Claude Opus 4.7

Claude Opus 4.6

Claude Opus 4.5

Claude Haiku 4.5

Claude Sonnet 4.5

o3 mini

o1
