Launching May 10, Gray Swan AI’s newest challenge invites red-teamers and AI security professionals to explore a deeper layer of vulnerability: the model’s internal reasoning.
Unlike traditional red-teaming efforts focused on output manipulation, this challenge targets a subtler threat surface: dangerous chains of thought that originate in the model's planning process, even when the final response appears safe.
Participants will attempt to elicit model behaviors such as:
Even if the model’s response looks harmless, the red team has succeeded if the internal logic reflects harmful, model-originated intent.
As reasoning models become more capable, risk no longer lies solely in their outputs but in how these systems plan and decide. The Dangerous Reasoning Challenge is designed to help developers and enterprise teams anticipate emerging failure modes and build stronger safety interventions before they become production risks.
We welcome participation from:
Arena: https://app.grayswan.ai/arena/challenge/dangerous-reasoning
Community: https://discord.gg/grayswanai
If you’re working at the edge of AI capability, this is the challenge you’ll want eyes on.