Gray Swan Introduces the Dangerous Reasoning Arena Competition

Launching May 10, Gray Swan AI’s newest challenge invites red-teamers and AI security professionals to explore a deeper layer of vulnerability: the model’s internal reasoning.

Gray Swan
May 7, 2025

Advancing AI Red-Teaming to the Next Layer of Risk

Unlike traditional red-teaming efforts focused on manipulating model outputs, this challenge probes a more subtle threat surface: dangerous chains of thought that originate in the model's planning process, even when the final response appears safe.

Key Details

  • Warmup Week Begins: Saturday, May 10 @ 1PM EDT (no prizes)
  • Prize Waves: May 17 & May 24
  • Total Prize Pool: $20,000
  • Focus: Chain-of-thought jailbreaks targeting dangerous reasoning and intent

Participants will attempt to elicit model behaviors such as:

  • Bioattack planning
  • Strategic deception
  • Covert backdoor insertion
  • Infrastructure sabotage

Even if the model’s response looks harmless, the red team has succeeded if the internal reasoning reflects harmful, model-originated intent.

Why It Matters

As reasoning models become more capable, risk no longer lies solely in their outputs, but in how these systems plan and decide. The Dangerous Reasoning Challenge is designed to help developers and enterprise teams anticipate emerging failure modes and build stronger safety interventions before they become production risks.

We welcome participation from:

  • AI research teams
  • Enterprise security and red-teaming professionals
  • Responsible AI advocates focused on frontier model oversight

Learn More and Join the Challenge

Arena: https://app.grayswan.ai/arena/challenge/dangerous-reasoning
Community: https://discord.gg/grayswanai

If you’re working at the edge of AI capability, this is the challenge you’ll want eyes on.