Launching May 10, Gray Swan AI’s newest challenge invites red-teamers and AI security professionals to explore a deeper layer of vulnerability: the model’s internal reasoning.
Unlike traditional red-teaming efforts focused on output manipulation, this challenge targets a subtler threat surface: dangerous chains of thought that originate in the model's planning process, even when the final response appears safe.
Participants will attempt to elicit model behaviors such as:
Even if the model’s response looks harmless, the red team has succeeded if the internal logic reflects harmful, model-originated intent.
As reasoning models become more capable, risk no longer lies solely in their outputs but in how these systems plan and decide. The Dangerous Reasoning Challenge is designed to help developers and enterprise teams anticipate emerging failure modes and build stronger safety interventions before they become production risks.
We welcome participation from:
Arena: https://app.grayswan.ai/arena/challenge/dangerous-reasoning
Community: https://discord.gg/grayswanai
If you’re working at the edge of AI capability, this is the challenge you’ll want eyes on.