Do the rewards justify the means? measuring trade-offs between rewards and ethical behavior in the machiavelli benchmark

Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

This research is currently only available at its source.

You can find the research at the below link.
Feel free to contact Gray Swan with any questions or comments.