Limits of External Controls in AI Alignment

By Robyn Wyrick — January 30, 2026

Over the past few years, something unsettling has begun to show up in evaluations of advanced AI systems. In testing environments and red-team exercises, models have been observed doing things we didn't expect: deceiving evaluators, hiding capabilities, and deliberately underperforming when they appear to detect that they are being evaluated. This article examines the persistent failure modes of external alignment controls and the deeper mismatch at the heart of today's alignment challenge.