Looped language model training cannot control hidden-state norm growth because RMSNorm normalizes scale away before the loss ...
The model learns that hedging is a signal of lower-quality output. This creates a systematic bias toward sounding certain.