Misaligned goals in artificial intelligence (original) (raw)

Artificial intelligence agents sometimes misbehave due to faulty objective functions that fail to adequately encapsulate the programmers' intended goals. The misaligned objective function may look correct to the programmer, and may even perform well in a limited test environment, yet may still produce unanticipated and undesired results when deployed.