Editors’ Choice: beware of geeks baring gifs: when d&d-style AI alignment breaks

There’s a fascinating piece of research out from Anthropic: “Agentic Misalignment: How LLMs could be insider threats.” Maybe you’ve seen it doing the rounds this weekend; it landed on Friday, I think. There has been a lot of hyperbolic reaction, even for something that merits serious concern. Nevertheless, it seems these things aren’t ready to be autonomous agents yet, and we shouldn’t keep opening that door until we understand the risks better.

See full post.