Editors’ Choice: beware of geeks baring gifs: when d&d-style AI alignment breaks
There’s a fascinating piece of research out from Anthropic: “Agentic Misalignment: How LLMs could be insider threats”. Maybe you’ve seen it doing the rounds this weekend; it landed on Friday, I think. There’s been a lot of hyperbolic reaction, even for something that merits serious concern. Nevertheless, it seems these things aren’t ready to be […]