Editors’ Choice: LLMbench: A Comparative Close Reading Workbench For Large Language Models
Editors’ Summary: In this post, the author, Berry, provides a detailed overview of the functions of a new tool, LLMbench. Berry points to Google PAIR’s LLM Comparator as a useful tool for side-by-side evaluation of models from the perspective of model developers, but notes that it lacks support for comparative close reading. LLMbench is a […]