
Editors’ Choice: LLMBench: A Comparative Close Reading Workbench For Large Language Models

DHNow

Editors’ Summary: In this post, Berry provides a detailed overview of the new tool LLMbench. He points to Google PAIR’s LLM Comparator as a useful tool for side-by-side evaluation of models from the perspective of model developers, but that tool does not support comparative close reading. LLMbench is a browser-based tool, like Voyant, that treats the text itself as a probabilistic object. Berry details the six modes of LLMbench and how they can be used for humanistic research, including the ‘Compare’ mode, which relies on “logprob” data. He argues that logprobs are an underutilized resource in humanistic and social scientific readings of AI. The deployed version is available at https://llm-bench-mu.vercel.app/.
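
For readers unfamiliar with the term, a “logprob” is the natural logarithm of the probability a model assigned to a token. The sketch below is not LLMbench's code; it only illustrates, under stated assumptions, how such values can be read. The token list, the numbers, and the helper names `to_probability` and `perplexity` are all hypothetical and invented for illustration.

```python
import math

# Hypothetical (token, logprob) pairs; the values are invented for illustration
# and do not come from LLMbench or from any real model.
token_logprobs = [
    ("The", -0.10),
    ("text", -1.20),
    ("is", -0.05),
    ("probabilistic", -2.30),
]


def to_probability(logprob: float) -> float:
    """Convert a natural-log probability into a probability between 0 and 1."""
    return math.exp(logprob)


def perplexity(logprobs: list[float]) -> float:
    """Exponentiated mean negative logprob across the sequence."""
    return math.exp(-sum(logprobs) / len(logprobs))


# Probability the (hypothetical) model assigned to each token.
probs = {token: to_probability(lp) for token, lp in token_logprobs}

# The token with the lowest logprob is the one the model was least sure of.
least_certain = min(token_logprobs, key=lambda pair: pair[1])[0]

# A single summary figure for how "surprising" the whole sequence was.
ppl = perplexity([lp for _, lp in token_logprobs])

for token, p in probs.items():
    print(f"{token!r}: {p:.3f}")
print("least certain token:", least_certain)
print(f"perplexity: {ppl:.2f}")
```

Read this way, a low-probability token marks a point where the model could plausibly have written something else, which is the kind of place a comparative close reading might linger.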

See full post.