Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons. Sandan, I. B.; Dinh, T. A.; Niehues, J. (2025). In O. Arviv, M. Clinciu, K. Dhole, R. Dror, S. Gehrmann, E. Habba, I. Itzhak, S. Mille, Y. Perlitz, E. Santus, J. Sedoc, M. Shmueli-Scheuer, G. Stanovsky & O. Tafjord (Eds.), Proceedings of the 4th Workshop on Generation, Evaluation and Metrics (GEM) at the 63rd Annual Meeting of the Association for Computational Linguistics (pp. 121–128). Association for Computational Linguistics (ACL).