Is your LLM really clever? Can it mark its own homework?
ELOQUENT Lab 2026
3rd edition of the lab for evaluation of generative language model quality at CLEF, the Conference and Labs of the Evaluation Forum
Here's how to participate

You can participate with a simple one-off experiment on one or more of the tasks, or with a more elaborate experimental approach, depending on how you want to work with the challenges.

We welcome experimental reports, which will be published in the working notes of the workshop, but a formal report is not required. If your experiments involve hypothesis testing and exploration of more lasting value, they can later be revised and published in more archival channels.

Typically, this is done after the workshop, where ideas and lessons learned have been exchanged between participants.

Sign up now by e-mail to the ELOQUENT organisers:

eloquent-clef2026-organizers@googlegroups.com

Timeline

  • Fall 2025: task formulation
  • 1 January 2026: tasks open and public announcement of tasks
  • 29 March to 2 April 2026: presentation of ELOQUENT at ECIR
  • End of May 2026: submission deadline of experimental runs from participants (TBC)
  • First week of June 2026: participant report submission deadline (TBC)
  • First week of July 2026: camera ready report submission deadline (TBC)
  • 21-24 September 2026: ELOQUENT workshop at CLEF in Jena

Task Voight-Kampff

Can your LLM fool a classifier into believing it is human?

This task explores whether automatically generated text can be distinguished from human-authored text, and is organised in collaboration with the PAN lab at CLEF.


Task Robustness and Consistency

Will your machine respond with the same content to all of us?

This task has run in two variants in previous editions; this year it will test how well a model achieves consistency across several languages and how well it adapts to the local culture of a linguistic area.


Task Generating and Scoring Exams

Can your language model prep, sit, or rate an exam for you?

An evolved version of the first year's Topical Quiz task and the second year's Sensemaking task, this year's Exam task is developed together with the OECD to support future PISA tests. It will have two subtasks: creating test items from a given text and scoring student responses to test items.


ELOQUENT goals

The ELOQUENT evaluation lab experiments with new evaluation methods for generative language models, in order to meet some of the challenges on the path from laboratory to application. The lab intends to explore the following important characteristics of generative language model quality:

  1. Trustworthiness: a many-faceted notion which involves topical relevance and truthfulness, discourse competence, reasoning in language, controllability, and robustness across varied input.
  2. Multi-linguality and cultural fit: the suitability of a language model for some cultural and linguistic area.
  3. Self-assessment: how reliably a language model can assess the quality of itself or another language model, using as little human effort as possible.
  4. Limits of language models: the delimitation of world knowledge and generative capacity.

Student projects

Do you teach a class related to generative language models? Do you supervise students interested in generative language models? Are you a student searching for a project?


The ELOQUENT tasks are suitable for use as a class assignment or as a diploma project. Get in touch with us for suggestions of extensions and other ideas!

2026 Workshop in Jena

The third ELOQUENT Workshop will be held at the CLEF conference in Jena, 21-24 September 2026.


The workshop programme will include overview presentations, an invited keynote, and a selection of participant presentations.

Previous Editions

The first edition of ELOQUENT ran in 2024 and involved four tasks: Voight-Kampff, HalluciGen, Robustness and Consistency, and Topical Quiz.

The second edition of ELOQUENT ran in 2025 and involved four tasks: Voight-Kampff, Robustness and Consistency, Preference Prediction, and Sensemaking (a development of the Topical Quiz task).

Organising Committee

  • AMD Silo AI: Maria Barrett, Jussi Karlgren, Georgios Stampoulidis
  • Charles University: Ondřej Bojar and Pavel Šindelář
  • Fraunhofer IAIS: Marie Isabel Engels
  • OECD: Mario Piacentini, Luis Francisco Vargas Madriz, Katherina Thomas
  • Université Grenoble Alpes: Diandra Fabre, Lorraine Goeuriot, Philippe Mulhem, Didier Schwab, Markarit Vartampetian
  • Université de Toulouse, IRIT: Josiane Mothe

Contact us at eloquent-clef2026-organizers AT googlegroups.com

Committee members for past editions:
2024: Luise Dürlich, Evangelia Gogoulou, Liane Guillou, Joakim Nivre, Magnus Sahlgren, Aarne Talman
2025: Ekaterina Artemova, Vladislav Mikhailov, Erik Velldal, Lilja Øvrelid

Thank you

The ELOQUENT lab is partially supported by the OpenEuroLLM and the DeployAI projects through their activities on building, evaluating, and disseminating generative language models.

Page layout from Codepen.