The Dictionary Definition Task is new for the second edition and has participating systems play a variant of the parlour game Dictionary, implemented and marketed as a popular 1980s board game under various names such as Balderdash, Rappakalja, or Kokkelimonke. Each player formulates a plausible and convincing definition for an uncommon and unknown word. The definitions are shared with the other players, with the correct definition mixed in among the player-authored ones, and points are awarded to a player's definition if other players are fooled into believing it to be the true one. The task will be implemented as a system-submission task: participants are expected to provide a runnable script to be executed by the organisers. Participants are restricted to a shared set of pre-trained models, which the organisers will fine-tune over the course of the experiment with both helpful and confounding data; scoring will be influenced by the robustness of the script, its prompting, and its continued adaptation to incoming data.
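To make the setup concrete, here is a minimal sketch of what such a submission script might look like, assuming a Hugging Face transformers interface; the model name, prompt, and I/O format are illustrative assumptions, not the official harness:

```python
# Hypothetical Dictionary Definition submission script.
# The shared model set and exact I/O contract will be fixed by the
# organisers; "gpt2" and the prompt below are placeholder assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def define(word: str) -> str:
    """Generate a plausible dictionary-style definition for an unknown word."""
    prompt = f"Dictionary entry.\nWord: {word}\nDefinition:"
    out = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.9)
    # Keep only the first sentence of the continuation.
    continuation = out[0]["generated_text"][len(prompt):].strip()
    return continuation.split(".")[0] + "."

if __name__ == "__main__":
    print(define("kokkelimonke"))
```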
Can your LLM fool a classifier into believing it is human?
This task explores whether automatically-generated text can be distinguished from human-authored text, and is organised in collaboration with the PAN lab at CLEF.
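As a toy illustration of the underlying problem (and emphatically not the PAN evaluation setup, which is far more involved), a minimal bag-of-words detector separating human from machine text might be sketched as follows; the training data here is a placeholder:

```python
# Toy human-vs-machine text detector: TF-IDF features plus logistic
# regression. Real detectors and the PAN evaluation are more involved.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["an example human-authored paragraph ...",
         "an example machine-generated paragraph ..."]
labels = ["human", "machine"]  # placeholder training labels

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)
print(detector.predict(["a new paragraph of unknown origin"]))
```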
Will it respond with the same content to all of us?
This task ran in 2024 and tests the capability of a model to handle input variation (e.g. dialectal, sociolectal, and cross-cultural variation), as represented by human-generated, equivalent but non-identical variants of input prompts.
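One simple way to operationalise robustness is to compare a model's answers to equivalent prompt variants pairwise, as sketched below; token overlap is a simplistic stand-in for whatever similarity measure the task actually uses:

```python
# Sketch: score how consistently a model answers equivalent prompts.
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two responses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def consistency(responses: list[str]) -> float:
    """Mean pairwise similarity over all responses to prompt variants."""
    pairs = list(combinations(responses, 2))
    return sum(token_jaccard(a, b) for a, b in pairs) / len(pairs)

print(consistency(["The capital is Paris.",
                   "Paris is the capital.",
                   "It is Paris."]))
```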
The Preference Prediction Task is new for the second edition and will explore the capability of systems to predict human preferences for different outputs from generative language models.
We will provide participants with a novel dataset of human preference judgments, collected from scratch for this task. The task offers two sub-tasks, with participation open to anyone:
- Preference prediction. Predict human preferences between given LLM responses; the performance metric is accuracy (see the sketch after this list).
- Preference prediction and judgment generation. Generate judgments explaining the choices made in sub-task 1; performance is measured with standard natural language generation evaluation metrics.
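The sub-task 1 metric is straightforward; a minimal sketch follows, assuming a two-response format with an "A"/"B" human choice field (the field names are our assumption, not the official data format):

```python
# Sketch of the sub-task 1 metric: accuracy of predicted preferences
# against human labels. Field names are illustrative assumptions.
def preference_accuracy(examples: list[dict], predictions: list[str]) -> float:
    """Fraction of examples where the predicted winner ("A" or "B")
    matches the human-preferred response."""
    correct = sum(ex["human_choice"] == pred
                  for ex, pred in zip(examples, predictions))
    return correct / len(examples)

examples = [
    {"prompt": "Summarise ...", "response_a": "...", "response_b": "...", "human_choice": "A"},
    {"prompt": "Explain ...",   "response_a": "...", "response_b": "...", "human_choice": "B"},
]
print(preference_accuracy(examples, ["A", "A"]))  # 0.5
```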
In an evolved version of the first year's Topical Quiz Task, the Sensemaking task will require participating systems to produce a quiz, not for a topic, but for a given syllabus set of texts, which may be noisy or erroneous, and to answer quiz questions posed by other participating systems. A correct answer is one that is aligned with the syllabus without relying on other knowledge.
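To illustrate the grounding requirement, a quiz-generation prompt might constrain the model to the syllabus material as sketched below; the wording is an illustrative assumption, not the official harness:

```python
# Sketch of a syllabus-grounded quiz prompt for the Sensemaking task.
def quiz_prompt(syllabus_texts: list[str], n_questions: int = 5) -> str:
    """Build a prompt that restricts quiz content to the given texts."""
    material = "\n\n".join(syllabus_texts)
    return (
        f"Using ONLY the material below, write {n_questions} quiz "
        "questions with answers. Every answer must be supported by the "
        "material, even where the material is noisy; do not rely on "
        "outside knowledge.\n\n"
        f"MATERIAL:\n{material}\n\nQUIZ:"
    )

print(quiz_prompt(["Text one of the syllabus.", "Text two of the syllabus."]))
```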
The ELOQUENT evaluation lab experiments with new evaluation methods for generative language models in order to meet some of the challenges in the path from laboratory to application. The intention of the lab is to explore the following important characteristics of generative language model quality:
- Trustworthiness: a many-faceted notion which involves topical relevance and truthfulness, discourse competence, reasoning in language, controllability, and robustness across varied input.
- Multi-linguality and cultural fit: the suitability of a language model for some cultural and linguistic area.
- Self-assessment: the reliability of a language model to assess the quality of itself or some other language model, using as little human effort as possible.
- Limits of language models: the delimitation of world knowledge and generative capacity.
Do you teach a class related to generative language models? Do you supervise students interested in generative language models? Are you a student searching for a project?
The ELOQUENT tasks are suitable for use as a class assignment or as a diploma project. Get in touch with us for suggestions of extensions and other ideas!
The second ELOQUENT Workshop will be held at the CLEF conference in Madrid, 9-12 September 2025.
The workshop program will include overview presentations, an invited keynote, and selected participant presentations.
Participating in a task can be done as a simple one-off experiment.
We welcome experimental reports, which will be published in the working notes of the workshop, but there is no requirement to submit anything. Experiments involving hypothesis testing and exploration of more lasting value can be revised and published elsewhere in more archival channels; typically this happens after the workshop, once ideas and lessons learned have been exchanged between participants.
Sign up to join in the discussion! We will announce a conversation channel here during the fall of 2024.
- Fall 2024: discussion and task formulation
- January 1, 2025: tasks open and public announcement of tasks
- April 6 – April 10, 2025: ECIR presentation of ELOQUENT
- May 2025: submission deadline of experimental runs from participants
- June 2025: participant report submission deadline
- July 2025: camera ready report submission deadline
- 9-12 September 2025: workshop at CLEF in Madrid
The first edition of ELOQUENT ran in 2024 and involved four tasks.
The task on untruthfulness detection ran in 2024 and will move elsewhere for 2025. Return here later to find out more!
Also related is the Mushroom task at SemEval-2025.
The task on probing the topical competence of generative language models ran in 2024 and is paused for 2025, with the intention of defining a reworked task at some later time. Return here later to find out more!
- Silo AI: Jussi Karlgren
- Accenture and University of Helsinki: Aarne Talman
- University of Helsinki: Timothee Mickus
- AI Sweden, Stockholm: Magnus Sahlgren
- Toloka AI: Ekaterina Artemova
- Charles University: Ondrej Bojar
- University of Crete: Stergios Chatzikyriakidis
- University of Oslo: Vladislav Mikhailov
- University of Oslo: Erik Velldal
- University of Oslo: Lilja Øvrelid
- An overview of ELOQUENT 2024 appears in the CLEF 2024 proceedings, volume 2
Task reports and participant papers are in the CLEF 2024 Working notes:
- Topical Quiz report: Karlgren & Talman
- HalluciGen report: Dürlich et al.
- HalluciGen participant paper: Siino & Tinnirello
- HalluciGen participant paper: Bui et al.
- Robustness report: Sahlgren et al.
- Robustness participant paper: Neralla & Bijl de Vroe
- Robustness participant paper: Simonsen
- Voight-Kampff report: Bevendorff et al.
- A first presentation and announcement of ELOQUENT appears in the Proceedings of the European Conference on Information Retrieval (ECIR)