
Organization
Nov 25, 2024: Anonymized Runs for Pilot Released
AutoJudge Pilot at TREC 25
Obtain the Data Set
- Please obtain the anonymized participant runs from the dataset page; the password is the same as for TREC “active members”.
- Obtain the necessary corpora and task instructions from the web pages of the host tracks.
Submission
Pilot participants will hand in:
- (MUST) a leaderboard, in a format congruent to the output of trec_eval -q, containing per-topic evaluation scores as well as overall evaluation scores under the topic ID “all”
- (Optional) judgments as a qrels file
- (Optional) nugget banks (format TBD)
- (Optional) any additional useful auto-judge artifact that arises
- (Optional) annotations in the Rubric Autograder Workbench format.
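The two plain-text formats named above (a trec_eval -q style leaderboard and a qrels file) can be sketched in Python as follows. This is a minimal sketch, not an official converter: it assumes one run's scores are held in a dict keyed by measure name, and judgments as (topic, doc-id, grade) triples; the function names are hypothetical. The leaderboard lines mimic the tab-separated measure/topic/value layout that trec_eval -q prints, with an “all” line holding the arithmetic mean over topics (appropriate for averaged measures such as nDCG, but not for count measures like num_rel, which trec_eval sums). The qrels lines follow the standard TREC layout “topic iteration doc-id grade” with the unused iteration column set to 0. How several runs are combined into one leaderboard, and what serves as the doc-id for generated responses, is not specified here; check the pilot's instructions.

```python
from statistics import fmean


def leaderboard_lines(scores: dict[str, dict[str, float]]) -> list[str]:
    """Render one run's scores in a trec_eval -q-like layout (hypothetical helper).

    scores maps measure name -> {topic_id: value} (assumed input shape).
    Per-topic lines come first, then one "all" line per measure.
    """
    measures = sorted(scores)
    topics = sorted({t for per_topic in scores.values() for t in per_topic})
    lines = []
    for topic in topics:
        for m in measures:
            if topic in scores[m]:
                # measure name left-justified, then topic ID and value, tab-separated
                lines.append(f"{m:<22}\t{topic}\t{scores[m][topic]:.4f}")
    for m in measures:
        # "all" = arithmetic mean over the topics scored for this measure
        lines.append(f"{m:<22}\tall\t{fmean(scores[m].values()):.4f}")
    return lines


def qrels_lines(judgments: list[tuple[str, str, int]]) -> list[str]:
    """Render (topic_id, doc_id, grade) triples as TREC qrels lines (hypothetical helper)."""
    # columns: topic, iteration (unused, conventionally 0), doc id, relevance grade
    return [f"{topic} 0 {doc} {grade}" for topic, doc, grade in judgments]


if __name__ == "__main__":
    example = {"ndcg_cut_10": {"t1": 0.5, "t2": 0.25}}
    print("\n".join(leaderboard_lines(example)))
    print("\n".join(qrels_lines([("t1", "doc7", 2)])))
```

Writing the returned lines to a file with one line per entry yields a file that can be diffed against real trec_eval -q output to confirm the layout matches.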
I explicitly invite “manual runs”, i.e., judgments produced with human involvement. These could, for example, be based on manually created nuggets combined with automatic scanning. Manual runs could also give someone the opportunity to develop new support tools for human judges.
I plan to keep the pilot open for submissions until the end of January. However, I would greatly appreciate it if you could share intermediate results at the TREC 25 workshop (Dec 11).
When official leaderboards and manual assessments are shared with participants of the host tracks, we will share them with you as well, so that you can perform a meta-evaluation.
Ideally, this will allow everybody to drill into questions such as how consistent different human judges are compared with different LLMs, how judgments relate to nuggets, and even whether manual judgments can be retrofitted to new nugget banks.
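One common way to put a number on agreement between two judges (two humans, a human and an LLM, or two LLMs) is Cohen's kappa over the items both have judged. The sketch below assumes each judge's qrels have been loaded into a dict keyed by (topic, doc-id) with an integer grade; the function name is hypothetical, and this is one possible agreement measure, not one prescribed by the pilot. Plain kappa treats grades as unordered categories; a weighted kappa or Krippendorff's alpha would respect the ordering of graded relevance.

```python
from collections import Counter


def cohens_kappa(qrels_a: dict[tuple[str, str], int],
                 qrels_b: dict[tuple[str, str], int]) -> float:
    """Cohen's kappa between two judges on the (topic, doc) pairs both judged."""
    shared = qrels_a.keys() & qrels_b.keys()
    n = len(shared)
    if n == 0:
        raise ValueError("the two qrels share no judged (topic, doc) pairs")
    # observed agreement: fraction of shared items with identical grades
    observed = sum(qrels_a[k] == qrels_b[k] for k in shared) / n
    # chance agreement, from each judge's marginal grade distribution
    counts_a = Counter(qrels_a[k] for k in shared)
    counts_b = Counter(qrels_b[k] for k in shared)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both judges use one identical grade")
    return (observed - expected) / (1.0 - expected)
```

With the official manual assessments and an automatic judge's output loaded in this shape, the same function compares humans against LLMs or LLMs against each other.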
This meta-evaluation will be informal, but I encourage you to incorporate it into your TREC notebook for the proceedings.
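At the leaderboard level, a common meta-evaluation asks how similarly two leaderboards (for example, the official one and an AutoJudge one) rank the same runs, measured with Kendall's tau. The sketch below assumes each leaderboard has been reduced to a dict mapping run ID to its “all” score for one measure; the function name is hypothetical. It computes the tau-b variant, which adjusts for tied scores; if SciPy is available, scipy.stats.kendalltau computes the same statistic by default.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(board_a: dict[str, float], board_b: dict[str, float]) -> float:
    """Kendall's tau-b between two leaderboards over the runs they share."""
    runs = sorted(board_a.keys() & board_b.keys())
    pairs = concordant = discordant = tied_a = tied_b = 0
    for r, s in combinations(runs, 2):
        pairs += 1
        diff_a = board_a[r] - board_a[s]
        diff_b = board_b[r] - board_b[s]
        if diff_a == 0:
            tied_a += 1
        if diff_b == 0:
            tied_b += 1
        if diff_a * diff_b > 0:
            concordant += 1  # both leaderboards order r and s the same way
        elif diff_a * diff_b < 0:
            discordant += 1  # the leaderboards disagree on this pair
    denom = sqrt((pairs - tied_a) * (pairs - tied_b))
    if denom == 0:
        raise ValueError("need at least two shared runs with distinct scores in both leaderboards")
    return (concordant - discordant) / denom
```

A value of 1.0 means identical system orderings and -1.0 a fully reversed ordering.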
Also, please share any prior findings and opinions with the AutoJudge organizers, so we can make the official AutoJudge track more useful.
Proposal for TREC AutoJudge 2026
Track Coordinators
Main:
- Laura Dietz, University of New Hampshire, USA, dietz@cs.unh.edu
- Naghmeh Farzi, University of New Hampshire, USA, naghmeh.farzi@unh.edu
- Eugene Yang, Johns Hopkins University, USA, eugene.yang@jhu.edu
- (Content Modification) Oleg Zendel, RMIT University, Australia, oleg.zendel@rmit.edu.au
Advisory:
- Charles L. A. Clarke, University of Waterloo, Canada, claclark@plg.uwaterloo.ca
- Hossein A. Rahmani, University College London, UK, hossein.rahmani.22@ucl.ac.uk
TIRA integration:
- Maik Fröbe, Friedrich-Schiller-Universität Jena, Germany, maik.froebe@uni-jena.de
- Tim Hagen, University of Kassel and hessian.AI, Germany, tim.hagen@uni-kassel.de
- Martin Potthast, University of Kassel, hessian.AI, and ScaDS.AI, Germany, martin.potthast@uni-kassel.de
Host Track Liaisons:
- RAG: Ronak Pradeep, University of Waterloo, Canada, rpradeep@uwaterloo.ca
- RAGTIME: Dawn Lawrie, Johns Hopkins University, USA, lawrie@jhu.edu
- RAGTIME: Eugene Yang, Johns Hopkins University, USA, eugene.yang@jhu.edu
- DRAGUN: Dake Zhang, University of Waterloo, Canada, dake.zhang@uwaterloo.ca
- BioGen: Deepak Gupta, NIH, USA, deepak.gupta@nih.gov