Which model is used as LLM-as-a-judge?

Figure: Flowcharts illustrating the research frameworks Huggingface Open DR, GPT Researcher, Open Deep Research, and Test-Time Diffusion DR.

The evaluation of the Test-Time Diffusion Deep Researcher (TTD-DR) uses Gemini-1.5-pro as its LLM-as-a-judge. The judge was calibrated against human ratings so that its assessments of the long-form reports produced by the research agents align with human judgment[1].
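The source does not describe how the calibration against human ratings was measured. One common way to check judge-human alignment is to compute how often the judge's side-by-side verdict matches the human verdict. The sketch below assumes that setup; the `Comparison` schema and `agreement_rate` function are hypothetical and not taken from the TTD-DR work.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One side-by-side comparison of two reports (hypothetical schema)."""
    human_pick: str  # "A", "B", or "tie" as chosen by human raters
    judge_pick: str  # "A", "B", or "tie" as chosen by the LLM judge


def agreement_rate(comparisons: list[Comparison]) -> float:
    """Fraction of comparisons where the judge's verdict matches the human verdict."""
    if not comparisons:
        raise ValueError("need at least one comparison")
    matches = sum(1 for c in comparisons if c.human_pick == c.judge_pick)
    return matches / len(comparisons)


if __name__ == "__main__":
    sample = [
        Comparison("A", "A"),
        Comparison("B", "B"),
        Comparison("A", "B"),
        Comparison("tie", "tie"),
    ]
    print(f"judge-human agreement: {agreement_rate(sample):.2f}")  # prints 0.75
```

A higher agreement rate on a held-out set of human-rated comparisons would suggest the judge can stand in for human raters; in practice one might also track per-category disagreements before trusting the judge.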

Beyond scoring final outputs, Gemini-1.5-pro acts as a judge that provides fitness scores and textual critiques, which are used to improve the quality of the outputs generated by the TTD-DR framework[1].
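To illustrate how a judge's fitness score and critique can drive refinement, here is a generic critique-and-revise loop. It is a minimal sketch, not the TTD-DR implementation: `self_evolve`, the `Judge`/`Reviser` signatures, and the toy judge and reviser are all hypothetical stand-ins for real calls to a model such as Gemini-1.5-pro.

```python
from typing import Callable

# Hypothetical signatures: a judge maps text to (fitness score, critique),
# and a reviser rewrites text given a critique.
Judge = Callable[[str], tuple[float, str]]
Reviser = Callable[[str, str], str]


def self_evolve(draft: str, judge: Judge, revise: Reviser, steps: int = 3) -> tuple[str, float]:
    """Revise a draft using judge critiques, keeping the highest-scoring version."""
    best_score, critique = judge(draft)
    best_text, current = draft, draft
    for _ in range(steps):
        current = revise(current, critique)   # act on the latest critique
        score, critique = judge(current)      # re-score the revised text
        if score > best_score:
            best_text, best_score = current, score
    return best_text, best_score


# Toy stand-ins so the sketch runs without any model API.
def toy_judge(text: str) -> tuple[float, str]:
    n = text.count("citation")
    return float(n), "add a citation" if n < 2 else "looks good"


def toy_revise(text: str, critique: str) -> str:
    return text + " [citation]" if "add" in critique else text


if __name__ == "__main__":
    print(self_evolve("Report.", toy_judge, toy_revise))
```

Keeping the best-scoring version, rather than always the latest, guards against a revision that the judge rates lower than its predecessor.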