Quiz: Report evaluation metrics in AI research

What are two metrics used for evaluating long-form LLM responses in research?
Difficulty: Easy
What methodology is employed to evaluate the performance of deep research agents?
Difficulty: Medium
How is the correctness of responses measured for multi-hop short-form QA tasks?
Difficulty: Hard
