Quiz: Report evaluation metrics in AI research

What are two metrics used for evaluating long-form LLM responses in research?
Difficulty: Easy
What methodology is employed to evaluate the performance of deep research agents?
Difficulty: Medium
How is the correctness of responses measured for multi-hop short-form QA tasks?
Difficulty: Hard