Q1. What are two metrics used for evaluating long-form LLM responses in research?[🎓]
- Helpfulness and Comprehensiveness
- Accuracy and Clarity
- Speed and Efficiency
- Novelty and Relevance
Answer: Helpfulness and Comprehensiveness
Q2. What methodology is employed to evaluate the performance of deep research agents?[📊]
- Side-by-side quality comparison
- Randomized controlled trials
- Expert opinion surveys
- Cross-sectional analysis
Answer: Side-by-side quality comparison
Q3. How is the correctness of responses measured for multi-hop short-form QA tasks?[🔍]
- By comparing long-form answers with ground-truth answers
- Through peer reviews
- Using automatic scoring systems
- Via user satisfaction ratings
Answer: By comparing long-form answers with ground-truth answers

Quiz: Report evaluation metrics in AI research
