Q1. 🤖 What are AI benchmarks primarily used for?
- Training AI models
- Measuring and comparing AI model performance
- Deploying AI applications
- Generating synthetic data
Answer: Measuring and comparing AI model performance
Q2. 📚 An AI model is given a sentence such as "She picked up the heavy book and..." and must choose the most plausible continuation from several options. Which benchmark is designed to test this kind of commonsense natural language inference?
- MMLU
- SuperGLUE
- HellaSwag
- HumanEval
Answer: HellaSwag
Q3. 💻 A software development team wants to evaluate an AI's ability to autonomously identify and fix bugs in a large codebase. The AI is given a GitHub issue and the repository, and must generate a code patch that resolves the issue. Which benchmark evaluates an LLM's ability to resolve real-world GitHub issues in this way?
- MBPP
- HumanEval
- SWE-bench
- DS-1000
Answer: SWE-bench

AI benchmarks and evaluation acronyms quiz
