AI benchmarks and evaluation acronyms quiz

🤖 What are AI benchmarks primarily used for?
Difficulty: Easy
📚 An AI model is given a sentence like "She picked up the heavy book and..." and must choose the most plausible continuation from several options. Which AI benchmark is designed to test this kind of commonsense natural language inference?
Difficulty: Medium
💻 A software development team wants to evaluate an AI's ability to autonomously identify and fix bugs in a large codebase. The AI is given GitHub issues and access to the repository, and must generate code patches that fix the reported problems. Which benchmark specifically evaluates an LLM's ability to resolve real-world GitHub issues in this way?
Difficulty: Hard
