Insights on evaluating large language models

"GPT-5 is a unified system with a smart and fast model that answers most questions." (OpenAI)
"We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy." (OpenAI)
"safe-completions seek to maximize helpfulness subject to the safety policy...

Quotes about AI safety and deception

"We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy." (OpenAI)
"Deception can also be learned during reinforcement learning in post-training." (OpenAI)
"While reasoning models provide unique affordances to observe deception, underst...

How well do you know GPT-5's multilingual abilities?

Q1. Which of the following languages does GPT-5 perform well in? 🌍
- Spanish
- Latin
- Ancient Greek
- Esperanto
Answer: Spanish

Q2. How does GPT-5's performance in multilingual contexts compare to existing models? 🌐
- It performs worse than previous models.
- It performs on par with existing mode...

Reflections on model transparency and monitoring

"Monitoring a reasoning model’s chain of thought was highly effective at detecting misbehavior." (Unknown)
"Our commitment to keep our reasoning models' chain of thought as monitorable as possible allows us to conduct studies." (Unknown)
"Vigilant monitoring supports improvements in reasoning models...

Quick facts: GPT-5's defense against adversarial attacks

- gpt-5-thinking is trained to follow OpenAI's safety policies.
- A two-tiered system monitors and blocks unsafe prompts and generations.
- User accounts may be banned for attempting to extract harmful bio information.
- Safe-completions training improves the model's response safety.
- Extensive red teaming ide...
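To make the idea of a two-tiered monitor concrete, here is a minimal sketch, assuming a cheap first tier that flags possibly sensitive text and escalates only flagged text to a slower, more careful second tier. Both tier functions (`fast_topic_filter`, `careful_harm_check`) are hypothetical keyword stand-ins for real classifiers; nothing here reflects OpenAI's actual models or rules.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical tier 1: a cheap check that flags text touching a sensitive topic.
def fast_topic_filter(text: str) -> bool:
    sensitive_terms = ("pathogen", "toxin")  # illustrative only
    lowered = text.lower()
    return any(term in lowered for term in sensitive_terms)


# Hypothetical tier 2: a slower check deciding whether flagged text is actually unsafe.
def careful_harm_check(text: str) -> bool:
    lowered = text.lower()
    return "step-by-step" in lowered or "synthesize" in lowered


@dataclass
class Verdict:
    allowed: bool
    tier_reached: int


def moderate(
    text: str,
    tier1: Callable[[str], bool] = fast_topic_filter,
    tier2: Callable[[str], bool] = careful_harm_check,
) -> Verdict:
    # Most traffic is cleared by tier 1, so the expensive check runs rarely.
    if not tier1(text):
        return Verdict(allowed=True, tier_reached=1)
    # Only escalated text pays for the second, more careful check.
    return Verdict(allowed=not tier2(text), tier_reached=2)


def guard_exchange(prompt: str, generation: str) -> bool:
    """Apply the same two-tier check to both the prompt and the model's output."""
    return moderate(prompt).allowed and moderate(generation).allowed
```

The design choice being illustrated is cost: a fast filter screens everything, and the careful check only sees the small fraction of traffic the filter escalates.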

Is deception reduced in GPT-5 models?

Yes, deception has been reduced in GPT-5 models. The developers implemented several measures to mitigate deceptive behaviors that were observed in previous models. The gpt-5-thinking model has shown a significantly lower deception rate compared to OpenAI o3, with a rate of 2.1% versus 4.8% for OpenA...
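The two rates quoted above can be compared in absolute or relative terms, and the two framings read quite differently. The short calculation below uses only the 2.1% and 4.8% figures from the text; the helper name `relative_reduction` is ours, for illustration.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction by which new_rate is lower than old_rate."""
    return (old_rate - new_rate) / old_rate


o3_rate = 0.048             # OpenAI o3 deception rate, from the text
gpt5_thinking_rate = 0.021  # gpt-5-thinking deception rate, from the text

absolute_drop = o3_rate - gpt5_thinking_rate                     # 2.7 percentage points
relative_drop = relative_reduction(o3_rate, gpt5_thinking_rate)  # 0.5625, about 56%

print(f"{absolute_drop * 100:.1f} pp lower, {relative_drop:.0%} relative reduction")
# prints: 2.7 pp lower, 56% relative reduction
```

In other words, gpt-5-thinking's measured rate is less than half of o3's on that evaluation, even though the absolute gap is under three percentage points.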

How are biological risks mitigated?

Biological risks are mitigated through a comprehensive approach outlined in OpenAI’s Preparedness Framework. This includes implementing a multi-layered defense stack that combines model safety training, real-time automated monitoring, and robust system-level protections. The model is trained to refu...

What is the PAC framework?

The PAC (Probably Approximately Correct) framework is a theoretical framework that analyzes whether a model (i.e., a product) derived via a machine learning algorithm (i.e., a generalization process) from a random sample of data can be expected to achieve a low prediction error on new data from the ...
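One classic PAC result makes "probably" and "approximately" quantitative. For a finite hypothesis class H in the realizable setting (some hypothesis fits the data perfectly), a learner that outputs any hypothesis consistent with m training samples has true error at most ε with probability at least 1 - δ whenever m ≥ (ln|H| + ln(1/δ)) / ε. This is a standard textbook bound, not a result stated in the text; the sketch below simply evaluates it.

```python
import math


def pac_sample_size(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Samples sufficient for a consistent learner over a finite hypothesis
    class (realizable case) to reach true error <= epsilon with probability
    >= 1 - delta:  m >= (ln|H| + ln(1/delta)) / epsilon.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    bound = (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    return math.ceil(bound)


# 1,000 hypotheses, 5% error tolerance ("approximately"), 99% confidence ("probably").
print(pac_sample_size(1_000, epsilon=0.05, delta=0.01))  # 231
```

Note that the required sample size grows only logarithmically in |H| and 1/δ but linearly in 1/ε, so tightening the error tolerance is the expensive knob.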

How does transfer learning relate to analogy?

The text indicates that analogy is related to generalization processes in both humans and AI. It states that analogy involves the transformation or adaptation of knowledge or schemas to fit a new context. This resembles the transfer learning approach, where knowledge gained from one domain or task i...
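One simple form of transfer learning is warm-starting: parameters learned on a source task become the starting point for a related target task. The toy sketch below, with made-up data, fits a line on a data-rich source task (y = 3x + 1) and then gives a cold-started and a warm-started model the same small budget of updates on a three-example target task (y = 3x + 1.5). It illustrates the general idea only and is not drawn from the text.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(data, w=0.0, b=0.0, lr=0.05, steps=100):
    """Fit y ~ w*x + b by plain gradient descent on mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Source task: plenty of data from y = 3x + 1.
source = [(x / 10, 3 * (x / 10) + 1) for x in range(-20, 21)]
# Related target task: same slope, shifted intercept, only three examples.
target = [(-1.0, -1.5), (0.0, 1.5), (1.0, 4.5)]

w_src, b_src = gradient_descent(source, steps=2000)

# Same small budget of target updates, two different starting points.
cold = gradient_descent(target, steps=10)                    # learn from scratch
warm = gradient_descent(target, w=w_src, b=b_src, steps=10)  # reuse source knowledge

print(f"cold start MSE: {mse(*cold, target):.3f}")
print(f"warm start MSE: {mse(*warm, target):.3f}")
```

The warm-started model only has to adjust its intercept, which mirrors the analogy framing: existing knowledge is adapted to a new context rather than rebuilt.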

Who excels at few-shot learning?

The text states that 'humans excel at generalising from a few examples, compositionality, and robust generalisation to noise, shifts, and Out-Of-Distribution (OOD) data'. This highlights human proficiency in few-shot learning, where they can effectively apply knowledge from limited data points. In ...
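For comparison, a common machine-learning approach to few-shot classification averages the handful of labelled examples per class into a "prototype" and assigns a new input to the nearest prototype, in the spirit of prototypical networks. The minimal sketch below assumes the feature vectors are already given; in practice they would come from a pretrained encoder, and the "cat"/"dog" data here is made up.

```python
from math import dist


def fit_prototypes(support):
    """Average each class's few labelled examples into a single prototype."""
    prototypes = {}
    for label, points in support.items():
        dims = len(points[0])
        prototypes[label] = tuple(
            sum(p[i] for p in points) / len(points) for i in range(dims)
        )
    return prototypes


def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Two-shot support set: two labelled 2-D feature vectors per class (made-up data).
support = {
    "cat": [(0.9, 0.1), (1.1, 0.3)],
    "dog": [(0.1, 1.0), (0.3, 0.8)],
}
prototypes = fit_prototypes(support)
print(classify((1.0, 0.4), prototypes))  # cat
```

All of the "learning" here lives in the feature space; with good features, two examples per class can suffice, which is why such methods depend heavily on pretraining.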

Highlighting compositionality across AI systems

- Statistical methods excel at processing large-scale data and at efficient inference.
- Compositionality is a universal principle observed not only in humans but also in many other species.
- Neurosymbolic AI combines statistical and analytical models for robust generalisation.
- Statistical approaches enable universal...

5 key AI evaluation methods explained

- AI alignment aims to make AI systems act according to our preferences.
- Humans excel at generalising from a few examples and at dealing with noise.
- Statistical AI models struggle with out-of-domain generalisation.
- Explainable mechanisms are key to achieving alignment in human-AI teaming.
- Evaluating AI's g...
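A basic way to expose the out-of-domain weakness mentioned above is to report accuracy on in-distribution data and on shifted data side by side, and to track the gap between them. The sketch below is a toy: the "true" labelling rule, the model, and both test sets are made up for illustration.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[float, int]  # (feature, label)


def accuracy(model: Callable[[float], int], data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def generalisation_gap(model, in_dist, shifted) -> float:
    """In-distribution accuracy minus accuracy on shifted (out-of-distribution) data."""
    return accuracy(model, in_dist) - accuracy(model, shifted)


# Toy world (made up): the true label is 1 when the fractional part of x exceeds 0.5.
def true_label(x: float) -> int:
    return int(x % 1 > 0.5)


# Hypothetical model that only ever saw x in [0, 1) and learned the shortcut "x > 0.5".
def model(x: float) -> int:
    return int(x > 0.5)


in_dist = [(x, true_label(x)) for x in (0.1, 0.3, 0.7, 0.9)]
shifted = [(x, true_label(x)) for x in (1.1, 1.3, 1.7, 1.9)]

print(accuracy(model, in_dist), accuracy(model, shifted), generalisation_gap(model, in_dist, shifted))
```

The model is perfect in distribution yet drops to chance on the shifted set, a pattern that a single in-distribution score would hide.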

What does over-parametrisation risk in continual learning?

In continual learning, over-parameterization can increase the risk of catastrophic forgetting, which refers to the model's tendency to lose previously learned information when it is adapted to new data or tasks. Larger models may exhibit a higher degree of catastrophic forgetting as they struggle to...
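Catastrophic forgetting is usually measured by comparing performance on an earlier task before and after training on a later one. The toy sketch below trains a one-variable linear model on task A (y = 2x), then naively on a conflicting task B (y = -2x + 1), and reports how much task A loss rises. The tasks are made up, and the sketch does not reproduce the text's claim about model size; it only shows what forgetting looks like and how it is quantified.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, w, b, lr=0.05, steps=500):
    """Plain gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


xs = [x / 10 for x in range(-10, 11)]
task_a = [(x, 2 * x) for x in xs]       # task A: y = 2x
task_b = [(x, -2 * x + 1) for x in xs]  # task B: y = -2x + 1, conflicts with A

w, b = train(task_a, 0.0, 0.0)
loss_a_before = mse(w, b, task_a)

# Naive sequential training: no replay buffer and no regularisation toward old weights.
w, b = train(task_b, w, b)
loss_a_after = mse(w, b, task_a)

forgetting = loss_a_after - loss_a_before
print(f"task A loss: {loss_a_before:.4f} -> {loss_a_after:.4f} (forgetting {forgetting:.2f})")
```

Because both tasks share the same two parameters, fitting B overwrites A entirely; continual-learning methods such as rehearsal or weight regularisation aim to limit exactly this increase.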

What is compositionality in AI?

Compositionality in AI refers to the ability to produce novel combinations of known components, which is essential for systematic generalization. It is a fundamental principle in the design of traditional, logic-based systems. Many statistical methods have struggled with compositional...
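The logic-based view of compositionality can be shown with a tiny interpreter, loosely modelled on the command-to-action style of benchmarks such as SCAN. Each rule defines the meaning of a larger command purely from the meanings of its parts, so combinations never seen before (for example "jump thrice and look") are handled automatically. The vocabulary and rules are illustrative assumptions, not taken from the text.

```python
PRIMITIVES = {"walk": ["WALK"], "jump": ["JUMP"], "look": ["LOOK"]}
REPEATS = {"twice": 2, "thrice": 3}


def interpret(command: str) -> list[str]:
    """Meaning of 'X and Y' is the meaning of X followed by the meaning of Y;
    meaning of 'V twice' is the meaning of V repeated twice."""
    if " and " in command:
        left, right = command.split(" and ", 1)
        return interpret(left) + interpret(right)
    words = command.split()
    if len(words) == 2 and words[1] in REPEATS:
        return interpret(words[0]) * REPEATS[words[1]]
    if len(words) == 1 and words[0] in PRIMITIVES:
        return list(PRIMITIVES[words[0]])
    raise ValueError(f"cannot interpret: {command!r}")


print(interpret("jump thrice and look"))  # ['JUMP', 'JUMP', 'JUMP', 'LOOK']
```

A purely statistical model trained on examples of such commands has to discover these rules from data, which is where the systematic-generalisation failures mentioned in the text tend to appear.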

Quiz: Understanding generalisation in cognitive science and AI

Q1. What is generalization in cognitive science commonly defined as? 🤔
- The process of transferring knowledge or skills from specific instances to new contexts
- The ability to memorize facts
- A type of data analysis
- The act of learning a new skill
Answer: The process of transferring knowledge ...