
Compared to the Gemini 1.5 models, the 2.0 models were substantially safer but over-refused on a wide variety of benign user requests[1]. In Gemini 2.5, the focus has been on improving helpfulness and instruction following, specifically to reduce refusals on such benign requests[1]. This means training Gemini to answer questions as accurately as possible while prioritizing safety and minimizing unhelpful responses[1].
The new models are more willing to engage with prompts where previous models may have over-refused, and this nuance can impact automated safety scores[1]. Manual review confirmed that the losses were overwhelmingly false positives or not egregious; they were concentrated around explicit requests to produce sexually suggestive or hateful content, mostly in creative use-cases[1]. There was no increase in violations outside these specific contexts[1].