Which AI model scored highest in humor caption ratings?

GPT-4o achieved the highest humor score among the models evaluated on English datasets[3]. In Russian, Clean Dataset achieved the highest humor score, followed by GPT-4o[3]. In a separate study, HumorSkills captions were rated only 0.08 points lower, on a 5-point scale, than top-rated human captions (p = 0.053)[1]. Because this p-value is above the conventional 0.05 threshold, the difference is not statistically significant, so HumorSkills captions cannot be said to be less funny than the best human captions[1].

In the same study, the HumorSkills system was rated significantly funnier than the VLM baseline, GPT-4o[1]. A separate source reports that GPT-4 can explain the mechanics of jokes[2]. Another study's results indicate that humor is rooted in the frontal lobe of the cerebral cortex[4].