How are distributional shifts measured in AI?

Fig. 1: Comparison of the strengths of humans and statistical ML machines, illustrating the complementary ways they generalise in human-AI teaming scenarios. Humans excel at compositionality, common sense, abstraction from a few examples, and robustness. Statistical ML excels at large-scale data and inference efficiency, inference correctness, handling data complexity, and the universality of approximation. Overgeneralisation biases remain challenging for both humans and machines. Collaborative and explainable mechanisms are key to achieving alignment in human-AI teaming. See Table 3 for a complete overview of the properties of machine methods, including instance-based and analytical machines.

Distributional shifts in AI can be measured using statistical distance measures such as the Kullback-Leibler divergence or the Wasserstein distance, which compare the feature distributions of the training and test sets. Generative models provide an explicit likelihood estimate \(p(x)\) that indicates how typical a sample is of the training distribution. For discriminative models, proxy techniques include computing cosine similarity between embedding vectors and measuring nearest-neighbour distances in a transformed feature space. Additionally, perplexity is used to gauge a large language model's familiarity with an input when direct access to internal representations is not possible[1].
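As a minimal sketch of how these measures might be computed in practice, the snippet below estimates the KL divergence and Wasserstein distance between a training and a test feature distribution, and then uses nearest-neighbour cosine distances in an embedding space as a discriminative-model proxy. The synthetic features, embedding dimensions, and variable names are illustrative assumptions, not taken from the source.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Stand-ins for a 1-D feature from the training and test sets
# (in practice these would be real feature or embedding values).
train_feat = rng.normal(loc=0.0, scale=1.0, size=5000)
test_feat = rng.normal(loc=0.5, scale=1.2, size=5000)  # shifted distribution

# --- Statistical distance measures on binned feature distributions ---
bins = np.histogram_bin_edges(np.concatenate([train_feat, test_feat]), bins=50)
p, _ = np.histogram(train_feat, bins=bins, density=True)
q, _ = np.histogram(test_feat, bins=bins, density=True)
eps = 1e-12  # avoid zero-count bins in the ratio p/q
kl = entropy(p + eps, q + eps)                    # KL(train || test)
wd = wasserstein_distance(train_feat, test_feat)  # earth mover's distance
print(f"KL divergence: {kl:.4f}, Wasserstein distance: {wd:.4f}")

# --- Proxy for discriminative models: k-NN distance in embedding space ---
train_emb = rng.normal(size=(5000, 64))      # stand-in for learned embeddings
test_emb = rng.normal(size=(200, 64)) + 0.8  # shifted test embeddings

nn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(train_emb)
dist, _ = nn.kneighbors(test_emb)
ood_score = dist.mean(axis=1)  # larger distance => less typical of training data
print(f"Mean k-NN cosine distance of test samples: {ood_score.mean():.4f}")
```

The same kNN scoring pattern applies when the embeddings come from a trained network's penultimate layer; for large language models, an analogous familiarity signal is the perplexity \(\exp\) of the average token cross-entropy, which requires only the model's output probabilities rather than its internal representations.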