What is the main challenge in training native GUI agents?


One of the primary challenges in training native GUI agents is the data bottleneck^[1]. Training an end-to-end agent model demands data that integrates all components in a unified workflow, capturing the interplay between perception, reasoning, memory, and action^[1]. Comprehensive, high-quality data with rich workflow knowledge from human experts has rarely been recorded, which limits the ability of native agents to generalize across diverse real-world scenarios and hinders their scalability and robustness^[1].
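To make the "unified workflow" requirement concrete, here is a minimal sketch of what a single training record might need to hold. All class and field names (`Step`, `Trajectory`, `context_for`) and the action strings are hypothetical illustrations, not the data format used by UI-TARS or any other project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One recorded step of an expert workflow (hypothetical schema)."""
    screenshot_path: str  # perception: what the screen looked like
    thought: str          # reasoning: why the expert chose this action
    action: str           # action: what was done, as a plain string


@dataclass
class Trajectory:
    """A full task recording: the unit of data an end-to-end agent would train on."""
    task: str
    steps: List[Step] = field(default_factory=list)

    def context_for(self, index: int, window: int = 3) -> List[Step]:
        """Memory: return up to `window` steps that precede step `index`."""
        start = max(0, index - window)
        return self.steps[start:index]


# Illustrative data only; the file names and actions are made up.
traj = Trajectory(task="Open the settings page")
traj.steps.append(Step("shot_0.png", "The gear icon opens settings.", "click(x=120, y=48)"))
traj.steps.append(Step("shot_1.png", "The settings menu is visible.", "finished()"))
```

The point of the sketch is that each record carries the screen, the reasoning, the action, and the preceding history together. Data that captures only one of these (for example, raw click logs without the accompanying reasoning) would not meet the requirement described above.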

A second challenge is the high information density of GUI environments, which makes robust agents harder to develop^[1]. Native GUI agent models must recognize and interpret evolving user interfaces effectively^[1].


