What differentiates native agent models from modular agent frameworks?


Native agent models differ from modular agent frameworks in that workflow knowledge is embedded directly within the agent’s model through orientational learning[1]. Tasks are learned and executed end to end, unifying perception, reasoning, memory, and action within a single, continuously evolving model[1]. This approach is fundamentally data-driven, so the agent can adapt to new tasks, interfaces, or user needs without relying on manually crafted prompts or predefined rules[1].
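To make the contrast concrete, here is a deliberately toy Python sketch. Everything in it is hypothetical: `ModularFramework`, `NativeAgent`, their `step` and `train` methods, and the plain-string "screens" that stand in for screenshots are illustrative assumptions, not UI-TARS APIs. The framework's behavior lives in a hand-written rule table split across separate perception and planning modules, while the native agent's behavior lives in learned parameters (a dictionary standing in for model weights) that also condition on the agent's memory of its previous action.

```python
from dataclasses import dataclass, field

# Hypothetical toy contrast; neither class reflects a real UI-TARS API.


@dataclass
class ModularFramework:
    """Design-driven: separate modules glued together by hand-written rules."""
    rules: dict[str, str]  # predefined rule table written by a human

    def perceive(self, screen: str) -> list[str]:
        # Hand-crafted perception module: split the screen into elements.
        return screen.split()

    def plan(self, elements: list[str]) -> str:
        # Hand-crafted reasoning module: first element with a rule wins.
        for element in elements:
            if element in self.rules:
                return self.rules[element]
        return "give_up"  # no rule covers this interface

    def step(self, screen: str) -> str:
        return self.plan(self.perceive(screen))


@dataclass
class NativeAgent:
    """Data-driven: one model maps (memory, observation) directly to an action."""
    # Learned parameters keyed by (previous action, screen); stands in for weights.
    params: dict[tuple[str, str], str] = field(default_factory=dict)
    last_action: str = "start"  # memory carried inside the agent's own state

    def train(self, demos: list[tuple[str, str, str]]) -> None:
        # Learn from (previous action, screen, action) demonstrations instead
        # of writing rules; a stand-in for gradient-based training.
        for prev, screen, action in demos:
            self.params[(prev, screen)] = action

    def step(self, screen: str) -> str:
        # Perception, memory, reasoning and action happen in a single call.
        action = self.params.get((self.last_action, screen), "give_up")
        self.last_action = action
        return action
```

In this sketch, supporting a renamed button means a human edits `rules` for the framework, whereas the native agent only needs new demonstrations passed to `train`. A real native agent would learn a neural network over screenshots rather than fill a lookup table.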

Modular frameworks, by contrast, are design-driven and lack the ability to learn and generalize across tasks without continuous human involvement[1]. Native agent models lend themselves naturally to online or lifelong learning paradigms[1]. Deploying the agent in real-world GUI environments yields new interaction data, on which the model can be fine-tuned or further trained to handle novel challenges[1].
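The deploy, collect, and fine-tune cycle can be sketched in the same hedged spirit. In this minimal Python example, `ToyAgent`, `online_learning_round`, the `reward` check, and the exploration over `candidate_actions` are all hypothetical assumptions; the "fine-tuning" step simply overwrites a lookup table, standing in for gradient updates on a real model. Each round deploys the agent, keeps only the interactions that succeeded, and trains on them.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    """Hypothetical stand-in for a native agent model."""
    # Learned mapping from screen description to action (stands in for weights).
    policy: dict[str, str] = field(default_factory=dict)

    def act(self, screen: str) -> str:
        # Deployed behavior: use what the model has learned, else do nothing.
        return self.policy.get(screen, "noop")

    def fine_tune(self, trajectories: list[tuple[str, str]]) -> None:
        # Toy "training": absorb each successful (screen, action) pair.
        for screen, action in trajectories:
            self.policy[screen] = action


def online_learning_round(
    agent: ToyAgent,
    screens: list[str],
    candidate_actions: list[str],
    reward: Callable[[str, str], bool],
) -> int:
    """Deploy the agent, collect successful interactions, then fine-tune.

    Returns the number of new training examples collected this round.
    """
    collected: list[tuple[str, str]] = []
    for screen in screens:
        # Try the currently learned action first, then explore alternatives.
        first = agent.act(screen)
        tries = [first] + [a for a in candidate_actions if a != first]
        for action in tries:
            if reward(screen, action):  # keep only interactions that succeeded
                collected.append((screen, action))
                break
    agent.fine_tune(collected)
    return len(collected)
```

For example, if a hypothetical interface moves the save action behind a menu, a reward that accepts `click:menu` on an `editor` screen and `click:save` on a `menu_open` screen lets one round collect both steps and update `agent.policy`, with no one editing a rule by hand.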