UI-TARS achieves state-of-the-art results across a range of standard benchmarks, improving on prior models[1]. On the OSWorld benchmark, it scores 24.6 with a 50-step budget and 22.7 with a 15-step budget, outperforming Claude’s 22.0 and 14.9, respectively[2].
With a 15-step budget, UI-TARS-72B (22.7) is comparable to Claude given a 50-step budget (22.0), reaching a similar score with less than a third of the step budget[2]. UI-TARS-72B-DPO sets a new state-of-the-art result of 24.6 on OSWorld with a 50-step budget[2].