Which model outperformed others on the OSWorld benchmark?

[1] github.com