How did "Vision Transformers" improve image recognition?
