Paper summary: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Google Research, Brain Team
This paper shows that a Vision Transformer (ViT) can be applied directly to sequences of image patches for image classification, without relying on CNNs or on hybrid approaches that combine CNNs with attention.
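To make "applied directly to image patches" concrete, here is a minimal NumPy sketch of how an image might be cut into non-overlapping 16x16 patches and flattened into a sequence of "words". The function name `image_to_patches` and the 224x224 RGB input are assumptions for illustration, not code from the paper; the learned linear projection, position embeddings, and Transformer encoder that would consume these tokens are left out.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # Expose the patch grid: (grid_h, patch, grid_w, patch, C)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    # Bring the two grid axes to the front: (grid_h, grid_w, patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One flat vector per patch
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Assumed input size: a 224x224 RGB image
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

Each of the 196 rows plays the role of one token, which is where the "16x16 words" in the title comes from.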
Annotated paper can be found here
· Deep learning, Image classification