Paper summary: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, by Google Research, Brain Team

This paper shows how a Transformer can be applied directly to sequences of image patches for image classification, without relying on CNNs or on hybrid approaches that combine CNNs with attention.
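To make "applied directly to image patches" concrete, here is a minimal NumPy sketch of the patch-splitting step. The function name `image_to_patches`, the (H, W, C) layout, and the 224x224 RGB input are assumptions chosen for illustration; this is not the authors' code.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration; the name and array layout are assumptions.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# Assumed example input: one 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values
```

Each row of `tokens` plays the role of one "word". In the paper, every flattened patch is then linearly projected to the model dimension and combined with a position embedding before entering the Transformer encoder; those steps are omitted here.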
 
Annotated paper can be found here

· Deep learning, Image classification