It builds hierarchical feature maps (like a CNN 👀 and unlike ViT) by starting from small patches and merging neighboring patches in deeper layers.
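
The merging step can be illustrated with a small NumPy sketch. This is only a sketch under stated assumptions, not the model's actual implementation: it assumes patches are merged in 2x2 neighborhoods and that their features are simply concatenated, and the function name `merge_patches` and the grid sizes are hypothetical. Real models typically follow the concatenation with a learned projection, which is omitted here.

```python
import numpy as np


def merge_patches(feature_map: np.ndarray) -> np.ndarray:
    """Merge each 2x2 group of neighboring patches into one patch.

    Assumption: 2x2 merging by feature concatenation (illustrative only).
    feature_map has shape (H, W, C) with H and W even.
    Returns an array of shape (H // 2, W // 2, 4 * C).
    """
    h, w, c = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must both be even")
    # Split the patch grid into 2x2 blocks: (H/2, 2, W/2, 2, C)
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    # Move the two in-block axes next to the channel axis: (H/2, W/2, 2, 2, C)
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    # Concatenate the four neighbors' features along the channel axis
    return blocks.reshape(h // 2, w // 2, 4 * c)


# Hypothetical 8x8 grid of patch embeddings with 16 channels each
grid = np.random.default_rng(0).normal(size=(8, 8, 16))
stage2 = merge_patches(grid)    # coarser grid, wider features
stage3 = merge_patches(stage2)  # coarser still
print(grid.shape, stage2.shape, stage3.shape)
```

Each call halves the spatial resolution and widens the feature dimension, which is what produces the CNN-like pyramid of feature maps; a ViT, by contrast, keeps a single patch resolution throughout.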