Post
2911
Toward the end of last year, the Xet team provided an inside look into the foundations of how we plan to enable rapid experimentation and iteration for the AI builders on the Hub: https://huggingface.co/blog/from-files-to-chunks
But it turns out chunks aren't all you need!
Our goal is to bring:
π Faster uploads
β¬ Speedy downloads
πͺ All without sacrificing your workflow
To do that, we need the infrastructure and system and design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty gritty details of the decisions that bring this to life https://huggingface.co/blog/from-chunks-to-blocks
Complete with an interactive visualization that shows the power of deduplication in action - taking a 191GB repo to ~97GB and shaving a few hours off upload speeds.
The darker each block in the heatmap, the more we dedupe, the less we have to transfer. Clicking on a file's blocks shows all other files that share blocks.
Check it out and explore for yourself! xet-team/quantization-dedup
But it turns out chunks aren't all you need!
Our goal is to bring:
π Faster uploads
β¬ Speedy downloads
πͺ All without sacrificing your workflow
To do that, we need the infrastructure and system and design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty gritty details of the decisions that bring this to life https://huggingface.co/blog/from-chunks-to-blocks
Complete with an interactive visualization that shows the power of deduplication in action - taking a 191GB repo to ~97GB and shaving a few hours off upload speeds.
The darker each block in the heatmap, the more we dedupe, the less we have to transfer. Clicking on a file's blocks shows all other files that share blocks.
Check it out and explore for yourself! xet-team/quantization-dedup