jsulz
posted an update 3 days ago
Toward the end of last year, the Xet team provided an inside look into the foundations of how we plan to enable rapid experimentation and iteration for the AI builders on the Hub: https://huggingface.co/blog/from-files-to-chunks

But it turns out chunks aren't all you need!

Our goal is to bring:
🚀 Faster uploads
⏬ Speedy downloads
💪 All without sacrificing your workflow

To do that, we need the infrastructure and system design to back it up. As we prepare to roll out the first Xet-backed repositories on the Hub, we wrote up a post explaining the nitty-gritty details of the decisions that bring this to life: https://huggingface.co/blog/from-chunks-to-blocks

The post comes complete with an interactive visualization that shows the power of deduplication in action, taking a 191GB repo down to ~97GB and shaving a few hours off the upload time.

The darker a block in the heatmap, the more we dedupe and the less we have to transfer. Clicking on a file's blocks shows all the other files that share those blocks.
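
For a rough feel of what "the more we dedupe, the less we have to transfer" means, here is a minimal Python sketch. It is not the Xet implementation: it assumes fixed-size chunks and SHA-256 hashes purely for illustration (the linked posts describe the real chunking and block design), and the names `dedup_stats` and `savings_fraction` are hypothetical helpers made up for this example.

```python
import hashlib

# Illustrative chunk size only; real systems use far larger, content-defined chunks.
CHUNK_SIZE = 4


def dedup_stats(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Return (total_bytes, unique_bytes) across all files.

    A chunk counts toward unique_bytes only the first time its hash is seen,
    so chunks shared between files are stored (and transferred) once.
    """
    seen: set[bytes] = set()
    total = 0
    unique = 0
    for data in files.values():
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(chunk)
    return total, unique


def savings_fraction(total: float, stored: float) -> float:
    """Fraction of bytes that never need to be stored or transferred."""
    return 1 - stored / total


if __name__ == "__main__":
    # Two toy "files" that share their first two chunks.
    files = {"model-a": b"AAAABBBBCCCC", "model-b": b"AAAABBBBDDDD"}
    total, unique = dedup_stats(files)
    print(f"toy example: {total} bytes total, {unique} bytes after dedup")

    # Same arithmetic applied to the repo sizes quoted in the post.
    print(f"191GB -> ~97GB is roughly {savings_fraction(191, 97):.0%} saved")
```

Applied to the numbers above, going from 191GB to ~97GB works out to roughly half of the bytes never being transferred at all.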

Check it out and explore for yourself! xet-team/quantization-dedup