Love it! Thanks for writing this! I'm curious about the impact on the FLOP count. Since we're reducing the size of the KV cache, the number of operations in F(QK.T)V will also decrease, but is this reduction being replaced by the compression