Commit f628f42
Parent(s): 550eb56
Update README.md: clarify this is an attention implementation, not a trained model
README.md CHANGED
@@ -1,6 +1,6 @@
 # DeepSeek Multi-Latent Attention
 
-
+This repository provides a PyTorch implementation of the Multi-Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. **This is not a trained model, but rather a modular attention implementation** that significantly reduces the KV cache for efficient inference while maintaining model performance. It can be used as a drop-in attention module in transformer architectures.
 
 ## Key Features
 
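For context on what the added paragraph describes, here is a minimal, self-contained sketch of the low-rank KV compression idea behind MLA. It is not this repository's code: the class name, constructor arguments, and dimensions are assumptions, and the decoupled rotary embedding from the DeepSeek-V2 paper is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedMLA(nn.Module):
    """Illustrative sketch only (not this repository's API): keys and values
    are reconstructed from a small shared latent, so only the latent needs to
    be cached at inference time instead of full per-head keys and values."""

    def __init__(self, d_model: int, n_heads: int, kv_latent_dim: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a compact latent (this is what would be cached).
        self.kv_down = nn.Linear(d_model, kv_latent_dim)
        # Up-project the latent back to per-head keys and values.
        self.k_up = nn.Linear(kv_latent_dim, d_model)
        self.v_up = nn.Linear(kv_latent_dim, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)  # (b, t, kv_latent_dim), much smaller than full K/V
        k = self.k_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))


# Quick shape check with arbitrary illustrative dimensions.
x = torch.randn(2, 16, 256)
attn = SimplifiedMLA(d_model=256, n_heads=8, kv_latent_dim=64)
print(attn(x).shape)  # torch.Size([2, 16, 256])
```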