bird-of-paradise committed
Commit f628f42 · 1 Parent(s): 550eb56

Update README.md: clarify this is an attention implementation, not a trained model

Files changed (1)
README.md +1 -1
README.md CHANGED
@@ -1,6 +1,6 @@
 # DeepSeek Multi-Latent Attention
 
-A PyTorch implementation of the Multi-Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. MLA significantly reduces KV cache for efficient inference while maintaining model performance through its innovative architecture.
+This repository provides a PyTorch implementation of the Multi-Latent Attention (MLA) mechanism introduced in the DeepSeek-V2 paper. **This is not a trained model, but rather a modular attention implementation** that significantly reduces KV cache for efficient inference while maintaining model performance through its innovative architecture. It can be used as a drop-in attention module in transformer architectures.
 
 ## Key Features
 
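To illustrate what "a drop-in attention module that reduces KV cache" means in practice, here is a minimal PyTorch sketch of the MLA idea. The class name `NaiveMultiLatentAttention`, its constructor arguments, and the cache-handling `forward` signature are illustrative assumptions, not the repository's actual API: the point is only that keys and values are compressed into one low-rank latent per token, so the cache stores `d_latent` floats per token instead of full per-head keys and values.

```python
# Minimal sketch of the multi-latent attention idea, NOT the repository's API.
# Keys/values are compressed into one low-rank latent per token, so the KV
# cache stores d_latent floats per token instead of 2 * d_model.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveMultiLatentAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        # Down-projection: this latent is the only per-token tensor cached.
        self.w_kv_down = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections reconstruct per-head keys and values from the latent.
        self.w_k_up = nn.Linear(d_latent, d_model, bias=False)
        self.w_v_up = nn.Linear(d_latent, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, kv_cache: Optional[torch.Tensor] = None):
        b, t, _ = x.shape
        latent = self.w_kv_down(x)                       # (b, t, d_latent)
        if kv_cache is not None:                         # decode step: extend cache
            latent = torch.cat([kv_cache, latent], dim=1)
        s = latent.size(1)                               # total cached length
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # Causal masking omitted for brevity; prefill would pass is_causal=True.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out), latent                     # latent is the new cache


# Drop-in usage: a decode step reuses the compressed latent cache.
attn = NaiveMultiLatentAttention(d_model=256, n_heads=8, d_latent=64)
y, cache = attn(torch.randn(2, 16, 256))                      # prefill 16 tokens
y_next, cache = attn(torch.randn(2, 1, 256), kv_cache=cache)  # 1 new token
```

The cache shape is the whole story: `cache` holds 64 values per token here rather than the 512 a standard multi-head KV cache would need at `d_model=256`, which is where MLA's inference memory saving comes from. The actual DeepSeek-V2 mechanism also handles details such as decoupled rotary position embeddings that this toy version skips.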