# Neural Machine Translation with Attention
A PyTorch implementation of a Sequence-to-Sequence model with Attention for English-Spanish translation.
## Features
- Bidirectional GRU Encoder: Captures context from both directions of the input sequence
- Attention Mechanism: Helps the model focus on relevant parts of the input sequence
- Teacher Forcing: Feeds ground-truth target tokens to the decoder during training for better stability (see the sketch after this list)
- Dynamic Batching: Efficient training with variable sequence lengths
- Hugging Face Integration: Uses MarianTokenizer for robust text processing
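
The training loop itself isn't listed in this README; below is a minimal sketch of how teacher forcing typically slots into a seq2seq decoding loop. The decoder signature and the `teacher_forcing_ratio` name are illustrative assumptions, not code from this repo.

```python
import random
import torch

def decode_with_teacher_forcing(decoder, encoder_outputs, hidden, trg,
                                teacher_forcing_ratio=0.5):
    """Run the decoder step by step; with probability `teacher_forcing_ratio`
    feed it the ground-truth token, otherwise feed it its own prediction."""
    _, trg_len = trg.shape                     # trg: [batch, trg_len]
    outputs = []
    input_tok = trg[:, 0]                      # <sos> token for every sequence
    for t in range(1, trg_len):
        logits, hidden = decoder(input_tok, hidden, encoder_outputs)
        outputs.append(logits)                 # logits: [batch, vocab]
        use_teacher = random.random() < teacher_forcing_ratio
        input_tok = trg[:, t] if use_teacher else logits.argmax(dim=-1)
    return torch.stack(outputs, dim=1)         # [batch, trg_len - 1, vocab]
```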
## Architecture
The model consists of three main components:
- Encoder: Bidirectional GRU network that processes input sequences
- Attention: Computes attention weights for each encoder state
- Decoder: GRU network that generates translations using attention context
```
Input → Encoder → Attention → Decoder → Translation
            ↓          ↓           ↓
       Embeddings   Context   Attention Weights
```
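
The module code isn't reproduced in this README; the sketch below shows one common way to realize the bidirectional GRU encoder and an additive (Bahdanau-style) attention layer matching the diagram above. All shapes, names, and the batch-first layout are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Bidirectional GRU encoder; projects the two final hidden states
    down to the decoder's hidden size."""
    def __init__(self, vocab_size, emb_dim, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, enc_hid_dim, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)

    def forward(self, src):
        embedded = self.embedding(src)                       # [B, S, E]
        outputs, hidden = self.rnn(embedded)                 # outputs: [B, S, 2H]
        # Concatenate the final forward and backward hidden states.
        hidden = torch.tanh(self.fc(torch.cat((hidden[-2], hidden[-1]), dim=1)))
        return outputs, hidden                               # hidden: [B, dec_hid]

class Attention(nn.Module):
    """Additive attention: score each encoder state against the decoder state."""
    def __init__(self, enc_hid_dim, dec_hid_dim):
        super().__init__()
        self.attn = nn.Linear(enc_hid_dim * 2 + dec_hid_dim, dec_hid_dim)
        self.v = nn.Linear(dec_hid_dim, 1, bias=False)

    def forward(self, dec_hidden, encoder_outputs):
        src_len = encoder_outputs.shape[1]
        dec_hidden = dec_hidden.unsqueeze(1).repeat(1, src_len, 1)  # [B, S, dec_hid]
        energy = torch.tanh(self.attn(torch.cat((dec_hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)                   # [B, S]
        return F.softmax(scores, dim=1)                      # attention weights
```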
## Quick Start
- Clone the repository:

```bash
git clone https://github.com/yourusername/nmt-attention.git
cd nmt-attention
```

- Install dependencies:

```bash
pip install torch transformers datasets
```

- Train the model:

```bash
python train.py
```

- Translate text:

```python
from translate import translate

text = "How are you?"
translated = translate(model, text, tokenizer)
print(translated)
```
```python
# Load a saved model for inference
model = Seq2Seq(encoder, decoder, device)
model.load_state_dict(torch.load('LSTM_text_generator.pth'))
model.eval()
```
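
The `translate` helper used in the Quick Start isn't listed in this README; a greedy-decoding sketch is given below. The decoder start token and the model's attribute names (`model.encoder`, `model.decoder`) are assumptions.

```python
import torch

@torch.no_grad()
def translate(model, text, tokenizer, device="cpu", max_len=50):
    """Greedy decoding sketch: encode once, then feed the decoder its own
    best guess at every step until <eos> (or max_len) is reached."""
    model.eval()
    src = tokenizer(text, return_tensors="pt").input_ids.to(device)
    encoder_outputs, hidden = model.encoder(src)
    # Assumed decoder start token; the repo's actual choice may differ.
    input_tok = torch.tensor([tokenizer.pad_token_id], device=device)
    out_ids = []
    for _ in range(max_len):
        logits, hidden = model.decoder(input_tok, hidden, encoder_outputs)
        input_tok = logits.argmax(dim=-1)
        if input_tok.item() == tokenizer.eos_token_id:
            break
        out_ids.append(input_tok.item())
    return tokenizer.decode(out_ids, skip_special_tokens=True)
```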
## Model Performance
Training metrics after 10 epochs:
- Initial Loss: 11.147
- Final Loss: 3.527
- Training Time: ~2 hours on NVIDIA V100
## Hyperparameters

```python
BATCH_SIZE = 32
LEARNING_RATE = 1e-3
CLIP = 1.0
N_EPOCHS = 10
ENC_EMB_DIM = 256
DEC_EMB_DIM = 256
ENC_HID_DIM = 512
DEC_HID_DIM = 512
```
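
These constants plug into training roughly as sketched below, including the gradient clipping that `CLIP` controls; `model`, `tokenizer`, and `train_loader` stand in for the objects built in `train.py`, and the output shape is an assumption.

```python
import torch
import torch.nn as nn

optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)

for epoch in range(N_EPOCHS):
    for src, trg in train_loader:
        optimizer.zero_grad()
        output = model(src, trg)  # assumed shape: [batch, trg_len - 1, vocab]
        loss = criterion(output.reshape(-1, output.size(-1)),
                         trg[:, 1:].reshape(-1))
        loss.backward()
        # CLIP bounds the gradient norm, which keeps GRU training stable
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)
        optimizer.step()
```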
## Dataset
Training uses the `loresiensis/corpus-en-es` dataset from the Hugging Face Hub, which provides English-Spanish sentence pairs.
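
A quick way to pull the corpus, assuming the standard `train` split; check the dataset card for the exact column names:

```python
from datasets import load_dataset

dataset = load_dataset("loresiensis/corpus-en-es")
print(dataset["train"][0])  # inspect one English-Spanish pair
```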
## Contributing
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Attention Is All You Need paper
- Hugging Face for the transformers library and datasets
- PyTorch team for the amazing deep learning framework
⭐️ If you found this project helpful, please consider giving it a star!