# Autoencoder Implementation for Hugging Face Transformers
A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.
## Features
- **Full Hugging Face Integration**: Compatible with `AutoModel`, `AutoConfig`, and `AutoTokenizer` patterns
- **Standard Training Workflows**: Works with `Trainer`, `TrainingArguments`, and all HF training utilities
- **Model Hub Compatible**: Save and share models on Hugging Face Hub with `push_to_hub()`
- **Flexible Architecture**: Configurable encoder-decoder architecture with various activation functions
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual losses
- **Seven Autoencoder Types**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
- **Learnable Preprocessing**: Neural Scaler and Normalizing Flow preprocessors (2D and 3D tensors)
- **Extensible Design**: Easy to extend with new autoencoder variants and custom loss functions
- **Production Ready**: Proper serialization, checkpointing, and inference support
## Installation
```bash
uv sync  # or: pip install -e .
```
Dependencies (see `pyproject.toml`):
- `torch>=2.8.0`
- `transformers>=4.55.2`
- `numpy>=2.3.2`
- `scikit-learn>=1.7.1`
- `datasets>=4.0.0`
- `accelerate>=1.10.0`
## Architecture
Note: This repository has been trimmed to essentials for easy reuse and distribution. Example scripts and tests were removed by request.
The implementation consists of three main components:
### 1. AutoencoderConfig
Configuration class that inherits from `PretrainedConfig`:
- Defines model architecture parameters
- Handles validation and serialization
- Enables `AutoConfig.from_pretrained()` functionality
### 2. AutoencoderModel
Base model class that inherits from `PreTrainedModel`:
- Implements the encoder-decoder architecture
- Provides the latent space representation
- Returns structured outputs with `AutoencoderOutput`
### 3. AutoencoderForReconstruction
Task-specific model for reconstruction:
- Adds reconstruction loss calculation
- Compatible with `Trainer` for easy training
- Returns `AutoencoderForReconstructionOutput` with loss
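For a sense of how the pieces fit together, here is a minimal sketch that uses the base `AutoencoderModel` purely as a feature extractor; it assumes the same `input_values` keyword shown in the Quick Start below:
```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel
import torch

config = AutoencoderConfig(input_dim=784, hidden_dims=[512, 256], latent_dim=64)
model = AutoencoderModel(config)

# The base model returns latents and reconstructions but computes no loss;
# use AutoencoderForReconstruction when you need a training objective.
outputs = model(input_values=torch.randn(8, 784))
latents = outputs.last_hidden_state  # shape: (8, 64)
```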
## Quick Start
### Basic Usage
```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction
import torch

# Create configuration
config = AutoencoderConfig(
    input_dim=784,               # Input dimensionality (e.g., 28x28 images flattened)
    hidden_dims=[512, 256],      # Encoder hidden layers
    latent_dim=64,               # Latent space dimension
    activation="gelu",           # Activation function (18+ options available)
    reconstruction_loss="mse",   # Loss function (12+ options available)
    autoencoder_type="classic",  # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",  # or "normalizing_flow"
)

# Create model
model = AutoencoderForReconstruction(config)

# Forward pass
input_data = torch.randn(32, 784)  # Batch of 32 samples
outputs = model(input_values=input_data)
print(f"Reconstruction loss: {outputs.loss}")
print(f"Latent shape: {outputs.last_hidden_state.shape}")
print(f"Reconstructed shape: {outputs.reconstructed.shape}")
```
### Training with Hugging Face Trainer
```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
            "labels": self.data[idx],  # For an autoencoder, input = target
        }

# Prepare data
train_dataset = AutoencoderDataset(your_training_data)
val_dataset = AutoencoderDataset(your_validation_data)

# Training arguments
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    eval_strategy="steps",  # named evaluation_strategy in older transformers releases
    eval_steps=500,
    save_steps=1000,
    load_best_model_at_end=True,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train
trainer.train()

# Save model
model.save_pretrained("./my_autoencoder")
config.save_pretrained("./my_autoencoder")
```
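A saved checkpoint can be reloaded later with the usual `from_pretrained` pattern; a minimal inference sketch, using the same path as the save calls above:
```python
import torch
from modeling_autoencoder import AutoencoderForReconstruction

model = AutoencoderForReconstruction.from_pretrained("./my_autoencoder")
model.eval()  # disable dropout/batch-norm updates for inference

with torch.no_grad():
    outputs = model(input_values=torch.randn(4, 784))
print(outputs.reconstructed.shape)  # matches the input shape
```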
### Using AutoModel Framework
```python
from register_autoencoder import register_autoencoder_models
from transformers import AutoConfig, AutoModel

# Register models with the AutoModel framework
register_autoencoder_models()

# Now you can use standard HF patterns
config = AutoConfig.from_pretrained("./my_autoencoder")
model = AutoModel.from_pretrained("./my_autoencoder")

# Use the model
outputs = model(input_values=your_data)
```
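To share a checkpoint on the Hub (the `push_to_hub()` feature listed above), something like the following sketch should work; the repo name is hypothetical, and you need to be authenticated (e.g., via `huggingface-cli login`). For custom-code models, registering the classes lets others load them with `trust_remote_code=True`:
```python
# Register the custom classes so their code is uploaded alongside the weights
AutoencoderConfig.register_for_auto_class()
AutoencoderForReconstruction.register_for_auto_class("AutoModel")

model.push_to_hub("your-username/my-autoencoder")   # hypothetical repo name
config.push_to_hub("your-username/my-autoencoder")

# Later, from anywhere:
# model = AutoModel.from_pretrained("your-username/my-autoencoder", trust_remote_code=True)
```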
## Configuration Options
The `AutoencoderConfig` class supports extensive customization:
```python
config = AutoencoderConfig(
    input_dim=784,                  # Input dimension
    hidden_dims=[512, 256, 128],    # Encoder hidden layers
    latent_dim=64,                  # Latent space dimension
    activation="gelu",              # Activation function (see full list below)
    dropout_rate=0.1,               # Dropout rate (0.0 to 1.0)
    use_batch_norm=True,            # Use batch normalization
    tie_weights=False,              # Tie encoder/decoder weights
    reconstruction_loss="mse",      # Loss function (see full list below)
    autoencoder_type="variational", # Autoencoder type (see types below)
    beta=0.5,                       # Beta parameter for β-VAE
    temperature=1.0,                # Temperature for Gumbel softmax
    noise_factor=0.1,               # Noise factor for denoising AE
    # Recurrent autoencoder parameters
    rnn_type="lstm",                # RNN type: "lstm", "gru", "rnn"
    num_layers=2,                   # Number of RNN layers
    bidirectional=True,             # Bidirectional encoding
    sequence_length=None,           # Fixed sequence length (None for variable)
    teacher_forcing_ratio=0.5,      # Teacher forcing ratio during training
    # Learnable preprocessing parameters
    use_learnable_preprocessing=False,  # Enable learnable preprocessing
    preprocessing_type="none",          # "none", "neural_scaler", "normalizing_flow"
    preprocessing_hidden_dim=64,        # Hidden dimension for preprocessing networks
    preprocessing_num_layers=2,         # Number of layers in preprocessing networks
    learn_inverse_preprocessing=True,   # Learn the inverse transformation
    flow_coupling_layers=4,             # Number of coupling layers for flows
)
```
### Available Activation Functions
**Standard Activations:**
- `relu`, `leaky_relu`, `relu6`, `elu`, `prelu`
- `tanh`, `sigmoid`, `hardsigmoid`, `hardtanh`
- `gelu`, `swish`, `silu`, `hardswish`
- `mish`, `softplus`, `softsign`, `tanhshrink`, `threshold`
### Available Loss Functions
**Regression Losses:**
- `mse` - Mean Squared Error
- `l1` - L1/MAE Loss
- `huber` - Huber Loss
- `smooth_l1` - Smooth L1 Loss
**Classification/Probability Losses:**
- `bce` - Binary Cross Entropy
- `kl_div` - KL Divergence
- `focal` - Focal Loss
**Similarity Losses:**
- `cosine` - Cosine Similarity Loss
- `ssim` - Structural Similarity Loss
- `perceptual` - Perceptual Loss
**Segmentation Losses:**
- `dice` - Dice Loss
- `tversky` - Tversky Loss
### Available Autoencoder Types
**Classic Autoencoder (`classic`)**
- Standard encoder-decoder architecture
- Direct reconstruction loss minimization
**Variational Autoencoder (`variational`)**
- Probabilistic latent space with mean and variance
- KL divergence regularization
- Reparameterization trick for sampling
**Beta-VAE (`beta_vae`)**
- Variational autoencoder with an adjustable β parameter
- Better disentanglement of latent factors
**Denoising Autoencoder (`denoising`)**
- Adds noise to the input during training
- Learns robust representations
- Configurable noise factor (see the configuration sketch after this list)
**Sparse Autoencoder (`sparse`)**
- Encourages sparse latent representations
- L1 regularization on latent activations
- Useful for feature selection
**Contractive Autoencoder (`contractive`)**
- Penalizes large gradients of the latent w.r.t. the input
- Learns smooth manifold representations
- Robust to small input perturbations
**Recurrent Autoencoder (`recurrent`)**
- LSTM/GRU/RNN encoder-decoder architecture
- Bidirectional encoding for better sequence representations
- Variable-length sequence support with padding
- Teacher forcing during training for stable learning
- Sequence-to-sequence reconstruction
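The type-specific knobs compose with the shared configuration; as a sketch, denoising and sparse variants using only fields documented above:
```python
denoising_config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[256, 128],
    latent_dim=32,
    autoencoder_type="denoising",
    noise_factor=0.2,  # strength of the corruption applied during training
)

sparse_config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[256, 128],
    latent_dim=32,
    autoencoder_type="sparse",  # adds an L1 penalty on latent activations
)

model = AutoencoderForReconstruction(denoising_config)
```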
## Model Outputs
### AutoencoderOutput
```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
    attentions: Tuple[torch.FloatTensor] = None     # Not used
```
### AutoencoderForReconstructionOutput
```python
@dataclass
class AutoencoderForReconstructionOutput(ModelOutput):
    loss: torch.FloatTensor = None                  # Reconstruction loss
    reconstructed: torch.FloatTensor = None         # Reconstructed input
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    hidden_states: Tuple[torch.FloatTensor] = None  # Intermediate states
```
## Advanced Usage
### Custom Loss Functions
You can easily extend the model with custom loss functions:
```python
class CustomAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Custom loss implementation
        return your_custom_loss(reconstructed, target)
```
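As a concrete, hypothetical example, a subclass that blends MSE and L1, assuming the `_compute_reconstruction_loss(reconstructed, target)` hook shown above:
```python
import torch.nn.functional as F

class MixedLossAutoencoder(AutoencoderForReconstruction):
    """Illustrative subclass: weighted blend of MSE and L1 reconstruction losses."""

    def _compute_reconstruction_loss(self, reconstructed, target):
        mse = F.mse_loss(reconstructed, target)
        l1 = F.l1_loss(reconstructed, target)
        return 0.8 * mse + 0.2 * l1
```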
### Recurrent Autoencoder for Sequences
Well suited to time series, text, and other sequential data:
```python
config = AutoencoderConfig(
    input_dim=50,                # Feature dimension per timestep
    latent_dim=32,               # Compressed representation size
    autoencoder_type="recurrent",
    rnn_type="lstm",             # or "gru", "rnn"
    num_layers=2,                # Number of RNN layers
    bidirectional=True,          # Bidirectional encoding
    teacher_forcing_ratio=0.7,   # Teacher forcing during training
    sequence_length=None,        # Variable-length sequences
)

# Usage with sequence data
model = AutoencoderForReconstruction(config)
batch_size, seq_len = 16, 100
sequence_data = torch.randn(batch_size, seq_len, config.input_dim)
outputs = model(input_values=sequence_data)
```
### Learnable Preprocessing
Deep learning-based data normalization that adapts to your data:
```python
# Neural Scaler - learnable alternative to StandardScaler
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
    preprocessing_hidden_dim=64,
)

# Normalizing Flow - invertible transformations
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="normalizing_flow",
    flow_coupling_layers=4,
)

# Works with all autoencoder types and sequence data
model = AutoencoderForReconstruction(config)
outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```
### Variational Autoencoder Extension
The configuration supports variational autoencoders:
```python
config = AutoencoderConfig(
    autoencoder_type="variational",
    beta=0.5,  # β-VAE parameter
    # ... other parameters
)
```
### Integration with the Datasets Library
```python
from datasets import Dataset

# Convert your data to an HF Dataset
dataset = Dataset.from_dict({
    "input_values": your_data_list
})
# Return torch tensors rather than Python lists when batching
dataset.set_format(type="torch", columns=["input_values"])

# Use with Trainer
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    # ... other arguments
)
```
## Testing
This repository has been trimmed to essential files. Example scripts and test files were removed by request. You can create your own quick checks using the Quick Start snippet above.
## Project Structure
```
autoencoder/
├── __init__.py                   # Package initialization
├── configuration_autoencoder.py  # Configuration class
├── modeling_autoencoder.py       # Model implementations
├── register_autoencoder.py       # AutoModel registration
├── pyproject.toml                # Project metadata and dependencies
└── README.md                     # This file
```
## Contributing
This implementation follows Hugging Face conventions and can be easily extended:
1. **Adding new architectures**: Extend `AutoencoderModel` or create new model classes
2. **Custom configurations**: Add parameters to `AutoencoderConfig`
3. **Task-specific heads**: Create new classes like `AutoencoderForReconstruction`
4. **Integration**: Register new models with the AutoModel framework
## References
- [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers)
- [Custom Models Guide](https://huggingface.co/docs/transformers/custom_models)
- [AutoModel Documentation](https://huggingface.co/docs/transformers/model_doc/auto)
## Use Cases
This autoencoder implementation is well suited to:
- **Dimensionality Reduction**: Compress high-dimensional data to lower dimensions
- **Anomaly Detection**: Identify outliers by their reconstruction error (see the sketch after this list)
- **Data Denoising**: Remove noise from corrupted data
- **Feature Learning**: Learn meaningful representations for downstream tasks
- **Data Generation**: Generate new samples similar to the training data
- **Pretraining**: Initialize encoders for other tasks
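For the anomaly-detection use case, a sketch that scores each sample by its reconstruction error; it assumes a trained `model` and a 2D `data` tensor from the snippets above, and the threshold rule is a simple heuristic, not part of the library:
```python
import torch
import torch.nn.functional as F

model.eval()
with torch.no_grad():
    outputs = model(input_values=data)
    # Per-sample mean squared error between input and reconstruction
    errors = F.mse_loss(outputs.reconstructed, data, reduction="none").mean(dim=1)

# Flag samples whose error is far above the batch average
threshold = errors.mean() + 3 * errors.std()
anomaly_indices = (errors > threshold).nonzero(as_tuple=True)[0]
```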
## Model Comparison
| Feature | Standard PyTorch | This Implementation |
|---------|------------------|---------------------|
| HF Integration | ❌ | ✅ |
| AutoModel Support | ❌ | ✅ |
| Trainer Compatible | ❌ | ✅ |
| Hub Integration | ❌ | ✅ |
| Config Management | Manual | ✅ Automatic |
| Serialization | Manual | ✅ Built-in |
| Checkpointing | Manual | ✅ Built-in |
## Performance Tips
1. **Batch Size**: Use larger batch sizes for better GPU utilization
2. **Learning Rate**: Start with 1e-3 and adjust based on convergence
3. **Architecture**: Gradually decrease hidden dimensions for better compression
4. **Regularization**: Use dropout and batch normalization for better generalization
5. **Loss Function**: Choose a loss appropriate to your data type
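Tips 1 and 2 map directly onto `TrainingArguments`; the values below are starting points, not benchmarks:
```python
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    learning_rate=1e-3,               # tip 2: a common starting point
    per_device_train_batch_size=256,  # tip 1: as large as memory allows
    lr_scheduler_type="cosine",       # decay the LR as training converges
    num_train_epochs=20,
)
```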
## License
This implementation is provided as an example and follows the same license terms as Hugging Face Transformers.