---
license: gpl-3.0
language:
- en
base_model:
- facebook/esm2_t6_8M_UR50D
tags:
- Cancer
- Transcriptomics
- biology
---

# Fine-tuned ESM2 Protein Classifier (pdac_pred_llm)

This repository contains a fine-tuned ESM2 model for protein sequence classification, published on the Hugging Face Hub as `shubhamc-iiitd/pdac_pred_llm`. Given a protein sequence, the model predicts a binary label.

## Model Description

- **Base Model:** ESM2-t33-650M-UR50D (fine-tuned)
- **Fine-tuning Task:** Binary protein classification.
- **Architecture:** The ESM2 backbone followed by a linear classification head applied to the mean-pooled final-layer representations.
- **Input:** Protein amino acid sequences.
- **Output:** Binary classification labels (0 or 1).
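
Because the model takes raw amino acid sequences as input, it can help to normalize and check sequences before tokenizing them. The following is a minimal sketch, assuming inputs should contain only the 20 canonical one-letter amino acid codes. The helper name `clean_sequence` and this strict policy are illustrative assumptions, not part of this repository; the ESM2 alphabet also has tokens for ambiguous residues such as `X`, so a looser check may suit your data better.

```python
# Hypothetical helper, not shipped with this repository.
# Assumption: only the 20 canonical amino acid codes are accepted.
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(sequence):
    """Strip whitespace, upper-case, and reject non-canonical residue codes."""
    # Remove spaces and line breaks (common when pasting from FASTA files)
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("Sequence is empty")
    invalid = sorted(set(cleaned) - CANONICAL_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"Unexpected residue codes: {invalid}")
    return cleaned
```

Running this check up front surfaces malformed inputs as explicit errors instead of letting them reach the tokenizer.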

## Repository Contents

- `pytorch_model.bin`: The trained model weights.
- `alphabet.bin`: The ESM2 alphabet (used as a tokenizer).
- `config.json`: Configuration file for the model.
- `README.md`: This file.
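
The loading code in the Usage section reads two keys from `config.json`: `embedding_dim` and `num_classes`. As a hedged sketch, the check below verifies that both are present and are positive integers before the classifier is built. The function name `load_classifier_config` and the validation rules are assumptions for illustration; the file may contain other keys that are not covered here.

```python
import json

# The two keys read by the loading code in this README
REQUIRED_KEYS = ("embedding_dim", "num_classes")


def load_classifier_config(path):
    """Read config.json and check the keys the classifier constructor needs.

    Hypothetical helper: the validation rules are an assumption, not the
    repository's own logic.
    """
    with open(path, "r") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing required keys: {missing}")
    for key in REQUIRED_KEYS:
        if not isinstance(config[key], int) or config[key] <= 0:
            raise ValueError(f"{key} must be a positive integer, got {config[key]!r}")
    return config
```

Failing early here gives a clearer message than a shape mismatch raised later by `load_state_dict`.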

## Usage

### Installation

Install the required libraries. The `esm` module imported in the loading code is distributed on PyPI as `fair-esm`:

```bash
pip install torch fair-esm biopython huggingface_hub
```

### Loading the Model from Hugging Face

```python
import json

import esm
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


# Define the model architecture (same as during training)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super().__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        # No gradients flow through the ESM2 backbone
        with torch.no_grad():
            results = self.esm_model(tokens, repr_layers=[33])
        # Mean-pool the layer-33 token representations over the sequence dimension
        embeddings = results["representations"][33].mean(1)
        output = self.fc(embeddings)
        return output


# Download the model files from Hugging Face
repo_id = "shubhamc-iiitd/pdac_pred_llm"
model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load the ESM2 model (used as backbone)
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()

# Load the configuration
with open(config_path, "r") as f:
    config = json.load(f)

# Initialize the classifier
classifier = ProteinClassifier(
    model,
    embedding_dim=config["embedding_dim"],
    num_classes=config["num_classes"],
)

# Load the model weights onto the CPU first so this also works without a GPU
classifier.load_state_dict(torch.load(model_weights_path, map_location="cpu"))
classifier.eval()

# Load the alphabet. It is a pickled Python object rather than a tensor file,
# so recent PyTorch versions need weights_only=False to unpickle it.
alphabet = torch.load(alphabet_path, weights_only=False)
batch_converter = alphabet.get_batch_converter()

# Move the whole classifier (backbone and linear head) to the device if needed
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
classifier = classifier.to(device)
```
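
Once the classifier and batch converter are loaded, prediction amounts to tokenizing the sequences, running a forward pass, and taking the highest-scoring class. The function below is a minimal sketch of that step, not the author's inference script. The name `predict`, the softmax over logits, and the example sequence in the comment are assumptions for illustration; it expects `classifier` and `batch_converter` as created in the loading code above.

```python
import torch


def predict(classifier, batch_converter, sequences, device="cpu"):
    """Classify a list of (name, sequence) pairs.

    Hypothetical helper. Returns a list of predicted class indices and a
    list of per-class probabilities, one entry per input sequence.
    """
    classifier = classifier.to(device)
    classifier.eval()
    # The ESM batch converter returns (labels, strings, token tensor)
    _, _, tokens = batch_converter(sequences)
    tokens = tokens.to(device)
    with torch.no_grad():
        logits = classifier(tokens)  # shape: (batch, num_classes)
    # Assumption: logits are turned into probabilities with a softmax
    probs = torch.softmax(logits, dim=1)
    return probs.argmax(dim=1).tolist(), probs.tolist()


# Example call (the sequence is made up for illustration):
# data = [("protein1", "MKTAYIAKQRQISFVKSHFSRQ")]
# labels, probs = predict(classifier, batch_converter, data, device)
```

Returning probabilities alongside the labels lets you apply a decision threshold other than the default argmax if your use case calls for it.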