shubhamc-iiitd/pdac_pred_llm

Transcriptomics

shubhamc-iiitd committed on Mar 14

Commit b6faf8d · verified · 1 Parent(s): ebfd7c9

Update README.md

Files changed (1)

README.md +80 -0

	@@ -0,0 +1,80 @@

+# Fine-tuned ESM2 Protein Classifier (pdac_pred_llm)
+This repository contains a fine-tuned ESM2 model for protein sequence classification, hosted on Hugging Face as `shubhamc-iiitd/pdac_pred_llm`. Given a protein sequence, the model predicts a binary label (0 or 1).
+## Model Description
+-   **Base Model:** ESM2-t33-650M-UR50D (Fine-tuned)
+-   **Fine-tuning Task:** Binary protein classification.
+-   **Architecture:** The ESM2 backbone, mean pooling over the final-layer token representations, and a linear classification head.
+-   **Input:** Protein amino acid sequences.
+-   **Output:** Binary classification labels (0 or 1).
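As a rough illustration of the output described above, the linear head can be read as producing one score (logit) per class, with the predicted label being the index of the largest score. The sketch below assumes two classes and that the class index is the label; `logits_to_label` is a hypothetical helper name, and this is not necessarily how the author post-processes predictions.

```python
import math


def logits_to_label(logits):
    """Return (label, probabilities) for one sequence's class scores.

    Hypothetical helper: assumes the classification head emits one logit
    per class and that the class index is the label (0 or 1).
    """
    # Subtract the maximum for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted label is the index of the most probable class.
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


label, probs = logits_to_label([0.2, 1.5])
print(label, [round(p, 3) for p in probs])
```

The softmax step is optional when only the label is needed, since the largest logit always corresponds to the largest probability.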
+## Repository Contents
+-   `pytorch_model.bin`: The trained model weights.
+-   `alphabet.bin`: The ESM2 alphabet (used as a tokenizer).
+-   `config.json`: Configuration file for the model.
+-   `README.md`: This file.
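The loading code in this README reads two fields from `config.json`: `embedding_dim` and `num_classes`. A minimal sketch of a sanity check on that file follows. The example values are assumptions (1280 is the hidden size of ESM2-t33-650M, and 2 classes would match the binary task); the actual file may differ or contain extra fields, and `validate_config` is a hypothetical helper.

```python
import json

# Keys read by the loading code in this README.
REQUIRED_KEYS = ("embedding_dim", "num_classes")


def validate_config(config):
    """Check that a parsed config.json has the fields the classifier needs."""
    for key in REQUIRED_KEYS:
        if key not in config:
            raise KeyError(f"config.json is missing '{key}'")
        if not isinstance(config[key], int) or config[key] <= 0:
            raise ValueError(f"'{key}' must be a positive integer")
    return config


# Assumed contents, for illustration only; the real file may differ.
example = json.loads('{"embedding_dim": 1280, "num_classes": 2}')
validate_config(example)
```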
+## Usage
+### Installation
+1.  Install the required libraries:
+    ```bash
+    pip install torch fair-esm biopython huggingface_hub
+    ```
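After installing, it can help to confirm that the modules the loading code imports are actually importable. The check below is a small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the list uses import names, which can differ from pip package names (for example, `biopython` is imported as `Bio`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by (or listed for) the code in this README.
required = ["torch", "esm", "Bio", "huggingface_hub"]
missing = missing_modules(required)
print("Missing modules:", missing or "none")
```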
+### Loading the Model from Hugging Face
+```python
+import torch
+import torch.nn as nn
+import esm
+from huggingface_hub import hf_hub_download
+import json
+# Define the model architecture (same as during training)
+class ProteinClassifier(nn.Module):
+    def __init__(self, esm_model, embedding_dim, num_classes):
+        super(ProteinClassifier, self).__init__()
+        self.esm_model = esm_model
+        self.fc = nn.Linear(embedding_dim, num_classes)
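+    def forward(self, tokens):
+        # The backbone runs without gradients, so only the linear head receives them here.
+        with torch.no_grad():
+            results = self.esm_model(tokens, repr_layers=[33])
+        # Mean-pool the layer-33 representations over the sequence dimension
+        # (this average includes special and padding tokens).
+        embeddings = results["representations"][33].mean(1)
+        output = self.fc(embeddings)
+        return output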
+# Download the model files from Hugging Face
+repo_id = "shubhamc-iiitd/pdac_pred_llm"
+model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
+alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
+config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
+# Load the ESM2 model (used as backbone)
+model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
+# Load the configuration
+with open(config_path, 'r') as f:
+    config = json.load(f)
+# Initialize the classifier
+classifier = ProteinClassifier(model, embedding_dim=config['embedding_dim'], num_classes=config['num_classes'])
+# Load the model weights (map_location lets GPU-saved weights load on CPU-only machines)
+classifier.load_state_dict(torch.load(model_weights_path, map_location="cpu"))
+classifier.eval()
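+# Load the alphabet. It is a pickled Python object rather than a tensor state dict,
+# so PyTorch 2.6+ needs weights_only=False; only do this for files you trust.
+alphabet = torch.load(alphabet_path, weights_only=False)
+batch_converter = alphabet.get_batch_converter()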
+# Move the whole classifier (backbone and linear head) to the device
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+classifier = classifier.to(device)
+```
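With `classifier`, `batch_converter`, and `device` defined as above, inference could look roughly like the sketch below. It is not the author's inference code: `prepare_batch` and `predict` are hypothetical helpers, the set of accepted residue letters is an assumption based on the standard ESM alphabet, and the predicted label is assumed to be the index of the largest logit.

```python
# Assumed residue letters: the 20 standard amino acids plus X, B, Z, U, O.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def prepare_batch(sequences):
    """Turn raw strings into (name, sequence) pairs for the ESM batch converter."""
    data = []
    for i, seq in enumerate(sequences):
        seq = seq.strip().upper()
        unknown = set(seq) - VALID_RESIDUES
        if not seq or unknown:
            raise ValueError(
                f"sequence {i} is empty or has invalid residues: {sorted(unknown)}"
            )
        data.append((f"seq{i}", seq))
    return data


def predict(classifier, batch_converter, sequences, device):
    """Return one predicted class index per input sequence."""
    import torch  # imported lazily so prepare_batch works without PyTorch

    # Keep the backbone and the linear head on the same device as the tokens.
    classifier = classifier.to(device).eval()
    # The batch converter returns (labels, strings, token tensor).
    _, _, tokens = batch_converter(prepare_batch(sequences))
    with torch.no_grad():
        logits = classifier(tokens.to(device))
    return logits.argmax(dim=1).tolist()


# Usage, once the loading code above has run:
# labels = predict(classifier, batch_converter, ["MKTAYIAKQR"], device)
```

Validating sequences before tokenization keeps malformed input from reaching the model, where it would otherwise surface as a harder-to-read tokenizer error.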