shubhamc-iiitd committed on
Commit b6faf8d · verified · 1 Parent(s): ebfd7c9

Update README.md

Files changed (1)
  1. README.md +80 -0
README.md CHANGED
@@ -0,0 +1,80 @@
+ # Fine-tuned ESM2 Protein Classifier (pdac_pred_llm)
+
+ This repository contains a fine-tuned ESM2 model for binary protein sequence classification, published on Hugging Face as `shubhamc-iiitd/pdac_pred_llm`. Given a protein sequence, the model predicts a binary label.
+
+ ## Model Description
+
+ - **Base Model:** ESM2-t33-650M-UR50D (fine-tuned)
+ - **Fine-tuning Task:** Binary protein classification.
+ - **Architecture:** The ESM2 backbone followed by a linear classification head applied to the mean of the final-layer (layer 33) token representations.
+ - **Input:** Protein amino acid sequences.
+ - **Output:** Binary classification labels (0 or 1).
+
+ ## Repository Contents
+
+ - `pytorch_model.bin`: The trained model weights.
+ - `alphabet.bin`: The serialized ESM2 alphabet object (used as the tokenizer).
+ - `config.json`: Model configuration; the loading code reads `embedding_dim` and `num_classes` from it.
+ - `README.md`: This file.
+
+ ## Usage
+
+ ### Installation
+
+ 1. Install the required libraries (the `esm` module imported below is distributed on PyPI as `fair-esm`):
+
+ ```bash
+ pip install torch fair-esm biopython huggingface_hub
+ ```
+
+ ### Loading the Model from Hugging Face
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import esm
+ from huggingface_hub import hf_hub_download
+ import json
+
+ # Define the model architecture (same as during training)
+ class ProteinClassifier(nn.Module):
+     def __init__(self, esm_model, embedding_dim, num_classes):
+         super().__init__()
+         self.esm_model = esm_model  # ESM2 backbone
+         self.fc = nn.Linear(embedding_dim, num_classes)  # linear classification head
+
+     def forward(self, tokens):
+         # The backbone runs without gradient tracking, so no gradients flow into it
+         with torch.no_grad():
+             results = self.esm_model(tokens, repr_layers=[33])
+         # Average the final-layer (layer 33) representations over the token dimension
+         embeddings = results["representations"][33].mean(1)
+         output = self.fc(embeddings)
+         return output
+
+ # Download the model files from Hugging Face
+ repo_id = "shubhamc-iiitd/pdac_pred_llm"
+ model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
+ alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
+ config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
+
+ # Select the device up front so the weights can be mapped onto it
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # Load the ESM2 model (used as backbone)
+ model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
+
+ # Load the configuration
+ with open(config_path, 'r') as f:
+     config = json.load(f)
+
+ # Initialize the classifier
+ classifier = ProteinClassifier(model, embedding_dim=config['embedding_dim'], num_classes=config['num_classes'])
+
+ # Load the fine-tuned weights; map_location lets GPU-saved weights load on a CPU-only machine
+ classifier.load_state_dict(torch.load(model_weights_path, map_location=device))
+
+ # Move the whole classifier (backbone and head) to the device and switch to inference mode
+ classifier = classifier.to(device)
+ classifier.eval()
+
+ # Load the alphabet saved with the model. It is a pickled Python object, so
+ # recent PyTorch releases need weights_only=False; only do this for files you trust.
+ alphabet = torch.load(alphabet_path, weights_only=False)
+ batch_converter = alphabet.get_batch_converter()
+ ```
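
The loading snippet stops at `batch_converter`, which in the `esm` package takes a list of `(name, sequence)` tuples. As a dependency-free sketch of preparing that list from FASTA text, something like the following might work. The helper names `read_fasta_records` and `keep_valid` and the restriction to the 20 standard amino acids are illustrative assumptions, not part of this repository; Biopython's `SeqIO` could do the parsing instead.

```python
from typing import Iterable, List, Tuple

# Assumption: only the 20 standard amino acids are accepted here. The ESM
# alphabet also has tokens for some ambiguous residues, so relax this if needed.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted lines into (name, sequence) tuples."""
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""  # first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def keep_valid(records: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop empty sequences and sequences with non-standard characters."""
    return [(n, s) for n, s in records if s and set(s) <= STANDARD_AMINO_ACIDS]


example = """>seq1 example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 contains an invalid character
MKT1AY
""".splitlines()

data = keep_valid(read_fasta_records(example))
print(data)  # [('seq1', 'MKTAYIAKQRQISFVKSHFS')]
```

The resulting `data` list is in the shape the batch converter expects, for example `_, _, tokens = batch_converter(data)`, after which `classifier(tokens.to(device))` would return one row of class scores per sequence.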