---
license: gpl-3.0
language:
- en
base_model:
- facebook/esm2_t6_8M_UR50D
tags:
- Cancer
- Transcriptomics
- biology
---
# Fine-tuned ESM2 Protein Classifier (pdac_pred_llm)

This repository contains a fine-tuned ESM2 model for protein sequence classification, uploaded to Hugging Face as `shubhamc-iiitd/pdac_pred_llm`. The model predicts a binary label for each input protein sequence.

## Model Description

- Base Model: ESM2-t33-650M-UR50D (loaded as `esm2_t33_650M_UR50D`), fine-tuned.
- Fine-tuning Task: Binary protein classification.
- Architecture: The ESM2 backbone followed by a linear classification head. Layer-33 token representations are mean-pooled into one embedding per sequence before the head is applied.
- Input: Protein amino acid sequences.
- Output: Class logits, from which a binary label (0 or 1) is derived.
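The pooling and classification steps can be illustrated without downloading any weights. The sketch below replaces the ESM2 backbone with random tensors, so only the shapes are meaningful. The values `embedding_dim = 1280` (the hidden size of the 650M ESM2 model) and `num_classes = 2` are assumptions for illustration; the real values are read from `config.json`.

```python
import torch
import torch.nn as nn

# Assumed sizes for illustration; the real values come from config.json.
batch_size, seq_len, embedding_dim, num_classes = 2, 7, 1280, 2

# Stand-in for the backbone output: one vector per token.
token_representations = torch.randn(batch_size, seq_len, embedding_dim)

# Averaging over the token axis gives one vector per sequence.
sequence_embeddings = token_representations.mean(dim=1)

# The linear head maps each sequence embedding to class logits.
head = nn.Linear(embedding_dim, num_classes)
logits = head(sequence_embeddings)

print(sequence_embeddings.shape)  # torch.Size([2, 1280])
print(logits.shape)               # torch.Size([2, 2])
```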

## Repository Contents

- `pytorch_model.bin`: The trained model weights.
- `alphabet.bin`: The serialized ESM2 alphabet (used as a tokenizer).
- `config.json`: Configuration file for the model; the loading code reads `embedding_dim` and `num_classes` from it.
- `README.md`: This file.
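The loading code in this README reads two keys from `config.json`: `embedding_dim` and `num_classes`. A minimal sketch of checking that file before building the classifier follows. The helper names and the validation rules are illustrative assumptions, not part of the repository.

```python
import json

# Keys the classifier constructor is given in this README.
REQUIRED_KEYS = ("embedding_dim", "num_classes")


def validate_classifier_config(config):
    """Check that the required keys exist and hold positive integers."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    for key in REQUIRED_KEYS:
        value = config[key]
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return config


def load_classifier_config(path):
    """Read a JSON config file and validate it."""
    with open(path, "r") as f:
        return validate_classifier_config(json.load(f))
```

Failing early with a clear message is easier to debug than a shape mismatch raised later by `load_state_dict`.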

## Usage

### Installation

1. Install the required libraries. The `esm` module used here is published on PyPI as `fair-esm`:

```bash
pip install torch fair-esm biopython huggingface_hub
```
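After installing, a quick check that the modules can be found may save a failed import later. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the list holds import names, which differ from some distribution names (the `esm` import used in this README ships in the `fair-esm` distribution, and `Bio` in `biopython`).

```python
import importlib.util

# Import names (not PyPI distribution names) used by this README.
REQUIRED_MODULES = ("torch", "esm", "Bio", "huggingface_hub")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print(f"Missing modules: {missing}")
    else:
        print("All required modules found.")
```

This only confirms that a module with each name is installed. It does not check versions.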

### Loading the Model from Hugging Face

```python
import json

import esm
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download


# Define the model architecture (same as during training)
class ProteinClassifier(nn.Module):
    def __init__(self, esm_model, embedding_dim, num_classes):
        super().__init__()
        self.esm_model = esm_model
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, tokens):
        # The backbone runs under no_grad, so no gradients reach it.
        with torch.no_grad():
            results = self.esm_model(tokens, repr_layers=[33])
        # Mean-pool the layer-33 token representations into one vector per sequence.
        embeddings = results["representations"][33].mean(1)
        return self.fc(embeddings)


# Select the device first so the weights can be mapped onto it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the model files from Hugging Face
repo_id = "shubhamc-iiitd/pdac_pred_llm"
model_weights_path = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
alphabet_path = hf_hub_download(repo_id=repo_id, filename="alphabet.bin")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

# Load the ESM2 model (used as backbone)
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()

# Load the configuration
with open(config_path, "r") as f:
    config = json.load(f)

# Initialize the classifier
classifier = ProteinClassifier(
    model,
    embedding_dim=config["embedding_dim"],
    num_classes=config["num_classes"],
)

# Load the model weights, then move the whole classifier (backbone and head) to the device
classifier.load_state_dict(torch.load(model_weights_path, map_location=device))
classifier = classifier.to(device)
classifier.eval()

# Load the alphabet saved with the model. It is a pickled Python object, so
# PyTorch >= 2.6 needs weights_only=False; only do this for files you trust.
alphabet = torch.load(alphabet_path, weights_only=False)
batch_converter = alphabet.get_batch_converter()
```
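With `classifier`, `batch_converter`, and `device` set up as above, prediction might look like the following sketch. The helper name `predict_labels`, the softmax/argmax decision rule, and the example sequence are assumptions for illustration; the source does not state which class index corresponds to the positive label. Sequences are scored one at a time because the classifier averages over every token position, including padding, so padding added for a longer sequence in the same batch would change the pooled embedding. Whether training used batches is not stated, so this is a conservative choice.

```python
import torch


def predict_labels(classifier, batch_converter, sequences, device="cpu"):
    """Score (name, sequence) pairs one at a time.

    Returns a list of (name, predicted_class, class_probabilities) tuples.
    """
    classifier.eval()
    predictions = []
    for name, sequence in sequences:
        # The ESM batch converter takes (label, sequence) pairs and returns
        # (labels, strings, token tensor); only the tokens are needed here.
        _, _, tokens = batch_converter([(name, sequence)])
        with torch.no_grad():
            logits = classifier(tokens.to(device))
        probabilities = torch.softmax(logits, dim=-1)[0]
        predicted_class = int(probabilities.argmax())
        predictions.append((name, predicted_class, probabilities.tolist()))
    return predictions


# Example call (hypothetical sequence; requires the objects loaded above):
# predict_labels(classifier, batch_converter, [("example", "MKTAYIAKQR")], device)
```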