---
library_name: transformers
tags: []
---

ESM++ is a faithful implementation of [ESMC](https://www.evolutionaryscale.ai/blog/esm-cambrian) that allows for batching and standard Hugging Face compatibility without requiring the ESM package.

Use with transformers:

```python
from transformers import AutoModelForMaskedLM  # AutoModel also works

model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_small', trust_remote_code=True)
tokenizer = model.tokenizer

sequences = ['MPRTEIN', 'MSEQWENCE']
tokenized = tokenizer(sequences, return_tensors='pt')
# tokenized['labels'] = tokenized['input_ids'].clone()  # correctly mask input_ids and set unmasked positions of labels to -100 for MLM training

output = model(**tokenized)  # pass output_hidden_states=True to also get all hidden states

print(output.logits.shape)       # language modeling logits, (batch_size, seq_len, vocab_size), (2, 11, 64)
print(output.last_hidden_state)  # last hidden state of the model, (batch_size, seq_len, hidden_size), (2, 11, 960)
print(output.loss)               # language modeling loss if you passed labels
# print(output.hidden_states)    # tuple of all hidden states if you passed output_hidden_states=True
```

ESM++ also supports sequence-level and token-level classification tasks, like ESM2. Simply pass the number of labels during initialization.

```python
from transformers import AutoModelForSequenceClassification, AutoModelForTokenClassification

model = AutoModelForSequenceClassification.from_pretrained('Synthyra/ESMplusplus_small', num_labels=2, trust_remote_code=True)
logits = model(**tokenized).logits
print(logits.shape)  # (batch_size, num_labels), (2, 2)
```

Measured difference between this implementation and the version loaded with the ESM package (1000 random sequences):

Average MSE: 2.4649680199217982e-09
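
The commented-out `labels` line in the first snippet only hints at how to prepare masked-language-modeling targets. Below is a minimal sketch of one way to do it, assuming the tokenizer exposes a `mask_token_id`; the 15% masking rate and the helper name `mask_for_mlm` are illustrative choices, not part of this repository.

```python
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_small', trust_remote_code=True)
tokenizer = model.tokenizer
tokenized = tokenizer(['MPRTEIN', 'MSEQWENCE'], return_tensors='pt')

def mask_for_mlm(input_ids, mask_token_id, mask_prob=0.15):
    # Keep the original ids as labels before corrupting the inputs
    labels = input_ids.clone()
    # Randomly pick positions to mask (a real pipeline would also skip special/padding tokens)
    mask = torch.rand(input_ids.shape) < mask_prob
    masked_input_ids = input_ids.clone()
    masked_input_ids[mask] = mask_token_id
    # Unmasked positions are ignored by the loss via the -100 convention
    labels[~mask] = -100
    return masked_input_ids, labels

tokenized['input_ids'], tokenized['labels'] = mask_for_mlm(tokenized['input_ids'], tokenizer.mask_token_id)
output = model(**tokenized)
print(output.loss)  # cross-entropy computed only over the masked positions
```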
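
Similarly, the `last_hidden_state` shown above can be reduced to one fixed-size embedding per sequence. The sketch below uses attention-mask-weighted mean pooling, assuming the tokenizer returns an `attention_mask` alongside `input_ids`; the pooling strategy itself is a common choice, not something this card prescribes.

```python
import torch
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained('Synthyra/ESMplusplus_small', trust_remote_code=True)
tokenizer = model.tokenizer
tokenized = tokenizer(['MPRTEIN', 'MSEQWENCE'], return_tensors='pt')

with torch.no_grad():
    output = model(**tokenized)

# Zero out padding positions, then average over the remaining tokens
mask = tokenized['attention_mask'].unsqueeze(-1).float()  # (batch_size, seq_len, 1)
summed = (output.last_hidden_state * mask).sum(dim=1)     # (batch_size, hidden_size)
embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)     # (batch_size, hidden_size)
print(embeddings.shape)  # (2, 960) for ESMplusplus_small
```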