# **ESM-Scan**
Calculate the <u>fitness of single amino acid substitutions</u> on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) [language model predictor](https://github.com/facebookresearch/esm)
<details>
<summary> <b> USAGE INSTRUCTIONS </b> </summary>
## Setup
No setup is required. Simply fill in the input boxes with the necessary data and click the **Run** button.
You can find a list of examples at the bottom of the page; clicking on them will autofill the fields for you.
If the server remains idle for a period, it will enter standby mode. Running a calculation will wake the tool from standby, but note that the first run may take longer due to startup and model loading.
## Input
**Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.
Note: Wildcard characters (e.g., `-X.B`) can be included, but they currently cannot be visualised.
**Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports four running modes, selected by the format of your input:
- **Single Substitution**: Input one or more substitutions (e.g. `R218K R218W`) to score specific changes.
- **Residue Position**: Provide residue positions to evaluate all possible substitutions at those sites.
- **Same-Length Sequence**: Provide a sequence of the same length as the input sequence; each residue that differs is scored one by one as a single substitution.
- **Different Inputs**: For any other input format, a deep mutational scan of the full sequence will be performed.
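
The way the input format selects a running mode can be illustrated with a small classifier. This is only a sketch, not the tool's actual parsing code: the function name `classify_input`, the regular expression, and the restriction to the 20 standard amino acids are all assumptions made for illustration.

```python
import re

# Assumption: only the 20 standard amino acids are accepted in substitutions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches tokens such as R218K: wild-type residue, 1-based position, new residue.
SUBSTITUTION = re.compile(rf"^[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]$")


def classify_input(sequence: str, substitutions: str) -> str:
    """Guess which running mode a Substitutions entry would select (hypothetical)."""
    tokens = substitutions.split()
    if tokens and all(SUBSTITUTION.match(t) for t in tokens):
        return "single substitution"
    if tokens and all(t.isdigit() for t in tokens):
        return "residue position"
    if len(tokens) == 1 and tokens[0].isalpha() and len(tokens[0]) == len(sequence):
        return "same-length sequence"
    # Anything else, including an empty box, falls back to the full scan.
    return "deep mutational scan"
```

For example, `classify_input("MKT", "K2R")` falls in the first branch, while an empty **Substitutions** entry falls through to the deep mutational scan.
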
**Model Selection**: Choose an ESM model for calculations from those available on Hugging Face Model Hub.
The model `esm2_t33_650M_UR50D` offers an optimal balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).
**Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.
While this method is slower, it enhances accuracy. If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
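
In the masked-marginals strategy described by Meier et al. (2021), the substituted position is masked and the score is the difference between the model's log-probability of the new residue and that of the wild-type residue at the masked position. The sketch below shows only this arithmetic. The `log_probs_at_mask` callable and `toy_model` are hypothetical stand-ins for a real ESM forward pass, and the `?` mask symbol is a placeholder, not the model's actual mask token.

```python
import math
from typing import Callable, Dict

MASK = "?"  # placeholder mask symbol (assumption, not the ESM mask token)

# Hypothetical model interface: takes the masked sequence and the 0-based
# masked index, returns log-probabilities per amino acid at that position.
LogProbFn = Callable[[str, int], Dict[str, float]]


def masked_marginal_score(sequence: str, substitution: str,
                          log_probs_at_mask: LogProbFn) -> float:
    """Score one substitution such as 'K2R' as log p(mutant) - log p(wild type)."""
    wt, pos, mt = substitution[0], int(substitution[1:-1]), substitution[-1]
    idx = pos - 1  # substitutions use 1-based positions
    if sequence[idx] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {sequence[idx]}")
    masked = sequence[:idx] + MASK + sequence[idx + 1:]
    log_probs = log_probs_at_mask(masked, idx)
    return log_probs[mt] - log_probs[wt]


def toy_model(masked_sequence: str, index: int) -> Dict[str, float]:
    """Toy stand-in that returns a fixed distribution regardless of context."""
    probs = {"K": 0.6, "R": 0.3, "W": 0.1}
    return {aa: math.log(p) for aa, p in probs.items()}
```

With the toy distribution, `masked_marginal_score("MKT", "K2R", toy_model)` equals `log(0.3) - log(0.6)`, a negative score because the toy model prefers the wild-type residue.
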
**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters), because runtimes can become very long, especially with longer sequences or during peak server usage times.
For example, scanning a 300-residue sequence with larger models may require over 30 minutes.
Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritise reducing model size when optimizing for runtime.
The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.
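
To get a feel for how many substitutions a deep mutational scan involves, the sketch below enumerates every single substitution of a sequence. It assumes the 20 standard amino acids, so each standard residue has 19 alternatives; the helper name `deep_mutational_scan` is hypothetical and is not taken from the tool.

```python
# Assumption: substitutions are drawn from the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def deep_mutational_scan(sequence: str) -> list[str]:
    """List every single substitution (e.g. 'M1A') using 1-based positions."""
    return [
        f"{wt}{position}{mt}"
        for position, wt in enumerate(sequence, start=1)
        for mt in AMINO_ACIDS
        if mt != wt  # skip the wild-type residue itself
    ]
```

Under this assumption, a 300-residue sequence of standard amino acids yields 300 × 19 = 5700 substitutions to score.
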
**Concurrent Substitutions**: To calculate the effect of multiple concurrent substitutions, you must manually introduce the changes into the input sequence and rerun the calculation. Accuracy is not guaranteed, as this use case is as yet untested.
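
Editing a long sequence by hand is error-prone, so a small helper can prepare the modified sequence before it is pasted into the **Sequence** box. This is a sketch under the assumption that substitutions use the `R218K` notation with 1-based positions; `apply_substitutions` is a hypothetical helper, not part of the tool.

```python
def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Return a copy of the sequence with each substitution (e.g. 'K2R') applied."""
    residues = list(sequence)
    for sub in substitutions:
        wt, pos, mt = sub[0], int(sub[1:-1]), sub[-1]
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            # Guard against off-by-one errors or a wrong reference sequence.
            raise ValueError(f"expected {wt} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mt
    return "".join(residues)
```

For instance, `apply_substitutions("MKTAY", ["K2R"])` gives `"MRTAY"`, which can then serve as the background for scoring a further substitution.
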
## Output
Results are displayed in a color-coded table, except for deep mutational scans, which produce a heatmap.
In the table:
- Beneficial substitutions are highlighted in green with positive values.
- Detrimental substitutions appear in red with negative values.
As a rule of thumb, score differences of *4* or more in magnitude are considered significant. For instance:
- A substitution scoring *-6* is likely detrimental to protein functionality.
- A score of *+2* is generally regarded as neutral.
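
The rule of thumb above can be written as a small helper for post-processing scores. This is a sketch only: applying the threshold of 4 symmetrically to positive and negative scores, and the label strings, are assumptions made for illustration, and `interpret_score` is a hypothetical name.

```python
def interpret_score(score: float, threshold: float = 4.0) -> str:
    """Label a substitution score using the rule-of-thumb threshold (assumed symmetric)."""
    if score <= -threshold:
        return "likely detrimental"
    if score >= threshold:
        return "likely beneficial"
    return "neutral"
```

With the default threshold, a score of -6 is labelled as likely detrimental and a score of +2 as neutral, matching the two examples above.
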
The **Download raw data** button lets you download the output in CSV format.
**If you use this tool in your research, please cite**:
- Totaro, M.G. (2023). “ESM-Scan - a tool to guide amino acid substitutions.” bioRxiv. [doi.org/10.1101/2023.12.12.571273](https://doi.org/10.1101/2023.12.12.571273)
- Meier, J. (2021). “Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv (Cold Spring Harbor Laboratory), July. [doi.org/10.1101/2021.07.09.450648](https://doi.org/10.1101/2021.07.09.450648)
</details>