adibvafa committed · Commit 5511f5c · verified · 1 Parent(s): 7b329f5

Update README.md

  1. README.md +74 -15
README.md CHANGED
@@ -1,34 +1,93 @@
  ---
  library_name: transformers
  tags:
  - Computational Biology
  - Bioinformatics
- - CodonTransformer
  license: apache-2.0
  ---

- # CodonTransformer
- CodonTransformer is a state-of-the-art model designed to predict optimized DNA sequences for given protein sequences and organisms. It achieves state-of-the-art performance compared to existing models in the field.
- More information will be provided soon.

- https://github.com/Adibvafa/CodonTransformer


- ## How to Get Started with the Model

- Use the code below to get started with CodonTransformer:

  ```python
- # Load model and tokenizer
- from transformers import AutoTokenizer, BigBirdForMaskedLM

- tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
- model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer")
  ```

- You can use CodonTransformer at Google Colab:
- https://adibvafa.github.io/CodonTransformer/GoogleColab

- You can view the same notebook in the GitHub repo:
- https://github.com/Adibvafa/CodonTransformer/blob/main/CodonTransformerDemo.ipynb

  ---
  library_name: transformers
  tags:
+ - CodonTransformer
  - Computational Biology
+ - Machine Learning
  - Bioinformatics
+ - Genetics
  license: apache-2.0
+ pipeline_tag: token-classification
  ---

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c9888b3137cc529d0761c4/GqKutRwiGGif69Gjd8Df3.png)

+ **CodonTransformer** is a tool for codon optimization: given a protein sequence and a target organism, it predicts a DNA sequence optimized for that organism. It is aimed at researchers and practitioners in genetic engineering, and it pairs a Transformer-based model with a Jupyter notebook so that codon optimization takes less time and effort.


+ ## Use Case
+ **For an interactive demo, check out our [Google Colab Notebook](https://adibvafa.github.io/CodonTransformer/GoogleColab).**
+ <br>
+ After installing CodonTransformer (available on PyPI), you can use:
+ ```python
+ import torch
+ from transformers import AutoTokenizer, BigBirdForMaskedLM
+ from CodonTransformer.CodonPrediction import predict_dna_sequence
+ from CodonTransformer.CodonUtils import ORGANISM2ID
+ from CodonTransformer.CodonJupyter import format_model_output
+ DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
+ model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(DEVICE)
+
+
+ # Set your input data
+ protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
+ organism = "Escherichia coli general"
+
+
+ # Predict with CodonTransformer
+ output = predict_dna_sequence(
+     protein=protein,
+     organism=organism,
+     device=DEVICE,
+     tokenizer_object=tokenizer,
+     model_object=model,
+     attention_type="original_full",
+ )
+ print(format_model_output(output))
+ ```
+ The output is:
+ <br>
+

  ```python
+ -----------------------------
+ | Organism |
+ -----------------------------
+ Escherichia coli general

+ -----------------------------
+ | Input Protein |
+ -----------------------------
+ MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
+
+ -----------------------------
+ | Processed Input |
+ -----------------------------
+ M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK
+
+ -----------------------------
+ | Predicted DNA |
+ -----------------------------
+ ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
  ```
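
The example output can be sanity-checked without the model. The sketch below is not part of the CodonTransformer package: `to_processed_input` and `translate` are hypothetical helpers written for illustration. It assumes the standard genetic code (NCBI translation table 1) and infers the `_UNK` token format purely from the "Processed Input" line shown above, so the package's internal preprocessing may differ in other cases.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_processed_input(protein: str) -> str:
    """Hypothetical helper: mimic the 'Processed Input' line from the README.

    Assumption: every residue becomes '<residue>_UNK' and a final '__UNK'
    token (where '_' appears to mark the end of the protein) is appended.
    """
    tokens = [f"{aa}_UNK" for aa in protein.upper()]
    tokens.append("__UNK")
    return " ".join(tokens)


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon; stop codons become '*'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Input protein and predicted DNA copied from the example output above.
protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
predicted_dna = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCG"
    "TTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTC"
    "TTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA"
)

# The predicted DNA should encode the input protein followed by a stop codon.
round_trip_ok = translate(predicted_dna) == protein + "*"
print("Round trip matches input protein:", round_trip_ok)
print(to_processed_input(protein))
```

A check like this only confirms that the predicted DNA encodes the requested protein; it says nothing about how well the codon choices suit the target organism.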


+ ## Additional Resources
+ - **Project Website** <br>
+ https://adibvafa.github.io/CodonTransformer/
+
+ - **GitHub Repository** <br>
+ https://github.com/Adibvafa/CodonTransformer
+
+ - **Google Colab Demo** <br>
+ https://adibvafa.github.io/CodonTransformer/GoogleColab
+
+ - **PyPI Package** <br>
+ https://pypi.org/project/CodonTransformer/
+
+ - **Paper** <br>
+ TBD