Enzo Reis de Oliveira committed on
Commit
64428bf
·
1 Parent(s): a3a7416

Adding smi-ted things

.gitattributes CHANGED
@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ smi-ted/finetune/moleculenet/qm9/qm9.csv filter=lfs diff=lfs merge=lfs -text
37
+ smi-ted/finetune/moleculenet/qm9/train.csv filter=lfs diff=lfs merge=lfs -text
38
+ smi-ted/images/smi-ted.png filter=lfs diff=lfs merge=lfs -text
39
+ smi-ted/paper/smi_ted_preprint.pdf filter=lfs diff=lfs merge=lfs -text
40
+ smi-ted.png filter=lfs diff=lfs merge=lfs -text
41
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,18 @@
1
+ # Model weights
2
+ inference/smi_ted_light/smi-ted-Light_40.pt
3
+
4
+ # pyenv
5
+ .python-version
6
+
7
+ # Environments
8
+ .env
9
+ ./venv
10
+ env/
11
+ venv/
12
+ ENV/
13
+ env.bak/
14
+ venv.bak/
15
+
16
+ # editor files
17
+ .vscode/
18
+ .DS_Store
README.md CHANGED
@@ -1,12 +1,188 @@
1
  ---
2
- title: SMI TED Demo1
3
- emoji: 💻
4
- colorFrom: blue
5
- colorTo: pink
6
- sdk: gradio
7
- sdk_version: 5.34.2
8
- app_file: app.py
9
- pinned: false
10
  ---
 
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ license: apache-2.0
3
+ metrics:
4
+ - accuracy
5
+ pipeline_tag: feature-extraction
6
+ tags:
7
+ - chemistry
8
+ - foundation models
9
+ - AI4Science
10
+ - materials
11
+ - molecules
12
+ - safetensors
13
+ - pytorch
14
+ - transformer
15
+ - diffusers
16
+ library_name: transformers
17
  ---
18
+ # Introduction to IBM's Foundation Models for Materials
19
 
20
+ Welcome to IBM's series of large foundation models for sustainable materials. Our models span a variety of representations and modalities, including SMILES, SELFIES, 3D atom positions, 3D density grids, molecular graphs, and other formats. These models are designed to support and advance research in materials science and chemistry.
21
+
22
+ GitHub: [GitHub Link](https://github.com/IBM/materials/tree/main)
23
+
24
+ Paper: [arXiv:2407.20267](https://arxiv.org/abs/2407.20267)
25
+
26
+ # SMILES-based Transformer Encoder-Decoder (SMI-TED)
27
+
28
+ ![ted-smi](smi-ted.png)
29
+
30
+ This repository provides PyTorch source code associated with our publication, "A Large Encoder-Decoder Family of Foundation Models for Chemical Language".
31
+
32
+ Paper: [Preprint (PDF)](https://github.com/IBM/materials/blob/main/smi-ted/paper/smi-ted_preprint.pdf)
33
+
34
+ We provide the model weights in two formats (a loading sketch for both follows the list):
35
+
36
+ - PyTorch (`.pt`): [smi-ted-Light_40.pt](smi-ted-Light_40.pt)
37
+ - safetensors (`.safetensors`): [model_weights.safetensors](model_weights.safetensors)
38
+
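Below is a minimal loading sketch for both formats. It is illustrative rather than an official snippet: it assumes you run it from the repository root, that the checkpoint files have been downloaded next to it, and that the safetensors state dict uses the same parameter names as the model built by `load_smi_ted` (hence `strict=False`).

```python
# Illustrative loading sketch (paths and key compatibility are assumptions).
import sys
from safetensors.torch import load_file

sys.path.append("smi-ted/inference")          # same trick app.py uses
from smi_ted_light.load import load_smi_ted

# Option 1: PyTorch checkpoint through the bundled loader
model = load_smi_ted(
    folder="smi-ted/inference/smi_ted_light",
    ckpt_filename="smi-ted-Light_40.pt",
)

# Option 2: safetensors weights loaded into the already-built model
state_dict = load_file("model_weights.safetensors")
model.load_state_dict(state_dict, strict=False)  # strict=False guards against renamed keys
```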
39
+ For more information contact: [email protected] or [email protected].
40
+
41
+ ## Introduction
42
+
43
+ We present a large encoder-decoder chemical foundation model, SMILES-based Transformer Encoder-Decoder (SMI-TED), pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, with two main variants (289M and 8X289M). Our experiments across multiple benchmark datasets demonstrate state-of-the-art performance for various tasks. For more information contact: [email protected] or [email protected].
44
+
45
+ ## Table of Contents
46
+
47
+ 1. [Getting Started](#getting-started)
48
+ 1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
49
+ 2. [Replicating Conda Environment](#replicating-conda-environment)
50
+ 2. [Pretraining](#pretraining)
51
+ 3. [Finetuning](#finetuning)
52
+ 4. [Feature Extraction](#feature-extraction)
53
+ 5. [Citations](#citations)
54
+
55
+ ## Getting Started
56
+
57
+ **This code and environment have been tested on Nvidia V100s and Nvidia A100s**
58
+
59
+ ### Pretrained Models and Training Logs
60
+
61
+ We provide checkpoints of the SMI-TED model pre-trained on a dataset of ~91M molecules curated from PubChem. The pre-trained model shows competitive performance on classification and regression benchmarks from MoleculeNet.
62
+
63
+ Add the SMI-TED pre-trained weights (a `.pt` file) to the `inference/` or `finetune/` directory, according to your needs. The directory structure should look like the following:
64
+
65
+ ```
66
+ inference/
67
+ ├── smi_ted_light
68
+ │ ├── smi_ted_light.pt
69
+ │ ├── bert_vocab_curated.txt
70
+ │ └── load.py
71
+ ```
72
+ and/or:
73
+
74
+ ```
75
+ finetune/
76
+ ├── smi_ted_light
77
+ │ ├── smi_ted_light.pt
78
+ │ ├── bert_vocab_curated.txt
79
+ │ └── load.py
80
+ ```
81
+
82
+ ### Replicating Conda Environment
83
+
84
+ Follow these steps to replicate our Conda environment and install the necessary libraries:
85
+
86
+ #### Create and Activate Conda Environment
87
+
88
+ ```
89
+ conda create --name smi-ted-env python=3.10
90
+ conda activate smi-ted-env
91
+ ```
92
+
93
+ #### Install Packages with Conda
94
+
95
+ ```
96
+ conda install pytorch=2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
97
+ ```
98
+
99
+ #### Install Packages with Pip
100
+
101
+ ```
102
+ pip install -r requirements.txt
103
+ pip install pytorch-fast-transformers
104
+ ```
105
+
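After installing, a quick sanity check (illustrative, not part of the official setup) confirms that PyTorch, the CUDA runtime, and the fast-transformers package are importable:

```python
# Quick environment sanity check (illustrative only).
import torch
import fast_transformers  # provided by pytorch-fast-transformers

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("fast_transformers imported OK")
```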
106
+ ## Pretraining
107
+
108
+ For pretraining, we use two strategies: the masked language model method to train the encoder part and an encoder-decoder strategy to refine SMILES reconstruction and improve the generated latent space.
109
+
110
+ SMI-TED is pre-trained on canonicalized and curated 91M SMILES from PubChem with the following constraints:
111
+
112
+ - Compounds are filtered to a maximum length of 202 tokens during preprocessing.
113
+ - A 95/5/0 split is used for encoder training, with 5% of the data for decoder pretraining.
114
+ - A 100/0/0 split is also used to train the encoder and decoder directly, enhancing model performance.
115
+
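As an illustration of the masked-language-model strategy described above, the sketch below applies BERT-style random masking to a tokenized SMILES string. The regex mirrors the tokenizer pattern in `inference/smi_ted_light/load.py`, the `<mask>` token comes from `bert_vocab_curated.txt`, and the 15%/80-10-10 proportions are the conventional MLM recipe rather than values quoted from the paper.

```python
# Illustrative BERT-style masking of SMILES tokens (ratios are the standard MLM
# convention, not taken from the SMI-TED training code).
import random
import re

PATTERN = r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"

def mask_tokens(smiles, mask_token="<mask>", mask_prob=0.15):
    tokens = re.compile(PATTERN).findall(smiles)
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                        # token the encoder must recover
            r = random.random()
            if r < 0.8:
                masked.append(mask_token)             # 80%: replace with <mask>
            elif r < 0.9:
                masked.append(random.choice(tokens))  # 10%: random token
            else:
                masked.append(tok)                    # 10%: keep unchanged
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels

print(mask_tokens("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, just an example input
```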
116
+ The pretraining code provides examples of data processing and model training on a smaller dataset, requiring 8 A100 GPUs.
117
+
118
+ To pre-train the two variants of the SMI-TED model, run:
119
+
120
+ ```
121
+ bash training/run_model_light_training.sh
122
+ ```
123
+ or
124
+ ```
125
+ bash training/run_model_large_training.sh
126
+ ```
127
+
128
+ Use `train_model_D.py` to train only the decoder or `train_model_ED.py` to train both the encoder and decoder.
129
+
130
+ ## Finetuning
131
+
132
+ The finetuning datasets and environment can be found in the [finetune](https://github.com/IBM/materials/tree/main/smi-ted/finetune) directory. After setting up the environment, you can run a finetuning task with:
133
+
134
+ ```
135
+ bash finetune/smi_ted_light/esol/run_finetune_esol.sh
136
+ ```
137
+
138
+ Finetuning training/checkpointing resources will be available in directories named `checkpoint_<measure_name>`.
139
+
140
+ ## Feature Extraction
141
+
142
+ The example notebook [smi_ted_encoder_decoder_example.ipynb](https://github.com/IBM/materials/blob/main/smi-ted/notebooks/smi_ted_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks. It also includes examples of classification and regression tasks.
143
+
144
+ To load SMI-TED, you can simply use:
145
+
146
+ ```python
147
+ model = load_smi_ted(
148
+     folder='../inference/smi_ted_light',
148
+     ckpt_filename='smi_ted_light.pt'
150
+ )
151
+ ```
152
+ or
153
+
154
+ ```python
155
+ with open('model_weights.bin', 'rb') as f:
156
+     state_dict = torch.load(f)
157
+ model.load_state_dict(state_dict)
159
+ ```
160
+
161
+ To encode SMILES into embeddings, you can use:
162
+
163
+ ```python
164
+ with torch.no_grad():
165
+     encoded_embeddings = model.encode(df['SMILES'], return_torch=True)
166
+ ```
167
+ To map embeddings back to SMILES strings, you can use the decoder:
168
+
169
+ ```python
170
+ with torch.no_grad():
171
+     decoded_smiles = model.decode(encoded_embeddings)
172
+ ```
173
+
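Because the encoder returns one fixed-length vector per molecule, the embeddings can be fed directly into a classical model for property prediction. The sketch below is a minimal illustration only; the target column `y` and the Ridge regressor are placeholders, not what the notebook actually uses.

```python
# Minimal sketch: fit a simple regressor on SMI-TED embeddings.
# 'y' is a placeholder target column; the notebook's tasks may differ.
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = encoded_embeddings.numpy()                  # shape: (n_molecules, 768)
X_tr, X_te, y_tr, y_te = train_test_split(X, df['y'].values, random_state=0)

reg = Ridge(alpha=1.0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, reg.predict(X_te), squared=False)
print(f"test RMSE: {rmse:.3f}")
```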
174
+
175
+ ## Citations
176
+
177
+ ```
178
+ @misc{soares2024largeencoderdecoderfamilyfoundation,
179
+ title={A Large Encoder-Decoder Family of Foundation Models For Chemical Language},
180
+ author={Eduardo Soares and Victor Shirasuna and Emilio Vital Brazil and Renato Cerqueira and Dmitry Zubarev and Kristin Schmidt},
181
+ year={2024},
182
+ eprint={2407.20267},
183
+ archivePrefix={arXiv},
184
+ primaryClass={cs.LG},
185
+ url={https://arxiv.org/abs/2407.20267},
186
+ }
187
+
188
+ ```
app.py ADDED
@@ -0,0 +1,54 @@
1
+ import os, sys
2
+
3
+ BASE_DIR = os.path.dirname(__file__)
4
+ INFERENCE_DIR = os.path.join(BASE_DIR, "smi-ted", "inference")
5
+ sys.path.append(INFERENCE_DIR)
6
+
7
+ import gradio as gr
8
+ from smi_ted_light.load import load_smi_ted
9
+
10
+
11
+ # 2) Path to the model weights and vocabulary
12
+ MODEL_DIR = os.path.join("smi-ted", "inference", "smi_ted_light")
13
+
14
+ # 3) Load the SMI-TED (Light) model
15
+ # If you renamed the .pt or the vocab file, adjust it here.
16
+ model = load_smi_ted(
17
+     folder=MODEL_DIR,
18
+     ckpt_filename="smi-ted-Light_40.pt",
19
+     vocab_filename="bert_vocab_curated.txt",
20
+ )
21
+
22
+ # 4) Function used by the interface
23
+ def gerar_embedding(smiles: str):
24
+     """
25
+     Receives a SMILES string and returns its embedding (a list of 768 floats).
26
+     On error, returns a dictionary with the error message.
27
+     """
28
+     smiles = smiles.strip()
29
+     if not smiles:
30
+         return {"error": "enter a SMILES string first"}
31
+
32
+     try:
33
+         # model.encode returns a tensor of shape (1, 768) when return_torch=True
34
+         vetor_torch = model.encode(smiles, return_torch=True)[0]
35
+         return vetor_torch.tolist()  # JSON-serializable
36
+     except Exception as e:
37
+         return {"error": str(e)}
38
+
39
+
40
+ # 5) Define the Gradio interface
41
+ demo = gr.Interface(
42
+     fn=gerar_embedding,
43
+     inputs=gr.Textbox(label="SMILES", placeholder="e.g., CCO"),
44
+     outputs=gr.JSON(label="Embedding (list of floats)"),
45
+     title="SMI-TED Embedding Generator",
46
+     description=(
47
+         "Paste a SMILES string and receive the embedding generated by the "
48
+         "SMI-TED Light model trained by IBM Research."
49
+     ),
50
+ )
51
+
52
+ # 6) Run locally or on a Hugging Face Space
53
+ if __name__ == "__main__":
54
+     demo.launch()
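Once the Space (or a local `python app.py`) is running, the endpoint can also be queried programmatically. A minimal sketch with `gradio_client` follows; the URL is a placeholder, and for a hosted Space you would pass its `owner/space-name` id instead:

```python
# Minimal sketch for querying the running Gradio app (URL / Space id is a placeholder).
from gradio_client import Client

client = Client("http://127.0.0.1:7860")         # or Client("owner/space-name")
embedding = client.predict("CCO", api_name="/predict")
print(len(embedding))                            # 768 floats if the call succeeded
```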
config.json ADDED
@@ -0,0 +1,26 @@
1
+ {
2
+ "n_batch": 32,
3
+ "n_layer": 12,
4
+ "n_head": 12,
5
+ "n_embd": 768,
6
+ "max_len": 202,
7
+ "d_dropout": 0.1,
8
+ "dropout": 0.1,
9
+ "lr_start": 3e-5,
10
+ "lr_multiplier": 1,
11
+ "model_type" : "SMI-TED",
12
+ "max_epochs": 500,
13
+ "num_feats": 32,
14
+ "smi_ted_version": "v1",
15
+ "model_path": "../",
16
+ "ckpt_filename": "smi-ted-Light_40.pt",
17
+ "data_root": "../../moleculenet/esol",
18
+ "dataset_name": "esol",
19
+ "measure_name": "measured log solubility in mols per litre",
20
+ "checkpoints_folder": "./checkpoints_esol",
21
+ "loss_fn": "rmse",
22
+ "target_metric": "rmse",
23
+ "save_ckpt": 1,
24
+ "start_seed": 0,
25
+ "train_decoder": 1
26
+ }
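This is the ESOL finetuning configuration referenced in the README: `measure_name` is the regression target column and `checkpoints_folder` is where finetuning checkpoints are written, while `loss_fn`/`target_metric` select RMSE. A small sketch of reading it into dot-accessible options (illustrative; the finetuning scripts may parse it differently):

```python
# Illustrative only: read the finetuning config into dot-accessible options.
import json
from types import SimpleNamespace

with open("config.json") as f:
    cfg = SimpleNamespace(**json.load(f))

print(cfg.dataset_name, "->", cfg.measure_name)     # esol -> measured log solubility in mols per litre
print(cfg.loss_fn, cfg.target_metric, cfg.max_len)  # rmse rmse 202
```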
install.sh ADDED
@@ -0,0 +1,6 @@
1
+
2
+ pip install torch==2.1.0
3
+
4
+ pip install pytorch-fast-transformers==0.4.0
5
+
6
+ pip install -r requirements.txt
requirements.txt ADDED
@@ -0,0 +1,12 @@
1
+
2
+ wheel
3
+ torch>=2.1.0
4
+ transformers>=4.40.0
5
+ pytorch-fast-transformers==0.4.0
6
+ regex
7
+ numpy==1.26.4
8
+ pandas==1.4.0
9
+ tqdm>=4.66.4
10
+ rdkit>=2024.3.5
11
+ gradio>=4.32.0
12
+ huggingface-hub
smi-ted/README.md ADDED
@@ -0,0 +1,142 @@
1
+ # SMILES-based Transformer Encoder-Decoder (SMI-TED)
2
+
3
+ This repository provides PyTorch source code associated with our publication, "A Large Encoder-Decoder Family of Foundation Models for Chemical Language".
4
+
5
+ Paper: [Preprint (PDF)](paper/smi_ted_preprint.pdf)
6
+
7
+ For model weights contact: [email protected] or [email protected].
8
+
9
+ ![ted-smi](images/smi-ted.png)
10
+
11
+ ## Introduction
12
+
13
+ We present a large encoder-decoder chemical foundation model, SMILES-based Transformer Encoder-Decoder (SMI-TED), pre-trained on a curated dataset of 91 million SMILES samples sourced from PubChem, equivalent to 4 billion molecular tokens. SMI-TED supports various complex tasks, including quantum property prediction, with two main variants ($289M$ and $8 \times 289M$). Our experiments across multiple benchmark datasets demonstrate state-of-the-art performance for various tasks. For model weights contact: [email protected] or [email protected].
14
+
15
+ ## Table of Contents
16
+
17
+ 1. [Getting Started](#getting-started)
18
+ 1. [Pretrained Models and Training Logs](#pretrained-models-and-training-logs)
19
+ 2. [Replicating Conda Environment](#replicating-conda-environment)
20
+ 2. [Pretraining](#pretraining)
21
+ 3. [Finetuning](#finetuning)
22
+ 4. [Feature Extraction](#feature-extraction)
23
+ 5. [Citations](#citations)
24
+
25
+ ## Getting Started
26
+
27
+ **This code and environment have been tested on Nvidia V100s and Nvidia A100s**
28
+
29
+ ### Pretrained Models and Training Logs
30
+
31
+ We provide checkpoints of the SMI-TED model pre-trained on a dataset of ~91M molecules curated from PubChem. The pre-trained model shows competitive performance on classification and regression benchmarks from MoleculeNet. For model weights contact: [email protected] or [email protected].
32
+
33
+ Add the SMI-TED pre-trained weights (a `.pt` file) to the `inference/` or `finetune/` directory, according to your needs. The directory structure should look like the following:
34
+
35
+ ```
36
+ inference/
37
+ ├── smi_ted_light
38
+ │ ├── smi_ted_light.pt
39
+ │ ├── bert_vocab_curated.txt
40
+ │ └── load.py
41
+ ```
42
+ and/or:
43
+
44
+ ```
45
+ finetune/
46
+ ├── smi_ted_light
47
+ │ ├── smi_ted_light.pt
48
+ │ ├── bert_vocab_curated.txt
49
+ │ └── load.py
50
+ ```
51
+
52
+ ### Replicating Conda Environment
53
+
54
+ Follow these steps to replicate our Conda environment and install the necessary libraries:
55
+
56
+ #### Create and Activate Conda Environment
57
+
58
+ ```
59
+ conda create --name smi-ted-env python=3.8.18
60
+ conda activate smi-ted-env
61
+ ```
62
+
63
+ #### Install Packages with Conda
64
+
65
+ ```
66
+ conda install pytorch=1.13.1 cudatoolkit=11.4 -c pytorch
67
+ conda install numpy=1.23.5 pandas=2.0.3
68
+ conda install rdkit=2021.03.5 -c conda-forge
69
+ ```
70
+
71
+ #### Install Packages with Pip
72
+
73
+ ```
74
+ pip install transformers==4.6.0 pytorch-fast-transformers==0.4.0 torch-optimizer==0.3.0 datasets==1.6.2 scikit-learn==1.3.2 scipy==1.12.0 tqdm==4.66.1
75
+ ```
76
+
77
+ ## Pretraining
78
+
79
+ For pretraining, we use two strategies: the masked language model method to train the encoder part and an encoder-decoder strategy to refine SMILES reconstruction and improve the generated latent space.
80
+
81
+ SMI-TED is pre-trained on canonicalized and curated 91M SMILES from PubChem with the following constraints:
82
+
83
+ - Compounds are filtered to a maximum length of 202 tokens during preprocessing.
84
+ - A 95/5/0 split is used for encoder training, with 5% of the data for decoder pretraining.
85
+ - A 100/0/0 split is also used to train the encoder and decoder directly, enhancing model performance.
86
+
87
+ The pretraining code provides examples of data processing and model training on a smaller dataset, requiring 8 A100 GPUs.
88
+
89
+ To pre-train the two variants of the SMI-TED model, run:
90
+
91
+ ```
92
+ bash training/run_model_light_training.sh
93
+ ```
94
+ or
95
+ ```
96
+ bash training/run_model_large_training.sh
97
+ ```
98
+
99
+ Use `train_model_D.py` to train only the decoder or `train_model_ED.py` to train both the encoder and decoder.
100
+
101
+ ## Finetuning
102
+
103
+ The finetuning datasets and environment can be found in the [finetune](finetune/) directory. After setting up the environment, you can run a finetuning task with:
104
+
105
+ ```
106
+ bash finetune/smi_ted_light/esol/run_finetune_esol.sh
107
+ ```
108
+
109
+ Finetuning training/checkpointing resources will be available in directories named `checkpoint_<measure_name>`.
110
+
111
+ ## Feature Extraction
112
+
113
+ The example notebook [smi_ted_encoder_decoder_example.ipynb](notebooks/smi_ted_encoder_decoder_example.ipynb) contains code to load checkpoint files and use the pre-trained model for encoder and decoder tasks. It also includes examples of classification and regression tasks. For model weights contact: [email protected] or [email protected].
114
+
115
+ To load SMI-TED, you can simply use:
116
+
117
+ ```python
118
+ model = load_smi_ted(
119
+     folder='../inference/smi_ted_light',
120
+     ckpt_filename='smi_ted_light.pt'
121
+ )
122
+ ```
123
+
124
+ To encode SMILES into embeddings, you can use:
125
+
126
+ ```python
127
+ with torch.no_grad():
128
+     encoded_embeddings = model.encode(df['SMILES'], return_torch=True)
129
+ ```
130
+ For decoder, you can use the function, so you can return from embeddings to SMILES strings:
131
+
132
+ ```python
133
+ with torch.no_grad():
134
+     decoded_smiles = model.decode(encoded_embeddings)
135
+ ```
136
+
137
+
138
+ ## Citations
139
+
140
+ ```
141
+ to include
142
+ ```
smi-ted/inference/smi_ted_light/__init__.py ADDED
File without changes
smi-ted/inference/smi_ted_light/__pycache__/load.cpython-310.pyc ADDED
Binary file (20.6 kB).
 
smi-ted/inference/smi_ted_light/bert_vocab_curated.txt ADDED
@@ -0,0 +1,2393 @@
1
+ <bos>
2
+ <eos>
3
+ <pad>
4
+ <mask>
5
+ C
6
+ c
7
+ (
8
+ )
9
+ 1
10
+ O
11
+ N
12
+ 2
13
+ =
14
+ n
15
+ 3
16
+ [C@H]
17
+ [C@@H]
18
+ F
19
+ S
20
+ 4
21
+ Cl
22
+ -
23
+ o
24
+ s
25
+ [nH]
26
+ #
27
+ /
28
+ Br
29
+ [C@]
30
+ [C@@]
31
+ [N+]
32
+ [O-]
33
+ 5
34
+ \
35
+ .
36
+ I
37
+ 6
38
+ [S@]
39
+ [S@@]
40
+ P
41
+ [N-]
42
+ [Si]
43
+ 7
44
+ [n+]
45
+ [2H]
46
+ 8
47
+ [NH+]
48
+ B
49
+ 9
50
+ [C-]
51
+ [Na+]
52
+ [Cl-]
53
+ [c-]
54
+ [CH]
55
+ %10
56
+ [NH2+]
57
+ [P+]
58
+ [B]
59
+ [I-]
60
+ %11
61
+ [CH2-]
62
+ [O+]
63
+ [NH3+]
64
+ [C]
65
+ [Br-]
66
+ [IH2]
67
+ [S-]
68
+ [cH-]
69
+ %12
70
+ [nH+]
71
+ [B-]
72
+ [K+]
73
+ [Sn]
74
+ [Se]
75
+ [CH-]
76
+ [HH]
77
+ [Y]
78
+ [n-]
79
+ [CH3-]
80
+ [SiH]
81
+ [S+]
82
+ %13
83
+ [SiH2]
84
+ [Li+]
85
+ [NH-]
86
+ %14
87
+ [Na]
88
+ [CH2]
89
+ [O-2]
90
+ [U+2]
91
+ [W]
92
+ [Al]
93
+ [P@]
94
+ [Fe+2]
95
+ [PH+]
96
+ %15
97
+ [Cl+3]
98
+ [Zn+2]
99
+ [Ir]
100
+ [Mg+2]
101
+ [Pt+2]
102
+ [OH2+]
103
+ [As]
104
+ [Fe]
105
+ [OH+]
106
+ [Zr+2]
107
+ [3H]
108
+ [Ge]
109
+ [SiH3]
110
+ [OH-]
111
+ [NH4+]
112
+ [Cu+2]
113
+ [P@@]
114
+ p
115
+ [Pt]
116
+ %16
117
+ [Ca+2]
118
+ [Zr]
119
+ [F-]
120
+ [C+]
121
+ [Ti]
122
+ [P-]
123
+ [V]
124
+ [se]
125
+ [U]
126
+ [O]
127
+ [Ni+2]
128
+ [Zn]
129
+ [Co]
130
+ [Ni]
131
+ [Pd+2]
132
+ [Cu]
133
+ %17
134
+ [Cu+]
135
+ [Te]
136
+ [H+]
137
+ [CH+]
138
+ [Li]
139
+ [Pd]
140
+ [Mo]
141
+ [Ru+2]
142
+ [o+]
143
+ [Re]
144
+ [SH+]
145
+ %18
146
+ [Ac]
147
+ [Cr]
148
+ [NH2-]
149
+ [K]
150
+ [13CH2]
151
+ [c]
152
+ [Zr+4]
153
+ [Tl]
154
+ [13C]
155
+ [Mn]
156
+ [N@+]
157
+ [Hg]
158
+ [Rh]
159
+ [Ti+4]
160
+ [Sb]
161
+ [Co+2]
162
+ [Ag+]
163
+ [Ru]
164
+ %19
165
+ [N@@+]
166
+ [Ti+2]
167
+ [Al+3]
168
+ [Pb]
169
+ [I+]
170
+ [18F]
171
+ [s+]
172
+ [Rb+]
173
+ [Ba+2]
174
+ [H-]
175
+ [Fe+3]
176
+ [Ir+3]
177
+ [13cH]
178
+ %20
179
+ [AlH2]
180
+ [Au+]
181
+ [13c]
182
+ [SH2+]
183
+ [Sn+2]
184
+ [Mn+2]
185
+ [Si-]
186
+ [Ag]
187
+ [N]
188
+ [Bi]
189
+ %21
190
+ [In]
191
+ [CH2+]
192
+ [Y+3]
193
+ [Ga]
194
+ %22
195
+ [Co+3]
196
+ [Au]
197
+ [13CH3]
198
+ [Mg]
199
+ [Cs+]
200
+ [W+2]
201
+ [Hf]
202
+ [Zn+]
203
+ [Se-]
204
+ [S-2]
205
+ [Ca]
206
+ [pH]
207
+ [ClH+]
208
+ [Ti+3]
209
+ %23
210
+ [Ru+]
211
+ [SH-]
212
+ [13CH]
213
+ [IH+]
214
+ [Hf+4]
215
+ [Rf]
216
+ [OH3+]
217
+ %24
218
+ [Pt+4]
219
+ [Zr+3]
220
+ [PH3+]
221
+ [Sr+2]
222
+ [Cd+2]
223
+ [Cd]
224
+ %25
225
+ [Os]
226
+ [BH-]
227
+ [Sn+4]
228
+ [Cr+3]
229
+ [Ru+3]
230
+ [PH2+]
231
+ [Rh+2]
232
+ [V+2]
233
+ %26
234
+ [Gd+3]
235
+ [Pb+2]
236
+ [PH]
237
+ [Hg+]
238
+ [Mo+2]
239
+ [AlH]
240
+ [Sn+]
241
+ %27
242
+ [Pd+]
243
+ b
244
+ [Rh+3]
245
+ [Hg+2]
246
+ [15NH]
247
+ [14C]
248
+ %28
249
+ [Mn+3]
250
+ [Si+]
251
+ [SeH]
252
+ [13C@H]
253
+ [NH]
254
+ [Ga+3]
255
+ [SiH-]
256
+ [13C@@H]
257
+ [Ce]
258
+ [Au+3]
259
+ [Bi+3]
260
+ [15N]
261
+ %29
262
+ [BH3-]
263
+ [14cH]
264
+ [Ti+]
265
+ [Gd]
266
+ [cH+]
267
+ [Cr+2]
268
+ [Sb-]
269
+ %30
270
+ [Be+2]
271
+ [Al+]
272
+ [te]
273
+ [11CH3]
274
+ [Sm]
275
+ [Pr]
276
+ [La]
277
+ %31
278
+ [Al-]
279
+ [Ta]
280
+ [125I]
281
+ [BH2-]
282
+ [Nb]
283
+ [Si@]
284
+ %32
285
+ [14c]
286
+ [Sb+3]
287
+ [Ba]
288
+ %33
289
+ [Os+2]
290
+ [Si@@]
291
+ [La+3]
292
+ [15n]
293
+ [15NH2]
294
+ [Nd+3]
295
+ %34
296
+ [14CH2]
297
+ [18O]
298
+ [Nd]
299
+ [GeH]
300
+ [Ni+3]
301
+ [Eu]
302
+ [Dy+3]
303
+ [Sc]
304
+ %36
305
+ [Se-2]
306
+ [As+]
307
+ %35
308
+ [AsH]
309
+ [Tb]
310
+ [Sb+5]
311
+ [Se+]
312
+ [Ce+3]
313
+ [c+]
314
+ [In+3]
315
+ [SnH]
316
+ [Mo+4]
317
+ %37
318
+ [V+4]
319
+ [Eu+3]
320
+ [Hf+2]
321
+ %38
322
+ [Pt+]
323
+ [p+]
324
+ [123I]
325
+ [Tl+]
326
+ [Sm+3]
327
+ %39
328
+ [Yb+3]
329
+ %40
330
+ [Yb]
331
+ [Os+]
332
+ %41
333
+ [10B]
334
+ [Sc+3]
335
+ [Al+2]
336
+ %42
337
+ [Sr]
338
+ [Tb+3]
339
+ [Po]
340
+ [Tc]
341
+ [PH-]
342
+ [AlH3]
343
+ [Ar]
344
+ [U+4]
345
+ [SnH2]
346
+ [Cl+2]
347
+ [si]
348
+ [Fe+]
349
+ [14CH3]
350
+ [U+3]
351
+ [Cl+]
352
+ %43
353
+ [GeH2]
354
+ %44
355
+ [Er+3]
356
+ [Mo+3]
357
+ [I+2]
358
+ [Fe+4]
359
+ [99Tc]
360
+ %45
361
+ [11C]
362
+ %46
363
+ [SnH3]
364
+ [S]
365
+ [Te+]
366
+ [Er]
367
+ [Lu+3]
368
+ [11B]
369
+ %47
370
+ %48
371
+ [P]
372
+ [Tm]
373
+ [Th]
374
+ [Dy]
375
+ [Pr+3]
376
+ [Ta+5]
377
+ [Nb+5]
378
+ [Rb]
379
+ [GeH3]
380
+ [Br+2]
381
+ %49
382
+ [131I]
383
+ [Fm]
384
+ [Cs]
385
+ [BH4-]
386
+ [Lu]
387
+ [15nH]
388
+ %50
389
+ [Ru+6]
390
+ [b-]
391
+ [Ho]
392
+ [Th+4]
393
+ [Ru+4]
394
+ %52
395
+ [14CH]
396
+ %51
397
+ [Cr+6]
398
+ [18OH]
399
+ [Ho+3]
400
+ [Ce+4]
401
+ [Bi+2]
402
+ [Co+]
403
+ %53
404
+ [Yb+2]
405
+ [Fe+6]
406
+ [Be]
407
+ %54
408
+ [SH3+]
409
+ [Np]
410
+ [As-]
411
+ %55
412
+ [14C@@H]
413
+ [Ir+2]
414
+ [GaH3]
415
+ [p-]
416
+ [GeH4]
417
+ [Sn+3]
418
+ [Os+4]
419
+ %56
420
+ [14C@H]
421
+ [sH+]
422
+ [19F]
423
+ [Eu+2]
424
+ [TlH]
425
+ %57
426
+ [Cr+4]
427
+ %58
428
+ [B@@-]
429
+ [SiH+]
430
+ [At]
431
+ [Am]
432
+ [Fe+5]
433
+ [AsH2]
434
+ [Si+4]
435
+ [B@-]
436
+ [Pu]
437
+ [SbH]
438
+ [P-2]
439
+ [Tm+3]
440
+ *
441
+ %59
442
+ [se+]
443
+ [IH-]
444
+ %60
445
+ [oH+]
446
+ [1H]
447
+ [15N+]
448
+ [124I]
449
+ [S@@+]
450
+ [P-3]
451
+ [H]
452
+ [IH2+]
453
+ [TeH]
454
+ [Xe]
455
+ [PH4+]
456
+ [Cr+]
457
+ [Cm]
458
+ [I+3]
459
+ %61
460
+ [Nb+2]
461
+ [Ru+5]
462
+ %62
463
+ [Ta+2]
464
+ [Tc+4]
465
+ [CH3+]
466
+ [Pm]
467
+ [Si@H]
468
+ [No]
469
+ %63
470
+ [Cr+5]
471
+ [Th+2]
472
+ [Zn-2]
473
+ [13C@]
474
+ [Lr]
475
+ %64
476
+ [99Tc+3]
477
+ %65
478
+ [13C@@]
479
+ %66
480
+ [Fe-]
481
+ [17O]
482
+ [siH]
483
+ [Sb+]
484
+ [OH]
485
+ [IH]
486
+ [11CH2]
487
+ [Cf]
488
+ [SiH2+]
489
+ [Gd+2]
490
+ [In+]
491
+ [Si@@H]
492
+ [Mn+]
493
+ [99Tc+4]
494
+ [Ga-]
495
+ %67
496
+ [S@+]
497
+ [Ge+4]
498
+ [Tl+3]
499
+ [16OH]
500
+ %68
501
+ [2H-]
502
+ [Ra]
503
+ [si-]
504
+ [NiH2]
505
+ [P@@H]
506
+ [Rh+]
507
+ [12C]
508
+ [35S]
509
+ [32P]
510
+ [SiH2-]
511
+ [AlH2+]
512
+ [16O]
513
+ %69
514
+ [BiH]
515
+ [BiH2]
516
+ [Zn-]
517
+ [BH]
518
+ [Tc+3]
519
+ [Ir+]
520
+ [Ni+]
521
+ %70
522
+ [InH2]
523
+ [InH]
524
+ [Nb+3]
525
+ [PbH]
526
+ [Bi+]
527
+ %71
528
+ [As+3]
529
+ %72
530
+ [18O-]
531
+ [68Ga+3]
532
+ %73
533
+ [Pa]
534
+ [76Br]
535
+ [Tc+5]
536
+ [pH+]
537
+ [64Cu+2]
538
+ [Ru+8]
539
+ %74
540
+ [PH2-]
541
+ [Si+2]
542
+ [17OH]
543
+ [RuH]
544
+ [111In+3]
545
+ [AlH+]
546
+ %75
547
+ %76
548
+ [W+]
549
+ [SbH2]
550
+ [PoH]
551
+ [Ru-]
552
+ [XeH]
553
+ [Tc+2]
554
+ [13C-]
555
+ [Br+]
556
+ [Pt-2]
557
+ [Es]
558
+ [Cu-]
559
+ [Mg+]
560
+ [3HH]
561
+ [P@H]
562
+ [ClH2+]
563
+ %77
564
+ [SH]
565
+ [Au-]
566
+ [2HH]
567
+ %78
568
+ [Sn-]
569
+ [11CH]
570
+ [PdH2]
571
+ 0
572
+ [Os+6]
573
+ %79
574
+ [Mo+]
575
+ %80
576
+ [al]
577
+ [PbH2]
578
+ [64Cu]
579
+ [Cl]
580
+ [12CH3]
581
+ %81
582
+ [Tc+7]
583
+ [11c]
584
+ %82
585
+ [Li-]
586
+ [99Tc+5]
587
+ [He]
588
+ [12c]
589
+ [Kr]
590
+ [RuH+2]
591
+ [35Cl]
592
+ [Pd-2]
593
+ [GaH2]
594
+ [4H]
595
+ [Sg]
596
+ [Cu-2]
597
+ [Br+3]
598
+ %83
599
+ [37Cl]
600
+ [211At]
601
+ [IrH+2]
602
+ [Mt]
603
+ [Ir-2]
604
+ [In-]
605
+ [12cH]
606
+ [12CH2]
607
+ [RuH2]
608
+ [99Tc+7]
609
+ %84
610
+ [15n+]
611
+ [ClH2+2]
612
+ [16N]
613
+ [111In]
614
+ [Tc+]
615
+ [Ru-2]
616
+ [12CH]
617
+ [si+]
618
+ [Tc+6]
619
+ %85
620
+ %86
621
+ [90Y]
622
+ [Pd-]
623
+ [188Re]
624
+ [RuH+]
625
+ [NiH]
626
+ [SiH3-]
627
+ [14n]
628
+ [CH3]
629
+ [14N]
630
+ [10BH2]
631
+ %88
632
+ %89
633
+ %90
634
+ [34S]
635
+ [77Br]
636
+ [GaH]
637
+ [Br]
638
+ [Ge@]
639
+ [B@@H-]
640
+ [CuH]
641
+ [SiH4]
642
+ [3H-]
643
+ %87
644
+ %91
645
+ %92
646
+ [67Cu]
647
+ [I]
648
+ [177Lu]
649
+ [ReH]
650
+ [67Ga+3]
651
+ [Db]
652
+ [177Lu+3]
653
+ [AlH2-]
654
+ [Si+3]
655
+ [Ti-2]
656
+ [RuH+3]
657
+ [al+]
658
+ [68Ga]
659
+ [2H+]
660
+ [B@H-]
661
+ [WH2]
662
+ [OsH]
663
+ [Ir-3]
664
+ [AlH-]
665
+ [Bk]
666
+ [75Se]
667
+ [14C@]
668
+ [Pt-]
669
+ [N@@H+]
670
+ [Nb-]
671
+ [13NH2]
672
+ %93
673
+ [186Re]
674
+ [Tb+4]
675
+ [PtH]
676
+ [IrH2]
677
+ [Hg-2]
678
+ [AlH3-]
679
+ [PdH+]
680
+ [Md]
681
+ [RhH+2]
682
+ [11cH]
683
+ [Co-2]
684
+ [15N-]
685
+ [ZrH2]
686
+ %94
687
+ [Hg-]
688
+ [127I]
689
+ [AsH2+]
690
+ [MoH2]
691
+ [Te+4]
692
+ [14C@@]
693
+ [As+5]
694
+ [SnH+3]
695
+ [Ge@@]
696
+ [6Li+]
697
+ [WH]
698
+ [Ne]
699
+ [14NH2]
700
+ [14NH]
701
+ [12C@@H]
702
+ [Os+7]
703
+ [RhH]
704
+ [Al-3]
705
+ [SnH+]
706
+ [15NH3+]
707
+ [Zr+]
708
+ [197Hg+]
709
+ %95
710
+ %96
711
+ [90Y+3]
712
+ [Os-2]
713
+ [98Tc+5]
714
+ [15NH3]
715
+ [bH-]
716
+ [33P]
717
+ [Zr-2]
718
+ [15O]
719
+ [Rh-]
720
+ [PbH3]
721
+ [PH2]
722
+ [Ni-]
723
+ [CuH+]
724
+ %97
725
+ %98
726
+ %99
727
+ [Os+5]
728
+ [PtH+]
729
+ [ReH4]
730
+ [16NH]
731
+ [82Br]
732
+ [W-]
733
+ [18F-]
734
+ [15NH4+]
735
+ [Se+4]
736
+ [SeH-]
737
+ [SH4]
738
+ [67Cu+2]
739
+ [12C@H]
740
+ [AsH3]
741
+ [HgH]
742
+ [10B-]
743
+ [99Tc+6]
744
+ [117Sn+4]
745
+ [Te@]
746
+ [P@+]
747
+ [35SH]
748
+ [SeH+]
749
+ [Ni-2]
750
+ [Al-2]
751
+ [TeH2]
752
+ [Bh]
753
+ [99Tc+2]
754
+ [Os+8]
755
+ [PH-2]
756
+ [7Li+]
757
+ [14nH]
758
+ [AlH+2]
759
+ [18FH]
760
+ [SnH4]
761
+ [18O-2]
762
+ [IrH]
763
+ [13N]
764
+ [Te@@]
765
+ [Rh-3]
766
+ [15NH+]
767
+ [AsH3+]
768
+ [SeH2]
769
+ [AsH+]
770
+ [CoH2]
771
+ [16NH2]
772
+ [AsH-]
773
+ [203Hg+]
774
+ [P@@+]
775
+ [166Ho+3]
776
+ [60Co+3]
777
+ [13CH2-]
778
+ [SeH2+]
779
+ [75Br]
780
+ [TlH2]
781
+ [80Br]
782
+ [siH+]
783
+ [Ca+]
784
+ [153Sm+3]
785
+ [PdH]
786
+ [225Ac]
787
+ [13CH3-]
788
+ [AlH4-]
789
+ [FeH]
790
+ [13CH-]
791
+ [14C-]
792
+ [11C-]
793
+ [153Sm]
794
+ [Re-]
795
+ [te+]
796
+ [13CH4]
797
+ [ClH+2]
798
+ [8CH2]
799
+ [99Mo]
800
+ [ClH3+3]
801
+ [SbH3]
802
+ [25Mg+2]
803
+ [16N+]
804
+ [SnH2+]
805
+ [PH4]
806
+ [11C@H]
807
+ [122I]
808
+ [Re-2]
809
+ [RuH2+2]
810
+ [ZrH]
811
+ [Bi-]
812
+ [Pr+]
813
+ [Rn]
814
+ [Fr]
815
+ [36Cl]
816
+ [18o]
817
+ [YH]
818
+ [79Br]
819
+ [121I]
820
+ [113In+3]
821
+ [InH4-]
822
+ [TaH]
823
+ [RhH2]
824
+ [Ta-]
825
+ [67Ga]
826
+ [ZnH+]
827
+ [SnH2-]
828
+ [OsH2]
829
+ [16F]
830
+ [FeH2]
831
+ [14O]
832
+ [PbH2+2]
833
+ [BH2]
834
+ [6H]
835
+ [125Te]
836
+ [197Hg]
837
+ [TaH2]
838
+ [TaH3]
839
+ [76As]
840
+ [Nb-2]
841
+ [14N+]
842
+ [125I-]
843
+ [33S]
844
+ [IH2+2]
845
+ [NH2]
846
+ [PtH2]
847
+ [MnH]
848
+ [19C]
849
+ [17F]
850
+ [1H-]
851
+ [SnH4+2]
852
+ [Mn-2]
853
+ [15NH2+]
854
+ [TiH2]
855
+ [ReH7]
856
+ [Cd-2]
857
+ [Fe-3]
858
+ [SH2]
859
+ [17O-]
860
+ [siH-]
861
+ [CoH+]
862
+ [VH]
863
+ [10BH]
864
+ [Ru-3]
865
+ [13O]
866
+ [5H]
867
+ [CoH]
868
+ [PH5]
869
+ [15n-]
870
+ [153Gd]
871
+ [12C@]
872
+ [11CH3-]
873
+ [IrH3]
874
+ [RuH3]
875
+ [74Se]
876
+ [Se@]
877
+ [Hf+]
878
+ [77Se]
879
+ [166Ho]
880
+ [59Fe+2]
881
+ [203Hg]
882
+ [18OH-]
883
+ [8CH]
884
+ [12C@@]
885
+ [11CH4]
886
+ [15C]
887
+ [249Cf]
888
+ [PbH4]
889
+ [64Zn]
890
+ [PH3]
891
+ [99Tc+]
892
+ [14c-]
893
+ [149Pm]
894
+ [IrH4]
895
+ [Se@@]
896
+ [13OH]
897
+ [14CH3-]
898
+ [28Si]
899
+ [Rh-2]
900
+ [Fe-2]
901
+ [131I-]
902
+ [51Cr]
903
+ [62Cu+2]
904
+ [81Br]
905
+ [121Sb]
906
+ [7Li]
907
+ [89Zr+4]
908
+ [SbH3+]
909
+ [11C@@H]
910
+ [98Tc]
911
+ [59Fe+3]
912
+ [BiH2+]
913
+ [SbH+]
914
+ [TiH]
915
+ [14NH3]
916
+ [15OH]
917
+ [119Sn]
918
+ [201Hg]
919
+ [MnH+]
920
+ [201Tl]
921
+ [51Cr+3]
922
+ [123I-]
923
+ [MoH]
924
+ [AlH6-3]
925
+ [MnH2]
926
+ [WH3]
927
+ [213Bi+3]
928
+ [SnH2+2]
929
+ [123IH]
930
+ [13CH+]
931
+ [Zr-]
932
+ [74As]
933
+ [13C+]
934
+ [32P+]
935
+ [KrH]
936
+ [SiH+2]
937
+ [ClH3+2]
938
+ [13NH]
939
+ [9CH2]
940
+ [ZrH2+2]
941
+ [87Sr+2]
942
+ [35s]
943
+ [239Pu]
944
+ [198Au]
945
+ [241Am]
946
+ [203Hg+2]
947
+ [V+]
948
+ [YH2]
949
+ [SH5]
950
+ [195Pt]
951
+ [203Pb]
952
+ [RuH4]
953
+ [ThH2]
954
+ [AuH]
955
+ [66Ga+3]
956
+ [11B-]
957
+ [F]
958
+ [24Na+]
959
+ [85Sr+2]
960
+ [201Tl+]
961
+ [14CH4]
962
+ [32S]
963
+ [TeH2+]
964
+ [ClH2+3]
965
+ [AgH]
966
+ [Ge@H]
967
+ [44Ca+2]
968
+ [Os-]
969
+ [31P]
970
+ [15nH+]
971
+ [SbH4]
972
+ [TiH+]
973
+ [Ba+]
974
+ [57Co+2]
975
+ [Ta+]
976
+ [125IH]
977
+ [77As]
978
+ [129I]
979
+ [Fe-4]
980
+ [Ta-2]
981
+ [19O]
982
+ [12O]
983
+ [BiH3]
984
+ [237Np]
985
+ [252Cf]
986
+ [86Y]
987
+ [Cr-2]
988
+ [89Y]
989
+ [195Pt+2]
990
+ [si+2]
991
+ [58Fe+2]
992
+ [Hs]
993
+ [S@@H]
994
+ [OsH6]
995
+ [GdH2]
996
+ [IH3]
997
+ [8CH4]
998
+ [164Dy+3]
999
+ [47Ca+2]
1000
+ [57Co]
1001
+ [NbH2]
1002
+ [ReH2]
1003
+ [ZnH2]
1004
+ [CrH2]
1005
+ [17NH]
1006
+ [ZrH3]
1007
+ [RhH3]
1008
+ [12C-]
1009
+ [18O+]
1010
+ [Bi-2]
1011
+ [ClH4+3]
1012
+ [Ni-3]
1013
+ [Ag-]
1014
+ [111In-]
1015
+ [Mo-2]
1016
+ [55Fe+3]
1017
+ [204Hg+]
1018
+ [35Cl-]
1019
+ [211Pb]
1020
+ [75Ge]
1021
+ [8B]
1022
+ [TeH3]
1023
+ [SnH3+]
1024
+ [Zr-3]
1025
+ [28F]
1026
+ [249Bk]
1027
+ [169Yb]
1028
+ [34SH]
1029
+ [6Li]
1030
+ [94Tc]
1031
+ [197Au]
1032
+ [195Pt+4]
1033
+ [169Yb+3]
1034
+ [32Cl]
1035
+ [82Se]
1036
+ [159Gd+3]
1037
+ [213Bi]
1038
+ [CoH+2]
1039
+ [36S]
1040
+ [35P]
1041
+ [Ru-4]
1042
+ [Cr-3]
1043
+ [60Co]
1044
+ [1H+]
1045
+ [18CH2]
1046
+ [Cd-]
1047
+ [152Sm+3]
1048
+ [106Ru]
1049
+ [238Pu]
1050
+ [220Rn]
1051
+ [45Ca+2]
1052
+ [89Sr+2]
1053
+ [239Np]
1054
+ [90Sr+2]
1055
+ [137Cs+]
1056
+ [165Dy]
1057
+ [68GaH3]
1058
+ [65Zn+2]
1059
+ [89Zr]
1060
+ [BiH2+2]
1061
+ [62Cu]
1062
+ [165Dy+3]
1063
+ [238U]
1064
+ [105Rh+3]
1065
+ [70Zn]
1066
+ [12B]
1067
+ [12OH]
1068
+ [18CH]
1069
+ [17CH]
1070
+ [OsH3]
1071
+ [SbH-]
1072
+ [SH6]
1073
+ [AlH2-2]
1074
+ [42K]
1075
+ [76Br-]
1076
+ [71As]
1077
+ [NbH3]
1078
+ [ReH3]
1079
+ [OsH-]
1080
+ [WH4]
1081
+ [MoH3]
1082
+ [OsH4]
1083
+ [RuH6]
1084
+ [PtH3]
1085
+ [CuH2]
1086
+ [CoH3]
1087
+ [TiH4]
1088
+ [64Zn+2]
1089
+ [Si-2]
1090
+ [79BrH]
1091
+ [14CH2-]
1092
+ [PtH2+2]
1093
+ [Os-3]
1094
+ [29Si]
1095
+ [Ti-]
1096
+ [Se+6]
1097
+ [22Na+]
1098
+ [42K+]
1099
+ [131Cs+]
1100
+ [86Rb+]
1101
+ [134Cs+]
1102
+ [209Po]
1103
+ [208Po]
1104
+ [81Rb+]
1105
+ [203Tl+]
1106
+ [Zr-4]
1107
+ [148Sm]
1108
+ [147Sm]
1109
+ [37Cl-]
1110
+ [12CH4]
1111
+ [Ge@@H]
1112
+ [63Cu]
1113
+ [13CH2+]
1114
+ [AsH2-]
1115
+ [CeH]
1116
+ [SnH-]
1117
+ [UH]
1118
+ [9c]
1119
+ [21CH3]
1120
+ [TeH+]
1121
+ [57Co+3]
1122
+ [8BH2]
1123
+ [12BH2]
1124
+ [19BH2]
1125
+ [9BH2]
1126
+ [YbH2]
1127
+ [CrH+2]
1128
+ [208Bi]
1129
+ [152Gd]
1130
+ [61Cu]
1131
+ [115In]
1132
+ [60Co+2]
1133
+ [13NH2-]
1134
+ [120I]
1135
+ [18OH2]
1136
+ [75SeH]
1137
+ [SbH2+]
1138
+ [144Ce]
1139
+ [16n]
1140
+ [113In]
1141
+ [22nH]
1142
+ [129I-]
1143
+ [InH3]
1144
+ [32PH3]
1145
+ [234U]
1146
+ [235U]
1147
+ [59Fe]
1148
+ [82Rb+]
1149
+ [65Zn]
1150
+ [244Cm]
1151
+ [147Pm]
1152
+ [91Y]
1153
+ [237Pu]
1154
+ [231Pa]
1155
+ [253Cf]
1156
+ [127Te]
1157
+ [187Re]
1158
+ [236Np]
1159
+ [235Np]
1160
+ [72Zn]
1161
+ [253Es]
1162
+ [159Dy]
1163
+ [62Zn]
1164
+ [101Tc]
1165
+ [149Tb]
1166
+ [124I-]
1167
+ [SeH3+]
1168
+ [210Pb]
1169
+ [40K]
1170
+ [210Po]
1171
+ [214Pb]
1172
+ [218Po]
1173
+ [214Po]
1174
+ [7Be]
1175
+ [212Pb]
1176
+ [205Pb]
1177
+ [209Pb]
1178
+ [123Te]
1179
+ [202Pb]
1180
+ [72As]
1181
+ [201Pb]
1182
+ [70As]
1183
+ [73Ge]
1184
+ [200Pb]
1185
+ [198Pb]
1186
+ [66Ga]
1187
+ [73Se]
1188
+ [195Pb]
1189
+ [199Pb]
1190
+ [144Ce+3]
1191
+ [235U+2]
1192
+ [90Tc]
1193
+ [114In+3]
1194
+ [128I]
1195
+ [100Tc+]
1196
+ [82Br-]
1197
+ [191Pt+2]
1198
+ [191Pt+4]
1199
+ [193Pt+4]
1200
+ [31PH3]
1201
+ [125I+2]
1202
+ [131I+2]
1203
+ [125Te+4]
1204
+ [82Sr+2]
1205
+ [149Sm]
1206
+ [81BrH]
1207
+ [129Xe]
1208
+ [193Pt+2]
1209
+ [123I+2]
1210
+ [Cr-]
1211
+ [Co-]
1212
+ [227Th+4]
1213
+ [249Cf+3]
1214
+ [252Cf+3]
1215
+ [187Os]
1216
+ [16O-]
1217
+ [17O+]
1218
+ [16OH-]
1219
+ [98Tc+7]
1220
+ [58Co+2]
1221
+ [69Ga+3]
1222
+ [57Fe+2]
1223
+ [43K+]
1224
+ [16C]
1225
+ [52Fe+3]
1226
+ [SeH5]
1227
+ [194Pb]
1228
+ [196Pb]
1229
+ [197Pb]
1230
+ [213Pb]
1231
+ [9B]
1232
+ [19B]
1233
+ [11CH-]
1234
+ [9CH]
1235
+ [20OH]
1236
+ [25OH]
1237
+ [8cH]
1238
+ [TiH+3]
1239
+ [SnH6+3]
1240
+ [N@H+]
1241
+ [ZnH]
1242
+ [VH3]
1243
+ [52Mn+2]
1244
+ [64Ga]
1245
+ [13B]
1246
+ [216Bi]
1247
+ [117Sn+2]
1248
+ [232Th]
1249
+ [SnH+2]
1250
+ [BiH5]
1251
+ [77Kr]
1252
+ [103Cd]
1253
+ [62Ni]
1254
+ [LaH3]
1255
+ [SmH3]
1256
+ [EuH3]
1257
+ [MoH5]
1258
+ [64Ni]
1259
+ [66Zn]
1260
+ [68Zn]
1261
+ [186W]
1262
+ [FeH4]
1263
+ [MoH4]
1264
+ [HgH2]
1265
+ [15NH2-]
1266
+ [UH2]
1267
+ [204Hg]
1268
+ [GaH4-]
1269
+ [ThH4]
1270
+ [WH6]
1271
+ [PtH4]
1272
+ [VH2]
1273
+ [UH3]
1274
+ [FeH3]
1275
+ [RuH5]
1276
+ [BiH4]
1277
+ [80Br-]
1278
+ [CeH3]
1279
+ [37ClH]
1280
+ [157Gd+3]
1281
+ [205Tl]
1282
+ [203Tl]
1283
+ [62Cu+]
1284
+ [64Cu+]
1285
+ [61Cu+]
1286
+ [37SH2]
1287
+ [30Si]
1288
+ [28Al]
1289
+ [19OH2]
1290
+ [8He]
1291
+ [6He]
1292
+ [153Pm]
1293
+ [209Bi]
1294
+ [66Zn+2]
1295
+ [10CH4]
1296
+ [191Ir]
1297
+ [66Cu]
1298
+ [16O+]
1299
+ [25O]
1300
+ [10c]
1301
+ [Co-3]
1302
+ [Sn@@]
1303
+ [17OH-]
1304
+ [206Po]
1305
+ [204Po]
1306
+ [202Po]
1307
+ [201Po]
1308
+ [200Po]
1309
+ [199Po]
1310
+ [198Po]
1311
+ [197Po]
1312
+ [196Po]
1313
+ [195Po]
1314
+ [194Po]
1315
+ [193Po]
1316
+ [192Po]
1317
+ [191Po]
1318
+ [190Po]
1319
+ [217Po]
1320
+ [BiH4-]
1321
+ [TeH4]
1322
+ [222Ra]
1323
+ [62Ga]
1324
+ [39Ar]
1325
+ [144Sm]
1326
+ [58Fe]
1327
+ [153Eu]
1328
+ [85Rb]
1329
+ [171Yb]
1330
+ [172Yb]
1331
+ [114Cd]
1332
+ [51Fe]
1333
+ [142Ce]
1334
+ [207Tl]
1335
+ [92Mo]
1336
+ [115Sn]
1337
+ [140Ce]
1338
+ [202Hg]
1339
+ [180W]
1340
+ [182W]
1341
+ [183W]
1342
+ [184W]
1343
+ [96Mo]
1344
+ [47Ti]
1345
+ [111Cd]
1346
+ [143Nd]
1347
+ [145Nd]
1348
+ [126Te]
1349
+ [128Te]
1350
+ [130Te]
1351
+ [185Re]
1352
+ [97Mo]
1353
+ [98Mo]
1354
+ [183Re]
1355
+ [52V]
1356
+ [80Se]
1357
+ [87Kr]
1358
+ [137Xe]
1359
+ [196Au]
1360
+ [146Ce]
1361
+ [88Kr]
1362
+ [51Ti]
1363
+ [138Xe]
1364
+ [112Cd]
1365
+ [116Sn]
1366
+ [120Sn]
1367
+ [28SiH3]
1368
+ [35S-]
1369
+ [15NH-]
1370
+ [13CH3+]
1371
+ [34S+]
1372
+ [34s]
1373
+ [SiH4-]
1374
+ [100Tc+5]
1375
+ [NiH2+2]
1376
+ [239Th]
1377
+ [186Lu]
1378
+ [AuH3]
1379
+ [I@@-]
1380
+ [XeH2]
1381
+ [B+]
1382
+ [16CH2]
1383
+ [8C]
1384
+ [TaH5]
1385
+ [FeH4-]
1386
+ [19C@H]
1387
+ [10NH]
1388
+ [FeH6-3]
1389
+ [22CH]
1390
+ [25N]
1391
+ [25N+]
1392
+ [25N-]
1393
+ [21CH2]
1394
+ [18cH]
1395
+ [113I]
1396
+ [ScH3]
1397
+ [30PH3]
1398
+ [43Ca+2]
1399
+ [41Ca+2]
1400
+ [106Cd]
1401
+ [122Sn]
1402
+ [18CH3]
1403
+ [58Co+3]
1404
+ [98Tc+4]
1405
+ [70Ge]
1406
+ [76Ge]
1407
+ [108Cd]
1408
+ [116Cd]
1409
+ [130Xe]
1410
+ [94Mo]
1411
+ [124Sn]
1412
+ [186Os]
1413
+ [188Os]
1414
+ [190Os]
1415
+ [192Os]
1416
+ [106Pd]
1417
+ [110Pd]
1418
+ [120Te]
1419
+ [132Ba]
1420
+ [134Ba]
1421
+ [136Ba]
1422
+ [136Ce]
1423
+ [138Ce]
1424
+ [156Dy]
1425
+ [158Dy]
1426
+ [160Dy]
1427
+ [163Dy]
1428
+ [162Er]
1429
+ [164Er]
1430
+ [167Er]
1431
+ [176Hf]
1432
+ [26Mg]
1433
+ [144Nd]
1434
+ [150Nd]
1435
+ [41K]
1436
+ [46Ti]
1437
+ [48Ti]
1438
+ [49Ti]
1439
+ [50Ti]
1440
+ [170Yb]
1441
+ [173Yb]
1442
+ [91Zr]
1443
+ [92Zr]
1444
+ [96Zr]
1445
+ [34S-]
1446
+ [CuH2-]
1447
+ [38Cl]
1448
+ [25Mg]
1449
+ [51V]
1450
+ [93Nb]
1451
+ [95Mo]
1452
+ [45Sc]
1453
+ [123Sb]
1454
+ [139La]
1455
+ [9Be]
1456
+ [99Y+3]
1457
+ [99Y]
1458
+ [156Ho]
1459
+ [67Zn]
1460
+ [144Ce+4]
1461
+ [210Tl]
1462
+ [42Ca]
1463
+ [54Fe]
1464
+ [193Ir]
1465
+ [92Nb]
1466
+ [141Cs]
1467
+ [52Cr]
1468
+ [35ClH]
1469
+ [46Ca]
1470
+ [139Cs]
1471
+ [65Cu]
1472
+ [71Ga]
1473
+ [60Ni]
1474
+ [16NH3]
1475
+ [148Nd]
1476
+ [72Ge]
1477
+ [161Dy]
1478
+ [49Ca]
1479
+ [43Ca]
1480
+ [8Be]
1481
+ [48Ca]
1482
+ [44Ca]
1483
+ [120Xe]
1484
+ [80Rb]
1485
+ [215At]
1486
+ [180Re]
1487
+ [146Sm]
1488
+ [19Ne]
1489
+ [74Kr]
1490
+ [134La]
1491
+ [76Kr]
1492
+ [219Fr]
1493
+ [121Xe]
1494
+ [220Fr]
1495
+ [216At]
1496
+ [223Ac]
1497
+ [218At]
1498
+ [37Ar]
1499
+ [135I]
1500
+ [110Cd]
1501
+ [94Tc+7]
1502
+ [86Y+3]
1503
+ [135I-]
1504
+ [15O-2]
1505
+ [151Eu+3]
1506
+ [161Tb+3]
1507
+ [197Hg+2]
1508
+ [109Cd+2]
1509
+ [191Os+4]
1510
+ [170Tm+3]
1511
+ [205Bi+3]
1512
+ [233U+4]
1513
+ [126Sb+3]
1514
+ [127Sb+3]
1515
+ [132Cs+]
1516
+ [136Eu+3]
1517
+ [136Eu]
1518
+ [125Sn+4]
1519
+ [175Yb+3]
1520
+ [100Mo]
1521
+ [22Ne]
1522
+ [13c-]
1523
+ [13NH4+]
1524
+ [17C]
1525
+ [9C]
1526
+ [31S]
1527
+ [31SH]
1528
+ [133I]
1529
+ [126I]
1530
+ [36SH]
1531
+ [30S]
1532
+ [32SH]
1533
+ [19CH2]
1534
+ [19c]
1535
+ [18c]
1536
+ [15F]
1537
+ [10C]
1538
+ [RuH-]
1539
+ [62Zn+2]
1540
+ [32ClH]
1541
+ [33ClH]
1542
+ [78BrH]
1543
+ [12Li+]
1544
+ [12Li]
1545
+ [233Ra]
1546
+ [68Ge+4]
1547
+ [44Sc+3]
1548
+ [91Y+3]
1549
+ [106Ru+3]
1550
+ [PoH2]
1551
+ [AtH]
1552
+ [55Fe]
1553
+ [233U]
1554
+ [210PoH2]
1555
+ [230Th]
1556
+ [228Th]
1557
+ [222Rn]
1558
+ [35SH2]
1559
+ [227Th]
1560
+ [192Ir]
1561
+ [133Xe]
1562
+ [81Kr]
1563
+ [95Zr]
1564
+ [240Pu]
1565
+ [54Mn]
1566
+ [103Ru]
1567
+ [95Nb]
1568
+ [109Cd]
1569
+ [141Ce]
1570
+ [85Kr]
1571
+ [110Ag]
1572
+ [58Co]
1573
+ [241Pu]
1574
+ [234Th]
1575
+ [140La]
1576
+ [63Ni]
1577
+ [152Eu]
1578
+ [132IH]
1579
+ [226Rn]
1580
+ [154Eu]
1581
+ [36ClH]
1582
+ [228Ac]
1583
+ [155Eu]
1584
+ [106Rh]
1585
+ [243Am]
1586
+ [227Ac]
1587
+ [243Cm]
1588
+ [236U]
1589
+ [144Pr]
1590
+ [232U]
1591
+ [32SH2]
1592
+ [88Y]
1593
+ [82BrH]
1594
+ [135IH]
1595
+ [242Cm]
1596
+ [115Cd]
1597
+ [242Pu]
1598
+ [46Sc]
1599
+ [56Mn]
1600
+ [234Pa]
1601
+ [41Ar]
1602
+ [147Nd]
1603
+ [187W]
1604
+ [151Sm]
1605
+ [59Ni]
1606
+ [233Pa]
1607
+ [52Mn]
1608
+ [94Nb]
1609
+ [219Rn]
1610
+ [236Pu]
1611
+ [13NH3]
1612
+ [93Zr]
1613
+ [51Cr+6]
1614
+ [TlH3]
1615
+ [123Xe]
1616
+ [160Tb]
1617
+ [170Tm]
1618
+ [182Ta]
1619
+ [175Yb]
1620
+ [93Mo]
1621
+ [143Ce]
1622
+ [191Os]
1623
+ [126IH]
1624
+ [48V]
1625
+ [113Cd]
1626
+ [47Sc]
1627
+ [181Hf]
1628
+ [185W]
1629
+ [143Pr]
1630
+ [191Pt]
1631
+ [181W]
1632
+ [33PH3]
1633
+ [97Ru]
1634
+ [97Tc]
1635
+ [111Ag]
1636
+ [169Er]
1637
+ [107Pd]
1638
+ [103Ru+2]
1639
+ [34SH2]
1640
+ [137Ce]
1641
+ [242Am]
1642
+ [117SnH2]
1643
+ [57Ni]
1644
+ [239U]
1645
+ [60Cu]
1646
+ [250Cf]
1647
+ [193Au]
1648
+ [69Zn]
1649
+ [55Co]
1650
+ [139Ce]
1651
+ [127Xe]
1652
+ [159Gd]
1653
+ [56Co]
1654
+ [177Hf]
1655
+ [244Pu]
1656
+ [38ClH]
1657
+ [142Pr]
1658
+ [199Hg]
1659
+ [179Hf]
1660
+ [178Hf]
1661
+ [237U]
1662
+ [156Eu]
1663
+ [157Eu]
1664
+ [105Ru]
1665
+ [171Tm]
1666
+ [199Au]
1667
+ [155Sm]
1668
+ [80BrH]
1669
+ [108Ag]
1670
+ [128IH]
1671
+ [48Sc]
1672
+ [45Ti]
1673
+ [176Lu]
1674
+ [121SnH2]
1675
+ [148Pm]
1676
+ [57Fe]
1677
+ [10BH3]
1678
+ [96Tc]
1679
+ [133IH]
1680
+ [143Pm]
1681
+ [105Rh]
1682
+ [130IH]
1683
+ [134IH]
1684
+ [131IH]
1685
+ [71Zn]
1686
+ [105Ag]
1687
+ [97Zr]
1688
+ [235Pu]
1689
+ [231Th]
1690
+ [109Pd]
1691
+ [93Y]
1692
+ [190Ir]
1693
+ [135Xe]
1694
+ [53Mn]
1695
+ [134Ce]
1696
+ [234Np]
1697
+ [240Am]
1698
+ [246Cf]
1699
+ [240Cm]
1700
+ [241Cm]
1701
+ [226Th]
1702
+ [39ClH]
1703
+ [229Th]
1704
+ [245Cm]
1705
+ [240U]
1706
+ [240Np]
1707
+ [249Cm]
1708
+ [243Pu]
1709
+ [145Pm]
1710
+ [199Pt]
1711
+ [246Bk]
1712
+ [193Pt]
1713
+ [230U]
1714
+ [250Cm]
1715
+ [44Ti]
1716
+ [175Hf]
1717
+ [254Fm]
1718
+ [255Fm]
1719
+ [257Fm]
1720
+ [92Y]
1721
+ [188Ir]
1722
+ [171Lu]
1723
+ [257Md]
1724
+ [247Bk]
1725
+ [121IH]
1726
+ [250Bk]
1727
+ [179Lu]
1728
+ [224Ac]
1729
+ [195Hg]
1730
+ [244Am]
1731
+ [246Pu]
1732
+ [194Au]
1733
+ [252Fm]
1734
+ [173Hf]
1735
+ [246Cm]
1736
+ [135Ce]
1737
+ [49Cr]
1738
+ [248Cf]
1739
+ [247Cm]
1740
+ [248Cm]
1741
+ [174Ta]
1742
+ [176Ta]
1743
+ [154Tb]
1744
+ [172Ta]
1745
+ [177Ta]
1746
+ [175Ta]
1747
+ [180Ta]
1748
+ [158Tb]
1749
+ [115Ag]
1750
+ [189Os]
1751
+ [251Cf]
1752
+ [145Pr]
1753
+ [147Pr]
1754
+ [76BrH]
1755
+ [102Rh]
1756
+ [238Np]
1757
+ [185Os]
1758
+ [246Am]
1759
+ [233Np]
1760
+ [166Dy]
1761
+ [254Es]
1762
+ [244Cf]
1763
+ [193Os]
1764
+ [245Am]
1765
+ [245Bk]
1766
+ [239Am]
1767
+ [238Am]
1768
+ [97Nb]
1769
+ [245Pu]
1770
+ [254Cf]
1771
+ [188W]
1772
+ [250Es]
1773
+ [251Es]
1774
+ [237Am]
1775
+ [182Hf]
1776
+ [258Md]
1777
+ [232Np]
1778
+ [238Cm]
1779
+ [60Fe]
1780
+ [109Pd+2]
1781
+ [234Pu]
1782
+ [141Ce+3]
1783
+ [136Nd]
1784
+ [136Pr]
1785
+ [173Ta]
1786
+ [110Ru]
1787
+ [147Tb]
1788
+ [253Fm]
1789
+ [139Nd]
1790
+ [178Re]
1791
+ [177Re]
1792
+ [200Au]
1793
+ [182Re]
1794
+ [156Tb]
1795
+ [155Tb]
1796
+ [157Tb]
1797
+ [161Tb]
1798
+ [161Ho]
1799
+ [167Tm]
1800
+ [173Lu]
1801
+ [179Ta]
1802
+ [171Er]
1803
+ [44Sc]
1804
+ [49Sc]
1805
+ [49V]
1806
+ [51Mn]
1807
+ [90Nb]
1808
+ [88Nb]
1809
+ [88Zr]
1810
+ [36SH2]
1811
+ [174Yb]
1812
+ [178Lu]
1813
+ [179W]
1814
+ [83BrH]
1815
+ [107Cd]
1816
+ [75BrH]
1817
+ [62Co]
1818
+ [48Cr]
1819
+ [63Zn]
1820
+ [102Ag]
1821
+ [154Sm]
1822
+ [168Er]
1823
+ [65Ni]
1824
+ [137La]
1825
+ [187Ir]
1826
+ [144Pm]
1827
+ [146Pm]
1828
+ [160Gd]
1829
+ [166Yb]
1830
+ [162Dy]
1831
+ [47V]
1832
+ [141Nd]
1833
+ [141Sm]
1834
+ [166Er]
1835
+ [150Sm]
1836
+ [146Eu]
1837
+ [149Eu]
1838
+ [174Lu]
1839
+ [17NH3]
1840
+ [102Ru]
1841
+ [170Hf]
1842
+ [188Pt]
1843
+ [61Ni]
1844
+ [56Ni]
1845
+ [149Gd]
1846
+ [151Gd]
1847
+ [141Pm]
1848
+ [147Gd]
1849
+ [146Gd]
1850
+ [161Er]
1851
+ [103Ag]
1852
+ [145Eu]
1853
+ [153Tb]
1854
+ [155Dy]
1855
+ [184Re]
1856
+ [180Os]
1857
+ [182Os]
1858
+ [186Pt]
1859
+ [181Os]
1860
+ [181Re]
1861
+ [151Tb]
1862
+ [178Ta]
1863
+ [178W]
1864
+ [189Pt]
1865
+ [194Hg]
1866
+ [145Sm]
1867
+ [150Tb]
1868
+ [132La]
1869
+ [158Gd]
1870
+ [104Ag]
1871
+ [193Hg]
1872
+ [94Ru]
1873
+ [137Pr]
1874
+ [155Ho]
1875
+ [117Cd]
1876
+ [99Ru]
1877
+ [146Nd]
1878
+ [218Rn]
1879
+ [95Y]
1880
+ [79Kr]
1881
+ [120IH]
1882
+ [138Pr]
1883
+ [100Pd]
1884
+ [166Tm]
1885
+ [90Mo]
1886
+ [151Nd]
1887
+ [231U]
1888
+ [138Nd]
1889
+ [89Nb]
1890
+ [98Nb]
1891
+ [162Ho]
1892
+ [142Sm]
1893
+ [186Ta]
1894
+ [104Tc]
1895
+ [184Ta]
1896
+ [185Ta]
1897
+ [170Er]
1898
+ [107Rh]
1899
+ [131La]
1900
+ [169Lu]
1901
+ [74BrH]
1902
+ [150Pm]
1903
+ [172Tm]
1904
+ [197Pt]
1905
+ [230Pu]
1906
+ [170Lu]
1907
+ [86Zr]
1908
+ [176W]
1909
+ [177W]
1910
+ [101Pd]
1911
+ [105Pd]
1912
+ [108Pd]
1913
+ [149Nd]
1914
+ [164Ho]
1915
+ [159Ho]
1916
+ [167Ho]
1917
+ [176Yb]
1918
+ [156Sm]
1919
+ [77BrH]
1920
+ [189Re]
1921
+ [99Rh]
1922
+ [100Rh]
1923
+ [151Pm]
1924
+ [232Pa]
1925
+ [228Pa]
1926
+ [230Pa]
1927
+ [66Ni]
1928
+ [194Os]
1929
+ [135La]
1930
+ [138La]
1931
+ [141La]
1932
+ [142La]
1933
+ [195Ir]
1934
+ [96Nb]
1935
+ [157Ho]
1936
+ [183Hf]
1937
+ [162Tm]
1938
+ [172Er]
1939
+ [148Eu]
1940
+ [150Eu]
1941
+ [15CH4]
1942
+ [89Kr]
1943
+ [143La]
1944
+ [58Ni]
1945
+ [61Co]
1946
+ [158Eu]
1947
+ [165Er]
1948
+ [167Yb]
1949
+ [173Tm]
1950
+ [175Tm]
1951
+ [172Hf]
1952
+ [172Lu]
1953
+ [93Tc]
1954
+ [177Yb]
1955
+ [124IH]
1956
+ [194Ir]
1957
+ [147Eu]
1958
+ [101Mo]
1959
+ [180Hf]
1960
+ [189Ir]
1961
+ [87Y]
1962
+ [43Sc]
1963
+ [195Au]
1964
+ [112Ag]
1965
+ [84BrH]
1966
+ [106Ag]
1967
+ [109Ag]
1968
+ [101Rh]
1969
+ [162Yb]
1970
+ [228Rn]
1971
+ [139Pr]
1972
+ [94Y]
1973
+ [201Au]
1974
+ [40PH3]
1975
+ [110Ag+]
1976
+ [104Cd]
1977
+ [133Ba+2]
1978
+ [226Ac]
1979
+ [145Gd]
1980
+ [186Ir]
1981
+ [184Ir]
1982
+ [224Rn]
1983
+ [185Ir]
1984
+ [182Ir]
1985
+ [184Hf]
1986
+ [200Pt]
1987
+ [227Pa]
1988
+ [178Yb]
1989
+ [72Br-]
1990
+ [72BrH]
1991
+ [248Am]
1992
+ [238Th]
1993
+ [161Gd]
1994
+ [35S-2]
1995
+ [107Ag]
1996
+ [FeH6-4]
1997
+ [89Sr]
1998
+ [SnH3-]
1999
+ [SeH3]
2000
+ [TeH3+]
2001
+ [SbH4+]
2002
+ [AsH4+]
2003
+ [4He]
2004
+ [AsH3-]
2005
+ [1HH]
2006
+ [3H+]
2007
+ [82Rb]
2008
+ [85Sr]
2009
+ [90Sr]
2010
+ [137Cs]
2011
+ [133Ba]
2012
+ [131Cs]
2013
+ [SbH5]
2014
+ [224Ra]
2015
+ [22Na]
2016
+ [210Bi]
2017
+ [214Bi]
2018
+ [228Ra]
2019
+ [127Sb]
2020
+ [136Cs]
2021
+ [125Sb]
2022
+ [134Cs]
2023
+ [140Ba]
2024
+ [45Ca]
2025
+ [206Pb]
2026
+ [207Pb]
2027
+ [24Na]
2028
+ [86Rb]
2029
+ [212Bi]
2030
+ [208Pb]
2031
+ [124Sb]
2032
+ [204Pb]
2033
+ [44K]
2034
+ [129Te]
2035
+ [113Sn]
2036
+ [204Tl]
2037
+ [87Sr]
2038
+ [208Tl]
2039
+ [87Rb]
2040
+ [47Ca]
2041
+ [135Cs]
2042
+ [216Po]
2043
+ [137Ba]
2044
+ [207Bi]
2045
+ [212Po]
2046
+ [79Se]
2047
+ [223Ra]
2048
+ [86Sr]
2049
+ [122Sb]
2050
+ [26Al]
2051
+ [32Si]
2052
+ [126Sn]
2053
+ [225Ra]
2054
+ [114In]
2055
+ [72Ga]
2056
+ [132Te]
2057
+ [10Be]
2058
+ [125Sn]
2059
+ [73As]
2060
+ [206Bi]
2061
+ [117Sn]
2062
+ [40Ca]
2063
+ [41Ca]
2064
+ [89Rb]
2065
+ [116In]
2066
+ [129Sb]
2067
+ [91Sr]
2068
+ [71Ge]
2069
+ [139Ba]
2070
+ [69Ga]
2071
+ [120Sb]
2072
+ [121Sn]
2073
+ [123Sn]
2074
+ [131Te]
2075
+ [77Ge]
2076
+ [135Ba]
2077
+ [82Sr]
2078
+ [43K]
2079
+ [131Ba]
2080
+ [92Sr]
2081
+ [88Rb]
2082
+ [129Cs]
2083
+ [144Cs]
2084
+ [127Cs]
2085
+ [200Tl]
2086
+ [202Tl]
2087
+ [141Ba]
2088
+ [117Sb]
2089
+ [116Sb]
2090
+ [78As]
2091
+ [131Sb]
2092
+ [126Sb]
2093
+ [128Sb]
2094
+ [130Sb]
2095
+ [67Ge]
2096
+ [68Ge]
2097
+ [78Ge]
2098
+ [66Ge]
2099
+ [223Fr]
2100
+ [132Cs]
2101
+ [125Cs]
2102
+ [138Cs]
2103
+ [133Te]
2104
+ [84Rb]
2105
+ [83Rb]
2106
+ [81Rb]
2107
+ [142Ba]
2108
+ [200Bi]
2109
+ [115Sb]
2110
+ [194Tl]
2111
+ [70Se]
2112
+ [112In]
2113
+ [118Sb]
2114
+ [70Ga]
2115
+ [27Mg]
2116
+ [202Bi]
2117
+ [83Se]
2118
+ [9Li]
2119
+ [69As]
2120
+ [79Rb]
2121
+ [81Sr]
2122
+ [83Sr]
2123
+ [78Se]
2124
+ [109In]
2125
+ [29Al]
2126
+ [118Sn]
2127
+ [117In]
2128
+ [119Sb]
2129
+ [114Sn]
2130
+ [138Ba]
2131
+ [69Ge]
2132
+ [73Ga]
2133
+ [74Ge]
2134
+ [206Tl]
2135
+ [199Tl]
2136
+ [130Cs]
2137
+ [28Mg]
2138
+ [116Te]
2139
+ [112Sn]
2140
+ [126Ba]
2141
+ [211Bi]
2142
+ [81Se]
2143
+ [127Sn]
2144
+ [143Cs]
2145
+ [134Te]
2146
+ [80Sr]
2147
+ [45K]
2148
+ [215Po]
2149
+ [207Po]
2150
+ [111Sn]
2151
+ [211Po]
2152
+ [128Ba]
2153
+ [198Tl]
2154
+ [227Ra]
2155
+ [213Po]
2156
+ [220Ra]
2157
+ [128Sn]
2158
+ [203Po]
2159
+ [205Po]
2160
+ [65Ga]
2161
+ [197Tl]
2162
+ [88Sr]
2163
+ [110In]
2164
+ [31Si]
2165
+ [201Bi]
2166
+ [121Te]
2167
+ [205Bi]
2168
+ [203Bi]
2169
+ [195Tl]
2170
+ [209Tl]
2171
+ [110Sn]
2172
+ [222Fr]
2173
+ [207At]
2174
+ [119In]
2175
+ [As@]
2176
+ [129IH]
2177
+ [157Dy]
2178
+ [111IH]
2179
+ [230Ra]
2180
+ [144Pr+3]
2181
+ [SiH3+]
2182
+ [3He]
2183
+ [AsH5]
2184
+ [72Se]
2185
+ [95Tc]
2186
+ [103Pd]
2187
+ [121Sn+2]
2188
+ [211Rn]
2189
+ [38SH2]
2190
+ [127IH]
2191
+ [74Br-]
2192
+ [133I-]
2193
+ [100Tc+4]
2194
+ [100Tc]
2195
+ [36Cl-]
2196
+ [89Y+3]
2197
+ [104Rh]
2198
+ [152Sm]
2199
+ [226Ra]
2200
+ [19FH]
2201
+ [104Pd]
2202
+ [148Gd]
2203
+ [157Lu]
2204
+ [33SH2]
2205
+ [121I-]
2206
+ [17FH]
2207
+ [71Se]
2208
+ [157Sm]
2209
+ [148Tb]
2210
+ [164Dy]
2211
+ [15OH2]
2212
+ [15O+]
2213
+ [39K]
2214
+ [40Ar]
2215
+ [50Cr+3]
2216
+ [50Cr]
2217
+ [52Ti]
2218
+ [103Pd+2]
2219
+ [130Ba]
2220
+ [142Pm]
2221
+ [153Gd+3]
2222
+ [151Eu]
2223
+ [103Rh]
2224
+ [124Xe]
2225
+ [152Tb]
2226
+ [17OH2]
2227
+ [20Ne]
2228
+ [52Fe]
2229
+ [94Zr+4]
2230
+ [94Zr]
2231
+ [149Pr]
2232
+ [16OH2]
2233
+ [53Cr+6]
2234
+ [53Cr]
2235
+ [81Br-]
2236
+ [112Pd]
2237
+ [125Xe]
2238
+ [155Gd]
2239
+ [157Gd]
2240
+ [168Yb]
2241
+ [184Os]
2242
+ [166Tb]
2243
+ [221Fr]
2244
+ [212Ra]
2245
+ [75Br-]
2246
+ [79Br-]
2247
+ [113Ag]
2248
+ [23Na]
2249
+ [34Cl-]
2250
+ [34ClH]
2251
+ [38Cl-]
2252
+ [56Fe]
2253
+ [68Cu]
2254
+ [77Br-]
2255
+ [90Zr+4]
2256
+ [90Zr]
2257
+ [102Pd]
2258
+ [154Eu+3]
2259
+ [57Mn]
2260
+ [165Tm]
2261
+ [152Dy]
2262
+ [217At]
2263
+ [77se]
2264
+ [13cH-]
2265
+ [122Te]
2266
+ [156Gd]
2267
+ [124Te]
2268
+ [53Ni]
2269
+ [131Xe]
2270
+ [174Hf+4]
2271
+ [174Hf]
2272
+ [76Se]
2273
+ [168Tm]
2274
+ [167Dy]
2275
+ [154Gd]
2276
+ [95Ru]
2277
+ [210At]
2278
+ [85Br]
2279
+ [59Co]
2280
+ [122Xe]
2281
+ [27Al]
2282
+ [54Cr]
2283
+ [198Hg]
2284
+ [85Rb+]
2285
+ [214Tl]
2286
+ [229Rn]
2287
+ [218Pb]
2288
+ [218Bi]
2289
+ [167Tm+3]
2290
+ [18o+]
2291
+ [P@@H+]
2292
+ [P@H+]
2293
+ [13N+]
2294
+ [212Pb+2]
2295
+ [217Bi]
2296
+ [249Cf+2]
2297
+ [18OH3+]
2298
+ [90Sr-]
2299
+ [Cf+3]
2300
+ [200Hg]
2301
+ [86Tc]
2302
+ [141Pr+3]
2303
+ [141Pr]
2304
+ [16nH]
2305
+ [14NH4+]
2306
+ [132Xe]
2307
+ [83Kr]
2308
+ [70Zn+2]
2309
+ [137Ba+2]
2310
+ [36Ar]
2311
+ [38Ar]
2312
+ [21Ne]
2313
+ [126Xe]
2314
+ [136Xe]
2315
+ [128Xe]
2316
+ [134Xe]
2317
+ [84Kr]
2318
+ [86Kr]
2319
+ [78Kr]
2320
+ [80Kr]
2321
+ [82Kr]
2322
+ [67Zn+2]
2323
+ [65Cu+2]
2324
+ [110Te]
2325
+ [58Fe+3]
2326
+ [142Nd]
2327
+ [38K]
2328
+ [198Au+3]
2329
+ [122IH]
2330
+ [38PH3]
2331
+ [130I-]
2332
+ [40K+]
2333
+ [38K+]
2334
+ [28Mg+2]
2335
+ [208Tl+]
2336
+ [13OH2]
2337
+ [198Bi]
2338
+ [192Bi]
2339
+ [194Bi]
2340
+ [196Bi]
2341
+ [132I-]
2342
+ [83Sr+2]
2343
+ [169Er+3]
2344
+ [122I-]
2345
+ [120I-]
2346
+ [92Sr+2]
2347
+ [126I-]
2348
+ [24Mg]
2349
+ [84Sr]
2350
+ [118Pd+2]
2351
+ [118Pd]
2352
+ [AsH4]
2353
+ [127I-]
2354
+ [9C-]
2355
+ [11CH3+]
2356
+ [17B]
2357
+ [7B]
2358
+ [4HH]
2359
+ [18C-]
2360
+ [22CH3-]
2361
+ [22CH4]
2362
+ [17C-]
2363
+ [15CH3]
2364
+ [16CH3]
2365
+ [11NH3]
2366
+ [21NH3]
2367
+ [11N-]
2368
+ [11NH]
2369
+ [16CH]
2370
+ [17CH2]
2371
+ [99Ru+2]
2372
+ [181Ta+2]
2373
+ [181Ta]
2374
+ [20CH]
2375
+ [32PH2]
2376
+ [55Fe+2]
2377
+ [SH3]
2378
+ [S@H]
2379
+ [Mn-]
2380
+ [IH4]
2381
+ [ThH]
2382
+ [GaH-]
2383
+ [BiH+]
2384
+ [EuH2]
2385
+ [FeH4-3]
2386
+ [FeH6]
2387
+ [IH5]
2388
+ [NiH+]
2389
+ [SrH2]
2390
+ [VH4]
2391
+ [YH3]
2392
+ [seH+]
2393
+ <unk>
smi-ted/inference/smi_ted_light/load.py ADDED
@@ -0,0 +1,642 @@
1
+ PATTERN = "(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
2
+ # Deep learning
3
+ import torch
4
+ import torch.nn as nn
5
+ import torch.nn.functional as F
6
+ import torch.backends.cudnn as cudnn
7
+
8
+ # Transformers
9
+ from fast_transformers.attention import AttentionLayer
10
+ from fast_transformers.events import QKVEvent
11
+ from fast_transformers.transformers import TransformerEncoder, TransformerEncoderLayer
12
+ from fast_transformers.builders.transformer_builders import BaseTransformerEncoderBuilder
13
+ from fast_transformers.builders.attention_builders import AttentionBuilder
14
+ from fast_transformers.feature_maps import GeneralizedRandomFeatures
15
+ from fast_transformers.masking import LengthMask
16
+ from transformers import BertTokenizer
17
+ from huggingface_hub import hf_hub_download
18
+
19
+ # Data
20
+ import numpy as np
21
+ import pandas as pd
22
+
23
+ # Chemistry
24
+ from rdkit import Chem
25
+ from rdkit.Chem import PandasTools
26
+ from rdkit.Chem import Descriptors
27
+ PandasTools.RenderImagesInAllDataFrames(True)
28
+
29
+ # Standard library
30
+ from functools import partial
31
+ import regex as re
32
+ import random
33
+ import os
34
+ import gc
35
+ from tqdm import tqdm
36
+ tqdm.pandas()
37
+
38
+
39
+ # function to canonicalize SMILES
40
+ def normalize_smiles(smi, canonical=True, isomeric=False):
41
+ try:
42
+ normalized = Chem.MolToSmiles(
43
+ Chem.MolFromSmiles(smi), canonical=canonical, isomericSmiles=isomeric
44
+ )
45
+ except Exception:
46
+ normalized = None
47
+ return normalized
48
+
49
+
50
+ class MolTranBertTokenizer(BertTokenizer):
51
+ def __init__(self, vocab_file: str = '',
52
+ do_lower_case=False,
53
+ unk_token='<pad>',
54
+ sep_token='<eos>',
55
+ pad_token='<pad>',
56
+ cls_token='<bos>',
57
+ mask_token='<mask>',
58
+ **kwargs):
59
+ super().__init__(vocab_file,
60
+ unk_token=unk_token,
61
+ sep_token=sep_token,
62
+ pad_token=pad_token,
63
+ cls_token=cls_token,
64
+ mask_token=mask_token,
65
+ **kwargs)
66
+
67
+ self.regex_tokenizer = re.compile(PATTERN)
68
+ self.wordpiece_tokenizer = None
69
+ self.basic_tokenizer = None
70
+ with open(vocab_file) as f:
71
+ self.padding_idx = f.readlines().index(pad_token+'\n')
72
+
73
+ def _tokenize(self, text):
74
+ split_tokens = self.regex_tokenizer.findall(text)
75
+ return split_tokens
76
+
77
+ def convert_idx_to_tokens(self, idx_tensor):
78
+ tokens = [self.convert_ids_to_tokens(idx) for idx in idx_tensor.tolist()]
79
+ return tokens
80
+
81
+ def convert_tokens_to_string(self, tokens):
82
+ stopwords = ['<bos>', '<eos>']
83
+ clean_tokens = [word for word in tokens if word not in stopwords]
84
+ out_string = ''.join(clean_tokens)
85
+ return out_string
86
+
87
+ def get_padding_idx(self):
88
+ return self.padding_idx
89
+
90
+ def idx_to_smiles(self, torch_model, idx):
91
+ '''Convert token indices back to SMILES text.'''
92
+ rev_tokens = torch_model.tokenizer.convert_idx_to_tokens(idx)
93
+ flat_list_tokens = [item for sublist in rev_tokens for item in sublist]
94
+ decoded_smiles = torch_model.tokenizer.convert_tokens_to_string(flat_list_tokens)
95
+ return decoded_smiles
96
+
97
+
98
+ ## Transformer layers
99
+ class RotaryEmbedding(torch.nn.Module):
100
+
101
+ def __init__(self, dim, base=10000):
102
+ super().__init__()
103
+ inv_freq = 1. / (base ** (torch.arange(0, dim, 2).float() / dim))
104
+ self.register_buffer('inv_freq', inv_freq)
105
+ self.seq_len_cached = 0
106
+ self.cos_cached = None
107
+ self.sin_cached = None
108
+
109
+ def forward(self, x, seq_dim=1):
110
+ seq_len = x.shape[seq_dim]
111
+ if seq_len != self.seq_len_cached:
112
+ self.seq_len_cached = seq_len
113
+
114
+ t = torch.arange(x.shape[seq_dim], device=x.device).type_as(self.inv_freq)
115
+ freqs = torch.einsum('i,j->ij', t, self.inv_freq)
116
+ emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
117
+
118
+ self.cos_cached = emb.cos()[None,:, None, :]
119
+ self.sin_cached = emb.sin()[None,:, None, :]
120
+
121
+ return self.cos_cached, self.sin_cached
122
+
123
+ def rotate_half(x):
124
+ x1, x2 = x[..., :x.shape[-1] // 2], x[..., x.shape[-1] // 2:]
125
+ return torch.cat((-x2, x1), dim=x1.ndim - 1) # dim=-1 triggers a bug in earlier torch versions
126
+
127
+ @torch.jit.script
128
+ def apply_rotary_pos_emb(q, k, cos, sin):
129
+ return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
130
+
131
+ class RotateAttentionLayer(AttentionLayer):
132
+ """Rotate attention layer inherits from fast_transformer attention layer.
133
+ The only thing added is an Embedding encoding, for more information
134
+ on the attention layer see the fast_transformers code
135
+ """
136
+ def __init__(self, attention, d_model, n_heads, d_keys=None,
137
+ d_values=None, event_dispatcher=""):
138
+ super(RotateAttentionLayer, self).__init__(attention,d_model, n_heads, d_keys=d_keys,
139
+ d_values=d_values, event_dispatcher=event_dispatcher)
140
+
141
+ self.rotaryemb = RotaryEmbedding(d_keys)
142
+ print('Using Rotation Embedding')
143
+
144
+ def forward(self, queries, keys, values, attn_mask, query_lengths,
145
+ key_lengths):
146
+ """
147
+ Using the same framework as the fast_transformers attention layer,
148
+ but injecting rotary positional information into the queries and keys
149
+ after they are projected.
150
+ In the argument description we make use of the following sizes
151
+ - N: the batch size
152
+ - L: The maximum length of the queries
153
+ - S: The maximum length of the keys (the actual length per sequence
154
+ is given by the length mask)
155
+ - D: The input feature dimensionality passed in the constructor as
156
+ 'd_model'
157
+ Arguments
158
+ ---------
159
+ queries: (N, L, D) The tensor containing the queries
160
+ keys: (N, S, D) The tensor containing the keys
161
+ values: (N, S, D) The tensor containing the values
162
+ attn_mask: An implementation of BaseMask that encodes where each
163
+ query can attend to
164
+ query_lengths: An implementation of BaseMask that encodes how
165
+ many queries each sequence in the batch consists of
166
+ key_lengths: An implementation of BaseMask that encodes how
167
+ many keys each sequence in the batch consists of
168
+ Returns
169
+ -------
170
+ The new value for each query as a tensor of shape (N, L, D).
171
+ """
172
+ # Extract the dimensions into local variables
173
+ N, L, _ = queries.shape
174
+ _, S, _ = keys.shape
175
+ H = self.n_heads
176
+
177
+ # Project the queries/keys/values
178
+ queries = self.query_projection(queries).view(N, L, H, -1)
179
+ keys = self.key_projection(keys).view(N, S, H, -1)
180
+ cos, sin = self.rotaryemb(queries)
181
+ queries, keys = apply_rotary_pos_emb(queries, keys, cos, sin)
182
+ values = self.value_projection(values).view(N, S, H, -1)
183
+ # Let the world know of the qkv
184
+ self.event_dispatcher.dispatch(QKVEvent(self, queries, keys, values))
185
+
186
+
187
+ # Compute the attention
188
+ new_values = self.inner_attention(
189
+ queries,
190
+ keys,
191
+ values,
192
+ attn_mask,
193
+ query_lengths,
194
+ key_lengths
195
+ ).view(N, L, -1)
196
+
197
+ # Project the output and return
198
+ return self.out_projection(new_values)
199
+
200
+ class RotateEncoderBuilder(BaseTransformerEncoderBuilder):
201
+ """Build a batch transformer encoder with Relative Rotary embeddings
202
+ for training or processing of sequences all elements at a time.
203
+ Example usage:
204
+ builder = RotateEncoderBuilder()
205
+ builder.n_layers = 12
206
+ builder.n_heads = 8
207
+ builder.feed_forward_dimensions = 1024
208
+ builder.query_dimensions = 64
209
+ builder.value_dimensions = 64
210
+ builder.dropout = 0.1
211
+ builder.attention_dropout = 0.1
212
+ builder.attention_type = "linear"
213
+ transformer = builder.get()
214
+ """
215
+ def _get_attention_builder(self):
216
+ """Return an instance of the appropriate attention builder."""
217
+ return AttentionBuilder()
218
+
219
+ def _get_attention_layer_class(self):
220
+ """Return the class for the layer that projects queries keys and
221
+ values."""
222
+ return RotateAttentionLayer
223
+
224
+ def _get_encoder_class(self):
225
+ """Return the class for the transformer encoder."""
226
+ return TransformerEncoder
227
+
228
+ def _get_encoder_layer_class(self):
229
+ """Return the class for the transformer encoder layer."""
230
+ return TransformerEncoderLayer
231
+
232
+
233
+ class AutoEncoderLayer(nn.Module):
234
+
235
+ def __init__(self, feature_size, latent_size):
236
+ super().__init__()
237
+ self.encoder = self.Encoder(feature_size, latent_size)
238
+ self.decoder = self.Decoder(feature_size, latent_size)
239
+
240
+ class Encoder(nn.Module):
241
+
242
+ def __init__(self, feature_size, latent_size):
243
+ super().__init__()
244
+ self.is_cuda_available = torch.cuda.is_available()
245
+ self.fc1 = nn.Linear(feature_size, latent_size)
246
+ self.ln_f = nn.LayerNorm(latent_size)
247
+ self.lat = nn.Linear(latent_size, latent_size, bias=False)
248
+
249
+ def forward(self, x):
250
+ if self.is_cuda_available:
251
+ self.fc1.cuda()
252
+ self.ln_f.cuda()
253
+ self.lat.cuda()
254
+ x = x.cuda()
255
+ x = F.gelu(self.fc1(x))
256
+ x = self.ln_f(x)
257
+ x = self.lat(x)
258
+ return x # -> (N, D)
259
+
260
+ class Decoder(nn.Module):
261
+
262
+ def __init__(self, feature_size, latent_size):
263
+ super().__init__()
264
+ self.is_cuda_available = torch.cuda.is_available()
265
+ self.fc1 = nn.Linear(latent_size, latent_size)
266
+ self.ln_f = nn.LayerNorm(latent_size)
267
+ self.rec = nn.Linear(latent_size, feature_size, bias=False)
268
+
269
+ def forward(self, x):
270
+ if self.is_cuda_available:
271
+ self.fc1.cuda()
272
+ self.ln_f.cuda()
273
+ self.rec.cuda()
274
+ x = x.cuda()
275
+ x = F.gelu(self.fc1(x))
276
+ x = self.ln_f(x)
277
+ x = self.rec(x)
278
+ return x # -> (N, L*D)
279
+
280
+
281
+ class LangLayer(nn.Module):
282
+
283
+ def __init__(self, n_embd, n_vocab):
284
+ super().__init__()
285
+ self.is_cuda_available = torch.cuda.is_available()
286
+ self.embed = nn.Linear(n_embd, n_embd)
287
+ self.ln_f = nn.LayerNorm(n_embd)
288
+ self.head = nn.Linear(n_embd, n_vocab, bias=False)
289
+
290
+ def forward(self, tensor):
291
+ if self.is_cuda_available:
292
+ self.embed.cuda()
293
+ self.ln_f.cuda()
294
+ self.head.cuda()
295
+ tensor = tensor.cuda()
296
+ tensor = self.embed(tensor)
297
+ tensor = F.gelu(tensor)
298
+ tensor = self.ln_f(tensor)
299
+ tensor = self.head(tensor)
300
+ return tensor
301
+
302
+
303
+ class Net(nn.Module):
304
+
305
+ def __init__(self, smiles_embed_dim, n_output=1, dropout=0.2):
306
+ super().__init__()
307
+ self.desc_skip_connection = True
308
+ self.fc1 = nn.Linear(smiles_embed_dim, smiles_embed_dim)
309
+ self.dropout1 = nn.Dropout(dropout)
310
+ self.relu1 = nn.GELU()
311
+ self.fc2 = nn.Linear(smiles_embed_dim, smiles_embed_dim)
312
+ self.dropout2 = nn.Dropout(dropout)
313
+ self.relu2 = nn.GELU()
314
+ self.final = nn.Linear(smiles_embed_dim, n_output)
315
+
316
+ def forward(self, smiles_emb, multitask=False):
317
+ x_out = self.fc1(smiles_emb)
318
+ x_out = self.dropout1(x_out)
319
+ x_out = self.relu1(x_out)
320
+
321
+ if self.desc_skip_connection is True:
322
+ x_out = x_out + smiles_emb
323
+
324
+ z = self.fc2(x_out)
325
+ z = self.dropout2(z)
326
+ z = self.relu2(z)
327
+ if self.desc_skip_connection is True:
328
+ z = self.final(z + x_out)
329
+ else:
330
+ z = self.final(z)
331
+
332
+ if multitask:
333
+ return F.sigmoid(z)
334
+ return z
335
+
336
+
337
+ class MoLEncoder(nn.Module):
338
+
339
+ def __init__(self, config, n_vocab):
340
+ super(MoLEncoder, self).__init__()
341
+
342
+ # embeddings
343
+ self.config = config
344
+ self.tok_emb = nn.Embedding(n_vocab, config['n_embd'])
345
+ self.drop = nn.Dropout(config['d_dropout'])
346
+
347
+ # transformer
348
+ builder = RotateEncoderBuilder.from_kwargs(
349
+ n_layers=config['n_layer'],
350
+ n_heads=config['n_head'],
351
+ query_dimensions=config['n_embd']//config['n_head'],
352
+ value_dimensions=config['n_embd']//config['n_head'],
353
+ feed_forward_dimensions=config['n_embd'],
354
+ attention_type='linear',
355
+ # unless we do deterministic_eval here, we will have random outputs
356
+ feature_map=partial(GeneralizedRandomFeatures,
357
+ n_dims=config['num_feats'],
358
+ deterministic_eval=True),
359
+ activation='gelu'
360
+ )
361
+ self.blocks = builder.get()
362
+
363
+ # classification
364
+ self.lang_model = LangLayer(config['n_embd'], n_vocab)
365
+
366
+ def forward(self, idx, mask):
367
+ # transformer encoder
368
+ x = self.tok_emb(idx) # each index maps to a (learnable) vector
369
+ x = self.drop(x)
370
+ x = self.blocks(x, length_mask=LengthMask(mask.sum(-1), max_len=idx.shape[1]))
371
+
372
+ # add padding
373
+ token_embeddings = x
374
+ input_mask_expanded = mask.unsqueeze(-1).expand(token_embeddings.size()).float()
375
+ mask_embeddings = (token_embeddings * input_mask_expanded)
376
+ token_embeddings = F.pad(mask_embeddings, pad=(0, 0, 0, self.config['max_len'] - mask_embeddings.shape[1]), value=0)
377
+
378
+ return token_embeddings
379
+
380
+
381
+ class MoLDecoder(nn.Module):
382
+
383
+ def __init__(self, n_vocab, max_len, n_embd, n_gpu=None):
384
+ super(MoLDecoder, self).__init__()
385
+
386
+ self.max_len = max_len
387
+ self.n_embd = n_embd
388
+ self.n_gpu = n_gpu
389
+ self.autoencoder = AutoEncoderLayer(n_embd*max_len, n_embd)
390
+ self.lang_model = LangLayer(n_embd, n_vocab)
391
+
392
+
393
+ class Smi_ted(nn.Module):
394
+ """materials.smi-ted-Light 289M Parameters"""
395
+
396
+ def __init__(self, tokenizer, config=None):
397
+ super(Smi_ted, self).__init__()
398
+
399
+ # configuration
400
+ self.config = config
401
+ self.tokenizer = tokenizer
402
+ self.padding_idx = tokenizer.get_padding_idx()
403
+ self.n_vocab = len(self.tokenizer.vocab)
404
+ self.is_cuda_available = torch.cuda.is_available()
405
+
406
+ # instantiate modules
407
+ if self.config:
408
+ self.encoder = MoLEncoder(self.config, self.n_vocab)
409
+ self.decoder = MoLDecoder(self.n_vocab, self.config['max_len'], self.config['n_embd'])
410
+ self.net = Net(self.config['n_embd'], n_output=self.config['n_output'], dropout=self.config['d_dropout'])
411
+
412
+ def load_checkpoint(self, ckpt_path):
413
+ # load checkpoint file
414
+ checkpoint = torch.load(ckpt_path, map_location=torch.device('cpu'))
415
+
416
+ # load hyperparameters
417
+ self.config = checkpoint['hparams']
418
+ self.max_len = self.config['max_len']
419
+ self.n_embd = self.config['n_embd']
420
+ self._set_seed(self.config['seed'])
421
+
422
+ # instantiate modules
423
+ self.encoder = MoLEncoder(self.config, self.n_vocab)
424
+ self.decoder = MoLDecoder(self.n_vocab, self.max_len, self.n_embd)
425
+ self.net = Net(self.n_embd, n_output=self.config['n_output'] if 'n_output' in self.config else 1, dropout=self.config['d_dropout'])
426
+
427
+ # load weights
428
+ if 'state_dict' in checkpoint:
429
+ if isinstance(checkpoint['state_dict'], list):
430
+ self.encoder.load_state_dict(checkpoint['state_dict'][0], strict=False)
431
+ self.decoder.load_state_dict(checkpoint['state_dict'][1], strict=False)
432
+ else:
433
+ self.load_state_dict(checkpoint['state_dict'], strict=False)
434
+ elif 'MODEL_STATE' in checkpoint:
435
+ self.load_state_dict(checkpoint['MODEL_STATE'], strict=False)
436
+
437
+ # load RNG states each time the model and states are loaded from checkpoint
438
+ if 'rng' in self.config:
439
+ rng = self.config['rng']
440
+ for key, value in rng.items():
441
+ if key =='torch_state':
442
+ torch.set_rng_state(value.cpu())
443
+ elif key =='cuda_state':
444
+ torch.cuda.set_rng_state(value.cpu())
445
+ elif key =='numpy_state':
446
+ np.random.set_state(value)
447
+ elif key =='python_state':
448
+ random.setstate(value)
449
+ else:
450
+ print('unrecognized state')
451
+
452
+ def _set_seed(self, value):
453
+ print('Random Seed:', value)
454
+ random.seed(value)
455
+ torch.manual_seed(value)
456
+ torch.cuda.manual_seed(value)
457
+ torch.cuda.manual_seed_all(value)
458
+ np.random.seed(value)
459
+ cudnn.deterministic = True
460
+ cudnn.benchmark = False
461
+
462
+ def forward(self, smiles, batch_size=100):
463
+ return self.decode(self.encode(smiles, batch_size=batch_size, return_torch=True))
464
+
465
+ def tokenize(self, smiles):
466
+ """Tokenize a string into tokens."""
467
+ if isinstance(smiles, str):
468
+ batch = [smiles]
469
+ else:
470
+ batch = smiles
471
+
472
+ tokens = self.tokenizer(
473
+ batch,
474
+ padding=True,
475
+ truncation=True,
476
+ add_special_tokens=True,
477
+ return_tensors="pt",
478
+ max_length=self.max_len,
479
+ )
480
+
481
+ idx = tokens['input_ids'].clone().detach()
482
+ mask = tokens['attention_mask'].clone().detach()
483
+
484
+ if self.is_cuda_available:
485
+ return idx.cuda(), mask.cuda()
486
+
487
+ return idx, mask
488
+
489
+ def extract_all(self, smiles):
490
+ """Extract all elements from each part of smi-ted. Be careful."""
491
+ # evaluation mode
492
+ self.encoder.eval()
493
+ self.decoder.eval()
494
+ if self.is_cuda_available:
495
+ self.encoder.cuda()
496
+ self.decoder.cuda()
497
+
498
+ # handle single str or a list of str
499
+ smiles = pd.Series(smiles) if isinstance(smiles, str) else pd.Series(list(smiles))
500
+ smiles = smiles.apply(normalize_smiles)
501
+
502
+ # tokenizer
503
+ idx, mask = self.tokenize(smiles.to_list())
504
+
505
+ ###########
506
+ # Encoder #
507
+ ###########
508
+ # encoder forward
509
+ x = self.encoder.tok_emb(idx) # each index maps to a (learnable) vector
510
+ x = self.encoder.drop(x)
511
+ x = self.encoder.blocks(x, length_mask=LengthMask(mask.sum(-1)))
512
+
513
+ # mean pooling
514
+ token_embeddings = x
515
+ input_mask_expanded = mask.unsqueeze(-1).expand(token_embeddings.size()).float()
516
+ sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
517
+ sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
518
+ true_set = sum_embeddings / sum_mask # DO NOT USE THIS FOR DOWNSTREAM TASKS, USE `pred_set` INSTEAD
519
+
520
+ # add padding
521
+ mask_embeddings = (token_embeddings * input_mask_expanded)
522
+ token_embeddings = F.pad(mask_embeddings, pad=(0, 0, 0, self.max_len - mask_embeddings.shape[1]), value=0)
523
+ idx = F.pad(idx, pad=(0, self.max_len - idx.shape[1], 0, 0), value=2)
524
+
525
+ true_ids = idx
526
+ true_cte = token_embeddings
527
+ true_cte = true_cte.view(-1, self.max_len*self.n_embd)
528
+
529
+ ###########
530
+ # Decoder #
531
+ ###########
532
+ # CTE autoencoder
533
+ pred_set = self.decoder.autoencoder.encoder(true_cte)
534
+ pred_cte = self.decoder.autoencoder.decoder(pred_set)
535
+
536
+ # reconstruct tokens
537
+ pred_ids = self.decoder.lang_model(pred_cte.view(-1, self.max_len, self.n_embd))
538
+ pred_ids = torch.argmax(pred_ids, axis=-1)
539
+
540
+ return ((true_ids, pred_ids), # tokens
541
+ (true_cte, pred_cte), # token embeddings
542
+ (true_set, pred_set)) # smiles embeddings
543
+
544
+ def extract_embeddings(self, smiles):
545
+ """Extract token and SMILES embeddings."""
546
+ # evaluation mode
547
+ self.encoder.eval()
548
+ if self.is_cuda_available:
549
+ self.encoder.cuda()
550
+
551
+ # tokenizer
552
+ idx, mask = self.tokenize(smiles)
553
+
554
+ # encoder forward
555
+ token_embeddings = self.encoder(idx, mask)
556
+
557
+ # aggregate token embeddings (similar to mean pooling)
558
+ # CAUTION: use the embeddings from the autoencoder.
559
+ smiles_embeddings = self.decoder.autoencoder.encoder(token_embeddings.view(-1, self.max_len*self.n_embd))
560
+
561
+ # add padding
562
+ idx = F.pad(idx, pad=(0, self.max_len - idx.shape[1], 0, 0), value=self.padding_idx)
563
+
564
+ return idx, token_embeddings, smiles_embeddings
565
+
566
+ def encode(self, smiles, useCuda=False, batch_size=100, return_torch=False):
567
+ """Extract efficiently SMILES embeddings per batches."""
568
+ # TODO: remove useCuda argument
569
+
570
+ # handle single str or a list of str
571
+ smiles = pd.Series(smiles) if isinstance(smiles, str) else pd.Series(list(smiles))
572
+ smiles = smiles.apply(normalize_smiles)
573
+ n_split = smiles.shape[0] // batch_size if smiles.shape[0] >= batch_size else smiles.shape[0]
574
+
575
+ # process in batches
576
+ embeddings = [
577
+ self.extract_embeddings(list(batch))[2].cpu().detach().numpy()
578
+ for batch in tqdm(np.array_split(smiles, n_split))
579
+ ]
580
+ flat_list = [item for sublist in embeddings for item in sublist]
581
+
582
+ # clear GPU memory
583
+ if self.is_cuda_available:
584
+ torch.cuda.empty_cache()
585
+ gc.collect()
586
+
587
+ if return_torch:
588
+ return torch.tensor(np.array(flat_list))
589
+ return pd.DataFrame(flat_list)
590
+
591
+ def decode(self, smiles_embeddings):
592
+ """Decode SMILES embeddings back to SMILES."""
593
+ # evaluation mode
594
+ self.decoder.eval()
595
+ if self.is_cuda_available:
596
+ self.decoder.cuda()
597
+
598
+ # reconstruct token embeddings
599
+ pred_token_embds = self.decoder.autoencoder.decoder(smiles_embeddings)
600
+
601
+ # reconstruct tokens
602
+ pred_idx = self.decoder.lang_model(pred_token_embds.view(-1, self.max_len, self.n_embd))
603
+ pred_idx = torch.argmax(pred_idx, axis=-1).cpu().detach().numpy()
604
+
605
+ # convert idx to tokens
606
+ pred_smiles = []
607
+ for i in range(pred_idx.shape[0]):
608
+ idx = pred_idx[i]
609
+ smiles = self.tokenizer.idx_to_smiles(self, idx)
610
+ smiles = smiles.replace('<bos>', '') # begin token
611
+ smiles = smiles.replace('<eos>', '') # end token
612
+ smiles = smiles.replace('<pad>', '') # pad token
613
+ pred_smiles.append(smiles)
614
+
615
+ # clear GPU memory
616
+ if self.is_cuda_available:
617
+ torch.cuda.empty_cache()
618
+ gc.collect()
619
+
620
+ return pred_smiles
621
+
622
+ def __str__(self):
623
+ return 'smi-ted-Light'
624
+
625
+
626
+ def load_smi_ted(folder="./smi_ted_light",
627
+ ckpt_filename="smi-ted-Light_40.pt",
628
+ vocab_filename="bert_vocab_curated.txt"
629
+ ):
630
+ repo_id = "ibm/materials.smi-ted"
631
+ filename = "bert_vocab_curated.txt"
632
+ vocab_filename = hf_hub_download(repo_id=repo_id, filename=filename)
633
+ tokenizer = MolTranBertTokenizer(vocab_filename)
634
+ model = Smi_ted(tokenizer)
635
+
636
+ filename = "smi-ted-Light_40.pt"
637
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
638
+ model.load_checkpoint(file_path)
639
+ model.eval()
640
+ print('Vocab size:', len(tokenizer.vocab))
641
+ print(f'[INFERENCE MODE - {str(model)}]')
642
+ return model
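For convenience, here is a minimal usage sketch of the loader defined above. The import path and the example SMILES are assumptions for illustration; the loader itself fetches the vocabulary and checkpoint from the `ibm/materials.smi-ted` Hub repository:

```python
import torch

# Assumed import path; adjust to wherever load.py lives in your setup.
from smi_ted_light.load import load_smi_ted

model = load_smi_ted()  # downloads vocab + checkpoint from ibm/materials.smi-ted

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]

with torch.no_grad():
    # SMILES -> embeddings: one row of size n_embd per molecule (pandas DataFrame)
    emb = model.encode(smiles)
    # embeddings -> SMILES: round trip through the decoder
    rec = model.decode(torch.tensor(emb.values, dtype=torch.float32))

print(emb.shape)
print(rec)
```

Equivalently, `model(smiles)` runs the same encode/decode round trip in a single call via `forward`.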