# Model documentation & parameters

**Algorithm Version**: Which model version to use.

**Property goals**: One or multiple properties that will be optimized.

**Protein target**: The amino acid sequence (AAS) of a protein target used for conditioning. Leave blank unless `affinity` is one of the selected property goals.

**Decoding temperature**: The temperature parameter in the SMILES/SELFIES decoder. Higher values lead to more explorative choices; smaller values concentrate sampling on the most likely tokens and can end in mode collapse.
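To illustrate how a decoding temperature typically acts, the following minimal sketch rescales a set of token scores before normalizing them into probabilities. The function name and the example scores are hypothetical and are not taken from the PaccMann<sup>GP</sup> code; the sketch only shows the general temperature-scaled softmax technique.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw decoder scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens (illustration only).
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter
```

With the low temperature, most probability mass lands on the top-scoring token; with the high temperature, the distribution flattens and less likely tokens are sampled more often.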

**Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.

**Number of samples**: How many samples should be generated (between 1 and 50).

**Limit**: Hypercube limits in the latent space.

**Number of steps**: Number of steps for a GP optimization round. More steps take longer to run. Must be at least `Number of initial points`.

**Number of initial points**: Number of initial points evaluated. More points take longer to run.

**Number of optimization rounds**: Maximum number of optimization rounds.

**Sampling variance**: Variance of the Gaussian noise applied during sampling from the optimal point.
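As a rough illustration of how `Limit` and `Sampling variance` might interact, the sketch below builds a search box in latent space and draws noisy points around an optimum. The helper names, the symmetric `[-limit, limit]` bounds, and the toy three-dimensional latent space are assumptions made for this example, not the documented behavior of the model.

```python
import random


def hypercube_bounds(limit, latent_dim):
    """Assumed search space: a symmetric box [-limit, limit] per dimension."""
    return [(-limit, limit)] * latent_dim


def sample_around_optimum(optimum, variance, n_samples, seed=42):
    """Draw latent points by adding Gaussian noise to an optimal point."""
    rng = random.Random(seed)
    std = variance ** 0.5  # random.gauss expects a standard deviation
    return [[rng.gauss(z, std) for z in optimum] for _ in range(n_samples)]


# Toy optimum in a hypothetical 3-dimensional latent space.
bounds = hypercube_bounds(limit=5.0, latent_dim=3)
points = sample_around_optimum([0.5, -1.0, 2.0], variance=0.1, n_samples=4)
```

A larger variance spreads the sampled points further from the optimum, which trades closeness to the optimized properties for more diverse molecules.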

**Samples for evaluation**: Number of samples averaged for each minimization function evaluation. 

**Max. sampling steps**: Maximum number of sampling steps in an optimization round.

**Seed**: The random seed used for initialization.
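Two of the parameters above carry explicit constraints: the number of samples must lie between 1 and 50, and the number of steps must be at least the number of initial points. A minimal sketch of checking these before launching a run could look as follows; the `GPSettings` class and its field names are hypothetical and only mirror the parameter descriptions in this document.

```python
from dataclasses import dataclass


@dataclass
class GPSettings:
    """Hypothetical container for a subset of the parameters listed above."""

    number_of_samples: int
    number_of_steps: int
    number_of_initial_points: int

    def validate(self):
        """Raise ValueError if a documented constraint is violated."""
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")
        if self.number_of_steps < self.number_of_initial_points:
            raise ValueError(
                "number_of_steps must be at least number_of_initial_points"
            )


settings = GPSettings(
    number_of_samples=10, number_of_steps=32, number_of_initial_points=16
)
settings.validate()  # passes silently for a consistent configuration
```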



# Model card -- PaccMannGP

**Model Details**: [PaccMann<sup>GP</sup>](https://github.com/PaccMann/paccmann_gp) is a language-based Variational Autoencoder (VAE) coupled with a Gaussian process (GP) for controlled sampling. The model systematically explores the latent space of a trained molecular VAE.

**Developers**: Jannis Born, Matteo Manica and colleagues from IBM Research.

**Distributors**: Original authors' code wrapped and distributed by GT4SD Team (2023) from IBM Research.

**Model date**: Published in 2022.

**Model version**: A molecular VAE trained on 1.5M molecules from ChEMBL. 

**Model type**: A language-based molecular generative model that can be explored with Gaussian Processes to generate molecules with desired properties.

**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: 
Described in the [original paper](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889).

**Paper or other resource for more information**: 
[Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model (2022; *Journal of Chemical Information & Modeling*)](https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889).

**License**: MIT

**Where to send questions or comments about the model**: Open an issue on the [GT4SD repository](https://github.com/GT4SD/gt4sd-core).

**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.

**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.

**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.

**Factors**: Not applicable.

**Metrics**: High reward on generating molecules with desired properties.

**Datasets**: ChEMBL.

**Ethical Considerations**: Unclear; please consult the original authors if you have questions.

**Caveats and Recommendations**: Unclear; please consult the original authors if you have questions.

Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=XD4eHiE2cRUAAAAA:NL11gMa1hGPOUKTAbtXnbVQBDBbjxwcjGECF_i-WC_3g1aBgU1Hbz_f2b4kI_m1in-w__1ztGeHnwHs)

## Citation
```bib
@article{born2022active,
	author = {Born, Jannis and Huynh, Tien and Stroobants, Astrid and Cornell, Wendy D. and Manica, Matteo},
	title = {Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model},
	journal = {Journal of Chemical Information and Modeling},
	volume = {62},
	number = {2},
	pages = {240-257},
	year = {2022},
	doi = {10.1021/acs.jcim.1c00889},
	note ={PMID: 34905358},
	URL = {https://doi.org/10.1021/acs.jcim.1c00889}
}
```