| # Model documentation & parameters | |
| **Algorithm Version**: Which model checkpoint to use (trained on different datasets). | |
| **Scaffolds**: One or more scaffolds, provided as '.'-separated SMILES. If empty, no scaffolds are used. Note that this is a hard constraint, | |
| i.e., the scaffold is guaranteed to be present in the generated molecule. If multiple scaffolds are given, they are paired with the seed SMILES | |
| (if applicable), and every generated molecule is guaranteed to contain exactly one scaffold. | |
| **Seed SMILES**: One or more seed molecules, provided as '.'-separated SMILES. If empty, no seed molecules are used. | |
| There is no guarantee that a seed SMILES (or a substructure of it) will be present in the generated molecule, as it is only used to initialize the decoder. | |
| **Number of samples**: How many samples should be generated (between 1 and 50). | |
| **Beam size**: Beam size used in beam search decoding (larger values are slower but tend to give better results). | |
| **Sigma**: Variance of the Gaussian noise that is added to the latent code (before passing to the decoder). | |
| **Seed**: The random seed used for initialization. | |
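
The input conventions described above can be illustrated with a minimal sketch. It is not the GT4SD or MoLeR API: the helper names `split_smiles`, `check_num_samples`, and `perturb_latent` are hypothetical, and the three-element latent code is a toy stand-in for the model's real latent vector. The sketch only shows how a '.'-separated SMILES field could be split, how the documented 1 to 50 sample range could be checked, and what adding seeded Gaussian noise with variance sigma to a latent code looks like.

```python
import math
import random


def split_smiles(field: str) -> list[str]:
    """Split a '.'-separated SMILES field into individual SMILES strings."""
    return [part.strip() for part in field.split(".") if part.strip()]


def check_num_samples(n: int, low: int = 1, high: int = 50) -> int:
    """Reject sample counts outside the documented range of 1 to 50."""
    if not low <= n <= high:
        raise ValueError(f"number of samples must be in [{low}, {high}], got {n}")
    return n


def perturb_latent(latent: list[float], sigma: float, seed: int) -> list[float]:
    """Add zero-mean Gaussian noise with variance `sigma` to a latent code."""
    rng = random.Random(seed)  # seeded so that repeated calls are reproducible
    std = math.sqrt(sigma)     # the text describes sigma as a variance
    return [z + rng.gauss(0.0, std) for z in latent]


# Hypothetical inputs, chosen only for illustration.
scaffolds = split_smiles("c1ccccc1.C1CCNCC1")  # benzene and piperidine
seeds = split_smiles("")                       # empty field: no seed molecules
num_samples = check_num_samples(10)
noisy = perturb_latent([0.0, 0.0, 0.0], sigma=0.01, seed=0)
```

Because '.' also separates disconnected components in SMILES notation, splitting on it implies that each scaffold or seed is a single connected fragment.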
| # Model card | |
| **Model Details**: MoLeR is a graph-based molecular generative model that can be conditioned (primed) on scaffolds. The model decorates scaffolds with realistic structural motifs. | |
| **Developers**: Krzysztof Maziarz and co-authors from Microsoft Research and Novartis (full reference at bottom). | |
| **Distributors**: The developers' code, wrapped and distributed by the GT4SD Team (2023) from IBM Research. | |
| **Model date**: Released around March 2022. | |
| **Model version**: Model provided by original authors, see [their GitHub repo](https://github.com/microsoft/molecule-generation). | |
| **Model type**: An encoder-decoder-based GNN for molecular generation. | |
| **Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**: Trained by the original authors with the default parameters provided [on GitHub](https://github.com/microsoft/molecule-generation). | |
| **Paper or other resource for more information**: [Learning to Extend Molecular Scaffolds with Structural Motifs (ICLR 2022)](https://openreview.net/forum?id=ZTsoE8G3GG). | |
| **License**: MIT | |
| **Where to send questions or comments about the model**: Open an issue on the original authors' [GitHub repository](https://github.com/microsoft/molecule-generation). | |
| **Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery. | |
| **Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes. | |
| **Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties. | |
| **Factors**: Not applicable. | |
| **Metrics**: Validation loss on decoding correct molecules. Evaluated on several downstream tasks. | |
| **Datasets**: 1.5M drug-like molecules from the GuacaMol benchmark. Finetuning on 20 molecular optimization tasks from GuacaMol. | |
| **Ethical Considerations**: Unclear; please consult the original authors in case of questions. | |
| **Caveats and Recommendations**: Unclear; please consult the original authors in case of questions. | |
| Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596). | |
| ## Citation | |
| ```bib | |
| @inproceedings{maziarz2021learning, | |
| author={Krzysztof Maziarz and Henry Richard Jackson{-}Flux and Pashmina Cameron and | |
| Finton Sirockin and Nadine Schneider and Nikolaus Stiefl and Marwin H. S. Segler and Marc Brockschmidt}, | |
| title = {Learning to Extend Molecular Scaffolds with Structural Motifs}, | |
| booktitle = {The Tenth International Conference on Learning Representations, {ICLR}}, | |
| year = {2022} | |
| } | |
| ``` | |