# Model documentation & parameters
**Algorithm Version**: Which model version to use.
**Target binding energy**: The desired binding energy.
**Primer SMILES**: A SMILES string used to prime the generation.
**Maximal sequence length**: The maximal number of SMILES tokens in the generated molecule.
**Number of points**: Number of points to sample with the Gaussian Process.
**Number of steps**: Number of optimization steps in the Gaussian Process optimization.
**Number of samples**: How many samples should be generated (between 1 and 50).
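As an illustration only, the parameters above can be collected into a small validated container. This is a minimal sketch, not part of GT4SD: the class name `GenerationRequest` and all field names are hypothetical, and the only constraint taken from this documentation is the 1 to 50 range for the number of samples. The positivity checks on the other integer fields are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical container mirroring the parameters listed above."""

    algorithm_version: str
    target_binding_energy: float
    primer_smiles: str
    maximal_sequence_length: int
    number_of_points: int
    number_of_steps: int
    number_of_samples: int

    def __post_init__(self):
        # Documented constraint: between 1 and 50 samples.
        if not 1 <= self.number_of_samples <= 50:
            raise ValueError("number_of_samples must be between 1 and 50")
        # Assumption: the remaining integer parameters must be positive.
        for name in ("maximal_sequence_length", "number_of_points", "number_of_steps"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")


# Illustrative values only; none of them are defaults of the actual model.
request = GenerationRequest(
    algorithm_version="v0",
    target_binding_energy=-10.0,
    primer_smiles="C",
    maximal_sequence_length=100,
    number_of_points=32,
    number_of_steps=50,
    number_of_samples=10,
)
```

Validating at construction time means an out-of-range request fails before any model is loaded.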
# Model card -- AdvancedManufacturing
**Model Details**: *AdvancedManufacturing* is a sequence-based molecular generator tuned to generate catalysts. The model relies on a recurrent Variational Autoencoder with a binding-energy predictor trained on the latent code. The framework uses Gaussian Processes for generating targeted molecules.
**Developers**: Oliver Schilter and colleagues from IBM Research.
**Distributors**: Original authors' code integrated into GT4SD.
**Model date**: Not yet published.
**Model version**: Several model variants trained on NCCR data, using either SMILES or SELFIES representations, optionally with data augmentation.
**Model type**: Recurrent Variational Autoencoder over molecular sequences, with a binding-energy predictor trained on the latent code and Gaussian Process optimization for targeted generation.
**Information about training algorithms, parameters, fairness constraints or other applied approaches, and features**:
N.A.
**Paper or other resource for more information**:
TBD
**License**: MIT
**Where to send questions or comments about the model**: Open an issue on the [GT4SD repository](https://github.com/GT4SD/gt4sd-core).
**Intended Use. Use cases that were envisioned during development**: Chemical research, in particular drug discovery.
**Primary intended uses/users**: Researchers and computational chemists using the model for model comparison or research exploration purposes.
**Out-of-scope use cases**: Production-level inference, producing molecules with harmful properties.
**Metrics**: N.A.
**Datasets**: Data provided through NCCR.
**Ethical Considerations**: Unclear; please consult the original authors in case of questions.
**Caveats and Recommendations**: Unclear; please consult the original authors in case of questions.
Model card prototype inspired by [Mitchell et al. (2019)](https://dl.acm.org/doi/abs/10.1145/3287560.3287596).
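The Model Details entry describes a search over the VAE latent code, guided by a binding-energy predictor and a Gaussian Process. The toy sketch below illustrates that general idea in pure Python and is not the authors' implementation. Everything in it is an assumption made for illustration: the latent space is one-dimensional, `toy_binding_energy` is a hypothetical stand-in for the trained predictor, the acquisition rule is greedy (lowest posterior mean), and `number_of_points` is interpreted as the size of the initial random design, with `number_of_steps` as the number of GP-guided evaluations. Decoding the selected latent point into a molecule is omitted.

```python
import math
import random


def toy_binding_energy(z):
    """Hypothetical stand-in for the latent-space binding-energy predictor."""
    return -10.0 + 3.0 * math.sin(z) + 0.5 * z


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar latent points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_gp(train_z, train_loss, jitter=1e-3):
    """Return a function giving the GP posterior mean (constant prior mean)."""
    prior = sum(train_loss) / len(train_loss)
    gram = [
        [rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(train_z)]
        for i, a in enumerate(train_z)
    ]
    weights = solve(gram, [y - prior for y in train_loss])

    def predict(query):
        return prior + sum(rbf(query, z) * w for z, w in zip(train_z, weights))

    return predict


def search_latent(target, number_of_points=5, number_of_steps=8, seed=0):
    """Return (best_z, best_loss, history) for |predicted energy - target|."""
    rng = random.Random(seed)
    low, high = -3.0, 3.0

    def loss(z):
        return abs(toy_binding_energy(z) - target)

    # Initial design: random latent points scored with the predictor.
    initial = [rng.uniform(low, high) for _ in range(number_of_points)]
    history = [(z, loss(z)) for z in initial]
    # Fixed candidate grid over the 1-D latent interval.
    candidates = [low + i * (high - low) / 48 for i in range(49)]

    for _ in range(number_of_steps):
        if not candidates:
            break
        predict = fit_gp([z for z, _ in history], [l for _, l in history])
        # Greedy acquisition: evaluate the candidate with the lowest predicted loss.
        chosen = min(candidates, key=predict)
        candidates.remove(chosen)
        history.append((chosen, loss(chosen)))

    best_z, best_loss = min(history, key=lambda pair: pair[1])
    return best_z, best_loss, history
```

Calling `search_latent(target=-9.0)` returns the latent point whose predicted energy is closest to the target among all evaluated points. A practical optimizer would typically use an acquisition function that also accounts for the posterior variance.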
## Citation
TBD. In the meantime, please cite:
```bib
@article{manica2022gt4sd,
title={GT4SD: Generative Toolkit for Scientific Discovery},
author={Manica, Matteo and Cadow, Joris and Christofidellis, Dimitrios and Dave, Ashish and Born, Jannis and Clarke, Dean and Teukam, Yves Gaetan Nana and Hoffman, Samuel C and Buchan, Matthew and Chenthamarakshan, Vijil and others},
journal={arXiv preprint arXiv:2207.03928},
year={2022}
}
```