# HOMO–LUMO Gap Predictor

This web app uses a trained Graph Neural Network (GNN) to predict HOMO–LUMO energy gaps from molecular SMILES strings. Built with [Streamlit](https://streamlit.io), it supports both single-molecule and batch predictions and renders the molecular structures alongside the results.

## Live App

[Click here to launch the app](https://www.willfillinoncedeployed.com)  


---

## Features

- Predict the HOMO–LUMO gap for one or many molecules
- Accepts comma-separated SMILES input or CSV uploads
- RDKit rendering of molecular structures
- Downloadable CSV of predictions
- Powered by a trained hybrid GNN model that also uses RDKit descriptors

---

## Usage

1. **Input Options**:
   - Type one or more SMILES strings separated by commas
   - OR upload a `.csv` file with a single column of SMILES

2. **Example SMILES**: `CC(=O)Oc1ccccc1C(=O)O` (aspirin), `C1=CC=CC=C1` (benzene)

3. **CSV Format**:
   - One column
   - No header
   - Each row contains a SMILES string

4. **Output**:
   - Predictions displayed in-browser (up to 10 molecules shown)
   - Full results available for download as CSV
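
The input and output formats described above can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: the function names and the output column names (`smiles`, `predicted_gap`) are assumptions, and no SMILES validation is performed here.

```python
import csv
import io


def parse_smiles_text(text: str) -> list[str]:
    """Split comma-separated SMILES, trimming whitespace and dropping blanks."""
    return [s.strip() for s in text.split(",") if s.strip()]


def parse_smiles_csv(csv_text: str) -> list[str]:
    """Read a headerless, single-column CSV; each non-empty row is one SMILES."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def predictions_to_csv(smiles: list[str], gaps: list[float]) -> str:
    """Serialize (SMILES, predicted gap) pairs as CSV text for download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["smiles", "predicted_gap"])  # assumed column names
    writer.writerows(zip(smiles, gaps))
    return buffer.getvalue()
```

For example, `parse_smiles_text("CC(=O)Oc1ccccc1C(=O)O, C1=CC=CC=C1")` returns the two example SMILES as a list, ready to be passed to the model.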

---

## Project Structure

    streamlit-app/
    │
    ├── app.py              # Main Streamlit app
    ├── model.py            # Hybrid GNN architecture and model loader
    ├── utils.py            # RDKit and SMILES processing
    ├── requirements.txt    # Python dependencies
    └── predictions.db      # SQLite log of predictions
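
As a rough idea of what logging predictions to SQLite could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, column names, and the `log_prediction` helper are hypothetical; the actual schema of `predictions.db` is not documented here.

```python
import sqlite3
from datetime import datetime, timezone


def log_prediction(conn: sqlite3.Connection, smiles: str, gap: float) -> None:
    """Append one prediction to a hypothetical `predictions` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "smiles TEXT NOT NULL, "
        "predicted_gap REAL NOT NULL, "
        "created_at TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO predictions (smiles, predicted_gap, created_at) "
        "VALUES (?, ?, ?)",
        (smiles, gap, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


# In-memory database for illustration; the app's log file is predictions.db.
conn = sqlite3.connect(":memory:")
log_prediction(conn, "C1=CC=CC=C1", 0.0)  # placeholder value, not a real prediction
rows = conn.execute("SELECT smiles, predicted_gap FROM predictions").fetchall()
```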

---

## Requirements

To run locally:
```
pip install -r requirements.txt
streamlit run app.py
```


## Model Info

The app uses a trained hybrid GNN model combining:

* AtomEncoder and BondEncoder from OGB
* GINEConv layers from PyTorch Geometric
* Global mean pooling
* RDKit-based physicochemical descriptors

The model was trained on the [OGB PCQM4Mv2 dataset](https://ogb.stanford.edu/docs/lsc/pcqm4mv2/), with hyperparameters optimized using Optuna.
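
To make the "hybrid" idea concrete, the sketch below shows global mean pooling followed by a merge with molecule-level descriptors, in plain Python on toy numbers. It assumes the two feature sets are combined by simple concatenation, which is a common choice but is not confirmed for this model; the function names are illustrative. In a PyTorch Geometric model, the pooling step would typically be done on batched tensors with `torch_geometric.nn.global_mean_pool`.

```python
def mean_pool(node_embeddings: list[list[float]]) -> list[float]:
    """Average node (atom) embeddings feature-wise into one graph-level vector."""
    n_nodes = len(node_embeddings)
    return [sum(column) / n_nodes for column in zip(*node_embeddings)]


def hybrid_features(
    node_embeddings: list[list[float]], descriptors: list[float]
) -> list[float]:
    """Concatenate the pooled graph vector with molecule-level descriptors.

    Concatenation is an assumption made for this sketch.
    """
    return mean_pool(node_embeddings) + list(descriptors)


# Toy values: two atoms with 2-dimensional embeddings, one descriptor.
features = hybrid_features([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

The resulting vector would then be passed to a prediction head that outputs the gap.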


## Author

Developed by [Matthew Graham](https://github.com/MooseML).
For inquiries, collaborations, or ideas, feel free to reach out!