{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accessing the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To access the data on the Hugging Face Hub, we first import the necessary\n", "libraries:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset # Loading datasets from Hugging Face Hub\n", "from pprint import pprint # Pretty print" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After importing the modules, we set a few variables that will be used throughout\n", "this demo." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# path to the dataset repository on the Hugging Face Hub\n", "path = \"molssiai-hub/pubchemqc-b3lyp\"\n", "\n", "# set the dataset configuration/subset name\n", "name = \"b3lyp_pm6\"\n", "\n", "# set the dataset split\n", "split = \"train\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we are going to work with the\n", "[PubChemQC-B3LYP/6-31G*//PM6\n", "Dataset](https://huggingface.co/datasets/molssiai-hub/pubchemqc-b3lyp)\n", "(PubChemQC-B3LYP for short) from the [PubChemQC dataset\n", "collection](https://huggingface.co/collections/molssiai-hub/pubchemqc-datasets-669e5482260861ba7cce3d1c).\n", "Let us load the dataset as follows:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "IterableDataset({\n", " features: ['cid', 'state', 'pubchem-inchi', 'pubchem-charge', 'pubchem-version', 'name', 'coordinates', 'atomic-numbers', 'atom-count', 'heavy-atom-count', 'core-electrons', 'bond-order', 'connection-indices', 'formula', 'version', 'obabel-inchi', 'pm6-obabel-canonical-smiles', 'charge', 'energy-beta-gap', 'energy-beta-homo', 'energy-beta-lumo', 'energy-alpha-gap', 'energy-alpha-homo', 'energy-alpha-lumo', 'total-energy', 'homos', 'orbital-energies', 
'mo-count', 'basis-count', 'multiplicity', 'molecular-mass', 'number-of-atoms', 'lowdin-partial-charges', 'mulliken-partial-charges', 'dipole-moment', 'pubchem-multiplicity', 'pubchem-obabel-canonical-smiles', 'pubchem-isomeric-smiles', 'pubchem-molecular-weight', 'pubchem-molecular-formula'],\n", " n_shards: 430\n", "})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# load the dataset\n", "hub_dataset = load_dataset(path=path,\n", "                           name=name,\n", "                           split=split,\n", "                           streaming=True,\n", "                           trust_remote_code=True)\n", "\n", "hub_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PubChemQC datasets are very large, and downloading them to your local machine\n", "can strain both your network bandwidth and your disk storage. Therefore, we set\n", "the `streaming` parameter to `True`, which streams the data from the Hub instead\n", "of downloading the dataset to disk. In this mode, the `load_dataset` function\n", "returns an `IterableDataset` object, which yields the examples lazily as you\n", "iterate over it. The `trust_remote_code` argument is also set to `True` to allow\n", "the execution of the custom [loading\n", "script](https://huggingface.co/datasets/molssiai-hub/pubchemqc-b3lyp/blob/main/pubchemqc-b3lyp.py)\n", "that defines how the data are read.\n", "\n", "The PubChemQC-B3LYP dataset is split into several files called *shards* (430 of\n", "them, as the `n_shards` value in the output above shows), which enable\n", "multiprocessing and parallelization of the data loading process. 
Multiprocessing\n", "is only used when the dataset is downloaded to disk (`streaming=False`, the\n", "default) rather than streamed. It can speed up the download and preparation of\n", "the data significantly, provided that you have enough CPU cores to handle the\n", "load and enough memory and disk space to download and store the data.\n", "\n", "You can choose the number of processes used for loading the data by setting the\n", "`num_proc` parameter of the `load_dataset` function:\n", "\n", "```python\n", "dataset = load_dataset(path=path,\n", "                       name=name,\n", "                       split=split,\n", "                       trust_remote_code=True,\n", "                       num_proc=4)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have created a `Dataset` or `IterableDataset` instance, we can inspect\n", "its columns through the `features` attribute, which maps each column name to\n", "its data type, or through the `column_names` attribute, which lists only the\n", "column names.\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['cid',\n", " 'state',\n", " 'pubchem-inchi',\n", " 'pubchem-charge',\n", " 'pubchem-version',\n", " 'name',\n", " 'coordinates',\n", " 'atomic-numbers',\n", " 'atom-count',\n", " 'heavy-atom-count',\n", " 'core-electrons',\n", " 'bond-order',\n", " 'connection-indices',\n", " 'formula',\n", " 'version',\n", " 'obabel-inchi',\n", " 'pm6-obabel-canonical-smiles',\n", " 'charge',\n", " 'energy-beta-gap',\n", " 'energy-beta-homo',\n", " 'energy-beta-lumo',\n", " 'energy-alpha-gap',\n", " 'energy-alpha-homo',\n", " 'energy-alpha-lumo',\n", " 'total-energy',\n", " 'homos',\n", " 'orbital-energies',\n", " 'mo-count',\n", " 'basis-count',\n", " 'multiplicity',\n", " 'molecular-mass',\n", " 'number-of-atoms',\n", " 'lowdin-partial-charges',\n", " 'mulliken-partial-charges',\n", " 'dipole-moment',\n", " 'pubchem-multiplicity',\n", " 'pubchem-obabel-canonical-smiles',\n", " 'pubchem-isomeric-smiles',\n", " 'pubchem-molecular-weight',\n", " 'pubchem-molecular-formula']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# print the column 
names\n", "hub_dataset.column_names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `take(n)` to obtain a new `IterableDataset` that contains only the\n", "first $n$ examples of the dataset. Wrapping it in `list()` fetches these\n", "examples from the Hub. For demonstration purposes, we set $n$ to 2, which yields\n", "a list of two dictionaries, each containing the features of the corresponding\n", "data point." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'cid': 1,\n", " 'state': 'S0',\n", " 'pubchem-inchi': 'InChI=1S/C9H17NO4/c1-7(11)14-8(5-9(12)13)6-10(2,3)4/h8H,5-6H2,1-4H3',\n", " 'pubchem-charge': 0,\n", " 'pubchem-version': '20160829',\n", " 'name': '000000001.B3LYP@PM6.S0',\n", " 'coordinates': [4.543149670829423,\n", " -2.8411897941733857,\n", " -1.6418598810432616,\n", " 5.164339625816055,\n", " -1.776079871333543,\n", " -0.8127099411272803,\n", " 6.303009543349172,\n", " -1.3517299020731781,\n", " -0.8573999378940053,\n", " 4.255369691704345,\n", " -1.2926799063382324,\n", " 0.08501999385276329,\n", " 4.60962966603199,\n", " -0.12267999112548451,\n", " 0.9209899332554315,\n", " 3.2824497621808693,\n", " 0.5602999593942104,\n", " 1.1527099164801087,\n", " 3.3767697553673353,\n", " 1.7012998767313179,\n", " 2.2331298382255156,\n", " 3.9935697106646764,\n", " 1.3808998999270228,\n", " 3.2894897616953807,\n", " 2.881229791246801,\n", " 2.7805597985320656,\n", " 1.9378698595831967,\n", " 5.234679620732287,\n", " -0.683349950484708,\n", " 2.212169839745514,\n", " 6.611009521024224,\n", " -0.1308399905026251,\n", " 2.561759814391751,\n", " 7.676599443790074,\n", " -0.7797299435035565,\n", " 1.7129698758981244,\n", " 6.666809516973522,\n", " 1.3677999008968658,\n", " 2.3902798268023666,\n", " 6.871359502159529,\n", " -0.444549967769325,\n", " 4.023889708442672,\n", " 4.4186396798669705,\n", " -2.4972898190777815,\n", " -2.6834198055950926,\n", " 3.5434197432646983,\n", " -3.130939773157949,\n", " -1.275939907569318,\n", " 5.173619625161661,\n", " -3.7428297288108783,\n", 
-1.6751898786439012,\n", " 5.315839614871386,\n", " 0.506469963304397,\n", " 0.3344999757684973,\n", " 2.5048198185388078,\n", " -0.14947998914622534,\n", " 1.4864398923099749,\n", " 2.8906797905706103,\n", " 1.0035699272902916,\n", " 0.2186999841469097,\n", " 5.252789619400259,\n", " -1.7890198704060034,\n", " 2.186789841567616,\n", " 4.524179672197005,\n", " -0.36438997361718856,\n", " 3.068649777657863,\n", " 7.449789460262727,\n", " -0.6580899523412297,\n", " 0.6238299547913445,\n", " 7.7404594391760275,\n", " -1.8650498648728673,\n", " 1.890259863072561,\n", " 8.670379371800149,\n", " -0.34142997526909646,\n", " 1.891069862987387,\n", " 6.552169525296595,\n", " 1.671489878887934,\n", " 1.3366699031511258,\n", " 7.59468944975336,\n", " 1.8007598695576095,\n", " 2.782579798388253,\n", " 5.791549580395056,\n", " 1.8473198661374974,\n", " 2.9428997867983613,\n", " 6.800889507241287,\n", " -1.5241398895588907,\n", " 4.229209693567309,\n", " 6.113409557071307,\n", " 0.06863999502835459,\n", " 4.660949662319101,\n", " 7.864189430232259,\n", " -0.0930299932397971,\n", " 4.345629685137421],\n", " 'atomic-numbers': [6,\n", " 6,\n", " 8,\n", " 8,\n", " 6,\n", " 6,\n", " 6,\n", " 8,\n", " 8,\n", " 6,\n", " 7,\n", " 6,\n", " 6,\n", " 6,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1],\n", " 'atom-count': 31,\n", " 'heavy-atom-count': 14,\n", " 'core-electrons': [0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0,\n", " 0],\n", " 'bond-order': [1,\n", " 1,\n", " 1,\n", " 1,\n", " 2,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 2,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1,\n", 
" 1,\n", " 1,\n", " 1,\n", " 1,\n", " 1],\n", " 'connection-indices': [15,\n", " 1,\n", " 17,\n", " 1,\n", " 1,\n", " 16,\n", " 1,\n", " 2,\n", " 3,\n", " 2,\n", " 2,\n", " 4,\n", " 4,\n", " 5,\n", " 20,\n", " 6,\n", " 18,\n", " 5,\n", " 23,\n", " 12,\n", " 5,\n", " 6,\n", " 5,\n", " 10,\n", " 6,\n", " 19,\n", " 6,\n", " 7,\n", " 26,\n", " 13,\n", " 12,\n", " 24,\n", " 12,\n", " 25,\n", " 12,\n", " 11,\n", " 9,\n", " 7,\n", " 21,\n", " 10,\n", " 10,\n", " 11,\n", " 10,\n", " 22,\n", " 7,\n", " 8,\n", " 13,\n", " 11,\n", " 13,\n", " 27,\n", " 13,\n", " 28,\n", " 11,\n", " 14,\n", " 14,\n", " 29,\n", " 14,\n", " 31,\n", " 14,\n", " 30],\n", " 'formula': 'C9H17NO4',\n", " 'version': '1.0',\n", " 'obabel-inchi': 'InChI=1S/C9H17NO4/c1-7(11)14-8(5-9(12)13)6-10(2,3)4/h8H,5-6H2,1-4H3/t8-/m0/s1',\n", " 'pm6-obabel-canonical-smiles': '[O]C(=O)C[C@@H](C[N](C)(C)C)OC(=O)C',\n", " 'charge': 0,\n", " 'energy-beta-gap': 4.34837933099,\n", " 'energy-beta-homo': -4.60960862747,\n", " 'energy-beta-lumo': -0.2612292964799998,\n", " 'energy-alpha-gap': 4.34837933099,\n", " 'energy-alpha-homo': -4.60960862747,\n", " 'energy-alpha-lumo': -0.2612292964799998,\n", " 'total-energy': -19286.973573267132,\n", " 'homos': [54],\n", " 'orbital-energies': [[-522.303488065215,\n", " -521.209590386205,\n", " -518.042185166385,\n", " -517.742859930835,\n", " -394.46712223881997,\n", " -281.21333766072,\n", " -279.556164311175,\n", " -279.273165906655,\n", " -279.235069967585,\n", " -278.79696666828,\n", " -278.7534284522,\n", " -278.06770154894,\n", " -277.890827546115,\n", " -276.377874537335,\n", " -30.082186172775,\n", " -28.150177834225,\n", " -27.81547779811,\n", " -26.512052454215002,\n", " -24.166431062905,\n", " -22.0412218905,\n", " -20.950045349995,\n", " -20.754123377635,\n", " -20.60446075986,\n", " -19.681994806665003,\n", " -17.831620623265,\n", " -16.88738556203,\n", " -15.630219572720002,\n", " -14.933608115439998,\n", " -14.808435744210001,\n", " -14.484620262115,\n", " 
-13.8233836054,\n", " -13.145820117655,\n", " -13.07234937802,\n", " -12.65329404825,\n", " -12.460093214395,\n", " -12.26689238054,\n", " -11.975730560505,\n", " -11.722664679540001,\n", " -11.529463845685001,\n", " -11.49408904512,\n", " -11.36891667389,\n", " -11.246465441165,\n", " -11.11312965442,\n", " -10.664141801095,\n", " -10.54713284538,\n", " -9.87229049614,\n", " -9.619224615175,\n", " -9.543032737035,\n", " -8.88723835733,\n", " -8.479067581579999,\n", " -8.411039118955,\n", " -7.894022803005001,\n", " -5.303498946245,\n", " -5.015058264715,\n", " -4.60960862747,\n", " -0.26122929648,\n", " 0.982331000305,\n", " 1.6762213190800002,\n", " 1.9837099701450003,\n", " 2.266708374665,\n", " 2.2993620367250003,\n", " 2.3782750533700003,\n", " 2.5905238567600004,\n", " 2.79188810613,\n", " 3.0395117100849998,\n", " 3.2381548209499997,\n", " 3.485778424905,\n", " 3.5646914415500004,\n", " 3.6871426742750004,\n", " 3.7714979679300003,\n", " 4.291235422385,\n", " 4.359263885010001,\n", " 4.468109425210001,\n", " 4.93614524807,\n", " 5.12662494342,\n", " 5.2844509767100005,\n", " 5.52935344216,\n", " 5.793303877145,\n", " 6.0572543121299995,\n", " 6.381069794225,\n", " 7.17292109918,\n", " 7.921234188055001,\n", " 8.08178135985,\n", " 8.28858788623,\n", " 9.401533534775,\n", " 11.143062177974999,\n", " 11.98933625303,\n", " 12.0818549622,\n", " 12.656015186755,\n", " 12.857379436125,\n", " 13.02881116194,\n", " 13.654673018090001,\n", " 14.212506411615,\n", " 14.533600755205,\n", " 14.59074466381,\n", " 15.63566184973,\n", " 15.877843176675,\n", " 16.08737084156,\n", " 16.419349739170002,\n", " 16.862895315485,\n", " 17.05609614934,\n", " 17.407123016485002,\n", " 17.695563698015,\n", " 18.348636939215,\n", " 18.69966380636,\n", " 18.90647033274,\n", " 19.12960369015,\n", " 19.47246714178,\n", " 19.58675495899,\n", " 19.861589947995,\n", " 20.78405590119,\n", " 21.1160347988,\n", " 21.4425714194,\n", " 21.50243646651,\n", " 21.831694225615,\n", " 
22.234422724355003,\n", " 22.277960940435,\n", " 22.58000731449,\n", " 22.952803289675,\n", " 23.322878126355,\n", " 23.57050173031,\n", " 23.60315539237,\n", " 23.6739049935,\n", " 23.95690339802,\n", " 24.43038149789,\n", " 24.754196979985,\n", " 24.792292919055,\n", " 25.06168563105,\n", " 25.21406938733,\n", " 25.29570354248,\n", " 25.461692991285,\n", " 25.739249118794998,\n", " 26.005920692285,\n", " 26.33517845139,\n", " 26.6127345789,\n", " 27.453566376945,\n", " 27.722959088939998,\n", " 28.057659125054997,\n", " 28.770597413365,\n", " 29.211421851174997,\n", " 29.859052815365,\n", " 30.648182981815,\n", " 31.352957854609997,\n", " 31.71759041428,\n", " 32.846862893855004,\n", " 33.12714015987,\n", " 33.230543423060006,\n", " 34.61832406061,\n", " 35.01016800533,\n", " 36.253728302115,\n", " 36.525842152615,\n", " 37.07823326913,\n", " 37.79661383445,\n", " 38.30818787339,\n", " 39.10276031685,\n", " 39.334057089775,\n", " 39.8918904833,\n", " 40.716395450315,\n", " 41.353141860485,\n", " 42.40622246192,\n", " 42.92323877787,\n", " 44.11509744306,\n", " 44.376326739540005,\n", " 44.47428772572,\n", " 45.785876485130004,\n", " 46.436228587825,\n", " 46.78725545497,\n", " 47.2389644468,\n", " 48.0553059983,\n", " 48.253949109165006,\n", " 48.564158898735,\n", " 48.760080871095,\n", " 49.25804921751,\n", " 49.33424109565,\n", " 49.96554522881,\n", " 50.25398591034,\n", " 50.47711926775,\n", " 50.931549398085,\n", " 51.42407546749,\n", " 51.72340070304,\n", " 52.23497474198,\n", " 52.4635503764,\n", " 53.149277279660005,\n", " 53.364247221555004,\n", " 53.658130180095,\n", " 54.36290505289,\n", " 54.449981485049996,\n", " 54.934344138940006,\n", " 55.064958787180004,\n", " 55.367005161235,\n", " 55.731637720905,\n", " 56.06633775702,\n", " 56.85002564646,\n", " 57.258196422210005,\n", " 57.56296393477,\n", " 58.38202662477501,\n", " 59.105849467104996,\n", " 59.312655993485,\n", " 59.48952999631,\n", " 59.70994221521501,\n", " 59.971171511695,\n", " 
60.226958531164996,\n", " 60.504514658675,\n", " 61.26915457857999,\n", " 62.504551459850006,\n", " 63.127692177495,\n", " 64.17533050192,\n", " 64.983508637905,\n", " 66.240674627215,\n", " 67.084227563765,\n", " 67.74002194347,\n", " 68.384931769155,\n", " 69.146850550555,\n", " 70.43394906342,\n", " 71.15777190575001,\n", " 72.19724681466,\n", " 72.591811897885,\n", " 72.934675349515,\n", " 73.34556726377,\n", " 74.393205588195,\n", " 75.606833361425,\n", " 75.759217117705,\n", " 77.27489126499,\n", " 79.318466282245,\n", " 80.89400547664,\n", " 82.33620888429,\n", " 82.57022679572,\n", " 83.75120090688999,\n", " 84.93761729507,\n", " 87.28868096339001,\n", " 108.59519545754,\n", " 109.6618817515,\n", " 109.963928125555,\n", " 111.476881134335,\n", " 113.16670814594,\n", " 114.475575766845,\n", " 115.455185628645,\n", " 116.527314199615,\n", " 117.46610698383999,\n", " 120.60630081861,\n", " 121.550535879845,\n", " 121.89612046997999,\n", " 122.69069291344,\n", " 127.37105114203999]],\n", " 'mo-count': 244,\n", " 'basis-count': 244,\n", " 'multiplicity': 1,\n", " 'molecular-mass': 203.23557999999983,\n", " 'number-of-atoms': 31,\n", " 'lowdin-partial-charges': [-0.459759,\n", " 0.210106,\n", " -0.286001,\n", " -0.204529,\n", " 0.007889,\n", " -0.335887,\n", " 0.111477,\n", " -0.468258,\n", " -0.356875,\n", " -0.234623,\n", " 0.097362,\n", " -0.372531,\n", " -0.388717,\n", " -0.348849,\n", " 0.18522,\n", " 0.178539,\n", " 0.177997,\n", " 0.155744,\n", " 0.158006,\n", " 0.158613,\n", " 0.156412,\n", " 0.217594,\n", " 0.202351,\n", " 0.166462,\n", " 0.165118,\n", " 0.174338,\n", " 0.164152,\n", " 0.228815,\n", " 0.170737,\n", " 0.199278,\n", " 0.169819],\n", " 'mulliken-partial-charges': [-0.542286,\n", " 0.622923,\n", " -0.486172,\n", " -0.478169,\n", " 0.118421,\n", " -0.37835,\n", " 0.548339,\n", " -0.621169,\n", " -0.538256,\n", " -0.190811,\n", " -0.34019,\n", " -0.352855,\n", " -0.380417,\n", " -0.330181,\n", " 0.196453,\n", " 0.19234,\n", " 0.183625,\n", " 
0.162753,\n", " 0.149387,\n", " 0.154003,\n", " 0.163522,\n", " 0.255434,\n", " 0.260419,\n", " 0.179949,\n", " 0.177859,\n", " 0.197429,\n", " 0.171353,\n", " 0.288802,\n", " 0.188726,\n", " 0.241415,\n", " 0.185706],\n", " 'dipole-moment': 11.419443262233626,\n", " 'pubchem-multiplicity': 1,\n", " 'pubchem-obabel-canonical-smiles': '[O-]C(=O)CC(C[N+](C)(C)C)OC(=O)C',\n", " 'pubchem-isomeric-smiles': 'CC(=O)OC(CC(=O)[O-])C[N+](C)(C)C',\n", " 'pubchem-molecular-weight': 203.23558,\n", " 'pubchem-molecular-formula': 'C9H17NO4'},\n", " {'cid': 3,\n", " 'state': 'S0',\n", " 'pubchem-inchi': 'InChI=1S/C7H8O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-3,5-6,8-9H,(H,10,11)',\n", " 'pubchem-charge': 0,\n", " 'pubchem-version': '20160829',\n", " 'name': '000000003.B3LYP@PM6.S0',\n", " 'coordinates': [1.040909924594702,\n", " 0.939649931940047,\n", " 0.08098999410797753,\n", " 1.3650499010782415,\n", " -0.3605999738883242,\n", " 0.0714099948350763,\n", " 0.34219997520772777,\n", " -1.4541898946486875,\n", " -0.08665999372343035,\n", " -1.141219917312014,\n", " -0.9958599278305537,\n", " -0.01889999864761909,\n", " -1.3611799013969879,\n", " 0.49082996446293237,\n", " -0.07803999435546188,\n", " -0.34924997468982877,\n", " 1.3769399002199543,\n", " -0.04337999683195858,\n", " -2.774219799024303,\n", " 0.9673199298928236,\n", " -0.17327998747066597,\n", " -3.142559772328053,\n", " 1.9693498573058525,\n", " -0.7360399466708769,\n", " -3.7237097301883524,\n", " 0.20901998488060394,\n", " 0.4530499671557916,\n", " -1.7447198735923888,\n", " -1.499439891346727,\n", " 1.1895999138141986,\n", " 0.5514599600513355,\n", " -2.1256198459830777,\n", " -1.3319699035140309,\n", " 1.795109869930254,\n", " 1.7233998751149642,\n", " 0.18157998687032353,\n", " 2.396099826407926,\n", " -0.7108599484797881,\n", " 0.1488699892182084,\n", " 0.5035599635280761,\n", " -2.278269834909643,\n", " 0.6579799523273178,\n", " -1.7243698750404042,\n", " -1.534139888846425,\n", " -0.8135899410798237,\n", " 
-0.5331099613674413,\n", " 2.4564898220333253,\n", " -0.10719999224171652,\n", " -3.3772997552900454,\n", " -0.6197999550994765,\n", " 0.9207699332805251,\n", " -1.3247099040246133,\n", " -1.098319920406063,\n", " 1.9847398562167238,\n", " 0.5780199581090726,\n", " -1.4850598923904088,\n", " -2.077949849455232],\n", " 'atomic-numbers': [6, 6, 6, 6, 6, 6, 6, 8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1],\n", " 'atom-count': 19,\n", " 'heavy-atom-count': 11,\n", " 'core-electrons': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n", " 'bond-order': [1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1],\n", " 'connection-indices': [19,\n", " 11,\n", " 11,\n", " 3,\n", " 15,\n", " 4,\n", " 8,\n", " 7,\n", " 7,\n", " 5,\n", " 7,\n", " 9,\n", " 16,\n", " 6,\n", " 3,\n", " 4,\n", " 3,\n", " 2,\n", " 3,\n", " 14,\n", " 5,\n", " 6,\n", " 5,\n", " 4,\n", " 6,\n", " 1,\n", " 4,\n", " 10,\n", " 2,\n", " 1,\n", " 2,\n", " 13,\n", " 1,\n", " 12,\n", " 9,\n", " 17,\n", " 10,\n", " 18],\n", " 'formula': 'C7H8O4',\n", " 'version': '1.0',\n", " 'obabel-inchi': 'InChI=1S/C7H8O4/c8-5-3-1-2-4(6(5)9)7(10)11/h1-3,5-6,8-9H,(H,10,11)/t5-,6-/m0/s1',\n", " 'pm6-obabel-canonical-smiles': 'O[C@H]1C=CC=C([C@@H]1O)C(=O)O',\n", " 'charge': 0,\n", " 'energy-beta-gap': 4.58783951943,\n", " 'energy-beta-homo': -6.72121210735,\n", " 'energy-beta-lumo': -2.1333725879200003,\n", " 'energy-alpha-gap': 4.58783951943,\n", " 'energy-alpha-homo': -6.72121210735,\n", " 'energy-alpha-lumo': -2.1333725879200003,\n", " 'total-energy': -15575.751067250567,\n", " 'homos': [40],\n", " 'orbital-energies': [[-522.17559455548,\n", " -521.66402051654,\n", " -521.49530992923,\n", " -520.37420086517,\n", " -280.601081497095,\n", " -279.501741541075,\n", " -279.37656916984497,\n", " -278.26906579831,\n", " -278.14389342708,\n", " -278.108518626515,\n", " -278.059538133425,\n", " -29.358363330445,\n", " -28.729780335790004,\n", " -28.117524172165,\n", " -27.06988584774,\n", " -23.401791143,\n", " 
-21.271139693585,\n", " -20.48473066564,\n", " -17.896927947385,\n", " -17.79080354569,\n", " -16.2724082599,\n", " -15.589402495144999,\n", " -14.906396730389998,\n", " -13.657394156595,\n", " -13.450587630215,\n", " -13.390722583105,\n", " -13.254665657855,\n", " -12.04375902313,\n", " -11.777087449640002,\n", " -11.567559784755,\n", " -11.360753258375,\n", " -10.639651554550001,\n", " -10.359374288535,\n", " -9.72534901687,\n", " -9.51854249049,\n", " -9.32806279514,\n", " -8.721248908525,\n", " -8.114435021910001,\n", " -7.71714880018,\n", " -7.344352824994999,\n", " -6.72121210735,\n", " -2.13337258792,\n", " 0.44626671482,\n", " 1.161926141635,\n", " 1.8775855684500002,\n", " 2.076228679315,\n", " 2.413649853935,\n", " 2.98781007849,\n", " 3.251760513475,\n", " 3.68442153577,\n", " 3.986467909825,\n", " 4.487157394745,\n", " 5.0395485112600005,\n", " 5.5946607662800005,\n", " 5.951129910435,\n", " 6.65046250622,\n", " 7.153873129645,\n", " 7.2763243623700005,\n", " 8.61512450683,\n", " 9.80698317202,\n", " 10.152567762155,\n", " 12.035595607615,\n", " 12.666899740775001,\n", " 13.662836433605,\n", " 14.027468993274999,\n", " 14.177131611050001,\n", " 14.81387802122,\n", " 14.881906483845002,\n", " 15.662873234780001,\n", " 15.910496838735,\n", " 16.08737084156,\n", " 16.367648107575,\n", " 16.628877404055,\n", " 17.075144118875,\n", " 17.93774502496,\n", " 18.03570601114,\n", " 18.634356482239998,\n", " 18.80850934656,\n", " 19.788119208359998,\n", " 20.174520876069998,\n", " 21.129640491325,\n", " 21.4969941895,\n", " 21.918770657775,\n", " 22.239865001365,\n", " 22.462998358775,\n", " 22.702458547215002,\n", " 23.050764275855,\n", " 23.23308055569,\n", " 23.714722071075002,\n", " 24.61541891623,\n", " 25.178694586765,\n", " 25.856258074510002,\n", " 26.158304448565,\n", " 26.479398792155,\n", " 26.751512642655,\n", " 27.094376094285,\n", " 27.44540296143,\n", " 27.878063983725,\n", " 28.612771380075003,\n", " 29.303940560345,\n", " 29.546121887290003,\n", " 
29.777418660215,\n", " 30.604644765735003,\n", " 31.225064344875,\n", " 32.193789652655,\n", " 33.56796459768,\n", " 33.97069309642,\n", " 34.648256584165004,\n", " 35.34214690294,\n", " 36.14760390042,\n", " 37.170751978300004,\n", " 38.29730331937,\n", " 39.21704813406,\n", " 39.4565083225,\n", " 39.611613217285,\n", " 40.520473477955,\n", " 42.817114376175,\n", " 43.698763251795,\n", " 44.65932514406,\n", " 45.013073149709996,\n", " 45.451176449015,\n", " 45.56546426622501,\n", " 46.16139359882,\n", " 47.606318144975,\n", " 48.06619055232,\n", " 48.444428804515,\n", " 48.977771951495,\n", " 49.31791426462,\n", " 49.837651719075,\n", " 50.27031274137,\n", " 50.87440548948,\n", " 50.964203060145,\n", " 51.268970572705,\n", " 51.52203645367,\n", " 53.07852767853,\n", " 53.549284639895,\n", " 54.07446437136,\n", " 54.615970933854996,\n", " 55.715310889875,\n", " 56.466345117255,\n", " 56.75206466028,\n", " 58.011951788095,\n", " 58.882716109695,\n", " 59.742595877275,\n", " 60.22423739266,\n", " 61.85147821865,\n", " 62.877347435035006,\n", " 63.394363750985,\n", " 63.987571945075004,\n", " 64.38213702830001,\n", " 65.571274554985,\n", " 66.46380798462499,\n", " 67.13592919536,\n", " 68.07472197958501,\n", " 68.40670087719501,\n", " 69.08970664195,\n", " 70.104691304315,\n", " 71.23124264538501,\n", " 71.70199960675,\n", " 72.88297371792,\n", " 73.11699162935,\n", " 73.593190867725,\n", " 76.11568626186,\n", " 76.92114325934,\n", " 77.664014071205,\n", " 78.45586537616,\n", " 81.000129878335,\n", " 83.051868311105,\n", " 84.559379042875,\n", " 88.62475996934499,\n", " 101.659013408295,\n", " 103.759732334155,\n", " 105.58561627101,\n", " 110.20066717549001,\n", " 112.26873243929,\n", " 114.1245488997,\n", " 116.195335302005,\n", " 119.86070886824,\n", " 120.57364715655001,\n", " 122.5056554951,\n", " 128.723456979025]],\n", " 'mo-count': 181,\n", " 'basis-count': 181,\n", " 'multiplicity': 1,\n", " 'molecular-mass': 156.13601999999992,\n", " 'number-of-atoms': 
19,\n", " 'lowdin-partial-charges': [-0.165834,\n", " -0.166217,\n", " -0.00709,\n", " -0.013209,\n", " -0.079093,\n", " -0.123254,\n", " 0.188072,\n", " -0.245033,\n", " -0.385089,\n", " -0.452317,\n", " -0.452864,\n", " 0.167807,\n", " 0.16777,\n", " 0.161347,\n", " 0.170277,\n", " 0.178644,\n", " 0.358744,\n", " 0.351181,\n", " 0.346156],\n", " 'mulliken-partial-charges': [-0.122757,\n", " -0.14337,\n", " 0.076825,\n", " 0.022155,\n", " 0.032184,\n", " -0.158536,\n", " 0.54755,\n", " -0.448879,\n", " -0.6056,\n", " -0.631763,\n", " -0.61658,\n", " 0.149995,\n", " 0.155257,\n", " 0.162402,\n", " 0.184383,\n", " 0.175571,\n", " 0.426152,\n", " 0.402121,\n", " 0.392891],\n", " 'dipole-moment': 5.099240805823648,\n", " 'pubchem-multiplicity': 1,\n", " 'pubchem-obabel-canonical-smiles': 'OC1C=CC=C(C1O)C(=O)O',\n", " 'pubchem-isomeric-smiles': 'C1=CC(C(C(=C1)C(=O)O)O)O',\n", " 'pubchem-molecular-weight': 156.13602,\n", " 'pubchem-molecular-formula': 'C7H8O4'}]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hub_dataset.take(2))" ] } ], "metadata": { "kernelspec": { "display_name": "hugface", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 2 }