{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2025-04-09T06:13:55.446824Z", "iopub.status.busy": "2025-04-09T06:13:55.445794Z", "iopub.status.idle": "2025-04-09T06:13:56.137367Z", "shell.execute_reply": "2025-04-09T06:13:56.136554Z", "shell.execute_reply.started": "2025-04-09T06:13:55.446782Z" }, "tags": [] }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "from datasets_common import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-04-09T06:13:56.140406Z", "iopub.status.busy": "2025-04-09T06:13:56.138861Z", "iopub.status.idle": "2025-04-09T06:13:56.182854Z", "shell.execute_reply": "2025-04-09T06:13:56.182207Z", "shell.execute_reply.started": "2025-04-09T06:13:56.140363Z" }, "tags": [] }, "outputs": [], "source": [ "dest_dir = Path(globals()[\"_dh\"][0])\n", "json_filename = \"arxiv-metadata-oai-snapshot.json\"\n", "dataset = \"Cornell-University/arxiv\"\n", "old_label = \"categories\"\n", "new_label = \"category\"\n", "train_filename = \"arxiv_train.json\"\n", "test_filename = \"arxiv_test.json\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2025-04-09T06:13:56.184655Z", "iopub.status.busy": "2025-04-09T06:13:56.183825Z", "iopub.status.idle": "2025-04-09T06:15:23.665384Z", "shell.execute_reply": "2025-04-09T06:15:23.664523Z", "shell.execute_reply.started": "2025-04-09T06:13:56.184630Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset already exists, do not download\n", "Reading dataset...\n", "Dataset read\n" ] } ], "source": [ "df = download_and_read_dataset(dest_dir=dest_dir, dataset=dataset, filename=json_filename)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-04-09T06:15:23.667946Z", "iopub.status.busy": "2025-04-09T06:15:23.666981Z", "iopub.status.idle": 
"2025-04-09T06:15:23.702581Z", "shell.execute_reply": "2025-04-09T06:15:23.701966Z", "shell.execute_reply.started": "2025-04-09T06:15:23.667909Z" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n",
"| | id | submitter | authors | title | comments | journal-ref | doi | report-no | categories | license | abstract | versions | update_date | authors_parsed |\n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 0 | 0704.0001 | Pavel Nadolsky | C. Bal\\'azs, E. L. Berger, P. M. Nadolsky, C.-... | Calculation of prompt diphoton production cros... | 37 pages, 15 figures; published version | Phys.Rev.D76:013009,2007 | 10.1103/PhysRevD.76.013009 | ANL-HEP-PR-07-12 | hep-ph | None | A fully differential calculation in perturba... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2008-11-26 | [[Balázs, C., ], [Berger, E. L., ], [Nadolsky,... |\n",
"| 1 | 0704.0002 | Louis Theran | Ileana Streinu and Louis Theran | Sparsity-certifying Graph Decompositions | To appear in Graphs and Combinatorics | None | None | None | math.CO cs.CG | http://arxiv.org/licenses/nonexclusive-distrib... | We describe a new algorithm, the $(k,\\ell)$-... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2008-12-13 | [[Streinu, Ileana, ], [Theran, Louis, ]] |\n",
"| 2 | 0704.0003 | Hongjun Pan | Hongjun Pan | The evolution of the Earth-Moon system based o... | 23 pages, 3 figures | None | None | None | physics.gen-ph | None | The evolution of Earth-Moon system is descri... | [{'version': 'v1', 'created': 'Sun, 1 Apr 2007... | 2008-01-13 | [[Pan, Hongjun, ]] |\n",
"| 3 | 0704.0004 | David Callan | David Callan | A determinant of Stirling cycle numbers counts... | 11 pages | None | None | None | math.CO | None | We show that a determinant of Stirling cycle... | [{'version': 'v1', 'created': 'Sat, 31 Mar 200... | 2007-05-23 | [[Callan, David, ]] |\n",
"| 4 | 0704.0005 | Alberto Torchinsky | Wael Abu-Shammala and Alberto Torchinsky | From dyadic $\\Lambda_{\\alpha}$ to $\\Lambda_{\\a... | None | Illinois J. Math. 52 (2008) no.2, 681-689 | None | None | math.CA math.FA | None | In this paper we show how to compute the $\\L... | [{'version': 'v1', 'created': 'Mon, 2 Apr 2007... | 2013-10-15 | [[Abu-Shammala, Wael, ], [Torchinsky, Alberto, ]] |\n", "