derek-thomas committed
Commit 9930cd7 · Parent(s): ef9cbc8
Adding readme updates and removing debug from logs
Files changed:
- src/readme_update.py +4 -4
- src/utilities.py +4 -4
src/readme_update.py CHANGED
@@ -69,15 +69,15 @@ and will add [nomic-ai/nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomi
 
 The goal is to be able to have an automatic and free semantic/neural tool for any subreddit.
 
-The last run was on {latest_hour_str} and updated {new_rows}.
+The last run was on {latest_hour_str} and updated {new_rows} new rows.
 
 ## Creation Details
 This is done by triggering [derek-thomas/processing-bestofredditorupdates](https://huggingface.co/spaces/derek-thomas/processing-bestofredditorupdates)
-based on a repository update webhook to calculate the embeddings and update the [nomic atlas](https://docs.nomic.ai)
-visualization.
+based on a repository update [webhook](https://huggingface.co/docs/hub/en/webhooks) to calculate the embeddings and update the [nomic atlas](https://docs.nomic.ai)
+visualization. This is done by this [processing space](https://huggingface.co/spaces/derek-thomas/processing-bestofredditorupdates).
 
 ## Update Frequency
-The dataset is updated based on a webhook trigger, so each time [derek-thomas/dataset-creator-reddit-{subreddit}](https://huggingface.co/datasets/derek-thomas/dataset-creator-reddit-{subreddit})
+The dataset is updated based on a [webhook](https://huggingface.co/docs/hub/en/webhooks) trigger, so each time [derek-thomas/dataset-creator-reddit-{subreddit}](https://huggingface.co/datasets/derek-thomas/dataset-creator-reddit-{subreddit})
 is updated, this dataset will be updated.
 
 ## Opt-out
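The README text above is a template with `{latest_hour_str}` and `{new_rows}` placeholders, which the update script presumably fills in on each run. A minimal sketch of that substitution, assuming Python `str.format`-style templating; the function name `build_run_summary` is illustrative and not from the repository:

```python
# Hypothetical sketch of how the README placeholders might be rendered.
# The actual rendering code is not shown in this diff; build_run_summary
# is an illustrative name, not from the source repository.
README_TEMPLATE = "The last run was on {latest_hour_str} and updated {new_rows} new rows."

def build_run_summary(latest_hour_str: str, new_rows: int) -> str:
    """Fill the run-summary template with the latest run time and row count."""
    return README_TEMPLATE.format(latest_hour_str=latest_hour_str, new_rows=new_rows)

summary = build_run_summary("2024-01-01 12:00", 42)
# → "The last run was on 2024-01-01 12:00 and updated 42 new rows."
```

The diff's wording change ("updated {new_rows}" → "updated {new_rows} new rows") only alters the surrounding template text, not the substitution mechanism.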
src/utilities.py CHANGED
@@ -17,13 +17,13 @@ logger = setup_logger(__name__)
 
 def load_datasets():
     # Get latest datasets locally
-    logger.
+    logger.info(f"Trying to download {PROCESSED_DATASET}")
     dataset = load_dataset(PROCESSED_DATASET, download_mode=DownloadMode.FORCE_REDOWNLOAD)
-    logger.
+    logger.info(f"Loaded {PROCESSED_DATASET}")
 
-    logger.
+    logger.info(f"Trying to download {OG_DATASET}")
     original_dataset = load_dataset(OG_DATASET, download_mode=DownloadMode.FORCE_REDOWNLOAD)
-    logger.
+    logger.info(f"Loaded {OG_DATASET}")
     return dataset, original_dataset
 
 
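Per the commit message, this change removes debug-level logging in favor of `logger.info` calls. A self-contained sketch of why that matters, assuming the repo's `setup_logger` wraps the stdlib `logging` module (the logger name, handler, and dataset string below are illustrative placeholders, not from the source):

```python
import io
import logging

# Sketch: at a typical INFO threshold, debug-level messages are filtered
# out while info-level messages are emitted. Assumption: setup_logger in
# this repo uses stdlib logging; names below are placeholders.
logger = logging.getLogger("utilities_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep output confined to our handler

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

PROCESSED_DATASET = "derek-thomas/processed-dataset"  # placeholder name
logger.debug(f"Trying to download {PROCESSED_DATASET}")  # filtered out at INFO
logger.info(f"Trying to download {PROCESSED_DATASET}")   # emitted

output = buf.getvalue()
```

With the threshold at `INFO`, only the `logger.info` line reaches the handler, which is consistent with swapping the old `logger.` calls for `logger.info` so the download progress actually appears in the logs.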