Commit History

Splitting prepare_dataset into preparing the base dataset and the tokenized dataset. This gives us finer control over caching and loading data, and will eventually let us stop storing the base dataset.
6af9ef6

meg-huggingface committed on

Continuing cache minimization in the new repository. Please see https://github.com/huggingface/DataMeasurements for full history
d8ab532

meg-huggingface committed on

:bug: filter_vocab -> filter_words
78cc3f9

yourusername committed on

:bug: really make sure log_files/ exists
e1cd6af

yourusername committed on