A curated collection of machine translation datasets
Pietro Lesci
pietrolesci
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
Updated a dataset about 8 hours ago: pietrolesci/pile-deduped-preshuffled
Spaces: 1
Models: 19

pietrolesci/smol_llama-370M-tied_bpe32000minipile • Updated • 56
pietrolesci/smol_llama-1B_bpe32000minipile • Updated • 54
pietrolesci/smol_llama-81M-tied_bpe2wp32000minipile • Updated
pietrolesci/smol_llama-81M-tied_bpe128000minipile • Updated
pietrolesci/smol_llama-81M-tied_bpe8064minipile • Updated
pietrolesci/smol_llama-81M-tied_wordpiece32000minipile • Updated
pietrolesci/smol_llama-81M-tied_bpe32000minipile • Updated
pietrolesci/tokenisers • Updated
pietrolesci/bert-civilcomments-gradtracking • Updated
pietrolesci/roberta-base_mnli_b9799b8f9b • Updated
Datasets: 52
pietrolesci/pile-deduped-preshuffled • Updated • 58
pietrolesci/smol_llama-minipile-evals • Updated • 1.82M • 312
pietrolesci/minipile • Updated • 6.06M • 558
pietrolesci/opus-5langs-1M • Updated • 5M • 143
pietrolesci/opus-raw • Updated • 4.06B • 2.86k
pietrolesci/pythia-pile-stats • Updated • 113M • 176
pietrolesci/slim-pajama-eval • Updated • 1.84M • 91 • 1
pietrolesci/pile-subset • Updated • 50
pietrolesci/cmnist • Updated • 308k • 88
pietrolesci/pythia-deduped-stats • Updated • 16.3M • 279