Pietro Lesci
pietrolesci
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
updated a dataset 12 minutes ago: pietrolesci/pile-deduped-pythia-preshuffled
updated a dataset 20 minutes ago: pietrolesci/pile-deduped-pythia-preshuffled
updated a dataset about 1 hour ago: pietrolesci/pile-deduped-pythia-preshuffled
pietrolesci's activity
🌟 Appreciation for providing seamless access to pre-processed pre-shuffled data
#2 opened 3 days ago by pietrolesci

Reconstructing pre-training data
#1 opened 3 days ago by pietrolesci

Domain and provenance annotation
9
#1 opened over 1 year ago by haukur

Trapezoidal scheduler with cooldown phase
3
#4 opened 8 months ago by maveriq

Bias annotation
#2 opened 10 months ago by pietrolesci

Tokenizer `merges.txt` files
3
#5 opened 10 months ago by pietrolesci

Sequence "packing" logic
2
#2 opened about 1 year ago by pietrolesci

Pad-only sequences from mmap'ed dataset after a certain index
#1 opened about 1 year ago by pietrolesci

Add full sequences (beyond the first 64 tokens)
3
#1 opened about 1 year ago by pietrolesci

Fix swapped start and exclusive_end fields
1
#3 opened over 2 years ago by pietrolesci

App down
#1 opened over 2 years ago by pietrolesci

`start` and `exclusive_end` seem swapped
1
#1 opened over 2 years ago by pietrolesci
