Hugging Face
Pietro Lesci
pietrolesci
18 followers · 34 following
https://pietrolesci.github.io/
pietro_lesci
pietrolesci
pietrolesci
pietrolesci.bsky.social
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
pietrolesci's activity
New activity in
cmeister/multilingual-tok-corpus
9 months ago
Create README.md
#2 opened 9 months ago by
pietrolesci
New activity in
JeanKaddour/minipile
over 1 year ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur
New activity in
HuggingFaceTB/SmolLM-135M
over 1 year ago
Trapezoidal scheduler with cooldown phase
👍 1
3
#4 opened over 1 year ago by
maveriq
New activity in
EleutherAI/pythia-160m
almost 2 years ago
Tokenizer `merges.txt` files
3
#5 opened almost 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-deduped-pythia-preshuffled
about 2 years ago
Sequence "packing" logic
👍 2
2
#2 opened about 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-deduped-pythia-preshuffled
over 2 years ago
Pad-only sequences from mmap'ed dataset after a certain index
#1 opened over 2 years ago by
pietrolesci
New activity in
EleutherAI/pile-duped-pythia-random-sampled
over 2 years ago
Add full sequences (beyond the first 64 tokens)
3
#1 opened over 2 years ago by
pietrolesci
New activity in
JeanKaddour/minipile
over 2 years ago
Domain and provenance annotation
9
#1 opened over 2 years ago by
haukur