Lists of URLs from various training datasets
Nick Hagar
nhagar
AI & ML interests
digital media, collective attention, computational social science
Recent Activity
Updated a dataset about 23 hours ago: nhagar/c4_urls_multilingual
models: None public yet
datasets: 105
nhagar/c4_urls_multilingual • 2M • 359
nhagar/c4_urls_en • 869k • 269
nhagar/test-upload-c4 • 11
nhagar/c4_urls_en.noclean • 1.4M • 513
nhagar/c4_urls_en.noblocklist • 1.39M • 220 • 1
nhagar/c4_urls_realnewslike • 1.8M • 57
nhagar/CC-MAIN-2021-17_urls • 55.9M • 58
nhagar/CC-MAIN-2017-34_urls • 59.3M • 60
nhagar/CC-MAIN-2015-40_urls • 21.2M • 55
nhagar/CC-MAIN-2022-21_urls • 58.7M • 59