---
license: mit
library_name: fasttext
pipeline_tag: text-classification
---
This is the fastText pretraining data filter targeting the LAMBADA FR task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816. This filter uses perplexity correlations to identify high-quality pretraining data without requiring any LLM training. It is designed to be used with the `fastText` library.
GitHub: https://github.com/TristanThrush/perplexity-correlations
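
As a minimal sketch of how a filter like this might be applied with the `fasttext` Python package: the model path `model.bin`, the helper names, the `keep_label` argument, and the 0.5 threshold below are all illustrative assumptions, not taken from this repository. Check `model.labels` after loading to see which label names this particular filter actually emits.

```python
from typing import List, Tuple


def clean_text(text: str) -> str:
    # fastText's predict() processes one line at a time and rejects
    # newlines, so collapse all whitespace into single spaces.
    return " ".join(text.split())


def top_label(model, text: str) -> Tuple[str, float]:
    """Return the highest-scoring label and its probability for one document."""
    labels, probs = model.predict(clean_text(text), k=1)
    return labels[0], float(probs[0])


def filter_documents(
    model, docs: List[str], keep_label: str, threshold: float = 0.5
) -> List[str]:
    """Keep documents whose top label is keep_label with enough confidence.

    keep_label and threshold are hypothetical parameters: inspect
    model.labels to find the label this filter uses for data to keep.
    """
    kept = []
    for doc in docs:
        label, prob = top_label(model, doc)
        if label == keep_label and prob >= threshold:
            kept.append(doc)
    return kept


def load_and_score(model_path: str, text: str) -> Tuple[str, float]:
    """Load a fastText model from disk and score a single document."""
    import fasttext  # pip install fasttext

    # model_path is an assumed local path to the downloaded model file.
    model = fasttext.load_model(model_path)
    return top_label(model, text)
```

For example, `load_and_score("model.bin", "Un exemple de texte en français.")` would return a `(label, probability)` pair once the model file has been downloaded locally under that (assumed) name.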