Commit da1f79c (verified) by Forecast-ing · parent: 16f1dc8

Update README.md

Files changed (1): README.md (+61 −40)
model-index:
  results: []
---

# modernBERT-content-regression

This model is a fine-tuned version of [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.4624
- MSE: 2.4624
- RMSE: 1.5692
- MAE: 1.1822
- R²: 0.3258
- SMAPE: 56.6145
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.479942619764035e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5
 
### Training results

| Training Loss | Epoch | Step | Validation Loss | MSE | RMSE | MAE | R² | SMAPE |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:-------:|:-------:|
| 0.1152 | 1.0 | 124 | 4.0842 | 4.0842 | 2.0209 | 1.2199 | -0.1182 | 49.0235 |
| 1.239 | 2.0 | 248 | 3.8036 | 3.8036 | 1.9503 | 1.2892 | -0.0414 | 52.7754 |
| 27.8256 | 3.0 | 372 | 3.2460 | 3.2460 | 1.8017 | 1.1022 | 0.1113 | 51.7470 |
| 0.0001 | 4.0 | 496 | 2.4134 | 2.4134 | 1.5535 | 1.0811 | 0.3392 | 52.2215 |
| 0.1666 | 5.0 | 620 | 2.4624 | 2.4624 | 1.5692 | 1.1822 | 0.3258 | 56.6145 |
### Framework versions

- Transformers 4.48.0.dev0
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
# ModernBERT Engagement Content Regression

### What is this?

This is an exploration of using ModernBERT for the text regression task of predicting engagement metrics for text content. In this case, we are predicting the clickthrough rate (CTR) of email text content.

We will explore hyperparameter tuning of ModernBERT and how to use it for regression, and compare the results to a benchmark model.
This type of task is difficult; we can recall the quote:

> “Half my advertising is wasted; the trouble is, I don't know which half”
> - John Wanamaker

We are also excluding other relevant factors in this experiment, such as the time of day the email is sent, the day of the week, and the recipient.
 
Links for the project:
- Model: [ModernBERT-Engagement-Content-Regression](https://huggingface.co/Forecast-ing/modernBERT-content-regression)
- Training notebook: [Training Notebook](https://github.com/Forecast-ing/modernbert-content-regression/blob/main/model_training.ipynb)
- Demo: [Demo Space](https://huggingface.co/spaces/Forecast-ing/modernbert-content-regression)

This work is indebted to the work of many community members and blog posts:
- [ModernBERT Announcement](https://huggingface.co/blog/modernbert)
- [Fine-tune classifier with ModernBERT in 2025](https://www.philschmid.de/fine-tune-modern-bert-in-2025)
- [How to set up Trainer for a regression](https://discuss.huggingface.co/t/how-to-set-up-trainer-for-a-regression/12994)
- Additional thanks to the creators of ModernBERT!
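Following the Trainer-for-regression forum thread linked above, the key step is loading the checkpoint with a single-output head (`num_labels=1` and `problem_type="regression"`), which makes the Trainer train with an MSE loss. Below is a minimal configuration sketch, not the notebook's exact code (it downloads the base checkpoint when run); the hyperparameter values are the ones recorded in the training details above.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# A single-logit head with problem_type="regression" is trained with MSELoss.
model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="regression",
)
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

args = TrainingArguments(
    output_dir="modernBERT-content-regression",
    learning_rate=2.479942619764035e-05,  # best value from the hyperparameter search
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
)

# train_ds / eval_ds would be tokenized datasets with float "labels" columns:
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```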

### Our dataset
We use a dataset of 548 emails, where we have the text of the email (`text`) and the CTR we are trying to predict (`labels`).

We look forward to using improvements in ModernBERT to fine-tune models on each user's own email dataset. The variability of email data, as well as the small size of the dataset, pose an interesting regression challenge.
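Concretely, each row pairs the raw email body with its CTR. A toy sketch of the expected shape with a simple hold-out split (the example rows and the 80/20 ratio are illustrative assumptions, not taken from the real dataset):

```python
import random

# Illustrative rows only -- the real dataset has 548 emails with these two columns.
rows = [
    {"text": "Spring sale starts today!", "labels": 2.4},
    {"text": "Your weekly product digest", "labels": 1.1},
    {"text": "Last chance to register for the webinar", "labels": 3.0},
    {"text": "Release notes for v2.1", "labels": 0.7},
    {"text": "A note from our founder", "labels": 1.9},
]

random.seed(42)
random.shuffle(rows)
cut = int(0.8 * len(rows))  # 80/20 split is an assumption
train_rows, eval_rows = rows[:cut], rows[cut:]
print(len(train_rows), len(eval_rows))  # → 4 1
```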
 
### Benchmarking
We start with the CatBoost library as a simple benchmark for text regression. For both the benchmark and the ModernBERT run, we use RMSE as the metric.
We receive the following results:

| Metric | Value |
|--------|--------|
| MSE | 2.5521 |
| RMSE | 1.5975 |
| MAE | 1.1439 |
| R² | 0.3013 |
| SMAPE | 37.6306 |
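For reference, the metrics in these tables can be computed directly from predictions and targets. A dependency-free sketch (the exact SMAPE variant used in the notebook is an assumption):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute the metrics reported in the tables from two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot
    # SMAPE in percent; this variant (2|p-t| / (|t|+|p|)) is an assumption --
    # the notebook may use a slightly different definition.
    smape = 100 / n * sum(
        2 * abs(p - t) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)
    )
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2, "smape": smape}

print(regression_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```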

## Fitting the ModernBERT Model
### Install dependencies and activate venv

```bash
uv sync
source .venv/bin/activate
```

The following values need to be defined in the `.env` file:
- `HUGGINGFACE_TOKEN`
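The notebook presumably reads this token from the environment. A stdlib-only sketch of loading the `.env` file (the project may use python-dotenv instead; the `key=value` line format is an assumption):

```python
import os
from pathlib import Path

# Minimal .env loader sketch: export each key=value line into the environment.
env_file = Path(".env")
if env_file.exists():
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

token = os.environ.get("HUGGINGFACE_TOKEN")  # None when not configured
```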
 
### Run notebook for model fitting

```bash
uv run --with jupyter jupyter lab
```
 
### ModernBERT Model Performance
After running hyperparameter tuning for ModernBERT, we get the following results:

| Metric | Value |
|--------|--------|
| MSE | 2.4624 |
| RMSE | 1.5692 |
| MAE | 1.1822 |
| R² | 0.3258 |
| SMAPE | 56.6145 |

We see improvements in all metrics except for SMAPE. We believe that ModernBERT would scale even better with a larger dataset, as roughly 500 examples is very little data for fine-tuning, and we are thus happy with the performance in this evaluation.
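A worse SMAPE alongside better MSE/RMSE/MAE is plausible because SMAPE weights relative error, so small absolute misses on low-CTR emails dominate it. A toy illustration (the numbers are invented) where the predictions with the lower MSE have the higher SMAPE:

```python
def mse(y, p):
    return sum((b - a) ** 2 for a, b in zip(y, p)) / len(y)

def smape(y, p):
    # Percent SMAPE; this variant (2|p-t| / (|t|+|p|)) is an assumption.
    return 100 / len(y) * sum(2 * abs(b - a) / (abs(a) + abs(b)) for a, b in zip(y, p))

y_true = [0.1, 5.0]   # one low-CTR and one high-CTR email (illustrative)
preds_a = [0.3, 3.0]  # small absolute miss on the low-CTR email
preds_b = [0.1, 2.0]  # exact on the low-CTR email, larger miss elsewhere

print(mse(y_true, preds_a), smape(y_true, preds_a))  # lower MSE, higher SMAPE
print(mse(y_true, preds_b), smape(y_true, preds_b))  # higher MSE, lower SMAPE
```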

### Who are we?
At [Forecast.ing](https://forecast.ing) we are building a platform to help users create more enriching content by automatically researching trends and generating campaign ideas with agentic AI.
We generate the content, and then produce fine-tuned scores of how likely we think that content is to succeed.
## Conclusion
We see that ModernBERT is a powerful model for text regression. We believe that with a larger dataset we would see even better results, and we are excited to see how ModernBERT will be used for text regression in the future.
If interested, I can be contacted at [email protected]