Update README.md
README.md
CHANGED
@@ -12,9 +12,11 @@ base_model:
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
-This model is a personalized reward model for pluralistic alignment and
-We train the PAL-B-Large model on a variant of Reddit TL;DR summary dataset with 10 users who provide the most amount of feedbacks and achieve higher performance compared with vanilla homogeneous reward model.
+This model is a personalized reward model for pluralistic alignment and serves as a demonstration for our [paper](https://pal-alignment.github.io/).
 
+We train the PAL-B-Large model on a variant of Reddit TL;DR summary dataset, incorporating feedback from the 10 most active users.
+
+Our approach outperforms the standard homogeneous reward model, demonstrating improved performance with our proposed Pluralistic Alignment method.
 
 ## Model Details
 