Update README.md
README.md
@@ -96,6 +96,12 @@ The model was pre-trained continuously on a single A10G GPU in an AWS instance f
<br>Thus, it hurts the performance of the Abstractive Summarization task.
<br>This issue is not present in a decoder-only model, as the predicted next tokens are never seen by the model at all.

+Note:
+
+It could be used as an encoder-only model, but this is not advised:
+<br>there are already BERT-based models with better inference time (this model's longer sequence length makes inference slower).
+<br>It could still be used in cases where a longer sequence length is required.
+
#### Authors:

<a href="https://www.linkedin.com/in/bijaya-bhatta-69536018a/">Vijaya Bhatta</a>
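
The context line above about decoder-only models refers to causal (autoregressive) attention masking. A minimal PyTorch sketch, not from this repository, illustrating how position i attends only to positions up to i, so the tokens the model is about to predict stay hidden:

```python
import torch

# Causal mask for a decoder-only model: position i may attend only to
# positions <= i, so future (to-be-predicted) tokens are never visible.
seq_len = 5
causal_mask = torch.ones(seq_len, seq_len).tril().bool()

scores = torch.randn(seq_len, seq_len)                    # toy attention scores
scores = scores.masked_fill(~causal_mask, float("-inf"))  # hide future tokens
weights = torch.softmax(scores, dim=-1)

print(weights)  # upper triangle is all zeros: no peeking at predicted tokens
```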
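The added note says the model could be run as an encoder-only model. A hedged sketch of how that might look with Hugging Face `transformers`, assuming the checkpoint is a standard seq2seq model; `"t5-small"` below is only a placeholder for the actual checkpoint, not this repository's model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint; substitute this repository's model name.
checkpoint = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Seq2seq models in transformers expose their encoder via get_encoder(),
# so the encoder stack can be run on its own to produce embeddings.
encoder = model.get_encoder()
inputs = tokenizer("A document longer than BERT's 512-token limit ...",
                   return_tensors="pt")
embeddings = encoder(**inputs).last_hidden_state
print(embeddings.shape)  # (batch, sequence_length, hidden_size)
```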