Update README.md
README.md
CHANGED
@@ -13,10 +13,10 @@ arxiv: 2502.07272
 
 ## **Important Notice**
 If you are using **GENERator** for sequence generation, please ensure that the length of each input sequence is a multiple of **6**. This can be achieved by either:
-1. Padding the sequence on the left with `'A'` (**left padding**)
-2.
+1. Padding the sequence on the left with `'A'` (**left padding**);
+2. Truncating the sequence from the left (**left truncation**).
 
-This requirement arises because **GENERator** employs a 6-mer tokenizer. If the input sequence length is not a multiple of **6**, the tokenizer will append an
+This requirement arises because **GENERator** employs a 6-mer tokenizer. If the input sequence length is not a multiple of **6**, the tokenizer will append an `'<oov>'` (out-of-vocabulary) token to the end of the token sequence. This can result in uninformative subsequent generations, such as repeated `'AAAAAA'`.
 
 We apologize for any inconvenience this may cause and recommend adhering to the above guidelines to ensure accurate and meaningful generation results.
 
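The updated notice asks for input lengths that are a multiple of 6, via left padding with `'A'` or left truncation. Below is a minimal sketch of both fixes in plain Python, assuming sequences are ordinary strings over A/C/G/T. The names `left_pad`, `left_truncate`, and `naive_kmer_chunks` are hypothetical helpers, not part of GENERator or its tokenizer. The chunker only mimics non-overlapping 6-mer splitting to show where a leftover fragment (the case the README says ends up as `'<oov>'`) would appear.

```python
from __future__ import annotations

# Hypothetical helpers, not part of the GENERator codebase.
K = 6  # k-mer size of GENERator's tokenizer, as stated in the README notice


def left_pad(seq: str, k: int = K, pad_char: str = "A") -> str:
    """Left padding: prepend pad_char until len(seq) is a multiple of k."""
    remainder = len(seq) % k
    if remainder == 0:
        return seq
    return pad_char * (k - remainder) + seq


def left_truncate(seq: str, k: int = K) -> str:
    """Left truncation: drop leading bases until len(seq) is a multiple of k.

    Sequences shorter than k are truncated to the empty string.
    """
    return seq[len(seq) % k:]


def naive_kmer_chunks(seq: str, k: int = K) -> list[str]:
    """Illustration only: split seq into non-overlapping k-mers.

    A final chunk shorter than k marks the leftover that, per the README,
    the real tokenizer would turn into an '<oov>' token.
    """
    return [seq[i:i + k] for i in range(0, len(seq), k)]


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # length 10, not a multiple of 6
    print(naive_kmer_chunks(seq))                 # ['ACGTAC', 'GTAC']
    print(naive_kmer_chunks(left_pad(seq)))       # ['AAACGT', 'ACGTAC']
    print(naive_kmer_chunks(left_truncate(seq)))  # ['ACGTAC']
```

Both options only touch the start of the sequence, so the end, where generation continues, is unchanged. Left padding keeps every original base but adds up to 5 synthetic `'A'`s. Left truncation adds nothing artificial but discards up to 5 leading bases.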