# TinyStack Tokenizer

A byte-level BPE tokenizer trained on the fhswf/tiny-stack dataset.

## Usage
```python
from tokenizers.implementations import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing

# Load the trained vocabulary and merge rules
tokenizer = ByteLevelBPETokenizer("./vocab.json", "./merges.txt")

# Wrap every encoded sequence in <s> ... </s>
# (BertProcessing takes the separator token first, then the start token)
tokenizer._tokenizer.post_processor = BertProcessing(
    ("</s>", tokenizer.token_to_id("</s>")),
    ("<s>", tokenizer.token_to_id("<s>")),
)

# Cap encoded sequences at 512 tokens
tokenizer.enable_truncation(max_length=512)
```
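
Once configured, the tokenizer encodes text directly. A minimal sketch (the sample sentence is illustrative; the exact subword splits depend on the trained merges):

```python
# Encode a sample sentence; the post-processor adds <s>/</s> automatically
output = tokenizer.encode("The quick brown fox.")

print(output.tokens)  # subword strings, wrapped in <s> ... </s>
print(output.ids)     # the corresponding vocabulary ids
```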
Vocab size: 52,000
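
As a quick sanity check, the loaded vocabulary size can be compared against this figure (assuming `vocab.json` ships the full trained vocabulary):

```python
# Should report 52000 for this tokenizer
print(tokenizer.get_vocab_size())
```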