Transformers · Kazakh · text-generation-inference · Inference Endpoints
CCRss committed · Commit e62f600 · Parent: ecd3b83

Update README.md

Files changed (1): README.md (+11 -1)
README.md CHANGED
@@ -1,3 +1,13 @@
+---
+datasets:
+- CCRss/small-chatgpt-paraphrases-kz
+language:
+- kk
+license: mit
+library_name: transformers
+tags:
+- text-generation-inference
+---
 ## A Kazakh Language Tokenizer Based on T5 Model
 
 The "CCRss/tokenizer_kazakh_t5_new" is a specialized tokenizer developed for processing the Kazakh language. It is designed to integrate seamlessly with models based on the T5 (Text-to-Text Transfer Transformer) architecture, a powerful and versatile framework for various natural language processing tasks.
@@ -21,4 +31,4 @@ This tokenizer is ideal for researchers and developers working on NLP applicatio
 Link to Google Colab https://colab.research.google.com/drive/1Pk4lvRQqGJDpqiaS1MnZNYEzHwSf3oNE#scrollTo=tTnLF8Cq9lKM
 ### Acknowledgments
 
-The development of this tokenizer was a collaborative effort, drawing on the expertise of linguists and NLP professionals. We acknowledge the contributions of everyone involved in this project and aim to continuously improve the tokenizer based on user feedback and advances in NLP research.
+The development of this tokenizer was a collaborative effort, drawing on the expertise of linguists and NLP professionals. We acknowledge the contributions of everyone involved in this project and aim to continuously improve the tokenizer based on user feedback and advances in NLP research.
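The metadata block added in this commit registers the card's dataset, language, license, and tags on the Hub. Since the README presents the tokenizer as a drop-in component for T5-family models via the transformers library, a minimal usage sketch follows. The repo id comes from the README itself; the sample Kazakh sentence and printed fields are illustrative, not part of the original card.

```python
# Minimal sketch (assumes the `transformers` library is installed).
# The repo id "CCRss/tokenizer_kazakh_t5_new" is taken from the README;
# the sample sentence is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CCRss/tokenizer_kazakh_t5_new")

text = "Сәлем, әлем!"  # "Hello, world!" in Kazakh
encoding = tokenizer(text)

print(tokenizer.tokenize(text))   # subword pieces produced by the tokenizer
print(encoding["input_ids"])      # corresponding token ids
```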