Transformers · Kazakh · text-generation-inference · Inference Endpoints
CCRss committed · Commit e62f600 · Parent: ecd3b83

Update README.md

Files changed (1): README.md (+11 -1)
README.md CHANGED
@@ -1,3 +1,13 @@
+---
+datasets:
+- CCRss/small-chatgpt-paraphrases-kz
+language:
+- kk
+license: mit
+library_name: transformers
+tags:
+- text-generation-inference
+---
 ## A Kazakh Language Tokenizer Based on T5 Model
 
 The "CCRss/tokenizer_kazakh_t5_new" is a specialized tokenizer developed for processing the Kazakh language. It is designed to integrate seamlessly with models based on the T5 (Text-to-Text Transfer Transformer) architecture, a powerful and versatile framework for various natural language processing tasks.
@@ -21,4 +31,4 @@ This tokenizer is ideal for researchers and developers working on NLP applicatio
 Link to Google Colab https://colab.research.google.com/drive/1Pk4lvRQqGJDpqiaS1MnZNYEzHwSf3oNE#scrollTo=tTnLF8Cq9lKM
 ### Acknowledgments
 
-The development of this tokenizer was a collaborative effort, drawing on the expertise of linguists and NLP professionals. We acknowledge the contributions of everyone involved in this project and aim to continuously improve the tokenizer based on user feedback and advances in NLP research.
+The development of this tokenizer was a collaborative effort, drawing on the expertise of linguists and NLP professionals. We acknowledge the contributions of everyone involved in this project and aim to continuously improve the tokenizer based on user feedback and advances in NLP research.
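The metadata block added in this commit registers the card's dataset, language, license, and tags on the Hub. Since the README presents the tokenizer as a drop-in component for T5-family models via the transformers library, a minimal usage sketch follows. The repo id comes from the README itself; the sample Kazakh sentence and printed fields are illustrative, not part of the original card.

```python
# Minimal sketch (assumes the `transformers` library is installed).
# The repo id "CCRss/tokenizer_kazakh_t5_new" is taken from the README;
# the sample sentence is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CCRss/tokenizer_kazakh_t5_new")

text = "Сәлем, әлем!"  # "Hello, world!" in Kazakh
encoding = tokenizer(text)

print(tokenizer.tokenize(text))   # subword pieces produced by the tokenizer
print(encoding["input_ids"])      # corresponding token ids
```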