Spaces: Update app.py

app.py CHANGED
@@ -15,7 +15,9 @@ DESCRIPTION = """\
 
 This Space demonstrates model [CLEX-7B-Chat-16K](https://huggingface.co/DAMO-NLP-SG/CLEX-7B-Chat-16K), a Llama-2-7B model fine-tuned using our [CLEX](https://arxiv.org/abs/2310.16450) method. Feel free to play with it, or duplicate to run generations without a queue! If you want to run your own service, you can also [deploy the model on Inference Endpoints](https://huggingface.co/inference-endpoints).
 
-The 
+The web demo now supports a maximum input sequence length of 10k tokens (longer inputs may OOM).
+
+Support for PDF input is tentative.
 
 """
 
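The new description lines promise a roughly 10k-token input cap. A minimal sketch of how such a cap can be enforced before generation, assuming a `MAX_INPUT_TOKEN_LENGTH` constant like the one checked later in `generate()`; `truncate_prompt` and the keep-the-tail policy are illustrative assumptions, not the Space's actual code:

```python
from transformers import AutoTokenizer

MAX_INPUT_TOKEN_LENGTH = 10240  # assumed value behind the "10k" in the description

def truncate_prompt(tokenizer, prompt: str) -> str:
    """Trim a prompt to its last MAX_INPUT_TOKEN_LENGTH tokens."""
    input_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    if len(input_ids) > MAX_INPUT_TOKEN_LENGTH:
        # Keep the tail so the most recent context survives truncation.
        input_ids = input_ids[-MAX_INPUT_TOKEN_LENGTH:]
    return tokenizer.decode(input_ids)

# Usage: pass any over-long document text through truncate_prompt
# before handing it to the model.
tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/CLEX-7B-Chat-16K")
```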
@@ -35,6 +37,7 @@ If you find our project useful, hope you can star our repo and cite our paper as
   author = {Chen, Guanzheng and Li, Xin and Meng, Zaiqiao and Liang, Shangsong and Bing, Lidong},
   title = {CLEX: Continuous Length Extrapolation for Large Language Models},
   year = 2023,
+  journal = {arXiv preprint arXiv:2310.16450},
   url = {https://arxiv.org/abs/2310.16450}
 }
 ```
@@ -128,7 +131,7 @@ def generate(
     # for user, assistant in chat_history:
     # conversation.extend([{"role": "user", "content": user}, {"role": "assistant", "content": assistant}])
     # conversation.append({"role": "user", "content": message})
-    print(prompt[500:1000])
+    # print(prompt[500:1000])
     # chat = tokenizer.apply_chat_template(conversation, tokenize=False)
     inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    if len(inputs) > MAX_INPUT_TOKEN_LENGTH:
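The commented-out lines in this hunk are the conversation-building path that the Space currently bypasses in favor of a prebuilt `prompt`. Restored as a helper, they would read roughly as below; `build_prompt` is a hypothetical wrapper, and only the `tokenize=False` argument is attested in the original call:

```python
def build_prompt(tokenizer, chat_history, message):
    # Rebuild the conversation list sketched in the commented-out lines.
    conversation = []
    for user, assistant in chat_history:
        conversation.extend([
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ])
    conversation.append({"role": "user", "content": message})
    # tokenize=False returns the formatted prompt string rather than token ids.
    return tokenizer.apply_chat_template(conversation, tokenize=False)
```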
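One caveat in the surrounding context lines: `len(inputs)` on a Hugging Face `BatchEncoding` counts its keys (`input_ids`, `attention_mask`), not tokens, so the guard as written never fires for long prompts. A sketch of the check as presumably intended; the keep-the-tail truncation is an assumption:

```python
MAX_INPUT_TOKEN_LENGTH = 10240  # assumed, matching the 10k cap in the description

def encode_with_guard(tokenizer, prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    # Measure the sequence dimension; len(inputs) would just be 2 (the dict keys).
    if inputs["input_ids"].shape[1] > MAX_INPUT_TOKEN_LENGTH:
        inputs["input_ids"] = inputs["input_ids"][:, -MAX_INPUT_TOKEN_LENGTH:]
        inputs["attention_mask"] = inputs["attention_mask"][:, -MAX_INPUT_TOKEN_LENGTH:]
    return inputs.to("cuda")
```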