Spaces: Update app.py

app.py CHANGED
@@ -15,7 +15,9 @@ DESCRIPTION = """\
 
 This Space demonstrates model [CLEX-7B-Chat-16K](https://huggingface.co/DAMO-NLP-SG/CLEX-7B-Chat-16K), a Llama-2-7B model fine-tuned using our [CLEX](https://arxiv.org/abs/2310.16450) method. Feel free to play with it, or duplicate to run generations without a queue! If you want to run your own service, you can also [deploy the model on Inference Endpoints](https://huggingface.co/inference-endpoints).
 
-The 
+The web demo now supports a maximum input sequence length of 10k tokens (longer inputs may OOM).
+
+Support for PDF input is tentative.
 
 """
 
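The new description lines promise a roughly 10k-token input cap. A minimal sketch of how such a cap can be enforced before generation, assuming a `MAX_INPUT_TOKEN_LENGTH` constant like the one checked later in `generate()`; `truncate_prompt` and the keep-the-tail policy are illustrative assumptions, not the Space's actual code:

```python
from transformers import AutoTokenizer

MAX_INPUT_TOKEN_LENGTH = 10240  # assumed value behind the "10k" in the description

def truncate_prompt(tokenizer, prompt: str) -> str:
    """Trim a prompt to its last MAX_INPUT_TOKEN_LENGTH tokens."""
    input_ids = tokenizer(prompt, add_special_tokens=False).input_ids
    if len(input_ids) > MAX_INPUT_TOKEN_LENGTH:
        # Keep the tail so the most recent context survives truncation.
        input_ids = input_ids[-MAX_INPUT_TOKEN_LENGTH:]
    return tokenizer.decode(input_ids)

# Usage: pass any over-long document text through truncate_prompt
# before handing it to the model.
tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP-SG/CLEX-7B-Chat-16K")
```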
@@ -35,6 +37,7 @@ If you find our project useful, hope you can star our repo and cite our paper as
   author = {Chen, Guanzheng and Li, Xin and Meng, Zaiqiao and Liang, Shangsong and Bing, Lidong},
   title = {CLEX: Continuous Length Extrapolation for Large Language Models},
   year = 2023,
+  journal = {arXiv preprint arXiv:2310.16450},
   url = {https://arxiv.org/abs/2310.16450}
 }
 ```
@@ -128,7 +131,7 @@ def generate(
     # for user, assistant in chat_history:
     # conversation.extend([{"role": "user", "content": user}, {"role": "assistant", "content": assistant}])
     # conversation.append({"role": "user", "content": message})
-    print(prompt[500:1000])
+    # print(prompt[500:1000])
     # chat = tokenizer.apply_chat_template(conversation, tokenize=False)
     inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    if len(inputs) > MAX_INPUT_TOKEN_LENGTH:
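The commented-out lines in this hunk are the conversation-building path that the Space currently bypasses in favor of a prebuilt `prompt`. Restored as a helper, they would read roughly as below; `build_prompt` is a hypothetical wrapper, and only the `tokenize=False` argument is attested in the original call:

```python
def build_prompt(tokenizer, chat_history, message):
    # Rebuild the conversation list sketched in the commented-out lines.
    conversation = []
    for user, assistant in chat_history:
        conversation.extend([
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ])
    conversation.append({"role": "user", "content": message})
    # tokenize=False returns the formatted prompt string rather than token ids.
    return tokenizer.apply_chat_template(conversation, tokenize=False)
```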
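One caveat in the surrounding context lines: `len(inputs)` on a Hugging Face `BatchEncoding` counts its keys (`input_ids`, `attention_mask`), not tokens, so the guard as written never fires for long prompts. A sketch of the check as presumably intended; the keep-the-tail truncation is an assumption:

```python
MAX_INPUT_TOKEN_LENGTH = 10240  # assumed, matching the 10k cap in the description

def encode_with_guard(tokenizer, prompt: str):
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    # Measure the sequence dimension; len(inputs) would just be 2 (the dict keys).
    if inputs["input_ids"].shape[1] > MAX_INPUT_TOKEN_LENGTH:
        inputs["input_ids"] = inputs["input_ids"][:, -MAX_INPUT_TOKEN_LENGTH:]
        inputs["attention_mask"] = inputs["attention_mask"][:, -MAX_INPUT_TOKEN_LENGTH:]
    return inputs.to("cuda")
```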