Christoph Holthaus committed
Commit 4a70f4c · Parent(s): 42e10de
tasks
README.md
CHANGED
@@ -11,4 +11,17 @@ license: apache-2.0
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
-This is a test ...
+This is a test ...
+
+TASKS:
+- rewrite generation from scratch, or reuse the one from the Mistral space if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion
+- write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference
+- check how much parallel generation is possible, or whether there is only one queue, and set max context etc. accordingly. Maybe live-log free RAM etc. to the interface in a "system health" graph if that is not too resource-hungry on its own ... -> Gradio for display??
+- live-stream the response (see the Mistral space!!)
+- log memory usage to the console? Maybe auto-reboot if memory gets too slim
+- re-add the system prompt? Maybe check how LM Studio sets it up - could be a dropdown, with an option to fix one via env var when only one model is available ...
+- move the model download URL into an env var, with proper error handling
+- chore: clean up .gitignore, Dockerfile, etc.
+- update all deps to one up-to-date version, then PIN them!
+- write a short note on how to clone and run custom 7B models in separate spaces
+- make a PR for popular repos to include this in their README etc.
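For the "rewrite generation" and "live-stream" tasks, a minimal sketch of how consuming llama-cpp-python's streamed chat completion could look. `create_chat_completion(stream=True)` yields OpenAI-style chunks; since no GGUF model can be loaded here, the demo feeds the consumer hand-written chunks in that documented shape, and `run_real_model` only illustrates the real call.

```python
# Sketch for the "rewrite generation" / "live stream response" tasks.
# llama-cpp-python's create_chat_completion(stream=True) yields OpenAI-style
# chunks; stream_text() only assumes that chunk shape.
from typing import Iterable, Iterator


def stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Extract the incremental text pieces from streamed chat-completion chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:  # role-only and finish chunks carry no content
            yield piece


def run_real_model(model_path: str, prompt: str) -> Iterator[str]:
    """How the real call would look (not executed here: needs a GGUF model)."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=2048)
    stream = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return stream_text(stream)


# Demo with fake chunks shaped like the real stream:
fake = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": " world"}}]},
    {"choices": [{"delta": {}}]},  # finish chunk
]
print("".join(stream_text(fake)))  # -> Hello world
```

In a Gradio UI the same generator can be yielded from piece by piece, which gives the token-by-token streaming effect the Mistral space shows.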
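For the "log memory usage to console" task, a stdlib-only sketch. It assumes a Linux host (as on Spaces): `resource.getrusage` is POSIX-only and `ru_maxrss` is reported in KiB on Linux; the `MAX_RSS_MIB` limit is a hypothetical number, not anything the Space defines.

```python
# Sketch for "log memory usage to console": stdlib-only peak-RSS logging.
# resource is POSIX-only; on Linux ru_maxrss is in KiB.
import logging
import resource

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

MAX_RSS_MIB = 14_000  # hypothetical soft limit, below the Space's total RAM


def peak_rss_mib() -> float:
    """Peak resident set size of this process, in MiB (Linux semantics)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def log_memory() -> bool:
    """Log current peak RSS; return True if we are over the soft limit."""
    used = peak_rss_mib()
    logging.info("peak RSS: %.1f MiB", used)
    return used > MAX_RSS_MIB  # the caller could trigger the "auto reboot" here


log_memory()
```

Calling `log_memory()` periodically (e.g. from the generation loop) would cover both the console logging and the auto-reboot trigger mentioned above.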
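For the "move the model download URL into an env var" task, a sketch of the error handling. The variable name `MODEL_URL` and the `ConfigError` class are assumptions for illustration, not names the Space defines yet.

```python
# Sketch for "move model download URL into an env var with proper error handling".
# MODEL_URL is a hypothetical variable name.
import os
from typing import Mapping


class ConfigError(RuntimeError):
    """Raised when required configuration is missing or malformed."""


def get_model_url(env: Mapping[str, str] = os.environ) -> str:
    """Read and validate the model download URL from the environment."""
    url = env.get("MODEL_URL", "").strip()
    if not url:
        raise ConfigError("MODEL_URL is not set; expected a GGUF download URL")
    if not url.startswith(("http://", "https://")):
        raise ConfigError(f"MODEL_URL does not look like a URL: {url!r}")
    return url
```

Failing fast with a clear message at startup is friendlier than a download stack trace later; the env var can be set per-Space, which also enables the "clone and run custom 7B models in separate spaces" task.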