Christoph Holthaus
committed on
Commit d65f135 · 1 Parent(s): 4a70f4c
more readme ideas
README.md CHANGED
```diff
@@ -14,12 +14,15 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 This is a test ...
 
 TASKS:
--
+- for fast debug: add a debug mode that lets me run direct CLI commands? -> never for prod!
+- prod-harden Docker with proper users etc., OR note that this is only a dev build intended for messing with (no read-only filesystem etc.)
+- rewrite generation from scratch, or reuse the Mistral Space's version if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py (sketch below)
 - write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference
--
--
--
--
+- test multimodal with llama?
+- can I use swap in Docker to maximize usable memory?
+- proper token handling - make it a real chat (if not handled automatically by the chat-completion interface ...)
+- maybe run a webserver locally and have Gradio use it only as the backend? (better for async but maybe harder to control - just an idea; sketch below)
+- check how much parallel generation is possible, or whether there is only one queue, and set limits accordingly
 - move the model download URL into an env var with proper error handling (sketch below)
 - chore: clean up ignore files, Dockerfile etc.
 - update all deps to one up-to-date version, then PIN them!
```
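For the generation rewrite flagged above, llama-cpp-python's built-in chat API would do most of the work. A minimal sketch, assuming a local GGUF file named model.gguf (the actual file in this Space may differ):

```python
from llama_cpp import Llama

# model path is an assumption; in this Space it would come from the
# env-var download logic sketched further below
llm = Llama(model_path="model.gguf", n_ctx=2048)

def chat(history):
    """history: OpenAI-style list of {"role": ..., "content": ...} dicts, oldest first."""
    result = llm.create_chat_completion(messages=history)
    return result["choices"][0]["message"]["content"]

print(chat([{"role": "user", "content": "Say hello in one sentence."}]))
```

Because create_chat_completion applies the model's chat template itself, this would likely also cover most of the "proper token handling - make it a real chat" item.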
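For the local-webserver idea, one option is llama-cpp-python's bundled OpenAI-compatible server, started separately (e.g. `python -m llama_cpp.server --model model.gguf`, listening on port 8000 by default); Gradio would then only need a thin HTTP client. A sketch under those assumptions:

```python
import requests

# default endpoint of `python -m llama_cpp.server`; adjust host/port
# if the container maps them differently
API_URL = "http://localhost:8000/v1/chat/completions"

def backend_chat(messages):
    """messages: OpenAI-style list of {"role": ..., "content": ...} dicts."""
    resp = requests.post(API_URL, json={"messages": messages}, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

This setup also bears on the parallel-generation question above: a single model instance behind the server effectively handles one generation at a time.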
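For moving the model download into an env var with error handling, a minimal sketch; MODEL_DOWNLOAD_URL and MODEL_PATH are hypothetical names chosen here, not ones the Space already defines:

```python
import os
import sys
import urllib.request

# hypothetical variable names - use whatever the Dockerfile actually exports
MODEL_URL = os.environ.get("MODEL_DOWNLOAD_URL")
MODEL_PATH = os.environ.get("MODEL_PATH", "model.gguf")

def ensure_model() -> str:
    """Download the model once at startup and fail loudly on misconfiguration."""
    if not MODEL_URL:
        sys.exit("MODEL_DOWNLOAD_URL is not set - aborting instead of crashing mid-request.")
    if not os.path.exists(MODEL_PATH):
        try:
            urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
        except OSError as err:
            sys.exit(f"Could not download model from {MODEL_URL}: {err}")
    return MODEL_PATH
```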