title: Test | |
emoji: 🔥 | |
colorFrom: red | |
colorTo: yellow | |
sdk: gradio | |
pinned: false | |
license: mit | |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |
This is a test ... | |
TASKS: | |
- rewrite generation from scratch or use the one of mistral space if possible. alternative use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py | |
- write IN LARGE LETTERS that this is not the original model but a quantified one that is able to run on free CPU Inference | |
- test multimodal with llama? | |
- proper token handling - make it a real chat (if not auto by chatcompletion interface ...) | |
- check ho wmuch parallel generation is possible or only one que and set | |
- move model to DL into env-var with proper error handling | |
- chore: cleanup ignore, etc. | |
- update all deps to one up to date version, then PIN them! | |
- make a short info on how to clone and run custom 7b models in separate spaces | |
- make a pr for popular repos to include in their readme etc. |