---
title: Test
emoji: 🔥
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: apache-2.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

This is a test ...

TASKS:

- for fast debugging: add a debug mode that lets me run direct CLI commands? (see the debug-runner sketch after this list) -> never for prod!
- harden Docker for prod with proper users etc., OR state clearly that this is only a dev build intended for experimentation (no read-only filesystem etc.)
- rewrite generation from scratch, or reuse the one from the Mistral space if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py (see the chat-completion sketch after this list)
- write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference
- test multimodal with llama?
- can I use swap in Docker to maximize usable memory?
- proper token handling - make it a real chat (if not handled automatically by the chat-completion interface ...; see the history-trimming sketch after this list)
- maybe run a webserver locally and have Gradio use it only as a backend? (better for async but maybe harder to control - just an idea; see the server-backend sketch after this list)
- check how much parallel generation is possible, or whether there is only a single queue, and configure it accordingly
- move the model download URL into an env var with proper error handling (see the download sketch after this list)
- chore: clean up the ignore files, Dockerfile, etc.
- update all deps to current versions, then PIN them!
- write a short guide on how to clone and run custom 7B models in separate Spaces
- open PRs for popular repos so they include this in their READMEs etc.
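
A minimal sketch of the debug-runner idea: gate everything behind an env var (`DEBUG_MODE` is a hypothetical name, nothing the Space defines yet) and refuse to execute when it is unset. This runs arbitrary shell commands inside the container, which is exactly why it must never ship in prod.

```python
import os
import subprocess

# DEBUG_MODE is a hypothetical env var name - NEVER set it in prod,
# since this function executes arbitrary shell commands.
DEBUG_MODE = os.environ.get("DEBUG_MODE", "0") == "1"

def run_debug_command(cmd: str) -> str:
    if not DEBUG_MODE:
        return "debug mode is disabled"
    # shell=True is the dangerous part that makes this dev-only
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr
```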
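
A sketch of generation via the linked llama-cpp-python chat-completion API; the model path and context size are assumptions and should point at whatever GGUF file this Space actually downloads.

```python
from llama_cpp import Llama

# model_path is an assumption - point it at the GGUF file the Space downloads
llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(response["choices"][0]["message"]["content"])
```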
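
For the "real chat" item, a sketch of keeping multi-turn history and dropping the oldest turns once a rough token count exceeds the context window. It reuses the `llm` instance from the sketch above; the 2048/256 budget numbers are assumptions.

```python
def build_messages(history, user_msg, llm, n_ctx=2048, reserve=256):
    """Append the new user turn, then trim the oldest turns to fit the context."""
    messages = history + [{"role": "user", "content": user_msg}]
    while len(messages) > 1:
        text = "".join(m["content"] for m in messages)
        # llama-cpp-python tokenizes bytes; a rough count is enough for trimming
        if len(llm.tokenize(text.encode("utf-8"))) <= n_ctx - reserve:
            break
        messages.pop(0)  # drop the oldest turn
    return messages
```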
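
The webserver-as-backend idea could look like this: llama-cpp-python ships an OpenAI-compatible server (`python -m llama_cpp.server --model <gguf>`, default port 8000), and Gradio only POSTs to it over HTTP. A sketch, not tested in this Space; the queue call also bounds how many requests pile up, which touches the parallel-generation item above.

```python
import requests
import gradio as gr

# Assumes `python -m llama_cpp.server --model <gguf>` runs in the same container
API_URL = "http://127.0.0.1:8000/v1/chat/completions"

def chat_fn(message, history):
    resp = requests.post(
        API_URL,
        json={"messages": [{"role": "user", "content": message}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

demo = gr.ChatInterface(chat_fn)
demo.queue(max_size=8).launch()  # max_size caps waiting requests
```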
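
A sketch of the env-var download item; `MODEL_URL` and the destination path are hypothetical names, to be replaced by whatever the Space settles on.

```python
import os
import sys
import urllib.request

def download_model(dest="./models/model.gguf"):
    """Download the GGUF model from the URL in MODEL_URL, failing loudly."""
    url = os.environ.get("MODEL_URL")  # MODEL_URL is a hypothetical var name
    if not url:
        sys.exit("MODEL_URL is not set - configure it in the Space settings")
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        urllib.request.urlretrieve(url, dest)
    except OSError as exc:  # URLError/HTTPError are OSError subclasses
        sys.exit(f"failed to download model from {url}: {exc}")
    return dest
```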