---
title: Test
emoji: 🔥
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: apache-2.0
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
This is a test ...
TASKS:
- rewrite the generation code from scratch, or reuse the one from the Mistral Space if possible; alternatively, use https://github.com/abetlen/llama-cpp-python#chat-completion (see the chat-completion sketch below)
- state IN LARGE LETTERS that this is not the original model but a quantized one that is able to run on free CPU inference (banner included in the health sketch below)
- check how much parallel generation is possible, or whether there is only one queue, and set max context etc. accordingly. Maybe live-log free RAM etc. to the interface as a "system health" graph, if that isn't too resource-hungry on its own -> Gradio for display?? (see the health sketch below)
- live-stream the response (see the Mistral Space!! and the streaming sketch below)
- log memory usage to the console? Maybe auto-reboot if free RAM gets too low (see the watchdog sketch below)
- re-add a system prompt? Maybe check how LM Studio sets one up; could be a dropdown, with an option to fix one via env var when only one model is available ... (see the system-prompt sketch below)
- move the model download URL into an env var, with proper error handling (see the download sketch below)
- chore: clean up the ignore files, Dockerfile, etc.
- update all deps to current versions, then PIN them! (see the requirements sketch below)
- write a short guide on how to clone this Space and run custom 7B models in separate Spaces
- open PRs against popular model repos so they can mention this Space in their READMEs etc.
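
A minimal chat-completion sketch with llama-cpp-python, following the README linked in the first task; the model path and context size are placeholders, not this Space's actual configuration:

```python
from llama_cpp import Llama

# Placeholder model path; in this Space it would come from the download step below.
llm = Llama(
    model_path="./model.gguf",
    n_ctx=2048,  # max context window; tune to the free-tier RAM budget
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response["choices"][0]["message"]["content"])
```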
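
For the live-streaming task, a sketch of the generator pattern the Mistral demo Spaces use: yield the growing reply so Gradio repaints it token by token. `llm` is the instance from the sketch above; chat history is ignored here to keep it short.

```python
import gradio as gr

def stream_reply(message, history):
    # History is dropped in this sketch; a real handler would fold it into `messages`.
    messages = [{"role": "user", "content": message}]
    partial = ""
    for chunk in llm.create_chat_completion(messages=messages, stream=True):
        delta = chunk["choices"][0]["delta"]
        partial += delta.get("content", "")
        yield partial  # Gradio re-renders the accumulated text on each yield

gr.ChatInterface(stream_reply).launch()
```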
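
A rough sketch covering the quantization banner and the "system health" readout, assuming psutil is available and that the pinned Gradio version supports polling a component value with `every=` (worth verifying before relying on it):

```python
import gradio as gr
import psutil

def ram_status() -> str:
    mem = psutil.virtual_memory()
    return f"RAM {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB ({mem.percent}%)"

with gr.Blocks() as demo:
    # The IN-LARGE-LETTERS disclaimer from the task list above.
    gr.Markdown(
        "# ⚠️ NOT THE ORIGINAL MODEL\n"
        "This Space serves a **quantized** build sized to run on free CPU inference."
    )
    gr.Textbox(ram_status, every=2, label="System health")

demo.launch()
```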
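
For the memory-logging / auto-reboot item, one possible sketch: a daemon thread that prints available RAM and hard-exits when it drops below a threshold, assuming the Space's container is restarted after the process dies (worth confirming for the docker SDK). Threshold and interval are guesses:

```python
import os
import threading
import time

import psutil

MIN_FREE_BYTES = 300 * 2**20  # ~300 MiB; tune to the model's working set

def watchdog(interval: float = 10.0) -> None:
    while True:
        free = psutil.virtual_memory().available
        print(f"[health] available RAM: {free / 2**20:.0f} MiB", flush=True)
        if free < MIN_FREE_BYTES:
            print("[health] RAM critically low, exiting to trigger a restart", flush=True)
            os._exit(1)  # hard exit: sys.exit() would only kill this thread
        time.sleep(interval)

threading.Thread(target=watchdog, daemon=True).start()
```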
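
A sketch of the system-prompt idea: a dropdown of presets, overridden by an env var when the Space serves a single fixed model. The `SYSTEM_PROMPT` variable name and the preset texts are made up for illustration:

```python
import os

import gradio as gr

PRESETS = {
    "Default": "You are a helpful assistant.",
    "Concise": "Answer as briefly as possible.",
}
FIXED_PROMPT = os.environ.get("SYSTEM_PROMPT")  # hypothetical env var

with gr.Blocks() as demo:
    if FIXED_PROMPT:
        # Single-model deployment: prompt is fixed, no UI control shown.
        prompt = gr.State(FIXED_PROMPT)
    else:
        prompt = gr.Dropdown(list(PRESETS), value="Default", label="System prompt")
    # The chat handler would map a dropdown choice through PRESETS before
    # prepending it as the "system" message.

demo.launch()
```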
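
For the env-var model download, a sketch that fails fast with a readable error instead of crashing mid-startup; `MODEL_URL` is a hypothetical variable name:

```python
import os
import sys
import urllib.request

MODEL_URL = os.environ.get("MODEL_URL")  # hypothetical env var name
MODEL_PATH = "./model.gguf"

if not MODEL_URL:
    sys.exit("MODEL_URL is not set; point it at a GGUF file to serve.")

if not os.path.exists(MODEL_PATH):
    try:
        print(f"Downloading model from {MODEL_URL} ...", flush=True)
        urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
    except OSError as err:  # URLError subclasses OSError
        sys.exit(f"Model download failed: {err}")
```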
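
For the dependency task, the usual pattern is to upgrade once, then freeze exact versions into requirements.txt. The pins below are illustrative only; regenerate them from `pip freeze` after upgrading:

```
llama-cpp-python==0.2.90
gradio==4.44.0
psutil==5.9.8
```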