Spaces:

choltha
/

free-CPU-inference-for-testing

Paused

Christoph Holthaus

switch over to gradio "native"

b8c846d over 1 year ago

1.08 kB

	---
	title: Test
	emoji: 🔥
	colorFrom: red
	colorTo: yellow
	sdk: gradio
	pinned: false
	license: mit
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference


	This is a test ...

	TASKS:
	- rewrite generation from scratch or use the one of mistral space if possible. alternative use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py
	- write IN LARGE LETTERS that this is not the original model but a quantified one that is able to run on free CPU Inference
	- test multimodal with llama?
	- proper token handling - make it a real chat (if not auto by chatcompletion interface ...)
	- check ho wmuch parallel generation is possible or only one que and set
	- move model to DL into env-var with proper error handling
	- chore: cleanup ignore, etc.
	- update all deps to one up to date version, then PIN them!
	- make a short info on how to clone and run custom 7b models in separate spaces
	- make a pr for popular repos to include in their readme etc.