Christoph Holthaus
tasks
4a70f4c
|
raw
history blame
1.33 kB
metadata
title: Test
emoji: 🔥
colorFrom: indigo
colorTo: yellow
sdk: docker
pinned: false
license: apache-2.0

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

This is a test ...

TASKS:

  • rewrite generation from scratch or use the one of mistral space if possible. alternative use https://github.com/abetlen/llama-cpp-python#chat-completion
  • write IN LARGE LETTERS that this is not the original model but a quantified one that is able to run on free CPU Inference
  • check ho wmuch parallel generation is possible or only one que and set max context etc accordingly. Maybe live-log ram free etc to the interface on "system health" graph if not too resource hungry on its own ... -> Gradio for display??
  • live stream response (see mistral space!!)
  • log memory usage to console? Maybe auto reboot if too slim
  • readd system prompt? maybe checkout how to setup from lm studio - could be a dropdown with an option to set one fix also via env var when only one model available ...
  • move model to DL into env-var with proper error handling
  • chore: cleanup ignore, dockerfile etc.
  • update all deps to one up to date version, then PIN them!
  • make a short info on how to clone and run custom 7b models in separate spaces
  • make a pr for popular repos to include in their readme etc.