|
# AAC Context-Aware Demo: To-Do Document |
|
|
|
## Goal |
|
|
|
Create a proof-of-concept, offline-capable RAG (Retrieval-Augmented Generation) system for AAC (Augmentative and Alternative Communication) users with ALS that:
|
|
|
* Uses a lightweight knowledge graph (JSON) |
|
* Supports utterance suggestion and correction |
|
* Uses local/offline LLMs (e.g., Gemma, Flan-T5) |
|
* Includes a semantic retriever to match context (e.g. conversation partner, topics) |
|
* Provides a Gradio-based UI for deployment on HuggingFace |
|
|
|
--- |
|
|
|
## Phase 1: Environment Setup |
|
|
|
* [ ] Install Gradio, Transformers, Sentence-Transformers |
|
* [ ] Choose and install inference backends: |
|
|
|
  * [ ] `google/flan-t5-base` (via HuggingFace Transformers)

  * [ ] Gemma 2B via Ollama or Transformers (check support for offline use)
|
* [ ] Sentence similarity model (`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` or similar) |
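The dependencies above could be tracked in a `requirements.txt`; this is an illustrative sketch (unpinned, and `torch` is assumed as the backend for Transformers), not a tested set of versions:

```
gradio
transformers
sentence-transformers
torch
```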
|
|
|
--- |
|
|
|
## Phase 2: Knowledge Graph |
|
|
|
* [ ] Create example `social_graph.json` (people, topics, relationships) |
|
* [ ] Define function to extract relevant context given a selected person |
|
|
|
  * Name, relationship, typical topics, frequency
|
* [ ] Format for prompt injection: inline context for LLM use |
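The three items above can be sketched together. The graph schema here is an assumption (field names like `relationship`, `topics`, `frequency` mirror this list but are not a fixed spec), and `extract_context` is one possible way to flatten a node into inline text for prompt injection:

```python
import json

# Illustrative social_graph.json content; the real file should stay
# hand-editable so caregivers can extend it later.
EXAMPLE_GRAPH = {
    "people": {
        "Bob": {
            "relationship": "brother",
            "topics": ["football", "gardening"],
            "frequency": "weekly",
        },
        "Alice": {
            "relationship": "carer",
            "topics": ["medication", "appointments"],
            "frequency": "daily",
        },
    }
}

def extract_context(graph: dict, person: str) -> str:
    """Flatten one person's node into an inline context string for the LLM."""
    entry = graph["people"].get(person)
    if entry is None:
        return ""
    return (
        f"{person} is the user's {entry['relationship']}; "
        f"they usually talk about {', '.join(entry['topics'])} "
        f"({entry['frequency']})."
    )

if __name__ == "__main__":
    # Round-trip through JSON to confirm the structure is serialisable.
    graph = json.loads(json.dumps(EXAMPLE_GRAPH))
    print(extract_context(graph, "Bob"))
```

Keeping the context as a single sentence keeps the prompt short, which matters for small local models like Flan-T5.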
|
|
|
--- |
|
|
|
## Phase 3: Semantic Retriever |
|
|
|
* [ ] Load sentence-transformer model |
|
* [ ] Create index from the social graph topics/descriptions |
|
* [ ] Match transcript to closest node(s) in the graph |
|
* [ ] Retrieve context for prompt generation |
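The retriever steps above can be sketched as index/match/retrieve. To keep this sketch self-contained, a bag-of-words `embed()` stands in for `SentenceTransformer.encode()`; in the real demo only that function would change, while the indexing and matching logic stays the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: word counts. Replace with a
    # sentence-transformers model's encode() in the real system.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(graph: dict) -> dict:
    """Map each person to the embedding of their topic description."""
    return {
        name: embed(" ".join(entry["topics"]))
        for name, entry in graph["people"].items()
    }

def match(transcript: str, index: dict, top_k: int = 1) -> list:
    """Return the top-k graph nodes closest to the transcript."""
    q = embed(transcript)
    ranked = sorted(index.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

if __name__ == "__main__":
    graph = {"people": {
        "Bob": {"topics": ["football", "gardening"]},
        "Alice": {"topics": ["medication", "appointments"]},
    }}
    print(match("did you watch the football", build_index(graph)))
```

The matched node names can then be fed back through the Phase 2 context extractor to build the prompt.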
|
|
|
--- |
|
|
|
## Phase 4: Gradio UI |
|
|
|
* [ ] Simple interface: |
|
|
|
  * Dropdown: Select "Who is speaking?" (Bob, Alice, etc.)

  * Record Button: Capture audio input

  * Text area: Show transcript

  * Toggle tabs:

    * [ ] "Suggest Utterance"

    * [ ] "Correct Message"

  * Output: Generated message
|
* [ ] Implement Whisper transcription (use `whisper`, `faster-whisper`, or `whisper.cpp`) |
|
* [ ] Pass the transcript plus retrieved context to the LLM
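The last step, combining transcript and retrieved context, can be sketched as a prompt builder keyed on the two tabs. The template wording here is an assumption and would need tuning per model (Flan-T5 and Gemma respond to instructions differently):

```python
def build_prompt(mode: str, transcript: str, context: str) -> str:
    """Assemble the LLM prompt for the 'suggest' or 'correct' tab.

    Template text is illustrative, not a tested prompt.
    """
    if mode == "suggest":
        task = "Suggest a short utterance the user could say next."
    elif mode == "correct":
        task = "Correct the user's message, preserving their intent."
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"Context: {context}\nTranscript: {transcript}\n{task}"
```

In the Gradio app, the selected tab would supply `mode`, Whisper the `transcript`, and the retriever the `context`.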
|
|
|
--- |
|
|
|
## Phase 5: Model Comparison |
|
|
|
* [ ] Test both Flan-T5 and Gemma: |
|
|
|
  * [ ] Evaluate speed/quality tradeoffs

  * [ ] Compare correction accuracy and context-specific generation
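For the speed side of the comparison, a small harness that times any `generate()` callable keeps the measurement identical across both models. This is a sketch; model loading is assumed to happen elsewhere, and the stub below only illustrates the calling convention:

```python
import time

def benchmark(generate, prompts):
    """Run generate() over prompts; return (outputs, seconds per prompt)."""
    start = time.perf_counter()
    outputs = [generate(p) for p in prompts]
    elapsed = time.perf_counter() - start
    return outputs, elapsed / max(len(prompts), 1)

if __name__ == "__main__":
    # Stub in place of a real Flan-T5 or Gemma pipeline.
    stub = lambda p: p.upper()
    outs, per_prompt = benchmark(stub, ["hello", "world"])
    print(outs, per_prompt)
```

Quality (correction accuracy, context fit) still needs human judgment on a shared prompt set; only the timing is automated here.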
|
|
|
--- |
|
|
|
## Optional Phase 6: HuggingFace Deployment |
|
|
|
* [ ] Clean up the UI and remove dependencies that require a GPU
|
* [ ] Upload Gradio demo to HuggingFace Spaces |
|
* [ ] Add documentation and example graphs/transcripts |
|
|
|
--- |
|
|
|
## Notes |
|
|
|
* Keep user privacy and safety in mind (prefer offline Whisper; avoid cloud transcription whenever a local model is available)
|
* Keep JSON editable for later expansion (add sessions, emotional tone, etc.) |
|
* Option to cache LLM suggestions for fast recall |
|
|
|
--- |
|
|
|
## Future Features (Post-Proof of Concept) |
|
|
|
* Add visualisation of social graph (D3 or static SVG) |
|
* Add editable profile page for caregivers |
|
* Add chat history / rolling transcript viewer |
|
* Add emotion/sentiment detection for tone-aware suggestions |
|
|