|
# AAC Context-Aware Demo: To-Do Document |
|
|
|
## Goal |
|
|
|
Create a proof-of-concept, offline-capable RAG (Retrieval-Augmented Generation) system for AAC (Augmentative and Alternative Communication) users with ALS that:
|
|
|
* Uses a lightweight knowledge graph (JSON) |
|
* Supports utterance suggestion and correction |
|
* Uses local/offline LLMs (e.g., Gemma, Flan-T5) |
|
* Includes a semantic retriever to match context (e.g. conversation partner, topics) |
|
* Provides a Gradio-based UI for deployment on HuggingFace |
|
|
|
--- |
|
|
|
## Phase 1: Environment Setup |
|
|
|
* [ ] Install Gradio, Transformers, Sentence-Transformers |
|
* [ ] Choose and install inference backends: |
|
|
|
  * [ ] `google/flan-t5-base` (via HuggingFace Transformers)

  * [ ] Gemma 2B via Ollama or Transformers (check support for offline use)
|
* [ ] Sentence similarity model (`sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` or similar) |
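The dependencies above could be tracked in a `requirements.txt`; this is an illustrative sketch (unpinned, and `torch` is assumed as the backend for Transformers), not a tested set of versions:

```
gradio
transformers
sentence-transformers
torch
```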
|
|
|
--- |
|
|
|
## Phase 2: Knowledge Graph |
|
|
|
* [ ] Create example `social_graph.json` (people, topics, relationships) |
|
* [ ] Define function to extract relevant context given a selected person |
|
|
|
  * Name, relationship, typical topics, frequency
|
* [ ] Format for prompt injection: inline context for LLM use |
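The three items above can be sketched together. The graph schema here is an assumption (field names like `relationship`, `topics`, `frequency` mirror this list but are not a fixed spec), and `extract_context` is one possible way to flatten a node into inline text for prompt injection:

```python
import json

# Illustrative social_graph.json content; the real file should stay
# hand-editable so caregivers can extend it later.
EXAMPLE_GRAPH = {
    "people": {
        "Bob": {
            "relationship": "brother",
            "topics": ["football", "gardening"],
            "frequency": "weekly",
        },
        "Alice": {
            "relationship": "carer",
            "topics": ["medication", "appointments"],
            "frequency": "daily",
        },
    }
}

def extract_context(graph: dict, person: str) -> str:
    """Flatten one person's node into an inline context string for the LLM."""
    entry = graph["people"].get(person)
    if entry is None:
        return ""
    return (
        f"{person} is the user's {entry['relationship']}; "
        f"they usually talk about {', '.join(entry['topics'])} "
        f"({entry['frequency']})."
    )

if __name__ == "__main__":
    # Round-trip through JSON to confirm the structure is serialisable.
    graph = json.loads(json.dumps(EXAMPLE_GRAPH))
    print(extract_context(graph, "Bob"))
```

Keeping the context as a single sentence keeps the prompt short, which matters for small local models like Flan-T5.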
|
|
|
--- |
|
|
|
## Phase 3: Semantic Retriever |
|
|
|
* [ ] Load sentence-transformer model |
|
* [ ] Create index from the social graph topics/descriptions |
|
* [ ] Match transcript to closest node(s) in the graph |
|
* [ ] Retrieve context for prompt generation |
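The retriever steps above can be sketched as index/match/retrieve. To keep this sketch self-contained, a bag-of-words `embed()` stands in for `SentenceTransformer.encode()`; in the real demo only that function would change, while the indexing and matching logic stays the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: word counts. Replace with a
    # sentence-transformers model's encode() in the real system.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_index(graph: dict) -> dict:
    """Map each person to the embedding of their topic description."""
    return {
        name: embed(" ".join(entry["topics"]))
        for name, entry in graph["people"].items()
    }

def match(transcript: str, index: dict, top_k: int = 1) -> list:
    """Return the top-k graph nodes closest to the transcript."""
    q = embed(transcript)
    ranked = sorted(index.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

if __name__ == "__main__":
    graph = {"people": {
        "Bob": {"topics": ["football", "gardening"]},
        "Alice": {"topics": ["medication", "appointments"]},
    }}
    print(match("did you watch the football", build_index(graph)))
```

The matched node names can then be fed back through the Phase 2 context extractor to build the prompt.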
|
|
|
--- |
|
|
|
## Phase 4: Gradio UI |
|
|
|
* [ ] Simple interface: |
|
|
|
  * Dropdown: Select "Who is speaking?" (Bob, Alice, etc.)

  * Record Button: Capture audio input

  * Text area: Show transcript

  * Toggle tabs:

    * [ ] "Suggest Utterance"

    * [ ] "Correct Message"

  * Output: Generated message
|
* [ ] Implement Whisper transcription (use `whisper`, `faster-whisper`, or `whisper.cpp`) |
|
* [ ] Pass the transcript plus retrieved context to the LLM
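The last step, combining transcript and retrieved context, can be sketched as a prompt builder keyed on the two tabs. The template wording here is an assumption and would need tuning per model (Flan-T5 and Gemma respond to instructions differently):

```python
def build_prompt(mode: str, transcript: str, context: str) -> str:
    """Assemble the LLM prompt for the 'suggest' or 'correct' tab.

    Template text is illustrative, not a tested prompt.
    """
    if mode == "suggest":
        task = "Suggest a short utterance the user could say next."
    elif mode == "correct":
        task = "Correct the user's message, preserving their intent."
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"Context: {context}\nTranscript: {transcript}\n{task}"
```

In the Gradio app, the selected tab would supply `mode`, Whisper the `transcript`, and the retriever the `context`.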
|
|
|
--- |
|
|
|
## Phase 5: Model Comparison |
|
|
|
* [ ] Test both Flan-T5 and Gemma: |
|
|
|
  * [ ] Evaluate speed/quality tradeoffs

  * [ ] Compare correction accuracy and context-specific generation
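For the speed side of the comparison, a small harness that times any `generate()` callable keeps the measurement identical across both models. This is a sketch; model loading is assumed to happen elsewhere, and the stub below only illustrates the calling convention:

```python
import time

def benchmark(generate, prompts):
    """Run generate() over prompts; return (outputs, seconds per prompt)."""
    start = time.perf_counter()
    outputs = [generate(p) for p in prompts]
    elapsed = time.perf_counter() - start
    return outputs, elapsed / max(len(prompts), 1)

if __name__ == "__main__":
    # Stub in place of a real Flan-T5 or Gemma pipeline.
    stub = lambda p: p.upper()
    outs, per_prompt = benchmark(stub, ["hello", "world"])
    print(outs, per_prompt)
```

Quality (correction accuracy, context fit) still needs human judgment on a shared prompt set; only the timing is automated here.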
|
|
|
--- |
|
|
|
## Optional Phase 6: HuggingFace Deployment |
|
|
|
* [ ] Clean up the UI and remove dependencies that require a GPU
|
* [ ] Upload Gradio demo to HuggingFace Spaces |
|
* [ ] Add documentation and example graphs/transcripts |
|
|
|
--- |
|
|
|
## Notes |
|
|
|
* Keep user privacy and safety in mind (prefer offline Whisper; avoid cloud transcription whenever a local model is available)
|
* Keep JSON editable for later expansion (add sessions, emotional tone, etc.) |
|
* Option to cache LLM suggestions for fast recall |
|
|
|
--- |
|
|
|
## Future Features (Post-Proof of Concept) |
|
|
|
* Add visualisation of social graph (D3 or static SVG) |
|
* Add editable profile page for caregivers |
|
* Add chat history / rolling transcript viewer |
|
* Add emotion/sentiment detection for tone-aware suggestions |
|
|