---
license: mit
language:
- en
tags:
- sentence-transformer
- embeddings
- mental-health
- intent-classification
pipeline_tag: feature-extraction
base_model: sentence-transformers/all-MiniLM-L6-v2
---

# Intent Encoder (MindPadi)

The `intent_encoder` is a Sentence Transformer model that the MindPadi mental health assistant uses to **encode user messages into dense embeddings**. These embeddings support intent classification, similarity search, and memory recall workflows, so the MindPadi features that interpret user input build on this model.

## 🧠 Model Overview

- **Architecture:** Sentence-BERT (`all-MiniLM-L6-v2` base)
- **Task:** Sentence Embedding / Semantic Similarity
- **Purpose:** Embed user queries for intent classification, vector search, and memory retrieval
- **Size:** ~22M parameters (the base `all-MiniLM-L6-v2` checkpoint is roughly 80-90 MB on disk)
- **Files:**
  - `config.json`
  - `pytorch_model.bin` or `model.safetensors`
  - `tokenizer.json`, `vocab.txt`
  - `1_Pooling/`, `2_Normalize/` (Sentence-BERT components)
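
The `1_Pooling/` and `2_Normalize/` folders hold the two post-processing steps that turn per-token outputs into a single sentence vector. As a rough illustration (not the library's actual code), the sketch below assumes mean pooling over non-padding tokens, which is how the base `all-MiniLM-L6-v2` model is configured, followed by L2 normalization. The function names and the toy 3-dimensional vectors are made up for the example.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (i.e. skip padding)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy input: three token vectors, the last one is padding and must be ignored.
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
```

Because the output is unit-length, downstream code can compare two embeddings with a plain dot product instead of a full cosine computation.
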

## 🧾 Intended Use

### ✔️ Primary Use Cases

- Semantic embedding of user inputs for intent recognition
- Matching new messages against known intent samples (`data/processed_intents.json`)
- Supporting vector similarity in MongoDB Atlas Search or ChromaDB
- Powering memory in LangGraph agentic workflows

### 🚫 Not Recommended For

- Direct intent classification (this model returns embeddings, not classes)
- Use outside of NLP (e.g., image, audio)
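
Because the model returns embeddings rather than class labels, intent recognition needs a matching step on top. A minimal sketch of one such step is nearest-neighbor matching by cosine similarity against labelled sample embeddings. The intent labels, the `min_score` cut-off, and the 2-dimensional toy vectors below are hypothetical; in practice the vectors would come from `model.encode(...)`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_intent(query_vec, samples, min_score=0.5):
    """Return (label, score) of the closest sample, or (None, score) if too far.

    `samples` is a list of (label, vector) pairs; `min_score` is a
    hypothetical cut-off that would need tuning on real data.
    """
    best_label, best_score = None, -1.0
    for label, vec in samples:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return None, best_score
    return best_label, best_score


# Toy 2-d vectors standing in for 384-d embeddings.
samples = [("book_session", [1.0, 0.0]), ("express_anxiety", [0.0, 1.0])]
label, score = match_intent([0.9, 0.1], samples)
```

Returning `None` below the cut-off gives the caller a way to fall back to a clarifying question instead of forcing a poor match.
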

## 🧪 Integration in MindPadi

- `app/chatbot/intent_classifier.py`: Uses this model to compute sentence embeddings
- `app/chatbot/intent_router.py`: Uses vector similarity for intent matching
- `database/vector_search.py`: Stores embeddings in, and queries them from, a MongoDB vector index
- `app/utils/embedding_search.py`: Embeds utterances for real-time nearest-neighbor lookup

## 🏋️ Training Details

- **Base Model:** `sentence-transformers/all-MiniLM-L6-v2` (pretrained)
- **Fine-tuning:** Optional domain-specific contrastive learning on the sentence pairs in `training/datasets/fallback_pairs.json`
- **Script:** `training/fine_tune_encoder.py` (if fine-tuned)
- **Tokenizer:** BERT-based WordPiece tokenizer
- **Max Token Length:** 128 (longer inputs are truncated)
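
This card does not say which contrastive objective `training/fine_tune_encoder.py` uses. One common choice for positive pairs is an in-batch-negatives (InfoNCE-style) loss, where each anchor should be closer to its own paired sentence than to the other pairs in the batch. The sketch below only illustrates that idea on toy vectors; the function name, the `scale` value, and the data are assumptions, and real training would run on tensors with automatic differentiation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy of picking positives[i] for anchors[i] among all positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(value - top) for value in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
# Each anchor is paired with the positive at the same index.
aligned = in_batch_contrastive_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
# Swapping the positives makes every anchor closest to the wrong sentence.
shuffled = in_batch_contrastive_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

The loss is near zero when each anchor already sits next to its own positive and grows when the pairs are mismatched, which is the signal fine-tuning would minimize.
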

## 📊 Evaluation

This model is not evaluated with classification metrics. Instead, its **embedding quality** was assessed through:

- **Cosine similarity tests** (similarity between intent embeddings)
- **Intent clustering accuracy** with `KMeans` in vector space
- **Recall@K** for correct intent retrieval
- **Visualizations:** UMAP plots (`logs/intent_umap.png`)

Results indicate:

- High-quality clustering of semantically similar intents
- ~91% Top-3 Recall for known intents
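
How the Top-3 Recall figure was computed is not described here. One plausible definition, sketched below, counts a query as a hit when its true intent appears among the `k` reference embeddings most similar to it. The function name, the intent labels, and the toy vectors are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_at_k(queries, references, k=3):
    """Fraction of queries whose true intent is among the k nearest references.

    `queries` and `references` are lists of (label, vector) pairs.
    """
    if not queries:
        raise ValueError("no queries to evaluate")
    hits = 0
    for true_label, q_vec in queries:
        ranked = sorted(references, key=lambda ref: cosine(q_vec, ref[1]), reverse=True)
        top_labels = {label for label, _ in ranked[:k]}
        hits += true_label in top_labels
    return hits / len(queries)


references = [("greet", [1.0, 0.0]), ("book", [0.0, 1.0]), ("vent", [-1.0, 0.0])]
# The second query is labelled "vent" but lies next to "greet" on purpose.
queries = [("greet", [0.9, 0.2]), ("vent", [0.8, 0.1])]
score = recall_at_k(queries, references, k=1)
```

With `k=1` only the first query is retrieved correctly; widening to `k=3` covers every reference in this toy set, so recall rises to 1.0.
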

## 💬 Example Usage

```python
from sentence_transformers import SentenceTransformer

# Load the encoder from the Hugging Face Hub (downloaded on first use).
model = SentenceTransformer("mindpadi/intent_encoder")

texts = ["I want to talk to a therapist", "Book a session", "I'm feeling anxious"]

# encode() returns a NumPy array with one 384-dimensional row per input text.
embeddings = model.encode(texts)

print(embeddings.shape)  # (3, 384)
```

## 🧪 Deployment (API Example)

```python
import requests

endpoint = "https://api-inference.huggingface.co/models/mindpadi/intent_encoder"
# Replace <your-token> with a Hugging Face access token.
headers = {"Authorization": "Bearer <your-token>"}
payload = {"inputs": "I need help managing stress"}

response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
embedding = response.json()
```

## ⚠️ Limitations

* English-only
* Short, clean sentences work best (not optimized for long documents)
* Does not directly return intent labels; it must be paired with clustering or classification logic
* May yield ambiguous vectors for multi-intent or vague inputs
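
One way to cope with multi-intent or vague inputs is to check how close the two best-matching intents are and ask the user to clarify when they are nearly tied. The sketch below assumes similarity scores have already been computed per intent; the function name, the `min_margin` threshold, and the scores are hypothetical.

```python
def is_ambiguous(scores, min_margin=0.05):
    """True when the best and second-best intent scores are nearly tied.

    `scores` maps intent label -> similarity; `min_margin` is a hypothetical
    threshold that would need tuning on real conversations.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return (ranked[0] - ranked[1]) < min_margin


# Made-up similarity scores for one message that mixes two intents.
scores = {"book_session": 0.71, "express_anxiety": 0.69, "greet": 0.12}
needs_clarification = is_ambiguous(scores)
```
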

## 📜 License

MIT License: open for personal, academic, and commercial use with attribution.

## 📬 Contact

* **Project:** [MindPadi Mental Health Assistant](https://huggingface.co/mindpadi)
* **Team:** MindPadi Developers
* **Email:** [email protected]
* **GitHub:** [https://github.com/mindpadi](https://github.com/mindpadi)

*Last updated: May 2025*