---
license: mit
language:
- en
tags:
- sentence-transformer
- embeddings
- mental-health
- intent-classification
pipeline_tag: feature-extraction
base_model: sentence-transformers/all-MiniLM-L6-v2
---
# Intent Encoder (MindPadi)
The `intent_encoder` is a Sentence Transformer model used in the MindPadi mental health assistant for **encoding user messages into dense embeddings**. These embeddings support intent classification, similarity search, and memory recall workflows. It plays a foundational role in the semantic understanding of user inputs across various MindPadi features.
## Model Overview
- **Architecture:** Sentence-BERT (`all-MiniLM-L6-v2` base)
- **Task:** Sentence Embedding / Semantic Similarity
- **Purpose:** Embed user queries for intent classification, vector search, and memory retrieval
- **Size:** ~23M parameters (~90 MB of weights)
- **Files:**
- `config.json`
- `pytorch_model.bin` or `model.safetensors`
- `tokenizer.json`, `vocab.txt`
- `1_Pooling/`, `2_Normalize/` (Sentence-BERT components)
## Intended Use
### Primary Use Cases
- Semantic embedding of user inputs for intent recognition
- Matching new messages against known intent samples (`data/processed_intents.json`); see the sketch after this list
- Supporting vector similarity in MongoDB Atlas Search or ChromaDB
- Powering memory in LangGraph agentic workflows
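The similarity-matching use case above can be sketched with `sentence_transformers.util.cos_sim`. The intent names and example utterances below are illustrative placeholders; in MindPadi the real samples come from `data/processed_intents.json`.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mindpadi/intent_encoder")

# Hypothetical intent examples; the real ones live in data/processed_intents.json.
intent_examples = {
    "book_session": "I want to book a therapy session",
    "express_anxiety": "I'm feeling really anxious today",
    "greeting": "Hi, how are you?",
}

intent_names = list(intent_examples.keys())
intent_embeddings = model.encode(list(intent_examples.values()), convert_to_tensor=True)

message = "Can I schedule an appointment with a therapist?"
message_embedding = model.encode(message, convert_to_tensor=True)

# Cosine similarity between the new message and every known intent example.
scores = util.cos_sim(message_embedding, intent_embeddings)[0]
best = int(scores.argmax())
print(intent_names[best], float(scores[best]))
```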
### Not Recommended For
- Direct intent classification (this model returns embeddings, not classes)
- Use outside of NLP (e.g., image, audio)
## Integration in MindPadi
- `app/chatbot/intent_classifier.py`: Uses this model to compute sentence embeddings
- `app/chatbot/intent_router.py`: Leverages vector similarity for intent matching
- `database/vector_search.py`: Embeddings are stored or queried from MongoDB vector index
- `app/utils/embedding_search.py`: Embeds utterances for real-time nearest-neighbor lookup (a vector-store sketch follows this list)
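As a rough illustration of the vector-search integration, here is a minimal sketch using ChromaDB, one of the backends listed above. The collection name, labels, and utterances are hypothetical, and the production modules may target a MongoDB Atlas vector index instead.

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mindpadi/intent_encoder")
client = chromadb.Client()  # in-memory client; production would use a persistent store
collection = client.create_collection("intent_examples")  # collection name is illustrative

# Index a few known utterances with their intent labels as metadata.
utterances = ["Book a session", "I want to talk to a therapist", "I'm feeling anxious"]
labels = ["book_session", "request_therapist", "express_anxiety"]
collection.add(
    ids=[str(i) for i in range(len(utterances))],
    embeddings=model.encode(utterances).tolist(),
    metadatas=[{"intent": label} for label in labels],
    documents=utterances,
)

# Query with a new message and retrieve the 2 nearest stored utterances.
query_embedding = model.encode("Can you help me calm down?").tolist()
results = collection.query(query_embeddings=[query_embedding], n_results=2)
print(results["metadatas"])
```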
## Training Details
- **Base Model:** `sentence-transformers/all-MiniLM-L6-v2` (pretrained)
- **Fine-tuning:** Optional domain-specific contrastive learning using pairs in `training/datasets/fallback_pairs.json`
- **Script:** `training/fine_tune_encoder.py` (if fine-tuned; a sketch follows this list)
- **Tokenizer:** BERT-based WordPiece tokenizer
- **Max Token Length:** 128
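A minimal sketch of what the optional contrastive fine-tuning step might look like with the Sentence Transformers training API. It assumes `fallback_pairs.json` holds (anchor, positive) text pairs; the exact data format, loss, and hyperparameters in `training/fine_tune_encoder.py` may differ.

```python
import json
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Assumed format: a JSON list of {"anchor": ..., "positive": ...} pairs.
with open("training/datasets/fallback_pairs.json") as f:
    pairs = json.load(f)

train_examples = [InputExample(texts=[p["anchor"], p["positive"]]) for p in pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)

# In-batch negatives contrastive loss: other pairs in the batch act as negatives.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
    output_path="models/intent_encoder",  # output path is illustrative
)
```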
## Evaluation
While this model is not evaluated via classification metrics, its **embedding quality** was assessed through:
- **Cosine similarity tests** (intent embedding similarity)
- **Intent clustering accuracy** with `KMeans` in vector space
- **Recall@K** for correct intent retrieval (see the sketch after this section)
- **Visualizations:** UMAP plots (`logs/intent_umap.png`)
Results indicate:
- High-quality clustering of semantically similar intents
- ~91% Top-3 Recall for known intents
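For reference, Recall@K over intent retrieval can be computed along these lines. The reference utterances and test messages below are placeholders, not the actual evaluation data.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mindpadi/intent_encoder")

# Placeholder data: one reference utterance per intent and a few labeled test messages.
reference = {"book_session": "Book a session", "express_anxiety": "I'm feeling anxious"}
test_messages = [("Can I schedule an appointment?", "book_session"),
                 ("I can't stop worrying", "express_anxiety")]

intent_names = list(reference.keys())
ref_emb = model.encode(list(reference.values()), convert_to_tensor=True)

k, hits = 3, 0
for text, true_intent in test_messages:
    scores = util.cos_sim(model.encode(text, convert_to_tensor=True), ref_emb)[0]
    top_k = [intent_names[i] for i in scores.argsort(descending=True)[:k].tolist()]
    hits += int(true_intent in top_k)

print(f"Recall@{k}: {hits / len(test_messages):.2f}")
```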
## Example Usage
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("mindpadi/intent_encoder")
texts = ["I want to talk to a therapist", "Book a session", "I'm feeling anxious"]
embeddings = model.encode(texts)
print(embeddings.shape) # (3, 384)
```
## Deployment (API Example)
```python
import requests
endpoint = "https://api-inference.huggingface.co/models/mindpadi/intent_encoder"
headers = {"Authorization": "Bearer <your-token>"}
payload = {"inputs": "I need help managing stress"}
response = requests.post(endpoint, json=payload, headers=headers)
embedding = response.json()  # expected to be the sentence embedding as a (nested) list of floats
```
## Limitations
* English-only
* Short, clean sentences work best (not optimized for long documents)
* Does not directly return intent labels; it must be paired with clustering or classification logic
* May yield ambiguous vectors for multi-intent or vague inputs
## License
MIT License: open for personal, academic, and commercial use with attribution.
## Contact
* **Project:** [MindPadi Mental Health Assistant](https://huggingface.co/mindpadi)
* **Team:** MindPadi Developers
* **Email:** [[email protected]](mailto:[email protected])
* **GitHub:** [https://github.com/mindpadi](https://github.com/mindpadi)
*Last updated: May 2025*