Hindi Sentence Embeddings Foundational Model
This sentence embedding model is trained specifically for Hindi text. It converts Hindi sentences into dense vector representations that capture semantic meaning, supporting NLP applications such as semantic search, clustering, and similarity comparison.
Features
- Specialized for Hindi language text
- Creates semantically meaningful vector representations
- Supports sentence similarity computation
- Enables semantic search over document collections
- Lightweight and easy to use
Usage
Installation
git lfs install
git clone https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model
cd hindi-embeddings-foundational-model
Quick Start
from hindi_embeddings import SentenceEmbedder

# Initialize the embedder from the cloned model directory
model = SentenceEmbedder("/path/to/hindi-embeddings-foundational-model")

# Encode sentences to embeddings
sentences = [
    "मुझे हिंदी भाषा बहुत पसंद है।",
    "मैं हिंदी भाषा सीख रहा हूँ।"
]
embeddings = model.encode(sentences)

# Compute pairwise similarity between the sentences
similarity = model.compute_similarity(sentences)
print(f"Similarity: {similarity[0][1]:.4f}")

# Perform semantic search over a small document collection
query = "भारत की राजधानी"
documents = [
    "दिल्ली भारत की राजधानी है।",
    "मुंबई भारत का सबसे बड़ा शहर है।",
    "हिमालय पर्वत भारत के उत्तर में स्थित है।"
]
results = model.search(query, documents)
for i, result in enumerate(results):
    print(f"{i+1}. Score: {result['score']:.4f}")
    print(f" Document: {result['document']}")
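The internals of `search` are not shown in this card. As a rough illustration of how ranking over embeddings can work, the sketch below scores each document vector against a query vector with a dot product and returns the best matches. The function name `rank_documents`, the `top_k` parameter, and the toy 2-dimensional vectors are assumptions made for this example, not part of the `hindi_embeddings` API.

```python
from typing import List, Tuple


def rank_documents(
    query_vec: List[float],
    doc_vecs: List[List[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, highest score first.

    Hypothetical helper: assumes all vectors are already unit length,
    so the dot product can serve as the similarity score.
    """
    scores = [
        (i, sum(q * d for q, d in zip(query_vec, vec)))
        for i, vec in enumerate(doc_vecs)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy unit vectors standing in for real 768-dimensional embeddings.
query_vec = [1.0, 0.0]
doc_vecs = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]

ranked = rank_documents(query_vec, doc_vecs, top_k=2)
for index, score in ranked:
    print(f"document {index}: {score:.4f}")
```

With real embeddings, `query_vec` and `doc_vecs` would presumably come from `model.encode`, though the exact return type of that method is not documented here.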
Model Details
This model uses a transformer-based architecture with mean pooling to create sentence embeddings. It was trained on a diverse corpus of Hindi text to capture the semantic nuances of the Hindi language.
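Mean pooling, as mentioned above, averages the per-token vectors produced by the transformer into one sentence vector, usually ignoring padding tokens. The following is a minimal pure-Python sketch of that idea, not the model's actual code; the function name `mean_pool` and the toy inputs are assumptions for illustration.

```python
from typing import List


def mean_pool(
    token_embeddings: List[List[float]],
    attention_mask: List[int],
) -> List[float]:
    """Average token vectors, skipping padded positions (mask == 0)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value is required per token")
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded token is excluded from both the sum and the count, so it does not pull the average toward its values.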
The embeddings have the following properties:
- Dimension: 768
- Normalization: L2-normalized vectors (unit vectors)
- Similarity measure: Cosine similarity
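Because the vectors are L2-normalized, the cosine similarity of two embeddings reduces to their dot product. The sketch below checks this on toy vectors using only the standard library; the helper names `l2_normalize`, `dot`, and `cosine_similarity` are illustrative and not taken from the model's code.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
ua, ub = l2_normalize(a), l2_normalize(b)

# For unit vectors the two quantities agree up to floating-point error.
print(math.isclose(dot(ua, ub), cosine_similarity(a, b)))  # True
```

In practice this means similarity over a whole collection can be computed with a single matrix product, with no per-pair norm calculation.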
Applications
- Semantic search systems
- Document clustering and organization
- Recommendation systems
- Question answering
- Information retrieval
- Text similarity comparison
License
This model is released under the MIT License.
Citation
If you use this model in your research or application, please cite us:
@misc{convaiinnovations2025hindi,
  author = {ConvAI Innovations},
  title = {Hindi Sentence Embeddings Foundational Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model}}
}