Hindi Sentence Embeddings Foundational Model

This sentence embedding model is trained specifically for Hindi text. It converts Hindi sentences into dense vector representations that capture semantic meaning, for use in NLP applications such as semantic search, clustering, and similarity comparison.

Features

  • Specialized for Hindi language text
  • Creates semantically meaningful vector representations
  • Supports sentence similarity computation
  • Enables semantic search over document collections
  • Lightweight and easy to use

Usage

Installation

git lfs install
git clone https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model
cd hindi-embeddings-foundational-model

Quick Start

from hindi_embeddings import SentenceEmbedder

# Initialize the embedder
model = SentenceEmbedder("/path/to/hindi-embeddings-foundational-model")

# Encode sentences to embeddings
sentences = [
    "मुझे हिंदी भाषा बहुत पसंद है।",
    "मैं हिंदी भाषा सीख रहा हूँ।"
]
embeddings = model.encode(sentences)

# Compute the pairwise similarity matrix; entry [0][1] compares the first and second sentences
similarity = model.compute_similarity(sentences)
print(f"Similarity: {similarity[0][1]:.4f}")

# Perform semantic search
query = "भारत की राजधानी"
documents = [
    "दिल्ली भारत की राजधानी है।",
    "मुंबई भारत का सबसे बड़ा शहर है।",
    "हिमालय पर्वत भारत के उत्तर में स्थित है।"
]
results = model.search(query, documents)
for i, result in enumerate(results):
    print(f"{i+1}. Score: {result['score']:.4f}")
    print(f"   Document: {result['document']}")
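
The ranking step behind a semantic search can be sketched in a few lines. This is a minimal illustration, not the implementation of `model.search`: the function name `rank_documents`, its parameters, and the toy vectors are all hypothetical, and it assumes the embeddings are already unit-normalized (as this model card states under Model Details), so the dot product serves as the cosine similarity score. The result dictionaries reuse the `score` and `document` keys shown in the Quick Start output.

```python
from typing import Dict, List


def rank_documents(
    query_vec: List[float],
    doc_vecs: List[List[float]],
    documents: List[str],
    top_k: int = 3,
) -> List[Dict[str, object]]:
    """Rank documents by dot product with the query (hypothetical helper).

    Assumes every vector is already L2-normalized, so the dot product
    equals the cosine similarity.
    """
    if len(doc_vecs) != len(documents):
        raise ValueError("doc_vecs and documents must have the same length")
    scored = [
        {"score": sum(q * d for q, d in zip(query_vec, vec)), "document": doc}
        for vec, doc in zip(doc_vecs, documents)
    ]
    # Highest score first, then keep only the top_k hits.
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional unit vectors standing in for real 768-dimensional embeddings.
query_vec = [1.0, 0.0]
doc_vecs = [[0.6, 0.8], [1.0, 0.0], [0.0, 1.0]]
documents = ["doc A", "doc B", "doc C"]

for rank, hit in enumerate(rank_documents(query_vec, doc_vecs, documents, top_k=2), start=1):
    print(f"{rank}. Score: {hit['score']:.4f}  Document: {hit['document']}")
```

With real data, the query and document vectors would come from `model.encode`; sorting all scores is adequate for small collections, while large collections usually call for an approximate nearest-neighbor index.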

Model Details

This model uses a transformer-based architecture with mean pooling to create sentence embeddings: the token representations of a sentence are averaged into a single fixed-size vector. It was trained on a diverse corpus of Hindi text to capture the semantic nuances of the Hindi language.
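
Mean pooling can be illustrated with a small pure-Python sketch. It is not taken from this model's code: the function name `mean_pool` and the toy inputs are hypothetical, and it assumes the common convention of an attention mask in which 1 marks a real token and 0 marks padding, so that padding does not dilute the average.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("token_vectors and attention_mask must have the same length")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In a real transformer pipeline the same operation is typically done on batched tensors, but the arithmetic is the same: sum the unmasked token vectors and divide by their count.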

The embeddings have the following properties:

  • Dimension: 768
  • Normalization: L2-normalized vectors (unit vectors)
  • Similarity measure: Cosine similarity (for unit vectors, equal to the dot product)
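
The relationship between L2 normalization and cosine similarity can be checked with a short sketch. The helper names `l2_normalize`, `dot`, and `cosine_similarity` are hypothetical and the vectors are toy values, not model output; the point is only that once vectors have unit length, a plain dot product gives the same score as the full cosine formula.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
ua, ub = l2_normalize(a), l2_normalize(b)

print(f"cosine of raw vectors:       {cosine_similarity(a, b):.4f}")  # 0.9600
print(f"dot product of unit vectors: {dot(ua, ub):.4f}")              # 0.9600
```

Because the embeddings are described as already L2-normalized, downstream code can skip the norm computation and use the dot product directly.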

Applications

  • Semantic search systems
  • Document clustering and organization
  • Recommendation systems
  • Question answering
  • Information retrieval
  • Text similarity comparison

License

This model is released under the MIT License.

Citation

If you use this model in your research or application, please cite us:

@misc{convaiinnovations2025hindi,
  author = {ConvAI Innovations},
  title = {Hindi Sentence Embeddings Foundational Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model}}
}

Model size: 107M parameters (Safetensors format, F32 tensors)