Hindi Sentence Embeddings Foundational Model

This sentence embedding model is trained specifically for Hindi text. It converts Hindi sentences into dense vector representations that capture semantic meaning, which supports NLP applications such as semantic search, clustering, and similarity comparison.

Features

Specialized for Hindi language text
Creates semantically meaningful vector representations
Supports sentence similarity computation
Enables semantic search over document collections
Lightweight and easy to use

Usage

Installation

git lfs install
git clone https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model
cd hindi-embeddings-foundational-model

Quick Start

from hindi_embeddings import SentenceEmbedder

# Initialize the embedder
model = SentenceEmbedder("/path/to/hindi-embeddings-foundational-model")

# Encode sentences to embeddings
sentences = [
    "मुझे हिंदी भाषा बहुत पसंद है।",
    "मैं हिंदी भाषा सीख रहा हूँ।"
]
embeddings = model.encode(sentences)

# Compute similarity between sentences
similarity = model.compute_similarity(sentences)
print(f"Similarity: {similarity[0][1]:.4f}")

# Perform semantic search
query = "भारत की राजधानी"
documents = [
    "दिल्ली भारत की राजधानी है।",
    "मुंबई भारत का सबसे बड़ा शहर है।",
    "हिमालय पर्वत भारत के उत्तर में स्थित है।"
]
results = model.search(query, documents)
for rank, result in enumerate(results, start=1):
    print(f"{rank}. Score: {result['score']:.4f}")
    print(f"   Document: {result['document']}")

Model Details

This model uses a transformer-based architecture with mean pooling to create sentence embeddings. It was trained on a diverse corpus of Hindi text to capture the semantic nuances of the Hindi language.
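
To make the pooling step concrete, here is a minimal, dependency-free sketch of mean pooling over token vectors. The model's actual pooling code is not shown in this README, so the function name `mean_pool`, the use of an attention mask to skip padding tokens, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask value is 1, ignoring padding.

    Hypothetical helper: it illustrates mean pooling in general and is
    not taken from this repository.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask differ in length")

    # Keep only the vectors that correspond to real (non-padding) tokens.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")

    dim = len(kept[0])
    # Component-wise mean across the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model each token vector would have 768 components and the result would be one 768-dimensional sentence vector; the averaging logic is the same.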

The embeddings have the following properties:

Dimension: 768
Normalization: L2-normalized vectors (unit vectors)
Similarity measure: Cosine similarity (for unit vectors, equal to the dot product)
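
The properties above have a practical consequence: once vectors are L2-normalized, cosine similarity reduces to a plain dot product. The sketch below checks this on toy two-dimensional vectors using only the standard library. The helper names `l2_normalize`, `dot`, and `cosine_similarity` are hypothetical and are not part of the `hindi_embeddings` module.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity for vectors of any non-zero length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for two sentence embeddings (assumed values).
a = [3.0, 4.0]
b = [4.0, 3.0]

unit_a = l2_normalize(a)
unit_b = l2_normalize(b)

# For unit vectors the dot product and the cosine similarity agree.
cos_raw = cosine_similarity(a, b)
dot_unit = dot(unit_a, unit_b)
print(f"cosine on raw vectors: {cos_raw:.4f}")
print(f"dot on unit vectors:   {dot_unit:.4f}")
```

Because the embeddings are already unit vectors, a search over many documents can rank candidates by dot product alone and skip the per-pair norm computation.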

Applications

Semantic search systems
Document clustering and organization
Recommendation systems
Question answering
Information retrieval
Text similarity comparison

License

This model is released under the MIT License.

Citation

If you use this model in your research or application, please cite us:

@misc{convaiinnovations2025hindi,
  author = {ConvAI Innovations},
  title = {Hindi Sentence Embeddings Foundational Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/convaiinnovations/hindi-embeddings-foundational-model}}
}