---
language: bm
tags:
- bambara
- fasttext
- embeddings
- word-vectors
- african-nlp
- low-resource
license: apache-2.0
datasets:
- bambara-corpus
metrics:
- cosine_similarity
pipeline_tag: feature-extraction
---

# Bambara FastText Embeddings

## Model Description

This model provides FastText word embeddings for Bambara (Bamanankan), a Mande language spoken primarily in Mali. The embeddings capture semantic relationships between Bambara words and can serve as input features for NLP tasks in this low-resource African language.

- **Model Type:** FastText Word Embeddings
- **Language:** Bambara (bm)
- **License:** Apache 2.0

## Model Details

### Model Architecture

- **Algorithm:** FastText with subword information
- **Vector Dimension:** 300
- **Vocabulary Size:** 9,973 unique Bambara words
- **Training Method:** Skip-gram with negative sampling
- **Subword Information:** Character n-grams (enables handling of out-of-vocabulary words)

### Training Data

The model was trained on Bambara text corpora, building on David Ifeoluwa Adelani's research on African language embeddings.

### Intended Use

This model is designed for:

- **Semantic similarity tasks** in Bambara
- **Information retrieval** for Bambara documents
- **Cross-lingual research** involving Bambara
- **Cultural preservation** and digital humanities projects
- **Educational applications** for Bambara language learning
- **Foundation for downstream NLP tasks** in Bambara

## Usage

```
Coming soon
```
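
To illustrate the subword mechanism and the cosine similarity metric mentioned in this card, here is a minimal, library-free sketch. It does not load this model. The function names (`char_ngrams`, `oov_vector`, `cosine_similarity`), the n-gram range of 3 to 6, and the tiny 2-dimensional vector table are assumptions made for illustration; this card does not state the n-gram settings used in training. Real FastText also hashes n-grams into a fixed number of buckets, which this sketch omits.

```python
import math


def char_ngrams(word, min_n=3, max_n=6):
    """Return FastText-style character n-grams of `word`.

    The word is wrapped in '<' and '>' boundary markers first.
    The 3..6 range is an assumed default, not a documented setting of this model.
    """
    token = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def oov_vector(word, ngram_vectors, dim, min_n=3, max_n=6):
    """Build a vector for an out-of-vocabulary word by averaging the
    vectors of its known character n-grams (simplified: no hashing)."""
    known = [ngram_vectors[g] for g in char_ngrams(word, min_n, max_n)
             if g in ngram_vectors]
    if not known:
        return [0.0] * dim  # no known subword: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for zero vectors; return 0 by convention
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical 2-dimensional n-gram table; the real model uses 300 dimensions.
    toy_table = {"<mu": [1.0, 0.0], "so>": [0.0, 1.0]}
    print(char_ngrams("muso", 3, 4))
    vec = oov_vector("muso", toy_table, dim=2, min_n=3, max_n=3)
    print(vec, cosine_similarity(vec, [1.0, 1.0]))
```

Averaging n-gram vectors is what lets a subword model return a vector for a word that is not among the 9,973 vocabulary entries, provided the word shares character n-grams with words seen in training.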