---
license: mit
datasets:
- arbml/arabic_100k_reviews
language:
- ar
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- fine-tuning-bert-arabic
- fine-tuning-bert-sentiment-analysis
- sentiment-analysis
- text-classification
- ktrain-library
---

# Fine-Tuned Arabic Sentiment Analysis with BERT 🚀

This repository contains a fine-tuned **BERT** model for sentiment analysis of Arabic reviews. The model is trained on the **[Arabic 100k Reviews](https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews)** dataset and classifies reviews into three sentiment categories: **Positive**, **Negative**, and **Mixed**.

## Author 🧑‍💻

**Khaled Soudy**

GitHub: [khaledsoudy-1](https://github.com/khaledsoudy-1)

---

## Source Code 💻

You can find the source code and full implementation of this project in my [GitHub repository](https://github.com/khaledsoudy-1/FineTuning-BERT-Arabic-Sentiment/tree/main). The repository contains the Google Colab notebook, dataset, and scripts used to fine-tune the model for Arabic sentiment analysis.

---

## How to Use the Model

### 1. Install Required Libraries

Make sure you have the **transformers** and **tensorflow** libraries installed:

```bash
pip install transformers
```

```bash
pip install tensorflow
```

### 2. Load the Fine-Tuned Model

You can load the fine-tuned model and tokenizer directly from Hugging Face using the following code:

```python
from transformers import TFBertForSequenceClassification, BertTokenizer

# Load model and tokenizer from Hugging Face
model_name = "khaledsoudy/arabic-sentiment-bert-model"

# Load model
model = TFBertForSequenceClassification.from_pretrained(model_name)

# Load tokenizer
tokenizer = BertTokenizer.from_pretrained(model_name)
```

### 3. Use the Model for Prediction

To run sentiment analysis on an Arabic text, follow these steps:

```python
import tensorflow as tf

# Sample Arabic text for sentiment prediction
text = "الفندق رائع و الخدمة ممتازة"  # "The hotel is wonderful and the service is excellent"

# Tokenize the input text
inputs = tokenizer(text, return_tensors="tf")

# Get the model's prediction
outputs = model(**inputs)

# Get the predicted sentiment (assuming 3 classes: Positive, Negative, Mixed)
predicted_class = tf.argmax(outputs.logits, axis=-1).numpy()

# Map the predicted class index to sentiment labels
sentiment_labels = ['Mixed', 'Negative', 'Positive']
print(f"Predicted sentiment: {sentiment_labels[predicted_class[0]]}")
```

A batched variant that also reports class probabilities is sketched in the appendix below.

### 4. Input Format

The model expects raw Arabic text. For better results, preprocess the text to remove diacritics and other unnecessary characters before tokenization; a minimal cleaning sketch is included in the appendix below.

### 5. Sentiment Labels

The model classifies the sentiment into three categories:

- **Positive** 🌟
- **Negative** 😠
- **Mixed** 🤔

## Model Details

- **Model Name:** `khaledsoudy/arabic-sentiment-bert-model`
- **Model Type:** `TFBertForSequenceClassification`
- **Language:** Arabic
- **Sentiment Classes:** Positive, Negative, Mixed

## How to Fine-Tune This Model

You can fine-tune this model further on your own dataset. A plain TensorFlow/Keras sketch is included in the appendix below, and the source code and notebooks on my GitHub walk through the full process step by step.

## License 📜

This model is licensed under the MIT License.

## Acknowledgments 🙏

- **Hugging Face** for providing the platform to host models.
- **Google BERT** for the pre-trained model.
- **Kaggle** for the **Arabic 100k Reviews** dataset.
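---

## Appendix: Extra Examples 🧪

### Batch Prediction with Probabilities

The snippet in step 3 classifies a single review. The sketch below (my addition, not part of the original notebook) scores a small batch and reports softmax probabilities. It assumes `model` and `tokenizer` are already loaded as in step 2; the sample texts are hypothetical, and the `['Mixed', 'Negative', 'Positive']` label order is the same assumption made in step 3.

```python
import tensorflow as tf

# Illustrative reviews (hypothetical inputs, not taken from the dataset)
texts = [
    "الفندق رائع و الخدمة ممتازة",   # "The hotel is wonderful and the service is excellent"
    "الغرفة سيئة و الاستقبال أسوأ",  # "The room is bad and the reception is even worse"
]

# Pad/truncate so the batch has a uniform shape
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")

# Forward pass; logits have shape (batch_size, 3)
outputs = model(**inputs)

# Softmax turns logits into class probabilities
probs = tf.nn.softmax(outputs.logits, axis=-1).numpy()

# Same assumed label order as in step 3
sentiment_labels = ['Mixed', 'Negative', 'Positive']
for text, p in zip(texts, probs):
    print(f"{sentiment_labels[p.argmax()]} ({p.max():.2f}): {text}")
```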
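### Cleaning Arabic Text Before Tokenization

As noted in step 4, stripping diacritics can help. This is a minimal sketch using the standard Unicode ranges for the Arabic harakat plus the tatweel character; `clean_arabic` is a hypothetical helper of mine, not part of the model or the original repository.

```python
import re

# Arabic diacritics (harakat), small high marks, superscript alef, and tatweel
DIACRITICS = re.compile(r"[\u0617-\u061A\u064B-\u0652\u0670\u0640]")

def clean_arabic(text: str) -> str:
    """Strip diacritics/tatweel and collapse repeated whitespace."""
    text = DIACRITICS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_arabic("الخِدْمَةُ مُمْتَازَةٌ"))  # -> "الخدمة ممتازة" ("The service is excellent")
```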
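### Further Fine-Tuning Sketch

The original notebooks fine-tune with the **ktrain** library; the sketch below is instead a plain TensorFlow/Keras training loop. The tiny in-memory dataset, the label mapping (0 = Mixed, 1 = Negative, 2 = Positive), and the hyperparameters are all illustrative assumptions, not the values used to train this model.

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification, BertTokenizer

model_name = "khaledsoudy/arabic-sentiment-bert-model"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = TFBertForSequenceClassification.from_pretrained(model_name)

# Placeholder training data -- replace with your own reviews and labels
train_texts = [
    "الخدمة ممتازة",   # "The service is excellent"
    "المنتج سيئ جدا",  # "The product is very bad"
]
train_labels = [2, 1]  # assumed mapping: 0 = Mixed, 1 = Negative, 2 = Positive

# Tokenize and wrap in a tf.data pipeline
encodings = tokenizer(train_texts, padding=True, truncation=True,
                      max_length=128, return_tensors="tf")
dataset = tf.data.Dataset.from_tensor_slices(
    (dict(encodings), train_labels)).shuffle(100).batch(16)

# A small learning rate is the usual choice when fine-tuning BERT
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dataset, epochs=3)

# Save the further fine-tuned model and tokenizer
model.save_pretrained("./arabic-sentiment-bert-finetuned")
tokenizer.save_pretrained("./arabic-sentiment-bert-finetuned")
```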