---
license: mit
---

## **Application Screenshot**

<a href="https://ibb.co/JjhspDnr"><img src="https://i.ibb.co/ZzZMgqdc/Screenshot-2025-02-11-231313.png" alt="Screenshot-2025-02-11-231313" border="0" /></a>

## **How It Works (End-to-End)**

### 1. **Data Preparation**

- The dataset `gender.xlsx` contains names and their corresponding genders (Male/Female).
- The `Gender` column is mapped to numerical values:
  - **Male (M)** is mapped to `1`
  - **Female (F)** is mapped to `0`

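
The label mapping above can be sketched with Pandas. This is a minimal illustration, not the code from `TrainImprove.py`: the `Name` column label and the `M`/`F` cell values are assumptions, and a small in-memory frame stands in for `gender.xlsx`.

```python
import pandas as pd

# Toy stand-in for the real dataset. The actual file would be loaded with
# pd.read_excel("gender.xlsx"); the "Name" column label is an assumption.
df = pd.DataFrame(
    {"Name": ["Anna", "John", "Maria"], "Gender": ["F", "M", "F"]}
)

# Map the labels to integers: Male -> 1, Female -> 0.
df["Gender"] = df["Gender"].map({"M": 1, "F": 0})

# Any value outside the mapping becomes NaN, so count those before training.
unmapped = int(df["Gender"].isna().sum())
print(df)
print("unmapped rows:", unmapped)
```

If the spreadsheet stores `Male`/`Female` instead of `M`/`F`, the dictionary keys would need to change accordingly.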
### 2. **Feature Extraction (TF-IDF Vectorization)**

- The names are converted to **TF-IDF vectors** using character n-grams (1 to 3 characters).
- Character n-grams let the model pick up sub-word patterns, such as common name endings, that correlate with gender.

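
A minimal sketch of this step with scikit-learn's `TfidfVectorizer`. The `analyzer="char"` setting is an assumption (the training script might use `"char_wb"` instead), and the three names are toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

names = ["Anna", "John", "Maria"]  # toy data, not taken from gender.xlsx

# Character n-grams of length 1 to 3. TfidfVectorizer lowercases input by
# default, so "Anna" contributes n-grams such as "a", "an", and "ann".
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
X = vectorizer.fit_transform(names)  # sparse matrix, one row per name

print(X.shape)
print(sorted(vectorizer.vocabulary_)[:10])
```

The result is a sparse matrix; calling `X.toarray()` gives the dense array that a Keras model typically expects.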
### 3. **Model Training**

- A **Neural Network** is built using the **Keras Sequential API**:
  - Dense layers with **ReLU activation**
  - **Batch Normalization** and **Dropout layers** to prevent overfitting
  - Output layer with **Sigmoid activation** for binary classification
- The model is trained with **callbacks** for early stopping and learning rate reduction.

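
The architecture described above might look roughly like the sketch below. Layer widths, dropout rates, the optimizer, and the callback settings are all assumptions chosen for illustration; they are not taken from `TrainImprove.py`.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_dim: int) -> keras.Model:
    """Binary classifier over TF-IDF features (hypothetical layer sizes)."""
    model = keras.Sequential(
        [
            keras.Input(shape=(input_dim,)),
            layers.Dense(128, activation="relu"),
            layers.BatchNormalization(),
            layers.Dropout(0.3),
            layers.Dense(64, activation="relu"),
            layers.BatchNormalization(),
            layers.Dropout(0.3),
            # One sigmoid unit: output is the probability of class 1 (Male).
            layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(
        optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model


def make_callbacks() -> list:
    """Early stopping plus learning-rate reduction (assumed settings)."""
    return [
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
    ]
```

Training would then be a call along the lines of `model.fit(X_train, y_train, validation_split=0.2, epochs=50, callbacks=make_callbacks())`, where the split and epoch count are again placeholders.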
### 4. **Saving the Model and Vectorizer**

- The trained model is saved as `gender_prediction_model_Improve.h5`.
- The TF-IDF vectorizer is saved as `tfidf_vectorizer_Improve.joblib`.

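
A minimal sketch of persisting the vectorizer with `joblib`, assuming a vectorizer fitted on toy data and a temporary directory instead of the project folder. The Keras model would be saved next to it with `model.save(...)`, shown here only as a comment.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit a vectorizer on toy names so there is something to save.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
vectorizer.fit(["Anna", "John", "Maria"])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tfidf_vectorizer_Improve.joblib"
    joblib.dump(vectorizer, path)   # write the fitted vectorizer to disk
    restored = joblib.load(path)    # read it back, as the app would

# The trained Keras model would be saved in the same step, for example:
# model.save("gender_prediction_model_Improve.h5")

# The restored vectorizer should carry the same learned vocabulary.
round_trip_ok = restored.vocabulary_ == vectorizer.vocabulary_
print("round trip ok:", round_trip_ok)
```

Saving the fitted vectorizer matters because the app must transform new names with the same vocabulary and IDF weights the model was trained on.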
### 5. **Streamlit Application**

- Loads the pre-trained model and vectorizer.
- Accepts user input (name) and predicts gender.
- Displays the predicted gender in a clean UI.


---

## **Project File Structure**

```
.
├── TrainImprove.py                       # Training script for the model
├── ml-st1.py                             # Streamlit app for gender prediction
├── gender.xlsx                           # Dataset with names and gender
├── gender_prediction_model_Improve.h5    # Saved Keras model
├── tfidf_vectorizer_Improve.joblib       # Saved TF-IDF vectorizer
└── screenshot.png                        # Screenshot of the app UI
```


---

## **How to Run the Project**

### 1. **Clone the Repository**

```bash
$ git clone <repository-url>
$ cd <repository-folder>
```

### 2. **Install Dependencies**

```bash
$ pip install -r requirements.txt
```

### 3. **Train the Model (Optional)**

If you want to retrain the model, run the training script:

```bash
$ python TrainImprove.py
```

### 4. **Run the Streamlit Application**

```bash
$ python -m streamlit run ml-st1.py
```

### 5. **Access the App**

Open your browser and go to: [http://localhost:8501](http://localhost:8501)

---

## **How the Code Works**

### **Training (TrainImprove.py)**

1. **Data Loading:** Reads the dataset from `gender.xlsx`.
2. **Preprocessing:** Converts names to TF-IDF vectors.
3. **Model Building:** Defines a neural network with regularization.
4. **Model Training:** Trains the model with early stopping.
5. **Saving Artifacts:** Stores the trained model (`.h5`) and vectorizer (`.joblib`).

### **Application (ml-st1.py)**

1. **Load Model and Vectorizer:** Loads the pre-trained model and TF-IDF vectorizer.
2. **User Input:** Accepts a name input from the user.
3. **Prediction:** Transforms the name using TF-IDF and makes a prediction.
4. **Output:** Displays the predicted gender (Male/Female) in the app.

---

## **Dependencies**

- Python 3.x
- TensorFlow
- Scikit-learn
- Pandas
- Streamlit
- Joblib
- openpyxl (used by Pandas to read `.xlsx` files)

Install them using:

```bash
$ pip install tensorflow scikit-learn pandas streamlit joblib openpyxl
```

---

## **Future Enhancements**

- Improve the UI design.
- Include more diverse datasets for better generalization.
- Add confidence scores for predictions.
- Deploy the app online for public access.

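
As one possible starting point for the confidence-score item, the sigmoid output can be reused directly. The sketch below assumes the model's output is the probability of class `1` (Male) and uses a fixed 0.5 threshold; the function name `prediction_with_confidence` is hypothetical. Note that a raw sigmoid value is only a rough confidence measure unless the model is calibrated.

```python
def prediction_with_confidence(p_male: float) -> tuple[str, float]:
    """Return (label, confidence) from the model's sigmoid output.

    p_male is assumed to be the predicted probability of class 1 (Male).
    Confidence is the probability assigned to the chosen label.
    """
    if not 0.0 <= p_male <= 1.0:
        raise ValueError("p_male must be a probability in [0, 1]")
    if p_male >= 0.5:
        return "Male", p_male
    return "Female", 1.0 - p_male


label, confidence = prediction_with_confidence(0.25)
print(f"{label} ({confidence:.0%} confidence)")
```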

---

## **Contributing**

Feel free to fork the project and submit a pull request for improvements.

---

## **License**

This project is licensed under the MIT License.