---
license: mit
---

## 📸 **Application Screenshot**

## <a href="https://ibb.co/JjhspDnr"><img src="https://i.ibb.co/ZzZMgqdc/Screenshot-2025-02-11-231313.png" alt="Screenshot-2025-02-11-231313" border="0" /></a>

## 🛠 **How It Works (End-to-End)**

### 1. **Data Preparation**

- The dataset `gender.xlsx` contains names and their corresponding genders (Male/Female).
- The `Gender` column is mapped to numerical values:
  - **Male (M)** is mapped to `1`
  - **Female (F)** is mapped to `0`
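
The mapping above can be sketched in plain Python. The `(name, gender_code)` row layout and the helper name `encode_gender` are assumptions made for illustration, not taken from `TrainImprove.py`; with pandas, the same dictionary can be applied to a column through `Series.map`.

```python
# Label mapping described above: Male (M) -> 1, Female (F) -> 0.
GENDER_TO_LABEL = {"M": 1, "F": 0}


def encode_gender(rows):
    """Return (names, labels), skipping rows whose gender code is not M or F.

    `rows` is an iterable of (name, gender_code) pairs. This layout is a
    hypothetical stand-in for the contents of gender.xlsx.
    """
    names, labels = [], []
    for name, code in rows:
        label = GENDER_TO_LABEL.get(str(code).strip().upper())
        if label is None:
            continue  # unknown or missing gender code: drop the row
        names.append(str(name).strip())
        labels.append(label)
    return names, labels


sample_rows = [("Alice", "F"), ("Bob", "M"), ("Casey", "?"), ("Dana", "f")]
names, labels = encode_gender(sample_rows)
print(names, labels)
```

Dropping rows with an unrecognized code is one possible policy; the actual script may handle them differently.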

### 2. **Feature Extraction (TF-IDF Vectorization)**

- The names are converted to **TF-IDF vectors** using character n-grams (1 to 3 characters).
- Character n-grams capture sub-word patterns such as common prefixes and name endings, which gives the model features that tend to correlate with gender.
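
To make the n-gram idea concrete, the sketch below lists the character n-grams (lengths 1 to 3) of a single name in plain Python. It only shows which terms get counted and does not compute TF-IDF weights. The function name `char_ngrams` and the lowercasing step are assumptions, not taken from the training script. In scikit-learn, this kind of tokenization is typically requested with `TfidfVectorizer(analyzer="char", ngram_range=(1, 3))`.

```python
def char_ngrams(name, min_n=1, max_n=3):
    """List every character n-gram of `name` with min_n <= n <= max_n."""
    text = name.lower()  # assumption: names are compared case-insensitively
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the name.
        for start in range(len(text) - n + 1):
            grams.append(text[start:start + n])
    return grams


print(char_ngrams("Ana"))
# ['a', 'n', 'a', 'an', 'na', 'ana']
```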

### 3. **Model Training**

- A **Neural Network** is built using **Keras Sequential API**:
  - Dense layers with **ReLU activation**
  - **Batch Normalization** and **Dropout layers** to prevent overfitting
  - Output layer with **Sigmoid activation** for binary classification
- The model is trained with **callbacks** like early stopping and learning rate reduction.
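
The early stopping rule mentioned above can be illustrated without any framework. This is a minimal sketch of the idea, not the project's training code: the function name, the `patience` value, and the loss values are all made up for the example. Keras offers this behavior through its `EarlyStopping` callback, and learning rate reduction through `ReduceLROnPlateau`.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the number of epochs in `val_losses` is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # new best validation loss
            waited = 0    # reset the patience counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, one per epoch.
losses = [0.70, 0.55, 0.50, 0.51, 0.52, 0.50, 0.49]
print(early_stopping_epoch(losses, patience=3))  # 6
```

With `patience=3`, training stops at epoch 6 because epochs 4 to 6 never beat the best loss of 0.50, so the later improvement at epoch 7 is never seen. A larger patience trades extra training time for a chance to catch such late improvements.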

### 4. **Saving the Model and Vectorizer**

- The trained model is saved as `gender_prediction_model_Improve.h5`
- The TF-IDF vectorizer is saved as `tfidf_vectorizer_Improve.joblib`

### 5. **Streamlit Application**

- Loads the pre-trained model and vectorizer.
- Accepts user input (name) and predicts gender.
- Displays the predicted gender in a clean UI.

---

## 📁 **Project File Structure**

```
.
├── TrainImprove.py                      # Training script for the model
├── ml-st1.py                            # Streamlit app for gender prediction
├── gender.xlsx                          # Dataset with names and gender
├── gender_prediction_model_Improve.h5   # Saved Keras model
├── tfidf_vectorizer_Improve.joblib      # Saved TF-IDF vectorizer
└── screenshot.png                       # Screenshot of the app UI
```

---

## 🚀 **How to Run the Project**

### 1. **Clone the Repository**

```bash
$ git clone <repository-url>
$ cd <repository-folder>
```

### 2. **Install Dependencies**

```bash
$ pip install -r requirements.txt
```

### 3. **Train the Model (Optional)**

If you want to retrain the model, run the training script:

```bash
$ python TrainImprove.py
```

### 4. **Run the Streamlit Application**

```bash
$ python -m streamlit run ml-st1.py
```

### 5. **Access the App**

Open your browser and go to: [http://localhost:8501](http://localhost:8501)

---

## 💡 **How the Code Works**

### **Training (TrainImprove.py)**

1. **Data Loading:** Reads the dataset from `gender.xlsx`.
2. **Preprocessing:** Converts names to TF-IDF vectors.
3. **Model Building:** Defines a neural network with regularization.
4. **Model Training:** Trains the model with early stopping.
5. **Saving Artifacts:** Stores the trained model (`.h5`) and vectorizer (`.joblib`).

### **Application (ml-st1.py)**

1. **Load Model and Vectorizer:** Loads the pre-trained model and TF-IDF vectorizer.
2. **User Input:** Accepts a name input from the user.
3. **Prediction:** Transforms the name using TF-IDF and makes a prediction.
4. **Output:** Displays the predicted gender (Male/Female) in the app.
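
Because Male is encoded as `1` and the output layer uses a sigmoid, the model's raw output can be read as the probability that a name is male. The sketch below shows one way to turn that number into a label. The 0.5 threshold and the function name `label_from_probability` are assumptions; the README does not state how the app makes this decision.

```python
def label_from_probability(prob_male, threshold=0.5):
    """Map a sigmoid output to a (label, confidence) pair.

    `prob_male` is the model output, interpreted as P(Male) because
    Male is encoded as 1 and Female as 0 during training.
    """
    if not 0.0 <= prob_male <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if prob_male >= threshold:
        return "Male", prob_male
    # Confidence for Female is the complement of P(Male).
    return "Female", 1.0 - prob_male


print(label_from_probability(0.83))  # ('Male', 0.83)
print(label_from_probability(0.25))  # ('Female', 0.75)
```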

---

## 📦 **Dependencies**

- Python 3.x
- TensorFlow
- Scikit-learn
- Pandas
- Streamlit
- Joblib

Install them using:

```bash
$ pip install tensorflow scikit-learn pandas streamlit joblib
```

---

## 🎨 **Future Enhancements**

- Improve the UI design.
- Include more diverse datasets for better generalization.
- Add confidence scores for predictions.
- Deploy the app online for public access.

---

## 🤝 **Contributing**

Feel free to fork the project and submit a pull request for improvements.

---

## 📜 **License**

This project is licensed under the MIT License.
