ronakreddy18 committed d3d0026 (verified) · 1 parent: 3ae2c67

Update pages/LIFE_CYCLE_OF_MACHINE_LEARNING.py

pages/LIFE_CYCLE_OF_MACHINE_LEARNING.py CHANGED
@@ -132,44 +132,38 @@ print(excel_file.sheet_names)
Old version:
         st.session_state.page = "structured_data"
 
 # ----------------- Unstructured Data Page -----------------
 def unstructured_data_page():
     st.title(":blue[Unstructured Data]")
 
     st.markdown("""
-    **Unstructured data** does not have a predefined format. It consists of various data types like text, images, videos, and audio files.
     Examples include:
-    - Text documents (e.g., .txt, .docx)
     - Images (e.g., .jpg, .png)
     - Videos (e.g., .mp4, .avi)
-    - Audio files (e.g., .mp3, .wav)
     - Social media posts
     """)
 
-    st.header("📄 Handling Text Data")
-    st.markdown("""
-    Text data can be analyzed using Natural Language Processing (NLP) techniques.
-    """)
-    st.code("""
-    # Reading text data
-    with open('sample.txt', 'r') as file:
-        text = file.read()
-    print(text)
-
-    # Basic text processing using NLTK
-    import nltk
-    from nltk.tokenize import word_tokenize
-
-    nltk.download('punkt')
-    tokens = word_tokenize(text)
-    print(tokens)
-    """, language='python')
-
     st.header("🖼️ Handling Image Data")
     st.markdown("""
-    Image data can be processed using libraries like OpenCV and PIL (Pillow).
     """)
     st.code("""
     from PIL import Image
 
     # Open an image file
     image = Image.open('sample_image.jpg')
@@ -178,64 +172,61 @@ image.show()
     # Convert image to grayscale
     gray_image = image.convert('L')
     gray_image.show()
-    """, language='python')
 
-    st.header("🎥 Handling Video Data")
-    st.markdown("""
-    Videos can be processed frame by frame using OpenCV.
-    """)
-    st.code("""
-    import cv2
 
-    # Capture video
-    video = cv2.VideoCapture('sample_video.mp4')
 
-    while video.isOpened():
-        ret, frame = video.read()
-        if not ret:
-            break
-        cv2.imshow('Frame', frame)
-        if cv2.waitKey(25) & 0xFF == ord('q'):
-            break
 
-    video.release()
-    cv2.destroyAllWindows()
     """, language='python')
 
-    st.header("🔊 Handling Audio Data")
     st.markdown("""
-    Audio data can be handled using libraries like librosa.
     """)
-    st.code("""
-    import librosa
-    import librosa.display
-    import matplotlib.pyplot as plt
-
-    # Load audio file
-    y, sr = librosa.load('sample_audio.mp3')
-    librosa.display.waveshow(y, sr=sr)
-    plt.title('Waveform')
-    plt.show()
-    """, language='python')
 
     st.markdown("### Challenges with Unstructured Data")
     st.write("""
-    - **Noise and Inconsistency**: Data is often incomplete or noisy.
-    - **Storage Requirements**: Large size and variability in data types.
-    - **Processing Time**: Analyzing unstructured data is computationally expensive.
     """)
 
     st.markdown("### Solutions")
     st.write("""
-    - **Data Cleaning**: Preprocess data to remove noise.
-    - **Efficient Storage**: Use NoSQL databases (e.g., MongoDB) or cloud storage.
-    - **Parallel Processing**: Utilize frameworks like Apache Spark.
     """)
 
-    # Back to Data Collection
     if st.button("Back to Data Collection"):
         st.session_state.page = "data_collection"
 
 # ----------------- Semi-Structured Data Page -----------------
 def semi_structured_data_page():
     st.title(":orange[Semi-Structured Data]")
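The `image.convert('L')` step in the image code collapses three colour channels into one. Pillow documents this conversion as the ITU-R 601-2 luma transform, L = R × 299/1000 + G × 587/1000 + B × 114/1000. The dependency-free sketch below applies that formula to a small pixel grid so the arithmetic is visible. The `to_grayscale` helper and the `tiny_image` pixels are hypothetical, not part of the app. Pillow's own C code uses fixed-point arithmetic, so individual values could differ from this sketch by rounding.

```python
# Sketch of the ITU-R 601-2 luma formula that Pillow documents for convert('L'):
#   L = R * 299/1000 + G * 587/1000 + B * 114/1000
# to_grayscale and tiny_image are hypothetical names used only for illustration.


def to_grayscale(rgb_rows):
    """Map each (R, G, B) pixel of a nested-list image to one 0-255 luma value."""
    gray_rows = []
    for row in rgb_rows:
        # Integer arithmetic; adding 500 before // 1000 rounds half up to an integer
        gray_rows.append([(r * 299 + g * 587 + b * 114 + 500) // 1000 for (r, g, b) in row])
    return gray_rows


# Hypothetical 2x2 RGB image: red, green on the first row; blue, white on the second
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
print(to_grayscale(tiny_image))  # [[76, 150], [29, 255]]
```

Green carries the largest weight, so the pure-green pixel ends up much brighter than the pure-blue one. Each pixel now holds one value instead of three, which is the single-channel reduction that grayscale conversion is used for.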
 
New version:
         st.session_state.page = "structured_data"
 
 # ----------------- Unstructured Data Page -----------------
+
+from PIL import Image
+import numpy as np
+import matplotlib.pyplot as plt
+
 def unstructured_data_page():
     st.title(":blue[Unstructured Data]")
 
     st.markdown("""
+    *Unstructured data* does not have a predefined format. It consists of various data types like text, images, videos, and audio files.
     Examples include:
     - Images (e.g., .jpg, .png)
     - Videos (e.g., .mp4, .avi)
     - Social media posts
     """)
 
+    ### Handling Image Data Section
     st.header("🖼️ Handling Image Data")
     st.markdown("""
+    Image data can be processed using libraries like OpenCV and PIL (Pillow). Images often need to be preprocessed for tasks like analysis, classification, or feature extraction. Common operations include:
+    - **Reading and displaying images**
+    - **Converting to grayscale**
+    - **Resizing and cropping**
+    - **Rotating and flipping**
+    - **Applying filters**
+    - **Edge detection and other transformations**
     """)
+
     st.code("""
     from PIL import Image
+    import numpy as np
+    import matplotlib.pyplot as plt
 
     # Open an image file
     image = Image.open('sample_image.jpg')
@@ -178,64 +172,61 @@ image.show()
     # Convert image to grayscale
     gray_image = image.convert('L')
     gray_image.show()
 
+    # Resize the image
+    resized_image = image.resize((200, 200))
+    resized_image.show()
 
+    # Rotate the image by 90 degrees
+    rotated_image = image.rotate(90)
+    rotated_image.show()
 
+    # Convert the image to a NumPy array and display its shape
+    image_array = np.array(image)
+    print(image_array.shape)
 
+    # Display the image array as a plot
+    plt.imshow(image)
+    plt.title("Original Image")
+    plt.axis('off')
+    plt.show()
     """, language='python')
 
     st.markdown("""
+    **Common Image Processing Techniques:**
+    - **Resizing**: Adjust the dimensions of an image for uniformity in models.
+    - **Cropping**: Extract a region of interest (ROI) from an image.
+    - **Grayscale Conversion**: Simplify image data by reducing it to a single channel.
+    - **Rotation/Flipping**: Perform augmentations to increase the dataset for model training.
+    - **Edge Detection**: Identify edges in images using filters like the Sobel or Canny filters.
     """)
 
+    ### Challenges and Solutions Section
     st.markdown("### Challenges with Unstructured Data")
     st.write("""
+    - *Noise and Inconsistency*: Data is often incomplete or noisy.
+    - *Storage Requirements*: Large size and variability in data types.
+    - *Processing Time*: Analyzing unstructured data is computationally expensive.
     """)
 
     st.markdown("### Solutions")
     st.write("""
+    - *Data Cleaning*: Preprocess data to remove noise.
+    - *Efficient Storage*: Use NoSQL databases (e.g., MongoDB) or cloud storage.
+    - *Parallel Processing*: Utilize frameworks like Apache Spark.
     """)
 
+    # Button to Navigate to Introduction to Image
+    if st.button("Introduction to Image"):
+        st.session_state.page = "introduction_to_image"
+
+    # Navigation Button
     if st.button("Back to Data Collection"):
         st.session_state.page = "data_collection"
 
+
+
+
 # ----------------- Semi-Structured Data Page -----------------
 def semi_structured_data_page():
     st.title(":orange[Semi-Structured Data]")
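The new technique list names Sobel and Canny filters for edge detection, but the page's code does not demonstrate either. As an illustration only, the sketch below applies the two 3×3 Sobel kernels to a grayscale grid in plain Python and keeps the gradient magnitude. In practice one would more likely call OpenCV's `cv2.Sobel` or `cv2.Canny`, or Pillow's `ImageFilter.FIND_EDGES` (which uses a different fixed kernel). The `sobel_magnitude` helper and the `step` test image are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) intensity changes
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(gray):
    """Per-pixel Sobel gradient magnitude (clipped to 0-255) for a nested-list grayscale image.

    Hypothetical helper for illustration; border pixels are left at 0.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Apply both kernels to the 3x3 neighbourhood centred on (x, y)
            for ky in range(3):
                for kx in range(3):
                    pixel = gray[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * pixel
                    gy += SOBEL_Y[ky][kx] * pixel
            out[y][x] = min(255, round(math.hypot(gx, gy)))
    return out


# Hypothetical 5x5 image: dark left columns, bright right columns (one vertical edge)
step = [[0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(step)
for row in edges:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 255, 255, 0, 0]
# [0, 255, 255, 0, 0]
# [0, 255, 255, 0, 0]
# [0, 0, 0, 0, 0]
```

Only the two columns straddling the dark-to-bright boundary respond, and flat regions give a zero gradient. The loops apply the kernels as a correlation (no kernel flip), which reverses the sign of gx and gy relative to a textbook convolution but leaves the magnitude unchanged. Canny builds on a gradient like this with smoothing, non-maximum suppression, and hysteresis thresholding, none of which this sketch attempts.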