aannor committed · Commit 20329e9 · 0 parents

Duplicate from viewervoice-analytics/dev
.gitattributes ADDED
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
.streamlit/config.toml ADDED
[theme]
base="light"
primaryColor="#2b67e0"
backgroundColor="#eaf8ff"
README.md ADDED
---
emoji: 🤔
sdk: streamlit
duplicated_from: viewervoice-analytics/dev
title: dev
colorFrom: blue
colorTo: green
---
# ViewerVoice

<p align="center">
<img width="620" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/649afd299cc39ae39ce9cc04/xWZdzLZtbtYFJFpYhD_sr.png">
</p>

## 🏁Introduction
Welcome to ViewerVoice, a tool that gives you insight into what viewers have commented under a YouTube video. We created this dashboard to help YouTubers better understand their audience and make data-driven decisions, enabling them to strategise and build their brand, and to help brands understand the audience of creators they may want to partner with.

This dashboard is still under development; further updates will be implemented in due course.

## 🚶‍♂️Walkthrough
### 🗨️Retrieving YouTube Comments
To retrieve comments we use the YouTube Data API, which provides various functionalities, including retrieving YouTube comments. At present we do not read in comment replies. Please refer to the instructions at the link below to acquire your unique API key, and be aware that each API key facilitates up to 10,000 API calls within a 24-hour period.
https://developers.google.com/youtube/v3/getting-started
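Once you have a key, retrieval comes down to pulling the video ID out of the URL and passing it to the API's `commentThreads().list` endpoint. A minimal, stdlib-only sketch of the ID-extraction step (a hypothetical helper for illustration, not the app's actual code):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Pull the video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if parsed.hostname and "youtube.com" in parsed.hostname:
        if parsed.path == "/watch":
            # Standard links carry the ID in the ?v= query parameter
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith(("/shorts/", "/embed/")):
            return parsed.path.split("/")[2]
    return None

# The extracted ID is then what a YouTube Data API client passes to
# youtube.commentThreads().list(part="snippet", videoId=..., maxResults=100)
print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```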
### 👩‍💻Implementing Natural Language Processing Models
ViewerVoice covers the following areas of NLP, driven by BERT-based models.

- **Topic modelling** groups the comments into topics so that you can gain clearer insights into the key themes viewers are discussing in the comment section. For optimal performance of the current topic model, we recommend retrieving thousands of comments.

- **Sentiment analysis** identifies whether comments are positive, negative or neutral, giving you an overview of how viewers feel about aspects of the content. Please note that at present, the sentiment analysis does not take emojis into account.

- **Semantic search** allows you to search not only for comments that contain a word exactly, but also for comments that contain similar words. For example, if you search for 'music', comments containing 'music' as well as 'song' and 'vinyl' can show up. This feature lets you search for specifics in a comment section: perhaps you are a YouTuber who has collaborated with another creator or advertised for a partner brand, and you want to search for that creator or brand in your comments to see how your viewers reacted.
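Semantic search of this kind typically embeds the query and every comment, then ranks comments by cosine similarity to the query. A toy sketch with hand-made 3-dimensional vectors standing in for real sentence embeddings (the app itself uses a SentenceTransformer model; the vectors and names here are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_rank(query_vec, comment_vecs, comments, top_k=2):
    """Return the top_k comments whose embeddings are closest to the query embedding."""
    scored = sorted(zip(comments, comment_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]), reverse=True)
    return [comment for comment, _ in scored[:top_k]]

# A 'music' query vector sits close to the 'song' and 'vinyl' comments in this toy space
query = [0.9, 0.1, 0.0]
comments = ["love this song", "great vinyl rip", "first!"]
vecs = [[0.8, 0.2, 0.1], [0.7, 0.3, 0.0], [0.0, 0.1, 0.9]]
print(semantic_rank(query, vecs, comments))  # ['love this song', 'great vinyl rip']
```

The same ranking works unchanged at scale; only the embedding step (model inference instead of hand-made vectors) differs.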
### 📊Creating a Streamlit WebApp
Streamlit is an open-source Python library that enables the creation and sharing of data-driven web applications. Leveraging Streamlit's versatile functions, we've tailored our dashboard to cover a range of essential features, such as:

- Interactive input widgets: allowing users to query their preferred videos and apply personalised filters.
- Comprehensive Python graphs: seamlessly integrating a variety of visualisations into the app.
- Custom HTML and CSS: offering the freedom to fine-tune the application's style to our preferences.
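The interactive widgets ultimately just feed filter selections into the comment table. A stdlib-only sketch of that filtering step, with hypothetical field names (the app applies the same idea to a pandas DataFrame):

```python
def filter_comments(comments, topics=None, sentiments=None):
    """Keep comments matching the selected topics/sentiments; an empty or None filter passes everything."""
    kept = []
    for comment in comments:
        if topics and comment["topic"] not in topics:
            continue
        if sentiments and comment["sentiment"] not in sentiments:
            continue
        kept.append(comment)
    return kept

comments = [
    {"text": "great edit", "topic": "editing", "sentiment": "positive"},
    {"text": "audio is off", "topic": "audio", "sentiment": "negative"},
]
print(filter_comments(comments, sentiments=["positive"]))
```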
## 🔍Results
We hope the insights provided by ViewerVoice will enable YouTubers to cater to their audience and use data-derived observations to negotiate opportunities, and will aid brands in gaining a deeper understanding of the intended audience of prospective partnerships. Future developments include, but are not limited to, improving the UI/UX design and implementing LLMs to give enhanced and more comprehensible results.
app.py ADDED
import os
from git import Repo
import streamlit as st
import time
from PIL import Image
import base64
from transformers import pipeline
import spacy
import googleapiclient
import numpy as np
from sentence_transformers import SentenceTransformer
from matplotlib import colormaps
from matplotlib.colors import ListedColormap

GITHUB_PAT = os.environ['GITHUB']
SENTIMENT = os.environ['SENTIMENT']
EMBEDDING = os.environ['EMBEDDING']

if not os.path.exists('repo_directory'):
    try:
        Repo.clone_from(f'https://marcus-t-s:{GITHUB_PAT}@github.com/marcus-t-s/yt-comment-analyser.git',
                        'repo_directory')
    except Exception:
        st.error("Error: Oops there's an issue on our end, please wait a moment and try again.")
        st.stop()

from repo_directory.all_utils import *


# Streamlit configuration
st.set_page_config(
    page_title="ViewerVoice | YouTube Comment Analyser",
    layout="wide",
    page_icon=Image.open('page_icon.png')
)


# Define and load cached resources
@st.cache_resource
def load_models():
    sentiment_pipeline = pipeline("sentiment-analysis", model=SENTIMENT)
    embedding_model = SentenceTransformer(EMBEDDING)
    spacy_nlp = spacy.load("en_core_web_sm")
    add_custom_stopwords(spacy_nlp, {"bring", "know", "come"})
    return sentiment_pipeline, embedding_model, spacy_nlp


@st.cache_resource
def load_colors_image():
    mask = np.array(Image.open('youtube_icon.jpg'))
    Reds = colormaps['Reds']
    colors = ListedColormap(Reds(np.linspace(0.4, 0.8, 256)))
    with open("viewervoice_logo_crop.png", "rb") as img_file:
        logo_image = base64.b64encode(img_file.read()).decode("utf-8")
    return mask, colors, logo_image


sentiment_pipeline, embedding_model, spacy_nlp = load_models()
mask, colors, logo_image = load_colors_image()
# Hide line at the top and "made with streamlit" text
hide_decoration_bar_style = """
<style>
header {visibility: hidden;}
footer {visibility: hidden;}
</style>
"""
st.markdown(hide_decoration_bar_style, unsafe_allow_html=True)

main_page = st.container()

if 'YouTubeParser' not in st.session_state:
    st.session_state['YouTubeParser'] = YoutubeCommentParser()
if 'comment_fig' not in st.session_state:
    st.session_state["comment_fig"] = None
    st.session_state["wordcloud_fig"] = None
    st.session_state["topic_fig"] = None
    st.session_state["sentiment_fig"] = None
if 'rerun_button' not in st.session_state:
    st.session_state['rerun_button'] = "INIT"
if 'topic_filter' not in st.session_state:
    st.session_state['topic_filter'] = False
if 'sentiment_filter' not in st.session_state:
    st.session_state['sentiment_filter'] = False
if 'filter_state' not in st.session_state:
    st.session_state['filter_state'] = "INIT"
if 'video_link' not in st.session_state:
    st.session_state["video_link"] = None
if 'num_comments' not in st.session_state:
    st.session_state['num_comments'] = None

# Set reference to YouTubeParser object for more concise code
yt_parser = st.session_state['YouTubeParser']


def query_comments_button():
    # Delete larger objects from session state to later replace
    del st.session_state["comment_fig"]
    del st.session_state["wordcloud_fig"]
    del st.session_state["topic_fig"]
    del st.session_state["sentiment_fig"]
    del st.session_state["YouTubeParser"]

    # Reset session state variables back to placeholder values
    st.session_state.rerun_button = "QUERYING"
    st.session_state['filter_state'] = "INIT"
    st.session_state["topic_filter"] = False
    st.session_state["sentiment_filter"] = False
    st.session_state["semantic_filter"] = False
    st.session_state["figures_built"] = False
    st.session_state["comment_fig"] = None
    st.session_state["wordcloud_fig"] = None
    st.session_state["topic_fig"] = None
    st.session_state["sentiment_fig"] = None
    st.session_state["YouTubeParser"] = YoutubeCommentParser()


def filter_visuals_button():
    st.session_state["filter_state"] = "FILTERING"


with st.sidebar:
    st.session_state["api_key"] = st.text_input('YouTube API key', value="", type='password')
    st.session_state["video_link"] = st.text_input('YouTube Video URL', value="")
    st.session_state["max_comments"] = st.slider(label="Maximum number of comments to query",
                                                 min_value=100,
                                                 max_value=3000,
                                                 step=100)
    st.session_state["max_topics"] = st.slider(label="Maximum number of topics",
                                               min_value=5,
                                               max_value=20,
                                               step=1)
    st.button('Query comments :left_speech_bubble:', on_click=query_comments_button)

with main_page:
    # Reduce space at the top
    reduce_header_height_style = """
    <style>
    div.block-container {padding-top:0rem;}
    div.block-container {padding-bottom:1rem;}
    div.block-container {padding-left:1.5rem;}
    </style>
    """
    st.markdown(reduce_header_height_style, unsafe_allow_html=True)

    # Title and intro section
    markdown_content = f"""
    <div style='display: flex; align-items: center; justify-content: center;'>
        <img src='data:image/png;base64,{logo_image}' height='135px';/>
    </div>
    """
    st.markdown(markdown_content, unsafe_allow_html=True)

    # LinkedIn links
    lnk = '<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">'
    st.markdown(lnk + """
    <div style="display: flex; justify-content: center; align-items: center; flex-direction: column;">
    <br>
    <p style="text-align: center;"><b>Made by</b>
    <b>
    <a href='https://www.linkedin.com/in/afiba-7715ab166/' style="text-decoration: none">
    <i class='fa fa-linkedin-square'></i>&nbsp;<span style='color: #000000'>Afiba Annor</span></a>
    <a href='https://www.linkedin.com/in/marcus-singh-305927172/' style="text-decoration: none">
    <i class='fa fa-linkedin-square'></i>&nbsp;<span style='color: #000000'>Marcus Singh</span></a>
    </b></p>
    </div>
    """, unsafe_allow_html=True)

    st.markdown("<hr>", unsafe_allow_html=True)

    # Notes section
    st.markdown("<p style='font-size: 1.3rem;'><b>📝 Notes</b></p>", unsafe_allow_html=True)

    html_content = """
    <ul>
        <li style='font-size: 0.95rem;'>This dashboard is still under development; further updates will be implemented
            in due course.</li>
        <li style='font-size: 0.95rem;'>Kindly refer to the instructions provided in this
            <a href='https://developers.google.com/youtube/v3/getting-started'>link</a>.
            This will guide you in acquiring your API key to retrieve comments.</li>
        <li style='font-size: 0.95rem;'>Please be aware that each API key facilitates up to 10,000 API calls within
            a 24-hour period.</li>
        <li style='font-size: 0.95rem;'>Currently, the dashboard caters to comments in English and does not
            include comment replies.</li>
        <li style='font-size: 0.95rem;'>Comments undergo cleaning and pre-processing to optimise modelling. As a result,
            the returned comment count may fall short of the maximum queried amount.</li>
        <li style='font-size: 0.95rem;'>Please note that the sentiment analysis currently does not take emojis into
            account.</li>
        <li style='font-size: 0.95rem;'>For optimal performance of the current topic model, we recommend retrieving
            thousands of comments.</li>
        <li style='font-size: 0.95rem;'>Please anticipate that querying comments and running the models may require
            a few minutes to complete.</li>
    </ul>
    <hr>
    """
    # Display the HTML content using st.markdown()
    st.markdown(html_content, unsafe_allow_html=True)

    # Query comments section
    if (st.session_state.rerun_button == "QUERYING") and (st.session_state["video_link"] is not None):
        with st.spinner('Querying comments and running models'):
            yt_parser = st.session_state["YouTubeParser"]
            try:
                yt_parser.build_youtube_api(st.session_state['api_key'])
            except Exception:
                st.error("Error: Unable to query comments, please check your API key")
                st.stop()
            try:
                yt_parser.query_comments(st.session_state['video_link'], st.session_state['max_comments'])
            except googleapiclient.errors.HttpError:
                st.error("Error: Unable to query comments, please check your API key.")
                st.stop()
            except Exception:
                st.error("Error: Unable to query comments, incorrect YouTube URL or maximum "
                         "API call limit reached.")
                st.stop()

            # Run formatting and models
            yt_parser.format_comments()
            yt_parser.clean_comments()
            yt_parser.run_sentiment_pipeline(sentiment_pipeline)
            yt_parser.run_topic_modelling_pipeline(embedding_model,
                                                   nlp=spacy_nlp,
                                                   max_topics=st.session_state['max_topics'])
            # Set "QUERY COMPLETE" to bypass running this section on script re-run
            st.session_state.rerun_button = "QUERY COMPLETE"

    # Once comments are queried, build charts ready to visualise
    if st.session_state.rerun_button == "QUERY COMPLETE":
        # Check for built figures:
        if (not st.session_state["figures_built"]) or (st.session_state.filter_state == "FILTERING"):

            # If filtering button pressed
            if st.session_state.filter_state == "FILTERING":
                df_filtered = yt_parser.df_comments.copy()
                if st.session_state["topic_filter"]:
                    df_filtered = df_filtered.query(f"Topic == {st.session_state.topic_filter}")
                if st.session_state["sentiment_filter"]:
                    df_filtered = df_filtered.query(f"Sentiment == {st.session_state.sentiment_filter}")
                if st.session_state["semantic_filter"]:
                    df_filtered = semantic_search(df=df_filtered, query=st.session_state["semantic_filter"],
                                                  embedding_model=embedding_model,
                                                  text_col='Comment_Clean')
                if len(df_filtered) == 0:
                    st.session_state['num_comments'] = 0
                else:
                    st.session_state['num_comments'] = len(df_filtered)
                    # Build filtered table figure
                    st.session_state["table_fig"] = comments_table(df_filtered,
                                                                   ['publishedAt', 'Comment_Formatted', 'Likes',
                                                                    'Sentiment', 'Topic'],
                                                                   {'publishedAt': 'Date',
                                                                    'Comment_Formatted': 'Comment'})
                    # Build filtered wordcloud figure
                    st.session_state["wordcloud_fig"] = comment_wordcloud(df_filtered, mask, colors)

                    # Build filtered topic figure
                    st.session_state["topic_fig"] = topic_treemap(df_filtered, "Topic")

                    # Build filtered sentiment figure
                    st.session_state["sentiment_fig"] = sentiment_chart(df_filtered, "Sentiment")

                    st.session_state["figures_built"] = True

                st.session_state.filter_state = "FILTERED"

            # No filtering selected
            else:
                st.session_state['num_comments'] = len(yt_parser.df_comments)

                # Can only build graphs if we have comments
                if st.session_state['num_comments'] > 0:
                    try:
                        # Build unfiltered table figure
                        st.session_state["table_fig"] = comments_table(yt_parser.df_comments,
                                                                       ['publishedAt', 'Comment_Formatted', 'Likes',
                                                                        'Sentiment', 'Topic'],
                                                                       {'publishedAt': 'Date',
                                                                        'Comment_Formatted': 'Comment'})
                        # Build unfiltered wordcloud figure
                        st.session_state["wordcloud_fig"] = comment_wordcloud(yt_parser.df_comments,
                                                                              mask, colors)
                        # Build unfiltered topic figure
                        st.session_state["topic_fig"] = topic_treemap(yt_parser.df_comments, "Topic")
                        # Build unfiltered sentiment figure
                        st.session_state["sentiment_fig"] = sentiment_chart(yt_parser.df_comments, "Sentiment")

                        st.session_state["figures_built"] = True
                    except Exception:
                        st.error("Error: Oops there's an issue on our end, please wait a moment and try again.")
                        st.stop()

with main_page:
    if st.session_state.rerun_button == "QUERY COMPLETE":
        st.subheader(yt_parser.title)
        st.markdown("<hr><br>", unsafe_allow_html=True)

        if st.session_state['num_comments'] > 0:
            table_col, word_cloud_col = st.columns([0.55, 0.45])
            with table_col:
                st.markdown("""<p style='font-size: 1.3rem;
                            display: flex; align-items: center; justify-content: center;'><b>
                            Comments</b></p>""", unsafe_allow_html=True)
                st.plotly_chart(st.session_state["table_fig"], use_container_width=True)

            with word_cloud_col:
                st.markdown("""<p style='font-size: 1.3rem;
                            display: flex; align-items: center; justify-content: center;'><b>
                            Word Cloud</b></p>""", unsafe_allow_html=True)
                st.pyplot(st.session_state["wordcloud_fig"], use_container_width=True)

            treemap_col, sentiment_donut_col = st.columns([0.55, 0.45])

            with treemap_col:
                st.markdown("""<p style='font-size: 1.3rem;
                            display: flex; align-items: center; justify-content: center;'><b>
                            Topic Proportions</b></p>""", unsafe_allow_html=True)
                st.plotly_chart(st.session_state["topic_fig"], use_container_width=True)

            with sentiment_donut_col:
                st.markdown("""<p style='font-size: 1.3rem;
                            display: flex; align-items: center; justify-content: center;'><b>
                            Sentiment Distribution</b></p>""", unsafe_allow_html=True)
                st.plotly_chart(st.session_state["sentiment_fig"], use_container_width=True)

            # st.table(yt_parser.df_comments.head())
        else:
            st.write("Unfortunately we couldn't find any comments for this set of filters, please "
                     "edit the filters and try again")

with st.sidebar:
    # Define the HTML and CSS for the button-style container
    if st.session_state['num_comments'] is not None:
        num_comments = st.session_state['num_comments']
    else:
        num_comments = 0
    htmlstr = f"""
        <p style='background-color: rgb(255, 255, 255, 0.75);
                  color: rgb(0, 0, 0, 0.75);
                  font-size: 40px;
                  border-radius: 7px;
                  padding-top: 25px;
                  padding-bottom: 25px;
                  padding-right: 25px;
                  padding-left: 25px;
                  line-height: 25px;
                  display: flex;
                  align-items: center;
                  justify-content: center;
                  box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);'>
        &nbsp;{num_comments}</p>
        """
    # Display the button-style container with number of comments
    st.subheader("Number of comments")
    st.markdown(htmlstr, unsafe_allow_html=True)

    # Filters section
    st.subheader("Filters")

    if yt_parser.df_comments is not None:
        st.session_state["topic_filter"] = st.multiselect("Topic",
                                                          options=sorted(list(yt_parser.df_comments['Topic'].unique())))
        st.session_state["sentiment_filter"] = st.multiselect("Sentiment",
                                                              options=list(yt_parser.df_comments['Sentiment'].unique()))
        st.session_state["semantic_filter"] = st.text_input("Keyword search", max_chars=30)
        st.button('Filter visualisations :sleuth_or_spy:', on_click=filter_visuals_button)
    else:
        st.multiselect("Topic",
                       options=["Please query comments from a video"],
                       disabled=True)
        st.multiselect("Sentiment",
                       options=["Please query comments from a video"],
                       disabled=True)
        st.text_input("Keyword search",
                      disabled=True)
        st.button('Please query comments before filtering',
                  disabled=True)
page_icon.png ADDED
requirements.txt ADDED
streamlit
numpy==1.24.4
urllib3==1.26.16
beautifulsoup4==4.12.2
git+https://github.com/scikit-learn-contrib/hdbscan.git
bertopic==0.15.0
contractions==0.1.73
google_api_python_client==2.96.0
pandas==1.5.3
python-dotenv==1.0.0
scikit_learn==1.2.2
sentence_transformers==2.2.2
spacy==3.6.1
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0.tar.gz
torch==2.0.1
transformers==4.31.0
umap==0.1.1
umap_learn==0.5.3
wordcloud==1.9.2
# Building hdbscan requires the Microsoft C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/
viewervoice_logo_crop.png ADDED
youtube_icon.jpg ADDED