Keane Moraes committed
Commit 04b8ab3 · 1 Parent(s): 9a105fd

fixed requirements

Files changed (4)
  1. app.py +4 -3
  2. prompter/insights_33.prompt +0 -41
  3. requirements.txt +4 -2
  4. utils.py +9 -3
app.py CHANGED
@@ -11,6 +11,9 @@ file2 = st.file_uploader("Upload a file", type=["md", "txt"], key="second")
 topics = {}
 results = {}
 
+embedder = utils.load_model()
+nlp = utils.load_nlp()
+
 if file1 is not None and file2 is not None:
 
     input_text1 = file1.read().decode("utf-8")
@@ -32,9 +35,7 @@ if file1 is not None and file2 is not None:
     topics['insight2'] = [keywords2, concepts2]
 
     with st.spinner("Flux capacitor is fluxing..."):
-        embedder = utils.load_model()
-        clutered = utils.cluster_based_on_topics(embedder, cleaned_text1, cleaned_text2, num_clusters=5)
-        # print(clutered)
+        clutered = utils.cluster_based_on_topics(nlp, embedder, cleaned_text1, cleaned_text2, num_clusters=3)
 
     with st.spinner("Polishing up"):
         results = utils.generate_insights(topics, file1.name, file2.name, cleaned_text1, cleaned_text2, clutered)
prompter/insights_33.prompt DELETED
@@ -1,41 +0,0 @@
-You are a highly intelligent bot that is tasked with common ideas between documents. The following are two documents that have been topic modelled and have been clustered based on concepts.
-
-The name for document 1 is : Good conversations have lots of doorknobs.md
-
-The name for document 2 is : First we shape our social graph; then it shapes us.md
-
-The topics for document 1 is : spiderman,singing,musical,chorus,song,singing something,about spiderman,spiderman spiderman,spiderman,spiderman and
-
-The topics for document 2 is : chimpanzees,genetically,womb,consciously,upbringings,from chimpanzees,chimpanzees as,chimpanzees in,chimpanzees and,chimpanzees
-
-The more complex concepts in document 1 is : singing like spiderman,spiderman sudden pianist,songs spiderman scientific,just songs spiderman,excerpt spiderman boyfriend
-
-The more complex concepts in document 2 is : chimpanzees born habitat,die chimpanzees born,sets apart chimpanzees,fast die chimpanzees,old children chimpanzees
-
-The sentences in one of the clusters is : ask remove mask spot really hard, trick kept us afloat called “take-and-take focus,” meaning whoever singing keep going someone jumped take spotlight them, happen quickly often.
-it’s easy remember lonely feels taker refuses cede spotlight you, easy forget lovely feels don’t want spotlight taker lets recline mezzanine fill stage.
-it’s often unclear, stand around waiting someone else take turn invite us take ours.
-we’re standing perimeter empty dance circle, takers martyrs launch middle .
-
-From the sentences and topics above, explain the common idea between the documents and write a paragraph about it and give me 3 new concepts that are linked to this idea.
-You output format should be:
-
-"""
-name: <FILL-CONCEPT-NAME-HERE>
-description: <FILL-CONCEPT-DESCRIPTION-HERE>
-related:
-- <FILL-RELATED-CONCEPT-1>
-- <FILL-RELATED-CONCEPT-2>
-- <FILL-RELATED-CONCEPT-3>
-"""
-
-The common idea between the documents is the importance of collaboration and teamwork. In the first document, the idea of collaboration is explored in the context of music, with the chorus singing together to create a beautiful song. In the second document, the idea of collaboration is explored in the context of chimpanzees, with the idea that they work together to survive and thrive in their environment.
-
-The concept of collaboration is an important one, and it is essential for any group of individuals to work together to achieve a common goal.
-
-name: Group Dynamics
-description: Group dynamics is the study of how people interact in groups and how their behavior affects the group as a whole.
-related:
-- Interpersonal Relationships
-- Social Interaction
-- Conflict Resolution
requirements.txt CHANGED
@@ -1,8 +1,10 @@
 keybert==0.7.0
-mdforest==1.5.0
+mdforest==1.5.4
 nltk==3.8.1
+numpy==1.23.5
 openai==0.27.2
-pandas==1.5.3
+scikit_learn==1.2.2
+en_core_web_sm
 sentence_transformers==2.2.2
 spacy==3.5.2
 streamlit==1.21.0
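
Every entry in the updated requirements.txt carries an exact `==` pin except `en_core_web_sm`. As a rough illustration only, the sketch below flags unpinned entries. It assumes plain `name==version` lines (no extras, environment markers, or URLs), and `find_unpinned` is a hypothetical helper that is not part of this repository.

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement entries that lack an exact `==` pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# The post-commit contents of requirements.txt, copied from the diff above.
REQUIREMENTS = """\
keybert==0.7.0
mdforest==1.5.4
nltk==3.8.1
numpy==1.23.5
openai==0.27.2
scikit_learn==1.2.2
en_core_web_sm
sentence_transformers==2.2.2
spacy==3.5.2
streamlit==1.21.0
"""

print(find_unpinned(REQUIREMENTS))
```

spaCy's own documentation describes `python -m spacy download en_core_web_sm` as the usual way to install a model package, so a bare name in requirements.txt may deserve a second look in a fresh environment.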
utils.py CHANGED
@@ -5,6 +5,7 @@ from transformers import AutoTokenizer
 import os, re, json
 import openai
 import spacy
+import en_core_web_sm
 from sklearn.cluster import KMeans, AgglomerativeClustering
 import numpy as np
 from sentence_transformers import SentenceTransformer
@@ -27,6 +28,11 @@ def load_model():
     embedder = SentenceTransformer(MODEL)
     return embedder
 
+@st.cache_data
+def load_nlp():
+    nlp = en_core_web_sm.load()
+    return nlp
+
 def create_nest_sentences(document:str, token_max_length = 1023):
     nested = []
     sent = []
@@ -66,9 +72,9 @@ def generate_keywords(kw_model, document: str) -> list:
         final_topics.append(extraction[0])
     return final_topics
 
-def cluster_based_on_topics(embedder, text1:str, text2:str, num_clusters=3):
-    nlp = spacy.load("en_core_web_sm")
-
+
+def cluster_based_on_topics(nlp, embedder, text1:str, text2:str, num_clusters=3):
+
     # Preprocess and tokenize the texts
     doc1 = nlp(preprocess(text1))
     doc2 = nlp(preprocess(text2))
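
The utils.py change loads the spaCy pipeline once in `load_nlp()` and passes it into `cluster_based_on_topics`, rather than calling `spacy.load` inside the function on every call. Below is a minimal, stdlib-only sketch of that load-once-and-inject pattern. It uses `functools.lru_cache` as a stand-in for Streamlit's cache decorator, and `FakePipeline`, `load_pipeline`, and `count_tokens` are hypothetical names chosen for illustration, not code from this repository.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how many times the expensive load actually runs


class FakePipeline:
    """Hypothetical stand-in for a spaCy pipeline; splits text on whitespace."""

    def __call__(self, text: str) -> list[str]:
        return text.split()


@lru_cache(maxsize=1)
def load_pipeline() -> FakePipeline:
    """Build the pipeline once; later calls return the cached instance."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakePipeline()


def count_tokens(nlp, text1: str, text2: str) -> tuple[int, int]:
    """Receive the pipeline as an argument instead of loading it internally."""
    return len(nlp(text1)), len(nlp(text2))


nlp = load_pipeline()
print(count_tokens(nlp, "good conversations have doorknobs", "we shape our graph"))
print(load_pipeline() is nlp, LOAD_COUNT)
```

In a Streamlit app, `st.cache_resource` is the decorator Streamlit documents for shared objects such as ML models, while `st.cache_data` targets serializable data. Which of the two suits `load_nlp()` here is a call for the maintainers.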