m7n committed on
Commit 887a63e · 1 Parent(s): 6a831d2

added tier separation

Files changed (1):
  1. app.py +52 -9

app.py CHANGED
@@ -34,7 +34,10 @@ plt.style.use("opinionated_rc")
 
 from sklearn.neighbors import NearestNeighbors
 
-
+def is_running_in_hf_zero_gpu():
+    print(os.environ.get("SPACES_ZERO_GPU"))
+    return os.environ.get("SPACES_ZERO_GPU")
+
 def is_running_in_hf_space():
     return "SPACE_ID" in os.environ
 
@@ -134,10 +137,27 @@ def no_op_decorator(func):
 
 
 if is_running_in_hf_space():
-    @spaces.GPU(duration=4*60)
-    def create_embeddings(texts_to_embedd):
+    @spaces.GPU(duration=30)
+    def create_embeddings_30(texts_to_embedd):
+        """Create embeddings for the input texts using the loaded model."""
+        return model.encode(texts_to_embedd, show_progress_bar=True, batch_size=192)
+
+    @spaces.GPU(duration=59)
+    def create_embeddings_59(texts_to_embedd):
+        """Create embeddings for the input texts using the loaded model."""
+        return model.encode(texts_to_embedd, show_progress_bar=True, batch_size=192)
+
+    @spaces.GPU(duration=120)
+    def create_embeddings_120(texts_to_embedd):
         """Create embeddings for the input texts using the loaded model."""
         return model.encode(texts_to_embedd, show_progress_bar=True, batch_size=192)
+
+    @spaces.GPU(duration=299)
+    def create_embeddings_299(texts_to_embedd):
+        """Create embeddings for the input texts using the loaded model."""
+        return model.encode(texts_to_embedd, show_progress_bar=True, batch_size=192)
+
+
 else:
     def create_embeddings(texts_to_embedd):
         """Create embeddings for the input texts using the loaded model."""
@@ -177,8 +197,15 @@ def predict(request: gr.Request, text_input, sample_size_slider, reduce_sample_c
         payload = f"{payload}{'=' * ((4 - len(payload) % 4) % 4)}"
         payload = json.loads(base64.urlsafe_b64decode(payload).decode())
         print(payload)
-    else:
-        pass
+        user = payload['user']
+        if user == None:
+            user_type = "anonymous"
+        elif '[pro]' in user:
+            user_type = "pro"
+        else:
+            user_type = "registered"
+        print(f"User type: {user_type}")
+
 
     # Check if input is empty or whitespace
     print(f"Input: {text_input}")
@@ -256,7 +283,20 @@ def predict(request: gr.Request, text_input, sample_size_slider, reduce_sample_c
     progress(0.3, desc="Embedding Data...")
     texts_to_embedd = [f"{title} {abstract}" for title, abstract
                        in zip(records_df['title'], records_df['abstract'])]
-    embeddings = create_embeddings(texts_to_embedd)
+
+
+    if is_running_in_hf_space():
+        if len(texts_to_embedd) < 2000:
+            embeddings = create_embeddings_30(texts_to_embedd)
+        elif len(texts_to_embedd) < 4000 or user_type == "anonymous":
+            embeddings = create_embeddings_59(texts_to_embedd)
+        elif len(texts_to_embedd) < 8000:
+            embeddings = create_embeddings_120(texts_to_embedd)
+        else:
+            embeddings = create_embeddings_299(texts_to_embedd)
+    else:
+        embeddings = create_embeddings(texts_to_embedd)
+
     print(f"Embeddings created in {time.time() - embedding_start:.2f} seconds")
 
     # Project embeddings
@@ -540,10 +580,9 @@ with gr.Blocks(theme=theme, css="""
 
     OpenAlex Mapper is a way of projecting search queries from the amazing OpenAlex database onto a background map of randomly sampled papers from OpenAlex, which allows you to easily investigate interdisciplinary connections. OpenAlex Mapper was developed by [Maximilian Noichl](https://maxnoichl.eu) and [Andrea Loettgers](https://unige.academia.edu/AndreaLoettgers) at the [Possible Life project](http://www.possiblelife.eu/).
 
-    To use OpenAlex Mapper, first head over to [OpenAlex](https://openalex.org/) and search for something that interests you. For example, you could search for all the papers that make use of the [Kuramoto model](https://openalex.org/works?page=1&filter=default.search%3A%22Kuramoto%20Model%22), for all the papers that were published by researchers at [Utrecht University in 2019](https://openalex.org/works?page=1&filter=authorships.institutions.lineage%3Ai193662353,publication_year%3A2019), or for all the papers that cite Wittgenstein's [Philosophical Investigations](https://openalex.org/works?page=1&filter=cites%3Aw4251395411). Then copy the URL of that search query into the OpenAlex search URL box below and click "Run Query." It will download all of these records from OpenAlex and embed them on our interactive map. As the embedding step is computationally a little expensive, it's often a good idea to play around with smaller samples before running a larger analysis. After a little while, the map will appear and be available for you to interact with and download. You can find more explanations in the FAQs below.
-
-    **Note:** Due to some bugs in Gradio, this project currently does not work in the Safari browser.
+    To use OpenAlex Mapper, first head over to [OpenAlex](https://openalex.org/) and search for something that interests you. For example, you could search for all the papers that make use of the [Kuramoto model](https://openalex.org/works?page=1&filter=default.search%3A%22Kuramoto%20Model%22), for all the papers that were published by researchers at [Utrecht University in 2019](https://openalex.org/works?page=1&filter=authorships.institutions.lineage%3Ai193662353,publication_year%3A2019), or for all the papers that cite Wittgenstein's [Philosophical Investigations](https://openalex.org/works?page=1&filter=cites%3Aw4251395411). Then copy the URL of that search query into the OpenAlex search URL box below and click "Run Query." It will download all of these records from OpenAlex and embed them on our interactive map. As the embedding step is computationally a little expensive, it's often a good idea to play around with smaller samples before running a larger analysis (see below for a note on sample size and run-time). After a little while, the map will appear and be available for you to interact with and download. You can find more explanations in the FAQs below.
     </div>
+
     """)
 
 
@@ -637,6 +676,10 @@ with gr.Blocks(theme=theme, css="""
 
     The base map for this project is developed by randomly downloading 250,000 articles from OpenAlex, then embedding their abstracts using our [fine-tuned](https://huggingface.co/m7n/discipline-tuned_specter_2_024) version of the [specter-2](https://huggingface.co/allenai/specter2_aug2023refresh_base) language model, running these embeddings through [UMAP](https://umap-learn.readthedocs.io/en/latest/) to give us a two-dimensional representation, and displaying that in an interactive window using [datamapplot](https://datamapplot.readthedocs.io/en/latest/index.html). After the data for your query is downloaded from OpenAlex, it undergoes exactly the same process, except that the pre-trained UMAP model from earlier is used to project your new data points onto the original map, showing where they would appear if they had been included in the original sample. For more details, take a look at the method section of this paper: **...**
 
+    ## I'm getting an "out of GPU credits" error.
+
+    Running the embedding process requires an expensive A100 GPU. To provide this, we make use of HuggingFace's ZeroGPU service. As an anonymous user, you are entitled to one minute of GPU runtime per day, which is enough for several small queries of around a thousand records. If you create a free account on HuggingFace, this increases to five minutes of runtime, allowing you to run queries of up to 10,000 records at a time. If you need more, you can either buy a HuggingFace Pro subscription for roughly ten dollars a month (which entitles you to 25 minutes of runtime every day) or get in touch with us to run the pipeline outside of the HuggingFace environment.
+
     ## I want to add multiple queries at once!
 
     That can be a good idea, e.g. if you're interested in a specific paper as well as all the papers that cite it. Just add the queries to the query box and separate them with a ";" without any spaces in between!
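The hunk context above mentions a `no_op_decorator(func)` helper, which points at the usual ZeroGPU pattern: apply `spaces.GPU` inside a Space and fall back to a pass-through decorator everywhere else, so the same code runs locally. A minimal sketch of that pattern, assuming the standard `spaces.GPU(duration=...)` usage; the `embed` function and its body are hypothetical stand-ins, not the app's real model call:

```python
import os

def no_op_decorator(func):
    """Pass-through used when spaces.GPU is unavailable (local runs)."""
    return func

def is_running_in_hf_space():
    return "SPACE_ID" in os.environ

if is_running_in_hf_space():
    import spaces  # only importable inside a HF Space
    gpu_decorator = spaces.GPU(duration=30)
else:
    gpu_decorator = no_op_decorator

@gpu_decorator
def embed(texts):
    # Hypothetical stand-in for the real model.encode() call.
    return [t.lower() for t in texts]
```

Outside a Space, `embed` is the undecorated function, so local development needs no GPU quota at all.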
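The `predict` hunk re-pads the urlsafe-base64 payload before decoding: `'=' * ((4 - len(payload) % 4) % 4)` appends exactly enough `=` characters to bring the string length back to a multiple of four, which `base64.urlsafe_b64decode` requires. A self-contained sketch of that step; the `decode_payload` helper name is ours, not the app's:

```python
import base64
import json

def decode_payload(payload: str) -> dict:
    """Re-pad a urlsafe-base64 string to a multiple of 4, then decode
    it as JSON, mirroring the expression used in predict() above."""
    payload = f"{payload}{'=' * ((4 - len(payload) % 4) % 4)}"
    return json.loads(base64.urlsafe_b64decode(payload).decode())

# Simulate a token whose padding was stripped in transit:
token = base64.urlsafe_b64encode(
    json.dumps({"user": None}).encode()
).decode().rstrip("=")
```

The inner `% 4` makes the expression a no-op when the string is already correctly padded, so already-padded tokens decode unchanged.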
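The new dispatch in `predict` picks a ZeroGPU duration tier from the query size and the user type. Restated as a pure helper for clarity: the thresholds, durations, and user types come straight from the diff above, but `pick_embedding_tier` itself is a hypothetical function that does not exist in the app:

```python
def pick_embedding_tier(n_texts: int, user_type: str) -> int:
    """Return the GPU duration (seconds) the app would request,
    mirroring the if/elif chain in predict() above."""
    if n_texts < 2000:
        return 30
    elif n_texts < 4000 or user_type == "anonymous":
        return 59   # anonymous users are capped at the 59s tier
    elif n_texts < 8000:
        return 120
    else:
        return 299
```

Note the ordering: because the `user_type == "anonymous"` test sits in the second branch, anonymous users never reach the 120s or 299s tiers, however large their query.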