davidmezzetti committed on
Commit 6cf5129 · verified · 1 Parent(s): 595da74

Update app.py

Files changed (1)
  1. app.py +8 -4
app.py CHANGED
@@ -54,6 +54,7 @@ class Application:
  # Calculate stat columns
  data["n_flagged_percent"] = 100 * (data["n_stars_flagged"] / data["n_stars"])
 
+ # Rename and organize columns
  data.columns = ["repo", "month", "clustered", "low activity", "total stars", "flagged stars", "flagged %"]
  return data[["repo", "month", "clustered", "low activity", "flagged stars", "total stars", "flagged %"]]
 
@@ -100,6 +101,9 @@ class Application:
  def parse(self, repos):
  """
  Parses and cleans the input repos string.
+
+ Returns:
+ list of repos
  """
 
  outputs = []
@@ -143,17 +147,17 @@ _4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Co
 
  _[Paper](https://arxiv.org/abs/2412.13459) | [GitHub Project](https://github.com/hehao98/StarScout)_
 
- Note the disclaimer from the paper's author's.
+ Note the disclaimer from the paper's authors.
 
  **Disclaimer**. _As we discussed in Section 3.4 and 3.5 in our paper, the resulting dataset are only repositories and users with suspected
  fake stars. The individual repositories and users in our dataset may be false positives. The main purpose of our dataset is for statistical
  analyses (which tolerates noises reasonably well), not for publicly shaming individual repositories. If you intend to publish subsequent work
  based on our dataset, please be aware of this limitation and its ethical implications._
 
- To add to the author's disclaimer.
+ To add to the authors disclaimer.
 
- _It's also worth noting that projects that trend on popular sites and the GitHub trending page tend to attract lots of automated behavior.
- This is just a data point that shouldn't be used in a vacuum._
+ _It's also worth noting that projects that trend on popular sites such as the GitHub Trending Page can attract a lot of automated behavior outside
+ of a project's control. This dataset is just a data point that shouldn't be used in a vacuum._
  """
  )
 
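The first hunk computes a flagged percentage, assigns display names to the columns by position, and then reorders them for output. The following is a minimal sketch of those three steps using plain dictionaries so that it runs without pandas. Only `n_stars`, `n_stars_flagged`, and the seven display names come from the diff; the input keys `n_clustered` and `n_low_activity`, the `summarize` helper, the sample row, and the zero-star guard are assumptions made for illustration.

```python
# Map source keys to the display names used in the diff.
# "n_clustered" and "n_low_activity" are hypothetical source names.
RENAMES = {
    "repo": "repo",
    "month": "month",
    "n_clustered": "clustered",
    "n_low_activity": "low activity",
    "n_stars": "total stars",
    "n_stars_flagged": "flagged stars",
}

# Output order taken from the `return data[[...]]` line in the diff.
ORDER = ["repo", "month", "clustered", "low activity", "flagged stars", "total stars", "flagged %"]


def summarize(rows):
    """Adds the flagged percentage, renames keys, and orders the columns."""

    table = []
    for row in rows:
        total = row["n_stars"]

        # Same formula as the diff; the guard for zero stars is an assumption
        percent = 100 * (row["n_stars_flagged"] / total) if total else 0.0

        # Rename by key rather than by position
        renamed = {RENAMES[key]: value for key, value in row.items()}
        renamed["flagged %"] = percent

        # Emit the columns in display order
        table.append({name: renamed[name] for name in ORDER})

    return table


# Hypothetical sample data
rows = [
    {
        "repo": "owner/project",
        "month": "2024-01",
        "n_clustered": 30,
        "n_low_activity": 20,
        "n_stars": 200,
        "n_stars_flagged": 50,
    }
]

for line in summarize(rows):
    print(line)
```

The sketch renames by key, whereas `data.columns = [...]` in the diff assigns names positionally, so the list there has to match the existing column order exactly. In pandas, `DataFrame.rename(columns=...)` is the name-based alternative if the source column order may change.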