Spaces: davidmezzetti/analyzestars

davidmezzetti committed on Dec 20, 2024

Commit 6cf5129 (verified) · 1 parent: 595da74

Update app.py

Files changed (1)

app.py +8 -4

app.py CHANGED

@@ -54,6 +54,7 @@ class Application:
         # Calculate stat columns
         data["n_flagged_percent"] = 100 * (data["n_stars_flagged"] / data["n_stars"])
+        # Rename and organize columns
         data.columns = ["repo", "month", "clustered", "low activity", "total stars", "flagged stars", "flagged %"]
         return data[["repo", "month", "clustered", "low activity", "flagged stars", "total stars", "flagged %"]]
@@ -100,6 +101,9 @@ class Application:
     def parse(self, repos):
         """
         Parses and cleans the input repos string.
+        Returns:
+            list of repos
         """
         outputs = []
@@ -143,17 +147,17 @@ _4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Co
 _[Paper](https://arxiv.org/abs/2412.13459) | [GitHub Project](https://github.com/hehao98/StarScout)_
-Note the disclaimer from the paper's author's.
+Note the disclaimer from the paper's authors.
 **Disclaimer**. _As we discussed in Section 3.4 and 3.5 in our paper, the resulting dataset are only repositories and users with suspected
 fake stars. The individual repositories and users in our dataset may be false positives. The main purpose of our dataset is for statistical
 analyses (which tolerates noises reasonably well), not for publicly shaming individual repositories. If you intend to publish subsequent work
 based on our dataset, please be aware of this limitation and its ethical implications._
-To add to the author's disclaimer.
-_It's also worth noting that projects that trend on popular sites and the GitHub trending page tend to attract lots of automated behavior.
-This is just a data point that shouldn't be used in a vacuum._
+To add to the authors disclaimer.
+_It's also worth noting that projects that trend on popular sites such as the GitHub Trending Page can attract a lot of automated behavior outside
+of a project's control. This dataset is just a data point that shouldn't be used in a vacuum._
 """
     )
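The first hunk only adds a comment, but the code around it does two things worth spelling out. It computes `n_flagged_percent` as the share of flagged stars. It then renames columns by assigning to `data.columns`, which is positional, so the frame must hold exactly seven columns in that order. The sketch below reproduces that logic as a standalone function. The name `format_stats` and the input column names other than `n_stars` and `n_stars_flagged` are hypothetical, since the diff does not show them. Unlike the code in the diff, the sketch copies its input first so the caller's DataFrame is not renamed in place.

```python
import pandas as pd


def format_stats(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical standalone version of the stat-column logic shown in the diff."""
    # Work on a copy (a deviation from the diff) so the caller's frame is not mutated
    data = data.copy()

    # Share of stars flagged as suspected fake, as a percentage
    data["n_flagged_percent"] = 100 * (data["n_stars_flagged"] / data["n_stars"])

    # Rename and organize columns: assigning to .columns is positional,
    # so the frame must hold exactly these seven columns in this order
    data.columns = ["repo", "month", "clustered", "low activity", "total stars", "flagged stars", "flagged %"]

    # Reorder so "flagged stars" appears before "total stars"
    return data[["repo", "month", "clustered", "low activity", "flagged stars", "total stars", "flagged %"]]


# Hypothetical input; only n_stars and n_stars_flagged are named in the diff
df = pd.DataFrame({
    "repo": ["owner/project"],
    "month": ["2024-11"],
    "n_stars_clustered": [30],
    "n_stars_low_activity": [25],
    "n_stars": [200],
    "n_stars_flagged": [50],
})
print(format_stats(df))
```

If the upstream query ever returns a different number of columns, the positional assignment raises a `ValueError` (length mismatch). If it returns the same number of columns in a different order, the labels would be wrong with no error. `DataFrame.rename` with an explicit mapping avoids that second failure when the source names are known.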