Space: leuschnm/CrowdCounting-with-Scale-Adaptive-Selection-SASNet

leuschnm committed on Jan 24, 2023

Commit cc56f52 (1 parent: a555162)

Update app.py

Files changed (1)

app.py +34 -30

@@ -93,36 +93,39 @@ def predict(img):
 with gr.Blocks() as demo:
-    gr.Markdown("""
-    # Crowd Counting based on SASNet
-    <p>
-    This space implements crowd counting following the paper of Song et. al (2021). The model is a VGG16 base with MultiBranch-Channels. For more details see the official publication on AAAI.
-    Training data is the Shanghai-Tech A/B data set with Gaussian augmentation for density map creation. The data set annotates more than 300k people.
-    </p>
-    ## Abstract
-    <p>
-    In this paper, we address the large scale variation problem in crowd counting by taking full advantage of the multi-scale feature representations in a multi-level network. We
-    implement such an idea by keeping the counting error of a patch as small as possible with a proper feature level selection strategy, since a specific feature level tends to perform
-    better for a certain range of scales. However, without scale annotations, it is sub-optimal and error-prone to manually assign the predictions for heads of different scales to
-    specific feature levels. Therefore, we propose a Scale-Adaptive Selection Network (SASNet), which automatically learns the internal correspondence between the scales and the feature
-    levels. Instead of directly using the predictions from the most appropriate feature level as the final estimation, our SASNet also considers the predictions from other feature
-    levels via weighted average, which helps to mitigate the gap between discrete feature levels and continuous scale variation. Since the heads in a local patch share roughly a same
-    scale, we conduct the adaptive selection strategy in a patch-wise style. However, pixels within a patch contribute different counting errors due to the various difficulty degrees of
-    learning. Thus, we further propose a Pyramid Region Awareness Loss (PRA Loss) to recursively select the most hard sub-regions within a patch until reaching the pixel level. With
-    awareness of whether the parent patch is over-estimated or under-estimated, the fine-grained optimization with the PRA Loss for these region-aware hard pixels helps to alleviate the
-    inconsistency problem between training target and evaluation metric. The state-of-the-art results on four datasets demonstrate the superiority of our approach.
-    </p>
-    ## Demo
-    """)
     with gr.Row():
         with gr.Column():
             gr.Markdown(
-                """
                 Upload an image or use some of the example to let the model count your crowd. The estimated density map is plotted as well. Have fun!
                 Visit my [**github**](https://github.com/MalteLeuschner/CrowdCounting_SASNet) for more!
-                """
         with gr.Column():
             text_output = gr.Label()
     with gr.Row():
@@ -140,11 +143,12 @@ with gr.Blocks() as demo:
     gr.Examples(["IMG_1.jpg", "IMG_2.jpg", "IMG_3.jpg"], image_input)
-    gr.Markdown("""
-    ## References
-    The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.
-    Song, Q., Wang, C., Wang, Y., Tai, Y., Wang, C., Li, J., … Ma, J. (2021). To Choose or to Fuse? Scale Selection for Crowd Counting. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
     """)
     image_button.click(predict, inputs=image_input, outputs=[text_output, image_output])

 with gr.Blocks() as demo:
+    gr.Markdown(
+    """
+        # Crowd Counting based on SASNet
+        <p>
+        This space implements crowd counting following the paper of Song et. al (2021). The model is a VGG16 base with MultiBranch-Channels. For more details see the official publication on AAAI.
+        Training data is the Shanghai-Tech A/B data set with Gaussian augmentation for density map creation. The data set annotates more than 300k people.
+        </p>
+        ## Abstract
+        <p>
+        In this paper, we address the large scale variation problem in crowd counting by taking full advantage of the multi-scale feature representations in a multi-level network. We
+        implement such an idea by keeping the counting error of a patch as small as possible with a proper feature level selection strategy, since a specific feature level tends to perform
+        better for a certain range of scales. However, without scale annotations, it is sub-optimal and error-prone to manually assign the predictions for heads of different scales to
+        specific feature levels. Therefore, we propose a Scale-Adaptive Selection Network (SASNet), which automatically learns the internal correspondence between the scales and the feature
+        levels. Instead of directly using the predictions from the most appropriate feature level as the final estimation, our SASNet also considers the predictions from other feature
+        levels via weighted average, which helps to mitigate the gap between discrete feature levels and continuous scale variation. Since the heads in a local patch share roughly a same
+        scale, we conduct the adaptive selection strategy in a patch-wise style. However, pixels within a patch contribute different counting errors due to the various difficulty degrees of
+        learning. Thus, we further propose a Pyramid Region Awareness Loss (PRA Loss) to recursively select the most hard sub-regions within a patch until reaching the pixel level. With
+        awareness of whether the parent patch is over-estimated or under-estimated, the fine-grained optimization with the PRA Loss for these region-aware hard pixels helps to alleviate the
+        inconsistency problem between training target and evaluation metric. The state-of-the-art results on four datasets demonstrate the superiority of our approach.
+        </p>
+        ## Demo
+    """
+    )
     with gr.Row():
         with gr.Column():
             gr.Markdown(
+            """
                 Upload an image or use some of the example to let the model count your crowd. The estimated density map is plotted as well. Have fun!
                 Visit my [**github**](https://github.com/MalteLeuschner/CrowdCounting_SASNet) for more!
+            """
+            )
         with gr.Column():
             text_output = gr.Label()
     with gr.Row():
@@ -140,11 +143,12 @@ with gr.Blocks() as demo:
     gr.Examples(["IMG_1.jpg", "IMG_2.jpg", "IMG_3.jpg"], image_input)
+    gr.Markdown(
+    """
+        ## References
+        The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.
+        Song, Q., Wang, C., Wang, Y., Tai, Y., Wang, C., Li, J., … Ma, J. (2021). To Choose or to Fuse? Scale Selection for Crowd Counting. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21).
     """)
     image_button.click(predict, inputs=image_input, outputs=[text_output, image_output])
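
The commit above mostly moves the triple-quoted Markdown strings onto their own lines and indents their content more deeply. Whether that extra indentation matters depends on how the text is normalized before rendering, since Markdown usually treats lines indented by four or more spaces as a code block. The sketch below uses only the standard library to show how such an indented string can be cleaned up. `normalize_markdown` is a hypothetical helper introduced for illustration and is not part of app.py. It is an assumption here that Gradio's Markdown component applies a comparable dedent step internally, so check this against the installed Gradio version.

```python
import inspect


def normalize_markdown(text: str) -> str:
    """Hypothetical helper: strip the common leading indentation and the
    surrounding blank lines from an indented triple-quoted string."""
    return inspect.cleandoc(text)


# Same shape as the References block in the new version of app.py:
# content indented by 8 spaces, closing quotes indented by 4.
indented = """
        ## References
        The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.
    """

cleaned = normalize_markdown(indented)
print(cleaned)
```

After this step the heading line starts at column 0, so a Markdown renderer would read it as a heading rather than as preformatted text.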