Update app.py
app.py
@@ -28,7 +28,7 @@ For example, if visual elements is like this:
 "An older man wearing a brown hat and glasses, looking off into the distance.
 An older man wearing a brown hat and glasses, with a beard and a mustache, is looking directly at the camera.
 An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera."
-It does not mean there are 3 older men, but this is the same man. Because we have extracted
+It does not mean there are 3 older men, but it means this is the same man. Because we have extracted close frames from the video sequence.
 
 Audio events are actually the entire scene description based on the audio of the video.
 