Update app.py
app.py
@@ -29,6 +29,8 @@ For example, if visual elements is like this:
 An older man wearing a brown hat and glasses, with a beard and a mustache, is looking directly at the camera.
 An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera."
 It does not mean there are 3 older men, but it means this is the same man. Because we have extracted close frames from the video sequence.
+So with his in mind, what you must understand from this list is actually :
+"The older man wearing a brown hat and glasses, with a beard is doing some stuff"
 
 Audio events are actually the entire scene description based on the audio of the video.
 
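The added prompt lines tell the model that near-identical captions from adjacent frames describe one subject, not several. The same grouping can also be sketched in code before the captions reach the prompt. This is a minimal sketch, assuming captions arrive in frame order and that a character-level similarity ratio from Python's standard `difflib` is a good enough proxy for "same subject". The function names and the 0.8 threshold are hypothetical and are not taken from app.py.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_consecutive_captions(captions: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Group captions from adjacent frames that likely describe the same subject.

    Hypothetical helper: a new group starts whenever a caption is less
    similar than `threshold` to the last caption of the current group.
    """
    groups: list[list[str]] = []
    for caption in captions:
        if groups and similarity(groups[-1][-1], caption) >= threshold:
            groups[-1].append(caption)  # same subject as the previous frame
        else:
            groups.append([caption])  # similarity dropped: start a new subject
    return groups


# Usage: the two captions from the hunk (closing quote dropped) plus an
# unrelated, made-up caption from a later frame.
captions = [
    "An older man wearing a brown hat and glasses, with a beard and a mustache, is looking directly at the camera.",
    "An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera.",
    "A red car drives down an empty street at night.",
]
groups = group_consecutive_captions(captions)
for group in groups:
    print(len(group), "caption(s):", group[0])
```

Comparing each caption only with the previous one keeps the sketch linear in the number of frames and matches the assumption that the frames are extracted close together. A semantic similarity (for example, sentence embeddings) would likely group paraphrased captions better than raw character overlap.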