fffiloni/soft-video-understanding

fffiloni committed on Mar 7, 2024

Commit 7ab5a05 (verified) · 1 parent: cd2500a

Update app.py

Files changed (1)

app.py CHANGED

@@ -29,6 +29,8 @@ For example, if visual elements is like this:
  An older man wearing a brown hat and glasses, with a beard and a mustache, is looking directly at the camera.
  An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera."
 It does not mean there are 3 older men, but it means this is the same man. Because we have extracted close frames from the video sequence.
+So with his in mind, what you must understand from this list is actually :
+"The older man wearing a brown hat and glasses, with a beard is doing some stuff"
 Audio events are actually the entire scene description based on the audio of the video.
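The lines added in this commit ask the language model to treat near-identical captions from neighbouring frames as a single subject; the commit does this purely through prompt wording. Purely as an illustration, and not something app.py is known to do, the same idea could be sketched as a pre-processing step using Python's standard `difflib`. The helper name `collapse_similar_captions` and the 0.8 similarity threshold are hypothetical choices for this sketch.

```python
from difflib import SequenceMatcher


def collapse_similar_captions(captions, threshold=0.8):
    """Hypothetical helper: merge consecutive near-duplicate frame captions.

    Captions extracted from close frames usually describe the same subject,
    so a caption is dropped when its SequenceMatcher similarity ratio to the
    current group's representative (the group's first caption) is at least
    `threshold`. Returns one representative caption per group, in order.
    """
    representatives = []
    for caption in captions:
        text = caption.strip()
        if representatives:
            similarity = SequenceMatcher(None, representatives[-1], text).ratio()
            if similarity >= threshold:
                continue  # same subject as the previous group; skip it
        representatives.append(text)
    return representatives


if __name__ == "__main__":
    frames = [
        "An older man wearing a brown hat and glasses, with a beard and a mustache, is looking directly at the camera.",
        "An older man wearing a brown hat and glasses, with a beard and a beard on his chin, is looking at the camera.",
        "A red car drives down an empty street at night.",
    ]
    for line in collapse_similar_captions(frames):
        print(line)
```

Comparing only against the previous group's representative keeps the pass linear and order-preserving, which matches the assumption that duplicates come from adjacent frames; the threshold would need tuning on real captions.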