Commit 5e05897 · 1 Parent(s): bf34015
Update README.md to remove outdated introductory content and add a link to the submission video, streamlining the overview of ScouterAI.
README.md
CHANGED
@@ -14,32 +14,4 @@ short_description: The agent using over 9000 vision models from the HF Hub.
 
 # ScouterAI - The Vision enhanced Agent
 
-
-This app falls under the track 3 : Agentic Demo.
-The goal of the app is to demonstrate the capabilities of agentic llm's combined with more "traditional" deep learning computer vision.
-LLM's (and VLM's) are great models when it comes to interacting with the user and understanding its queries but are not (yet) capable of a precise perception of the images presented to them.
-Computer Vision models like object detection or image segmentation models are tailored models to accomplish these tasks but require some engineering to wrap them and be user ready.
-The idea of the agentic demo is to provide powerful LLM with access to expert vision models like object detection or image segmentation models.
-The agent can fulfill precise perception task on any object present in the image : detection, location, classification, masking, counting, etc...
-
-## Overview
-
-In this preliminary app, the agent is a CodeAgent provided by the smolagents framework.
-Its interface consists of a chat interface with example and a gallery which is used to display the agent's work.
-The agent is provided with a set of tools :
-- Task model retriever : a RAG tool which, given a task (object-detection or image-segmentation) and a query (car e.g.), returns a list of models with their model id and the list of classes it is capable of detecting/segmenting. The list if based on a curated dataset of all the models available on the HuggingFace Hub, returns the mo
-- Computer vision models : Any object detection and image segmentation models available of HuggingFace
-- Image processing functions : Resizing, cropping, ...
-- Image annotation functions : Label, bounding box and mask annotators
-
-
-
-To complete a user request
-
-## Use-cases
-
-## Stack
-
-Agent framework : smolagents
-LLM : Anthropic
-Compute : Modal
+[Submission video](https://youtu.be/FD8sZTjF5_4)
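
The removed README text describes a "Task model retriever" tool that takes a task and a query and returns model ids with the classes each model can detect or segment. That contract can be sketched roughly as below. This is not the project's implementation: the catalog entries, the `ModelEntry` type, and the `retrieve_models` function are hypothetical names invented for illustration, and plain substring matching stands in for the RAG retrieval the README mentions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One row of a hypothetical curated model catalog."""
    model_id: str
    task: str
    classes: tuple[str, ...]


# Hypothetical catalog; the real tool is described as using a curated
# dataset of Hugging Face Hub models, which is not reproduced here.
CATALOG = [
    ModelEntry("example-org/street-detector", "object-detection", ("car", "truck", "person")),
    ModelEntry("example-org/animal-detector", "object-detection", ("cat", "dog")),
    ModelEntry("example-org/road-segmenter", "image-segmentation", ("road", "car", "sky")),
]


def retrieve_models(task: str, query: str, catalog: list[ModelEntry] = CATALOG) -> list[dict]:
    """Return model ids and class lists for models of `task` whose classes match `query`.

    Assumption: substring matching replaces the embedding-based retrieval
    a RAG tool would normally perform.
    """
    needle = query.strip().lower()
    return [
        {"model_id": entry.model_id, "classes": list(entry.classes)}
        for entry in catalog
        if entry.task == task and any(needle in cls.lower() for cls in entry.classes)
    ]
```

Calling `retrieve_models("object-detection", "car")` on this toy catalog returns only the entry whose class list mentions a car, which an agent could then pass to a detection step.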