---
title: TranscriptTool - smolagent Transcription Tool
emoji: 💬
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: smolagent tool to transcribe audio & video files
---

# TranscriptTool: A SmolAgent Tool for Audio/Video Transcription

## Overview

`TranscriptTool` is a smolagent tool that transcribes audio and video files into text. It uses OpenAI's Whisper for transcription and `ffmpeg` for media conversion, so agents can work with multimedia inputs as plain text. It handles format conversion to WAV, selects the compute device (CPU or GPU) at runtime, and is intended to be usable within smolagents via the Hugging Face API.

The repository contains three main components:
- **`transcription_tool.py`**: The core smolagent tool for transcription.
- **`app.py`**: A Gradio-powered web app to test and use the tool interactively.
- **`example_smolagent.py`**: A toy demonstration of how the tool operates within a smolagent.
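
The WAV conversion and device selection mentioned in the overview can be sketched roughly as follows. This is a minimal illustration, not the code in `transcription_tool.py`: the function names (`build_ffmpeg_command`, `convert_to_wav`, `pick_device`), the `_16k.wav` output naming, and the 16 kHz mono target are assumptions chosen because Whisper works on 16 kHz mono audio.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that extracts mono audio as WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src,               # input audio or video file
        "-vn",                   # drop any video stream
        "-ac", "1",              # downmix to a single channel
        "-ar", str(sample_rate), # resample (16 kHz is an assumption here)
        dst,
    ]


def convert_to_wav(src: str) -> str:
    """Convert src to a WAV file next to it and return the new path (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    source = Path(src)
    # Use a distinct name so a .wav input is never overwritten in place.
    dst = str(source.with_name(source.stem + "_16k.wav"))
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Keeping the command construction separate from the `subprocess` call makes the conversion settings easy to inspect or adjust without running `ffmpeg`.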

---

## Installation

1. Clone this repository:
   ```bash
   git clone https://huggingface.co/spaces/maguid28/TranscriptTool
   cd TranscriptTool
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   
---

## Usage

### Testing with Gradio (app.py)

To quickly test and use the transcription tool, run the provided Gradio app:
   ```bash
   python app.py
   ```

This launches a local Gradio interface. Upload an audio or video file to transcribe it directly.

### Running example SmolAgent (example_smolagent.py)

To see how TranscriptTool operates within a SmolAgent framework:
   ```bash
   python example_smolagent.py
   ```

### Access via Hugging Face API

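`TranscriptTool` is also published as a Hugging Face Space, so it is meant to be loadable as a tool through the Hugging Face API. This route is currently not working; the steps below describe the intended usage.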

#### How to Use the Tool via Hugging Face API (Currently not working)

1. **Install SmolAgents**

   Ensure you have the SmolAgents library installed:
   ```bash
   pip install smolagents
   ```
2. **Load the Tool from the Hugging Face Hub**

   You can load the tool directly using the Hugging Face API:
   ```python
   from smolagents import Tool

   transcript_tool = Tool.from_space(
       "maguid28/TranscriptTool",
       name="TranscriptTool",
       description="""
       A smolagent tool for transcribing audio and video files into text. This tool utilises Whisper for transcription
       and ffmpeg for media conversion, enabling agents to process multimedia inputs into text. The tool supports robust
       file handling, including format conversion to WAV and dynamic device selection for optimal performance.
       """,
   )
   ```

---

## License

This project is licensed under the Apache-2.0 License. See the LICENSE file for more details.

---

## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.