# Polyhedron

Polyhedron is a voice chat application designed to enable real-time transcription and translation for training across language barriers.

## Overview

The app allows a trainer to conduct lessons in their native language, while trainees can receive instructions translated into their own languages.

## Key Features

- Real-time voice transcription of the trainer's speech using Amazon Transcribe
- Translation of the speech into each trainee's language using Amazon Translate
- Real-time display of the translated text to trainees
- A transcription view that lets the trainer spot and repeat unclear sections
- Support for training in multilingual organizations

Polyhedron uses WebSockets to stream audio and text between clients. The frontend is built with React and Vite.

The backend is developed in Rust using the Poem web framework with WebSocket support. It interfaces with AWS services for transcription, translation, and text-to-speech.

Configuration such as AWS credentials and model settings is specified in `config.yaml`.
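
As one illustration, `config.yaml` might look like the following sketch. The field names here are assumptions for illustration only, not the project's actual schema:

```yaml
# Hypothetical config.yaml sketch -- field names are illustrative, not the real schema.
aws:
  region: us-east-1            # region used for the AWS speech/translation services
  access_key_id: YOUR_KEY      # or rely on the standard AWS credential chain instead
  secret_access_key: YOUR_SECRET
whisper:
  model_path: models/base.bin  # local Whisper model file under models/
server:
  bind: 0.0.0.0:8080
```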

## Getting Started

To run Polyhedron locally:

1. Clone the repository
2. Run `cargo run`
3. Open http://localhost:8080 in your browser

## Architecture

![Polyhedron broadcast architecture](docs/HR%20Training-Completed.drawio.svg)

Polyhedron uses a broadcast model to share transcription, translation, and speech synthesis work between clients.

- A single transcription is generated for the speaker and shared with all language clients.
- The transcript is translated once per language and shared with the clients of that language.
- Speech is synthesized once per voice and shared with the clients selecting that voice.

This architecture minimizes redundant work and cost:

- Automatic speech recognition (ASR) runs only once for the speaker, and the result is broadcast.
- Translation runs once per language from the shared transcript, and the result is broadcast.
- Text-to-speech (TTS) synthesis runs once per voice, and the result is broadcast.

By sharing the intermediate outputs, the system avoids duplicating work across clients, which lets it serve many users efficiently and cost-effectively.

The components communicate using WebSockets and channels to distribute the shared outputs.
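
As a rough sketch of this channel-based fan-out (not Polyhedron's actual code; `broadcast_segment` is a hypothetical helper), a shared output is produced once and then sent to every subscribed client:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: one finished segment (a transcript, translation,
// or synthesized audio chunk) is fanned out to one channel per client.
fn broadcast_segment(segment: &str, clients: &[mpsc::Sender<String>]) {
    for tx in clients {
        // Each subscribed client receives its own copy of the shared output.
        tx.send(segment.to_string()).expect("client channel closed");
    }
}

fn main() {
    let (tx_a, rx_a) = mpsc::channel();
    let (tx_b, rx_b) = mpsc::channel();
    let clients = vec![tx_a, tx_b];

    // The expensive step (ASR / translation / TTS) runs exactly once...
    let segment = "Hello, team".to_string();
    // ...and the result is broadcast to every subscribed client.
    broadcast_segment(&segment, &clients);

    let handle = thread::spawn(move || rx_b.recv().unwrap());
    assert_eq!(rx_a.recv().unwrap(), "Hello, team");
    assert_eq!(handle.join().unwrap(), "Hello, team");
    println!("both clients received the shared segment");
}
```

In the real system the consumers are WebSocket connections rather than in-process receivers, but the one-producer, many-subscriber shape is the same.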

|
| 52 |
+
The system architecture with a single listener can be summarized as:
|
| 53 |
+
|
| 54 |
+
- Speaker voice input ->
|
| 55 |
+
- ASR Transcription (English) ->
|
| 56 |
+
- Translation to Listener language ->
|
| 57 |
+
- TTS Synthesis in Listener language ->
|
| 58 |
+
- Voice output in Listener language

The speaker's voice is transcribed to text using ASR in the speaker's language (e.g. English). The transcript is then translated into the listener's language, and text-to-speech synthesis converts the translated text into voice audio in that language. This synthesized audio is played back to the listener.

The architecture forms a linear pipeline from speaker voice input to listener voice output, with transcription, translation, and synthesis steps in between.
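
The linear pipeline above can be sketched as three composed stages. This is only an illustration: `transcribe`, `translate`, and `synthesize` are hypothetical placeholders standing in for the real AWS-backed steps, not Polyhedron's actual API:

```rust
// Illustrative pipeline sketch; each stage is a placeholder for the real
// AWS-backed step, and the function names are assumptions for this example.
fn transcribe(_audio: &[i16]) -> String {
    // Real system: streaming ASR in the speaker's language.
    "hello everyone".to_string()
}

fn translate(transcript: &str, target_lang: &str) -> String {
    // Real system: machine translation into the listener's language.
    format!("[{target_lang}] {transcript}")
}

fn synthesize(text: &str) -> Vec<u8> {
    // Real system: TTS audio in the listener's language; here, raw bytes.
    text.as_bytes().to_vec()
}

fn main() {
    let mic_samples: Vec<i16> = vec![0; 16_000]; // one second of 16 kHz silence
    let transcript = transcribe(&mic_samples);
    let translated = translate(&transcript, "fr");
    let audio_out = synthesize(&translated);
    println!("pipeline produced {} bytes of audio", audio_out.len());
}
```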

## Directory Structure

- `src/`: Main Rust backend source code
  - `main.rs`: Entry point and server definition
  - `config.rs`: Configuration loading
  - `lesson.rs`: Lesson management and audio streaming
  - `whisper.rs`: Whisper ASR integration
  - `group.rs`: Group management
- `static/`: Frontend JavaScript and assets
  - `index.html`: Main HTML page
  - `index.js`: React frontend code
  - `recorderWorkletProcessor.js`: AudioWorklet processor for capturing microphone audio
- `models/`: Whisper speech recognition models
- `config.yaml`: Server configuration
- `Cargo.toml`: Rust crate dependencies

## Contributing

Contributions are welcome! Please open an issue or pull request.