# Polyhedron

Polyhedron is a voice chat application designed to enable real-time transcription and translation for training across language barriers.
## Overview

The app allows a trainer to conduct lessons in their native language, while trainees receive the instructions translated into their own languages.
## Key features

- Real-time transcription of the trainer's speech using Amazon Transcribe
- Translation of the transcript into each trainee's language using Amazon Translate
- Real-time display of the translated text to trainees
- A live transcript for the trainer, who can then repeat unclear sections
- Support for training in multilingual organizations

Polyhedron uses WebSockets to stream audio and text between the browser clients and the backend. The frontend is built with React and Vite.

The backend is written in Rust using the Poem web framework and its WebSocket support. It integrates with AWS services for transcription, translation, and text-to-speech.
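Amazon Transcribe's streaming API accepts PCM audio as 16-bit signed little-endian samples, while the browser's Web Audio API delivers 32-bit float samples nominally in [-1.0, 1.0]. The sketch below shows one way a backend might convert a chunk of float samples before forwarding it for transcription. It assumes the frontend sends raw `f32` samples over the WebSocket, which this README does not specify, and `f32_to_pcm16le` is a hypothetical helper, not a function from the Polyhedron source.

```rust
/// Converts f32 samples in [-1.0, 1.0] into 16-bit signed little-endian PCM bytes.
/// Hypothetical helper for illustration; not taken from the Polyhedron codebase.
pub fn f32_to_pcm16le(samples: &[f32]) -> Vec<u8> {
    // Two output bytes per input sample.
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first: Web Audio samples can slightly exceed the nominal range.
        let clamped = s.clamp(-1.0, 1.0);
        // Scale to the i16 range; `as` truncates toward zero and saturates (NaN becomes 0).
        let v = (clamped * i16::MAX as f32) as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The resulting byte buffer is what a streaming transcription client would send as an audio chunk.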

Configuration, such as AWS credentials and model settings, is specified in `config.yaml`.
## Getting Started

To run Polyhedron locally:

1. Clone the repository.
2. Run `cargo run`.
3. Open <http://localhost:8080> in your browser.

## Architecture

Polyhedron uses a broadcast model to share transcription, translation, and speech synthesis results among clients:

- A single transcription is generated for the speaker and shared with all clients, whatever their language.
- The transcript is translated once per language, and each translation is shared with all clients of that language.
- Speech is synthesized once per voice and shared with all clients that selected that voice.
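Translating once per language amounts to caching results by target language. A minimal sketch, assuming a hypothetical `Client` type and a `translate` callback that stands in for the real translation service; neither comes from the Polyhedron source. The same keyed-cache idea applies to synthesizing speech once per voice.

```rust
use std::collections::HashMap;

/// A connected listener (hypothetical shape, for illustration only).
pub struct Client {
    pub id: u32,
    pub language: String,
}

/// Translates `transcript` at most once per distinct language and returns the
/// text each client should receive, keyed by client id.
/// `translate(text, target_lang)` stands in for a translation service call.
pub fn fan_out_translations<F>(
    transcript: &str,
    clients: &[Client],
    mut translate: F,
) -> HashMap<u32, String>
where
    F: FnMut(&str, &str) -> String,
{
    // Cache keyed by target language: a cache miss triggers exactly one translation.
    let mut by_language: HashMap<&str, String> = HashMap::new();
    let mut per_client = HashMap::new();
    for client in clients {
        let text = by_language
            .entry(client.language.as_str())
            .or_insert_with(|| translate(transcript, &client.language))
            .clone();
        per_client.insert(client.id, text);
    }
    per_client
}
```

With two Spanish listeners and one French listener, the callback runs twice rather than three times.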

This architecture minimizes redundant work and cost:

- Automatic speech recognition (ASR) runs only once for the speaker, and its output is broadcast.
- Translation runs once per language on the shared transcript, and each result is broadcast.
- Text-to-speech (TTS) synthesis runs once per voice, and each result is broadcast.

By sharing intermediate outputs, the system avoids duplicating work across clients, which lets it serve many users efficiently and cost-effectively.
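A rough per-utterance count of service calls illustrates the saving. It assumes a naive baseline in which every listener runs its own ASR, translation, and TTS request, and that every listener selects a voice. With $N$ listeners spread across $L$ languages and $V$ voices:

```latex
\text{naive} = \underbrace{N}_{\text{ASR}} + \underbrace{N}_{\text{translation}} + \underbrace{N}_{\text{TTS}} = 3N,
\qquad
\text{shared} = 1 + L + V .
```

For example, 30 trainees using 3 languages and 4 voices would need 90 calls per utterance in the naive design versus $1 + 3 + 4 = 8$ with sharing. These figures are illustrative, not measurements of Polyhedron.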

The components communicate using WebSockets and channels to distribute the shared outputs.
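The channel-based fan-out can be illustrated with the standard library's `std::sync::mpsc`. This is a synchronous sketch with a hypothetical `Broadcaster` type; an async backend such as one built on Poem would more likely use an async broadcast channel (for example `tokio::sync::broadcast`), and the README does not say which mechanism Polyhedron uses.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Fans one producer's output out to many subscribers (hypothetical type).
pub struct Broadcaster<T: Clone> {
    subscribers: Vec<Sender<T>>,
}

impl<T: Clone> Broadcaster<T> {
    pub fn new() -> Self {
        Broadcaster { subscribers: Vec::new() }
    }

    /// Registers a new subscriber and returns its receiving end.
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends a clone of `item` to every subscriber, dropping any whose receiver
    /// has been dropped. Returns how many subscribers received the item.
    pub fn broadcast(&mut self, item: T) -> usize {
        self.subscribers.retain(|tx| tx.send(item.clone()).is_ok());
        self.subscribers.len()
    }
}
```

One broadcaster per language (for translated text) or per voice (for synthesized audio) would mirror the sharing model described above.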

The system architecture with a single listener can be summarized as:

1. Speaker voice input
2. ASR transcription (English)
3. Translation into the listener's language
4. TTS synthesis in the listener's language
5. Voice output in the listener's language

The speaker's voice is transcribed to text using ASR in the speaker's language (e.g., English). The transcript is then translated into the listener's language, and text-to-speech synthesis converts the translated text into speech audio in that language. This synthesized audio is played back to the listener.

The architecture thus forms a linear pipeline from speaker voice input to listener voice output, with transcription, translation, and synthesis steps in between.
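The single-listener pipeline can be sketched as three pluggable stages. The trait and function names below are hypothetical, chosen for illustration; in a real deployment each stage would wrap a service call (the README names Amazon Transcribe and Amazon Translate), and in the shared architecture the ASR output would be computed once rather than per listener.

```rust
/// Speech-to-text stage (hypothetical trait).
pub trait Transcriber {
    fn transcribe(&mut self, audio: &[u8]) -> String;
}

/// Text translation stage (hypothetical trait).
pub trait Translator {
    fn translate(&mut self, text: &str, target_lang: &str) -> String;
}

/// Text-to-speech stage returning encoded audio bytes (hypothetical trait).
pub trait Synthesizer {
    fn synthesize(&mut self, text: &str, voice: &str) -> Vec<u8>;
}

/// Runs speaker audio through ASR, then translation, then TTS for one listener.
pub fn run_pipeline(
    audio: &[u8],
    target_lang: &str,
    voice: &str,
    asr: &mut impl Transcriber,
    mt: &mut impl Translator,
    tts: &mut impl Synthesizer,
) -> Vec<u8> {
    let transcript = asr.transcribe(audio);
    let translated = mt.translate(&transcript, target_lang);
    tts.synthesize(&translated, voice)
}
```

Keeping each stage behind a trait lets stub implementations replace the AWS-backed ones in tests.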

## Directory Structure

- `src/`: Main Rust backend source code
  - `main.rs`: Entry point and server definition
  - `config.rs`: Configuration loading
  - `lesson.rs`: Lesson management and audio streaming
  - `whisper.rs`: Whisper ASR integration
  - `group.rs`: Group management
- `static/`: Frontend JavaScript and assets
  - `index.html`: Main HTML page
  - `index.js`: React frontend code
  - `recorderWorkletProcessor.js`: AudioWorklet processor for recording audio
- `models/`: Whisper speech recognition models
- `config.yaml`: Server configuration
- `Cargo.toml`: Rust crate dependencies

## Contributing

Contributions are welcome! Please open an issue or pull request.