# Turkish Tiktokenizer Web App

A Streamlit-based web interface for the Turkish Morphological Tokenizer. The app lets you tokenize Turkish text interactively, with real-time visualization and a color-coded token display.
## Features

- Turkish text tokenization with morphological analysis
- Color-coded token visualization
- Token count and ID display
- Special token highlighting (uppercase, space, newline, etc.)
- Version selection from GitHub commit history
- Direct integration with the GitHub repository
## Demo

Try the live demo on [Hugging Face Spaces](https://huggingface.co/spaces/YOUR_USERNAME/turkish-tiktokenizer) (replace with your actual Spaces URL).
## Installation

1. Clone the repository:

```bash
git clone https://github.com/malibayram/tokenizer.git
cd tokenizer/streamlit_app
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```
## Usage

1. Run the Streamlit app:

```bash
streamlit run app.py
```

2. Open your browser and navigate to http://localhost:8501
3. Enter Turkish text in the input area and click "Tokenize"
## How It Works

1. **Text Input**: Enter Turkish text in the left panel
2. **Tokenization**: Click the "Tokenize" button to process the text
3. **Visualization**:
   - Token count is displayed at the top
   - Tokens are shown with color-coding:
     - Special tokens (uppercase, space, etc.) have predefined colors
     - Regular tokens get unique colors for easy identification
   - Token IDs are displayed below the visualization
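
The color-coding step above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the special-token names, the color values, the fallback palette, and the `render_tokens` helper are all assumptions made for the example.

```python
import html

# Hypothetical special-token names and colors; the app's actual values may differ.
SPECIAL_COLORS = {
    "<uppercase>": "#FF9999",
    "<space>": "#CCCCCC",
    "<newline>": "#99CCFF",
}

# Hypothetical palette for regular tokens.
FALLBACK_PALETTE = ["#FFD580", "#B5EAD7", "#C7CEEA", "#FFB7B2"]


def render_tokens(tokens):
    """Return an HTML string with one colored <span> per token."""
    assigned = {}  # regular token -> color, so repeated tokens share a color
    spans = []
    for tok in tokens:
        if tok in SPECIAL_COLORS:
            color = SPECIAL_COLORS[tok]
        else:
            if tok not in assigned:
                assigned[tok] = FALLBACK_PALETTE[len(assigned) % len(FALLBACK_PALETTE)]
            color = assigned[tok]
        # Escape the token text so names like "<space>" are shown literally.
        spans.append(
            f'<span style="background-color:{color}">{html.escape(tok)}</span>'
        )
    return " ".join(spans)
```

In a Streamlit app, a string like this could be displayed with `st.markdown(render_tokens(tokens), unsafe_allow_html=True)`.
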
## Code Structure

- `app.py`: Main Streamlit application
  - UI components and layout
  - GitHub integration
  - Tokenization logic
  - Color generation and visualization
- `requirements.txt`: Python dependencies
## Technical Details

- **Tokenizer Source**: Fetched directly from the GitHub repository
- **Caching**: Uses Streamlit's caching for better performance
- **Color Generation**: HSV-based algorithm for visually distinct colors
- **Session State**: Maintains text and results between interactions
- **Error Handling**: Graceful handling of GitHub API and tokenization errors
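
As an illustration of HSV-based color generation, here is one common approach: step the hue by the golden-ratio conjugate so that successive colors land far apart on the color wheel. This is a sketch under assumptions. The `distinct_colors` name, the hue-stepping scheme, and the saturation and value defaults are illustrative and may not match what `app.py` does.

```python
import colorsys

# Stepping the hue by this fraction spreads successive hues around the wheel.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def distinct_colors(n, saturation=0.5, value=0.95):
    """Return n hex color strings with well-separated hues."""
    colors = []
    hue = 0.0
    for _ in range(n):
        # colorsys works with floats in the range [0, 1].
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
        hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return colors
```

Keeping saturation moderate and value high yields light backgrounds on which dark token text stays readable.
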
## Deployment to Hugging Face Spaces

1. Create a new Space:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "Streamlit" as the SDK
   - Choose a name for your Space
2. Upload files:
   - `app.py`
   - `requirements.txt`
3. The app will automatically deploy and be available at your Space's URL
## Contributing

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

MIT License - see the [LICENSE](../LICENSE) file for details
## Acknowledgments

- Built by dqbd
- Created with the generous help from Diagram
- Based on the [Turkish Morphological Tokenizer](https://github.com/malibayram/tokenizer)