Turkish Tiktokenizer Web App
A Streamlit-based web interface for the Turkish Morphological Tokenizer. This app provides an interactive way to tokenize Turkish text with real-time visualization and color-coded token display.
Features
- Turkish text tokenization with morphological analysis
- Color-coded token visualization
- Token count and ID display
- Special token highlighting (uppercase, space, newline, etc.)
- Version selection from GitHub commit history
- Direct integration with GitHub repository
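As a rough illustration of version selection from commit history, the sketch below turns a response from GitHub's "list commits" REST endpoint (`GET /repos/{owner}/{repo}/commits`) into labels suitable for a dropdown. The function name `commit_labels` and the sample payload are hypothetical and hand-written for this example; how `app.py` actually does this may differ.

```python
from typing import Any


def commit_labels(commits: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Map a GitHub 'list commits' payload to (full_sha, display_label) pairs."""
    options = []
    for item in commits:
        sha = item["sha"]
        message = item["commit"]["message"]
        # Keep only the subject line of the commit message.
        first_line = message.splitlines()[0] if message else ""
        # The API returns ISO 8601 timestamps; the first 10 chars are the date.
        date = item["commit"]["author"]["date"][:10]
        options.append((sha, f"{sha[:7]} {date} {first_line}"))
    return options


# Hand-written sample in the shape of the API response (not real data).
sample = [
    {
        "sha": "0123456789abcdef0123456789abcdef01234567",
        "commit": {
            "message": "Update vocabulary\n\nLonger description.",
            "author": {"date": "2024-01-15T10:30:00Z"},
        },
    }
]

print(commit_labels(sample))
```

The full SHA is kept alongside the label so that the selected version can be used to request files at that exact commit.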
Demo
You can try the live demo at Hugging Face Spaces (Replace with your actual Spaces URL)
Installation
- Clone the repository:
git clone https://github.com/malibayram/tokenizer.git
cd tokenizer/streamlit_app
- Install dependencies:
pip install -r requirements.txt
Usage
- Run the Streamlit app:
streamlit run app.py
- Open your browser and navigate to http://localhost:8501
- Enter Turkish text in the input area and click "Tokenize"
How It Works
- Text Input: Enter Turkish text in the left panel
- Tokenization: Click the "Tokenize" button to process the text
- Visualization:
  - Token count is displayed at the top
  - Tokens are shown with color-coding:
    - Special tokens (uppercase, space, etc.) have predefined colors
    - Regular tokens get unique colors for easy identification
  - Token IDs are displayed below the visualization
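The color-coding rule described above can be sketched as follows: special tokens look up a fixed color, and every other token is assigned the next color from a palette the first time it appears. The special token names, the colors, and the `render_tokens` helper are all assumptions made for illustration, not the app's actual values.

```python
import html

# Hypothetical special-token names and colors, chosen only for this sketch.
SPECIAL_COLORS = {
    "<uppercase>": "#ffd166",
    "<space>": "#e0e0e0",
    "<newline>": "#bdbdbd",
}
# Small fallback palette for regular tokens (also an assumption).
PALETTE = ["#a8dadc", "#f4a261", "#b5e48c", "#cdb4db"]


def render_tokens(tokens: list[str]) -> str:
    """Return an HTML string with one colored <span> per token."""
    assigned: dict[str, str] = {}
    spans = []
    for tok in tokens:
        if tok in SPECIAL_COLORS:
            color = SPECIAL_COLORS[tok]
        else:
            # Reuse the same color when a regular token repeats.
            if tok not in assigned:
                assigned[tok] = PALETTE[len(assigned) % len(PALETTE)]
            color = assigned[tok]
        spans.append(
            f'<span style="background-color:{color}">{html.escape(tok)}</span>'
        )
    return "".join(spans)


print(render_tokens(["<uppercase>", "kitap", "lar", "<space>", "kitap"]))
```

In a Streamlit app, a string like this could be displayed with `st.markdown(..., unsafe_allow_html=True)`. With only four palette entries, colors repeat after the fourth distinct token, so a real app would likely generate colors rather than use a fixed list.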
Code Structure
- app.py: Main Streamlit application
  - UI components and layout
  - GitHub integration
  - Tokenization logic
  - Color generation and visualization
- requirements.txt: Python dependencies
Technical Details
- Tokenizer Source: Fetched directly from GitHub repository
- Caching: Uses Streamlit's caching for better performance
- Color Generation: HSV-based algorithm for visually distinct colors
- Session State: Maintains text and results between interactions
- Error Handling: Graceful handling of GitHub API and tokenization errors
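One common way to get visually distinct colors in HSV space is to step the hue by the golden ratio conjugate while holding saturation and value fixed. The sketch below shows that approach with the standard-library `colorsys` module. It is an assumption about the general technique, not the algorithm used in `app.py`, and `distinct_colors` and its defaults are hypothetical.

```python
import colorsys

# Stepping the hue by this constant spreads successive hues around the wheel.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def distinct_colors(n: int, saturation: float = 0.5, value: float = 0.95) -> list[str]:
    """Return n hex colors with evenly scattered hues."""
    colors = []
    hue = 0.0
    for _ in range(n):
        # colorsys works with floats in [0, 1] for all channels.
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
        hue = (hue + GOLDEN_RATIO_CONJUGATE) % 1.0
    return colors


print(distinct_colors(5))
```

Moderate saturation and high value keep the backgrounds light enough for dark text to stay readable on top of them.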
Deployment to Hugging Face Spaces
Create a new Space:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "Streamlit" as the SDK
- Choose a name for your Space
Upload files:
- app.py
- requirements.txt
The app will automatically deploy and be available at your Space's URL
Contributing
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
License
MIT License - see the LICENSE file for details
Acknowledgments
- Built by dqbd
- Created with the generous help from Diagram
- Based on the Turkish Morphological Tokenizer