Turkish Tiktokenizer Web App

A Streamlit-based web interface for the Turkish Morphological Tokenizer. This app provides an interactive way to tokenize Turkish text with real-time visualization and color-coded token display.

Features

  • 🔤 Turkish text tokenization with morphological analysis
  • 🎨 Color-coded token visualization
  • 🔢 Token count and ID display
  • 📊 Special token highlighting (uppercase, space, newline, etc.)
  • 🔄 Version selection from GitHub commit history
  • 🌐 Direct integration with GitHub repository

Demo

You can try the live demo on Hugging Face Spaces (replace this note with your actual Spaces URL).

Installation

  1. Clone the repository:
git clone https://github.com/malibayram/tokenizer.git
cd tokenizer/streamlit_app
  2. Install dependencies:
pip install -r requirements.txt

Usage

  1. Run the Streamlit app:
streamlit run app.py
  2. Open your browser and navigate to http://localhost:8501
  3. Enter Turkish text in the input area and click "Tokenize"

How It Works

  1. Text Input: Enter Turkish text in the left panel
  2. Tokenization: Click the "Tokenize" button to process the text
  3. Visualization:
    • Token count is displayed at the top
    • Tokens are shown with color-coding:
      • Special tokens (uppercase, space, etc.) have predefined colors
      • Regular tokens get unique colors for easy identification
    • Token IDs are displayed below the visualization
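
The visualization step above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the special-token names, the `SPECIAL_COLORS` table, the `(text, id)` token format, and the function names `regular_color` and `render_tokens` are all assumptions made for the example. Regular tokens get a hue derived from their ID in HSV space, which is one plausible reading of the HSV-based approach listed under Technical Details.

```python
import colorsys
from html import escape

# Hypothetical special-token names and colors; the real app may use others.
SPECIAL_COLORS = {
    "<uppercase>": "#FF9999",
    "<space>": "#CCCCCC",
    "<newline>": "#99CCFF",
}

# Stepping the hue by this constant spreads consecutive IDs around the color wheel.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def regular_color(token_id: int) -> str:
    """Map a token ID to a pastel hex color by choosing its hue in HSV space."""
    hue = (token_id * GOLDEN_RATIO_CONJUGATE) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.35, 0.95)
    return "#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255))


def render_tokens(tokens: list[tuple[str, int]]) -> dict:
    """Build the three outputs described above: count, colored HTML, and IDs."""
    spans = []
    for text, token_id in tokens:
        # Special tokens use the fixed table; everything else gets an ID-based color.
        color = SPECIAL_COLORS.get(text) or regular_color(token_id)
        spans.append(
            f'<span style="background-color: {color}; padding: 2px 4px;">'
            f"{escape(text)}</span>"
        )
    return {
        "count": len(tokens),
        "html": " ".join(spans),
        "ids": [token_id for _, token_id in tokens],
    }
```

In a Streamlit app, HTML built this way could be displayed with `st.markdown(result["html"], unsafe_allow_html=True)`. Because the color depends only on the token ID, the same token keeps the same color across runs.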

Code Structure

  • app.py: Main Streamlit application
    • UI components and layout
    • GitHub integration
    • Tokenization logic
    • Color generation and visualization
  • requirements.txt: Python dependencies

Technical Details

  • Tokenizer Source: Fetched directly from GitHub repository
  • Caching: Uses Streamlit's caching for better performance
  • Color Generation: HSV-based algorithm for visually distinct colors
  • Session State: Maintains text and results between interactions
  • Error Handling: Graceful handling of GitHub API and tokenization errors
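
As a rough sketch of how version selection, caching, and error handling might fit together, the example below lists recent commits through GitHub's REST API and builds a raw-file URL for a chosen commit. It is an illustration under assumptions, not the code in app.py: the function names are hypothetical, the file path passed to `raw_file_url` is left to the caller because this README does not name it, and `functools.lru_cache` stands in for Streamlit's caching (`st.cache_data`) so the sketch runs without Streamlit installed.

```python
import json
import urllib.request
from functools import lru_cache

# Repository taken from the clone URL in the Installation section.
OWNER, REPO = "malibayram", "tokenizer"


def parse_commits(payload: list[dict]) -> list[dict]:
    """Reduce GitHub commit objects to the fields a version picker needs."""
    versions = []
    for item in payload:
        message = item["commit"]["message"]
        versions.append({
            "sha": item["sha"],
            "date": item["commit"]["author"]["date"],
            # Keep only the subject line of the commit message.
            "message": message.splitlines()[0] if message else "",
        })
    return versions


def raw_file_url(sha: str, path: str) -> str:
    """URL of a file as it existed at a given commit."""
    return f"https://raw.githubusercontent.com/{OWNER}/{REPO}/{sha}/{path}"


@lru_cache(maxsize=8)
def _fetch_commits(per_page: int) -> tuple:
    """Cached network call; raises on failure, so errors are never cached."""
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/commits?per_page={per_page}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return tuple(parse_commits(json.load(response)))


def list_versions(per_page: int = 20) -> list[dict]:
    """Return recent commits, or an empty list if GitHub cannot be reached."""
    try:
        return list(_fetch_commits(per_page))
    except (OSError, ValueError, KeyError):
        # OSError covers URLError, HTTPError, and timeouts; ValueError covers bad JSON.
        return []
```

Keeping the cache on the function that raises, and catching errors in the wrapper, means a temporary outage or rate limit is not remembered: the next call tries GitHub again.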

Deployment to Hugging Face Spaces

  1. Create a new Space:

  2. Upload files:

    • app.py
    • requirements.txt
  3. The app will automatically deploy and be available at your Space's URL

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

MIT License - see the LICENSE file for details

Acknowledgments