yuvarajareddy001's picture
Initial commit
cd218ce
|
raw
history blame
3.27 kB
metadata
title: Optimized Llm Log Classification
emoji: 😻
colorFrom: green
colorTo: pink
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false

Optimized Log Classification Using LLMs


A comprehensive framework for hybrid log classification that integrates multiple analytical techniques to effectively process and categorize log data. This system leverages different methods to handle simple, complex, and sparsely labeled log patterns.

Overview

This project combines three primary classification strategies:

  • Regex-based Classification Captures predictable patterns using predefined regular expressions.

  • Embedding-based Classification Uses Sentence Transformers to generate embeddings followed by Logistic Regression for nuanced pattern recognition.

  • LLM-assisted Classification Employs large language models to classify data when traditional methods struggle due to limited labeled samples.

System Architecture


Directory Structure

  • training/ Contains notebooks and scripts for training the models and experimenting with different approaches.

  • models/ Stores pre-trained models such as the logistic regression classifier and embedding models.

  • resources/ Holds auxiliary files like CSV datasets, output samples, and images.

  • Root Directory Includes the main API server (server.py) and the command-line classification utility (classify.py).


Installation & Setup

  1. Clone the Repository

    git clone <your_repository_url>
    
  2. Install Dependencies Ensure Python is installed and run:

    pip install -r requirements.txt
    
  3. Train the Model (if needed) Open and run the training notebook:

    jupyter notebook training/log_classification.ipynb
    
  4. Run the API Server Start the server using one of the following methods:

    • Direct execution:
      python server.py
      
    • With Uvicorn:
      uvicorn server:app --reload
      

    Access the API documentation at:

  5. Running the Streamlit App To start the Streamlit application for log classification:

    streamlit run app.py
    

    This command will launch the app in your browser at a URL like http://localhost:8501.


Usage Instructions

  • Input Data Upload a CSV file with the following columns:

    • source
    • log_message
  • Output The system processes the logs and returns a CSV file with an additional target_label column indicating the classification result.


Customization

Feel free to modify and extend the classification logic in the following modules:

  • processor_bert.py
  • processor_llm.py
  • processor_regex.py

These modules are designed to be flexible, allowing you to tailor the classification approaches to your specific needs.


Contributions

Contributions, feedback, and feature requests are welcome. Please open an issue or submit a pull request in your GitHub repository.