---
title: Optimized LLM Log Classification
emoji: 😻
colorFrom: green
colorTo: pink
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
---
# Optimized Log Classification Using LLMs
---
A hybrid log classification framework that integrates multiple analytical techniques to process and categorize log data.
The system routes each log to the method best suited to it, covering simple, complex, and sparsely labeled log patterns.
---
## Overview
This project combines three primary classification strategies (a dispatch sketch follows the list):
- **Regex-based Classification**
Captures predictable patterns using predefined regular expressions.
- **Embedding-based Classification**
Uses Sentence Transformers to generate embeddings followed by Logistic Regression for nuanced pattern recognition.
- **LLM-assisted Classification**
Employs large language models to classify data when traditional methods struggle due to limited labeled samples.
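The three methods are applied as a cascade, cheapest first: regex, then the embedding classifier, then the LLM. Below is a minimal sketch of that dispatch; all function names, rules, and the sparse-source check are illustrative, not the exact API of `processor_regex.py`, `processor_bert.py`, or `processor_llm.py`.

```python
import re

# Illustrative rules; the real patterns live in processor_regex.py.
REGEX_RULES = {
    r"user \S+ logged (in|out)": "User Action",
    r"backup (started|completed)": "System Notification",
}

# Hypothetical sources with too few labeled samples for the trained classifier.
SPARSE_SOURCES = {"LegacyCRM"}

def classify_with_regex(log_message: str):
    """Stage 1: cheap, deterministic matching of predictable patterns."""
    for pattern, label in REGEX_RULES.items():
        if re.search(pattern, log_message, re.IGNORECASE):
            return label
    return None  # no predefined pattern matched

def classify_with_bert(log_message: str) -> str:
    """Stage 2 placeholder: encode with a Sentence Transformer, then
    predict with the saved logistic-regression model (see models/)."""
    return "Unclassified"  # stubbed out for this sketch

def classify_with_llm(log_message: str) -> str:
    """Stage 3 placeholder: prompt an LLM when labeled data is too sparse."""
    return "Unclassified"  # stubbed out for this sketch

def classify(source: str, log_message: str) -> str:
    label = classify_with_regex(log_message)
    if label is not None:
        return label
    if source in SPARSE_SOURCES:
        return classify_with_llm(log_message)
    return classify_with_bert(log_message)

print(classify("WebApp", "User admin logged in"))  # -> User Action
```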

---
## Directory Structure
- **`training/`**
Contains notebooks and scripts for training the models and experimenting with different approaches.
- **`models/`**
Stores pre-trained models such as the logistic regression classifier and embedding models.
- **`resources/`**
Holds auxiliary files like CSV datasets, output samples, and images.
- **Root Directory**
Includes the main API server (`server.py`) and the command-line classification utility (`classify.py`).
---
## Installation & Setup
1. **Clone the Repository**
   ```bash
   git clone <your_repository_url>
   ```
2. **Install Dependencies**
   Ensure Python is installed, then run:
   ```bash
   pip install -r requirements.txt
   ```
3. **Train the Model (if needed)**
   Open and run the training notebook:
   ```bash
   jupyter notebook training/log_classification.ipynb
   ```
4. **Run the API Server**
   Start the server using one of the following methods (a request sketch follows this list):
   - Direct execution:
     ```bash
     python server.py
     ```
   - With Uvicorn:
     ```bash
     uvicorn server:app --reload
     ```
   Once the server is running, the API documentation is available at:
   - Main endpoint: [http://127.0.0.1:8000/](http://127.0.0.1:8000/)
   - Swagger UI: [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
   - ReDoc: [http://127.0.0.1:8000/redoc](http://127.0.0.1:8000/redoc)
5. **Run the Streamlit App**
   Start the Streamlit application for log classification:
   ```bash
   streamlit run app.py
   ```
   The app launches in your browser at a URL like http://localhost:8501.
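To call the API programmatically, here is a minimal sketch using `requests`. The `/classify/` route and the `file` upload field are assumptions; confirm the actual route in `server.py` or the Swagger UI.

```python
import requests

# Assumed route and upload field name; verify at http://127.0.0.1:8000/docs.
URL = "http://127.0.0.1:8000/classify/"

# test_logs.csv is a hypothetical input file with source and log_message columns.
with open("test_logs.csv", "rb") as f:
    response = requests.post(URL, files={"file": ("test_logs.csv", f, "text/csv")})

response.raise_for_status()
with open("classified_logs.csv", "wb") as out:
    out.write(response.content)  # the same rows plus a target_label column
```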
---
## Usage Instructions
- **Input Data**
  Upload a CSV file with the following columns:
  - `source`
  - `log_message`
- **Output**
  The system processes the logs and returns a CSV file with an additional `target_label` column containing the classification result (see the example below).
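For illustration, a tiny input file built with pandas; the source names, messages, and labels below are made up:

```python
import pandas as pd

# Two example rows with the required columns.
pd.DataFrame({
    "source":      ["WebApp", "BillingSystem"],
    "log_message": ["User admin logged in", "Nightly backup completed"],
}).to_csv("test_logs.csv", index=False)

# After classification, the returned CSV gains a target_label column, e.g.:
#   source,log_message,target_label
#   WebApp,User admin logged in,User Action
#   BillingSystem,Nightly backup completed,System Notification
```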
---
## Customization
Feel free to modify and extend the classification logic in the following modules:
- `processor_bert.py`
- `processor_llm.py`
- `processor_regex.py`
These modules are designed to be flexible, allowing you to tailor the classification approaches to your specific needs.
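For example, if `processor_regex.py` keeps its patterns in a mapping like the sketch in the Overview (an assumption; adapt to the module's actual structure), supporting a new category is a one-line change:

```python
# Hypothetical pattern table; the dict name and structure are assumptions,
# not the module's real API.
REGEX_RULES = {
    r"user \S+ logged (in|out)": "User Action",
    r"backup (started|completed)": "System Notification",
    r"disk usage (above|exceeded) \d+%": "Resource Warning",  # new category
}
```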
---
## Contributions
Contributions, feedback, and feature requests are welcome.
Please open an issue or submit a pull request on the project's GitHub repository.