---
language: en
library_name: LogClassifier
tags:
- log-classification
- log-feature
- log-similarity
- transformers
- AIOps
pipeline_tag: text-classification
---

# log-classifier-BERT-v1
log-classifier-BERT-v1 is a neural network-based log classification model, fine-tuned from `BertForSequenceClassification` and designed for use in network and device log mining tasks.

Developed by [Selector AI](https://www.selector.ai/).

## Model Usage
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Step 1: Load the model and tokenizer from Hugging Face
model = BertForSequenceClassification.from_pretrained("rahulm-selector/log-classifier-BERT-v1")
tokenizer = BertTokenizer.from_pretrained("rahulm-selector/log-classifier-BERT-v1")

# Put the model in evaluation mode for inference
model.eval()

# Step 2: Prepare and tokenize the input data (example log text)
log_text = "Error occurred while accessing the database."
inputs = tokenizer(log_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

# Step 3: Make predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Step 4: Get the predicted class (the class with the highest score)
predicted_class = torch.argmax(logits, dim=1).item()

# Step 5: Map the predicted class index to its event name
# (the id2label mapping is stored in the model config; it can also be
# loaded from the JSON file in the repo)
label_mapping = model.config.id2label
predicted_event = label_mapping[predicted_class]
print(f"Predicted Event: {predicted_event}")
```
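
For quick experiments, the same checkpoint can also be run through the `transformers` `pipeline` helper, which bundles the tokenize-predict-decode steps above into a single call. A minimal sketch (the printed label comes from the same `id2label` mapping):

```python
from transformers import pipeline

# The text-classification pipeline wraps tokenization, inference, and id2label decoding
classifier = pipeline("text-classification", model="rahulm-selector/log-classifier-BERT-v1")

print(classifier("Error occurred while accessing the database."))
# e.g. [{'label': '<event name>', 'score': 0.98}]
```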
## Background

The model focuses on structured and semi-structured log data, outputting around 60 distinct event categories. It is highly effective for real-time log analysis, anomaly detection, and operational monitoring: by automatically classifying logs into predefined categories, it helps organizations manage large-scale network data and diagnose network issues faster and more accurately. The log-classifier-BERT-v1 model is designed to take a log as input and output the corresponding classification.
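
For monitoring and anomaly-triage workflows, it is often useful to look at the full score distribution rather than only the argmax. A minimal sketch, reusing `model` and `tokenizer` from the usage example above (the log line itself is a hypothetical example), that prints the three highest-scoring event categories:

```python
import torch

# Hypothetical log line; reuses `model` and `tokenizer` from the usage example above
log_text = "Interface GigabitEthernet0/1 changed state to down"
inputs = tokenizer(log_text, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and list the three highest-scoring event categories
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=3, dim=-1)
for score, idx in zip(top.values[0], top.indices[0]):
    print(f"{model.config.id2label[idx.item()]}: {score.item():.3f}")
```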
## Intended uses

Our model is intended to be used as a classifier. Given an input text (a log coming from a network or device), it outputs the event most closely associated with that log. The possible events that can be classified are listed in [encoder.json](https://huggingface.co/rahulm-selector/log-classifier-v1/blob/main/encoder.json).
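
If you prefer to read the label set from the repo file rather than `model.config.id2label`, something like the following should work. This is a sketch only: the repo id is taken from the usage example above, and it assumes `encoder.json` holds an event-name-to-index mapping (invert accordingly if the file is keyed the other way):

```python
import json

from huggingface_hub import hf_hub_download

# Download the label file from the model repo (repo id assumed from the usage example above)
path = hf_hub_download(repo_id="rahulm-selector/log-classifier-BERT-v1", filename="encoder.json")
with open(path) as f:
    encoder = json.load(f)

# Assuming an {event_name: class_index} layout; invert it to decode predictions
id_to_event = {idx: name for name, idx in encoder.items()}
print(id_to_event[0])  # hypothetical: name of class 0
```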
## Training Details

### Data

The model was trained on log data sourced from various network and infrastructure devices, capturing crucial system events and performance metrics. The syslogs originated from network routers, switches, firewalls, and servers, providing a rich dataset of operational insights, including security events, traffic patterns, and hardware health statuses.
### Train/Test Split

- **Train Data Size**: `~80K Logs`
- **Test Data Size**: `~20K Logs`
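
For context, a split of this shape can be reproduced with scikit-learn's `train_test_split`; the `logs` and `event_labels` below are hypothetical stand-ins, since the raw dataset itself is not published:

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the (unpublished) raw dataset
logs = ["Interface Gi0/1 changed state to down", "BGP neighbor 10.0.0.1 Up"] * 50
event_labels = [0, 1] * 50

# 80/20 split, stratified so every event category appears in both sets
train_logs, test_logs, train_labels, test_labels = train_test_split(
    logs, event_labels, test_size=0.2, random_state=42, stratify=event_labels
)
print(len(train_logs), len(test_logs))  # 80 20
```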
#### Hyperparameters

The following hyperparameters were used during training to optimize the model's performance:

- **Batch Size**: `32`
- **Learning Rate**: `0.001`
- **Optimizer**: `Adam`
- **Epochs**: `10`
- **Dropout Rate**: N/A
- **LSTM Hidden Dimension**: `384`
- **Embedding Dimension**: `384`
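
As a rough illustration of how these settings fit together, here is a minimal fine-tuning sketch wired to the stated batch size, optimizer, learning rate, and epoch count. The base checkpoint, `num_labels=60` (from the ~60 event categories described above), and the two placeholder training examples are assumptions, not the actual training pipeline:

```python
import torch
from torch.utils.data import DataLoader
from transformers import BertForSequenceClassification, BertTokenizer

# Hypothetical placeholder corpus; the real training set is ~80K labeled logs
texts = ["Error occurred while accessing the database.",
         "Interface GigabitEthernet0/1 changed state to down"]
labels = [0, 1]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed base checkpoint
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=60)

enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
dataset = list(zip(enc["input_ids"], enc["attention_mask"], torch.tensor(labels)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # Batch Size: 32

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Optimizer: Adam, Learning Rate: 0.001

model.train()
for epoch in range(10):                                     # Epochs: 10
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=batch_labels)
        out.loss.backward()  # cross-entropy loss is computed internally when labels are passed
        optimizer.step()
```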