# 🧱 Dockerfile Quality Classifier – Multilabel Model
This model predicts which rules are violated in a given Dockerfile. It is a multilabel classifier trained to detect violations of the top 30 most frequent rules from Hadolint.
## 🧠 Model Overview
- Architecture: fine-tuned `microsoft/codebert-base`
- Task: Multi-label classification (30 labels)
- Input: Full Dockerfile content as plain text
- Output: for each rule → probability of violation
- Max input length: 512 tokens
- Threshold: 0.5 (configurable)
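For a quick sense of that contract, here is a minimal inference sketch using the standard Transformers API (the sample Dockerfile string is arbitrary; the full script is in the Quick Start below):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "LeeSek/multilabel-dockerfile-model"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# One Dockerfile in, 30 independent probabilities out
inputs = tokenizer("FROM ubuntu:latest\nADD . /app", return_tensors="pt",
                   truncation=True, max_length=512)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).squeeze()  # shape: (30,)
violations = (probs > 0.5).nonzero().flatten().tolist()  # label indices over threshold
```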
## 📊 Training Details
- Total training files: ~15,000 Dockerfiles with at least one rule violation
- Per-rule cap: Max 2,000 files per rule to avoid imbalance
- Perfect (clean) files: ~1,500 examples with no Hadolint violations
- Label source: Hadolint output (top 30 rules only)
- Multi-hot labels: `[1, 0, 0, 1, ...]` over the 30 rules
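For reference, a label vector in that format can be derived from Hadolint's JSON output roughly as follows. This is a sketch, not the repo's exact pipeline; it assumes `hadolint` is on `PATH` and that `top_rules.json` lists the 30 tracked rule codes:

```python
import json
import subprocess

# The 30 tracked Hadolint rule codes, in model label order (assumed layout)
with open("top_rules.json") as f:
    top_rules = json.load(f)

def label_vector(dockerfile_path: str) -> list[int]:
    # Hadolint emits a JSON array of findings, each carrying a rule "code"
    result = subprocess.run(["hadolint", "-f", "json", dockerfile_path],
                            capture_output=True, text=True)
    codes = {finding["code"] for finding in json.loads(result.stdout or "[]")}
    return [1 if rule in codes else 0 for rule in top_rules]
```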
## 🧪 Evaluation Snapshot
Evaluation on 6,873 labeled examples:
| Metric | Value |
|---|---|
| Micro avg F1 | 0.97 |
| Macro avg F1 | 0.95 |
| Weighted avg F1 | 0.97 |
| Samples avg F1 | 0.97 |
More metrics are available in `classification_report.csv`.
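The averages follow scikit-learn's `classification_report` conventions, so they can be reproduced from saved predictions along these lines (a sketch; the `.npy` file names are placeholders, and `y_true`/`y_pred` are assumed to be 0/1 matrices of shape `(n_samples, 30)`):

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.load("y_true.npy")  # 0/1 ground-truth matrix, one row per Dockerfile
y_pred = np.load("y_pred.npy")  # thresholded model outputs, same shape

# Per-rule precision/recall/F1 plus the micro/macro/weighted/samples averages
print(classification_report(y_true, y_pred, zero_division=0))
```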
## 🚀 Quick Start
### 🧪 Step 1 – Create the test script

Save it as `test_multilabel_predict.py`:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from pathlib import Path
import numpy as np
import json
import sys

MODEL_DIR = "LeeSek/multilabel-dockerfile-model"
TOP_RULES_PATH = "top_rules.json"
THRESHOLD = 0.5


def main():
    if len(sys.argv) < 2:
        print("Usage: python test_multilabel_predict.py Dockerfile [--debug]")
        return

    debug = "--debug" in sys.argv
    file_path = Path(sys.argv[1])
    if not file_path.exists():
        print(f"File {file_path} not found.")
        return

    # Rule names, in the same order as the model's output labels
    with open(TOP_RULES_PATH) as f:
        labels = json.load(f)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
    model.eval()

    text = file_path.read_text(encoding="utf-8")
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       padding="max_length", max_length=512)

    with torch.no_grad():
        logits = model(**inputs).logits

    # Multilabel head: apply a sigmoid per rule, then threshold independently
    probs = torch.sigmoid(logits).squeeze().cpu().numpy()
    triggered = [(labels[i], probs[i]) for i in range(len(labels)) if probs[i] > THRESHOLD]
    top5 = np.argsort(probs)[-5:][::-1]

    print(f"\n🧪 Prediction for file: {file_path.name}")
    print(f"📄 Lines in file: {len(text.splitlines())}")

    if triggered:
        print(f"\n🚨 Detected violations (p > {THRESHOLD}):")
        for rule, p in triggered:
            print(f"  - {rule}: {p:.3f}")
    else:
        print("✅ No violations detected.")

    if debug:
        print("\n🔍 DEBUG INFO:")
        print(f"📄 Text snippet:\n{text[:300]}")
        print(f"🔢 Token count: {len(inputs['input_ids'][0])}")
        print(f"📉 Logits: {logits.squeeze().tolist()}")
        print("\n🔥 Top 5 predictions:")
        for idx in top5:
            print(f"  - {labels[idx]}: {probs[idx]:.3f}")


if __name__ == "__main__":
    main()
```
Make sure `top_rules.json` is available next to the script.
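The script reads that file as a flat JSON list of 30 rule codes whose order matches the model's label order. A quick sanity check:

```python
import json

with open("top_rules.json") as f:
    labels = json.load(f)

assert isinstance(labels, list) and len(labels) == 30, "expected 30 rule codes"
print(labels[:5])  # e.g. Hadolint codes such as "DL3008"
```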
### 📄 Step 2 – Create a good and a bad Dockerfile

Good:

```dockerfile
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "index.js"]
```

Bad (written to trip common Hadolint rules: a `latest` tag, unpinned `apt-get`/`pip` installs, `ADD` instead of `COPY`, and a shell-form `CMD`):

```dockerfile
FROM ubuntu:latest
RUN apt-get install python3
ADD . /app
WORKDIR /app
RUN pip install flask
CMD python3 app.py
```
### ▶️ Step 3 – Run the script

```bash
python test_multilabel_predict.py Dockerfile --debug
```
## 📎 Extras

The full training and evaluation pipeline, including data preparation, training, validation, prediction, and threshold calibration, is available in the `scripts/` folder.
💬 Note: the scripts were written with Polish comments and variable names for clarity during local development; the logic is fully portable.
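If you want to experiment with threshold calibration beyond the flat 0.5, one common recipe is a per-rule sweep that maximizes F1 on a validation split. A minimal sketch under assumed array names (not the repo's actual script):

```python
import numpy as np
from sklearn.metrics import f1_score

val_probs = np.load("val_probs.npy")  # (n_samples, 30) sigmoid outputs
val_true = np.load("val_true.npy")    # matching 0/1 label matrix

best_thresholds = []
for j in range(val_probs.shape[1]):
    grid = np.linspace(0.05, 0.95, 19)
    scores = [f1_score(val_true[:, j], val_probs[:, j] > t, zero_division=0)
              for t in grid]
    best_thresholds.append(float(grid[int(np.argmax(scores))]))  # best cutoff per rule
```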
## 📜 License
MIT
## 🙏 Credits
- Based on Hadolint
- Powered by Hugging Face Transformers