---
language: "code"
license: "mit"
tags:
- dockerfile
- hadolint
- binary-classification
- codebert
model-index:
- name: Binary Dockerfile Classifier
results: []
---
# Dockerfile Quality Classifier – Binary Model
This model predicts whether a given Dockerfile is:
- **GOOD** – clean and adheres to best practices (no top-rule violations)
- **BAD** – violates at least one important rule (from Hadolint)
It is the first step in a full ML-based Dockerfile linter.
---
## Model Overview
- **Architecture:** Fine-tuned `microsoft/codebert-base`
- **Task:** Binary classification (`good` vs `bad`)
- **Input:** Full Dockerfile content as plain text
- **Output:** `[prob_good, prob_bad]` – softmax scores (see the label-mapping check below)
- **Max input length:** 512 tokens
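Before relying on the `[prob_good, prob_bad]` ordering, it is worth confirming the checkpoint's label mapping. A minimal sketch, assuming only the standard Transformers config API (the mapping may be the generic `LABEL_0`/`LABEL_1` if custom names were not saved):
```python
# Sketch: inspect the label mapping of the checkpoint before trusting
# the ordering above (assumption: index 0 = good, index 1 = bad).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("LeeSek/binary-dockerfile-model")
print(config.num_labels)   # expected: 2
print(config.id2label)     # e.g. {0: "LABEL_0", 1: "LABEL_1"} or {0: "good", 1: "bad"}
```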
---
## Training Details
- **Data source:** Real-world and synthetic Dockerfiles
- **Labels:** Based on the top 30 [Hadolint](https://github.com/hadolint/hadolint) rules (see the labeling sketch below)
- **Bad examples:** At least one of those rules violated
- **Good examples:** Fully clean files
- **Dataset balance:** 15,000 BAD / 1,500 GOOD (clean)
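The actual labeling code lives in the `scripts/` folder; the snippet below is only a hypothetical sketch of the idea, using Hadolint's JSON output. The `TOP_RULES` set and the `label_dockerfile` helper are illustrative names, and the rule IDs shown are an example subset rather than the real top-30 list.
```python
# Hypothetical labeling helper: a Dockerfile is "bad" if Hadolint reports a
# violation of any rule in a chosen set (rule IDs below are just examples).
import json
import subprocess
from pathlib import Path

TOP_RULES = {"DL3007", "DL3008", "DL3013", "DL3020"}

def label_dockerfile(path: Path) -> str:
    # Hadolint exits non-zero when it finds violations, so the return code is not checked.
    result = subprocess.run(
        ["hadolint", "--format", "json", str(path)],
        capture_output=True,
        text=True,
    )
    findings = json.loads(result.stdout or "[]")
    violated = {finding["code"] for finding in findings}
    return "bad" if violated & TOP_RULES else "good"

if __name__ == "__main__":
    print(label_dockerfile(Path("Dockerfile")))
```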
---
## Evaluation Results
Evaluation on a held-out test set of 1,650 Dockerfiles:
| Class | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| good | 0.96 | 0.91 | 0.93 | 150 |
| bad | 0.99 | 1.00 | 0.99 | 1,500 |
| **Accuracy** | | | **0.99** | 1,650 |
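For reference, a per-class report in this shape can be generated with scikit-learn. The snippet below only illustrates the format with placeholder labels; it is not the evaluation script behind the numbers above.
```python
# Format illustration only: y_true / y_pred are placeholders, not the real test set.
from sklearn.metrics import classification_report

y_true = ["good", "bad", "bad", "good", "bad"]
y_pred = ["good", "bad", "bad", "bad", "bad"]
print(classification_report(y_true, y_pred, digits=2))
```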
---
## Quick Start
### Step 1 – Create the test script
Save this as `test_binary_predict.py`:
```python
import sys
from pathlib import Path

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Read the Dockerfile passed as the first CLI argument
path = Path(sys.argv[1])
text = path.read_text(encoding="utf-8")

# Load the fine-tuned CodeBERT classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("LeeSek/binary-dockerfile-model")
model = AutoModelForSequenceClassification.from_pretrained("LeeSek/binary-dockerfile-model")
model.eval()

# Tokenize, padding/truncating to the model's 512-token limit
inputs = tokenizer(text, return_tensors="pt", padding="max_length", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the two classes: index 0 = good, index 1 = bad
probs = torch.nn.functional.softmax(logits, dim=1).squeeze()
label = "GOOD" if torch.argmax(probs).item() == 0 else "BAD"
print(f"Prediction: {label} – Probabilities: good={probs[0]:.3f}, bad={probs[1]:.3f}")
```
---
### Step 2 – Create good and bad Dockerfiles
Good:
```docker
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "index.js"]
```
Bad (unpinned `latest` tag, unpinned installs, `ADD` instead of `COPY`, shell-form `CMD`):
```docker
FROM ubuntu:latest
RUN apt-get install python3
ADD . /app
WORKDIR /app
RUN pip install flask
CMD python3 app.py
```
---
### Step 3 – Run the prediction
```bash
python test_binary_predict.py Dockerfile
```
Expected output:
```
Prediction: GOOD – Probabilities: good=0.998, bad=0.002
```
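The same check can also be run through the high-level `pipeline` API. This is an alternative sketch, not part of the original workflow; the label names it prints depend on the checkpoint's `id2label` mapping (possibly the generic `LABEL_0`/`LABEL_1`), so map them back to good/bad accordingly.
```python
# Alternative sketch using the text-classification pipeline.
# The printed label names depend on the checkpoint's id2label mapping
# (possibly LABEL_0 = good, LABEL_1 = bad); verify before relying on them.
from pathlib import Path
from transformers import pipeline

clf = pipeline("text-classification", model="LeeSek/binary-dockerfile-model")
text = Path("Dockerfile").read_text(encoding="utf-8")
print(clf(text, truncation=True, max_length=512))
```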
---
## Extras
The full training and evaluation pipeline – including data preparation, training, validation, and prediction – is available in the **`scripts/`** folder.
> **Note:** Scripts are written with **Polish comments and variable names** for clarity during local development. The logic is fully portable.
---
## License
MIT
---
## Credits
- Model powered by [Hugging Face Transformers](https://huggingface.co/transformers)
- Tokenizer: CodeBERT
- Rule definitions: [Hadolint](https://github.com/hadolint/hadolint)