---
datasets:
- quandao92/ad-clip-dataset
metrics:
- f1
base_model:
- openai/clip-vit-base-patch32
---
<div style='text-align: center; font-size: 28px; font-weight: bold'>CLIP-based Product Defect Detection Model Card</div>
## Model Details
### Model Description
AnomalyCLIP learns object-agnostic text prompts that capture generic normality and abnormality patterns in an image, regardless of its foreground objects.
This model applies CLIP-based anomaly detection to identify product defects.
A pre-trained CLIP model is fine-tuned to identify defects in product images, which makes it possible to automate quality control and defect detection on production lines.
- **Developed by:** 윤석민
- **Funded by:** SOLUWINS Co., Ltd.
- **Referenced by:** Zhou et al. 2023, AnomalyCLIP [[github](https://github.com/zqhang/AnomalyCLIP.git)]
- **Model type:** CLIP (Contrastive Language-Image Pretraining) - Domain-Agnostic Prompt Learning Model
- **Language(s):** Python
- **License:** Apache 2.0, MIT, OpenAI
### Technical Limitations
- The model needs a sufficiently large and diverse training set for defect detection; if the training data is scarce or imbalanced, performance can degrade.
- Real-time detection performance depends on the hardware, and detection accuracy may drop at high resolutions.
- Defects that are very subtle, or products that look nearly identical to one another, may not be detected accurately.
## Training Details
### Hardware
- **CPU:** Intel Core i9-13900K (24 Cores, 32 Threads)
- **RAM:** 64GB DDR5
- **GPU:** NVIDIA RTX 4090Ti 24GB
- **Storage:** 1TB NVMe SSD + 2TB HDD
### Software
- **OS:** Windows 11 (64-bit) / Ubuntu 20.04 LTS
- **Python:** 3.8 (Anaconda)
- **PyTorch:** 1.9.0
- **OpenCV:** 4.5.3
- **CUDA Toolkit:** 11.8
- **cuDNN:** 9.3.0.75 (for CUDA 11)
### ๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด
์ด ๋ชจ๋ธ์€ ์ œํ’ˆ์˜ ์ •์ƒ ์ด๋ฏธ์ง€์™€ ๊ฒฐํ•จ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ๋ฉ๋‹ˆ๋‹ค.
์ด ๋ฐ์ดํ„ฐ๋Š” ์ œํ’ˆ์˜ ์ด๋ฏธ์ง€, ๊ฒฐํ•จ ์˜์—ญ์— ๋Œ€ํ•œ ground truth ์ •๋ณด, ๊ทธ๋ฆฌ๊ณ  ๊ธฐํƒ€ ๊ด€๋ จ ํŠน์„ฑ์„ ํฌํ•จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
์ด๋ฏธ์ง€๋Š” CLIP ๋ชจ๋ธ์˜ ์ž…๋ ฅ ํ˜•์‹์— ์ ํ•ฉํ•˜๋„๋ก ์ „์ฒ˜๋ฆฌ๋˜๋ฉฐ, ๊ฒฐํ•จ ์˜์—ญ์˜ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด ground truth ๋งˆํ‚น์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
- **๋ฐ์ดํ„ฐ ์†Œ์Šค:** https://huggingface.co/datasets/quandao92/ad-clip-dataset
- **๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ์žฅ๋น„:**
- ์ˆ˜์ง‘ H/W: jetson orin nano 8GB
- ์นด๋ฉ”๋ผ: BFS-U3-89S6C Color Camera
- ๋ Œ์ฆˆ: 8mm Fiexd Focal Length Lens
- ์กฐ๋ช…: LIDLA-120070
- ๋ฐ์ดํ„ฐ ํ˜•์‹: .bpm, .jpg
- **๋ฐ์ดํ„ฐ ๋ฒ„์ „ ๊ด€๋ฆฌ:**
- **1์ฐจ : 20240910_V0_๊ฐ„์ด ํ™˜๊ฒฝ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘**
๋ฐ์ดํ„ฐ ๋ฒ„์ „ ๋ฐ ์‚ฌ์šฉ ์ด๋ ฅ
- V01: ์ „์ฒ˜๋ฆฌ ์ „ ๋ฐ์ดํ„ฐ ์›๋ณธ -> ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ์›๋ณธ: 7ea
- V02: ๋ฐ์ดํ„ฐ ๋ถ„๋ฅ˜ -> ์ •์ƒ/๋ถˆ๋Ÿ‰ ๋ถ„๋ฅ˜: 4ea/3ea
- V03: ๋ฐ์ดํ„ฐ ๋ถ„๋ฅ˜, ๋ฐ์ดํ„ฐ ํšŒ์ „ -> ์ด๋ฏธ์ง€ ์ฆ๊ฐ•_45/90/135๋„๋กœ ํšŒ์ „_28ea
<div style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6kvzgbH81jJrHJECaEspY.png" height="500" width="100%">
<p>Ground Truth Marking</p>
</div>
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_fkcI52_BTcqvQyrJ4EXl.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
</div>
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/biaWPJtbm6iwNf7ZqnW5O.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
</div>
</div>
- **Round 2: 20240920_V1_image collection inside the housing**
  Data versions and usage history
  - V01: raw data before preprocessing -> collected originals: 16 images
  - V02: data classification -> normal/defective split: 14/2 images
  - V03: classification and rotation -> image augmentation: 64 images
<div style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/YsP7UwejFabUFp2Im0xWj.png" height="500" width="100%">
<p>Ground Truth Marking</p>
</div>
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/CNFdse5mHQY1KkMb5BYpb.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
</div>
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/nRO00DJFT0-B1EJYf8lzK.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
</div>
</div>
- **Round 3: 20241002_V2_data collection inside the equipment**
  Data versions and usage history
  - V01: raw data before preprocessing -> 49 images collected
  - V02: data classification -> normal/defective split (error/normal)
  - V03: classification and rotation -> augmentation via rotation: 102 images
<div style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/MFyVWaqr4GDNs8W2mWzGZ.png" height="500" width="100%">
<p>Ground Truth Marking</p>
</div>
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/Kc3EMbY05frUFQh5HbVHn.png" height="80%" width="90%" style="margin-right:5px;">
<p>PCA distribution visualization</p>
</div>
<div style="text-align: center; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/SP4R5LjGo2M1Zvby1Bar_.png" height="80%" width="90%" style="margin-right:5px;">
<p>Outliers identified with Isolation Forest</p>
</div>
</div>
- **Data Configuration:**
  - **Image resizing and normalization:**
    - Images are resized to a fixed size (e.g., 518x518) suitable as input to the CLIP model.
    - Pixel values are normalized to the [0, 1] range.
  - **Ground truth marking:**
    - For defective images, the defect region is annotated as a bounding box or a binary mask.
    - The annotations are stored in JSON or CSV format and used during model evaluation.
<div style="text-align: center;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/k8GQgaTK7JfQExNpCYpzz.png" height="500" width="100%" style="margin-right:5px;">
<p>Ground Truth Marking</p>
</div>
- **Data classification:**
  - Normal: images of defect-free products.
  - Error: images of defective products, including defect locations and related information.
<div style="display: flex;justify-content: space-between;">
<div style="text-align: center;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/5pGwZ-sptjWjf7WpHifyJ.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/3iihck7VfkXKw9VcIl06x.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/tjsmiXq9pp0K6KSuS1iOS.jpeg" height="400" width="450">
</div>
</div>
<p style="text-align: center;">Normal Product Images</p>
<div style="display: flex;justify-content: space-between;">
<div style="text-align: center;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/Qv01zDzEM5u8cQYdALrSU.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between; margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/B5q_FKiTVXkuElTSlUc4s.jpeg" height="400" width="450">
</div>
<div style="text-align: center;justify-content: space-between;margin-right: 5px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/3pro8oEqMTiEwiwFKcACn.jpeg" height="400" width="450">
</div>
</div>
<p style="text-align: center;">Error Product Images</p>
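The resizing and normalization described under Data Configuration can be sketched in a few lines. This is an illustrative example only: the `preprocess` name and the nearest-neighbor sampling are assumptions, not the repository's actual preprocessing code.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 518) -> np.ndarray:
    """Resize an H x W x 3 image to size x size (nearest-neighbor) and scale pixels to [0, 1]."""
    h, w = image.shape[:2]
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    resized = image[rows][:, cols]            # index-based nearest-neighbor resize
    return resized.astype(np.float32) / 255.0  # normalize to [0, 1]
```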
### ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ฐ€์ด๋“œ
๋ณธ ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ฐ€์ด๋“œ๋Š” AnomalyDetection ๊ธฐ๋ฐ˜ ๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•ด ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ผ๋ฒจ๋งํ•˜๋Š” ๊ธฐ์ค€๊ณผ ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ช…ํ™•ํžˆ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
๋ฐ์ดํ„ฐ๋Š” ์ฃผ๋กœ ์ •์ƒ(normal) ๋ฐ์ดํ„ฐ๋ฅผ ์ค‘์‹ฌ์œผ๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ์ตœ์†Œํ•œ์˜ ๋น„์ •์ƒ(anomaly) ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.
๋ณธ ๊ฐ€์ด๋“œ๋Š” ๋ฐ์ดํ„ฐ์˜ ํ’ˆ์งˆ์„ ์œ ์ง€ํ•˜๊ณ  ๋ชจ๋ธ ํ•™์Šต ๋ฐ ํ…Œ์ŠคํŠธ๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ๋ชฉํ‘œ๋ฅผ ๋‘ก๋‹ˆ๋‹ค.
- **๋ผ๋ฒจ๋ง ๋ฒ”์œ„**
1. **์ •์ƒ(normal) ๋ฐ์ดํ„ฐ**:
- ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์•ฝ **95% ์ด์ƒ**์„ ์ฐจ์ง€.
- ๋‹ค์–‘ํ•œ ํ™˜๊ฒฝ ์กฐ๊ฑด์—์„œ ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จ (์กฐ๋ช…, ๊ฐ๋„, ๋ฐฐ๊ฒฝ ๋“ฑ).
- ์ •์ƒ์ ์ธ ์ƒํƒœ์˜ ๊ธˆ์† ํ‘œ๋ฉด, ์ •๋ฐ€ํ•œ ๊ตฌ์กฐ, ๊ท ์ผํ•œ ๊ด‘ํƒ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ.
2. **๋น„์ •์ƒ(anomaly) ๋ฐ์ดํ„ฐ**:
- ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์•ฝ 5**% ์ดํ•˜**๋กœ ์ œํ•œ.
- ๊ฒฐํ•จ ์œ ํ˜•:
- **Scratch**: ์Šคํฌ๋ž˜์น˜.
- **Contamination**: ์–ผ๋ฃฉ ๋˜๋Š” ์ด๋ฌผ์งˆ.
- **Crack**: ํ‘œ๋ฉด ๊ท ์—ด.
- **๊ฒฐํ•จ ์ด๋ฏธ์ง€ ์˜ˆ์‹œ**
- **๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ธฐ์ค€**
-**1. ํŒŒ์ผ ๋„ค์ด๋ฐ ๊ทœ์น™**
- ๋ฐ์ดํ„ฐ ๋ฒ„์ „๋ณ„ ํŒŒ์ผ๋ช…์€ ๋ฒ„์ „๋ณ„๋กœ ์ƒ์ดํ•จ.
- ๊ฐ ๋ฒ„์ „์˜ ๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ ๋ฌธ์„œ ์ฐธ๊ณ 
- ๋ฐ์ดํ„ฐ ํด๋”๋ช…์€ **`<์ˆ˜์ง‘๋…„์›”์ผ>_<V๋ฒ„์ „>_<๊ฐ„๋‹จํ•œ ์„ค๋ช…>`** ํ˜•์‹์œผ๋กœ ์ž‘์„ฑ.
- ์˜ˆ์‹œ:20240910_V0_๊ฐ„์ด ํ™˜๊ฒฝ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘
- **2. ๋ผ๋ฒจ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ**
๋ผ๋ฒจ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋Š” csv ํ˜•์‹์œผ๋กœ ์ €์žฅํ•˜๋ฉฐ, ๊ฐ ๋ฐ์ดํ„ฐ์˜ ๋ผ๋ฒจ ๋ฐ ์„ค๋ช…์„ ํฌํ•จ.
- **ํ•„์ˆ˜ ํ•„๋“œ**:
- `image_id`: ์ด๋ฏธ์ง€ ํŒŒ์ผ๋ช….
- `label`: ์ •์ƒ(`normal`) ๋˜๋Š” ๋น„์ •์ƒ(`anomaly`) ์—ฌ๋ถ€.
- `description`: ์ƒ์„ธ ์„ค๋ช…(์˜ˆ: ๊ฒฐํ•จ ์œ ํ˜•).
- **Example:**
```json
{
  "image_id": "normal_20241111_001.jpg",
  "label": "normal",
  "description": "Normal metal part with a smooth surface and uniform gloss."
}
{
  "image_id": "abnormal_20241111_002.jpg",
  "label": "error",
  "description": "A linear scratch was found on the surface."
}
```
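Records in this format can be checked with a small validation helper. The sketch below is hypothetical (the `validate_record` name and the set of accepted label values are assumptions inferred from the examples above):

```python
REQUIRED_FIELDS = {"image_id", "label", "description"}
KNOWN_LABELS = {"normal", "anomaly", "error"}  # assumption: label values seen in this guide

def validate_record(record: dict) -> bool:
    """True when the record has all required fields and a recognized label value."""
    return REQUIRED_FIELDS <= record.keys() and record["label"] in KNOWN_LABELS
```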
# AD-CLIP Model Architecture
The AD-CLIP model uses CLIP (ViT-B-32) as its backbone to extract image features and detects anomalies through contrastive learning.
The final output is an anomaly score that tells whether the image is anomalous, together with per-class probabilities.
<div style="display: flex; justify-content: center; align-items: center; flex-direction: column;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/62sYcSncxxzqGjQAa0MgQ.png" height="500" width="70%">
<p>CLIP-based Anomaly Detection Model Architecture</p>
</div>
- **model:**
  - Input layer:
    - Input image: the model takes images of size [640, 640, 3], where 640x640 is the image width and height and 3 is the number of RGB channels.
    - Role: this layer processes the input image and prepares the data in the format expected by the rest of the model.
  - backbone:
    - CLIP (ViT-B-32): the model uses CLIP's Vision Transformer (ViT-B-32) architecture to extract image features. ViT-B-32 can extract the high-level characteristics needed to understand an image.
    - Filters: filter sizes [32, 64, 128, 256, 512] are used in the ViT layers, extracting salient information at each level of the image to learn features.
  - neck:
    - Anomaly detection module: analyzes the image based on the features extracted by CLIP and decides whether it is anomalous. This stage performs the key processing that separates normal from anomalous data within the image.
    - Contrastive learning: learns the differences between normal and anomalous images, sharpening the separation between the two.
  - head:
    - Anomaly detection head: the final part of the model; this layer decides whether the image is anomalous or normal.
  - outputs:
    - Anomaly score: a score indicating whether the image is anomalous (e.g., 1 = anomaly, 0 = normal).
    - Class probabilities: per-class probabilities from which the presence or absence of a defect is decided.
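The output stage can be illustrated with a CLIP-style scoring sketch: cosine similarity between the image embedding and the normal/anomaly text embeddings, turned into class probabilities by a softmax. The function name and the temperature value are illustrative assumptions, not the repository's code.

```python
import numpy as np

def anomaly_score(image_feat, normal_feat, anomaly_feat, temperature=100.0):
    """Return (anomaly probability, [p_normal, p_anomaly]) from CLIP-style embeddings."""
    text = np.stack([normal_feat, anomaly_feat]).astype(np.float64)
    text = text / np.linalg.norm(text, axis=1, keepdims=True)  # unit-normalize text features
    img = np.asarray(image_feat, dtype=np.float64)
    img = img / np.linalg.norm(img)                            # unit-normalize image feature
    logits = temperature * text @ img                          # scaled cosine similarities
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()                                # softmax over the two classes
    return float(probs[1]), probs
```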
# Optimizer and Loss Function
- **training:**
  - optimizer:
    - name: AdamW # AdamW optimizer (with weight decay)
    - lr: 0.0001 # learning rate
  - loss:
    - classification_loss: 1.0 # classification loss (cross-entropy)
    - anomaly_loss: 1.0 # anomaly detection loss
    - contrastive_loss: 1.0 # contrastive (similarity-based) loss
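With equal weights of 1.0, the overall objective is a weighted sum of the three terms. A minimal sketch, with cross-entropy standing in for the classification term (function names and signatures are assumptions):

```python
import numpy as np

def cross_entropy(class_probs, true_label: int) -> float:
    """Negative log-likelihood of the true class."""
    return -float(np.log(class_probs[true_label] + 1e-12))

def combined_loss(class_probs, true_label, anomaly_term, contrastive_term,
                  w_cls=1.0, w_anom=1.0, w_con=1.0) -> float:
    """Weighted sum of the classification, anomaly, and contrastive losses."""
    return (w_cls * cross_entropy(class_probs, true_label)
            + w_anom * anomaly_term + w_con * contrastive_term)
```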
# Metrics
- **metrics:**
  - Precision
  - Recall
  - mAP # mean Average Precision
  - F1-Score # balanced evaluation metric
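For binary normal/defect labels, the listed metrics reduce to simple counts over the predictions. A self-contained sketch (the function name is illustrative):

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = defect, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```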
# Training Parameters
**Hyperparameter settings**
- Learning Rate: 0.001
- Batch Size: 8
- Epochs: 200
# Pre-trained CLIP model
| Model | Download |
| --- | --- |
| ViT-B/32 | [download](https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt) |
| ViT-B/16 | [download](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt) |
| ViT-L/14 | [download](https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt) |
| ViT-L/14@336px | [download](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt) |
# Evaluation Parameters
- F1-score: 90% or higher.
# Training Performance and Test Results
- **Training performance results and graphs**:
<div style="display: flex; justify-content: space-between; margin-bottom: 10px;">
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/7Q1RzKyia-WNSCJHnk2-d.png" height="80%" width="100%" style="margin-right:5px;">
</div>
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/9PyBtPZMACgN1lJOqlVbG.png" height="80%" width="100%" style="margin-right:5px;">
</div>
</div>
<p style="text-align: center;">Training process example</p>
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/_lUD77x-yueXycuIn7jya.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 1 training performance</p>
</div>
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/NHDH9N94cI-KqP8k-ASUN.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 2 training performance</p>
</div>
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/6n0DnnQjXD8Ql-p3Owxan.png" height="80%" width="100%" style="margin-right:5px;">
<p>Round 3 training performance</p>
</div>
</div>
- **Test result tables**:
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/u1DQHjXM41DMq1JIUOGlp.png" height="100%" width="100%" style="margin-right:5px;">
</div>
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/ndQ60TKlheW8hmOrMBELU.png" height="100%" width="100%" style="margin-right:5px;">
</div>
</div>
- **Test results**:
<div style="display: flex; justify-content: space-between;">
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/A91V0GdrcUcX01cC-biG9.png" height="600" width="1000" style="margin-right:5px;">
<p>Anomaly Product</p>
</div>
<div style="text-align: center; margin-right: 20px;">
<img src="https://cdn-uploads.huggingface.co/production/uploads/65e7d0935ea025ead9623dde/PxleIhphzViTGCubVhWn7.png" height="600" width="1000" style="margin-right:5px;">
<p>Normal Product</p>
</div>
</div>
# Installation and Run Guide
Running this model requires Python together with the following libraries:
- **ftfy==6.2.0**: fixes text normalization and encoding problems.
- **matplotlib==3.9.0**: data visualization and plotting.
- **numpy==1.24.3**: core library for numerical computation.
- **opencv_python==4.9.0.80**: image and video processing.
- **pandas==2.2.2**: data analysis and manipulation.
- **Pillow==10.3.0**: image file handling and conversion.
- **PyQt5==5.15.10**: framework for GUI application development.
- **PyQt5_sip==12.13.0**: interface layer between PyQt5 and Python.
- **regex==2024.5.15**: regular expression processing.
- **scikit_learn==1.2.2**: machine learning and data analysis.
- **scipy==1.9.1**: scientific and technical computing.
- **setuptools==59.5.0**: Python package distribution and installation.
- **scikit-image**: image processing and analysis.
- **tabulate==0.9.0**: printing data as tables.
- **thop==0.1.1.post2209072238**: FLOP counting for PyTorch models.
- **timm==0.6.13**: collection of state-of-the-art image classification models.
- **torch==2.0.0**: the PyTorch deep learning framework.
- **torchvision==0.15.1**: PyTorch extension for computer vision tasks.
- **tqdm==4.65.0**: progress bar display.
- **pyautogui**: GUI automation.
- Install the Python libraries
```shell
pip install -r requirements.txt
```
## Model Run Steps
### ✅ Dataset configuration
- Configure the dataset as in the example below
```
โ”œโ”€โ”€ data/
โ”‚ โ”œโ”€โ”€ COMP_1/
โ”‚ โ”‚ โ”œโ”€โ”€ product_1/
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€grouth_truth
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_1
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_2
โ”‚ โ”‚ โ”‚ โ”‚
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€test/
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€good
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_1
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_2
โ”‚ โ”‚ โ”‚ โ”‚
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€train/
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€good
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_1
โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€anomaly_2
โ”‚ โ”‚ โ”‚ โ”‚
โ”‚ โ”‚ โ”œโ”€โ”€ product_2/
โ”‚ โ”‚ โ”‚ โ”‚
โ”‚ โ”‚ โ”œโ”€โ”€ meta.json
โ”‚ โ”‚ โ”‚
โ”‚ โ”œโ”€โ”€ COMP_2/
โ”‚ โ”‚
```
- Generate a JSON file storing all of the above dataset information (-> meta_train.json, meta_test.json)
```shell
cd dataset_config
python dataset_get_json.py
```
- Create all grouth_truth masks (anomaly masks only) by hand
```shell
cd dataset_config
python image_ground_truth.py
```
- Dataset configuration for train and test
```shell
cd training_libs
python dataset.py
```
→ The `__init__` method takes the dataset root directory, transform function, dataset name, and mode as inputs
→ It reads the metadata JSON file (meta_train.json) and stores the class-name list and all data entries in lists
→ It calls the `generate_class_info` function to build class information and map class names to class IDs
→ The `__len__` method returns the number of samples in the dataset
→ The `__getitem__` method returns the sample at the given index
→ The image is read from its path, and a mask image is built depending on the anomaly flag
→ When needed, the transform function is applied to the image and mask
→ A dictionary containing the image, mask, class name, anomaly flag, image path, and class ID is returned
### ✅ Image pre-processing (transformation) for train and test
```
training_libs/utils.py
```
```
AnomalyCLIP_lib/transform.py
```
- **Data Processing Techniques:**
  - normalization:
    description: "Standardize image pixel values with a mean and standard deviation"
    method: "'Normalize' from 'torchvision.transforms'"
  - max_resize:
    description: "Resize while keeping the maximum size and aspect ratio, adding padding as needed"
    method: "Custom 'ResizeMaxSize' class"
  - random_resized_crop:
    description: "Randomly crop and resize images during training to add variation"
    method: "'RandomResizedCrop' from 'torchvision.transforms'"
  - resize:
    description: "Resize the image to a fixed size matching the model input"
    method: "'Resize' with BICUBIC interpolation"
  - center_crop:
    description: "Crop the central region of the image to the specified size"
    method: "'CenterCrop'"
  - to_tensor:
    description: "Convert the image to a PyTorch tensor"
    method: "'ToTensor'"
  - augmentation (optional):
    description: "Apply assorted random transforms for data augmentation; configurable via 'AugmentationCfg'"
    method: "Uses 'timm' library if specified"
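The listed transforms chain together in order. Below is a dependency-free sketch of two of the steps and a `compose` helper; the names and the mean/std defaults are illustrative assumptions (the real pipeline uses `torchvision.transforms`):

```python
import numpy as np

def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Crop the central size x size region of an H x W x C image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def normalize(img: np.ndarray, mean: float = 0.5, std: float = 0.5) -> np.ndarray:
    """Scale pixels to [0, 1] and standardize with a mean and standard deviation."""
    return (img.astype(np.float32) / 255.0 - mean) / std

def compose(*steps):
    """Apply preprocessing steps left to right, mirroring a transform pipeline."""
    def pipeline(x):
        for step in steps:
            x = step(x)
        return x
    return pipeline
```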
### ✅ Prompt generating
```
training_lib/prompt_ensemble.py
```
๐Ÿ‘ **Prompts Built in the Code**
1. Normal Prompt: *'["{ }"]'*
โ†’ Normal Prompt Example: "object"
2. Anomaly Prompt: *'["damaged { }"]'*
โ†’ Anomaly Prompt Example: "damaged object"
๐Ÿ‘ **Construction Process**
1. *'prompts_pos (Normal)'*: Combines the class name with the normal template
2. *'prompts_neg (Anomaly)'*: Combines the class name with the anomaly template
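The two-template scheme amounts to a pair of format strings. A minimal sketch (the constant and function names are assumptions; the repository builds these in `prompt_ensemble.py`):

```python
NORMAL_TEMPLATE = "{}"            # normal prompt template
ANOMALY_TEMPLATE = "damaged {}"   # anomaly prompt template

def build_prompts(class_name: str = "object"):
    """Return (normal prompt, anomaly prompt) for a class name."""
    return NORMAL_TEMPLATE.format(class_name), ANOMALY_TEMPLATE.format(class_name)
```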
### ✅ Initial setting for training
- Define the paths to the training dataset and for saving model checkpoints
```python
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')
```
### ✅ Hyperparameter settings
- Set the depth parameter: the depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data
```python
parser.add_argument("--depth", type=int, default=9, help="prompt embedding depth")
```
- Define the size of input images used for training (in pixels)
```python
parser.add_argument("--image_size", type=int, default=518, help="image size")
```
- Set the training parameters
```python
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
```
- Size/depth parameter for DPAM (Diagonally Prominent Attention Map)
```python
parser.add_argument("--dpam", type=int, default=20, help="dpam size")
```
1. ViT-B/32 and ViT-B/16: `--dpam` should be around 10-13
2. ViT-L/14 and ViT-L/14@336px: `--dpam` should be around 20-24
```
→ DPAM is used to refine and enhance specific layers of a model, particularly in Vision Transformers (ViT).
→ Helps the model focus on important features within each layer through an attention mechanism
→ Layers: DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction
→ Number of layers DPAM influences is adjustable (--dpam), controlling how much of the model is fine-tuned.
→ If you want to refine the entire model, you can set --dpam to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).
→ If you want to focus only on the final layers (where the model usually learns complex features), you can choose fewer DPAM layers.
```
### ✅ Test process
๐Ÿ‘ **Load pre-trained and Fine tuned (Checkpoints) models**
1. Pre-trained mode (./pre-trained model/):
```ruby
โ†’ Contains the pre-trained model (ViT-B, ViT-L,....)
โ†’ Used as the starting point for training the CLIP model
โ†’ Pre-trained model helps speed up and improve training by leveraging previously learned features
```
2. Fine-tuned models (./checkpoint/):
```
→ "epoch_N.pth" files in this folder store the model's states during the fine-tuning process.
→ Each ".pth" file represents a version of the model fine-tuned from the pre-trained model
→ These checkpoints can be used to resume fine-tuning, evaluate the model at different stages, or select the best-performing version
```
# Model Attack Vulnerability Analysis
This document systematically establishes a vulnerability analysis of the AnomalyCLIP model and countermeasures against adversarial attacks.
It includes the results of implementing and evaluating data- and model-level defense strategies to secure the model's reliability and stability and to preserve data integrity.
## **1. Vulnerability Analysis**
- ### **Adversarial Attack Scenarios**
  1. **Adversarial Examples:**
     - **Description:** small perturbations added to the input distort the model's predictions.
     - **Example:** causing a normal image to be predicted as defective.
  2. **Data Poisoning:**
     - **Description:** malicious samples injected into the training data corrupt model training.
     - **Example:** training on anomalous data labeled as normal.
  3. **Evasion Attacks:**
     - **Description:** manipulating the model's classification result at inference time.
     - **Example:** causing defective data to be predicted as normal.
- ### **Impact on the Model and Dataset**
  - **Performance degradation:** accuracy drops when adversarial samples are fed to the model.
  - **Integrity damage:** tampered data makes the trained model unreliable in real environments.
  - **Malicious exploitation:** faulty model decisions increase the risk of quality-control failures in production.
## **2. Countermeasures**
- ### **Data-level Defenses**
  1. **Data cleaning:**
     - Remove blurry or truncated images.
     - Remove noise from the data and repair defects.
     - **Result:** higher data quality reduces the effect of adversarial noise.
  2. **Data Augmentation:**
     - Random rotation, resizing, and brightness/contrast adjustment.
     - Addition of Gaussian noise and salt-and-pepper noise.
     - **Result:** greater data diversity and stronger model generalization.
  3. **Data integrity verification:**
     - Store a hash (MD5) for each sample and check for tampering.
     - **Result:** guaranteed dataset reliability and integrity.
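The MD5 integrity check can be sketched with the standard library; the function names are illustrative:

```python
import hashlib

def md5_digest(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()

def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """True when the current digest matches the one stored at collection time."""
    return md5_digest(data) == recorded_digest
```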
- ### **Model-level Defenses**
  1. **Adversarial Training:**
     - Include FGSM-based adversarial samples in the training data.
     - **Result:** average accuracy on adversarial samples improved by 5%.
  2. **Gradient Masking:**
     - Hide gradients so the model is less exposed to gradient-based adversarial attacks.
  3. **Temperature Scaling:**
     - Adjust the model's predicted probabilities to reduce sensitivity to adversarial samples.
- ### **System-level Defenses**
  1. **Real-time detection and response:**
     - Build a system that detects anomalous input patterns in real time.
     - **Result:** immediate alerts and responses when an adversarial attack occurs.
  2. **Automated defense tools:**
     - Automate adversarial example generation and defense testing.
## **3. Experimental Results**
- ### **Evaluation Data**
  - **Dataset composition:**
    - Normal data: 110 samples
    - Defective data: 10 samples
    - Adversarial data (FGSM attack): 100 samples
- ### **Key Performance Metrics**

Metric | Clean Data | Adversarial Data | Change
-----------------|-------------|---------------|--------
Accuracy | 98% | 92% | -6%
F1 Score | 0.935 | 0.91 | -2.5%
False Positive | 2% | 5% | +3%
False Negative | 3% | 7% | +4%
## **4. Future Plans**
1. **Testing additional attack methods:**
   - Apply and evaluate further attack techniques such as PGD and DeepFool.
2. **Model improvements:**
   - Strengthen robustness through contrastive learning and ensemble training.
3. **Real-time defense system:**
   - Analyze the model's real-time predictions to detect and block adversarial inputs.
# References
- AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [[github](https://github.com/zqhang/AnomalyCLIP.git)]