CLIP-Based Product Defect Detection Model Card

๋ชจ๋ธ ์„ธ๋ถ€์‚ฌํ•ญ

๋ชจ๋ธ ์„ค๋ช…

AnomalyCLIP์€ ํŠน์ • ๊ฐ์ฒด์— ์˜์กดํ•˜์ง€ ์•Š๋Š” ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋ฅผ ํ•™์Šตํ•˜์—ฌ ์ด๋ฏธ์ง€ ๋‚ด์˜ ์ „๊ฒฝ ๊ฐ์ฒด์™€ ์ƒ๊ด€์—†์ด ์ผ๋ฐ˜์ ์ธ ์ •์ƒ ๋ฐ ๋น„์ •์ƒ ํŒจํ„ด์„ ํฌ์ฐฉํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋ชจ๋ธ์€ CLIP ๊ธฐ๋ฐ˜ ์ด์ƒ ํƒ์ง€ ๊ธฐ๋ฒ•์„ ํ™œ์šฉํ•˜์—ฌ ์ œํ’ˆ ๊ฒฐํ•จ์„ ํƒ์ง€ํ•ฉ๋‹ˆ๋‹ค. ์‚ฌ์ „ ํ•™์Šต๋œ CLIP ๋ชจ๋ธ์„ ํŒŒ์ธํŠœ๋‹(Fine-tuning)ํ•˜์—ฌ ์ œํ’ˆ ์ด๋ฏธ์ง€์—์„œ ๊ฒฐํ•จ์„ ์‹๋ณ„ํ•˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ์ƒ์‚ฐ ๋ผ์ธ์˜ ํ’ˆ์งˆ ๊ด€๋ฆฌ ๋ฐ ๊ฒฐํ•จ ํƒ์ง€ ์ž‘์—…์„ ์ž๋™ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

  • Developed by: ์œค์„๋ฏผ
  • Funded by: SOLUWINS Co., Ltd. (์†”๋ฃจ์œˆ์Šค)
  • Referenced by: Zhou et al. (2023), AnomalyCLIP [github]
  • Model type: CLIP (Contrastive Language-Image Pretraining) - Domain-Agnostic Prompt Learning Model
  • Language(s): Python
  • License: Apache 2.0, MIT, OpenAI

๊ธฐ์ˆ ์  ์ œํ•œ์‚ฌํ•ญ

  • ๋ชจ๋ธ์€ ๊ฒฐํ•จ ํƒ์ง€๋ฅผ ์œ„ํ•œ ์ถฉ๋ถ„ํ•˜๊ณ  ๋‹ค์–‘ํ•œ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ํ•„์š”๋กœ ํ•ฉ๋‹ˆ๋‹ค. ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹์ด ๋ถ€์กฑํ•˜๊ฑฐ๋‚˜ ๋ถˆ๊ท ํ˜•ํ•  ๊ฒฝ์šฐ, ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ €ํ•˜๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์‹ค์‹œ๊ฐ„ ๊ฒฐํ•จ ๊ฐ์ง€ ์„ฑ๋Šฅ์€ ํ•˜๋“œ์›จ์–ด ์‚ฌ์–‘์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์งˆ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋†’์€ ํ•ด์ƒ๋„์—์„œ ๊ฒฐํ•จ์„ ํƒ์ง€ํ•˜๋Š” ์ •ํ™•๋„๊ฐ€ ๋–จ์–ด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๊ฒฐํ•จ์ด ๋ฏธ์„ธํ•˜๊ฑฐ๋‚˜ ์ œํ’ˆ ๊ฐ„ ์œ ์‚ฌ์„ฑ์ด ๋งค์šฐ ๋†’์€ ๊ฒฝ์šฐ, ๋ชจ๋ธ์ด ๊ฒฐํ•จ์„ ์ •ํ™•ํ•˜๊ฒŒ ํƒ์ง€ํ•˜์ง€ ๋ชปํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Training Details

Hardware

  • CPU: Intel Core i9-13900K (24 Cores, 32 Threads)
  • RAM: 64GB DDR5
  • GPU: NVIDIA RTX 4090Ti 24GB
  • Storage: 1TB NVMe SSD + 2TB HDD

Software

  • OS: Windows 11 (64-bit) / Ubuntu 20.04 LTS
  • Python: 3.8 (Anaconda)
  • PyTorch: 1.9.0
  • OpenCV: 4.5.3
  • CUDA Toolkit: 11.8
  • cuDNN: 9.3.0.75 (for CUDA 11)

๋ฐ์ดํ„ฐ์…‹ ์ •๋ณด

์ด ๋ชจ๋ธ์€ ์ œํ’ˆ์˜ ์ •์ƒ ์ด๋ฏธ์ง€์™€ ๊ฒฐํ•จ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ๋ฉ๋‹ˆ๋‹ค. ์ด ๋ฐ์ดํ„ฐ๋Š” ์ œํ’ˆ์˜ ์ด๋ฏธ์ง€, ๊ฒฐํ•จ ์˜์—ญ์— ๋Œ€ํ•œ ground truth ์ •๋ณด, ๊ทธ๋ฆฌ๊ณ  ๊ธฐํƒ€ ๊ด€๋ จ ํŠน์„ฑ์„ ํฌํ•จํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€๋Š” CLIP ๋ชจ๋ธ์˜ ์ž…๋ ฅ ํ˜•์‹์— ์ ํ•ฉํ•˜๋„๋ก ์ „์ฒ˜๋ฆฌ๋˜๋ฉฐ, ๊ฒฐํ•จ ์˜์—ญ์˜ ํ‰๊ฐ€๋ฅผ ์œ„ํ•ด ground truth ๋งˆํ‚น์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

  • ๋ฐ์ดํ„ฐ ์†Œ์Šค: https://huggingface.co/datasets/quandao92/ad-clip-dataset

  • ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ์žฅ๋น„:

    • ์ˆ˜์ง‘ H/W: jetson orin nano 8GB
    • ์นด๋ฉ”๋ผ: BFS-U3-89S6C Color Camera
    • ๋ Œ์ฆˆ: 8mm Fiexd Focal Length Lens
    • ์กฐ๋ช…: LIDLA-120070
    • ๋ฐ์ดํ„ฐ ํ˜•์‹: .bpm, .jpg
  • ๋ฐ์ดํ„ฐ ๋ฒ„์ „ ๊ด€๋ฆฌ:

    • Round 1: 20240910_V0_๊ฐ„์ด ํ™˜๊ฒฝ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ (simplified-environment data collection), data versions and usage history
      • V01: raw data before preprocessing -> collected originals: 7 images
      • V02: data classification -> normal/defect split: 4/3 images
      • V03: data classification and rotation -> image augmentation, rotated by 45/90/135 degrees: 28 images

        [Figures: ground-truth marking, PCA distribution visualization, Isolation Forest outlier-identification results]

    • Round 2: 20240920_V1_ํ•˜์šฐ์ง• ๋‚ด ์ด๋ฏธ์ง€ ์ˆ˜์ง‘ (in-housing image collection), data versions and usage history
      • V01: raw data before preprocessing -> collected originals: 16 images
      • V02: data classification -> normal/defect split: 14/2 images
      • V03: data classification and rotation -> image augmentation: 64 images

        [Figures: ground-truth marking, PCA distribution visualization, Isolation Forest outlier-identification results]

    • Round 3: 20241002_V2_์„ค๋น„ ๋‚ด ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ (in-equipment data collection), data versions and usage history
      • V01: raw data before preprocessing -> images collected: 49
      • V02: data classification -> normal/defect split performed: error/normal
      • V03: data classification and rotation -> image augmentation via rotation, bringing the image count to 102

        [Figures: ground-truth marking, PCA distribution visualization, Isolation Forest outlier-identification results]

  • Data Configuration:

    • ์ด๋ฏธ์ง€ ํฌ๊ธฐ ์กฐ์ • ๋ฐ ์ •๊ทœํ™”:

      • ์ด๋ฏธ์ง€๋Š” ์ผ์ •ํ•œ ํฌ๊ธฐ(์˜ˆ: 518x518)๋กœ ๋ฆฌ์‚ฌ์ด์ฆˆ๋˜๋ฉฐ, CLIP ๋ชจ๋ธ์˜ ์ž…๋ ฅ์œผ๋กœ ์ ํ•ฉํ•˜๊ฒŒ ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค.
      • ์ •๊ทœํ™”๋ฅผ ํ†ตํ•ด ํ”ฝ์…€ ๊ฐ’์„ [0, 1] ๋ฒ”์œ„๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
    • Ground-truth marking:

      • For images with defects, the defect region is marked as a bounding box or a binary mask (see the mask sketch at the end of this section).
      • The marked data is stored in JSON or CSV format and used during model evaluation.

      [Figure: ground-truth marking example]

    • ๋ฐ์ดํ„ฐ ๋ถ„๋ฅ˜:

      • Normal: ๊ฒฐํ•จ์ด ์—†๋Š” ์ •์ƒ ์ œํ’ˆ์˜ ์ด๋ฏธ์ง€.
      • Error: ๊ฒฐํ•จ์ด ์žˆ๋Š” ์ œํ’ˆ์˜ ์ด๋ฏธ์ง€. ๊ฒฐํ•จ ์œ„์น˜์™€ ๊ด€๋ จ ์ •๋ณด๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

        [Figures: normal product images, error product images]
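
As an illustration of the ground-truth marking step above, the minimal sketch below converts one bounding-box annotation into a binary mask with NumPy. The (x_min, y_min, x_max, y_max) box format, the output file name, and the helper itself are hypothetical; the project's actual marking script is dataset_config/image_ground_truth.py.

import numpy as np
from PIL import Image

def bbox_to_binary_mask(image_size, bbox):
    # image_size is (height, width); bbox is a hypothetical (x_min, y_min, x_max, y_max).
    h, w = image_size
    mask = np.zeros((h, w), dtype=np.uint8)
    x0, y0, x1, y1 = bbox
    mask[y0:y1, x0:x1] = 255  # mark the defect region as foreground
    return mask

# Hypothetical usage: a 518x518 image with one annotated defect box
mask = bbox_to_binary_mask((518, 518), (120, 80, 240, 160))
Image.fromarray(mask).save("abnormal_20241111_002_mask.png")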

๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ฐ€์ด๋“œ

๋ณธ ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ฐ€์ด๋“œ๋Š” AnomalyDetection ๊ธฐ๋ฐ˜ ๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•ด ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ผ๋ฒจ๋งํ•˜๋Š” ๊ธฐ์ค€๊ณผ ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ช…ํ™•ํžˆ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ๋Š” ์ฃผ๋กœ ์ •์ƒ(normal) ๋ฐ์ดํ„ฐ๋ฅผ ์ค‘์‹ฌ์œผ๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ์ตœ์†Œํ•œ์˜ ๋น„์ •์ƒ(anomaly) ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. ๋ณธ ๊ฐ€์ด๋“œ๋Š” ๋ฐ์ดํ„ฐ์˜ ํ’ˆ์งˆ์„ ์œ ์ง€ํ•˜๊ณ  ๋ชจ๋ธ ํ•™์Šต ๋ฐ ํ…Œ์ŠคํŠธ๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐ ๋ชฉํ‘œ๋ฅผ ๋‘ก๋‹ˆ๋‹ค.

  • ๋ผ๋ฒจ๋ง ๋ฒ”์œ„

    1. ์ •์ƒ(normal) ๋ฐ์ดํ„ฐ:
      • ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์•ฝ 95% ์ด์ƒ์„ ์ฐจ์ง€.
      • ๋‹ค์–‘ํ•œ ํ™˜๊ฒฝ ์กฐ๊ฑด์—์„œ ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ํฌํ•จ (์กฐ๋ช…, ๊ฐ๋„, ๋ฐฐ๊ฒฝ ๋“ฑ).
      • ์ •์ƒ์ ์ธ ์ƒํƒœ์˜ ๊ธˆ์† ํ‘œ๋ฉด, ์ •๋ฐ€ํ•œ ๊ตฌ์กฐ, ๊ท ์ผํ•œ ๊ด‘ํƒ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ.
    2. ๋น„์ •์ƒ(anomaly) ๋ฐ์ดํ„ฐ:
      • ์ „์ฒด ๋ฐ์ดํ„ฐ์˜ ์•ฝ 5**% ์ดํ•˜**๋กœ ์ œํ•œ.
      • ๊ฒฐํ•จ ์œ ํ˜•:
        • Scratch: ์Šคํฌ๋ž˜์น˜.
        • Contamination: ์–ผ๋ฃฉ ๋˜๋Š” ์ด๋ฌผ์งˆ.
        • Crack: ํ‘œ๋ฉด ๊ท ์—ด.
        • ๊ฒฐํ•จ ์ด๋ฏธ์ง€ ์˜ˆ์‹œ
  • ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ๋ง ๊ธฐ์ค€

    -1. ํŒŒ์ผ ๋„ค์ด๋ฐ ๊ทœ์น™

    • ๋ฐ์ดํ„ฐ ๋ฒ„์ „๋ณ„ ํŒŒ์ผ๋ช…์€ ๋ฒ„์ „๋ณ„๋กœ ์ƒ์ดํ•จ.

    • ๊ฐ ๋ฒ„์ „์˜ ๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ ๋ฌธ์„œ ์ฐธ๊ณ 

    • ๋ฐ์ดํ„ฐ ํด๋”๋ช…์€ <์ˆ˜์ง‘๋…„์›”์ผ>_<V๋ฒ„์ „>_<๊ฐ„๋‹จํ•œ ์„ค๋ช…> ํ˜•์‹์œผ๋กœ ์ž‘์„ฑ.

    • ์˜ˆ์‹œ:20240910_V0_๊ฐ„์ด ํ™˜๊ฒฝ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘

    • 2. ๋ผ๋ฒจ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ

    ๋ผ๋ฒจ ๋ฉ”ํƒ€๋ฐ์ดํ„ฐ๋Š” csv ํ˜•์‹์œผ๋กœ ์ €์žฅํ•˜๋ฉฐ, ๊ฐ ๋ฐ์ดํ„ฐ์˜ ๋ผ๋ฒจ ๋ฐ ์„ค๋ช…์„ ํฌํ•จ.

    • ํ•„์ˆ˜ ํ•„๋“œ:
      • image_id: ์ด๋ฏธ์ง€ ํŒŒ์ผ๋ช….
      • label: ์ •์ƒ(normal) ๋˜๋Š” ๋น„์ •์ƒ(anomaly) ์—ฌ๋ถ€.
      • description: ์ƒ์„ธ ์„ค๋ช…(์˜ˆ: ๊ฒฐํ•จ ์œ ํ˜•).
  • Example:

    {
      "image_id": "normal_20241111_001.jpg",
      "label": "normal",
      "description": "Normal metal part with a smooth surface; the gloss is uniform."
    }
    {
      "image_id": "abnormal_20241111_002.jpg",
      "label": "error",
      "description": "A linear scratch was found on the surface."
    }
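
A minimal sketch of loading such label records for evaluation, assuming the records are collected into a single JSON array file (the file name labels.json is hypothetical):

import json
import pandas as pd

# Hypothetical file: a JSON array of label records like the two examples above.
with open("labels.json", encoding="utf-8") as f:
    records = json.load(f)

df = pd.DataFrame(records)            # columns: image_id, label, description
print(df["label"].value_counts())     # check the normal/error balance
errors = df[df["label"] == "error"]   # select defect records for evaluation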
    

AD-CLIP Model Architecture

AD-CLIP ๋ชจ๋ธ์€ CLIP (ViT-B-32)์„ ๋ฐฑ๋ณธ์œผ๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€์—์„œ ํŠน์ง•์„ ์ถ”์ถœํ•˜๊ณ , ๋Œ€์กฐ ํ•™์Šต์„ ํ†ตํ•ด ์ด์ƒ์„ ํƒ์ง€ํ•ฉ๋‹ˆ๋‹ค. ์ตœ์ข… ์ถœ๋ ฅ์€ ์ด๋ฏธ์ง€๊ฐ€ ๋น„์ •์ƒ์ธ์ง€ ์ •์ƒ์ธ์ง€๋ฅผ ํŒ๋ณ„ํ•˜๋Š” ์ด์ƒ ์ ์ˆ˜์™€ ๊ฐ ํด๋ž˜์Šค์˜ ํ™•๋ฅ ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

[Figure: CLIP-based anomaly detection model architecture]

  • model:
    • Input layer:
      • Input image: the model takes images of size [640, 640, 3], where 640x640 is the image width and height and 3 is the number of RGB color channels.
      • Role: this layer receives the input image and prepares the data in the format expected by the rest of the model.
    • backbone:
      • CLIP (ViT-B-32): the model uses CLIP's Vision Transformer (ViT-B-32) architecture to extract features from images. ViT-B-32 can extract the high-level characteristics needed to understand an image.
      • Filters: filter sizes [32, 64, 128, 256, 512] are used at the corresponding ViT layers, extracting the salient information at each level of the image to learn features.
    • neck:
      • Anomaly detection module: this module analyzes the image based on the features extracted by CLIP and judges whether it is anomalous. This stage carries out the key processing that separates normal from abnormal data within the image.
      • Contrastive learning: contrastive learning captures the differences between normal and abnormal images, helping the model distinguish anomalies more clearly.
    • head:
      • Anomaly detection head: the final part of the model; this layer decides whether an image is abnormal or normal (a scoring sketch follows this list).
      • outputs:
        • Anomaly score: the model outputs a score indicating whether the image is anomalous (e.g., 1 = anomaly, 0 = normal).
        • Class probabilities: the model outputs a probability for each class, which is used to judge whether a defect is present.
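
The two outputs above can be illustrated with a minimal CLIP-style scoring sketch: cosine similarity between the image embedding and the normal/anomaly text embeddings, followed by a softmax over the two classes. This is a simplified sketch of the idea, not the exact AD-CLIP head; the temperature value is an assumption.

import torch
import torch.nn.functional as F

def anomaly_score(image_feat, text_feat_normal, text_feat_anomaly, temperature=0.01):
    # Cosine similarity of the image embedding against the two text embeddings,
    # turned into [P(normal), P(anomaly)] by a temperature-scaled softmax.
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(torch.stack([text_feat_normal, text_feat_anomaly]), dim=-1)
    logits = image_feat @ text_feats.T / temperature
    probs = logits.softmax(dim=-1)
    return probs[..., 1], probs  # anomaly score, class probabilities

# Hypothetical usage with 512-dimensional CLIP embeddings
score, probs = anomaly_score(torch.randn(512), torch.randn(512), torch.randn(512))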

Optimizer and Loss Function

  • training:
    • optimizer:
      • name: AdamW # AdamW optimizer (with weight decay)
      • lr: 0.0001 # learning rate
    • loss:
      • classification_loss: 1.0 # classification loss (cross-entropy)
      • anomaly_loss: 1.0 # defect-detection loss (loss for the anomaly detection module)
      • contrastive_loss: 1.0 # contrastive-learning loss (similarity-based; the three terms are combined as in the sketch below)
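
A minimal sketch of how the three weighted terms above might be combined into the training objective (the weights mirror the 1.0 values listed; the combination itself is an assumption about the training script):

def total_loss(classification_loss, anomaly_loss, contrastive_loss,
               w_cls=1.0, w_anom=1.0, w_con=1.0):
    # Weighted sum of the three objectives configured above.
    return (w_cls * classification_loss
            + w_anom * anomaly_loss
            + w_con * contrastive_loss)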

Metrics

  • metrics:
    • Precision
    • Recall
    • mAP # mean average precision
    • F1-Score # harmonic mean of precision and recall (see the sketch below)
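
These metrics can be computed with scikit-learn, which is already in the dependency list below; a minimal sketch with toy labels, where 1 = anomaly:

from sklearn.metrics import precision_score, recall_score, f1_score, average_precision_score

y_true = [0, 0, 1, 1, 0, 1]                # ground-truth labels (1 = anomaly)
y_pred = [0, 0, 1, 0, 0, 1]                # thresholded model predictions
y_score = [0.1, 0.2, 0.9, 0.4, 0.3, 0.8]   # raw anomaly scores

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-Score: ", f1_score(y_true, y_pred))
print("AP:       ", average_precision_score(y_true, y_score))  # per-class AP; mAP averages these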

Training Parameters

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ •

  • Learning Rate: 0.001.
  • Batch Size: 8.
  • Epochs: 200.

Pre-trained CLIP model

Model Download

  • ViT-B/32 download
  • ViT-B/16 download
  • ViT-L/14 download
  • ViT-L/14@336px download
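
Any of these backbones can be loaded with OpenAI's clip package, as in the minimal sketch below; AnomalyCLIP bundles its own loading code in AnomalyCLIP_lib, so treat this as illustrative only.

import torch
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # weights are downloaded on first use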

Evaluation Parameters

  • F1-score: 90% or higher.

Training Performance and Test Results

  • Training performance results and graphs:

    [Figures: training-process example; round 1, round 2, and round 3 training performance]

  • Test results table: [provided as an image; not reproduced here]

  • Test results:

    [Figures: anomaly product, normal product]

์„ค์น˜ ๋ฐ ์‹คํ–‰ ๊ฐ€์ด๋ผ์ธ

์ด ๋ชจ๋ธ์„ ์‹คํ–‰ํ•˜๋ ค๋ฉด Python๊ณผ ํ•จ๊ป˜ ๋‹ค์Œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค:

  • ftfy==6.2.0: fixes text normalization and encoding problems.
  • matplotlib==3.9.0: data visualization and plotting.
  • numpy==1.24.3: core library for numerical computation.
  • opencv_python==4.9.0.80: image and video processing.
  • pandas==2.2.2: data analysis and manipulation.
  • Pillow==10.3.0: image file handling and conversion.
  • PyQt5==5.15.10: framework for GUI application development.
  • PyQt5_sip==12.13.0: interface between PyQt5 and Python.
  • regex==2024.5.15: regular-expression processing.
  • scikit_learn==1.2.2: machine learning and data analysis.
  • scipy==1.9.1: scientific and technical computing.
  • setuptools==59.5.0: packaging and installing Python packages.
  • scikit-image: image processing and analysis.
  • tabulate==0.9.0: prints data in table form.
  • thop==0.1.1.post2209072238: counts the FLOPs of PyTorch models.
  • timm==0.6.13: provides a wide range of state-of-the-art image classification models.
  • torch==2.0.0: the PyTorch deep-learning framework.
  • torchvision==0.15.1: PyTorch extension library for computer vision tasks.
  • tqdm==4.65.0: displays progress visually.
  • pyautogui: GUI automation.

  • Install Python libraries

    pip install -r requirements.txt
    

๋ชจ๋ธ ์‹คํ–‰ ๋‹จ๊ณ„:

โœ…Dataset configuration

  • Dataset configuration as example below
โ”œโ”€โ”€ data/
โ”‚   โ”œโ”€โ”€ COMP_1/
โ”‚   โ”‚   โ”œโ”€โ”€ product_1/
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€grouth_truth
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_1
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_2
โ”‚   โ”‚   โ”‚   โ”‚
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€test/
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€good
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_1
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_2
โ”‚   โ”‚   โ”‚   โ”‚
โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€train/
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€good
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_1
โ”‚   โ”‚   โ”‚   โ”‚   โ”œโ”€โ”€anomaly_2
โ”‚   โ”‚   โ”‚   โ”‚ 
โ”‚   โ”‚   โ”œโ”€โ”€ product_2/
โ”‚   โ”‚   โ”‚   โ”‚
โ”‚   โ”‚   โ”œโ”€โ”€ meta.json
โ”‚   โ”‚   โ”‚
โ”‚   โ”œโ”€โ”€ COMP_2/
โ”‚   โ”‚ 
  • Generate the JSON files storing all of the above dataset information ( -> meta_train.json, meta_test.json)
cd dataset_config
python dataset_get_json.py
  • Create all ground_truth annotations (anomaly masks only) by hand
cd dataset_config
python image_ground_truth.py
  • Dataset configuration for train and test
cd training_libs
python dataset.py

โ†’ _ init _ ๋ฉ”์„œ๋“œ๋Š” ๋ฐ์ดํ„ฐ์…‹์˜ ๋ฃจํŠธ ๋””๋ ‰ํ† ๋ฆฌ, ๋ณ€ํ™˜ ํ•จ์ˆ˜, ๋ฐ์ดํ„ฐ์…‹ ์ด๋ฆ„, ๋ชจ๋“œ๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์Œ
โ†’ ๋ฉ”ํƒ€ ์ •๋ณด๋ฅผ ๋‹ด์€ JSON ํŒŒ์ผ (meta_train.json)์„ ์ฝ์–ด์™€ ํด๋ž˜์Šค ์ด๋ฆ„ ๋ชฉ๋ก๊ณผ ๋ชจ๋“  ๋ฐ์ดํ„ฐ ํ•ญ๋ชฉ์„ ๋ฆฌ์ŠคํŠธ์— ์ €์žฅ
โ†’ generate_class_info ํ•จ์ˆ˜๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ํด๋ž˜์Šค ์ •๋ณด๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ํด๋ž˜์Šค ์ด๋ฆ„์„ ํด๋ž˜์Šค ID์— ๋งคํ•‘
โ†’ _ len _ ๋ฉ”์„œ๋“œ๋Š” ๋ฐ์ดํ„ฐ์…‹์˜ ์ƒ˜ํ”Œ ์ˆ˜๋ฅผ ๋ฐ˜ํ™˜
โ†’ _ getitem _ ๋ฉ”์„œ๋“œ๋Š” ์ฃผ์–ด์ง„ ์ธ๋ฑ์Šค์˜ ์ƒ˜ํ”Œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ˜ํ™˜
โ†’ ์ด๋ฏธ์ง€ ๊ฒฝ๋กœ๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€๋ฅผ ์ฝ๊ณ , ์ด์ƒ ์—ฌ๋ถ€์— ๋”ฐ๋ผ ๋งˆ์Šคํฌ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑ
โ†’ ํ•„์š”์‹œ ์ด๋ฏธ์ง€์™€ ๋งˆ์Šคํฌ์— ๋ณ€ํ™˜ ํ•จ์ˆ˜๋ฅผ ์ ์šฉ
โ†’ ์ด๋ฏธ์ง€, ๋งˆ์Šคํฌ, ํด๋ž˜์Šค ์ด๋ฆ„, ์ด์ƒ ์—ฌ๋ถ€, ์ด๋ฏธ์ง€ ๊ฒฝ๋กœ, ํด๋ž˜์Šค ID๋ฅผ ํฌํ•จํ•œ ๋”•์…”๋„ˆ๋ฆฌ๋ฅผ ๋ฐ˜ํ™˜

โœ… Image pre-processing (transformation) for train and test

  training_libs/utils.py
  AnomalyCLIP_lib/transform.py
  • Data Processing Techniques:
    • normalization: description: "standardize image pixel values with a mean and standard deviation" method: "'Normalize' from 'torchvision.transforms'"
    • max_resize: description: "resize to a maximum image size, preserving the aspect ratio and adding padding" method: "Custom 'ResizeMaxSize' class"
    • random_resized_crop: description: "randomly crop and resize images during training to add variation" method: "'RandomResizedCrop' from 'torchvision.transforms'"
    • resize: description: "resize the image to a fixed size matching the model input" method: "'Resize' with BICUBIC interpolation"
    • center_crop: description: "crop the center of the image to the specified size" method: "'CenterCrop'"
    • to_tensor: description: "convert the image to a PyTorch tensor" method: "'ToTensor'"
    • augmentation (optional): description: "apply various random transforms for data augmentation, configurable via 'AugmentationCfg'" method: "Uses 'timm' library if specified"
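
A minimal torchvision sketch of the pipelines implied above, using the standard OpenAI CLIP normalization statistics (the exact values and composition live in AnomalyCLIP_lib/transform.py; the 518-pixel size follows the --image_size default used later):

from torchvision import transforms
from torchvision.transforms import InterpolationMode

IMAGE_SIZE = 518
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)   # OpenAI CLIP normalization statistics
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

test_transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE, interpolation=InterpolationMode.BICUBIC),
    transforms.CenterCrop(IMAGE_SIZE),
    transforms.ToTensor(),                         # scales pixel values to [0, 1]
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(IMAGE_SIZE, interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(CLIP_MEAN, CLIP_STD),
])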

โœ… Prompt generating

  training_lib/prompt_ensemble.py

๐Ÿ‘ Prompts Built in the Code

  1. Normal Prompt: '["{ }"]'
    โ†’ Normal Prompt Example: "object"
  2. Anomaly Prompt: '["damaged { }"]'
    โ†’ Anomaly Prompt Example: "damaged object"

๐Ÿ‘ Construction Process

  1. 'prompts_pos (Normal)': Combines the class name with the normal template
  2. 'prompts_neg (Anomaly)': Combines the class name with the anomaly template
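
A minimal sketch of that construction, with the templates exactly as listed above (the real ensemble logic lives in training_lib/prompt_ensemble.py):

NORMAL_TEMPLATES = ["{}"]
ANOMALY_TEMPLATES = ["damaged {}"]

def build_prompts(class_name="object"):
    prompts_pos = [t.format(class_name) for t in NORMAL_TEMPLATES]   # normal prompts
    prompts_neg = [t.format(class_name) for t in ANOMALY_TEMPLATES]  # anomaly prompts
    return prompts_pos, prompts_neg

print(build_prompts())  # (['object'], ['damaged object'])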

โœ… Initial setting for training

  • Define the path to the training dataset and model checkpoint saving
parser.add_argument("--train_data_path", type=str, default="./data/", help="train dataset path")
parser.add_argument("--dataset", type=str, default='smoke_cloud', help="train dataset name")
parser.add_argument("--save_path", type=str, default='./checkpoint/', help='path to save results')

โœ… Hyper parameters setting

  • Set the depth parameter: depth of the embedding learned during prompt training. This affects the model's ability to learn complex features from the data
parser.add_argument("--depth", type=int, default=9, help="image size")
  • Define the size of input images used for training (pixel)
parser.add_argument("--image_size", type=int, default=518, help="image size")
  • Setting parameters for training
parser.add_argument("--epoch", type=int, default=500, help="epochs")
parser.add_argument("--learning_rate", type=float, default=0.0001, help="learning rate")
parser.add_argument("--batch_size", type=int, default=8, help="batch size")
  • Size/depth parameter for the DPAM (Deep Prompt Attention Mechanism)
parser.add_argument("--dpam", type=int, default=20, help="dpam size")

1. ViT-B/32 and ViT-B/16: --dpam should be around 10-13
2. ViT-L/14 and ViT-L/14@336px: --dpam should be around 20-24
โ†’ DPAM is used to refine and enhance specific layers of a model, particularly in Vision Transformers (ViT).
โ†’ Helps the model focus on important features within each layer through an attention mechanism
โ†’ Layers: DPAM is applied across multiple layers, allowing deeper and more detailed feature extraction
โ†’ Number of layers DPAM influences is adjustable (--dpam), controlling how much of the model is fine-tuned.
โ†’ If you want to refine the entire model, you can set --dpam to the number of layers in the model (e.g., 12 for ViT-B and 24 for ViT-L).
โ†’  If you want to focus only on the final layers (where the model usually learns complex features), you can choose fewer DPAM layers.

โœ… Test process

๐Ÿ‘ Load pre-trained and Fine tuned (Checkpoints) models

  1. Pre-trained models (./pre-trained model/):
โ†’ Contains the pre-trained model (ViT-B, ViT-L,....)
โ†’ Used as the starting point for training the CLIP model
โ†’ Pre-trained model helps speed up and improve training by leveraging previously learned features
  2. Fine-tuned models (./checkpoint/):
โ†’ "epoch_N.pth" files in this folder store the model's states during the fine-tuning process.
โ†’ Each ".pth" file represents a version of the model fine-tuned from the pre-trained model
โ†’ These checkpoints can be used to resume fine-tuning, evaluate the model at different stages, or select the best-performing version

๋ชจ๋ธ ๊ณต๊ฒฉ ์ทจ์•ฝ์  ๋ถ„์„

This section was written to systematically establish a vulnerability analysis of the AnomalyCLIP model and defense measures against adversarial attacks. To secure the model's reliability and stability and to preserve data integrity, it covers the data-level and model-level defense strategies that were implemented, together with the results of evaluating their performance.

1. ์ทจ์•ฝ์  ๋ถ„์„

  • Adversarial attack scenarios

  1. Adversarial Examples:
    • Description: small perturbations added to the input data distort the model's predictions.
    • Example: a normal image is induced to be predicted as a defect image.
  2. Data Poisoning:
    • Description: malicious data injected into the training set corrupts model training.
    • Example: abnormal data is trained as if it were normal data.
  3. Evasion Attacks:
    • Description: the model's classification result is manipulated at inference time.
    • Example: defect data is induced to be predicted as normal.
  • Impact on the model and dataset

    • Performance degradation: model accuracy drops when adversarial samples are supplied.
    • Integrity damage: data tampering causes the trained model to lose reliability in real environments.
    • Potential for malicious exploitation: faulty model decisions increase the risk of production quality-control failures.

2. ๋Œ€์‘ ๋ฐฉ์•ˆ

  • ** ๋ฐ์ดํ„ฐ ์ˆ˜์ค€ ๋ฐฉ์–ด ๋Œ€์ฑ…**

    1. ๋ฐ์ดํ„ฐ ์ •์ œ:
      • ํ๋ฆฟํ•˜๊ฑฐ๋‚˜ ์ž˜๋ฆฐ ์ด๋ฏธ์ง€ ์ œ๊ฑฐ.
      • ๋ฐ์ดํ„ฐ ๋…ธ์ด์ฆˆ ์ œ๊ฑฐ ๋ฐ ๊ฒฐํ•จ ๋ณต๊ตฌ.
      • ๊ฒฐ๊ณผ: ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ ๊ฐ•ํ™”๋กœ ์ ๋Œ€์  ๋…ธ์ด์ฆˆ ํšจ๊ณผ ๊ฐ์†Œ.
    2. ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•(Data Augmentation):
      • ๋žœ๋ค ํšŒ์ „, ํฌ๊ธฐ ์กฐ์ •, ๋ฐ๊ธฐ ๋ฐ ๋Œ€๋น„ ์กฐ์ •.
      • Gaussian Noise ๋ฐ Salt-and-Pepper Noise ์ถ”๊ฐ€.
      • ๊ฒฐ๊ณผ: ๋ฐ์ดํ„ฐ ๋‹ค์–‘์„ฑ ํ™•๋ณด ๋ฐ ๋ชจ๋ธ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ ๊ฐ•ํ™”.
    3. ๋ฐ์ดํ„ฐ ๋ฌด๊ฒฐ์„ฑ ๊ฒ€์ฆ:
      • ๊ฐ ๋ฐ์ดํ„ฐ ํ•ด์‹œ๊ฐ’(MD5) ์ €์žฅ ๋ฐ ์œ„๋ณ€์กฐ ์—ฌ๋ถ€ ํ™•์ธ.
      • ๊ฒฐ๊ณผ: ๋ฐ์ดํ„ฐ์…‹ ์‹ ๋ขฐ์„ฑ ๋ฐ ๋ฌด๊ฒฐ์„ฑ ๋ณด์žฅ.
  • ๋ชจ๋ธ ์ˆ˜์ค€ ๋ฐฉ์–ด ๋Œ€์ฑ…

    1. Adversarial Training:
      • FGSM ๊ธฐ๋ฐ˜์˜ ์ ๋Œ€์  ์ƒ˜ํ”Œ์„ ํ•™์Šต ๋ฐ์ดํ„ฐ์— ํฌํ•จ.
      • ๊ฒฐ๊ณผ: ์ ๋Œ€์  ์ƒ˜ํ”Œ์—์„œ๋„ ํ‰๊ท  ์ •ํ™•๋„ 5% ํ–ฅ์ƒ.
    2. Gradient Masking:
      • ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ์ˆจ๊ฒจ ๋ชจ๋ธ์ด ์ ๋Œ€์  ๊ณต๊ฒฉ์— ๋…ธ์ถœ๋˜์ง€ ์•Š๋„๋ก ๋ฐฉ์–ด.
    3. Temperature Scaling:
      • ๋ชจ๋ธ์˜ ์˜ˆ์ธก ํ™•๋ฅ ์„ ์กฐ์ •ํ•˜์—ฌ ์ ๋Œ€์  ์ƒ˜ํ”Œ ๋ฏผ๊ฐ๋„ ์™„ํ™”.
  • ์‹œ์Šคํ…œ ์ˆ˜์ค€ ๋ฐฉ์–ด ๋Œ€์ฑ…

    1. ์‹ค์‹œ๊ฐ„ ํƒ์ง€ ๋ฐ ๋Œ€์‘:
      • ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ์ด์ƒ ํŒจํ„ด์„ ์‹ค์‹œ๊ฐ„์œผ๋กœ ํƒ์ง€ํ•˜๋Š” ์‹œ์Šคํ…œ ๊ตฌ์ถ•.
      • ๊ฒฐ๊ณผ: ์ ๋Œ€์  ๊ณต๊ฒฉ ๋ฐœ์ƒ ์‹œ ์ฆ‰๊ฐ์ ์ธ ๊ฒฝ๊ณ  ๋ฐ ๋Œ€์‘ ๊ฐ€๋Šฅ.
    2. ์ž๋™ํ™”๋œ ๋ฐฉ์–ด ๋„๊ตฌ:
      • Adversarial Examples ์ƒ์„ฑ ๋ฐ ๋ฐฉ์–ด ํ…Œ์ŠคํŠธ ์ž๋™ํ™”.
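
A minimal FGSM sketch of how the adversarial samples referenced under "Adversarial Training" above can be generated; this is the standard formulation x_adv = x + epsilon * sign(grad_x L), not the project's exact attack script.

import torch

def fgsm_example(model, loss_fn, images, labels, epsilon=0.01):
    # Standard FGSM: perturb the input along the sign of the input gradient.
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range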

3. Experimental Results

  • Evaluation data

    • Dataset composition:
      • Normal data: 110 samples
      • Defect data: 10 samples
      • Adversarial data (FGSM attack): 100 samples
  • Key performance metrics

    Metric         | Baseline data | Adversarial data | Change
    ---------------|---------------|------------------|-------
    Accuracy       | 98%           | 92%              | -6%
    F1 Score       | 0.935         | 0.91             | -2.5%
    False Positive | 2%            | 5%               | +3%
    False Negative | 3%            | 7%               | +4%

4. Future Plans

  1. Testing diverse attack techniques:
    • Apply and evaluate additional attack methods such as PGD and DeepFool.
  2. Model improvement:
    • Strengthen robustness through contrastive learning and ensemble training.
  3. Real-time defense system:
    • Analyze the model's real-time prediction data to detect and block adversarial inputs.

References

  • AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection [github]