lenamerkli committed · Commit 813b26c · verified · 1 Parent(s): c2426a1

Add readme

Files changed (2):
  1. README.md +77 -0
  2. requirements.txt +146 -0
README.md ADDED
---
language:
- de
pipeline_tag: image-to-text
---
# Ingredient Scanner
## Abstract

With recent advances in computer vision and optical character recognition, and by using a convolutional neural network to crop the product out of a picture, it has become possible to reliably extract ingredient lists from the back of a product with the Anthropic API. Open-weight or even on-device optical character recognition still lacks the quality needed for a production environment, although progress is promising. The Anthropic API is currently not feasible either, due to its high cost of 1 Swiss Franc per 100 pictures.

The training code and data are available on [GitHub](https://github.com/lenamerkli/ingredient-scanner/). This repository only contains an inference example and the [report](https://huggingface.co/lenamerkli/ingredient-scanner/blob/main/ingredient-scanner.pdf).

This is an entry for the [2024 Swiss AI competition](https://www.ki-wettbewerb.ch/).

## Table of Contents

0. [Abstract](#abstract)
1. [Report](#report)
2. [Model Details](#model-details)
3. [Usage](#usage)
4. [Citation](#citation)

## Report
Read the full report [here](https://huggingface.co/lenamerkli/ingredient-scanner/blob/main/ingredient-scanner.pdf).

## Model Details
This repository consists of two models: a vision model and a large language model.

### Vision Model
A custom convolutional neural network based on [ResNet18](https://pytorch.org/hub/pytorch_vision_resnet/). It detects the four corner points and the upper and lower limits of a product.

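The report describes the exact architecture. As a rough illustration only, a ResNet18 backbone can be turned into such a regressor by replacing its classification layer with a small regression head; the class name, head size and output normalization below are assumptions, not the released model:

```python
# Hedged sketch of a ResNet18-based regressor for the four corner points plus
# upper/lower limits. The actual head, normalization and output layout used by
# the released checkpoint may differ; see the report for details.
import torch
import torch.nn as nn
from torchvision import models


class ProductCornerNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_outputs: int = 10):
        # 4 corner points (x, y) = 8 values, plus upper and lower limit = 2 values
        super().__init__()
        self.backbone = models.resnet18(weights=None)
        in_features = self.backbone.fc.in_features
        # replace the ImageNet classification head with a small regression head
        self.backbone.fc = nn.Sequential(
            nn.Linear(in_features, 256),
            nn.ReLU(),
            nn.Linear(256, num_outputs),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # outputs are assumed to be normalized to [0, 1] relative to the image size
        return self.backbone(x)


if __name__ == "__main__":
    model = ProductCornerNet()
    dummy = torch.randn(1, 3, 224, 224)
    print(model(dummy).shape)  # torch.Size([1, 10])
```
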
### Language Model
Converts the text produced by the optical character recognition engine, which sits between the two models, to JSON. It is fine-tuned from [unsloth/Qwen2-0.5B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2-0.5B-Instruct-bnb-4bit).

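As an illustration of this second stage, the sketch below shows how a fine-tuned Qwen2-0.5B-Instruct checkpoint could be prompted with `transformers` to turn raw OCR text into JSON. The checkpoint path, prompt wording and JSON schema are assumptions; `main.py` contains the actual inference code:

```python
# Hedged sketch: prompt the fine-tuned language model to convert OCR text to JSON.
# MODEL_PATH, the prompt and the expected output schema are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "lenamerkli/ingredient-scanner"  # assumed location of the fine-tuned weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

ocr_text = "Zutaten: Weizenmehl, Zucker, Kakaobutter ..."  # example raw OCR output
messages = [
    {"role": "user", "content": f"Extract the ingredient list as JSON:\n{ocr_text}"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```
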
## Usage
Clone the repository and install the dependencies on any Debian-based system:
```bash
git clone https://huggingface.co/lenamerkli/ingredient-scanner
cd ingredient-scanner
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
```
Note: not all packages are needed for inference, as both training and inference requirements are listed.

Select the OCR engine in `main.py` by uncommenting one of the lines 20 to 22:
```python
# ENGINE: list[str] = ['easyocr']
# ENGINE: list[str] = ['anthropic', 'claude-3-5-sonnet-20240620']
# ENGINE: list[str] = ['llama_cpp/v2/vision', 'qwen-vl-next_b2583']
```
Note: Qwen-VL-Next is not an official Qwen model. The name is only used to protect business secrets of a private model.

Run the inference script:
```bash
python3 main.py
```

You will be asked to enter the file path to a PNG image.

### Anthropic API

If you want to use the Anthropic API, create a `.env` file with the following content:
```
ANTHROPIC_API_KEY=YOUR_API_KEY
```

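For reference, here is a minimal sketch of how the key could be loaded with `python-dotenv` and used for a vision request, assuming the official `anthropic` Python SDK is installed (it is not listed in `requirements.txt`). The prompt wording and file path are assumptions and do not come from `main.py`:

```python
# Hedged sketch: load the API key from .env and send a PNG to the Anthropic API.
# The prompt text and image path are placeholders; the repository's own request
# logic lives in main.py.
import base64
import os

import anthropic
from dotenv import load_dotenv

load_dotenv()  # reads ANTHROPIC_API_KEY from the .env file

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

with open("product.png", "rb") as f:  # example path
    image_b64 = base64.b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Transcribe the ingredient list on this label."},
        ],
    }],
)
print(message.content[0].text)
```
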
## Citation
Here is how to cite this paper in BibTeX format:
```bibtex
@misc{merkli2024ingriedient-scanner,
  title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
  author={Lena Merkli and Sonja Merkli},
  date={2024-07-16},
  url={https://huggingface.co/lenamerkli/ingredient-scanner},
}
```
requirements.txt ADDED
accelerate==0.32.1
aiohttp==3.9.5
aiosignal==1.3.1
astroid==3.2.2
asttokens==2.4.1
attrs==23.2.0
bitsandbytes==0.43.1
blinker==1.8.2
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.1
cycler==0.12.1
datasets==2.20.0
dill==0.3.8
diskcache==5.6.3
docstring_parser==0.16
docutils==0.21.2
easyocr==1.7.1
einops==0.8.0
ffmpeg-python==0.2.0
filelock==3.13.1
Flask==3.0.3
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2024.2.0
future==1.0.0
graphviz==0.20.3
h11==0.14.0
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.7
imageio==2.34.1
intel-openmp==2021.4.0
isort==5.13.2
itsdangerous==2.2.0
jedi==0.19.1
Jinja2==3.1.3
kiwisolver==1.4.5
lazy_loader==0.4
llama_cpp_python==0.2.82
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mccabe==0.7.0
mdurl==0.1.2
mkl==2021.4.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy==1.10.0
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu12==12.1.105
nvidia-pyindex==1.0.9
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
optimum==1.20.0
outcome==1.3.0.post0
packaging==24.1
pandas==2.2.2
parso==0.8.4
peft==0.11.1
pillow==10.2.0
pip==24.1
platformdirs==4.2.2
protobuf==5.27.1
psutil==6.0.0
pyarrow==16.1.0
pyarrow-hotfix==0.6
pyclipper==1.3.0.post5
pycparser==2.22
pylint==3.2.2
pyparsing==3.1.2
pyreadline3==3.4.1
pyserial==3.5
PySocks==1.7.1
python-bidi==0.4.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
Pygments==2.18.0
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.13.1
selenium==4.22.0
Send2Trash==1.8.3
sentencepiece==0.2.0
setuptools==66.1.1
shapely==2.0.4
shtab==1.7.1
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
sympy==1.12
tbb==2021.13.0
thonny==4.1.4
tifffile==2024.6.18
tiktoken==0.7.0
timm==1.0.7
tk==0.1.0
tokenizers==0.19.1
tomlkit==0.12.5
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121
torchviz==0.0.2
tqdm==4.66.4
transformers==4.42.3
transformers-stream-generator==0.0.5
trio==0.25.1
trio-websocket==0.11.1
triton==2.3.0
trl==0.8.6
typing_extensions==4.9.0
tyro==0.8.5
tzdata==2024.1
unsloth @ git+https://github.com/unslothai/unsloth.git@5ab565fb2c811d0b85d68dadd2ac1b32dee05e8b
urllib3==2.2.2
websocket-client==1.8.0
Werkzeug==3.0.3
wheel==0.43.0
wsproto==1.2.0
xformers==0.0.26.post1
xxhash==3.4.1
yarl==1.9.4