lenamerkli committed · Commit 813b26c · verified · 1 Parent(s): c2426a1

Add readme

Files changed (2):
  1. README.md +77 -0
  2. requirements.txt +146 -0
README.md ADDED
---
language:
- de
pipeline_tag: image-to-text
---
# Ingredient Scanner
## Abstract

With recent advances in computer vision and optical character recognition, and by using a convolutional neural network to crop the product out of a picture, it has become possible to reliably extract ingredient lists from the back of a product with the Anthropic API. Open-weight or even on-device optical character recognition still lacks the quality needed for a production environment, although progress is promising. The Anthropic API is currently not feasible either, due to its high cost of 1 Swiss Franc per 100 pictures.

The training code and data are available on [GitHub](https://github.com/lenamerkli/ingredient-scanner/). This repository only contains an inference example and the [report](https://huggingface.co/lenamerkli/ingredient-scanner/blob/main/ingredient-scanner.pdf).

This is an entry for the [2024 Swiss AI competition](https://www.ki-wettbewerb.ch/).

## Table of Contents

0. [Abstract](#abstract)
1. [Report](#report)
2. [Model Details](#model-details)
3. [Usage](#usage)
4. [Citation](#citation)

## Report
Read the full report [here](https://huggingface.co/lenamerkli/ingredient-scanner/blob/main/ingredient-scanner.pdf).

## Model Details
This repository consists of two models: a vision model and a large language model.

### Vision Model
A custom convolutional neural network based on [ResNet18](https://pytorch.org/hub/pytorch_vision_resnet/). It detects the four corner points and the upper and lower limits of a product.

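The report describes the exact architecture. As a rough illustration only, a ResNet18 backbone can be turned into such a regressor by replacing its classification layer with a small regression head; the class name, head size and output normalization below are assumptions, not the released model:

```python
# Hedged sketch of a ResNet18-based regressor for the four corner points plus
# upper/lower limits. The actual head, normalization and output layout used by
# the released checkpoint may differ; see the report for details.
import torch
import torch.nn as nn
from torchvision import models


class ProductCornerNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, num_outputs: int = 10):
        # 4 corner points (x, y) = 8 values, plus upper and lower limit = 2 values
        super().__init__()
        self.backbone = models.resnet18(weights=None)
        in_features = self.backbone.fc.in_features
        # replace the ImageNet classification head with a small regression head
        self.backbone.fc = nn.Sequential(
            nn.Linear(in_features, 256),
            nn.ReLU(),
            nn.Linear(256, num_outputs),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # outputs are assumed to be normalized to [0, 1] relative to the image size
        return self.backbone(x)


if __name__ == "__main__":
    model = ProductCornerNet()
    dummy = torch.randn(1, 3, 224, 224)
    print(model(dummy).shape)  # torch.Size([1, 10])
```
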
### Language Model
Converts the text produced by the optical character recognition engine, which sits between the two models, to JSON. It is fine-tuned from [unsloth/Qwen2-0.5B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2-0.5B-Instruct-bnb-4bit).

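As an illustration of this second stage, the sketch below shows how a fine-tuned Qwen2-0.5B-Instruct checkpoint could be prompted with `transformers` to turn raw OCR text into JSON. The checkpoint path, prompt wording and JSON schema are assumptions; `main.py` contains the actual inference code:

```python
# Hedged sketch: prompt the fine-tuned language model to convert OCR text to JSON.
# MODEL_PATH, the prompt and the expected output schema are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "lenamerkli/ingredient-scanner"  # assumed location of the fine-tuned weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

ocr_text = "Zutaten: Weizenmehl, Zucker, Kakaobutter ..."  # example raw OCR output
messages = [
    {"role": "user", "content": f"Extract the ingredient list as JSON:\n{ocr_text}"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
```
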
## Usage
Clone the repository and install the dependencies on any Debian-based system:
```bash
git clone https://huggingface.co/lenamerkli/ingredient-scanner
cd ingredient-scanner
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
```
Note: not all packages are needed for inference, as both training and inference requirements are listed.

Select the OCR engine in `main.py` by uncommenting one of the lines 20 to 22:
```python
# ENGINE: list[str] = ['easyocr']
# ENGINE: list[str] = ['anthropic', 'claude-3-5-sonnet-20240620']
# ENGINE: list[str] = ['llama_cpp/v2/vision', 'qwen-vl-next_b2583']
```
Note: Qwen-VL-Next is not an official Qwen model. The name is only used to protect business secrets of a private model.

Run the inference script:
```bash
python3 main.py
```

You will be asked to enter the file path to a PNG image.

### Anthropic API

If you want to use the Anthropic API, create a `.env` file with the following content:
```
ANTHROPIC_API_KEY=YOUR_API_KEY
```

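For reference, here is a minimal sketch of how the key could be loaded with `python-dotenv` and used for a vision request, assuming the official `anthropic` Python SDK is installed (it is not listed in `requirements.txt`). The prompt wording and file path are assumptions and do not come from `main.py`:

```python
# Hedged sketch: load the API key from .env and send a PNG to the Anthropic API.
# The prompt text and image path are placeholders; the repository's own request
# logic lives in main.py.
import base64
import os

import anthropic
from dotenv import load_dotenv

load_dotenv()  # reads ANTHROPIC_API_KEY from the .env file

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

with open("product.png", "rb") as f:  # example path
    image_b64 = base64.b64encode(f.read()).decode()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Transcribe the ingredient list on this label."},
        ],
    }],
)
print(message.content[0].text)
```
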
## Citation
Here is how to cite this paper in BibTeX format:
```bibtex
@misc{merkli2024ingriedient-scanner,
  title={Ingredient Scanner: Automating Reading of Ingredient Labels with Computer Vision},
  author={Lena Merkli and Sonja Merkli},
  date={2024-07-16},
  url={https://huggingface.co/lenamerkli/ingredient-scanner},
}
```
requirements.txt ADDED
accelerate==0.32.1
aiohttp==3.9.5
aiosignal==1.3.1
astroid==3.2.2
asttokens==2.4.1
attrs==23.2.0
bitsandbytes==0.43.1
blinker==1.8.2
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.2.1
cycler==0.12.1
datasets==2.20.0
dill==0.3.8
diskcache==5.6.3
docstring_parser==0.16
docutils==0.21.2
easyocr==1.7.1
einops==0.8.0
ffmpeg-python==0.2.0
filelock==3.13.1
Flask==3.0.3
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2024.2.0
future==1.0.0
graphviz==0.20.3
h11==0.14.0
huggingface-hub==0.23.4
humanfriendly==10.0
idna==3.7
imageio==2.34.1
intel-openmp==2021.4.0
isort==5.13.2
itsdangerous==2.2.0
jedi==0.19.1
Jinja2==3.1.3
kiwisolver==1.4.5
lazy_loader==0.4
llama_cpp_python==0.2.82
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mccabe==0.7.0
mdurl==0.1.2
mkl==2021.4.0
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy==1.10.0
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu12==12.1.105
nvidia-pyindex==1.0.9
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
optimum==1.20.0
outcome==1.3.0.post0
packaging==24.1
pandas==2.2.2
parso==0.8.4
peft==0.11.1
pillow==10.2.0
pip==24.1
platformdirs==4.2.2
protobuf==5.27.1
psutil==6.0.0
pyarrow==16.1.0
pyarrow-hotfix==0.6
pyclipper==1.3.0.post5
pycparser==2.22
pylint==3.2.2
pyparsing==3.1.2
pyreadline3==3.4.1
pyserial==3.5
PySocks==1.7.1
python-bidi==0.4.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
Pygments==2.18.0
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
scikit-image==0.24.0
scipy==1.13.1
selenium==4.22.0
Send2Trash==1.8.3
sentencepiece==0.2.0
setuptools==66.1.1
shapely==2.0.4
shtab==1.7.1
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
sympy==1.12
tbb==2021.13.0
thonny==4.1.4
tifffile==2024.6.18
tiktoken==0.7.0
timm==1.0.7
tk==0.1.0
tokenizers==0.19.1
tomlkit==0.12.5
torch==2.3.0+cu121
torchaudio==2.3.0+cu121
torchvision==0.18.0+cu121
torchviz==0.0.2
tqdm==4.66.4
transformers==4.42.3
transformers-stream-generator==0.0.5
trio==0.25.1
trio-websocket==0.11.1
triton==2.3.0
trl==0.8.6
typing_extensions==4.9.0
tyro==0.8.5
tzdata==2024.1
unsloth @ git+https://github.com/unslothai/unsloth.git@5ab565fb2c811d0b85d68dadd2ac1b32dee05e8b
urllib3==2.2.2
websocket-client==1.8.0
Werkzeug==3.0.3
wheel==0.43.0
wsproto==1.2.0
xformers==0.0.26.post1
xxhash==3.4.1
yarl==1.9.4