---
title: Amazon E-commerce Visual Assistant
emoji: 🛍️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.28.0"
app_file: amazon_app.py
pinned: false
---

# Amazon E-commerce Visual Assistant

A multimodal AI assistant that uses the Amazon Product Dataset 2020 to support product search and recommendations through natural-language and image-based queries.

## Project Overview

This conversational AI system combines a language model (Mistral-7B) with a vision-language model (FashionCLIP) to support e-commerce customer service with context-aware answers to product-related queries.

## Project Structure

- `amazon_app.py`: Main Streamlit application
- `model.py`: Core AI model implementations
- `Vision_AI.ipynb`: Notebook covering exploratory data analysis (EDA), the embedding model, and the LLM
- `requirements.txt`: Project dependencies

## Setup and Installation

1. Clone the repository:
```bash
git clone https://github.com/wisdom196473/amazon-multimodal-product-assistant.git
cd amazon-multimodal-product-assistant
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
streamlit run amazon_app.py
```

## Technical Architecture

### Data Processing & Storage
- Standardized text fields and normalized numeric attributes
- Enhanced metadata indices for categories, price ranges, keywords, brands
- Validated image quality and managed duplicates
- Structured data storage in Parquet format
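
The cleaning steps above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the field names (`id`, `title`, `brand`, `price`), the price-bucket edges, and the helper names are all assumptions chosen for illustration.

```python
import re


def normalize_text(value):
    """Collapse whitespace, trim, and lowercase a text field (None becomes '')."""
    return re.sub(r"\s+", " ", value or "").strip().lower()


def parse_price(value):
    """Parse a price string such as '$1,299.00' into a float, or None if unparseable."""
    if value is None:
        return None
    cleaned = re.sub(r"[^0-9.]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None


def price_bucket(price, edges=(25, 50, 100, 500)):
    """Map a price to a coarse range label usable as a metadata index key.

    The bucket edges are hypothetical, not taken from the project.
    """
    if price is None:
        return "unknown"
    lower = 0
    for upper in edges:
        if price < upper:
            return f"{lower}-{upper}"
        lower = upper
    return f"{lower}+"


def clean_records(records):
    """Standardize fields and drop duplicate product ids (first occurrence wins)."""
    seen = set()
    cleaned = []
    for rec in records:
        pid = rec.get("id")
        if pid in seen:
            continue
        seen.add(pid)
        price = parse_price(rec.get("price"))
        cleaned.append({
            "id": pid,
            "title": normalize_text(rec.get("title")),
            "brand": normalize_text(rec.get("brand")),
            "price": price,
            "price_range": price_bucket(price),
        })
    return cleaned
```

Records cleaned this way could then be written to Parquet, for example with pandas `DataFrame.to_parquet`.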

### Model Components
- **Vision-Language Integration**: FashionCLIP for multimodal embedding generation
- **Vector Search**: FAISS with hybrid retrieval combining embedding similarity and metadata filtering
- **Language Model**: Mistral-7B with 4-bit quantization
- **RAG Framework**: Context-enhanced response generation
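
To make the hybrid retrieval idea concrete, here is a small sketch that filters candidates by metadata and then ranks the survivors by cosine similarity. It deliberately uses a brute-force pure-Python scan in place of a FAISS index, and the filter-then-rank order, the item fields (`brand`, `price`, `embedding`), and the function names are assumptions rather than the project's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(query_vec, items, k=3, brand=None, max_price=None):
    """Apply metadata filters first, then return the top-k (id, score) pairs."""
    candidates = [
        item for item in items
        if (brand is None or item["brand"] == brand)
        and (max_price is None or item["price"] <= max_price)
    ]
    scored = [(item["id"], cosine(query_vec, item["embedding"])) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy catalog with 2-dimensional embeddings, for illustration only.
catalog = [
    {"id": "p1", "brand": "acme", "price": 20.0, "embedding": [1.0, 0.0]},
    {"id": "p2", "brand": "acme", "price": 80.0, "embedding": [0.9, 0.1]},
    {"id": "p3", "brand": "zen", "price": 15.0, "embedding": [0.0, 1.0]},
]
results = hybrid_search([1.0, 0.0], catalog, k=2, max_price=50)
```

In a real deployment the similarity scan would be delegated to the vector index, with the metadata filter applied before or after the nearest-neighbor lookup depending on how selective it is.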

### Performance Metrics

#### FashionCLIP Embedding Model

- Recall@1: 0.6385
- Recall@10: 0.9008
- Precision@1: 0.6385
- NDCG@10: 0.7725
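
For readers unfamiliar with these metrics, the sketch below shows the standard binary-relevance definitions of Recall@k, Precision@k, and NDCG@k for a single query. The function names are illustrative, and the evaluation protocol actually used in the project is not described here. When each query has exactly one relevant item, Recall@1 and Precision@1 coincide, which would be consistent with the identical values reported above.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

Dataset-level figures are obtained by averaging each metric over all evaluation queries.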

## Implementation Details

### Core Features
- Text and image-based product search
- Product comparisons and recommendations
- Visual product recognition
- Detailed product information retrieval
- Price analysis and comparison

### Technologies Used
- FashionCLIP for visual understanding
- Mistral-7B Language Model (4-bit quantized)
- FAISS for similarity search
- Google Vertex AI for vector storage
- Streamlit for user interface

## Challenges & Solutions

### Technical Challenges Addressed
- Image processing with varying quality
- GPU memory optimization
- Efficient embedding storage
- Query response accuracy

### Implemented Solutions
- Robust image validation pipeline
- 4-bit model quantization
- Optimized batch processing
- Enhanced metadata enrichment
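
As an illustration of the batch-processing idea, the following sketch embeds items in fixed-size chunks so that only one batch is passed to the model at a time, which is one common way to bound GPU memory use. The helper names and the `embed_fn` callable are hypothetical stand-ins for the real FashionCLIP encoder call, and the default batch size is an arbitrary choice.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(items, embed_fn, batch_size=32):
    """Run `embed_fn` on one batch at a time and concatenate the results in order.

    `embed_fn` is assumed to take a list of inputs and return one embedding per input.
    """
    embeddings = []
    for batch in iter_batches(items, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings


def fake_embed(batch):
    """Stand-in encoder for demonstration: one-dimensional 'embedding' = text length."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
```

The batch size is the main tuning knob: larger batches improve throughput until they no longer fit in memory.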

## Future Directions

- [ ] Fine-tune the FashionCLIP embedding model on domain-specific data
- [ ] Fine-tune the large language model to improve its generalization capabilities
- [ ] Develop feedback loops for continuous improvement