---
title: Amazon E-commerce Visual Assistant
emoji: 🛍️
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.28.0"
app_file: amazon_app.py
pinned: false
---
# Amazon E-commerce Visual Assistant
A multimodal AI assistant built on the Amazon Product Dataset 2020. It supports product search and recommendations through both natural-language and image-based queries.
## Project Overview
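This conversational system pairs a vision-language embedding model (FashionCLIP) with a quantized large language model (Mistral-7B) in a retrieval-augmented pipeline, so that answers to product-related questions are grounded in retrieved catalog context. The intended use is e-commerce customer support.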
## Project Structure
- `amazon_app.py`: Main Streamlit application
- `model.py`: Core AI model implementations
- `Vision_AI.ipynb`: Notebook covering exploratory data analysis (EDA), the embedding model, and the LLM
- `requirements.txt`: Project dependencies
## Setup and Installation
1. Clone the repository:
```bash
git clone https://github.com/wisdom196473/amazon-multimodal-product-assistant.git
cd amazon-multimodal-product-assistant
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
streamlit run amazon_app.py
```
## Technical Architecture
### Data Processing & Storage
- Standardized text fields and normalized numeric attributes
- Enhanced metadata indices for categories, price ranges, keywords, brands
- Validated image quality and managed duplicates
- Structured data storage in Parquet format
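The cleaning steps above can be illustrated with a minimal, standard-library sketch. The field names (`title`, `price`, `image_url`), the price format, and min-max scaling as the normalization method are assumptions made for illustration; the actual pipeline lives in `Vision_AI.ipynb` and may differ.

```python
import re

def clean_text(value):
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", value or "").strip().lower()

def parse_price(raw):
    """Turn strings like '$1,299.00' into a float; return None if unparseable."""
    match = re.search(r"\d+(?:\.\d+)?", (raw or "").replace(",", ""))
    return float(match.group()) if match else None

def min_max(values):
    """Scale numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def prepare(records):
    """Clean records, drop rows without a price, keep one row per image URL."""
    seen, cleaned = set(), []
    for rec in records:
        price = parse_price(rec.get("price"))
        url = rec.get("image_url")  # hypothetical field used as the duplicate key
        if price is None or url in seen:
            continue
        seen.add(url)
        cleaned.append({
            "title": clean_text(rec.get("title")),
            "price": price,
            "image_url": url,
        })
    scaled = min_max([row["price"] for row in cleaned]) if cleaned else []
    for row, value in zip(cleaned, scaled):
        row["price_scaled"] = value
    return cleaned
```

The resulting list of dictionaries could then be loaded into a pandas `DataFrame` and written with `DataFrame.to_parquet`, which is one common way to produce the Parquet files mentioned above.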
### Model Components
- **Vision-Language Integration**: FashionCLIP for multimodal embedding generation
- **Vector Search**: FAISS with hybrid retrieval combining embedding similarity and metadata filtering
- **Language Model**: Mistral-7B with 4-bit quantization
- **RAG Framework**: Context-enhanced response generation
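To make the hybrid retrieval idea concrete, here is a small sketch that filters on metadata and then ranks the remaining items by cosine similarity. It uses a brute-force pure-Python search in place of a FAISS index so that it runs without dependencies. The catalog fields, the filter-then-rank order, and the function names are assumptions, not the code in `model.py`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query_vec, catalog, k=3, category=None, max_price=None):
    """Apply metadata filters first, then rank survivors by similarity."""
    candidates = [
        item for item in catalog
        if (category is None or item["category"] == category)
        and (max_price is None or item["price"] <= max_price)
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine(query_vec, item["embedding"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:k]]

# Toy catalog with 2-dimensional embeddings (real embeddings are much larger).
catalog = [
    {"id": "p1", "category": "shoes", "price": 40.0, "embedding": [1.0, 0.0]},
    {"id": "p2", "category": "shoes", "price": 120.0, "embedding": [0.9, 0.1]},
    {"id": "p3", "category": "bags", "price": 30.0, "embedding": [1.0, 0.0]},
    {"id": "p4", "category": "shoes", "price": 25.0, "embedding": [0.0, 1.0]},
]
```

For example, `hybrid_search([1.0, 0.0], catalog, k=2, category="shoes", max_price=50.0)` considers only `p1` and `p4` and ranks `p1` first. Filtering before ranking keeps the result count predictable; filtering after an approximate nearest-neighbor search is the other common design and trades that predictability for speed on large indexes.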
### Performance Metrics
#### FashionCLIP Embedding Model
- Recall@1: 0.6385
- Recall@10: 0.9008
- Precision@1: 0.6385
- NDCG@10: 0.7725
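For readers unfamiliar with these metrics, the sketch below shows one standard way to compute them for a single query with binary relevance; the reported figures would then be averages over all evaluation queries. The exact evaluation protocol used for this project is not stated here, so treat the function names and the binary-relevance setup as assumptions.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return len(set(ranked[:k]) & relevant) / k

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

When each query has exactly one relevant item, Recall@1 and Precision@1 are the same quantity, which would be consistent with the identical 0.6385 values above.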
## Implementation Details
### Core Features
- Text and image-based product search
- Product comparisons and recommendations
- Visual product recognition
- Detailed product information retrieval
- Price analysis and comparison
### Technologies Used
- FashionCLIP for visual understanding
- Mistral-7B Language Model (4-bit quantized)
- FAISS for similarity search
- Google Vertex AI for vector storage
- Streamlit for user interface
## Challenges & Solutions
### Technical Challenges Addressed
- Processing images of varying quality
- GPU memory optimization
- Efficient embedding storage
- Query response accuracy
### Implemented Solutions
- Robust image validation pipeline
- 4-bit model quantization
- Optimized batch processing
- Enhanced metadata enrichment
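A rough back-of-the-envelope calculation shows why 4-bit quantization helps with GPU memory. The sketch assumes a nominal 7 billion parameters (taking "7B" at face value) and counts weights only; activations, the KV cache, and per-block quantization metadata add to the real footprint.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NOMINAL_PARAMS = 7_000_000_000  # assumption: "7B" taken at face value

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # half-precision baseline
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights: ~{fp16_gb:.1f} GB, 4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the weights shrink to a quarter of their half-precision size, from roughly 14 GB to roughly 3.5 GB.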
## Future Directions
- [ ] Fine-tune the FashionCLIP embedding model on domain-specific data
- [ ] Fine-tune the large language model to improve its generalization
- [ ] Develop feedback loops for continuous improvement