metadata

license: agpl-3.0
sdk: gradio

Vector Search Demo App

This is a Gradio web application that demonstrates vector search capabilities using MongoDB Atlas and OpenAI embeddings.

Prerequisites

MongoDB Atlas account with a cluster that supports Atlas Vector Search
OpenAI API key
Python 3.8+
Sample movie data loaded in MongoDB Atlas (sample_mflix database)

Setup

Clone this repository
Install dependencies:

pip install -r requirements.txt

Set up environment variables:

export OPENAI_API_KEY="your-openai-api-key"
export ATLAS_URI="your-mongodb-atlas-connection-string"
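
A missing variable is easier to diagnose at startup than on the first request. The sketch below shows one way to check both variables up front. The `load_settings` helper and `REQUIRED_VARS` are hypothetical names for illustration and are not taken from app.py.

```python
import os

# The two variables named in the setup step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "ATLAS_URI")


def load_settings(environ=os.environ):
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads the real process environment; passing a plain dict makes the check easy to test.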

Ensure your MongoDB Atlas setup matches the following:

Database name: sample_mflix
Collection: embedded_movies
Vector search index: idx_plot_embedding
Index configuration:

{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "dotProduct"
    }
  ]
}
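
The index can be created in the Atlas UI, or programmatically. The following is a minimal sketch of the programmatic route, assuming PyMongo 4.7 or later (where `SearchIndexModel` accepts `type="vectorSearch"`). The helper names `build_index_definition` and `create_index` are hypothetical and are not part of this repository.

```python
def build_index_definition(path="plot_embedding", dims=1536, similarity="dotProduct"):
    """Build the same index definition shown in the JSON above."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dims,
                "similarity": similarity,
            }
        ]
    }


def create_index(collection, name="idx_plot_embedding"):
    """Create the vector search index on a PyMongo collection (assumes PyMongo >= 4.7)."""
    # Imported lazily so the definition helper works without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=build_index_definition(),
        name=name,
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)
```

Here `collection` would be `MongoClient(ATLAS_URI)["sample_mflix"]["embedded_movies"]`. Index builds are asynchronous on Atlas, so the index may take a short while to become queryable.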

Running the App

Start the application:

python app.py

The app will be available at http://localhost:7860 (Gradio's default port).

Usage

Generating Embeddings

Select your database and collection from the dropdowns
Choose the field to generate embeddings for
Specify the embedding field name (defaults to "embedding")
Set a document limit (0 for all documents)
Click "Generate Embeddings" to start processing

The app uses memory-efficient cursor-based batch processing that can handle large collections:

Documents are processed in batches (default 20 documents per batch)
Memory usage is optimized through cursor-based iteration
Real-time progress tracking shows completed/total documents
Supports processing of large collections (100,000+ documents)
Automatically resumes from where it left off if embeddings already exist
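
The batching and resume behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in app.py: `batched`, `pending_filter`, and `generate_embeddings` are hypothetical names, `embed_batch` stands for any callable that maps a list of texts to a list of vectors, and resuming is assumed to work by skipping documents that already have the embedding field.

```python
from itertools import islice


def batched(iterable, size=20):
    """Yield lists of up to `size` items without loading the whole iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def pending_filter(source_field, embedding_field="embedding"):
    """Match documents that have text to embed but no embedding yet."""
    return {source_field: {"$exists": True}, embedding_field: {"$exists": False}}


def generate_embeddings(collection, embed_batch, source_field,
                        embedding_field="embedding", limit=0, batch_size=20):
    """Embed pending documents batch by batch, yielding the running count."""
    # In PyMongo, limit(0) means "no limit", matching "0 for all documents".
    cursor = collection.find(
        pending_filter(source_field, embedding_field), {source_field: 1}
    ).limit(limit)
    done = 0
    for batch in batched(cursor, batch_size):
        vectors = embed_batch([doc[source_field] for doc in batch])
        for doc, vector in zip(batch, vectors):
            collection.update_one(
                {"_id": doc["_id"]}, {"$set": {embedding_field: vector}}
            )
        done += len(batch)
        yield done
```

Because only one batch is held in memory at a time, memory use stays roughly constant regardless of collection size. Rerunning after an interruption picks up only the documents the filter still matches.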

Searching

Enter a natural language query in the text box (e.g., "humans fighting aliens")
Click "Submit" to search
View the results showing matching documents with their similarity scores
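
Behind a search like this, the query text is embedded and passed to an Atlas `$vectorSearch` aggregation stage. The sketch below assumes the index and field names from the Setup section and the OpenAI Python client version 1.x. The function names, the `numCandidates` value of 100, and the projected `title` and `plot` fields are illustrative assumptions rather than the app's actual settings.

```python
def embed_query(text, model="text-embedding-ada-002"):
    """Embed a query string with OpenAI (assumes openai >= 1.0 and OPENAI_API_KEY set)."""
    # Imported lazily so the pipeline builder works without the client installed.
    from openai import OpenAI

    client = OpenAI()
    return client.embeddings.create(model=model, input=text).data[0].embedding


def build_search_pipeline(query_vector, limit=5, num_candidates=100):
    """Build an aggregation pipeline that returns the closest documents with scores."""
    return [
        {
            "$vectorSearch": {
                "index": "idx_plot_embedding",
                "path": "plot_embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # assumed value; tune for recall
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

A search would then look like `collection.aggregate(build_search_pipeline(embed_query("humans fighting aliens")))`, with each result carrying a `score` field.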

Example Queries

"humans fighting aliens"
"relationship drama between two good friends"
"comedy about family vacation"
"detective solving mysterious murder"

Performance Notes

The application is optimized for handling large datasets:

Uses cursor-based batch processing to avoid memory issues
Processes documents in configurable batch sizes (default: 20)
Implements parallel processing with ThreadPoolExecutor
Provides real-time progress tracking
Automatically handles memory cleanup during processing
Supports resuming interrupted operations
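
Embedding requests are I/O-bound, which is why a thread pool helps here. As a rough sketch of the idea (the function name and the `max_workers` default are hypothetical, and `embed_batch` stands for any callable that turns a list of texts into a list of vectors):

```python
from concurrent.futures import ThreadPoolExecutor


def embed_batches_in_parallel(batches, embed_batch, max_workers=4):
    """Run `embed_batch` over several batches concurrently, preserving order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so result i
        # corresponds to batches[i] even if calls finish out of order.
        return list(pool.map(embed_batch, batches))
```

Keeping `max_workers` small is a reasonable starting point, since too many concurrent requests may run into API rate limits.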

Notes

The search uses OpenAI's text-embedding-ada-002 model to create embeddings
Results are limited to the top 5 matches
Similarity scores range from 0 to 1, with higher scores indicating better matches
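
The 0 to 1 range comes from score normalization. For `dotProduct` similarity, the Atlas Vector Search documentation gives the score as (1 + dot product) / 2. Since text-embedding-ada-002 vectors have unit length, the dot product falls between -1 and 1. The short sketch below reproduces that formula on toy vectors; `normalized_score` is a hypothetical helper for illustration only.

```python
def normalized_score(a, b):
    """Map the dot product of two unit-length vectors from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return (1 + dot) / 2


# Identical direction, orthogonal, and opposite unit vectors.
same = normalized_score([1.0, 0.0], [1.0, 0.0])
orthogonal = normalized_score([1.0, 0.0], [0.0, 1.0])
opposite = normalized_score([1.0, 0.0], [-1.0, 0.0])
```

Under this formula, unrelated (orthogonal) vectors score 0.5 rather than 0, which is worth keeping in mind when reading scores.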