# Enhanced PandasAI Data Analysis with Groq
## 📋 Table of Contents
1. [Overview](#overview)
2. [Features](#features)
3. [Prerequisites](#prerequisites)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Usage Guide](#usage-guide)
7. [API Keys Setup](#api-keys-setup)
8. [File Structure](#file-structure)
9. [Dependencies](#dependencies)
10. [Troubleshooting](#troubleshooting)
11. [Examples](#examples)
12. [Contributing](#contributing)
13. [License](#license)
## 🎯 Overview
Enhanced PandasAI Data Analysis is a Gradio web application that combines PandasAI with Groq-hosted language models to answer natural-language questions about CSV data and to create visualizations. Answering a query and generating a chart are separate steps, and before drawing a chart the application checks whether the query can sensibly be visualized.
### Key Capabilities:
- **Intelligent Data Analysis**: Ask natural language questions about your CSV data
- **Smart Chart Generation**: Generate visualizations only when appropriate
- **Feasibility Analysis**: Automatic assessment of whether queries can be visualized
- **Interactive Web Interface**: User-friendly Gradio-based interface
- **Flexible Data Support**: Handles CSV files with a variety of column types and structures
## ✨ Features
### Core Features:
- **Separated Query Processing**: Analyze data without generating unnecessary charts
- **Smart Chart Detection**: Automatically determines if a query can be visualized
- **Chart Feasibility Analysis**: Provides reasoning and recommendations for visualizations
- **Multiple Chart Types**: Supports bar charts, line plots, scatter plots, pie charts, histograms
- **On-demand Processing**: Analysis and charts are generated as soon as a query is submitted (response time depends on the Groq API)
- **Error Handling**: Comprehensive error management and user feedback
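This document does not show how the app decides whether a query can be charted. As a rough illustration only, a keyword-based check could classify the query (statistical, comparative, or analytical) and suggest a chart type. The names `assess_chart_feasibility` and `Feasibility`, and all keyword rules below, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical rules, checked in order: (regex, suggested chart type).
CHART_RULES: List[Tuple[str, str]] = [
    (r"\bhistogram\b|\bdistribution\b", "histogram"),
    (r"\bpie\b|\bshare\b|\bproportion\b", "pie"),
    (r"\bscatter\b|\brelationship\b|\bvs\b|\bversus\b", "scatter"),
    (r"\bline\b|\btrend\b|\bover time\b", "line"),
    (r"\bbar\b|\bby\b|\bcompare\b", "bar"),
]

# Hypothetical query-type rules; anything unmatched is treated as "analytical".
QUERY_TYPE_RULES: List[Tuple[str, str]] = [
    (r"\b(?:mean|average|median|sum|total|count)\b", "statistical"),
    (r"\b(?:compare|highest|lowest|top|vs|versus)\b", "comparative"),
]


@dataclass
class Feasibility:
    query_type: str
    feasible: bool
    chart_type: Optional[str]
    reason: str


def assess_chart_feasibility(query: str) -> Feasibility:
    """Classify a natural-language query and suggest a chart type, if any."""
    text = query.lower()
    # First matching query-type rule wins; fall back to "analytical".
    query_type = next(
        (label for pattern, label in QUERY_TYPE_RULES if re.search(pattern, text)),
        "analytical",
    )
    for pattern, chart_type in CHART_RULES:
        if re.search(pattern, text):
            return Feasibility(query_type, True, chart_type, f"matched {pattern!r}")
    return Feasibility(query_type, False, None, "no visual cue found; answer as text only")
```

For example, "Create a scatter plot of GDP vs Population" yields a feasible scatter chart, while "Which region has the highest sales?" is classified as comparative but answered as text. Keyword rules are brittle (they miss paraphrases and can misfire on common words like "by"); asking the LLM itself whether a chart is appropriate is a possible alternative.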
### Advanced Features:
- **Data Persistence**: Keeps data loaded between queries for efficiency
- **Automatic Chart Cleanup**: Removes old visualization files automatically
- **Query Type Detection**: Identifies statistical, comparative, and analytical queries
- **Recommendation Engine**: Suggests appropriate visualization types
- **Reset Functionality**: Easy data reset for new file uploads
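The cleanup rule the app uses is not documented here. A minimal sketch, assuming charts are saved as PNG files in a single folder (such as the auto-created `temp/` directory listed under File Structure), is to delete files whose last-modified time is older than a cutoff. `cleanup_old_charts`, the one-hour default, and the `*.png` pattern are assumptions.

```python
import time
from pathlib import Path
from typing import List, Sequence


def cleanup_old_charts(
    chart_dir: str,
    max_age_seconds: float = 3600.0,
    patterns: Sequence[str] = ("*.png",),
) -> List[Path]:
    """Delete chart files older than max_age_seconds and return the deleted paths."""
    directory = Path(chart_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    cutoff = time.time() - max_age_seconds
    removed: List[Path] = []
    for pattern in patterns:
        for path in directory.glob(pattern):
            # Compare the file's last-modified time with the cutoff.
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path)
    return removed
```

Calling something like `cleanup_old_charts("temp", max_age_seconds=600)` before each chart is generated keeps the folder from growing without bound.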
## 📋 Prerequisites
### System Requirements:
- **Python**: 3.8 or higher
- **Operating System**: Windows, macOS, or Linux
- **Memory**: Minimum 4GB RAM (8GB recommended)
- **Storage**: At least 1GB free space for dependencies
### Required Accounts:
- **Groq API Account**: For language model access
- **Internet Connection**: Required for API calls and package installation
## 🔧 Installation
### Step 1: Clone or Download the Code
```bash
# If using Git
git clone <repository-url>
cd enhanced-pandasai
# Or download and extract the files manually
```
### Step 2: Create Virtual Environment (Recommended)
```bash
# Create virtual environment
python -m venv pandasai_env
# Activate virtual environment
# On Windows:
pandasai_env\Scripts\activate
# On macOS/Linux:
source pandasai_env/bin/activate
```
### Step 3: Install Dependencies
```bash
# Install all required packages
pip install -r requirements.txt
# Or install manually:
pip install gradio pandas matplotlib pandasai langchain-groq python-dotenv
```
### Step 4: Verify Installation
```bash
# Test imports (run in a terminal)
python -c "import gradio, pandas, pandasai; print('All packages installed successfully!')"
```
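The one-liner above stops at the first `ImportError`. A small script can instead report the Python version requirement and every missing package at once. The module list mirrors the packages from Step 3; `check_environment` and the file name `check_env.py` are hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List

# Import names of the packages from Step 3. Two differ from their pip names:
# langchain-groq -> langchain_groq, python-dotenv -> dotenv.
REQUIRED_MODULES = ("gradio", "pandas", "matplotlib", "pandasai", "langchain_groq", "dotenv")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Collect every setup problem at once; an empty list means the checks passed."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    problems.extend(f"missing package: {name}" for name in missing_modules(names))
    return problems
```

Saved as `check_env.py`, it could be run with `python -c "from check_env import check_environment; print(check_environment() or 'All packages installed successfully!')"`.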
## ⚙️ Configuration
### Environment Variables Setup
#### Option 1: Using .env file (Recommended)
1. Create a `.env` file in the project root:
```
GROQ_API_KEY=your_actual_groq_api_key_here
```
2. Replace the hardcoded API key in the code:
```python
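# Replace this line in the code:
GROQ_API_KEY = "gsk_..."  # hardcoded key (never commit a real key)
# With this:
import os
from dotenv import load_dotenv  # provided by python-dotenv
load_dotenv()  # loads KEY=value pairs from .env into os.environ
GROQ_API_KEY = os.getenv("GROQ_API_KEY")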
```
#### Option 2: Direct Code Modification
Replace the API key directly in the code:
```python
GROQ_API_KEY = "your_actual_groq_api_key_here"
```
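Whichever option is used, a missing key otherwise shows up later as an authentication error from Groq. A small helper can fail fast at startup instead. `load_groq_api_key` is a hypothetical name; `load_dotenv` is python-dotenv's loader and is skipped if the package is not installed.

```python
import os


def load_groq_api_key(var_name: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key from the environment, or raise a clear error."""
    try:
        from dotenv import load_dotenv  # python-dotenv
    except ImportError:
        pass  # fall back to variables already exported in the shell
    else:
        load_dotenv()  # copies .env entries into os.environ (existing values win)
    key = os.getenv(var_name, "").strip()
    # Treat the placeholder from this guide as "not configured".
    if not key or key == "your_actual_groq_api_key_here":
        raise RuntimeError(
            f"{var_name} is not set; add it to .env or export it before running app.py"
        )
    return key
```

In `app.py` this would replace the assignment with `GROQ_API_KEY = load_groq_api_key()`.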
### Server Configuration
Modify these settings in the `demo.launch()` section:
```python
demo.launch(
    server_name="0.0.0.0",  # Listen on all interfaces; use "127.0.0.1" for local-only access
    server_port=7860,       # Change if the port is already in use
    share=False,            # True creates a temporary public Gradio share link
)
```
## 📖 Usage Guide
### Starting the Application
```bash
# Navigate to project directory
cd enhanced-pandasai
# Activate virtual environment (if using)
source pandasai_env/bin/activate # macOS/Linux
# or
pandasai_env\Scripts\activate # Windows
# Run the application
python app.py
```
### Web Interface Access
- **Local URL**: http://localhost:7860
- **Network URL**: http://your-ip-address:7860 (if server_name="0.0.0.0")
### Step-by-Step Workflow
#### Step 1: Upload Data
1. Click "Upload CSV File"
2. Select your CSV file
3. Wait for upload confirmation
#### Step 2: Analyze Data
1. Enter your query in the text box
2. Click "🔍 Analyze Query"
3. Review the analysis result
4. Check chart feasibility analysis
5. Read chart recommendations
#### Step 3: Generate Visualizations (Optional)
1. If a chart is recommended, click "📊 Generate Chart"
2. Wait for chart generation
3. View the generated visualization
4. Check generation status
#### Step 4: Reset for New Data
1. Click "🔄 Reset Data" to clear the current data
2. Upload a new CSV file
3. Repeat the process
## 🔑 API Keys Setup
### Getting a Groq API Key
1. **Visit Groq Console**: Go to https://console.groq.com
2. **Create Account**: Sign up or log in
3. **Generate API Key**:
   - Navigate to the API Keys section
   - Click "Create API Key"
   - Copy the generated key
4. **Configure in Application**:
   - Replace the hardcoded key in the code
   - Or use environment variables (recommended)
### API Key Security Best Practices
- **Never commit API keys to version control**
- **Use environment variables for production**
- **Rotate keys regularly**
- **Monitor API usage and billing**
- **Restrict key permissions if possible**
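A quick way to act on the first point is to confirm that `.env` is ignored before committing. `git check-ignore .env` gives Git's authoritative answer; the hypothetical helper below is only a rough stand-in that recognizes a few common exact patterns and ignores negation rules and nested `.gitignore` files.

```python
from pathlib import Path

# Exact .gitignore lines that would ignore a top-level .env file (not exhaustive).
ENV_PATTERNS = {".env", "/.env", "*.env", ".env*"}


def env_file_is_ignored(project_root: str = ".") -> bool:
    """Rough check that the project's .gitignore has a line ignoring .env."""
    gitignore = Path(project_root) / ".gitignore"
    if not gitignore.is_file():
        return False
    lines = {line.strip() for line in gitignore.read_text(encoding="utf-8").splitlines()}
    return bool(lines & ENV_PATTERNS)
```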
## 📁 File Structure
```
enhanced-pandasai/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create this)
├── .gitignore          # Git ignore file
├── README.txt          # This documentation
├── examples/           # Example CSV files
│   ├── sample_sales.csv
│   ├── sample_population.csv
│   └── sample_financial.csv
├── docs/               # Additional documentation
│   ├── api_reference.md
│   └── troubleshooting.md
└── temp/               # Temporary files (auto-created)
```
## 📦 Dependencies
### Core Dependencies
```
gradio>=4.0.0 # Web interface framework
pandas>=1.5.0 # Data manipulation library
matplotlib>=3.6.0 # Plotting library
pandasai>=1.5.0 # AI-powered data analysis
langchain-groq>=0.1.0 # Groq integration for LangChain
python-dotenv>=1.0.0 # Environment variable loading
```
### Optional Dependencies
```
seaborn>=0.12.0 # Enhanced statistical visualizations
plotly>=5.15.0 # Interactive plots
numpy>=1.24.0 # Numerical computing
scikit-learn>=1.3.0 # Machine learning (if needed)
```
### System Dependencies
- **Python 3.8+**: Core runtime
- **pip**: Package manager
- **venv** (built into Python 3) or **virtualenv**: Virtual environment (recommended)
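To confirm that installed packages meet the minimum versions in the Core Dependencies list, the standard-library `importlib.metadata` (Python 3.8+) can read installed versions. The comparison below is a simplified sketch: it looks only at leading release numbers, so a pre-release such as `2.0.0rc1` is treated as `2.0.0` even though it sorts before the final release; `packaging.version.Version` gives exact PEP 440 ordering. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Dict, Tuple

# Minimum versions copied from the Core Dependencies list (pip distribution names).
MINIMUM_VERSIONS: Dict[str, str] = {
    "gradio": "4.0.0",
    "pandas": "1.5.0",
    "matplotlib": "3.6.0",
    "pandasai": "1.5.0",
    "langchain-groq": "0.1.0",
    "python-dotenv": "1.0.0",
}


def release_tuple(text: str) -> Tuple[int, ...]:
    """Leading numeric parts of a version string, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at suffixes such as 'rc1' or '+local'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding with zeros so '1.5' equals '1.5.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def outdated_packages(minimums: Dict[str, str] = MINIMUM_VERSIONS) -> Dict[str, str]:
    """Map each missing or too-old distribution to a short explanation."""
    problems: Dict[str, str] = {}
    for dist, minimum in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems[dist] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[dist] = f"{installed} is older than {minimum}"
    return problems
```

An empty dictionary from `outdated_packages()` means every listed package is present and new enough.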
## 🔧 Troubleshooting
### Common Issues and Solutions
#### 1. Import Errors
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
#### 2. API Key Issues
**Problem**: `Invalid API key` or authentication errors
**Solution**:
- Verify API key is correct
- Check environment variable setup
- Ensure API key has proper permissions
- Try regenerating the API key
#### 3. File Upload Issues
**Problem**: CSV files not uploading or processing
**Solution**:
- Ensure CSV file is properly formatted
- Check the file size; very large files load slowly and can exhaust memory (see Memory Issues below)
- Verify CSV has headers
- Try different CSV encoding (UTF-8 recommended)
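When a CSV is not UTF-8, one workaround is to try a short list of encodings and pass the first one that decodes cleanly to pandas. This is a heuristic sketch: a successful decode (especially with cp1252) does not prove the file really uses that encoding. `detect_csv_encoding` and the candidate list are assumptions.

```python
from typing import Sequence

# Tried in order. utf-8-sig also accepts plain UTF-8 (the BOM is optional);
# latin-1 maps every byte to a character, so it never fails and is a last resort.
DEFAULT_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def detect_csv_encoding(path: str, candidates: Sequence[str] = DEFAULT_ENCODINGS) -> str:
    """Return the first candidate encoding that decodes the whole file without errors."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"none of {list(candidates)} could decode {path}")
```

The result can then be passed on as `pd.read_csv(path, encoding=detect_csv_encoding(path))`.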
#### 4. Chart Generation Failures
**Problem**: Charts not generating despite recommendations
**Solution**:
- Check if query is suitable for visualization
- Ensure data has numeric columns for plotting
- Try simpler queries first
- Check temporary directory permissions
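To check the "numeric columns" point before asking for a chart, the file can be scanned for columns whose values all parse as numbers. This sketch uses the standard-library `csv` module so it can run before the data is loaded into pandas; `numeric_columns` and the 1,000-row sample limit are hypothetical, and blank cells are treated as missing values.

```python
import csv
from typing import List


def _is_number(text: str) -> bool:
    try:
        float(text)
    except ValueError:
        return False
    return True


def numeric_columns(path: str, encoding: str = "utf-8-sig", sample_rows: int = 1000) -> List[str]:
    """Columns whose non-empty values in the first sample_rows rows all parse as numbers."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        names = list(reader.fieldnames or [])
        still_numeric = set(names)  # columns with no non-numeric value seen so far
        seen_value = set()          # columns with at least one non-empty value
        for index, row in enumerate(reader):
            if index >= sample_rows:
                break
            for name in list(still_numeric):
                value = (row.get(name) or "").strip()
                if not value:
                    continue  # blank cell: treat as missing, not as text
                if _is_number(value):
                    seen_value.add(name)
                else:
                    still_numeric.discard(name)
    return [name for name in names if name in still_numeric and name in seen_value]
```

An empty result suggests there is nothing to put on a numeric axis, so a chart request for that file is likely to fail.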
#### 5. Port Already in Use
**Problem**: `Address already in use` error
**Solution**:
```python
# Change port in code
demo.launch(server_port=7861) # Try different port
```
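Instead of guessing a new port by hand, the app could probe for the first port that can be bound. This is a sketch using the standard `socket` module; `find_free_port` is a hypothetical helper, and another process could still grab the port between the check and `demo.launch()`.

```python
import socket


def find_free_port(start: int = 7860, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound on host."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is taken (or not permitted); try the next one
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1} on {host}")
```

Passing the same host used for `server_name`, e.g. `demo.launch(server_name="127.0.0.1", server_port=find_free_port())`, keeps the check consistent with where Gradio actually listens.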
#### 6. Memory Issues
**Problem**: Application crashes with large datasets
**Solution**:
- Use smaller CSV files for testing
- Increase system memory
- Process data in chunks if possible
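If a question only needs an aggregate, the whole file does not have to be in memory at once. pandas supports this through `pd.read_csv(path, chunksize=...)`, which returns an iterator of DataFrames; the sketch below shows the same streaming idea with the standard-library `csv` module so it runs without pandas. `total_by_group` is hypothetical, and it pre-aggregates outside PandasAI rather than changing how PandasAI loads data.

```python
import csv
from collections import defaultdict
from typing import Dict


def total_by_group(path: str, group_col: str, value_col: str,
                   encoding: str = "utf-8-sig") -> Dict[str, float]:
    """Sum value_col per group_col one row at a time.

    Memory grows with the number of groups, not the number of rows.
    """
    totals: Dict[str, float] = defaultdict(float)
    with open(path, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            totals[row[group_col]] += float(row[value_col])
    return dict(totals)
```

The much smaller result can then be turned into a DataFrame and analyzed as usual.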
### Debug Mode
Enable debug mode for detailed error information:
```python
# Add this to the beginning of app.py
import logging
logging.basicConfig(level=logging.DEBUG)
# Launch with debug
demo.launch(debug=True)
```
## 📊 Examples
### Example 1: Sales Data Analysis
**CSV Structure**:
```csv
Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
```
**Sample Queries**:
- "Which region has the highest sales?"
- "Show total sales by product"
- "Create a bar chart of sales by region"
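PandasAI answers come from LLM-generated code, so it helps to know the correct answers for the sample data before trying the app. Computed directly with the standard library (the sample is embedded as a string for illustration):

```python
import csv
import io
from collections import Counter

SALES_CSV = """Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
"""

rows = list(csv.DictReader(io.StringIO(SALES_CSV)))

# "Which region has the highest sales?" (each region appears once in this sample)
top_region = max(rows, key=lambda row: int(row["Sales"]))["Region"]

# "Show total sales by product"
sales_by_product = Counter()
for row in rows:
    sales_by_product[row["Product"]] += int(row["Sales"])

print(top_region)              # West
print(dict(sales_by_product))  # {'Widget A': 2200, 'Widget B': 3300}
```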
### Example 2: Population Data Analysis
**CSV Structure** (GDP in millions of USD, Area in km²):
```csv
Country,Population,GDP,Area
USA,331000000,21000000,9834000
China,1440000000,14000000,9597000
India,1380000000,3000000,3287000
```
**Sample Queries**:
- "Which are the top 3 countries by population?"
- "What's the relationship between GDP and population?"
- "Create a scatter plot of GDP vs Population"
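To sanity-check the app's answers on this sample, the top-3 ranking and a Pearson correlation between population and GDP can be computed by hand. The `pearson` helper is written out because `statistics.correlation` only exists in Python 3.10+; correlation does not depend on the units of either column.

```python
import csv
import io
import math
from typing import Sequence

POPULATION_CSV = """Country,Population,GDP,Area
USA,331000000,21000000,9834000
China,1440000000,14000000,9597000
India,1380000000,3000000,3287000
"""

rows = list(csv.DictReader(io.StringIO(POPULATION_CSV)))

# "Which are the top 3 countries by population?"
ranked = sorted(rows, key=lambda row: int(row["Population"]), reverse=True)
top3 = [row["Country"] for row in ranked[:3]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# "What's the relationship between GDP and population?"
correlation = pearson([float(row["Population"]) for row in rows],
                      [float(row["GDP"]) for row in rows])

print(top3)                   # ['China', 'India', 'USA']
print(round(correlation, 2))  # -0.77
```

With only three rows this coefficient says little about GDP and population in general; it is useful only as a check on what the app reports for this file.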
### Example 3: Time Series Data
**CSV Structure**:
```csv
Date,Value,Category
2023-01-01,100,A
2023-01-02,105,A
2023-01-03,98,B
2023-01-04,112,B
```
**Sample Queries**:
- "Show the trend over time"
- "Compare categories A and B"
- "Create a line plot of values over time"
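For the time-series sample, the expected answers can be computed the same way: sort rows chronologically, take day-over-day changes for the trend, and average `Value` per category. Dates are parsed with `date.fromisoformat` (Python 3.7+) so the ordering does not depend on the file's row order.

```python
import csv
import io
from datetime import date

SERIES_CSV = """Date,Value,Category
2023-01-01,100,A
2023-01-02,105,A
2023-01-03,98,B
2023-01-04,112,B
"""

rows = sorted(
    csv.DictReader(io.StringIO(SERIES_CSV)),
    key=lambda row: date.fromisoformat(row["Date"]),  # chronological order
)

# "Show the trend over time": day-over-day change in Value
values = [int(row["Value"]) for row in rows]
changes = [later - earlier for earlier, later in zip(values, values[1:])]

# "Compare categories A and B": mean Value per category
by_category = {}
for row in rows:
    by_category.setdefault(row["Category"], []).append(int(row["Value"]))
category_means = {cat: sum(vals) / len(vals) for cat, vals in by_category.items()}

print(changes)         # [5, -7, 14]
print(category_means)  # {'A': 102.5, 'B': 105.0}
```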
## 🤝 Contributing
### How to Contribute
1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature-name`
3. **Make changes and test thoroughly**
4. **Commit changes**: `git commit -m "Add feature description"`
5. **Push to branch**: `git push origin feature-name`
6. **Submit a pull request**
### Contribution Guidelines
- Follow Python PEP 8 style guidelines
- Add docstrings to new functions
- Include error handling for new features
- Test with various CSV formats
- Update documentation for new features
### Reporting Issues
When reporting issues, please include:
- Python version
- Operating system
- Error messages (full traceback)
- Steps to reproduce
- Sample data (if applicable)
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
### Third-Party Licenses
- **Gradio**: Apache License 2.0
- **Pandas**: BSD 3-Clause License
- **Matplotlib**: License based on Python Software Foundation License
- **PandasAI**: MIT License
- **LangChain**: MIT License
## 📞 Support
### Getting Help
- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs via GitHub issues
- **Community**: Join discussions in project forums
- **API Documentation**: Refer to Groq and PandasAI official docs
### Contact Information
- **Project Maintainer**: [Your Name/Organization]
- **Email**: [Your Email]
- **GitHub**: [Your GitHub Profile]
---
## 🚀 Quick Start Commands
```bash
# Complete setup in one go
git clone <repository-url>
cd enhanced-pandasai
python -m venv pandasai_env
source pandasai_env/bin/activate # Linux/Mac
# or pandasai_env\Scripts\activate # Windows
pip install -r requirements.txt
# Set GROQ_API_KEY in .env (or edit the key in app.py)
python app.py
# Open http://localhost:7860
```
---
**Last Updated**: [Current Date]
**Version**: 1.0.0
**Compatibility**: Python 3.8+, All major operating systems