# Enhanced PandasAI Data Analysis with Groq
## Table of Contents
1. [Overview](#overview)
2. [Features](#features)
3. [Prerequisites](#prerequisites)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Usage Guide](#usage-guide)
7. [API Keys Setup](#api-keys-setup)
8. [File Structure](#file-structure)
9. [Dependencies](#dependencies)
10. [Troubleshooting](#troubleshooting)
11. [Examples](#examples)
12. [Contributing](#contributing)
13. [License](#license)
## Overview
Enhanced PandasAI Data Analysis is a web application that combines PandasAI with Groq's language models to analyze and visualize CSV data. Query processing and chart generation run as separate steps, and the application checks whether a query can be visualized before offering a chart.
### Key Capabilities:
- **Intelligent Data Analysis**: Ask natural language questions about your CSV data
- **Smart Chart Generation**: Generate visualizations only when appropriate
- **Feasibility Analysis**: Automatic assessment of whether queries can be visualized
- **Interactive Web Interface**: User-friendly Gradio-based interface
- **Multi-format Support**: Handles various data types and structures
## Features
### Core Features:
- **Separated Query Processing**: Analyze data without generating unnecessary charts
- **Smart Chart Detection**: Automatically determines if a query can be visualized
- **Chart Feasibility Analysis**: Provides reasoning and recommendations for visualizations
- **Multiple Chart Types**: Supports bar charts, line plots, scatter plots, pie charts, histograms
- **Real-time Processing**: Instant analysis and visualization generation
- **Error Handling**: Comprehensive error management and user feedback
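
To make the idea of chart detection and feasibility analysis concrete, here is a minimal keyword-based sketch. It is not the application's actual logic, which this README does not show; the `CHART_KEYWORDS` table and the `assess_chart_feasibility` function are hypothetical names introduced only for illustration.

```python
# Hypothetical keyword-to-chart mapping, for illustration only.
# The real application may use different rules or ask the language model.
CHART_KEYWORDS = {
    "bar": ["bar chart", "compare", "by region", "by product"],
    "line": ["line plot", "trend", "over time"],
    "scatter": ["scatter", "relationship", "correlation", " vs "],
    "pie": ["pie", "share", "proportion"],
    "histogram": ["histogram", "distribution"],
}


def assess_chart_feasibility(query: str) -> dict:
    """Return a feasibility verdict, a suggested chart type, and a reason."""
    text = f" {query.lower()} "
    matches = [
        chart
        for chart, words in CHART_KEYWORDS.items()
        if any(word in text for word in words)
    ]
    if matches:
        return {
            "feasible": True,
            "recommended": matches[0],
            "reason": f"query wording suggests a {matches[0]} chart",
        }
    return {
        "feasible": False,
        "recommended": None,
        "reason": "no visualization keywords found; a text answer is enough",
    }


print(assess_chart_feasibility("Create a scatter plot of GDP vs Population"))
print(assess_chart_feasibility("Which region has the highest sales?"))
```

A keyword table like this is easy to audit but misses paraphrases, which is one reason an application might delegate the decision to the language model instead.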
### Advanced Features:
- **Data Persistence**: Keeps data loaded between queries for efficiency
- **Automatic Chart Cleanup**: Removes old visualization files automatically
- **Query Type Detection**: Identifies statistical, comparative, and analytical queries
- **Recommendation Engine**: Suggests appropriate visualization types
- **Reset Functionality**: Easy data reset for new file uploads
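
Automatic chart cleanup could be sketched roughly as follows. The function name `remove_old_charts`, the `*.png` pattern, and the one-hour default are assumptions for illustration; the application's real cleanup rules are not documented here.

```python
import time
from pathlib import Path


def remove_old_charts(chart_dir, max_age_seconds=3600, pattern="*.png", now=None):
    """Delete chart files older than max_age_seconds; return the removed names."""
    now = time.time() if now is None else now
    directory = Path(chart_dir)
    removed = []
    if not directory.is_dir():
        return removed  # nothing to clean up
    for path in directory.glob(pattern):
        # Compare each file's last-modified time against the age limit.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling something like `remove_old_charts("temp")` before each new chart is generated would keep the temporary directory from growing without bound.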
## Prerequisites
### System Requirements:
- **Python**: 3.8 or higher
- **Operating System**: Windows, macOS, or Linux
- **Memory**: Minimum 4GB RAM (8GB recommended)
- **Storage**: At least 1GB free space for dependencies
### Required Accounts:
- **Groq API Account**: For language model access
- **Internet Connection**: Required for API calls and package installation
## Installation
### Step 1: Clone or Download the Code
```bash
# If using Git
git clone <repository-url>
cd enhanced-pandasai
# Or download and extract the files manually
```
### Step 2: Create Virtual Environment (Recommended)
```bash
# Create virtual environment
python -m venv pandasai_env
# Activate virtual environment
# On Windows:
pandasai_env\Scripts\activate
# On macOS/Linux:
source pandasai_env/bin/activate
```
### Step 3: Install Dependencies
```bash
# Install all required packages
pip install -r requirements.txt
# Or install manually:
pip install gradio pandas matplotlib pandasai langchain-groq python-dotenv
```
### Step 4: Verify Installation
```bash
# Test imports (run in a terminal, with the virtual environment active)
python -c "import gradio, pandas, pandasai; print('All packages installed successfully!')"
```
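If the one-line check fails, it stops at the first missing package. A slightly longer sketch that reports every missing module at once might look like this; the `REQUIRED` list is an assumption derived from the manual install command (note that `langchain-groq` and `python-dotenv` are imported as `langchain_groq` and `dotenv`).

```python
import importlib.util

# Import names assumed from the packages listed in the install step.
REQUIRED = ["gradio", "pandas", "matplotlib", "pandasai", "langchain_groq", "dotenv"]


def missing_packages(modules=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All packages installed successfully!")
```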
## Configuration
### Environment Variables Setup
#### Option 1: Using .env file (Recommended)
1. Create a `.env` file in the project root:
```
GROQ_API_KEY=your_actual_groq_api_key_here
```
2. Replace the hardcoded API key in the code:
```python
import os
from dotenv import load_dotenv

load_dotenv()  # Load variables from the .env file

# Replace this line in the code (never commit a real key):
# GROQ_API_KEY = "gsk_..."
# With this:
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
```
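Because `os.getenv` returns `None` when the variable is unset, a missing key can surface later as a confusing authentication error. One way to fail early is sketched below; `load_groq_api_key` is a hypothetical helper name, not something the application is known to define.

```python
import os


def load_groq_api_key(env=os.environ):
    """Return the Groq API key from the environment, or raise a clear error."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set. Add it to your .env file or export it "
            "in the shell before starting the application."
        )
    return key
```

Calling this once at startup, after the `.env` file has been loaded, turns a missing key into an immediate and readable error.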
#### Option 2: Direct Code Modification
Replace the API key directly in the code. This is convenient for a quick local test, but do not commit or share a file that contains a real key:
```python
GROQ_API_KEY = "your_actual_groq_api_key_here"
```
### Server Configuration
Modify these settings in the `demo.launch()` section:
```python
demo.launch(
    server_name="0.0.0.0",  # Change to "127.0.0.1" for local only
    server_port=7860,       # Change port if needed
    share=False             # Set to True for public access
)
```
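If you would rather not edit `app.py` for every deployment, the same three settings can be read from the environment. This is only a sketch: the variable names `APP_HOST`, `APP_PORT`, and `APP_SHARE` are invented for this example, and the defaults here bind to localhost rather than `0.0.0.0`.

```python
import os


def launch_settings(env=os.environ):
    """Build keyword arguments for demo.launch() from environment variables."""
    return {
        # Hypothetical variable names; pick whatever suits your deployment.
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": int(env.get("APP_PORT", "7860")),
        "share": env.get("APP_SHARE", "false").lower() in ("1", "true", "yes"),
    }


# Usage inside app.py, where `demo` is the Gradio app object:
# demo.launch(**launch_settings())
```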
## Usage Guide
### Starting the Application
```bash
# Navigate to project directory
cd enhanced-pandasai
# Activate virtual environment (if using)
source pandasai_env/bin/activate  # macOS/Linux
# or
pandasai_env\Scripts\activate     # Windows
# Run the application
python app.py
```
### Web Interface Access
- **Local URL**: http://localhost:7860
- **Network URL**: http://your-ip-address:7860 (if `server_name="0.0.0.0"`)
### Step-by-Step Workflow
#### Step 1: Upload Data
1. Click "Upload CSV File"
2. Select your CSV file
3. Wait for upload confirmation
#### Step 2: Analyze Data
1. Enter your query in the text box
2. Click "Analyze Query"
3. Review the analysis result
4. Check chart feasibility analysis
5. Read chart recommendations
#### Step 3: Generate Visualizations (Optional)
1. If a chart is recommended, click "Generate Chart"
2. Wait for chart generation
3. View the generated visualization
4. Check generation status
#### Step 4: Reset for New Data
1. Click "Reset Data" to clear current data
2. Upload a new CSV file
3. Repeat the process
## API Keys Setup
### Getting a Groq API Key
1. **Visit Groq Console**: Go to https://console.groq.com
2. **Create Account**: Sign up or log in
3. **Generate API Key**:
   - Navigate to API Keys section
   - Click "Create API Key"
   - Copy the generated key
4. **Configure in Application**:
   - Replace the hardcoded key in the code
   - Or use environment variables (recommended)
### API Key Security Best Practices
- **Never commit API keys to version control**
- **Use environment variables for production**
- **Rotate keys regularly**
- **Monitor API usage and billing**
- **Restrict key permissions if possible**
## File Structure
```
enhanced-pandasai/
├── app.py                 # Main application file
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (create this)
├── .gitignore             # Git ignore file
├── README.txt             # This documentation
├── examples/              # Example CSV files
│   ├── sample_sales.csv
│   ├── sample_population.csv
│   └── sample_financial.csv
├── docs/                  # Additional documentation
│   ├── api_reference.md
│   └── troubleshooting.md
└── temp/                  # Temporary files (auto-created)
```
## Dependencies
### Core Dependencies
```
gradio>=4.0.0          # Web interface framework
pandas>=1.5.0          # Data manipulation library
matplotlib>=3.6.0      # Plotting library
pandasai>=1.5.0        # AI-powered data analysis
langchain-groq>=0.1.0  # Groq integration for LangChain
python-dotenv>=1.0.0   # Environment variable loading
```
### Optional Dependencies
```
seaborn>=0.12.0        # Enhanced statistical visualizations
plotly>=5.15.0         # Interactive plots
numpy>=1.24.0          # Numerical computing
scikit-learn>=1.3.0    # Machine learning (if needed)
```
### System Dependencies
- **Python 3.8+**: Core runtime
- **pip**: Package manager
- **venv**: Virtual environment module bundled with Python (recommended)
## Troubleshooting
### Common Issues and Solutions
#### 1. Import Errors
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```
#### 2. API Key Issues
**Problem**: `Invalid API key` or authentication errors
**Solution**:
- Verify API key is correct
- Check environment variable setup
- Ensure API key has proper permissions
- Try regenerating the API key
#### 3. File Upload Issues
**Problem**: CSV files not uploading or processing
**Solution**:
- Ensure CSV file is properly formatted
- Check file size (start with a small file to rule out size as the cause)
- Verify CSV has headers
- Try a different CSV encoding (UTF-8 recommended)
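
The header and encoding checks above can be run outside the application before uploading. The sketch below uses only the standard library; `read_csv_header` is a hypothetical helper, and the fallback encoding `cp1252` is an assumption that fits many spreadsheet exports on Windows.

```python
import csv
import io


def read_csv_header(raw: bytes, encodings=("utf-8-sig", "cp1252")):
    """Return (encoding, header_cells) for the first encoding that decodes."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        header = next(csv.reader(io.StringIO(text, newline="")), None)
        if not header or not any(cell.strip() for cell in header):
            raise ValueError("CSV has no header row")
        return encoding, [cell.strip() for cell in header]
    raise ValueError("could not decode CSV with: " + ", ".join(encodings))


# Example: inspect a file before uploading it.
# with open("examples/sample_sales.csv", "rb") as f:
#     print(read_csv_header(f.read()))
```

Using `utf-8-sig` first also handles files that start with a byte order mark, which otherwise ends up glued to the first column name.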
#### 4. Chart Generation Failures
**Problem**: Charts not generating despite recommendations
**Solution**:
- Check if query is suitable for visualization
- Ensure data has numeric columns for plotting
- Try simpler queries first
- Check temporary directory permissions
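
To check temporary directory permissions, a small standard-library probe is usually enough. The helper name `is_writable_dir` is hypothetical, and which directory the application writes charts to is an assumption (the file structure in this README suggests `temp/`).

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a temporary file can be created inside `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # The file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


print(is_writable_dir(tempfile.gettempdir()))
```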
#### 5. Port Already in Use
**Problem**: `Address already in use` error
**Solution**:
```python
# Change port in code
demo.launch(server_port=7861)  # Try a different port
```
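Instead of guessing a port, you can probe for a free one. The following is a sketch using the standard `socket` module; `find_free_port` is a hypothetical helper, and another process could still claim the port between this check and the actual launch.

```python
import socket


def find_free_port(start=7860, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


# Usage inside app.py, where `demo` is the Gradio app object:
# demo.launch(server_port=find_free_port())
```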
#### 6. Memory Issues
**Problem**: Application crashes with large datasets
**Solution**:
- Use smaller CSV files for testing
- Increase system memory
- Process data in chunks if possible
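
Chunked processing means reading a fixed number of rows at a time so the whole file never sits in memory. Below is a standard-library sketch of the idea; the helper names are hypothetical. With pandas, `pandas.read_csv(path, chunksize=N)` gives a similar iterator of DataFrames.

```python
import csv
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    rows = iter(rows)
    while chunk := list(islice(rows, size)):
        yield chunk


def column_total(csv_file, column, chunk_size=1000):
    """Sum a numeric column while holding only one chunk in memory."""
    reader = csv.DictReader(csv_file)
    total = 0.0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row[column]) for row in chunk)
    return total


# Example:
# with open("examples/sample_sales.csv", newline="") as f:
#     print(column_total(f, "Sales"))
```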
### Debug Mode
Enable debug mode for detailed error information:
```python
# Add this to the beginning of app.py
import logging
logging.basicConfig(level=logging.DEBUG)
# Launch with debug
demo.launch(debug=True)
```
## Examples
### Example 1: Sales Data Analysis
**CSV Structure**:
```csv
Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
```
**Sample Queries**:
- "Which region has the highest sales?"
- "Show total sales by product"
- "Create a bar chart of sales by region"
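
When trying these queries, it helps to know the expected answers so you can cross-check what the model returns. The sketch below computes them for the sample above with the standard library only; `total_by` is a hypothetical helper written for this example.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
"""


def total_by(csv_text, key, value):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += float(row[value])
    return dict(totals)


by_region = total_by(SAMPLE, "Region", "Sales")
print(max(by_region, key=by_region.get))     # West
print(total_by(SAMPLE, "Product", "Sales"))  # {'Widget A': 2200.0, 'Widget B': 3300.0}
```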
### Example 2: Population Data Analysis
**CSV Structure**:
```csv
Country,Population,GDP,Area
USA,331000000,21000000,9834000
China,1440000000,14000000,9597000
India,1380000000,3000000,3287000
```
**Sample Queries**:
- "Which are the top 3 countries by population?"
- "What's the relationship between GDP and population?"
- "Create a scatter plot of GDP vs Population"
### Example 3: Time Series Data
**CSV Structure**:
```csv
Date,Value,Category
2023-01-01,100,A
2023-01-02,105,A
2023-01-03,98,B
2023-01-04,112,B
```
**Sample Queries**:
- "Show the trend over time"
- "Compare categories A and B"
- "Create a line plot of values over time"
## Contributing
### How to Contribute
1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature-name`
3. **Make changes and test thoroughly**
4. **Commit changes**: `git commit -m "Add feature description"`
5. **Push to branch**: `git push origin feature-name`
6. **Submit a pull request**
### Contribution Guidelines
- Follow Python PEP 8 style guidelines
- Add docstrings to new functions
- Include error handling for new features
- Test with various CSV formats
- Update documentation for new features
### Reporting Issues
When reporting issues, please include:
- Python version
- Operating system
- Error messages (full traceback)
- Steps to reproduce
- Sample data (if applicable)
## License
This project is licensed under the MIT License; see the LICENSE file for details.
### Third-Party Licenses
- **Gradio**: Apache License 2.0
- **Pandas**: BSD 3-Clause License
- **Matplotlib**: Matplotlib License (based on the Python Software Foundation License)
- **PandasAI**: MIT License
- **LangChain**: MIT License
## Support
### Getting Help
- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs via GitHub issues
- **Community**: Join discussions in project forums
- **API Documentation**: Refer to Groq and PandasAI official docs
### Contact Information
- **Project Maintainer**: [Your Name/Organization]
- **Email**: [Your Email]
- **GitHub**: [Your GitHub Profile]
---
## Quick Start Commands
```bash
# Complete setup in one go
git clone <repository-url>
cd enhanced-pandasai
python -m venv pandasai_env
source pandasai_env/bin/activate  # Linux/Mac
# or pandasai_env\Scripts\activate  # Windows
pip install -r requirements.txt
# Set GROQ_API_KEY in a .env file (or edit the key in app.py)
python app.py
# Open http://localhost:7860
```
---
**Last Updated**: [Current Date]
**Version**: 1.0.0
**Compatibility**: Python 3.8+, All major operating systems |