srivatsavdamaraju committed on
Commit
f3125e4
·
verified ·
1 Parent(s): 1bbd8aa

Create details.txt

  1. details.txt +420 -0
details.txt ADDED
@@ -0,0 +1,420 @@
# Enhanced PandasAI Data Analysis with Groq

## 📋 Table of Contents
1. [Overview](#overview)
2. [Features](#features)
3. [Prerequisites](#prerequisites)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Usage Guide](#usage-guide)
7. [API Keys Setup](#api-keys-setup)
8. [File Structure](#file-structure)
9. [Dependencies](#dependencies)
10. [Troubleshooting](#troubleshooting)
11. [Examples](#examples)
12. [Contributing](#contributing)
13. [License](#license)

## 🎯 Overview

Enhanced PandasAI Data Analysis is a web application that combines PandasAI with Groq's language models to provide natural-language data analysis and visualization. Query processing and chart generation run as separate steps, and a feasibility analysis decides whether a query is suitable for a chart.

### Key Capabilities:
- **Intelligent Data Analysis**: Ask natural language questions about your CSV data
- **Smart Chart Generation**: Generate visualizations only when appropriate
- **Feasibility Analysis**: Automatic assessment of whether queries can be visualized
- **Interactive Web Interface**: User-friendly Gradio-based interface
- **Multi-format Support**: Handles various data types and structures

## ✨ Features

### Core Features:
- **Separated Query Processing**: Analyze data without generating unnecessary charts
- **Smart Chart Detection**: Automatically determines if a query can be visualized
- **Chart Feasibility Analysis**: Provides reasoning and recommendations for visualizations
- **Multiple Chart Types**: Supports bar charts, line plots, scatter plots, pie charts, histograms
- **Real-time Processing**: Instant analysis and visualization generation
- **Error Handling**: Comprehensive error management and user feedback

### Advanced Features:
- **Data Persistence**: Keeps data loaded between queries for efficiency
- **Automatic Chart Cleanup**: Removes old visualization files automatically
- **Query Type Detection**: Identifies statistical, comparative, and analytical queries
- **Recommendation Engine**: Suggests appropriate visualization types
- **Reset Functionality**: Easy data reset for new file uploads
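
This document does not show how the chart feasibility analysis is implemented. Purely as an illustration of the idea, a keyword-based heuristic could look like the sketch below. The function name `assess_chart_feasibility`, the `Feasibility` record, and the keyword lists are all hypothetical and are not taken from `app.py`; the real application may rely on the language model instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists, chosen only for this illustration.
CHART_KEYWORDS = {
    "bar": ("bar chart", "compare", "by region", "by product"),
    "line": ("trend", "over time", "line plot"),
    "scatter": ("relationship", "correlation", "scatter", " vs "),
    "pie": ("share", "proportion", "pie chart"),
    "histogram": ("distribution", "histogram"),
}


@dataclass
class Feasibility:
    feasible: bool              # whether a chart seems appropriate
    chart_type: Optional[str]   # suggested chart type, if any
    reason: str                 # short explanation for the user


def assess_chart_feasibility(query: str) -> Feasibility:
    """Suggest a chart type when the query contains a known keyword."""
    text = f" {query.lower()} "  # pad so that " vs " can match at the edges
    for chart_type, keywords in CHART_KEYWORDS.items():
        for keyword in keywords:
            if keyword in text:
                reason = f"matched keyword {keyword.strip()!r}"
                return Feasibility(True, chart_type, reason)
    return Feasibility(False, None, "no visualization keyword found")
```

With these lists, "Show the trend over time" is mapped to a line plot, while "Which region has the highest sales?" is reported as not needing a chart.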

## 📋 Prerequisites

### System Requirements:
- **Python**: 3.8 or higher
- **Operating System**: Windows, macOS, or Linux
- **Memory**: Minimum 4GB RAM (8GB recommended)
- **Storage**: At least 1GB free space for dependencies

### Required Accounts:
- **Groq API Account**: For language model access
- **Internet Connection**: Required for API calls and package installation

## 🔧 Installation

### Step 1: Clone or Download the Code
```bash
# If using Git
git clone <repository-url>
cd enhanced-pandasai

# Or download and extract the files manually
```

### Step 2: Create Virtual Environment (Recommended)
```bash
# Create virtual environment
python -m venv pandasai_env

# Activate virtual environment
# On Windows:
pandasai_env\Scripts\activate

# On macOS/Linux:
source pandasai_env/bin/activate
```

### Step 3: Install Dependencies
```bash
# Install all required packages
pip install -r requirements.txt

# Or install manually:
pip install gradio pandas matplotlib pandasai langchain-groq python-dotenv
```

### Step 4: Verify Installation
```bash
# Test the imports (run in a terminal with the virtual environment active)
python -c "import gradio, pandas, pandasai; print('All packages installed successfully!')"
```

## ⚙️ Configuration

### Environment Variables Setup

#### Option 1: Using .env file (Recommended)
1. Create a `.env` file in the project root:
```
GROQ_API_KEY=your_actual_groq_api_key_here
```

2. Replace the hardcoded API key in the code:
```python
# Replace the hardcoded assignment in the code:
GROQ_API_KEY = "gsk_..."  # literal key embedded in the source

# With this:
import os
from dotenv import load_dotenv

load_dotenv()  # reads GROQ_API_KEY from the .env file
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
```
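
`os.getenv` returns `None` when the variable is missing, which tends to surface later as an opaque authentication error. A minimal sketch of failing early is shown below. The helper names `load_env_file` and `get_groq_api_key` are hypothetical and not part of the application; the only third-party call used is `load_dotenv` from python-dotenv.

```python
import os
from typing import Mapping, Optional


def load_env_file() -> bool:
    """Load variables from .env if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
    except ImportError:
        return False
    load_dotenv()
    return True


def get_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Groq API key, or raise a clear error if it is not set."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to .env or export it in the shell"
        )
    return key
```

In `app.py` this might be used as `load_env_file()` followed by `GROQ_API_KEY = get_groq_api_key()`, so a missing key stops the program at startup with a readable message.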

#### Option 2: Direct Code Modification
Replace the API key directly in the code. This is suitable only for local experiments; do not commit a file that contains a real key:
```python
GROQ_API_KEY = "your_actual_groq_api_key_here"
```

### Server Configuration
Modify these settings in the `demo.launch()` section:
```python
demo.launch(
    server_name="0.0.0.0",  # Change to "127.0.0.1" for local only
    server_port=7860,       # Change port if needed
    share=False             # Set to True to create a public share link
)
```
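
If editing `app.py` for every deployment is inconvenient, the same three settings could be read from environment variables. The sketch below assumes the hypothetical variable names `APP_HOST`, `APP_PORT`, and `APP_SHARE`, which are not defined by this project. Unlike the snippet above, it defaults to local-only access.

```python
import os
from typing import Any, Dict, Mapping, Optional


def launch_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for demo.launch() from environment variables."""
    env = os.environ if env is None else env
    port = int(env.get("APP_PORT", "7860"))
    if not 1 <= port <= 65535:
        raise ValueError(f"APP_PORT out of range: {port}")
    return {
        # Hypothetical variable names; 127.0.0.1 keeps the app local by default.
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": port,
        "share": env.get("APP_SHARE", "false").lower() in ("1", "true", "yes"),
    }
```

The result would be passed as `demo.launch(**launch_kwargs())`.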

## 📖 Usage Guide

### Starting the Application
```bash
# Navigate to project directory
cd enhanced-pandasai

# Activate virtual environment (if using)
source pandasai_env/bin/activate  # macOS/Linux
# or
pandasai_env\Scripts\activate     # Windows

# Run the application
python app.py
```

### Web Interface Access
- **Local URL**: http://localhost:7860
- **Network URL**: http://your-ip-address:7860 (if server_name="0.0.0.0")

### Step-by-Step Workflow

#### Step 1: Upload Data
1. Click "Upload CSV File"
2. Select your CSV file
3. Wait for upload confirmation

#### Step 2: Analyze Data
1. Enter your query in the text box
2. Click "🔍 Analyze Query"
3. Review the analysis result
4. Check chart feasibility analysis
5. Read chart recommendations

#### Step 3: Generate Visualizations (Optional)
1. If chart is recommended, click "📊 Generate Chart"
2. Wait for chart generation
3. View the generated visualization
4. Check generation status

#### Step 4: Reset for New Data
1. Click "🔄 Reset Data" to clear current data
2. Upload a new CSV file
3. Repeat the process

## 🔑 API Keys Setup

### Getting a Groq API Key

1. **Visit Groq Console**: Go to https://console.groq.com
2. **Create Account**: Sign up or log in
3. **Generate API Key**:
   - Navigate to API Keys section
   - Click "Create API Key"
   - Copy the generated key
4. **Configure in Application**:
   - Replace the hardcoded key in the code
   - Or use environment variables (recommended)

### API Key Security Best Practices
- **Never commit API keys to version control**
- **Use environment variables for production**
- **Rotate keys regularly**
- **Monitor API usage and billing**
- **Restrict key permissions if possible**

## 📁 File Structure

```
enhanced-pandasai/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create this)
├── .gitignore          # Git ignore file
├── README.txt          # This documentation
├── examples/           # Example CSV files
│   ├── sample_sales.csv
│   ├── sample_population.csv
│   └── sample_financial.csv
├── docs/               # Additional documentation
│   ├── api_reference.md
│   └── troubleshooting.md
└── temp/               # Temporary files (auto-created)
```

## 📦 Dependencies

### Core Dependencies
```
gradio>=4.0.0          # Web interface framework
pandas>=1.5.0          # Data manipulation library
matplotlib>=3.6.0      # Plotting library
pandasai>=1.5.0        # AI-powered data analysis
langchain-groq>=0.1.0  # Groq integration for LangChain
python-dotenv>=1.0.0   # Environment variable loading
```

### Optional Dependencies
```
seaborn>=0.12.0        # Enhanced statistical visualizations
plotly>=5.15.0         # Interactive plots
numpy>=1.24.0          # Numerical computing
scikit-learn>=1.3.0    # Machine learning (if needed)
```

### System Dependencies
- **Python 3.8+**: Core runtime
- **pip**: Package manager
- **virtualenv**: Virtual environment (recommended)

## 🔧 Troubleshooting

### Common Issues and Solutions

#### 1. Import Errors
**Problem**: `ModuleNotFoundError: No module named 'xxx'`
**Solution**:
```bash
pip install --upgrade pip
pip install -r requirements.txt
```

#### 2. API Key Issues
**Problem**: `Invalid API key` or authentication errors
**Solution**:
- Verify API key is correct
- Check environment variable setup
- Ensure API key has proper permissions
- Try regenerating the API key

#### 3. File Upload Issues
**Problem**: CSV files not uploading or processing
**Solution**:
- Ensure CSV file is properly formatted
- Check file size (should be reasonable)
- Verify CSV has headers
- Try different CSV encoding (UTF-8 recommended)
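
The encoding and header checks above can be scripted before a file is uploaded. The following is a minimal sketch using only the standard library. The function name `read_csv_header` and the fallback encoding list are assumptions made for this example, and the check only confirms that the first row has no blank column names, not that it really is a header.

```python
import csv
import io
from typing import List, Tuple


def read_csv_header(
    raw: bytes, encodings: Tuple[str, ...] = ("utf-8-sig", "latin-1")
) -> Tuple[str, List[str]]:
    """Decode raw CSV bytes and return (encoding_used, column_names)."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        header = next(csv.reader(io.StringIO(text)), None)
        if not header or any(not name.strip() for name in header):
            raise ValueError("CSV has a missing or blank header row")
        return encoding, [name.strip() for name in header]
    raise ValueError("could not decode the file with any tried encoding")
```

For a file on disk, the bytes can be obtained with `pathlib.Path("data.csv").read_bytes()`, where `data.csv` stands for your own file.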

#### 4. Chart Generation Failures
**Problem**: Charts not generating despite recommendations
**Solution**:
- Check if query is suitable for visualization
- Ensure data has numeric columns for plotting
- Try simpler queries first
- Check temporary directory permissions

#### 5. Port Already in Use
**Problem**: `Address already in use` error
**Solution**:
```python
# Change port in code
demo.launch(server_port=7861)  # Try a different port
```
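
Instead of guessing a free port by hand, the application could probe for one before launching. This is a sketch under the assumption that trying the next 20 ports after 7860 is acceptable; `find_free_port` is a hypothetical helper, not part of `app.py`. Another process can still claim the port between the check and the launch, so treat the result as a best effort.

```python
import socket


def find_free_port(start: int = 7860, attempts: int = 20,
                   host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")
```

The value would then be passed as `demo.launch(server_port=find_free_port())`.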

#### 6. Memory Issues
**Problem**: Application crashes with large datasets
**Solution**:
- Use smaller CSV files for testing
- Increase system memory
- Process data in chunks if possible
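
As one way to process data in chunks, pandas can read a CSV piece by piece through the `chunksize` argument of `read_csv`, so that only one chunk is in memory at a time. The helper below is an illustrative sketch; `total_by_column` and its parameters are assumptions for this example and do not exist in the application.

```python
import pandas as pd


def total_by_column(csv_source, group_col: str, value_col: str,
                    chunksize: int = 50_000) -> pd.Series:
    """Sum value_col per group_col while reading the CSV in chunks."""
    partials = []
    reader = pd.read_csv(csv_source, usecols=[group_col, value_col],
                         chunksize=chunksize)
    for chunk in reader:
        # Aggregate each chunk separately to keep memory use bounded.
        partials.append(chunk.groupby(group_col)[value_col].sum())
    if not partials:
        return pd.Series(dtype="float64")
    # Combine the per-chunk sums into one total per group.
    return pd.concat(partials).groupby(level=0).sum()
```

`csv_source` can be a file path or any file-like object accepted by `pandas.read_csv`.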

### Debug Mode
Enable debug mode for detailed error information:
```python
# Add this to the beginning of app.py
import logging
logging.basicConfig(level=logging.DEBUG)

# Launch with debug
demo.launch(debug=True)
```

## 📊 Examples

### Example 1: Sales Data Analysis
**CSV Structure**:
```csv
Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
```

**Sample Queries**:
- "Which region has the highest sales?"
- "Show total sales by product"
- "Create a bar chart of sales by region"
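
To sanity-check what the assistant returns for the first two sample queries, the same answers can be computed directly in pandas. This sketch only uses the sample rows shown above and does not call PandasAI or Groq.

```python
import io

import pandas as pd

SALES_CSV = """Region,Product,Sales,Quantity
North,Widget A,1000,50
South,Widget B,1500,75
East,Widget A,1200,60
West,Widget B,1800,90
"""

df = pd.read_csv(io.StringIO(SALES_CSV))

# "Which region has the highest sales?"
top_region = df.groupby("Region")["Sales"].sum().idxmax()
print(top_region)  # West

# "Show total sales by product"
sales_by_product = df.groupby("Product")["Sales"].sum()
print(sales_by_product)
```

If the application's answer disagrees with these values, the query wording or the uploaded file is the first thing to check.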

### Example 2: Population Data Analysis
**CSV Structure**:
```csv
Country,Population,GDP,Area
USA,331000000,21000000,9834000
China,1440000000,14000000,9597000
India,1380000000,3000000,3287000
```

**Sample Queries**:
- "Which are the top 3 countries by population?"
- "What's the relationship between GDP and population?"
- "Create a scatter plot of GDP vs Population"

### Example 3: Time Series Data
**CSV Structure**:
```csv
Date,Value,Category
2023-01-01,100,A
2023-01-02,105,A
2023-01-03,98,B
2023-01-04,112,B
```

**Sample Queries**:
- "Show the trend over time"
- "Compare categories A and B"
- "Create a line plot of values over time"

## 🤝 Contributing

### How to Contribute
1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature-name`
3. **Make changes and test thoroughly**
4. **Commit changes**: `git commit -m "Add feature description"`
5. **Push to branch**: `git push origin feature-name`
6. **Submit a pull request**

### Contribution Guidelines
- Follow Python PEP 8 style guidelines
- Add docstrings to new functions
- Include error handling for new features
- Test with various CSV formats
- Update documentation for new features

### Reporting Issues
When reporting issues, please include:
- Python version
- Operating system
- Error messages (full traceback)
- Steps to reproduce
- Sample data (if applicable)

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

### Third-Party Licenses
- **Gradio**: Apache License 2.0
- **Pandas**: BSD 3-Clause License
- **Matplotlib**: License based on Python Software Foundation License
- **PandasAI**: MIT License
- **LangChain**: MIT License

## 📞 Support

### Getting Help
- **Documentation**: Check this README and inline code comments
- **Issues**: Report bugs via GitHub issues
- **Community**: Join discussions in project forums
- **API Documentation**: Refer to Groq and PandasAI official docs

### Contact Information
- **Project Maintainer**: [Your Name/Organization]
- **Email**: [Your Email]
- **GitHub**: [Your GitHub Profile]

---

## 🚀 Quick Start Commands

```bash
# Complete setup in one go
git clone <repository-url>
cd enhanced-pandasai
python -m venv pandasai_env
source pandasai_env/bin/activate  # Linux/Mac
# or pandasai_env\Scripts\activate  # Windows
pip install -r requirements.txt
# Set GROQ_API_KEY in .env (recommended) or edit the key in app.py
python app.py
# Open http://localhost:7860
```

---

**Last Updated**: [Current Date]
**Version**: 1.0.0
**Compatibility**: Python 3.8+, All major operating systems