Drastically simplify VayuChat for reliability and better UX
QUESTIONS SIMPLIFIED:
- Remove complex visualizations (windrose, polar plots, advanced maps)
- Focus on basic analysis: trends, comparisons, simple correlations
- Add "Getting Started" section with 10 simple questions (expanded by default)
- Organize remaining questions in collapsed categories
SYSTEM PROMPT STREAMLINED:
- Reduce from 140 lines to 36 lines - focus on essentials only
- Keep clear answer variable requirements: text, dataframe, or plot filename
- Emphasize simple matplotlib plotting over complex visualizations
- Basic data validation and error prevention rules
UI IMPROVEMENTS:
- Getting Started section expanded by default for new users
- Other categories collapsed to reduce overwhelm
- Simple, reliable questions that should work consistently
Net reduction: 157 lines removed, 61 lines added for better reliability
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- app.py +10 -1
- new_system_prompt.txt +21 -126
- questions.txt +30 -30
@@ -651,8 +651,17 @@ with st.sidebar:
     # Show all questions but in a scrollable format
     if len(questions) > 0:
         st.markdown("**Select a question to analyze:**")
+
+        # Getting Started section with simple questions
+        getting_started_questions = questions[:10]  # First 10 simple questions
+        with st.expander("🚀 Getting Started - Simple Questions", expanded=True):
+            for i, q in enumerate(getting_started_questions):
+                if st.button(q, key=f"start_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
+                    selected_prompt = q
+                    st.session_state.last_selected_prompt = q
+
         # Create expandable sections for better organization
-        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=
+        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=False):
             for i, q in enumerate([q for q in questions if any(word in q.lower() for word in ['ncap', 'funding', 'investment', 'rupee'])]):
                 if st.button(q, key=f"ncap_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
                     selected_prompt = q
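The grouping in this hunk can be tried outside Streamlit. The following is a minimal sketch, not the app's code: `group_questions` and `NCAP_KEYWORDS` are hypothetical names introduced here, and the only parts taken from the diff are the `questions[:10]` slice and the keyword filter.

```python
from typing import Dict, List

# Keywords copied from the NCAP expander's filter in the hunk above.
NCAP_KEYWORDS = ["ncap", "funding", "investment", "rupee"]


def group_questions(questions: List[str], n_start: int = 10) -> Dict[str, List[str]]:
    """Split questions into a 'getting started' slice and an NCAP keyword group.

    Hypothetical helper: the app builds these lists inline inside st.expander blocks.
    """
    # The first n_start entries of questions.txt are treated as the simple ones.
    getting_started = questions[:n_start]
    # The keyword filter runs over the full list, not only the remainder.
    ncap = [q for q in questions if any(word in q.lower() for word in NCAP_KEYWORDS)]
    return {"getting_started": getting_started, "ncap": ncap}


if __name__ == "__main__":
    sample = [
        "Which city has the highest average PM2.5 levels in 2023?",
        "How much NCAP funding did Delhi receive vs Mumbai?",
    ]
    for name, items in group_questions(sample, n_start=1).items():
        print(name, items)
```

Because the keyword filter scans every question, a question that sits in the first ten and also mentions one of the keywords would appear in both groups. The distinct button key prefixes in the diff (`start_q_` and `ncap_q_`) keep the two buttons from sharing a key in that case.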
@@ -3,139 +3,34 @@ Generate Python code to answer the user's question about air quality data.
 CRITICAL: Only generate Python code - no explanations, no thinking, just clean executable code.
 
 AVAILABLE LIBRARIES:
-You can use these pre-installed libraries:
 - pandas, numpy (data manipulation)
 - matplotlib, seaborn, plotly (visualization)
-- statsmodels
-- scikit-learn (machine learning, regression)
+- statsmodels, scikit-learn (analysis)
 - geopandas (geospatial analysis)
 
-
-- For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
-- For regression: Use sklearn.linear_model.LinearRegression() for robust regression
-- For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
-- Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
-- Handle missing libraries gracefully with try-except around imports
+ESSENTIAL RULES:
 
-
-
-
-
-
-- MUST call plt.close() to prevent memory leaks
-- MUST store filename in 'answer' variable: answer = filename
-- Handle empty data gracefully before plotting
+DATA SAFETY:
+- Always check if data exists: if df.empty: answer = "No data available"
+- For city-specific questions: filter first: df_city = df[df['City'].str.contains('CityName', case=False)]
+- Check sufficient data: if len(df_filtered) < 10: answer = "Insufficient data"
+- Use .dropna() to remove missing values before analysis
 
-
-
-
+PLOTTING REQUIREMENTS:
+- Create plots for visualization requests: plt.figure(figsize=(12, 8))
+- Save plots: filename = f"plot_{uuid.uuid4().hex[:8]}.png"; plt.savefig(filename, dpi=300, bbox_inches='tight')
+- Close plots: plt.close()
+- Store filename: answer = filename
+- For non-plots: answer = "text result"
 
-
-
-
-
-
-MANDATORY SAFETY & ROBUSTNESS RULES:
-
-ROBUST DATA VALIDATION (MANDATORY):
-- Check DataFrame exists: if df.empty: answer = "No data available"
-- LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
-- Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
-- Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
-- NEVER assume external files exist: check with try/except or provide alternative approach
-- Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
-
-OPERATION SAFETY (PREVENT CRASHES):
-- ALWAYS use try/except for complex operations with fallback to simpler approach
-- START SIMPLE: Use basic pandas operations before trying advanced techniques
-- For mapping/visualization: Use scatter plots if complex maps fail
-- For correlation: Use simple .corr() before advanced statistical methods
-- Check denominators before division: if denominator == 0: continue
-- Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
-- Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
-- NO return statements - use if/else logic flow with proper answer assignment
-
-PLOT GENERATION (MANDATORY FOR PLOTS):
-- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
-- Always create new figure: plt.figure(figsize=(12, 8))
-- Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
-- Handle long city names: plt.xticks(rotation=45, ha='right')
-- Use tight layout: plt.tight_layout()
-- CRITICAL PLOT SAVING SEQUENCE (no return statements):
-1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-2. plt.savefig(filename, dpi=300, bbox_inches='tight')
-3. plt.close()
-4. answer = filename
-- Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
-
-CRITICAL CODING PRACTICES:
-
-DATA VALIDATION & SAFETY:
-- Always check if DataFrames/Series are empty before operations: if df.empty: answer = "No data available"; exit()
-- Use .dropna() to handle missing values or .fillna() with appropriate defaults
-- Validate column names exist before accessing: if 'column' in df.columns: else: answer = "Column not found"
-- Check data types before operations: df['col'].dtype, isinstance() checks
-- Handle edge cases: empty results, single row/column DataFrames, all NaN columns
-- Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
-
-ROBUST ANALYSIS APPROACHES:
-
-GEOGRAPHICAL/MAPPING QUESTIONS:
-- PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
-- FALLBACK: If geographical data missing, use bar charts by state/city
-- NEVER assume external shapefiles exist - always have simple alternative
-
-CORRELATION/RELATIONSHIP ANALYSIS:
-- Filter by location FIRST if question asks about specific city
-- Use .dropna() and check len(data) > 50 for reliable correlations
-- If complex analysis fails, use simple scatter plots with trend lines
-- Report "insufficient data" rather than NaN/meaningless results
-
-METEOROLOGICAL ANALYSIS:
-- Check if weather columns have sufficient non-null values before analysis
-- Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
-- For complex plots, provide simple bar/line chart fallback
-- Group by time periods (month/season) if daily data is too sparse
-
-VARIABLE & TYPE HANDLING:
-- Use descriptive variable names (avoid single letters in complex operations)
-- Ensure all variables are defined before use - initialize with defaults
-- Convert pandas/numpy objects to proper Python types before operations
-- Convert datetime/period objects appropriately: .astype(str), .dt.strftime(), int()
-- Always cast to appropriate types for indexing: int(), str(), list()
-- CRITICAL: Convert pandas/numpy values to int before list indexing: int(value) for calendar.month_name[int(month_value)]
-- Use explicit type conversions rather than relying on implicit casting
-
-PANDAS OPERATIONS:
-- Reference DataFrame properly: df['column'] not 'column' in operations
-- Use .loc/.iloc correctly for indexing - avoid chained indexing
-- Use .reset_index() after groupby operations when needed for clean DataFrames
-- Sort results for consistent output: .sort_values(), .sort_index()
-- Use .round() for numerical results to avoid excessive decimals
-- Chain operations carefully - split complex chains for readability
-
-MATPLOTLIB & PLOTTING:
-- Always call plt.close() after saving plots to prevent memory leaks
-- Use descriptive titles, axis labels, and legends
-- Handle cases where no data exists for plotting
-- Use proper figure sizing: plt.figure(figsize=(width, height))
-- Convert datetime indices to strings for plotting if needed
-- Use color palettes consistently
-
-ERROR PREVENTION:
-- Use try-except blocks for operations that might fail
-- Check denominators before division operations
-- Validate array/list lengths before indexing
-- Use .get() method for dictionary access with defaults
-- Handle timezone-aware vs naive datetime objects consistently
-- Use proper string formatting and encoding for text output
+BASIC ERROR PREVENTION:
+- Use try/except for complex operations
+- Validate results: if pd.isna(result): answer = "Analysis inconclusive"
+- For correlations: check len(data) > 20 before calculating
+- Use simple matplotlib plotting - avoid complex visualizations
 
 TECHNICAL REQUIREMENTS:
 - Save final result in variable called 'answer'
--
--
--
-- Always use .iloc or .loc properly for pandas indexing
-- Close matplotlib figures with plt.close() to prevent memory leaks
-- Use proper column name checks before accessing columns
-- For dataframes, ensure proper column names and sorting for readability
+- Use exact column names: 'PM2.5 (µg/m³)', 'WS (m/s)', etc.
+- Handle dates with pd.to_datetime() if needed
+- Round numerical results: round(value, 2)
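To make the streamlined rules concrete, here is a rough sketch of code that follows them for a "monthly PM2.5 trend for one city" request. It is an illustration under assumptions, not output from VayuChat: the `Timestamp` column name and the `monthly_pm25_plot` helper are hypothetical, and the logic is wrapped in a function only so it can be exercised in isolation, whereas the prompt asks for top-level code that assigns to `answer`.

```python
import uuid

import matplotlib

matplotlib.use("Agg")  # headless backend so savefig works without a display
import matplotlib.pyplot as plt
import pandas as pd

PM_COL = "PM2.5 (µg/m³)"  # exact column name quoted in the prompt


def monthly_pm25_plot(df: pd.DataFrame, city: str, min_rows: int = 10) -> str:
    """Return a plot filename, or a short text message if the data is unusable."""
    # DATA SAFETY: check the data exists before doing anything else.
    if df.empty:
        return "No data available"
    # Filter by city first, then drop rows with missing values.
    # "Timestamp" is an assumed column name, not taken from the diff.
    df_city = df[df["City"].str.contains(city, case=False, na=False)]
    df_city = df_city.dropna(subset=["Timestamp", PM_COL])
    if len(df_city) < min_rows:
        return "Insufficient data"
    try:
        months = pd.to_datetime(df_city["Timestamp"]).dt.month
        monthly = df_city.assign(month=months).groupby("month")[PM_COL].mean().round(2)
        # PLOTTING REQUIREMENTS: new figure, save under a unique name, then close.
        plt.figure(figsize=(12, 8))
        plt.plot(monthly.index, monthly.values, marker="o")
        plt.title(f"Monthly average PM2.5 in {city}")
        plt.xlabel("Month")
        plt.ylabel(PM_COL)
        filename = f"plot_{uuid.uuid4().hex[:8]}.png"
        plt.savefig(filename, dpi=300, bbox_inches="tight")
        return filename
    except Exception as exc:  # BASIC ERROR PREVENTION: fail with a text answer
        return f"Analysis failed: {exc}"
    finally:
        plt.close()  # harmless if no figure was opened


if __name__ == "__main__":
    demo = pd.DataFrame({
        "City": ["Delhi"] * 12,
        "Timestamp": [f"2023-{m:02d}-15" for m in range(1, 13)],
        PM_COL: [180, 150, 110, 90, 85, 70, 45, 40, 55, 120, 210, 200],
    })
    answer = monthly_pm25_plot(demo, "Delhi")
    print(answer)
```

Every exit path yields either a filename or a plain string, which matches the "text, dataframe, or plot filename" contract described in the commit message.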
@@ -1,30 +1,30 @@
-
-
-
-
-
-
-
-
-Which
-
-
-
-
-
-
-
-Does
-
-
-
-
-
-
-
-
-
-
-
-Show
-
+Which city has the highest average PM2.5 levels in 2023?
+Show monthly PM2.5 trends for Delhi in 2023
+Compare PM2.5 levels between winter and summer months
+Which month had the highest pollution levels in Mumbai?
+Calculate average PM2.5 for all cities in November 2023
+Rank top 10 cities by highest PM2.5 pollution levels
+Show seasonal pollution patterns across all cities
+Compare pollution levels between weekdays and weekends
+Which cities exceed WHO PM2.5 guidelines of 15 µg/m³?
+Plot yearly PM2.5 trends from 2020 to 2023 for major cities
+How much NCAP funding did Delhi receive vs Mumbai?
+Which NCAP cities achieved the best PM2.5 reduction?
+Does wind speed above 3 m/s reduce PM2.5 levels in Delhi?
+Show correlation between temperature and PM2.5 in summer months
+Which cities with high population have dangerous PM2.5 levels?
+Compare PM2.5 levels in high-funded vs low-funded NCAP cities
+Does rainfall help reduce pollution levels during monsoon?
+Which meteorological factor correlates most with PM2.5 reduction?
+Show monthly PM2.5 trends for top 5 Indian cities by population
+Does humidity above 80% help reduce pollution in coastal cities?
+Compare NO2 vs PM2.5 levels in traffic-heavy areas
+Which NCAP-funded cities still exceed WHO guidelines?
+Show relationship between city population and average PM2.5
+Compare PM2.5 improvement rates: Delhi vs Mumbai vs Kolkata
+Create simple scatter plot of PM2.5 vs PM10 correlation
+Show state-wise average PM2.5 levels for policy planning
+Which cities need immediate intervention with PM2.5 above 60 µg/m³?
+Compare pollution trends between North vs South Indian cities
+Show seasonal variation in PM2.5 across different climate zones
+Identify cities with consistent pollution improvement over time
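As an illustration of why the "Getting Started" questions are expected to work consistently, question 9 (cities exceeding 15 µg/m³) reduces to a group-by and a threshold. The sketch below is one possible answer, not the code VayuChat generates: `cities_exceeding` is a hypothetical helper, and applying the minimum-row check per city is an assumption made here.

```python
import pandas as pd

PM_COL = "PM2.5 (µg/m³)"  # exact column name quoted in the system prompt
WHO_LIMIT = 15.0          # µg/m³, the figure quoted in question 9


def cities_exceeding(df: pd.DataFrame, limit: float = WHO_LIMIT, min_rows: int = 10):
    """Return a DataFrame of cities whose mean PM2.5 is above `limit`, or a text message."""
    if df.empty or PM_COL not in df.columns:
        return "No data available"
    clean = df.dropna(subset=["City", PM_COL])
    grouped = clean.groupby("City")[PM_COL]
    means = grouped.mean()
    # Assumption: skip cities with too few readings to give a meaningful average.
    means = means[grouped.count() >= min_rows]
    over = means[means > limit].sort_values(ascending=False).round(2)
    if over.empty:
        return "No cities exceed the limit"
    return over.reset_index(name="avg_pm25")


if __name__ == "__main__":
    demo = pd.DataFrame({
        "City": ["A"] * 10 + ["B"] * 10,
        PM_COL: [20.0] * 10 + [10.0] * 10,
    })
    answer = cities_exceeding(demo)
    print(answer)
```

The result is either a sorted two-column DataFrame or a short string, so it fits the `answer` variable contract without any plotting.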