Nipun Claude committed on
Commit 8dbe5f9 · 1 Parent(s): 7f61e71

Drastically simplify VayuChat for reliability and better UX


QUESTIONS SIMPLIFIED:
- Remove complex visualizations (windrose, polar plots, advanced maps)
- Focus on basic analysis: trends, comparisons, simple correlations
- Add "Getting Started" section with 10 simple questions (expanded by default)
- Organize remaining questions in collapsed categories

SYSTEM PROMPT STREAMLINED:
- Reduce from 140 lines to 36 lines - focus on essentials only
- Keep clear answer variable requirements: text, dataframe, or plot filename
- Emphasize simple matplotlib plotting over complex visualizations
- Basic data validation and error prevention rules

UI IMPROVEMENTS:
- Getting Started section expanded by default for new users
- Other categories collapsed to reduce overwhelm
- Simple, reliable questions that should work consistently

Net reduction: 157 lines removed, 61 lines added for better reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Files changed (3)
  1. app.py +10 -1
  2. new_system_prompt.txt +21 -126
  3. questions.txt +30 -30
app.py CHANGED
@@ -651,8 +651,17 @@ with st.sidebar:
     # Show all questions but in a scrollable format
     if len(questions) > 0:
         st.markdown("**Select a question to analyze:**")
+
+        # Getting Started section with simple questions
+        getting_started_questions = questions[:10] # First 10 simple questions
+        with st.expander("🚀 Getting Started - Simple Questions", expanded=True):
+            for i, q in enumerate(getting_started_questions):
+                if st.button(q, key=f"start_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
+                    selected_prompt = q
+                    st.session_state.last_selected_prompt = q
+
         # Create expandable sections for better organization
-        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=True):
+        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=False):
             for i, q in enumerate([q for q in questions if any(word in q.lower() for word in ['ncap', 'funding', 'investment', 'rupee'])]):
                 if st.button(q, key=f"ncap_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
                     selected_prompt = q
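
The grouping logic in this hunk can be tried outside Streamlit. The following is a minimal sketch, assuming `questions` is a plain list of strings read from questions.txt; the helper name `group_questions` and the sample list are hypothetical and do not appear in app.py.

```python
def group_questions(questions, n_start=10):
    """Split questions into a 'getting started' list and an NCAP keyword group."""
    # Same keywords as the NCAP expander in the diff above.
    ncap_words = ['ncap', 'funding', 'investment', 'rupee']
    # The first n_start entries are treated as the simple questions.
    getting_started = questions[:n_start]
    # Keyword match is case-insensitive and scans the whole list.
    ncap = [q for q in questions if any(word in q.lower() for word in ncap_words)]
    return getting_started, ncap


# Hypothetical sample, shortened for illustration.
sample = [
    "Which city has the highest average PM2.5 levels in 2023?",
    "Show monthly PM2.5 trends for Delhi in 2023",
    "How much NCAP funding did Delhi receive vs Mumbai?",
]
start, ncap = group_questions(sample, n_start=2)
print(start)
print(ncap)
```

Because the Getting Started list is a plain slice, it depends on the order of lines in questions.txt. The keyword filter runs over the full list, so a question could in principle appear in both groups; the distinct `start_q_` and `ncap_q_` key prefixes keep the button keys unique in that case.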
new_system_prompt.txt CHANGED
@@ -3,139 +3,34 @@ Generate Python code to answer the user's question about air quality data.
 CRITICAL: Only generate Python code - no explanations, no thinking, just clean executable code.
 
 AVAILABLE LIBRARIES:
-You can use these pre-installed libraries:
 - pandas, numpy (data manipulation)
 - matplotlib, seaborn, plotly (visualization)
-- statsmodels (statistical modeling, trend analysis)
-- scikit-learn (machine learning, regression)
+- statsmodels, scikit-learn (analysis)
 - geopandas (geospatial analysis)
 
-LIBRARY USAGE RULES:
-- For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
-- For regression: Use sklearn.linear_model.LinearRegression() for robust regression
-- For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
-- Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
-- Handle missing libraries gracefully with try-except around imports
+ESSENTIAL RULES:
 
-OUTPUT TYPE REQUIREMENTS:
-1. PLOT GENERATION (for "plot", "chart", "visualize", "show trend", "graph"):
-- MUST create matplotlib figure with proper labels, title, legend
-- MUST save plot: filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-- MUST call plt.savefig(filename, dpi=300, bbox_inches='tight')
-- MUST call plt.close() to prevent memory leaks
-- MUST store filename in 'answer' variable: answer = filename
-- Handle empty data gracefully before plotting
+DATA SAFETY:
+- Always check if data exists: if df.empty: answer = "No data available"
+- For city-specific questions: filter first: df_city = df[df['City'].str.contains('CityName', case=False)]
+- Check sufficient data: if len(df_filtered) < 10: answer = "Insufficient data"
+- Use .dropna() to remove missing values before analysis
 
-2. TEXT ANSWERS (for simple "Which", "What", single values):
-- Store direct string answer in 'answer' variable
-- Example: answer = "December had the highest pollution"
+PLOTTING REQUIREMENTS:
+- Create plots for visualization requests: plt.figure(figsize=(12, 8))
+- Save plots: filename = f"plot_{uuid.uuid4().hex[:8]}.png"; plt.savefig(filename, dpi=300, bbox_inches='tight')
+- Close plots: plt.close()
+- Store filename: answer = filename
+- For non-plots: answer = "text result"
 
-3. DATAFRAMES (for lists, rankings, comparisons, multiple results):
-- Create clean DataFrame with descriptive column names
-- Sort appropriately for readability
-- Store DataFrame in 'answer' variable: answer = result_df
-
-MANDATORY SAFETY & ROBUSTNESS RULES:
-
-ROBUST DATA VALIDATION (MANDATORY):
-- Check DataFrame exists: if df.empty: answer = "No data available"
-- LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
-- Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
-- Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
-- NEVER assume external files exist: check with try/except or provide alternative approach
-- Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
-
-OPERATION SAFETY (PREVENT CRASHES):
-- ALWAYS use try/except for complex operations with fallback to simpler approach
-- START SIMPLE: Use basic pandas operations before trying advanced techniques
-- For mapping/visualization: Use scatter plots if complex maps fail
-- For correlation: Use simple .corr() before advanced statistical methods
-- Check denominators before division: if denominator == 0: continue
-- Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
-- Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
-- NO return statements - use if/else logic flow with proper answer assignment
-
-PLOT GENERATION (MANDATORY FOR PLOTS):
-- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
-- Always create new figure: plt.figure(figsize=(12, 8))
-- Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
-- Handle long city names: plt.xticks(rotation=45, ha='right')
-- Use tight layout: plt.tight_layout()
-- CRITICAL PLOT SAVING SEQUENCE (no return statements):
-1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-2. plt.savefig(filename, dpi=300, bbox_inches='tight')
-3. plt.close()
-4. answer = filename
-- Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
-
-CRITICAL CODING PRACTICES:
-
-DATA VALIDATION & SAFETY:
-- Always check if DataFrames/Series are empty before operations: if df.empty: answer = "No data available"; exit()
-- Use .dropna() to handle missing values or .fillna() with appropriate defaults
-- Validate column names exist before accessing: if 'column' in df.columns: else: answer = "Column not found"
-- Check data types before operations: df['col'].dtype, isinstance() checks
-- Handle edge cases: empty results, single row/column DataFrames, all NaN columns
-- Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
-
-ROBUST ANALYSIS APPROACHES:
-
-GEOGRAPHICAL/MAPPING QUESTIONS:
-- PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
-- FALLBACK: If geographical data missing, use bar charts by state/city
-- NEVER assume external shapefiles exist - always have simple alternative
-
-CORRELATION/RELATIONSHIP ANALYSIS:
-- Filter by location FIRST if question asks about specific city
-- Use .dropna() and check len(data) > 50 for reliable correlations
-- If complex analysis fails, use simple scatter plots with trend lines
-- Report "insufficient data" rather than NaN/meaningless results
-
-METEOROLOGICAL ANALYSIS:
-- Check if weather columns have sufficient non-null values before analysis
-- Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
-- For complex plots, provide simple bar/line chart fallback
-- Group by time periods (month/season) if daily data is too sparse
-
-VARIABLE & TYPE HANDLING:
-- Use descriptive variable names (avoid single letters in complex operations)
-- Ensure all variables are defined before use - initialize with defaults
-- Convert pandas/numpy objects to proper Python types before operations
-- Convert datetime/period objects appropriately: .astype(str), .dt.strftime(), int()
-- Always cast to appropriate types for indexing: int(), str(), list()
-- CRITICAL: Convert pandas/numpy values to int before list indexing: int(value) for calendar.month_name[int(month_value)]
-- Use explicit type conversions rather than relying on implicit casting
-
-PANDAS OPERATIONS:
-- Reference DataFrame properly: df['column'] not 'column' in operations
-- Use .loc/.iloc correctly for indexing - avoid chained indexing
-- Use .reset_index() after groupby operations when needed for clean DataFrames
-- Sort results for consistent output: .sort_values(), .sort_index()
-- Use .round() for numerical results to avoid excessive decimals
-- Chain operations carefully - split complex chains for readability
-
-MATPLOTLIB & PLOTTING:
-- Always call plt.close() after saving plots to prevent memory leaks
-- Use descriptive titles, axis labels, and legends
-- Handle cases where no data exists for plotting
-- Use proper figure sizing: plt.figure(figsize=(width, height))
-- Convert datetime indices to strings for plotting if needed
-- Use color palettes consistently
-
-ERROR PREVENTION:
-- Use try-except blocks for operations that might fail
-- Check denominators before division operations
-- Validate array/list lengths before indexing
-- Use .get() method for dictionary access with defaults
-- Handle timezone-aware vs naive datetime objects consistently
-- Use proper string formatting and encoding for text output
+BASIC ERROR PREVENTION:
+- Use try/except for complex operations
+- Validate results: if pd.isna(result): answer = "Analysis inconclusive"
+- For correlations: check len(data) > 20 before calculating
+- Use simple matplotlib plotting - avoid complex visualizations
 
 TECHNICAL REQUIREMENTS:
 - Save final result in variable called 'answer'
-- For TEXT: Store the direct answer as a string in 'answer'
-- For PLOTS: Save with unique filename f"plot_{{uuid.uuid4().hex[:8]}}.png" and store filename in 'answer'
-- For DATAFRAMES: Store the pandas DataFrame directly in 'answer' (e.g., answer = result_df)
-- Always use .iloc or .loc properly for pandas indexing
-- Close matplotlib figures with plt.close() to prevent memory leaks
-- Use proper column name checks before accessing columns
-- For dataframes, ensure proper column names and sorting for readability
+- Use exact column names: 'PM2.5 (µg/m³)', 'WS (m/s)', etc.
+- Handle dates with pd.to_datetime() if needed
+- Round numerical results: round(value, 2)
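
As an illustration of what the streamlined rules ask for, here is a sketch of the kind of code a model might generate for "Show monthly PM2.5 trends for Delhi in 2023". It is not taken from the repository. The sample `df`, the `Timestamp` column name, and the `Agg` backend choice are assumptions made so the snippet runs on its own; in the app, `df` would be supplied by the caller.

```python
import uuid

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so savefig works without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: 120 daily readings starting 2023-01-01 (January to April).
df = pd.DataFrame({
    "Timestamp": pd.date_range("2023-01-01", periods=120, freq="D"),
    "City": ["Delhi"] * 120,
    "PM2.5 (µg/m³)": [80.0 + (i % 30) for i in range(120)],
})

# DATA SAFETY: check for data, filter by city first, drop missing values.
if df.empty:
    answer = "No data available"
else:
    df_city = df[df["City"].str.contains("Delhi", case=False)]
    df_city = df_city.dropna(subset=["PM2.5 (µg/m³)"])
    if len(df_city) < 10:
        answer = "Insufficient data"
    else:
        # Monthly mean, rounded to two decimals as the prompt requests.
        monthly = (
            df_city.groupby(df_city["Timestamp"].dt.month)["PM2.5 (µg/m³)"]
            .mean()
            .round(2)
        )
        # PLOTTING REQUIREMENTS: new figure, save under a unique name, close, store filename.
        plt.figure(figsize=(12, 8))
        plt.plot(monthly.index, monthly.values, marker="o")
        plt.title("Monthly average PM2.5 in Delhi")
        plt.xlabel("Month")
        plt.ylabel("PM2.5 (µg/m³)")
        filename = f"plot_{uuid.uuid4().hex[:8]}.png"
        plt.savefig(filename, dpi=300, bbox_inches="tight")
        plt.close()
        answer = filename
```

Every branch assigns `answer` exactly once and there is no `return`, which matches the prompt's requirement that the final result live in a variable called `answer`.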
 
 
 
 
questions.txt CHANGED
@@ -1,30 +1,30 @@
-How much NCAP funding did Delhi receive vs its PM2.5 improvement from 2018-2023?
-Which NCAP cities achieved the best PM2.5 reduction per rupee invested?
-Does wind speed above 5 m/s significantly reduce PM2.5 levels in Delhi?
-Show correlation between rainfall and PM2.5 reduction in Mumbai during monsoon
-Which cities with high population have dangerously high PM2.5 exposure levels?
-Compare winter PM2.5 levels: high-funded vs low-funded NCAP cities
-Does temperature increase correlate with ozone levels in Chennai during summer?
-Plot wind direction vs PM2.5 concentration rose diagram for Delhi in November
-Which meteorological factor most influences PM2.5 reduction in Ahmedabad?
-Rank NCAP cities by pollution improvement efficiency per capita funding
-Show monthly PM2.5 trends for top 5 most populated Indian cities
-Does humidity above 70% help reduce PM10 levels in coastal cities?
-Compare NO2 vs PM2.5 correlation in traffic-heavy vs residential areas
-Which NCAP-funded cities still exceed WHO PM2.5 guidelines despite investment?
-Plot seasonal wind patterns vs pollution levels for North Indian cities
-Show population-weighted pollution exposure map across Indian states
-Does solar radiation intensity affect ground-level ozone formation patterns?
-Compare NCAP investment effectiveness: Tier-1 vs Tier-2 cities
-Which high-population cities need emergency NCAP funding based on current PM2.5?
-Show correlation between barometric pressure and pollution accumulation
-Does monsoon season consistently reduce all pollutant levels nationwide?
-Compare multi-pollutant exposure: children vs adults in high-density cities
-Which cities show pollution improvement correlated with NCAP timeline?
-Plot wind speed threshold for effective pollution dispersion by region
-Show relationship between city population density and average PM2.5 exposure
-Compare Ozone-PM2.5-NO2 interaction patterns in Delhi vs Mumbai
-Does vector wind speed predict pollution episodes better than average wind speed?
-Which NCAP phases (1,2,3) showed maximum pollution reduction per investment?
-Show real-time impact: do meteorological alerts help predict pollution spikes?
-Create pollution risk index combining PM2.5, population, and meteorology data
+Which city has the highest average PM2.5 levels in 2023?
+Show monthly PM2.5 trends for Delhi in 2023
+Compare PM2.5 levels between winter and summer months
+Which month had the highest pollution levels in Mumbai?
+Calculate average PM2.5 for all cities in November 2023
+Rank top 10 cities by highest PM2.5 pollution levels
+Show seasonal pollution patterns across all cities
+Compare pollution levels between weekdays and weekends
+Which cities exceed WHO PM2.5 guidelines of 15 µg/m³?
+Plot yearly PM2.5 trends from 2020 to 2023 for major cities
+How much NCAP funding did Delhi receive vs Mumbai?
+Which NCAP cities achieved the best PM2.5 reduction?
+Does wind speed above 3 m/s reduce PM2.5 levels in Delhi?
+Show correlation between temperature and PM2.5 in summer months
+Which cities with high population have dangerous PM2.5 levels?
+Compare PM2.5 levels in high-funded vs low-funded NCAP cities
+Does rainfall help reduce pollution levels during monsoon?
+Which meteorological factor correlates most with PM2.5 reduction?
+Show monthly PM2.5 trends for top 5 Indian cities by population
+Does humidity above 80% help reduce pollution in coastal cities?
+Compare NO2 vs PM2.5 levels in traffic-heavy areas
+Which NCAP-funded cities still exceed WHO guidelines?
+Show relationship between city population and average PM2.5
+Compare PM2.5 improvement rates: Delhi vs Mumbai vs Kolkata
+Create simple scatter plot of PM2.5 vs PM10 correlation
+Show state-wise average PM2.5 levels for policy planning
+Which cities need immediate intervention with PM2.5 above 60 µg/m³?
+Compare pollution trends between North vs South Indian cities
+Show seasonal variation in PM2.5 across different climate zones
+Identify cities with consistent pollution improvement over time
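
To show why the new questions are easier to answer reliably, here is a sketch of a text-only answer for "Compare pollution levels between weekdays and weekends" using basic pandas operations. The sample data and the `Timestamp` column name are assumptions for illustration; only the `'PM2.5 (µg/m³)'` column name comes from the prompt above.

```python
import pandas as pd

# Hypothetical sample: 14 consecutive days starting Monday 2023-01-02,
# with higher readings on weekdays than on weekends.
df = pd.DataFrame({
    "Timestamp": pd.date_range("2023-01-02", periods=14, freq="D"),
    "PM2.5 (µg/m³)": [100.0, 100.0, 100.0, 100.0, 100.0, 60.0, 60.0] * 2,
})

# Drop rows with missing readings before any aggregation.
df = df.dropna(subset=["PM2.5 (µg/m³)"])
if df.empty:
    answer = "No data available"
else:
    # dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
    is_weekend = df["Timestamp"].dt.dayofweek >= 5
    weekday_mean = round(float(df.loc[~is_weekend, "PM2.5 (µg/m³)"].mean()), 2)
    weekend_mean = round(float(df.loc[is_weekend, "PM2.5 (µg/m³)"].mean()), 2)
    answer = (
        f"Weekday average PM2.5: {weekday_mean} µg/m³; "
        f"weekend average: {weekend_mean} µg/m³"
    )

print(answer)
```

The whole answer needs one boolean mask and two means, with no external files, shapefiles, or specialised plot types, which is the kind of question the simplified list favours.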