Drastically simplify VayuChat for reliability and better UX
QUESTIONS SIMPLIFIED:
- Remove complex visualizations (windrose, polar plots, advanced maps)
- Focus on basic analysis: trends, comparisons, simple correlations
- Add "Getting Started" section with 10 simple questions (expanded by default)
- Organize remaining questions in collapsed categories
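Before any Streamlit calls, the grouping above can be sketched as plain data. This is a hypothetical helper: build_sections and CATEGORY_KEYWORDS are illustrative names, not part of app.py. It reuses the first-10 slice and the NCAP keyword list from the app.py diff. It returns (title, questions, expanded) tuples that the sidebar could render with st.expander.

```python
from typing import Dict, List, Tuple

# Keyword list copied from the NCAP filter in app.py; further categories
# would be assumptions added in the same shape.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "📊 NCAP Funding & Policy Analysis": ["ncap", "funding", "investment", "rupee"],
}


def build_sections(questions: List[str], n_start: int = 10) -> List[Tuple[str, List[str], bool]]:
    """Return (title, questions, expanded) tuples in display order (hypothetical helper)."""
    # Getting Started comes first and is the only section expanded by default
    sections = [("🚀 Getting Started - Simple Questions", questions[:n_start], True)]
    for title, words in CATEGORY_KEYWORDS.items():
        # Same case-insensitive substring match as app.py; like the app, this scans
        # the full list, so a Getting Started question could also match here.
        matched = [q for q in questions if any(w in q.lower() for w in words)]
        sections.append((title, matched, False))  # collapsed by default
    return sections
```

Keeping the layout as data means the rendering loop only needs to call st.expander(title, expanded=expanded) for each tuple.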
SYSTEM PROMPT STREAMLINED:
- Reduce from 141 lines to 36 lines, keeping only the essentials
- Keep the 'answer' variable requirements clear: a text result, a DataFrame, or a plot filename
- Emphasize simple matplotlib plotting over complex visualizations
- Keep basic data validation and error prevention rules
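The following is a minimal sketch of code that follows the streamlined DATA SAFETY rules for a city-specific text answer. It assumes a DataFrame with a City column and the 'PM2.5 (µg/m³)' column named in the prompt. Real generated code is a top-level script that assigns answer; the hypothetical function city_average_pm25 wraps that logic only to make it testable. The sketch uses one name for the filtered frame, whereas the prompt's own example filters into df_city but then checks len(df_filtered).

```python
import pandas as pd

PM25 = "PM2.5 (µg/m³)"  # exact column name given in the new system prompt


def city_average_pm25(df: pd.DataFrame, city: str, min_rows: int = 10) -> str:
    """Hypothetical helper applying the DATA SAFETY rules; returns the text answer."""
    if df.empty:
        return "No data available"
    # Filter by city first; regex=False matches the name literally, na=False skips missing names
    df_city = df[df["City"].str.contains(city, case=False, regex=False, na=False)]
    df_city = df_city.dropna(subset=[PM25])  # drop rows without a PM2.5 reading
    if len(df_city) < min_rows:  # threshold of 10 taken from the new prompt
        return "Insufficient data"
    result = df_city[PM25].mean()
    if pd.isna(result):
        return "Analysis inconclusive"
    return f"Average PM2.5 in {city}: {round(float(result), 2)} µg/m³"
```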
UI IMPROVEMENTS:
- Getting Started section expanded by default for new users
- Other categories collapsed by default so the sidebar is less overwhelming
- Simple, reliable questions that should work consistently
Net reduction: 157 lines removed, 61 lines added for better reliability
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- app.py +10 -1
- new_system_prompt.txt +21 -126
- questions.txt +30 -30
app.py
@@ -651,8 +651,17 @@ with st.sidebar:
     # Show all questions but in a scrollable format
     if len(questions) > 0:
         st.markdown("**Select a question to analyze:**")
+
+        # Getting Started section with simple questions
+        getting_started_questions = questions[:10]  # First 10 simple questions
+        with st.expander("🚀 Getting Started - Simple Questions", expanded=True):
+            for i, q in enumerate(getting_started_questions):
+                if st.button(q, key=f"start_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
+                    selected_prompt = q
+                    st.session_state.last_selected_prompt = q
+
         # Create expandable sections for better organization
-        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=
+        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=False):
             for i, q in enumerate([q for q in questions if any(word in q.lower() for word in ['ncap', 'funding', 'investment', 'rupee'])]):
                 if st.button(q, key=f"ncap_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
                     selected_prompt = q
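Each button in the app.py change gets an explicit key with a section prefix (start_q_ or ncap_q_). Streamlit rejects duplicate widget keys. The per-section counter i restarts at 0, and the same question text could appear in both sections, so only the prefix keeps keys distinct. The following is a small plain-Python check of that scheme; button_keys is a hypothetical helper, not part of the app.

```python
from typing import Dict, List


def button_keys(sections: Dict[str, List[str]]) -> List[str]:
    """One key per button, mirroring app.py's f"{prefix}_q_{i}" pattern (hypothetical helper)."""
    keys: List[str] = []
    for prefix, section_questions in sections.items():
        # i restarts at 0 in every section, so the prefix is what keeps keys unique
        keys.extend(f"{prefix}_q_{i}" for i in range(len(section_questions)))
    return keys


shared = "How much NCAP funding did Delhi receive vs Mumbai?"
keys = button_keys({"start": [shared], "ncap": [shared]})
assert len(keys) == len(set(keys))  # same label in two sections, distinct keys
```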
new_system_prompt.txt
@@ -3,139 +3,34 @@ Generate Python code to answer the user's question about air quality data.
 CRITICAL: Only generate Python code - no explanations, no thinking, just clean executable code.
 
 AVAILABLE LIBRARIES:
-You can use these pre-installed libraries:
 - pandas, numpy (data manipulation)
 - matplotlib, seaborn, plotly (visualization)
-- statsmodels
-- scikit-learn (machine learning, regression)
+- statsmodels, scikit-learn (analysis)
 - geopandas (geospatial analysis)
 
-
-- For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
-- For regression: Use sklearn.linear_model.LinearRegression() for robust regression
-- For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
-- Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
-- Handle missing libraries gracefully with try-except around imports
+ESSENTIAL RULES:
 
-
-
-
-
-
-- MUST call plt.close() to prevent memory leaks
-- MUST store filename in 'answer' variable: answer = filename
-- Handle empty data gracefully before plotting
+DATA SAFETY:
+- Always check if data exists: if df.empty: answer = "No data available"
+- For city-specific questions: filter first: df_city = df[df['City'].str.contains('CityName', case=False)]
+- Check sufficient data: if len(df_filtered) < 10: answer = "Insufficient data"
+- Use .dropna() to remove missing values before analysis
 
-
-
-
+PLOTTING REQUIREMENTS:
+- Create plots for visualization requests: plt.figure(figsize=(12, 8))
+- Save plots: filename = f"plot_{uuid.uuid4().hex[:8]}.png"; plt.savefig(filename, dpi=300, bbox_inches='tight')
+- Close plots: plt.close()
+- Store filename: answer = filename
+- For non-plots: answer = "text result"
 
-
-
-
-
-
-MANDATORY SAFETY & ROBUSTNESS RULES:
-
-ROBUST DATA VALIDATION (MANDATORY):
-- Check DataFrame exists: if df.empty: answer = "No data available"
-- LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
-- Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
-- Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
-- NEVER assume external files exist: check with try/except or provide alternative approach
-- Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
-
-OPERATION SAFETY (PREVENT CRASHES):
-- ALWAYS use try/except for complex operations with fallback to simpler approach
-- START SIMPLE: Use basic pandas operations before trying advanced techniques
-- For mapping/visualization: Use scatter plots if complex maps fail
-- For correlation: Use simple .corr() before advanced statistical methods
-- Check denominators before division: if denominator == 0: continue
-- Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
-- Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
-- NO return statements - use if/else logic flow with proper answer assignment
-
-PLOT GENERATION (MANDATORY FOR PLOTS):
-- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
-- Always create new figure: plt.figure(figsize=(12, 8))
-- Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
-- Handle long city names: plt.xticks(rotation=45, ha='right')
-- Use tight layout: plt.tight_layout()
-- CRITICAL PLOT SAVING SEQUENCE (no return statements):
-1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-2. plt.savefig(filename, dpi=300, bbox_inches='tight')
-3. plt.close()
-4. answer = filename
-- Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
-
-CRITICAL CODING PRACTICES:
-
-DATA VALIDATION & SAFETY:
-- Always check if DataFrames/Series are empty before operations: if df.empty: answer = "No data available"; exit()
-- Use .dropna() to handle missing values or .fillna() with appropriate defaults
-- Validate column names exist before accessing: if 'column' in df.columns: else: answer = "Column not found"
-- Check data types before operations: df['col'].dtype, isinstance() checks
-- Handle edge cases: empty results, single row/column DataFrames, all NaN columns
-- Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
-
-ROBUST ANALYSIS APPROACHES:
-
-GEOGRAPHICAL/MAPPING QUESTIONS:
-- PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
-- FALLBACK: If geographical data missing, use bar charts by state/city
-- NEVER assume external shapefiles exist - always have simple alternative
-
-CORRELATION/RELATIONSHIP ANALYSIS:
-- Filter by location FIRST if question asks about specific city
-- Use .dropna() and check len(data) > 50 for reliable correlations
-- If complex analysis fails, use simple scatter plots with trend lines
-- Report "insufficient data" rather than NaN/meaningless results
-
-METEOROLOGICAL ANALYSIS:
-- Check if weather columns have sufficient non-null values before analysis
-- Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
-- For complex plots, provide simple bar/line chart fallback
-- Group by time periods (month/season) if daily data is too sparse
-
-VARIABLE & TYPE HANDLING:
-- Use descriptive variable names (avoid single letters in complex operations)
-- Ensure all variables are defined before use - initialize with defaults
-- Convert pandas/numpy objects to proper Python types before operations
-- Convert datetime/period objects appropriately: .astype(str), .dt.strftime(), int()
-- Always cast to appropriate types for indexing: int(), str(), list()
-- CRITICAL: Convert pandas/numpy values to int before list indexing: int(value) for calendar.month_name[int(month_value)]
-- Use explicit type conversions rather than relying on implicit casting
-
-PANDAS OPERATIONS:
-- Reference DataFrame properly: df['column'] not 'column' in operations
-- Use .loc/.iloc correctly for indexing - avoid chained indexing
-- Use .reset_index() after groupby operations when needed for clean DataFrames
-- Sort results for consistent output: .sort_values(), .sort_index()
-- Use .round() for numerical results to avoid excessive decimals
-- Chain operations carefully - split complex chains for readability
-
-MATPLOTLIB & PLOTTING:
-- Always call plt.close() after saving plots to prevent memory leaks
-- Use descriptive titles, axis labels, and legends
-- Handle cases where no data exists for plotting
-- Use proper figure sizing: plt.figure(figsize=(width, height))
-- Convert datetime indices to strings for plotting if needed
-- Use color palettes consistently
-
-ERROR PREVENTION:
-- Use try-except blocks for operations that might fail
-- Check denominators before division operations
-- Validate array/list lengths before indexing
-- Use .get() method for dictionary access with defaults
-- Handle timezone-aware vs naive datetime objects consistently
-- Use proper string formatting and encoding for text output
+BASIC ERROR PREVENTION:
+- Use try/except for complex operations
+- Validate results: if pd.isna(result): answer = "Analysis inconclusive"
+- For correlations: check len(data) > 20 before calculating
+- Use simple matplotlib plotting - avoid complex visualizations
 
 TECHNICAL REQUIREMENTS:
 - Save final result in variable called 'answer'
--
--
--
-- Always use .iloc or .loc properly for pandas indexing
-- Close matplotlib figures with plt.close() to prevent memory leaks
-- Use proper column name checks before accessing columns
-- For dataframes, ensure proper column names and sorting for readability
+- Use exact column names: 'PM2.5 (µg/m³)', 'WS (m/s)', etc.
+- Handle dates with pd.to_datetime() if needed
+- Round numerical results: round(value, 2)
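The new PLOTTING REQUIREMENTS reduce to a fixed sequence: create a figure, save it under a random filename, close it, and put the filename in answer. The following is a sketch under stated assumptions. It uses the headless Agg backend, takes a monthly Series as input, and plot_monthly_pm25 is a hypothetical name. The prompt lines shown in this diff call uuid.uuid4() without mentioning an import, so generated code needs import uuid unless the execution environment provides it.

```python
import uuid

import matplotlib
matplotlib.use("Agg")  # assumption: headless server, no display attached
import matplotlib.pyplot as plt
import pandas as pd


def plot_monthly_pm25(monthly: pd.Series) -> str:
    """Hypothetical helper following PLOTTING REQUIREMENTS; returns the value for answer."""
    if monthly.empty:
        return "No data to plot"  # text answer instead of an empty figure
    plt.figure(figsize=(12, 8))
    plt.plot(monthly.index.astype(str), monthly.values, marker="o")
    plt.title("Monthly average PM2.5")
    plt.xlabel("Month")
    plt.ylabel("PM2.5 (µg/m³)")
    plt.tight_layout()
    filename = f"plot_{uuid.uuid4().hex[:8]}.png"  # random name avoids clashes between runs
    plt.savefig(filename, dpi=300, bbox_inches="tight")
    plt.close()  # release the figure so repeated questions do not leak memory
    return filename
```

In generated code the same steps run at top level and end with answer = filename.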
questions.txt
@@ -1,30 +1,30 @@
-
-
-
-
-
-
-
-
-Which
-
-
-
-
-
-
-
-Does
-
-
-
-
-
-
-
-
-
-
-
-Show
-
+Which city has the highest average PM2.5 levels in 2023?
+Show monthly PM2.5 trends for Delhi in 2023
+Compare PM2.5 levels between winter and summer months
+Which month had the highest pollution levels in Mumbai?
+Calculate average PM2.5 for all cities in November 2023
+Rank top 10 cities by highest PM2.5 pollution levels
+Show seasonal pollution patterns across all cities
+Compare pollution levels between weekdays and weekends
+Which cities exceed WHO PM2.5 guidelines of 15 µg/m³?
+Plot yearly PM2.5 trends from 2020 to 2023 for major cities
+How much NCAP funding did Delhi receive vs Mumbai?
+Which NCAP cities achieved the best PM2.5 reduction?
+Does wind speed above 3 m/s reduce PM2.5 levels in Delhi?
+Show correlation between temperature and PM2.5 in summer months
+Which cities with high population have dangerous PM2.5 levels?
+Compare PM2.5 levels in high-funded vs low-funded NCAP cities
+Does rainfall help reduce pollution levels during monsoon?
+Which meteorological factor correlates most with PM2.5 reduction?
+Show monthly PM2.5 trends for top 5 Indian cities by population
+Does humidity above 80% help reduce pollution in coastal cities?
+Compare NO2 vs PM2.5 levels in traffic-heavy areas
+Which NCAP-funded cities still exceed WHO guidelines?
+Show relationship between city population and average PM2.5
+Compare PM2.5 improvement rates: Delhi vs Mumbai vs Kolkata
+Create simple scatter plot of PM2.5 vs PM10 correlation
+Show state-wise average PM2.5 levels for policy planning
+Which cities need immediate intervention with PM2.5 above 60 µg/m³?
+Compare pollution trends between North vs South Indian cities
+Show seasonal variation in PM2.5 across different climate zones
+Identify cities with consistent pollution improvement over time
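Several of the new questions refer to WHO thresholds. The WHO 2021 Air Quality Guidelines set PM2.5 at 15 µg/m³ for a 24-hour mean and 5 µg/m³ for an annual mean, so 15 µg/m³ is the daily value. The following sketch answers "Which cities exceed WHO PM2.5 guidelines of 15 µg/m³?" by reporting both each city's mean and how many readings exceed the threshold. It assumes one row per city per day and the column names from the system prompt. cities_exceeding_who is a hypothetical name.

```python
import pandas as pd

PM25 = "PM2.5 (µg/m³)"  # exact column name from the system prompt
WHO_PM25_24H = 15.0     # WHO 2021 24-hour guideline, µg/m³


def cities_exceeding_who(df: pd.DataFrame, threshold: float = WHO_PM25_24H) -> pd.DataFrame:
    """Per-city mean PM2.5 and count of readings above threshold, worst city first."""
    data = df.dropna(subset=["City", PM25]).copy()
    if data.empty:
        return pd.DataFrame(columns=["City", "mean_pm25", "readings_over"])
    data["over"] = data[PM25] > threshold  # True where a (daily) reading exceeds the guideline
    summary = data.groupby("City").agg(
        mean_pm25=(PM25, "mean"),
        readings_over=("over", "sum"),  # summing booleans counts the exceedances
    )
    summary["mean_pm25"] = summary["mean_pm25"].round(2)
    exceeding = summary[summary["readings_over"] > 0]
    return exceeding.sort_values("mean_pm25", ascending=False).reset_index()
```

Comparing a period mean against 15 µg/m³ mixes the daily and annual guidelines. The sketch therefore keeps the exceedance count separate from the mean.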