SustainabilityLabIITGN/VayuChat

Nipun Claude committed on Aug 25

Commit 8dbe5f9 (1 parent: 7f61e71)

Drastically simplify VayuChat for reliability and better UX

QUESTIONS SIMPLIFIED:
- Remove complex visualizations (windrose, polar plots, advanced maps)
- Focus on basic analysis: trends, comparisons, simple correlations
- Add "Getting Started" section with 10 simple questions (expanded by default)
- Organize remaining questions in collapsed categories

SYSTEM PROMPT STREAMLINED:
- Reduce from 140 lines to 36 lines - focus on essentials only
- Keep clear answer variable requirements: text, dataframe, or plot filename
- Emphasize simple matplotlib plotting over complex visualizations
- Basic data validation and error prevention rules
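
The 'answer' contract listed above (text, dataframe, or plot filename) could be consumed on the app side roughly as follows. This is only a minimal sketch: `classify_answer` is a hypothetical helper that does not appear in app.py, and the filename check assumes the `plot_<8 hex chars>.png` pattern that the system prompt requests.

```python
import re

# Matches the filename pattern the system prompt asks for: plot_<8 hex chars>.png
PLOT_NAME = re.compile(r"^plot_[0-9a-f]{8}\.png$")


def classify_answer(answer):
    """Return 'plot', 'dataframe', or 'text' for a generated-code result.

    Hypothetical helper for illustration; VayuChat's real handling may differ.
    """
    if isinstance(answer, str) and PLOT_NAME.match(answer):
        return "plot"
    # Duck-typed DataFrame check so this sketch runs without pandas installed.
    if hasattr(answer, "columns") and hasattr(answer, "to_dict"):
        return "dataframe"
    # Anything else (strings, numbers) is treated as a text answer.
    return "text"
```

Keeping the three output kinds distinguishable by type and filename pattern is what lets a shorter prompt stay unambiguous about what the generated code must produce.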

UI IMPROVEMENTS:
- Getting Started section expanded by default for new users
- Other categories collapsed to reduce overwhelm
- Simple, reliable questions that should work consistently

Net reduction: 157 lines removed, 61 lines added for better reliability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Files changed (3)

app.py +10 -1
new_system_prompt.txt +21 -126
questions.txt +30 -30

app.py CHANGED

@@ -651,8 +651,17 @@ with st.sidebar:
     # Show all questions but in a scrollable format
     if len(questions) > 0:
         st.markdown("**Select a question to analyze:**")
         # Create expandable sections for better organization
-        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=True):
             for i, q in enumerate([q for q in questions if any(word in q.lower() for word in ['ncap', 'funding', 'investment', 'rupee'])]):
                 if st.button(q, key=f"ncap_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
                     selected_prompt = q

     # Show all questions but in a scrollable format
     if len(questions) > 0:
         st.markdown("**Select a question to analyze:**")
+        # Getting Started section with simple questions
+        getting_started_questions = questions[:10]  # First 10 simple questions
+        with st.expander("🚀 Getting Started - Simple Questions", expanded=True):
+            for i, q in enumerate(getting_started_questions):
+                if st.button(q, key=f"start_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
+                    selected_prompt = q
+                    st.session_state.last_selected_prompt = q
         # Create expandable sections for better organization
+        with st.expander("📊 NCAP Funding & Policy Analysis", expanded=False):
             for i, q in enumerate([q for q in questions if any(word in q.lower() for word in ['ncap', 'funding', 'investment', 'rupee'])]):
                 if st.button(q, key=f"ncap_q_{i}", use_container_width=True, help=f"Analyze: {q}"):
                     selected_prompt = q
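
The grouping logic in this hunk (first ten questions as "Getting Started", then keyword-matched categories) can be sketched without Streamlit as below. `group_questions` and `CATEGORY_KEYWORDS` are hypothetical names introduced for illustration; only the NCAP keyword list is taken from the diff.

```python
# Hypothetical category table; only the NCAP keyword list appears in the diff.
CATEGORY_KEYWORDS = {
    "NCAP Funding & Policy Analysis": ["ncap", "funding", "investment", "rupee"],
}


def group_questions(questions, n_start=10):
    """Split questions into a 'Getting Started' list plus keyword categories."""
    # The first n_start questions are assumed to be the simple ones.
    groups = {"Getting Started": list(questions[:n_start])}
    for name, words in CATEGORY_KEYWORDS.items():
        # Case-insensitive substring match, mirroring the filter in app.py.
        groups[name] = [
            q for q in questions if any(word in q.lower() for word in words)
        ]
    return groups
```

As in the diff, the keyword categories are drawn from the full question list, so a question placed in the first ten can also appear in a later category.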

new_system_prompt.txt CHANGED

@@ -3,139 +3,34 @@ Generate Python code to answer the user's question about air quality data.
 CRITICAL: Only generate Python code - no explanations, no thinking, just clean executable code.
 AVAILABLE LIBRARIES:
-You can use these pre-installed libraries:
 - pandas, numpy (data manipulation)
 - matplotlib, seaborn, plotly (visualization)
-- statsmodels (statistical modeling, trend analysis)
-- scikit-learn (machine learning, regression)
 - geopandas (geospatial analysis)
-LIBRARY USAGE RULES:
-- For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
-- For regression: Use sklearn.linear_model.LinearRegression() for robust regression
-- For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
-- Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
-- Handle missing libraries gracefully with try-except around imports
-OUTPUT TYPE REQUIREMENTS:
-1. PLOT GENERATION (for "plot", "chart", "visualize", "show trend", "graph"):
-   - MUST create matplotlib figure with proper labels, title, legend
-   - MUST save plot: filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-   - MUST call plt.savefig(filename, dpi=300, bbox_inches='tight')
-   - MUST call plt.close() to prevent memory leaks
-   - MUST store filename in 'answer' variable: answer = filename
-   - Handle empty data gracefully before plotting
-2. TEXT ANSWERS (for simple "Which", "What", single values):
-   - Store direct string answer in 'answer' variable
-   - Example: answer = "December had the highest pollution"
-3. DATAFRAMES (for lists, rankings, comparisons, multiple results):
-   - Create clean DataFrame with descriptive column names
-   - Sort appropriately for readability
-   - Store DataFrame in 'answer' variable: answer = result_df
-MANDATORY SAFETY & ROBUSTNESS RULES:
-ROBUST DATA VALIDATION (MANDATORY):
-- Check DataFrame exists: if df.empty: answer = "No data available"
-- LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
-- Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
-- Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
-- NEVER assume external files exist: check with try/except or provide alternative approach
-- Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
-OPERATION SAFETY (PREVENT CRASHES):
-- ALWAYS use try/except for complex operations with fallback to simpler approach
-- START SIMPLE: Use basic pandas operations before trying advanced techniques
-- For mapping/visualization: Use scatter plots if complex maps fail
-- For correlation: Use simple .corr() before advanced statistical methods
-- Check denominators before division: if denominator == 0: continue
-- Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
-- Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
-- NO return statements - use if/else logic flow with proper answer assignment
-PLOT GENERATION (MANDATORY FOR PLOTS):
-- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
-- Always create new figure: plt.figure(figsize=(12, 8))
-- Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
-- Handle long city names: plt.xticks(rotation=45, ha='right')
-- Use tight layout: plt.tight_layout()
-- CRITICAL PLOT SAVING SEQUENCE (no return statements):
-  1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
-  2. plt.savefig(filename, dpi=300, bbox_inches='tight')
-  3. plt.close()
-  4. answer = filename
-- Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
-CRITICAL CODING PRACTICES:
-DATA VALIDATION & SAFETY:
-- Always check if DataFrames/Series are empty before operations: if df.empty: answer = "No data available"; exit()
-- Use .dropna() to handle missing values or .fillna() with appropriate defaults
-- Validate column names exist before accessing: if 'column' in df.columns: else: answer = "Column not found"
-- Check data types before operations: df['col'].dtype, isinstance() checks
-- Handle edge cases: empty results, single row/column DataFrames, all NaN columns
-- Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
-ROBUST ANALYSIS APPROACHES:
-GEOGRAPHICAL/MAPPING QUESTIONS:
-- PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
-- FALLBACK: If geographical data missing, use bar charts by state/city
-- NEVER assume external shapefiles exist - always have simple alternative
-CORRELATION/RELATIONSHIP ANALYSIS:
-- Filter by location FIRST if question asks about specific city
-- Use .dropna() and check len(data) > 50 for reliable correlations
-- If complex analysis fails, use simple scatter plots with trend lines
-- Report "insufficient data" rather than NaN/meaningless results
-METEOROLOGICAL ANALYSIS:
-- Check if weather columns have sufficient non-null values before analysis
-- Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
-- For complex plots, provide simple bar/line chart fallback
-- Group by time periods (month/season) if daily data is too sparse
-VARIABLE & TYPE HANDLING:
-- Use descriptive variable names (avoid single letters in complex operations)
-- Ensure all variables are defined before use - initialize with defaults
-- Convert pandas/numpy objects to proper Python types before operations
-- Convert datetime/period objects appropriately: .astype(str), .dt.strftime(), int()
-- Always cast to appropriate types for indexing: int(), str(), list()
-- CRITICAL: Convert pandas/numpy values to int before list indexing: int(value) for calendar.month_name[int(month_value)]
-- Use explicit type conversions rather than relying on implicit casting
-PANDAS OPERATIONS:
-- Reference DataFrame properly: df['column'] not 'column' in operations
-- Use .loc/.iloc correctly for indexing - avoid chained indexing
-- Use .reset_index() after groupby operations when needed for clean DataFrames
-- Sort results for consistent output: .sort_values(), .sort_index()
-- Use .round() for numerical results to avoid excessive decimals
-- Chain operations carefully - split complex chains for readability
-MATPLOTLIB & PLOTTING:
-- Always call plt.close() after saving plots to prevent memory leaks
-- Use descriptive titles, axis labels, and legends
-- Handle cases where no data exists for plotting
-- Use proper figure sizing: plt.figure(figsize=(width, height))
-- Convert datetime indices to strings for plotting if needed
-- Use color palettes consistently
-ERROR PREVENTION:
-- Use try-except blocks for operations that might fail
-- Check denominators before division operations
-- Validate array/list lengths before indexing
-- Use .get() method for dictionary access with defaults
-- Handle timezone-aware vs naive datetime objects consistently
-- Use proper string formatting and encoding for text output
 TECHNICAL REQUIREMENTS:
 - Save final result in variable called 'answer'
-- For TEXT: Store the direct answer as a string in 'answer'
-- For PLOTS: Save with unique filename f"plot_{{uuid.uuid4().hex[:8]}}.png" and store filename in 'answer'
-- For DATAFRAMES: Store the pandas DataFrame directly in 'answer' (e.g., answer = result_df)
-- Always use .iloc or .loc properly for pandas indexing
-- Close matplotlib figures with plt.close() to prevent memory leaks
-- Use proper column name checks before accessing columns
-- For dataframes, ensure proper column names and sorting for readability

 CRITICAL: Only generate Python code - no explanations, no thinking, just clean executable code.
 AVAILABLE LIBRARIES:
 - pandas, numpy (data manipulation)
 - matplotlib, seaborn, plotly (visualization)
+- statsmodels, scikit-learn (analysis)
 - geopandas (geospatial analysis)
+ESSENTIAL RULES:
+DATA SAFETY:
+- Always check if data exists: if df.empty: answer = "No data available"
+- For city-specific questions: filter first: df_city = df[df['City'].str.contains('CityName', case=False)]
+- Check sufficient data: if len(df_filtered) < 10: answer = "Insufficient data"
+- Use .dropna() to remove missing values before analysis
+PLOTTING REQUIREMENTS:
+- Create plots for visualization requests: plt.figure(figsize=(12, 8))
+- Save plots: filename = f"plot_{uuid.uuid4().hex[:8]}.png"; plt.savefig(filename, dpi=300, bbox_inches='tight')
+- Close plots: plt.close()
+- Store filename: answer = filename
+- For non-plots: answer = "text result"
+BASIC ERROR PREVENTION:
+- Use try/except for complex operations
+- Validate results: if pd.isna(result): answer = "Analysis inconclusive"
+- For correlations: check len(data) > 20 before calculating
+- Use simple matplotlib plotting - avoid complex visualizations
 TECHNICAL REQUIREMENTS:
 - Save final result in variable called 'answer'
+- Use exact column names: 'PM2.5 (µg/m³)', 'WS (m/s)', etc.
+- Handle dates with pd.to_datetime() if needed
+- Round numerical results: round(value, 2)
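
For orientation, code that follows the streamlined DATA SAFETY and TECHNICAL REQUIREMENTS rules might look like the sketch below. It is wrapped in a hypothetical function, `average_pm25`, so it can be exercised in isolation; generated code would instead assign directly to `answer`. The `City` and `PM2.5 (µg/m³)` column names come from the prompt, while the function name and the `min_rows` parameter are assumptions.

```python
import pandas as pd


def average_pm25(df, city, min_rows=10):
    """Mean PM2.5 for one city, following the DATA SAFETY rules in the prompt."""
    col = "PM2.5 (µg/m³)"
    if df.empty:
        return "No data available"
    # Filter by city first, then drop rows with missing PM2.5 values.
    in_city = df["City"].str.contains(city, case=False, na=False)
    df_city = df[in_city].dropna(subset=[col])
    if len(df_city) < min_rows:
        return "Insufficient data"
    result = df_city[col].mean()
    # Defensive check from the prompt; guards against a NaN mean.
    if pd.isna(result):
        return "Analysis inconclusive"
    return f"Average PM2.5 in {city}: {round(float(result), 2)} µg/m³"
```

A caller would then write `answer = average_pm25(df, "Delhi")`, which keeps every branch ending in a value suitable for the 'answer' variable.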

questions.txt CHANGED

@@ -1,30 +1,30 @@
-How much NCAP funding did Delhi receive vs its PM2.5 improvement from 2018-2023?
-Which NCAP cities achieved the best PM2.5 reduction per rupee invested?
-Does wind speed above 5 m/s significantly reduce PM2.5 levels in Delhi?
-Show correlation between rainfall and PM2.5 reduction in Mumbai during monsoon
-Which cities with high population have dangerously high PM2.5 exposure levels?
-Compare winter PM2.5 levels: high-funded vs low-funded NCAP cities
-Does temperature increase correlate with ozone levels in Chennai during summer?
-Plot wind direction vs PM2.5 concentration rose diagram for Delhi in November
-Which meteorological factor most influences PM2.5 reduction in Ahmedabad?
-Rank NCAP cities by pollution improvement efficiency per capita funding
-Show monthly PM2.5 trends for top 5 most populated Indian cities
-Does humidity above 70% help reduce PM10 levels in coastal cities?
-Compare NO2 vs PM2.5 correlation in traffic-heavy vs residential areas
-Which NCAP-funded cities still exceed WHO PM2.5 guidelines despite investment?
-Plot seasonal wind patterns vs pollution levels for North Indian cities
-Show population-weighted pollution exposure map across Indian states
-Does solar radiation intensity affect ground-level ozone formation patterns?
-Compare NCAP investment effectiveness: Tier-1 vs Tier-2 cities
-Which high-population cities need emergency NCAP funding based on current PM2.5?
-Show correlation between barometric pressure and pollution accumulation
-Does monsoon season consistently reduce all pollutant levels nationwide?
-Compare multi-pollutant exposure: children vs adults in high-density cities
-Which cities show pollution improvement correlated with NCAP timeline?
-Plot wind speed threshold for effective pollution dispersion by region
-Show relationship between city population density and average PM2.5 exposure
-Compare Ozone-PM2.5-NO2 interaction patterns in Delhi vs Mumbai
-Does vector wind speed predict pollution episodes better than average wind speed?
-Which NCAP phases (1,2,3) showed maximum pollution reduction per investment?
-Show real-time impact: do meteorological alerts help predict pollution spikes?
-Create pollution risk index combining PM2.5, population, and meteorology data

+Which city has the highest average PM2.5 levels in 2023?
+Show monthly PM2.5 trends for Delhi in 2023
+Compare PM2.5 levels between winter and summer months
+Which month had the highest pollution levels in Mumbai?
+Calculate average PM2.5 for all cities in November 2023
+Rank top 10 cities by highest PM2.5 pollution levels
+Show seasonal pollution patterns across all cities
+Compare pollution levels between weekdays and weekends
+Which cities exceed WHO PM2.5 guidelines of 15 µg/m³?
+Plot yearly PM2.5 trends from 2020 to 2023 for major cities
+How much NCAP funding did Delhi receive vs Mumbai?
+Which NCAP cities achieved the best PM2.5 reduction?
+Does wind speed above 3 m/s reduce PM2.5 levels in Delhi?
+Show correlation between temperature and PM2.5 in summer months
+Which cities with high population have dangerous PM2.5 levels?
+Compare PM2.5 levels in high-funded vs low-funded NCAP cities
+Does rainfall help reduce pollution levels during monsoon?
+Which meteorological factor correlates most with PM2.5 reduction?
+Show monthly PM2.5 trends for top 5 Indian cities by population
+Does humidity above 80% help reduce pollution in coastal cities?
+Compare NO2 vs PM2.5 levels in traffic-heavy areas
+Which NCAP-funded cities still exceed WHO guidelines?
+Show relationship between city population and average PM2.5
+Compare PM2.5 improvement rates: Delhi vs Mumbai vs Kolkata
+Create simple scatter plot of PM2.5 vs PM10 correlation
+Show state-wise average PM2.5 levels for policy planning
+Which cities need immediate intervention with PM2.5 above 60 µg/m³?
+Compare pollution trends between North vs South Indian cities
+Show seasonal variation in PM2.5 across different climate zones
+Identify cities with consistent pollution improvement over time