SustainabilityLabIITGN/VayuChat

Nipun Claude committed on Aug 25

Commit 7f61e71 (1 parent: 4038c51)

Implement robust code generation with generic failure prevention

- Mandate location filtering for city-specific questions before analysis
- Add graceful degradation: start simple, fallback to alternatives
- Remove external file dependencies with scatter plot alternatives
- Increase data sufficiency thresholds: >20 basic, >50 correlations
- Add comprehensive result validation for NaN/inf values
- Replace specific error fixes with root cause prevention

Generic improvements address:
- Location-specific analysis failures (Ahmedabad correlation NaN)
- External dependency errors (missing shapefiles)
- Insufficient data problems across all question types
- Complex technique failures with simple fallbacks
- Meaningless results from sparse/invalid data

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
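
The "start simple, fallback to alternatives" idea in the message above can be illustrated with a minimal sketch. This is not code from the commit: `run_with_fallbacks`, `shapefile_map`, `scatter_plot`, and the file name `india_states.shp` are hypothetical names chosen only for this example, and the failing strategy merely simulates a missing external shapefile.

```python
def run_with_fallbacks(strategies):
    """Try each (name, callable) pair in order; return (name, result) of the first success."""
    errors = []
    for name, strategy in strategies:
        try:
            return name, strategy()
        except Exception as exc:  # broad on purpose: any failure moves on to the next option
            errors.append(f"{name}: {exc}")
    # Every strategy failed: report that instead of crashing.
    return None, "Analysis could not be completed: " + "; ".join(errors)


def shapefile_map():
    # Hypothetical strategy that depends on an external file which is not present.
    raise FileNotFoundError("india_states.shp not found")


def scatter_plot():
    # Hypothetical simple alternative that needs only the in-memory data.
    return "scatter plot of longitude vs latitude"


used, answer = run_with_fallbacks([
    ("shapefile map", shapefile_map),
    ("scatter plot", scatter_plot),
])
print(used, "->", answer)
```

Ordering the strategies from most to least demanding keeps the richer output when it works, while the last entry should be something that cannot fail for external reasons.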

Files changed (1)

new_system_prompt.txt +29 -27

@@ -37,21 +37,23 @@ OUTPUT TYPE REQUIREMENTS:
 MANDATORY SAFETY & ROBUSTNESS RULES:
-DATA VALIDATION (ALWAYS CHECK):
-- Check if DataFrame exists and not empty: if df.empty: answer = "No data available"
-- Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"
-- Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"
-- Remove invalid/missing values: df = df.dropna(subset=['PM2.5', 'city', 'Timestamp'])
-- Use early exit pattern: if condition: answer = "error message"; else: continue with analysis
 OPERATION SAFETY (PREVENT CRASHES):
-- Wrap risky operations in try-except blocks
 - Check denominators before division: if denominator == 0: continue
-- Validate indexing bounds: if idx >= len(array): continue
-- Check for empty results after filtering: if result_df.empty: answer = "No data found"
-- Convert data types explicitly: pd.to_numeric(), .astype(int), .astype(str)
-- Handle timezone issues with datetime operations
-- NO return statements - this is script context, use if/else logic flow
 PLOT GENERATION (MANDATORY FOR PLOTS):
 - Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
@@ -76,24 +78,24 @@ DATA VALIDATION & SAFETY:
 - Handle edge cases: empty results, single row/column DataFrames, all NaN columns
 - Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
-SPECIFIC PLOT TYPE REQUIREMENTS:
-WIND ROSE DIAGRAMS:
-- Use matplotlib polar projection: fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
-- Create proper wind direction bins: wind_bins = np.arange(0, 361, 30)
-- Handle wind direction (0-360 degrees): convert to radians with np.radians()
-- Group by wind direction bins and calculate statistics properly
-CORRELATION ANALYSIS:
-- Always use .dropna() before correlation: df_clean = df.dropna(subset=[col1, col2])
-- Handle correlation matrices properly: corr = df_clean.corr()
-- Check for sufficient data: if len(df_clean) < 10: answer = "Insufficient data"
-- Use .abs() on Series, not float values: corr_series.abs() not float_value.abs()
-STATISTICAL ANALYSIS:
-- For wind speed thresholds: use boolean indexing df[df['WS (m/s)'] > 5.0]
-- For meteorological factors: handle missing weather data gracefully
-- For time-based analysis: ensure proper datetime conversion and filtering
 VARIABLE & TYPE HANDLING:
 - Use descriptive variable names (avoid single letters in complex operations)

 MANDATORY SAFETY & ROBUSTNESS RULES:
+ROBUST DATA VALIDATION (MANDATORY):
+- Check DataFrame exists: if df.empty: answer = "No data available"
+- LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
+- Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
+- Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
+- NEVER assume external files exist: check with try/except or provide alternative approach
+- Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
 OPERATION SAFETY (PREVENT CRASHES):
+- ALWAYS use try/except for complex operations with fallback to simpler approach
+- START SIMPLE: Use basic pandas operations before trying advanced techniques
+- For mapping/visualization: Use scatter plots if complex maps fail
+- For correlation: Use simple .corr() before advanced statistical methods
 - Check denominators before division: if denominator == 0: continue
+- Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
+- Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
+- NO return statements - use if/else logic flow with proper answer assignment
 PLOT GENERATION (MANDATORY FOR PLOTS):
 - Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
 - Handle edge cases: empty results, single row/column DataFrames, all NaN columns
 - Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
+ROBUST ANALYSIS APPROACHES:
+GEOGRAPHICAL/MAPPING QUESTIONS:
+- PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
+- FALLBACK: If geographical data missing, use bar charts by state/city
+- NEVER assume external shapefiles exist - always have simple alternative
+CORRELATION/RELATIONSHIP ANALYSIS:
+- Filter by location FIRST if question asks about specific city
+- Use .dropna() and check len(data) > 50 for reliable correlations
+- If complex analysis fails, use simple scatter plots with trend lines
+- Report "insufficient data" rather than NaN/meaningless results
+METEOROLOGICAL ANALYSIS:
+- Check if weather columns have sufficient non-null values before analysis
+- Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
+- For complex plots, provide simple bar/line chart fallback
+- Group by time periods (month/season) if daily data is too sparse
 VARIABLE & TYPE HANDLING:
 - Use descriptive variable names (avoid single letters in complex operations)
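
As a rough illustration of how the added validation rules might combine for a city-specific correlation question, here is a minimal sketch. It is not taken from the repository, and it makes several assumptions: the column names `City`, `PM2.5`, and `WS (m/s)` follow the prompt text, the helper `city_correlation` and the sample data are hypothetical, and the checks are wrapped in a function only so they can be exercised in isolation (the prompt itself asks generated scripts to assign to `answer` with if/else rather than use return statements). The prompt's `result == inf` is written with `np.isinf`, since a bare `inf` is not a defined name in Python.

```python
import numpy as np
import pandas as pd

MIN_ROWS_CORRELATION = 50  # threshold from the prompt: len(data) > 50


def city_correlation(df, city_name, col_x="PM2.5", col_y="WS (m/s)"):
    """Return a rounded Pearson correlation, or a message explaining why none is given."""
    if df.empty:
        return "No data available"
    # Filter by location first; na=False treats missing city names as non-matches.
    in_city = df["City"].str.contains(city_name, case=False, na=False, regex=False)
    df_clean = df[in_city].dropna(subset=[col_x, col_y])
    if len(df_clean) <= MIN_ROWS_CORRELATION:
        return "Insufficient data for reliable analysis"
    result = df_clean[col_x].corr(df_clean[col_y])
    # Reject NaN/inf instead of reporting a meaningless number.
    if pd.isna(result) or np.isinf(result):
        return "Analysis inconclusive with available data"
    return round(float(result), 3)


# Hypothetical sample data, generated only to exercise the checks.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    "City": ["Ahmedabad"] * 60 + ["Delhi"] * 5,
    "PM2.5": rng.uniform(20, 200, 65),
    "WS (m/s)": rng.uniform(0, 8, 65),
})
print(city_correlation(sample, "ahmedabad"))  # a finite correlation value
print(city_correlation(sample, "delhi"))      # too few rows after filtering
```

Filtering before counting rows matters here: the full frame has 65 rows, but only 5 belong to Delhi, so a length check on the unfiltered data would wrongly pass.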