Nipun Claude committed on
Commit 7f61e71 · 1 Parent(s): 4038c51

Implement robust code generation with generic failure prevention

- Mandate location filtering for city-specific questions before analysis
- Add graceful degradation: start simple, fall back to alternatives
- Replace external file dependencies (shapefiles) with scatter-plot alternatives
- Raise data sufficiency thresholds from 10 rows to 20 for basic analysis and 50 for correlations
- Add comprehensive result validation for NaN/inf values
- Replace specific error fixes with root cause prevention
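
The graceful-degradation and scatter-plot items above might look like the following in generated code. This is a minimal sketch under stated assumptions, not the code the prompt actually produces: the function name `plot_pollution` is hypothetical, and the column names `longitude`, `latitude`, `City`, and `PM2.5` are taken from the prompt text in the diff and assumed to exist in the data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_pollution(df, value_col="PM2.5"):
    """Try a lat/lon scatter first; fall back to a per-city bar chart.

    Hypothetical helper. Returns the figure and the kind of plot drawn.
    """
    fig, ax = plt.subplots()
    try:
        # dropna raises KeyError if a coordinate column is missing entirely.
        points = df.dropna(subset=["longitude", "latitude", value_col])
        if points.empty:
            raise ValueError("no rows with usable coordinates")
        scatter = ax.scatter(
            points["longitude"], points["latitude"], c=points[value_col]
        )
        fig.colorbar(scatter, ax=ax, label=value_col)
        kind = "scatter"
    except (KeyError, ValueError):
        # Simpler alternative: mean value per city, no coordinates needed.
        # Assumes a "City" column exists; a KeyError here reaches the caller.
        means = df.groupby("City")[value_col].mean().sort_values()
        ax.bar(means.index, means.values)
        kind = "bar"
    ax.set_title(f"{value_col} ({kind} plot)")
    return fig, kind
```

Neither branch reads a shapefile or any other external file, which is the point of the change: the worst case is a plain bar chart rather than a crash.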

Generic improvements address:
- Location-specific analysis failures (Ahmedabad correlation NaN)
- External dependency errors (missing shapefiles)
- Insufficient data problems across all question types
- Failures of complex techniques, now handled with simple fallbacks
- Meaningless results from sparse/invalid data
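
As a rough illustration of how these rules combine for a case like the Ahmedabad correlation, the sketch below filters by city first, applies the 50-row correlation threshold, and rejects NaN/inf results. The function name `city_correlation` and the constant name are hypothetical; the `City` column name and the message strings follow the prompt text in the diff, and the `City` column is assumed to hold strings.

```python
import numpy as np
import pandas as pd

MIN_ROWS_CORRELATION = 50  # prompt asks for len(data) > 50 before correlating


def city_correlation(df, city, col_x, col_y):
    """Return a Pearson correlation for one city, or an explanatory message.

    Hypothetical helper; assumes df has a string column named "City".
    """
    if df.empty:
        return "No data available"

    # Filter by location first. regex=False matches the name literally and
    # na=False treats rows with a missing city as non-matches.
    mask = df["City"].str.contains(city, case=False, na=False, regex=False)
    df_clean = df.loc[mask].dropna(subset=[col_x, col_y])

    if len(df_clean) <= MIN_ROWS_CORRELATION:
        return "Insufficient data for reliable analysis"

    result = df_clean[col_x].corr(df_clean[col_y])

    # A constant column gives NaN; the inf check mirrors the prompt's rule.
    if pd.isna(result) or np.isinf(result):
        return "Analysis inconclusive with available data"
    return float(result)
```

A call might look like `answer = city_correlation(df, "Ahmedabad", "PM2.5", "WS (m/s)")`. The function form is only for readability here; since the prompt forbids `return` statements in script context, generated code would presumably inline the same checks as an if/else chain that assigns `answer`.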

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Files changed (1)
  1. new_system_prompt.txt +29 -27
new_system_prompt.txt CHANGED
@@ -37,21 +37,23 @@ OUTPUT TYPE REQUIREMENTS:
  
  MANDATORY SAFETY & ROBUSTNESS RULES:
  
- DATA VALIDATION (ALWAYS CHECK):
- - Check if DataFrame exists and not empty: if df.empty: answer = "No data available"
- - Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"
- - Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"
- - Remove invalid/missing values: df = df.dropna(subset=['PM2.5', 'city', 'Timestamp'])
- - Use early exit pattern: if condition: answer = "error message"; else: continue with analysis
+ ROBUST DATA VALIDATION (MANDATORY):
+ - Check DataFrame exists: if df.empty: answer = "No data available"
+ - LOCATION-SPECIFIC QUESTIONS: Always filter first: df_filtered = df[df['City'].str.contains('CityName', case=False)]
+ - Validate sufficient data after filtering: if len(df_filtered) < 20: answer = "Insufficient data for reliable analysis"
+ - Check for meaningful values: df_clean = df_filtered.dropna(); if df_clean.empty: answer = "No valid data found"
+ - NEVER assume external files exist: check with try/except or provide alternative approach
+ - Validate results before returning: if pd.isna(result) or result == inf: answer = "Analysis inconclusive with available data"
  
  OPERATION SAFETY (PREVENT CRASHES):
- - Wrap risky operations in try-except blocks
+ - ALWAYS use try/except for complex operations with fallback to simpler approach
+ - START SIMPLE: Use basic pandas operations before trying advanced techniques
+ - For mapping/visualization: Use scatter plots if complex maps fail
+ - For correlation: Use simple .corr() before advanced statistical methods
  - Check denominators before division: if denominator == 0: continue
- - Validate indexing bounds: if idx >= len(array): continue
- - Check for empty results after filtering: if result_df.empty: answer = "No data found"
- - Convert data types explicitly: pd.to_numeric(), .astype(int), .astype(str)
- - Handle timezone issues with datetime operations
- - NO return statements - this is script context, use if/else logic flow
+ - Validate results exist: if result_df.empty: answer = "No matching data found for this analysis"
+ - Convert data types explicitly: pd.to_numeric(errors='coerce'), .astype(str)
+ - NO return statements - use if/else logic flow with proper answer assignment
  
  PLOT GENERATION (MANDATORY FOR PLOTS):
  - Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
@@ -76,24 +78,24 @@ DATA VALIDATION & SAFETY:
  - Handle edge cases: empty results, single row/column DataFrames, all NaN columns
  - Use .copy() when modifying DataFrames to avoid SettingWithCopyWarning
  
- SPECIFIC PLOT TYPE REQUIREMENTS:
+ ROBUST ANALYSIS APPROACHES:
  
- WIND ROSE DIAGRAMS:
- - Use matplotlib polar projection: fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
- - Create proper wind direction bins: wind_bins = np.arange(0, 361, 30)
- - Handle wind direction (0-360 degrees): convert to radians with np.radians()
- - Group by wind direction bins and calculate statistics properly
+ GEOGRAPHICAL/MAPPING QUESTIONS:
+ - PRIMARY: Use scatter plots with lat/lon coordinates: plt.scatter(df['longitude'], df['latitude'], c=df['pollution'])
+ - FALLBACK: If geographical data missing, use bar charts by state/city
+ - NEVER assume external shapefiles exist - always have simple alternative
  
- CORRELATION ANALYSIS:
- - Always use .dropna() before correlation: df_clean = df.dropna(subset=[col1, col2])
- - Handle correlation matrices properly: corr = df_clean.corr()
- - Check for sufficient data: if len(df_clean) < 10: answer = "Insufficient data"
- - Use .abs() on Series, not float values: corr_series.abs() not float_value.abs()
+ CORRELATION/RELATIONSHIP ANALYSIS:
+ - Filter by location FIRST if question asks about specific city
+ - Use .dropna() and check len(data) > 50 for reliable correlations
+ - If complex analysis fails, use simple scatter plots with trend lines
+ - Report "insufficient data" rather than NaN/meaningless results
  
- STATISTICAL ANALYSIS:
- - For wind speed thresholds: use boolean indexing df[df['WS (m/s)'] > 5.0]
- - For meteorological factors: handle missing weather data gracefully
- - For time-based analysis: ensure proper datetime conversion and filtering
+ METEOROLOGICAL ANALYSIS:
+ - Check if weather columns have sufficient non-null values before analysis
+ - Use boolean filtering for thresholds: df[df['WS (m/s)'] > threshold]
+ - For complex plots, provide simple bar/line chart fallback
+ - Group by time periods (month/season) if daily data is too sparse
  
  VARIABLE & TYPE HANDLING:
  - Use descriptive variable names (avoid single letters in complex operations)