Spaces: SustainabilityLabIITGN/VayuChat

Nipun Claude committed on Aug 23

Commit 94a079d (1 parent: 6fa1a97)

Fix critical system prompt issues

- Remove 'return' statements from script context (causing syntax errors)
- Add library usage rules preferring numpy/sklearn over statsmodels
- Emphasize if/else logic flow instead of return statements
- Add graceful library import handling
- Guide toward simpler trend analysis with numpy.polyfit()

This fixes 'return outside function' and missing library errors.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
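
The 'return outside function' error named in the commit message can be reproduced in isolation. The sketch below is not taken from src.py: it assumes the generated analysis code is compiled and executed at module level with something like exec(), and the two snippets and the run_script helper are hypothetical stand-ins for that setup.

```python
# Hypothetical snippets standing in for LLM-generated analysis code.
WITH_RETURN = '''
if not rows:
    answer = "No data available"
    return
answer = sum(rows) / len(rows)
'''

WITH_IF_ELSE = '''
if not rows:
    answer = "No data available"
else:
    answer = sum(rows) / len(rows)
'''


def run_script(source, rows):
    """Compile and exec `source` at module level; return `answer` or the syntax error text."""
    try:
        code = compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        # A top-level `return` is rejected at compile time, before any line runs.
        return f"SyntaxError: {exc.msg}"
    namespace = {"rows": rows}
    exec(code, namespace)
    return namespace["answer"]


print(run_script(WITH_RETURN, [1.0, 3.0]))   # rejected: return is not inside a function
print(run_script(WITH_IF_ELSE, [1.0, 3.0]))  # mean of the rows
print(run_script(WITH_IF_ELSE, []))          # falls into the "No data available" branch
```

Because the failure happens at compile time, even the valid lines of the first snippet never run, which is why the prompt change steers toward if/else branches that always leave `answer` set.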

Files changed (1)

src.py +16 -10

@@ -296,8 +296,13 @@ You can use these pre-installed libraries:
 - statsmodels (statistical modeling, trend analysis)
 - scikit-learn (machine learning, regression)
 - geopandas (geospatial analysis)
-Use appropriate libraries for trend analysis, regression, statistical tests, etc.
-For simple trends, prefer numpy.polyfit() over complex statistical libraries when possible.
 OUTPUT TYPE REQUIREMENTS:
 1. PLOT GENERATION (for "plot", "chart", "visualize", "show trend", "graph"):
@@ -320,32 +325,33 @@ OUTPUT TYPE REQUIREMENTS:
 MANDATORY SAFETY & ROBUSTNESS RULES:
 DATA VALIDATION (ALWAYS CHECK):
-- Check if DataFrame exists and not empty: if df.empty: answer = "No data available"; return
-- Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"; return
-- Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"; return
 - Remove invalid/missing values: df = df.dropna(subset=['PM2.5', 'city', 'Timestamp'])
-- Validate date ranges: ensure timestamps are within expected range
 OPERATION SAFETY (PREVENT CRASHES):
 - Wrap risky operations in try-except blocks
 - Check denominators before division: if denominator == 0: continue
 - Validate indexing bounds: if idx >= len(array): continue
-- Check for empty results after filtering: if result_df.empty: answer = "No data found"; return
 - Convert data types explicitly: pd.to_numeric(), .astype(int), .astype(str)
 - Handle timezone issues with datetime operations
 PLOT GENERATION (MANDATORY FOR PLOTS):
-- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"; return
 - Always create new figure: plt.figure(figsize=(12, 8))
 - Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
 - Handle long city names: plt.xticks(rotation=45, ha='right')
 - Use tight layout: plt.tight_layout()
-- CRITICAL PLOT SAVING SEQUENCE:
   1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
   2. plt.savefig(filename, dpi=300, bbox_inches='tight')
   3. plt.close()
   4. answer = filename
-- Debug plot issues: print(f"Plot saved: {filename}") for testing
 CRITICAL CODING PRACTICES:

 - statsmodels (statistical modeling, trend analysis)
 - scikit-learn (machine learning, regression)
 - geopandas (geospatial analysis)
+LIBRARY USAGE RULES:
+- For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
+- For regression: Use sklearn.linear_model.LinearRegression() for robust regression
+- For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
+- Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
+- Handle missing libraries gracefully with try-except around imports
 OUTPUT TYPE REQUIREMENTS:
 1. PLOT GENERATION (for "plot", "chart", "visualize", "show trend", "graph"):
 MANDATORY SAFETY & ROBUSTNESS RULES:
 DATA VALIDATION (ALWAYS CHECK):
+- Check if DataFrame exists and not empty: if df.empty: answer = "No data available"
+- Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"
+- Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"
 - Remove invalid/missing values: df = df.dropna(subset=['PM2.5', 'city', 'Timestamp'])
+- Use early exit pattern: if condition: answer = "error message"; else: continue with analysis
 OPERATION SAFETY (PREVENT CRASHES):
 - Wrap risky operations in try-except blocks
 - Check denominators before division: if denominator == 0: continue
 - Validate indexing bounds: if idx >= len(array): continue
+- Check for empty results after filtering: if result_df.empty: answer = "No data found"
 - Convert data types explicitly: pd.to_numeric(), .astype(int), .astype(str)
 - Handle timezone issues with datetime operations
+- NO return statements - this is script context, use if/else logic flow
 PLOT GENERATION (MANDATORY FOR PLOTS):
+- Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
 - Always create new figure: plt.figure(figsize=(12, 8))
 - Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
 - Handle long city names: plt.xticks(rotation=45, ha='right')
 - Use tight layout: plt.tight_layout()
+- CRITICAL PLOT SAVING SEQUENCE (no return statements):
   1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
   2. plt.savefig(filename, dpi=300, bbox_inches='tight')
   3. plt.close()
   4. answer = filename
+- Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
 CRITICAL CODING PRACTICES:
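
The library usage rules added above (prefer numpy.polyfit for simple trends, guard imports with try-except, branch with if/else instead of return) might look like the following in generated code. This is a sketch under stated assumptions, not code from the repository: the linear_trend helper, the pure-Python fallback, and the sample PM2.5 values are all hypothetical.

```python
# Optional dependency: keep working if numpy is not installed.
try:
    import numpy as np
except ImportError:
    np = None


def linear_trend(x, y):
    """Return (slope, intercept) of the least-squares line through the points (x, y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    if np is not None:
        # Degree-1 polynomial fit; coefficients come back highest power first.
        slope, intercept = np.polyfit(x, y, 1)
        return float(slope), float(intercept)
    # Fallback (an assumption of this sketch): closed-form ordinary least squares.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up monthly PM2.5 means, for illustration only.
months = [0, 1, 2, 3, 4]
pm25 = [80.0, 78.0, 76.0, 74.0, 72.0]

# Script context: no return statements, every branch assigns `answer`.
if len(pm25) < 2:
    answer = "Insufficient data for analysis"
else:
    slope, intercept = linear_trend(months, pm25)
    answer = f"PM2.5 changes by {slope:.2f} per month"

print(answer)
```

The fallback keeps the script usable when the import fails, at the cost of duplicating a small amount of arithmetic; whether the real application needs that depends on how reliably its environment ships numpy.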