Nipun Claude committed on
Commit
94a079d
·
1 Parent(s): 6fa1a97

Fix critical system prompt issues

- Remove 'return' statements from script context (causing syntax errors)
- Add library usage rules preferring numpy/sklearn over statsmodels
- Emphasize if/else logic flow instead of return statements
- Add graceful library import handling
- Guide toward simpler trend analysis with numpy.polyfit()
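
The library rules above can be illustrated with a minimal sketch: an import guarded by try-except, `numpy.polyfit(x, y, 1)` for the simple linear trend, and a pure-Python least-squares fallback when numpy is unavailable. The helper name `linear_trend`, the fallback itself, and the sample values are assumptions made for illustration and are not part of this commit.

```python
# Optional dependency: keep running even if numpy is not installed.
try:
    import numpy as np
except ImportError:
    np = None


def linear_trend(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    if np is not None:
        # polyfit returns coefficients from highest degree to lowest.
        slope, intercept = np.polyfit(x, y, 1)
        return float(slope), float(intercept)
    # Fallback (assumption): closed-form ordinary least squares.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical monthly PM2.5 averages, invented for this example.
months = [0, 1, 2, 3, 4]
pm25 = [80.0, 78.0, 76.0, 74.0, 72.0]

slope, intercept = linear_trend(months, pm25)
answer = f"PM2.5 changes by {slope:.2f} per month"
```

Either branch yields the same line for this data, so the generated code can report a trend without depending on statsmodels.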

This fixes 'return outside function' and missing library errors.
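
The 'return outside function' error can be reproduced with a short sketch. It assumes the generated code is compiled and executed as a top-level script that communicates through an `answer` variable; `run_snippet`, `rows`, and the two snippets are hypothetical stand-ins, with a plain list taking the place of the DataFrame.

```python
# Old prompt style: a bare `return` at module level.
BAD_SNIPPET = '''
if not rows:
    answer = "No data available"
    return
answer = len(rows)
'''

# New prompt style: if/elif/else selects exactly one branch, no return needed.
GOOD_SNIPPET = '''
if not rows:
    answer = "No data available"
elif len(rows) < 10:
    answer = "Insufficient data for analysis"
else:
    answer = sum(rows) / len(rows)
'''


def run_snippet(source, rows):
    """Compile and run `source` as a script; return `answer` or the error text."""
    try:
        code = compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"
    namespace = {"rows": rows}
    exec(code, namespace)
    return namespace.get("answer")


bad_result = run_snippet(BAD_SNIPPET, [])
good_empty = run_snippet(GOOD_SNIPPET, [])
good_short = run_snippet(GOOD_SNIPPET, [1.0, 2.0])
good_full = run_snippet(GOOD_SNIPPET, [float(i) for i in range(10)])
```

The bad snippet fails at compile time, before any data check runs, which is why removing `; return` from the prompt examples matters even for inputs that would never reach the guard.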

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

Files changed (1)
  1. src.py +16 -10
src.py CHANGED
@@ -296,8 +296,13 @@ You can use these pre-installed libraries:
  - statsmodels (statistical modeling, trend analysis)
  - scikit-learn (machine learning, regression)
  - geopandas (geospatial analysis)
- Use appropriate libraries for trend analysis, regression, statistical tests, etc.
- For simple trends, prefer numpy.polyfit() over complex statistical libraries when possible.
+
+ LIBRARY USAGE RULES:
+ - For trend analysis: Use numpy.polyfit(x, y, 1) for simple linear trends
+ - For regression: Use sklearn.linear_model.LinearRegression() for robust regression
+ - For statistical modeling: Use statsmodels only if needed, otherwise use numpy/sklearn
+ - Always import libraries at the top: import numpy as np, from sklearn.linear_model import LinearRegression
+ - Handle missing libraries gracefully with try-except around imports
 
  OUTPUT TYPE REQUIREMENTS:
  1. PLOT GENERATION (for "plot", "chart", "visualize", "show trend", "graph"):
@@ -320,32 +325,33 @@ OUTPUT TYPE REQUIREMENTS:
  MANDATORY SAFETY & ROBUSTNESS RULES:
 
  DATA VALIDATION (ALWAYS CHECK):
- - Check if DataFrame exists and not empty: if df.empty: answer = "No data available"; return
- - Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"; return
- - Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"; return
+ - Check if DataFrame exists and not empty: if df.empty: answer = "No data available"
+ - Validate required columns exist: if 'PM2.5' not in df.columns: answer = "Required data not available"
+ - Check for sufficient data: if len(df) < 10: answer = "Insufficient data for analysis"
  - Remove invalid/missing values: df = df.dropna(subset=['PM2.5', 'city', 'Timestamp'])
- - Validate date ranges: ensure timestamps are within expected range
+ - Use early exit pattern: if condition: answer = "error message"; else: continue with analysis
 
  OPERATION SAFETY (PREVENT CRASHES):
  - Wrap risky operations in try-except blocks
  - Check denominators before division: if denominator == 0: continue
  - Validate indexing bounds: if idx >= len(array): continue
- - Check for empty results after filtering: if result_df.empty: answer = "No data found"; return
+ - Check for empty results after filtering: if result_df.empty: answer = "No data found"
  - Convert data types explicitly: pd.to_numeric(), .astype(int), .astype(str)
  - Handle timezone issues with datetime operations
+ - NO return statements - this is script context, use if/else logic flow
 
  PLOT GENERATION (MANDATORY FOR PLOTS):
- - Check data exists before plotting: if plot_data.empty: answer = "No data to plot"; return
+ - Check data exists before plotting: if plot_data.empty: answer = "No data to plot"
  - Always create new figure: plt.figure(figsize=(12, 8))
  - Add comprehensive labels: plt.title(), plt.xlabel(), plt.ylabel()
  - Handle long city names: plt.xticks(rotation=45, ha='right')
  - Use tight layout: plt.tight_layout()
- - CRITICAL PLOT SAVING SEQUENCE:
+ - CRITICAL PLOT SAVING SEQUENCE (no return statements):
  1. filename = f"plot_{uuid.uuid4().hex[:8]}.png"
  2. plt.savefig(filename, dpi=300, bbox_inches='tight')
  3. plt.close()
  4. answer = filename
- - Debug plot issues: print(f"Plot saved: {filename}") for testing
+ - Use if/else logic: if data_valid: create_plot(); answer = filename else: answer = "error"
 
  CRITICAL CODING PRACTICES:
 