This section defines the inputs and outputs of the "black box" Transformer-based forecasting model (Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py) within the collaborative supply chain context.
Each input and output is grouped by category, with details on its format and expected characteristics.
The breakdown specifies what data the model requires and what results it returns, so it can guide development and implementation within the collaborative supply chain framework. It also sets the stage for specifying data preprocessing steps, model architecture, and evaluation metrics.

**I. Inputs**

The inputs are all the data fed into the Transformer model to generate the forecasts. Since we're aiming for a comprehensive and dynamic system, the inputs are diverse and can be grouped into several categories:

**A. Historical Sales Data:**

*   **Description:** Time-series data of past sales, at the most granular level possible (ideally SKU-store-day).
*   **Format:**
    *   **Structure:** Typically a tabular format (e.g., CSV, Parquet, database table).  Could also be a tensor if pre-processed for the Transformer.
    *   **Columns:**
        *   `timestamp`: Date and time of the sale (e.g., `YYYY-MM-DD HH:MM:SS` or a Unix timestamp).
        *   `sku`: Stock Keeping Unit (unique product identifier).
        *   `store_id`: Identifier for the store location.
        *   `quantity`: Number of units sold.
        *   `price`: Unit price at the time of sale.
        *   `discount`: Any discount applied (amount or percentage).
*   **Characteristics:**
    *   High frequency (daily or even hourly).
    *   Potentially millions or billions of rows.
    *   May exhibit seasonality, trends, and noise.
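
As a rough illustration of the schema above, the following sketch loads transaction-level sales from CSV and aggregates them to the SKU-store-day level. The sample rows, the dtypes, and the helper names `load_sales` and `aggregate_daily` are assumptions made for this example, not part of the model's actual code.

```python
import io

import pandas as pd

# Hypothetical sample in the column layout described above.
SAMPLE_CSV = """timestamp,sku,store_id,quantity,price,discount
2024-07-01 09:15:00,123,A,2,4.99,0.0
2024-07-01 17:40:00,123,A,3,4.99,0.0
2024-07-02 11:05:00,123,A,1,4.99,0.5
"""


def load_sales(source) -> pd.DataFrame:
    """Read sales records, keeping identifiers as strings (assumed dtypes)."""
    return pd.read_csv(
        source,
        parse_dates=["timestamp"],
        dtype={
            "sku": str,
            "store_id": str,
            "quantity": "int64",
            "price": "float64",
            "discount": "float64",
        },
    )


def aggregate_daily(sales: pd.DataFrame) -> pd.DataFrame:
    """Sum units sold per SKU, store, and calendar day."""
    with_date = sales.assign(date=sales["timestamp"].dt.floor("D"))
    return with_date.groupby(["sku", "store_id", "date"], as_index=False)["quantity"].sum()


sales = load_sales(io.StringIO(SAMPLE_CSV))
daily = aggregate_daily(sales)
print(daily)
```

Reading `sku` and `store_id` as strings avoids losing leading zeros in identifiers; for very large datasets a columnar format such as Parquet would likely be a better fit than CSV.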

**B. Promotional Data:**

*   **Description:** Information about past, current, and *planned* promotional activities.
*   **Format:**
    *   **Structure:** Tabular format.
    *   **Columns:**
        *   `promotion_id`: Unique identifier for the promotion.
        *   `sku`:  SKU(s) included in the promotion.
        *   `store_id`: Store(s) where the promotion is active.
        *   `start_date`: Start date of the promotion.
        *   `end_date`: End date of the promotion.
        *   `promotion_type`:  Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
        *   `discount_value`:  Value of the discount (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
        *   `marketing_spend`:  (Optional) Amount spent on advertising for the promotion.
*   **Characteristics:**
    *   Less frequent than sales data.
    *   Should include *future* planned promotions, which are crucial for forecasting.
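
One possible way to make planned promotions usable as model inputs is to expand each promotion's date range into one row per day and join it onto a daily SKU-store calendar that extends into the forecast period. The sketch below assumes one promotion row per SKU-store pair; `expand_promotions` and the sample values are hypothetical.

```python
import pandas as pd

PROMO_COLUMNS = ["sku", "store_id", "date", "promotion_type", "discount_value"]


def expand_promotions(promos: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (sku, store_id, date) on which a promotion is active."""
    rows = []
    for p in promos.itertuples(index=False):
        # start_date and end_date are treated as inclusive (an assumption).
        for day in pd.date_range(p.start_date, p.end_date, freq="D"):
            rows.append((p.sku, p.store_id, day, p.promotion_type, p.discount_value))
    return pd.DataFrame(rows, columns=PROMO_COLUMNS)


promos = pd.DataFrame({
    "promotion_id": ["P1"],
    "sku": ["123"],
    "store_id": ["A"],
    "start_date": pd.to_datetime(["2024-07-03"]),
    "end_date": pd.to_datetime(["2024-07-05"]),
    "promotion_type": ["percentage discount"],
    "discount_value": [0.2],
})

# Daily calendar covering both history and the future forecast window.
calendar = pd.DataFrame({
    "sku": "123",
    "store_id": "A",
    "date": pd.date_range("2024-07-01", "2024-07-07", freq="D"),
})

features = calendar.merge(
    expand_promotions(promos), on=["sku", "store_id", "date"], how="left"
)
features["on_promo"] = features["promotion_type"].notna().astype(int)
features["discount_value"] = features["discount_value"].fillna(0.0)
print(features[["date", "on_promo", "discount_value"]])
```

Because the calendar can extend past the last observed sale, the same join yields promotion flags for future dates, which is what lets the model condition on planned promotions.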

**C. Inventory Data:**

*   **Description:**  Information about current and historical inventory levels.
*   **Format:**
    *   **Structure:** Tabular format.
    *   **Columns:**
        *   `timestamp`: Date and time of the inventory snapshot.
        *   `sku`: Stock Keeping Unit.
        *   `store_id`: Store location (or warehouse ID for wholesalers).
        *   `quantity_on_hand`: Number of units currently in stock.
        *   `quantity_on_order`: Number of units ordered but not yet received.
        *   `reorder_point`:  (Optional) The inventory level at which a new order should be placed.
        *   `safety_stock`: (Optional) Minimum stock level held as a buffer against demand or supply variability.
*   **Characteristics:**
    *   Frequency can vary (daily, weekly).
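
To show how these columns relate, here is a small sketch that derives an inventory position and a reorder flag from a snapshot. Comparing on-hand plus on-order stock against `reorder_point` is a common convention assumed for this example; the derived column names `inventory_position` and `needs_reorder` and the sample values are hypothetical.

```python
import pandas as pd

inventory = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-07-01", "2024-07-01"]),
    "sku": ["123", "456"],
    "store_id": ["A", "A"],
    "quantity_on_hand": [40, 15],
    "quantity_on_order": [0, 10],
    "reorder_point": [50, 20],
})

# Stock already available plus stock already ordered but not yet received.
inventory["inventory_position"] = (
    inventory["quantity_on_hand"] + inventory["quantity_on_order"]
)

# Flag SKUs whose position has fallen to or below the reorder point.
inventory["needs_reorder"] = (
    inventory["inventory_position"] <= inventory["reorder_point"]
)
print(inventory[["sku", "inventory_position", "needs_reorder"]])
```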

**D. External Factors:**

*   **Description:** Data that is not directly related to sales or inventory but can influence demand.
*   **Format:**
    *   **Structure:** Can be tabular or time-series data from various sources.
    *   **Examples:**
        *   **Economic Indicators:**  GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
        *   **Weather Data:**  Temperature, precipitation, forecasts. (Time-series data from weather APIs.)
        *   **Holiday/Event Indicators:**  Binary indicators (0 or 1) for holidays, major events, school breaks. (Typically a pre-defined calendar.)
        *   **Social Media Sentiment:**  Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
        *   **Web Traffic Data:**  Website visits, product page views, search queries. (Data from web analytics platforms.)
        *   **Competitor Data:**  Pricing and promotional activity of competitors (if available, often through web scraping or third-party data providers).
*   **Characteristics:**
    *   Varying frequencies and formats depending on the source.
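
Because these sources arrive at different frequencies, they usually need to be aligned to the model's time grid before use. The following sketch, under assumed values, forward-fills a monthly indicator onto a daily grid and builds a binary holiday indicator from a pre-defined calendar. The series values and the column names `consumer_confidence` and `is_holiday` are illustrative only.

```python
import pandas as pd

# Daily grid the model is assumed to operate on.
days = pd.date_range("2024-07-01", "2024-07-07", freq="D")

# Monthly indicator with made-up values, stamped at the start of each month.
consumer_confidence = pd.Series(
    [101.2, 100.8],
    index=pd.to_datetime(["2024-06-01", "2024-07-01"]),
)

# Carry the most recent monthly value forward onto every day.
confidence_daily = consumer_confidence.reindex(days, method="ffill")

# Pre-defined holiday calendar turned into a 0/1 indicator.
holidays = pd.to_datetime(["2024-07-04"])
is_holiday = days.isin(holidays).astype(int)

external = pd.DataFrame(
    {
        "consumer_confidence": confidence_daily.to_numpy(),
        "is_holiday": is_holiday,
    },
    index=days,
)
print(external)
```

Forward-filling only uses values that were already published on each date, which helps avoid leaking future information into historical training rows.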

**E. Product Metadata:**

*   **Description:**  Static information about the products.
*   **Format:**
    *   **Structure:** Tabular format.
    *   **Columns:**
        *   `sku`: Stock Keeping Unit.
        *   `product_category`:  Category the product belongs to.
        *   `product_subcategory`:  Subcategory.
        *   `brand`:  Brand name.
        *   `product_description`:  Textual description (may be used for embeddings).
        *   `price_tier`: (Optional) Categorization based on price (e.g., "economy," "mid-range," "premium").
*   **Characteristics:**
    *   Relatively static; changes infrequently.
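
Categorical metadata such as category and brand is typically mapped to integer indices before being fed to an embedding layer. The sketch below shows one way to do this with `pandas.factorize`; the helper `encode_categoricals`, the `_idx` suffix, and the sample products are assumptions for illustration.

```python
import pandas as pd

products = pd.DataFrame({
    "sku": ["123", "456", "789"],
    "product_category": ["beverages", "snacks", "beverages"],
    "brand": ["Acme", "Acme", "Zenith"],
})


def encode_categoricals(df: pd.DataFrame, columns: list[str]):
    """Add an integer index column per categorical column and return the vocabularies."""
    encoded = df.copy()
    vocab = {}
    for col in columns:
        # sort=True makes the mapping independent of row order.
        # Missing values, if any, are encoded as -1.
        codes, uniques = pd.factorize(df[col], sort=True)
        encoded[col + "_idx"] = codes
        vocab[col] = list(uniques)
    return encoded, vocab


encoded, vocab = encode_categoricals(products, ["product_category", "brand"])
print(encoded)
print(vocab)
```

The returned vocabularies would need to be stored alongside the model so that the same mapping is applied at inference time.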

**F. Store Metadata:**

*   **Description:** Static information about the stores.
*   **Format:**
    *   **Structure:** Tabular format.
    *   **Columns:**
        *   `store_id`: Unique store identifier.
        *   `location`: City and state.
        *   `store_type`: Type of store (e.g., "physical," "online," "mixed").
*   **Characteristics:**
    *   Relatively static; changes infrequently.

**II. Outputs**

The outputs are the forecasts generated by the Transformer model.

**A. Probabilistic Forecasts:**

*   **Description:**  Instead of a single point forecast (e.g., "we will sell 100 units"), the model provides a *probability distribution* of future demand. This quantifies the uncertainty in the forecast.
*   **Format:**
    *   **Structure:**  Typically a set of quantiles (or percentiles) for each SKU-store-future time period.
    *   **Example:**  For SKU 123, store A, on 2024-07-04, the model might output:
        *   `p10`: 80 units (10th percentile - there's a 10% chance demand will be 80 units or less)
        *   `p50`: 105 units (50th percentile - median forecast)
        *   `p90`: 130 units (90th percentile - there's a 90% chance demand will be 130 units or less)
        *   ...and other quantiles as needed (e.g., p25, p75, p95, p99).
*   **Characteristics:**
    *   Provides a range of possible outcomes, allowing for risk-aware decision-making.
    *   Allows for calculation of prediction intervals (e.g., the range from `p10` to `p90` is an 80% prediction interval).
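
To make the quantile output concrete, the sketch below takes the example values above, derives the 80% interval between `p10` and `p90`, and scores each quantile against a hypothetical actual demand using the pinball (quantile) loss. The pinball loss is a standard way to evaluate quantile forecasts, but its use here is an assumption; the source does not state which loss or metric the model uses.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q: float) -> float:
    """Mean pinball loss of predictions y_pred for quantile level q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


# Quantile forecast for one SKU-store-day, taken from the example above.
forecast = {"p10": 80.0, "p50": 105.0, "p90": 130.0}

# The span between p10 and p90 covers 80% of the predicted distribution.
interval_80 = (forecast["p10"], forecast["p90"])

# Hypothetical realized demand, used only to illustrate scoring.
actual = 120.0
losses = {
    name: pinball_loss([actual], [value], int(name[1:]) / 100)
    for name, value in forecast.items()
}
print(interval_80, losses)
```

A lower pinball loss indicates a better-calibrated quantile. In this toy case the `p90` forecast is penalized least because the realized demand fell below it by a small margin.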

**B. Forecast Horizon:**

*   **Description:** The length of time into the future for which the model generates forecasts.
*   **Format:**
    *   Defined by the model configuration and the needs of the business.  Could be days, weeks, or months.
    *   Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
*   **Characteristics:**
    *   Longer horizons generally have greater uncertainty.

**C. Forecast Granularity:**

*   **Description:**  The level of detail at which the forecasts are generated (SKU-store-day, SKU-region-week, etc.).
*   **Format:**
    *   Determined by the model and the available data.
    *   Should align with the business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).

**D. Forecast Timestamps:**

*   **Description:**  The specific dates and times for which the forecasts are generated.
*   **Format:**
    *   A list or sequence of timestamps corresponding to the forecast horizon and granularity.
    *   Example: `[2024-07-04, 2024-07-05, 2024-07-06, ...]`
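
The relationship between horizon, granularity, and timestamps can be sketched as follows: given the last observed date, a number of steps, and a pandas frequency string, generate the sequence of forecast timestamps. The function name `forecast_timestamps` and its parameters are hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def forecast_timestamps(last_observed, horizon: int, freq: str = "D") -> pd.DatetimeIndex:
    """Return `horizon` timestamps starting one step after `last_observed`."""
    start = pd.Timestamp(last_observed) + to_offset(freq)
    return pd.date_range(start=start, periods=horizon, freq=freq)


# 28 daily steps following the last observed day.
timestamps = forecast_timestamps("2024-07-03", horizon=28, freq="D")
print(timestamps[:3].strftime("%Y-%m-%d").tolist())
```

The same function would produce weekly steps with `freq="W"`, in which case pandas anchors each step to the end of a week (Sunday by default).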

**E. (Optional) Explainability Outputs:**

*   **Description:**  Outputs that help explain *why* the model made a particular forecast.  This is especially important for building trust and understanding.
*   **Format:**
    *   **Attention Weights:**  For Transformer models, the attention weights can be visualized to show which parts of the input sequence the model attended to most when making a prediction. They are a useful diagnostic, though not a guaranteed measure of importance.
    *   **Feature Importance Scores:**  Estimates of the relative importance of different input features.
    *   **SHAP Values:**  Shapley-value-based attributions that assign each input feature an additive contribution to an individual prediction.
*   **Characteristics:**
    *   Can be complex to interpret, but provide valuable insights.
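
As one example of a feature importance score, the sketch below implements permutation importance: shuffle one feature at a time and measure how much the prediction error increases. This is a generic, model-agnostic technique chosen for illustration, not necessarily the method the model uses. The linear `predict` function is a toy stand-in for the Transformer, and all names and data are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats: int = 5, seed: int = 0):
    """Average increase in mean absolute error when each feature is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(np.abs(predict(X) - y))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean(np.abs(predict(X_perm) - y)) - baseline
    return scores / n_repeats


def predict(X):
    """Toy stand-in model: uses features 0 and 1, ignores feature 2."""
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]


data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = predict(X)

scores = permutation_importance(predict, X, y)
print(scores)
```

In this toy setup the first feature should receive the largest score and the unused third feature a score of zero, which matches how the stand-in model was constructed.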

**Summary Table:**

| Category         | Description                                                                      | Format                                    | Characteristics                                                                 |
| ---------------- | -------------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------- |
| **Inputs**       |                                                                                  |                                            |                                                                                 |
| Historical Sales | Past sales data (SKU-store-day level)                                           | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large, may exhibit seasonality/trends/noise.       |
| Promotional Data | Past, current, and *planned* promotions                                          | Tabular (promotion_id, sku, store_id, start/end dates, type, value, spend) | Less frequent than sales data, includes future promotions.                       |
| Inventory Data   | Current and historical inventory levels                                         | Tabular (timestamp, sku, store_id/warehouse_id, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly).                                                 |
| External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various)          | Varying frequencies and formats.                                                |
| Product Metadata | Static information about products                                               | Tabular (sku, category, subcategory, brand, description, price_tier)      | Relatively static.                                                               |
| Store Metadata   | Static information about stores                                                 | Tabular (store_id, location, store_type)  | Relatively static.                                                              |

| **Outputs**        | Description                                            | Format                                                                     | Characteristics                                                      |
| ------------------ | ------------------------------------------------------ | -------------------------------------------------------------------------- | -------------------------------------------------------------------- |
| Probabilistic Forecasts | Probability distribution of future demand         | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes, quantifies uncertainty.              |
| Forecast Horizon   | Length of time into the future                        | Number of time steps (days, weeks, months)                                  | Longer horizons have greater uncertainty.                             |
| Forecast Granularity| Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs                                      | Aligns with business requirements.                                   |
| Forecast Timestamps | Dates/times for which forecasts are generated       | List/sequence of timestamps                                                   | Corresponds to horizon and granularity.                              |
| Explainability (Optional) | Outputs that explain model predictions            | Attention weights, feature importance scores, SHAP values                   | Complex to interpret, but provide valuable insights.                 |