Here we define the inputs and outputs of the "black box" Transformer-based forecasting model (`Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py`) within this collaborative supply chain context. We group them into categories and describe the format and expected characteristics of each. These definitions specify what data the model requires and what results it must produce. They also form the basis for specifying the data preprocessing steps, model architecture, and evaluation metrics.

**I. Inputs**

The inputs are all the data fed into the Transformer model to generate the forecasts. Because the system is meant to be comprehensive and dynamic, the inputs are diverse and fall into several categories:

**A. Historical Sales Data:**

* **Description:** Time-series data of past sales at the most granular level available (ideally SKU-store-day).
* **Format:**
    * **Structure:** Typically tabular (e.g., CSV, Parquet, or a database table). Could also be a tensor if pre-processed for the Transformer.
    * **Columns:**
        * `timestamp`: Date and time of the sale (e.g., `YYYY-MM-DD HH:MM:SS` or a Unix timestamp).
        * `sku`: Stock Keeping Unit (unique product identifier).
        * `store_id`: Identifier for the store location.
        * `quantity`: Number of units sold.
        * `price`: Unit price at the time of sale.
        * `discount`: Any discount applied (amount or percentage).
* **Characteristics:**
    * High frequency (daily or even hourly).
    * Potentially millions or billions of rows.
    * May exhibit seasonality, trends, and noise.

**B. Promotional Data:**

* **Description:** Information about past, current, and *planned* promotional activities.
* **Format:**
    * **Structure:** Tabular.
    * **Columns:**
        * `promotion_id`: Unique identifier for the promotion.
        * `sku`: SKU(s) included in the promotion.
        * `store_id`: Store(s) where the promotion is active.
        * `start_date`: Start date of the promotion.
        * `end_date`: End date of the promotion.
        * `promotion_type`: Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
        * `discount_value`: Value of the discount, interpreted according to `promotion_type` (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
        * `marketing_spend`: (Optional) Amount spent on advertising for the promotion.
* **Characteristics:**
    * Less frequent than sales data.
    * Should include *future* planned promotions. These are known in advance and affect demand within the forecast window, which makes them crucial for forecasting.

**C. Inventory Data:**

* **Description:** Information about current and historical inventory levels.
* **Format:**
    * **Structure:** Tabular.
    * **Columns:**
        * `timestamp`: Date and time of the inventory snapshot.
        * `sku`: Stock Keeping Unit.
        * `store_id`: Store location (or warehouse ID for wholesalers).
        * `quantity_on_hand`: Number of units currently in stock.
        * `quantity_on_order`: Number of units ordered but not yet received.
        * `reorder_point`: (Optional) Inventory level at which a new order should be placed.
        * `safety_stock`: (Optional) Minimum buffer stock held to guard against stockouts.
* **Characteristics:**
    * Frequency can vary (daily, weekly).

**D. External Factors:**

* **Description:** Data that is not directly related to sales or inventory but can influence demand.
* **Format:**
    * **Structure:** Tabular or time-series data from various sources.
    * **Examples:**
        * **Economic Indicators:** GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
        * **Weather Data:** Temperature, precipitation, and weather forecasts. (Time-series data from weather APIs.)
        * **Holiday/Event Indicators:** Binary indicators (0 or 1) for holidays, major events, and school breaks. (Typically a predefined calendar.)
        * **Social Media Sentiment:** Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
        * **Web Traffic Data:** Website visits, product page views, search queries. (Data from web analytics platforms.)
        * **Competitor Data:** Pricing and promotional activity of competitors (if available, often obtained through web scraping or third-party data providers).
* **Characteristics:**
    * Frequencies and formats vary by source.

**E. Product Metadata:**

* **Description:** Static information about the products.
* **Format:**
    * **Structure:** Tabular.
    * **Columns:**
        * `sku`: Stock Keeping Unit.
        * `product_category`: Category the product belongs to.
        * `product_subcategory`: Subcategory within the product category.
        * `brand`: Brand name.
        * `product_description`: Textual description (can be used to derive text embeddings).
        * `price_tier`: (Optional) Price-based category (e.g., "economy," "mid-range," "premium").
* **Characteristics:**
    * Relatively static; changes infrequently.

**F. Store Metadata:**

* **Description:** Static information about each store.
* **Format:**
    * **Structure:** Tabular.
    * **Columns:**
        * `store_id`: Unique store identifier.
        * `location`: City and state.
        * `store_type`: Physical, online, or mixed.
* **Characteristics:**
    * Relatively static.

**II. Outputs**

The outputs are the forecasts generated by the Transformer model.

**A. Probabilistic Forecasts:**

* **Description:** Instead of a single point forecast (e.g., "we will sell 100 units"), the model outputs a *probability distribution* of future demand, which quantifies the uncertainty in the forecast.
* **Format:**
    * **Structure:** Typically a set of quantiles (percentiles) for each SKU, store, and future time period.
    * **Example:** For SKU 123, store A, on 2024-07-04, the model might output:
        * `p10`: 80 units (10th percentile: there is a 10% chance that demand will be 80 units or less).
        * `p50`: 105 units (50th percentile: the median forecast).
        * `p90`: 130 units (90th percentile: there is a 90% chance that demand will be 130 units or less).
        * ...and other quantiles as needed (e.g., p25, p75, p95, p99).
* **Characteristics:**
    * Provides a range of possible outcomes, enabling risk-aware decisions. For example, if the model is well calibrated, stocking to the `p90` forecast covers demand about 90% of the time.
    * Prediction intervals can be read directly from the quantiles (e.g., `p10` to `p90` is an 80% interval).

**B. Forecast Horizon:**

* **Description:** How far into the future the model generates forecasts.
* **Format:**
    * Set by the model configuration and the needs of the business; can span days, weeks, or months.
    * Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
* **Characteristics:**
    * Longer horizons generally carry greater uncertainty.

**C. Forecast Granularity:**

* **Description:** The level of detail at which forecasts are generated (SKU-store-day, SKU-region-week, etc.).
* **Format:**
    * Determined by the model and the available data.
    * Should match business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).

**D. Forecast Timestamps:**

* **Description:** The specific dates and times for which forecasts are generated.
* **Format:**
    * A list or sequence of timestamps matching the forecast horizon and granularity.
    * Example: `[2024-07-04, 2024-07-05, 2024-07-06, ...]`

**E. (Optional) Explainability Outputs:**

* **Description:** Outputs that help explain *why* the model produced a particular forecast. These are especially important for building trust and understanding.
* **Format:**
    * **Attention Weights:** For Transformer models, attention weights can be visualized to show which parts of the input sequence the model attended to most for a prediction (a useful, though imperfect, proxy for importance).
    * **Feature Importance Scores:** Estimates of the relative importance of different input features.
    * **SHAP Values:** Shapley-value-based attributions of each feature's contribution to an individual prediction.
* **Characteristics:**
    * Can be complex to interpret, but provide valuable insights.
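
The following minimal Python sketch (standard library only) makes the output contract concrete. It rests on several assumptions. The names `QuantileForecast`, `forecast_timestamps`, and `empirical_coverage` are hypothetical and do not come from the model file. Quantile levels are keyed as fractions (0.1 for `p10`), and the horizon is daily. The sketch generates the timestamps for a horizon and rejects quantile sets in which a higher level forecasts less demand. It also reads a prediction interval from two quantiles and checks calibration: if `p10` means what it claims, roughly 10% of actual demand values should fall at or below it.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


def forecast_timestamps(start: date, horizon: int, step_days: int = 1) -> list[date]:
    """Return `horizon` forecast dates beginning at `start`, spaced `step_days` apart."""
    return [start + timedelta(days=i * step_days) for i in range(horizon)]


@dataclass
class QuantileForecast:
    """Hypothetical record: probabilistic forecast for one SKU, store, and future period."""
    sku: str
    store_id: str
    timestamp: date
    quantiles: dict[float, float]  # quantile level (e.g. 0.1) -> forecast units

    def __post_init__(self) -> None:
        # Higher quantile levels must not forecast lower demand ("quantile crossing").
        values = [self.quantiles[q] for q in sorted(self.quantiles)]
        if any(lo > hi for lo, hi in zip(values, values[1:])):
            raise ValueError(f"quantile crossing: {self.quantiles}")

    def interval(self, lower: float, upper: float) -> tuple[float, float]:
        """Prediction interval between two levels, e.g. (0.1, 0.9) gives an 80% interval."""
        return self.quantiles[lower], self.quantiles[upper]


def empirical_coverage(forecasts: list[QuantileForecast], actuals: list[float], level: float) -> float:
    """Share of actual values at or below each forecast's `level` quantile.

    For a well-calibrated model this should be close to `level`.
    """
    if not actuals or len(forecasts) != len(actuals):
        raise ValueError("need one actual value per forecast")
    hits = sum(a <= f.quantiles[level] for f, a in zip(forecasts, actuals))
    return hits / len(actuals)


# Example values from the text: SKU 123, store A, starting 2024-07-04.
days = forecast_timestamps(date(2024, 7, 4), horizon=3)
fc = QuantileForecast("123", "A", days[0], {0.1: 80.0, 0.5: 105.0, 0.9: 130.0})
print(days[-1])               # 2024-07-06
print(fc.interval(0.1, 0.9))  # (80.0, 130.0)

# Hypothetical back-test: five past forecasts and hypothetical actual demand.
history = [
    QuantileForecast("123", "A", d, {0.1: 80.0, 0.5: 105.0, 0.9: 130.0})
    for d in forecast_timestamps(date(2024, 6, 1), horizon=5)
]
print(empirical_coverage(history, [75, 90, 100, 120, 140], level=0.1))  # 0.2
```

The sketch stores quantiles as a mapping rather than fixed `p10`/`p50`/`p90` fields. This leaves room for the additional levels (p25, p75, p95, p99) that the model may emit.
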

**Summary Table:**

| Category | Description | Format | Characteristics |
| --- | --- | --- | --- |
| **Inputs** | | | |
| Historical Sales | Past sales data (SKU-store-day level) | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large; may exhibit seasonality, trends, and noise. |
| Promotional Data | Past, current, and *planned* promotions | Tabular (promotion_id, sku, store_id, start/end dates, type, value, spend) | Less frequent than sales data; includes future promotions. |
| Inventory Data | Current and historical inventory levels | Tabular (timestamp, sku, store_id/warehouse_id, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly). |
| External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various) | Varying frequencies and formats. |
| Product Metadata | Static information about products | Tabular (sku, category, subcategory, brand, description, price_tier) | Relatively static. |
| Store Metadata | Static information about stores | Tabular (store_id, location, store_type) | Relatively static. |

| **Outputs** | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Probabilistic Forecasts | Probability distribution of future demand | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes; quantifies uncertainty. |
| Forecast Horizon | Length of time into the future | Number of time steps (days, weeks, months) | Longer horizons generally have greater uncertainty. |
| Forecast Granularity | Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs | Aligns with business requirements. |
| Forecast Timestamps | Dates/times for which forecasts are generated | List/sequence of timestamps | Corresponds to horizon and granularity. |
| Explainability (Optional) | Outputs that explain model predictions | Attention weights, feature importance scores, SHAP values | Complex to interpret, but provide valuable insights. |
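
The input tables can also be sketched as typed records. The Python below is a hypothetical illustration, not the schema used by `Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py`. It covers only the sales and promotional tables, and it assumes each promotion row names a single SKU and store. The sketch also shows two plausible preprocessing steps. First, `daily_demand` aggregates sales to SKU-store-day. Second, `promo_active_flags` turns promotions, including planned future ones, into a daily 0/1 feature aligned with the same dates. Both helper names are invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class SalesRecord:
    """One row of historical sales data (columns from section I.A)."""
    timestamp: datetime
    sku: str
    store_id: str
    quantity: int
    price: float
    discount: float = 0.0


@dataclass
class Promotion:
    """One row of promotional data (columns from section I.B).

    Assumption: each row covers a single SKU/store pair; multi-SKU or
    multi-store promotions are expanded into several rows beforehand.
    """
    promotion_id: str
    sku: str
    store_id: str
    start_date: date
    end_date: date
    promotion_type: str
    discount_value: float
    marketing_spend: Optional[float] = None

    def __post_init__(self) -> None:
        if self.end_date < self.start_date:
            raise ValueError(f"{self.promotion_id}: end_date precedes start_date")

    def is_active(self, day: date) -> bool:
        # Both start_date and end_date count as promotion days.
        return self.start_date <= day <= self.end_date


def daily_demand(records: list[SalesRecord], sku: str, store_id: str, days: list[date]) -> list[int]:
    """Aggregate (possibly intraday) sales to SKU-store-day; days without sales get 0."""
    totals = {d: 0 for d in days}
    for r in records:
        d = r.timestamp.date()
        if r.sku == sku and r.store_id == store_id and d in totals:
            totals[d] += r.quantity
    return [totals[d] for d in days]


def promo_active_flags(sku: str, store_id: str, days: list[date], promotions: list[Promotion]) -> list[int]:
    """1 if any promotion for this SKU/store is active on the day, else 0.

    Planned promotions carry future dates, so the same call covers the forecast horizon.
    """
    relevant = [p for p in promotions if p.sku == sku and p.store_id == store_id]
    return [int(any(p.is_active(d) for p in relevant)) for d in days]


# Hypothetical toy data for SKU 123 at store A.
days = [date(2024, 7, 1) + timedelta(days=i) for i in range(4)]
sales = [
    SalesRecord(datetime(2024, 7, 1, 9), "123", "A", 3, 4.99),
    SalesRecord(datetime(2024, 7, 1, 17), "123", "A", 2, 4.99),
    SalesRecord(datetime(2024, 7, 3, 12), "123", "A", 6, 3.99, discount=0.2),
]
promos = [Promotion("P1", "123", "A", date(2024, 7, 3), date(2024, 7, 4), "percentage discount", 0.2)]
print(daily_demand(sales, "123", "A", days))         # [5, 0, 6, 0]
print(promo_active_flags("123", "A", days, promos))  # [0, 0, 1, 1]
```

A binary flag sidesteps the fact that `discount_value` means different things for different `promotion_type` values. A fuller pipeline would likely encode the type and value as separate features.
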