aibom-generator / docs /AI_SBOM_API_doc.md
a1c00l's picture
Upload 3 files
550ed76 verified
|
raw
history blame
15.8 kB

AI SBOM Generator API Documentation

Overview

The AI SBOM Generator API provides a comprehensive solution for generating CycloneDX-compliant AI Software Bill of Materials (AI SBOM) for Hugging Face models. This API uses a configurable field registry system to extract and score AI SBOM fields across 5 categories, providing detailed completeness assessment and standards compliance.


Table of Contents


Base URL

When deployed on Hugging Face Spaces, the base URL will be:

https://aetheris-ai-aibom-generator.hf.space

Replace this with your actual deployment URL.


API Endpoints

API Status

Purpose: Check if the API is operational and get version information.

Endpoint: /status

Method: GET

cURL Example:

curl -X GET "https://aetheris-ai-aibom-generator.hf.space/status"

Expected Response:

{
  "status": "operational",
  "version": "1.0.0",
  "generator_version": "1.0.0"
}

Registry Status

Purpose: Check the field registry configuration status and available fields.

Endpoint: /api/registry/status

Method: GET

cURL Example:

curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/registry/status"

Expected Response:

{
  "registry_available": true,
  "total_fields": 29,
  "categories": [
    "required_fields",
    "metadata", 
    "component_basic",
    "component_model_card",
    "external_references"
  ],
  "field_count_by_category": {
    "required_fields": 4,
    "metadata": 5,
    "component_basic": 5,
    "component_model_card": 14,
    "external_references": 1
  },
  "registry_manager_loaded": true
}

Generate AI SBOM

Purpose: Generate an AI SBOM for a specified Hugging Face model.

Endpoint: /api/generate

Method: POST

Parameters:

  • model_id (required): The Hugging Face model ID (e.g., 'meta-llama/Llama-2-7b-chat-hf')
  • include_inference (optional): Whether to use AI inference to enhance the AI SBOM (default: true)
  • use_best_practices (optional): Whether to use industry best practices for scoring (default: true)
  • hf_token (optional): Hugging Face API token for accessing private models

cURL Example:

curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "deepseek-ai/DeepSeek-R1",
    "include_inference": true,
    "use_best_practices": true
  }'

Expected Response: JSON containing the generated AI SBOM, model ID, timestamp, and download URL.

{
  "aibom": {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "serialNumber": "urn:uuid:deepseek-ai-DeepSeek-R1",
    "version": "1.0.0",
    "metadata": {
      "timestamp": "2025-07-15T18:31:18Z",
      "tools": [
        {
          "vendor": "Aetheris AI",
          "name": "AI SBOM Generator",
          "version": "1.0.0"
        }
      ],
      "properties": [
        {
          "name": "primaryPurpose",
          "value": "text-generation"
        },
        {
          "name": "suppliedBy", 
          "value": "deepseek-ai"
        }
      ]
    },
    "components": [
      {
        "type": "machine-learning-model",
        "name": "DeepSeek-R1",
        "purl": "pkg:huggingface/deepseek-ai/DeepSeek-R1",
        "description": "Advanced reasoning model with enhanced capabilities",
        "licenses": [
          {
            "license": {
              "name": "DeepSeek License"
            }
          }
        ],
        "modelCard": {
          "limitation": "Model may have limitations in certain domains"
        }
      }
    ],
    "externalReferences": [
      {
        "type": "distribution",
        "url": "https://huggingface.co/deepseek-ai/DeepSeek-R1"
      }
    ]
  },
  "model_id": "deepseek-ai/DeepSeek-R1",
  "generated_at": "2025-07-15T18:31:18Z",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json"
}

Generate AI SBOM with Completeness Score Report

Purpose: Generate an AI SBOM along with a detailed completeness score report.

Endpoint: /api/generate-with-report

Method: POST

Parameters: Same as Generate AI SBOM

cURL Example:

curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate-with-report" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "deepseek-ai/DeepSeek-R1",
    "include_inference": true,
    "use_best_practices": true
  }'

Expected Response: Same as Generate AI SBOM plus completeness score details.

{
  "aibom": { ... },
  "model_id": "deepseek-ai/DeepSeek-R1",
  "generated_at": "2025-07-15T18:31:18Z",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json",
  "completeness_score": {
    "total_score": 62.3,
    "section_scores": {
      "required_fields": 20.0,
      "metadata": 8.0,
      "component_basic": 20.0,
      "component_model_card": 4.3,
      "external_references": 10.0
    },
    "max_scores": {
      "required_fields": 20,
      "metadata": 20,
      "component_basic": 20,
      "component_model_card": 30,
      "external_references": 10
    },
    "field_checklist": {
      "bomFormat": "present",
      "specVersion": "present",
      "serialNumber": "present",
      "version": "present",
      "primaryPurpose": "present",
      "suppliedBy": "present",
      "standardCompliance": "missing",
      "domain": "missing",
      "autonomyType": "missing",
      "name": "present",
      "type": "present",
      "purl": "present",
      "description": "present",
      "licenses": "present",
      "energyConsumption": "missing",
      "hyperparameter": "missing",
      "limitation": "present",
      "safetyRiskAssessment": "missing",
      "typeOfModel": "present",
      "modelExplainability": "missing",
      "energyQuantity": "missing",
      "energyUnit": "missing",
      "informationAboutTraining": "missing",
      "informationAboutApplication": "missing",
      "metric": "missing",
      "metricDecisionThreshold": "missing",
      "modelDataPreprocessing": "missing",
      "useSensitivePersonalInformation": "missing",
      "downloadLocation": "present"
    },
    "category_details": {
      "required_fields": {
        "present_fields": 4,
        "total_fields": 4,
        "percentage": 100.0
      },
      "metadata": {
        "present_fields": 2,
        "total_fields": 5,
        "percentage": 40.0
      },
      "component_basic": {
        "present_fields": 5,
        "total_fields": 5,
        "percentage": 100.0
      },
      "component_model_card": {
        "present_fields": 2,
        "total_fields": 14,
        "percentage": 14.3
      },
      "external_references": {
        "present_fields": 1,
        "total_fields": 1,
        "percentage": 100.0
      }
    }
  }
}

Get Completeness Score Only

Purpose: Get the completeness score for a model without generating a full AI SBOM.

Endpoint: /api/models/{model_id}/score

Method: GET

Parameters:

  • model_id (path parameter): The Hugging Face model ID
  • use_best_practices (query parameter, optional): Whether to use industry best practices for scoring (default: true)
  • hf_token (query parameter, optional): Hugging Face API token for accessing private models

cURL Example:

curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/models/deepseek-ai/DeepSeek-R1/score?use_best_practices=true"

Expected Response:

{
  "model_id": "deepseek-ai/DeepSeek-R1",
  "total_score": 62.3,
  "section_scores": {
    "required_fields": 20.0,
    "metadata": 8.0,
    "component_basic": 20.0,
    "component_model_card": 4.3,
    "external_references": 10.0
  },
  "max_scores": {
    "required_fields": 20,
    "metadata": 20,
    "component_basic": 20,
    "component_model_card": 30,
    "external_references": 10
  },
  "field_checklist": {
    "bomFormat": "present",
    "specVersion": "present",
    "name": "present",
    "downloadLocation": "present"
  },
  "generated_at": "2025-07-15T18:31:18Z",
  "request_id": "550e8400-e29b-41d4-a716-446655440000"
}

Download Generated AI SBOM

Purpose: Download a previously generated AI SBOM file.

Endpoint: /output/{filename}

Method: GET

cURL Example:

curl -X GET "https://aetheris-ai-aibom-generator.hf.space/output/deepseek-ai_DeepSeek-R1_ai_sbom.json" \
  -o "deepseek_r1_aibom.json"

Form-Based Generation (Web UI)

Purpose: Generate AI SBOM through the web interface form submission.

Endpoint: /generate

Method: POST

Content-Type: application/x-www-form-urlencoded

Parameters:

  • model_id (required): The Hugging Face model ID
  • g-recaptcha-response (required): reCAPTCHA response token

cURL Example:

curl -X POST "https://aetheris-ai-aibom-generator.hf.space/generate" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "model_id=deepseek-ai/DeepSeek-R1&g-recaptcha-response=YOUR_RECAPTCHA_TOKEN"

Web UI

The API also provides a user-friendly web interface accessible at the base URL. The web UI includes:

  • Model ID input field with validation
  • reCAPTCHA protection against automated abuse
  • Real-time generation with progress indicators
  • Downloadable results with completeness scoring
  • Field checklist visualization showing extraction results
  • Category-based scoring breakdown

Security Features

Rate Limiting

  • 10 requests per minute per IP address
  • 5 concurrent requests maximum
  • 1MB request size limit

reCAPTCHA Protection

  • Google reCAPTCHA v2 integration for web UI
  • Automated bot detection and prevention
  • Configurable through environment variables

Input Validation

  • Model ID format validation (alphanumeric, hyphens, underscores, forward slashes)
  • XSS protection through HTML escaping
  • SQL injection prevention through parameterized queries

Field Registry System

The AI SBOM Generator uses a configurable field registry system that enables:

29 Configurable Fields across 5 categories:

  • Required Fields (4): bomFormat, specVersion, serialNumber, version
  • Metadata (5): primaryPurpose, suppliedBy, standardCompliance, domain, autonomyType
  • Component Basic (5): name, type, purl, description, licenses
  • Component Model Card (14): energyConsumption, hyperparameter, limitation, safetyRiskAssessment, typeOfModel, modelExplainability, energyQuantity, energyUnit, informationAboutTraining, informationAboutApplication, metric, metricDecisionThreshold, modelDataPreprocessing, useSensitivePersonalInformation
  • External References (1): downloadLocation

Multi-Strategy Extraction:

  1. HuggingFace API β†’ Direct metadata extraction (High confidence)
  2. Model Card β†’ Structured documentation parsing (Medium-high confidence)
  3. Config Files β†’ Technical details from JSON files (High confidence)
  4. Text Patterns β†’ Regex extraction from README (Medium confidence)
  5. Intelligent Inference β†’ Smart defaults from context (Medium confidence)
  6. Fallback Values β†’ Placeholders when no data available (Low confidence)

SPDX 3.0 Compatibility:

  • 100% field coverage with SPDX 3.0 AI Profile specification
  • 59% exact field name matches with official SPDX 3.0 fields
  • Future dual-format support for both CycloneDX and SPDX output
  • Current limitation does not generate output in SPDX format

Completeness Score

The completeness score is calculated using a weighted scoring system across five categories:

Scoring Categories:

  • Required Fields (20%): Essential CycloneDX infrastructure
  • Metadata (20%): AI-specific metadata and provenance
  • Component Basic (20%): Core component identification
  • Component Model Card (30%): Advanced AI model documentation
  • External References (10%): Distribution and reference links

Field Tiers:

  • Critical (C): Essential fields with 3x weight multiplier
  • Important (I): Valuable fields with 2x weight multiplier
  • Supplementary (S): Additional fields with 1x weight multiplier

Score Interpretation:

  • 90-100: Exceptional documentation quality
  • 80-89: Comprehensive documentation
  • 70-79: Good documentation with minor gaps
  • 60-69: Adequate documentation with some missing elements
  • 50-59: Basic documentation with significant gaps
  • Below 50: Insufficient documentation

Confidence-Based Filtering:

  • Only fields extracted with medium or high confidence contribute to the score
  • Low or none confidence extractions are excluded to ensure score reliability
  • Individual field failures don't prevent overall SBOM generation

Notes on Using the API

Model ID Format

  • Use the exact Hugging Face model identifier (e.g., meta-llama/Llama-2-7b-chat-hf)
  • Model IDs are case-sensitive
  • Private models require a valid hf_token

Response Times

  • Simple models: 5-15 seconds
  • Complex models with inference: 30-60 seconds
  • Large models: Up to 2 minutes

File Storage

  • Generated AI SBOMs are stored temporarily (7 days)
  • Download URLs are valid for the file retention period
  • Files are automatically cleaned up to manage storage

Best Practices

  • Use use_best_practices=true for industry-standard scoring
  • Include include_inference=true for enhanced field extraction
  • Cache results locally to avoid repeated API calls for the same model
  • Use the registry status endpoint to verify system configuration

Error Handling

Common HTTP Status Codes:

  • 200 OK: Successful request
  • 400 Bad Request: Invalid model ID format or missing parameters
  • 404 Not Found: Model not found on Hugging Face
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Server-side processing error

Error Response Format:

{
  "detail": "Error description",
  "error_code": "SPECIFIC_ERROR_CODE",
  "timestamp": "2025-07-15T18:31:18Z"
}

Common Error Scenarios:

  • Invalid Model ID: Check format and existence on Hugging Face
  • Private Model Access: Ensure valid hf_token is provided
  • Rate Limiting: Wait before retrying or implement exponential backoff
  • Registry Unavailable: System falls back to basic field extraction

Support and Documentation

For additional support, documentation updates, or feature requests:

  • GitHub Repository: [Link to GitHub Isuses]
  • API Status Page: Use /status and /api/registry/status endpoints
  • Web Interface: Available at the base URL for interactive testing

This API provides comprehensive AI SBOM generation capabilities with industry-leading field coverage, standards compliance, and configurable scoring systems.