a1c00l committed on
Commit 550ed76 · verified · 1 Parent(s): 58ff627

Upload 3 files

docs/AI_SBOM_API_doc.md CHANGED
@@ -2,7 +2,28 @@

  ## Overview

- The AI SBOM Generator API provides a comprehensive solution for generating CycloneDX-compliant AI Bill of Materials (AI SBOM) for Hugging Face models. This document outlines the available API endpoints, their functionality, and how to interact with them using cURL commands.

  ## Base URL

@@ -13,9 +34,11 @@ https://aetheris-ai-aibom-generator.hf.space

  Replace this with your actual deployment URL.

  ## API Endpoints

- ### Status Endpoint

  **Purpose**: Check if the API is operational and get version information.

@@ -37,7 +60,47 @@ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/status"
  }
  ```

- ### Generate AI SBOM Endpoint

  **Purpose**: Generate an AI SBOM for a specified Hugging Face model.

@@ -56,7 +119,7 @@ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/status"
  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
-   "model_id": "meta-llama/Llama-2-7b-chat-hf",
    "include_inference": true,
    "use_best_practices": true
  }'
@@ -68,65 +131,172 @@ curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate" \
    "aibom": {
      "bomFormat": "CycloneDX",
      "specVersion": "1.6",
-     "serialNumber": "urn:uuid:...",
-     "version": 1,
-     "metadata": { ... },
-     "components": [ ... ],
-     "dependencies": [ ... ]
    },
-   "model_id": "meta-llama/Llama-2-7b-chat-hf",
-   "generated_at": "2025-04-24T20:30:00Z",
-   "request_id": "...",
-   "download_url": "/output/meta-llama_Llama-2-7b-chat-hf_....json"
  }
  ```

- ### Generate AI SBOM with Enhancement Report

- **Purpose**: Generate an AI SBOM with a detailed enhancement report.

  **Endpoint**: `/api/generate-with-report`

  **Method**: POST

- **Parameters**: Same as `/api/generate`

  **cURL Example**:
  ```bash
  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate-with-report" \
  -H "Content-Type: application/json" \
  -d '{
-   "model_id": "meta-llama/Llama-2-7b-chat-hf",
    "include_inference": true,
    "use_best_practices": true
  }'
  ```

- **Expected Response**: JSON containing the generated AI SBOM, model ID, timestamp, download URL, and enhancement report.
  ```json
  {
    "aibom": { ... },
-   "model_id": "meta-llama/Llama-2-7b-chat-hf",
-   "generated_at": "2025-04-24T20:30:00Z",
-   "request_id": "...",
-   "download_url": "/output/meta-llama_Llama-2-7b-chat-hf_....json",
-   "enhancement_report": {
-     "ai_enhanced": true,
-     "ai_model": "BERT-base-uncased",
-     "original_score": {
-       "total_score": 65.5,
-       "completeness_score": 65.5
      },
-     "final_score": {
-       "total_score": 85.2,
-       "completeness_score": 85.2
      },
-     "improvement": 19.7
    }
  }
  ```

- ### Get Model Score

  **Purpose**: Get the completeness score for a model without generating a full AI SBOM.

@@ -136,24 +306,25 @@ curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate-with-rep

  **Parameters**:
  - `model_id` (path parameter): The Hugging Face model ID
- - `hf_token` (query parameter, optional): Hugging Face API token for accessing private models
  - `use_best_practices` (query parameter, optional): Whether to use industry best practices for scoring (default: true)

  **cURL Example**:
  ```bash
- curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/models/meta-llama/Llama-2-7b-chat-hf/score?use_best_practices=true"
  ```

- **Expected Response**: JSON containing the completeness score information.
  ```json
  {
-   "total_score": 85.2,
    "section_scores": {
-     "required_fields": 20,
-     "metadata": 18.5,
-     "component_basic": 20,
-     "component_model_card": 20.7,
-     "external_references": 6
    },
    "max_scores": {
      "required_fields": 20,
@@ -161,136 +332,205 @@ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/models/meta-llama/
      "component_basic": 20,
      "component_model_card": 30,
      "external_references": 10
-   }
  }
  ```

  ### Download Generated AI SBOM

  **Purpose**: Download a previously generated AI SBOM file.

- **Endpoint**: `/download/{filename}`

  **Method**: GET

- **Parameters**:
- - `filename` (path parameter): The filename of the AI SBOM to download
-
  **cURL Example**:
  ```bash
- curl -X GET "https://aetheris-ai-aibom-generator.hf.space/download/{filename}" \
- -o "downloaded_aibom.json"
  ```

- **Expected Response**: The AI SBOM JSON file will be downloaded to your local machine.

  ### Form-Based Generation (Web UI)

- **Purpose**: Generate an AI SBOM using form data (typically used by the web UI).

  **Endpoint**: `/generate`

  **Method**: POST

  **Parameters**:
- - `model_id` (form field, required): The Hugging Face model ID
- - `include_inference` (form field, optional): Whether to use AI inference to enhance the AI SBOM
- - `use_best_practices` (form field, optional): Whether to use industry best practices for scoring

  **cURL Example**:
  ```bash
  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/generate" \
- -F "model_id=meta-llama/Llama-2-7b-chat-hf" \
- -F "include_inference=true" \
- -F "use_best_practices=true"
  ```

- **Expected Response**: HTML page with the generated AI SBOM results.

  ## Web UI

- The API also provides a web user interface for generating AI SBOMs without writing code:

- **URL**: `https://aetheris-ai-aibom-generator.hf.space/`

- The web UI allows you to:
- 1. Enter a Hugging Face model ID
- 2. Configure generation options
- 3. Generate an AI SBOM
- 4. View the results in a human-friendly format
- 5. Download the generated AI SBOM as a JSON file

- ## Understanding the Field Checklist

- In the Field Checklist tab of the results page, you'll see a list of fields with check marks (✔/✘) and stars (★). Here's what they mean:

- - **Check marks**:
-   - ✔: Field is present in the AI SBOM
-   - ✘: Field is missing from the AI SBOM

- - **Stars** (importance level):
-   - ★★★ (three stars): Critical fields - Essential for a valid and complete AI SBOM
-   - ★★ (two stars): Important fields - Valuable information that enhances completeness
-   - ★ (one star): Supplementary fields - Additional context and details (optional)

- ## Security Features

- The API includes several security features to protect against Denial of Service (DoS) attacks:

- 1. **Rate Limiting**: Limits the number of requests a single IP address can make within a specific time window.

- 2. **Concurrency Limiting**: Restricts the total number of simultaneous requests being processed to prevent resource exhaustion.

- 3. **Request Size Limiting**: Prevents attackers from sending extremely large payloads that could consume memory or processing resources.

- 4. **API Key Authentication** (optional): When configured, requires an API key for accessing API endpoints, enabling tracking and control of API usage.

- 5. **CAPTCHA Verification** (optional): When configured for the web interface, helps ensure requests come from humans rather than bots.

  ## Notes on Using the API

- 1. When deployed on Hugging Face Spaces, use the correct URL format as shown in the examples.
- 2. Some endpoints may have rate limiting or require authentication.
- 3. For large responses, consider adding appropriate timeout settings in your requests.
- 4. If you encounter CORS issues, you may need to add appropriate headers.
- 5. For downloading files, specify the output file name in your client code.

  ## Error Handling

- The API returns standard HTTP status codes:
- - 200: Success
- - 400: Bad Request (invalid parameters)
- - 404: Not Found (resource not found)
- - 429: Too Many Requests (rate limit exceeded)
- - 500: Internal Server Error (server-side error)
- - 503: Service Unavailable (server at capacity)

- Error responses include a detail message explaining the error:
  ```json
  {
-   "detail": "Error generating AI SBOM: Model not found"
  }
  ```

- ## Completeness Score
-
- The completeness score is calculated based on the presence and quality of various fields in the AI SBOM. The score is broken down into sections:
-
- 1. **Required Fields** (20 points): Basic required fields for a valid AI SBOM
- 2. **Metadata** (20 points): Information about the AI SBOM itself
- 3. **Component Basic Info** (20 points): Basic information about the AI model component
- 4. **Model Card** (30 points): Detailed model card information
- 5. **External References** (10 points): Links to external resources
-
- The total score is a weighted sum of these section scores, with a maximum of 100 points.

- ## Enhancement Report

- When AI enhancement is enabled, the API uses an inference model to extract additional information from the model card and other sources. The enhancement report shows:

- 1. **Original Score**: The completeness score before enhancement
- 2. **Enhanced Score**: The completeness score after enhancement
- 3. **Improvement**: The point increase from enhancement
- 4. **AI Model Used**: The model used for enhancement

- This helps you understand how much the AI enhancement improved the AI SBOM's completeness.


  ## Overview

+ The AI SBOM Generator API provides a comprehensive solution for generating CycloneDX-compliant AI Software Bill of Materials (AI SBOM) for Hugging Face models. This API uses a configurable field registry system to extract and score AI SBOM fields across 5 categories, providing detailed completeness assessment and standards compliance.
+
+ ---
+
+ ## Table of Contents
+ - [Base URL](#base-url)
+ - [API Endpoints](#api-endpoints)
+   - [API Status](#api-status)
+   - [Registry Status](#registry-status)
+   - [Generate AI SBOM](#generate-ai-sbom)
+   - [Generate AI SBOM with Completeness Score Report](#generate-ai-sbom-with-completeness-score-report)
+   - [Get Completeness Score Only](#get-completeness-score-only)
+   - [Download Generated AI SBOM](#download-generated-ai-sbom)
+   - [Form-Based Generation (Web UI)](#form-based-generation-web-ui)
+ - [Web UI](#web-ui)
+ - [Security Features](#security-features)
+ - [Field Registry System](#field-registry-system)
+ - [Completeness Score](#completeness-score)
+ - [Notes on Using the API](#notes-on-using-the-api)
+ - [Error Handling](#error-handling)
+ - [Support and Documentation](#support-and-documentation)
+
+ ---

  ## Base URL

  Replace this with your actual deployment URL.

+ ---
+
  ## API Endpoints

+ ### API Status

  **Purpose**: Check if the API is operational and get version information.

  }
  ```

+ ---
+
+ ### Registry Status
+
+ **Purpose**: Check the field registry configuration status and available fields.
+
+ **Endpoint**: `/api/registry/status`
+
+ **Method**: GET
+
+ **cURL Example**:
+ ```bash
+ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/registry/status"
+ ```
+
+ **Expected Response**:
+ ```json
+ {
+   "registry_available": true,
+   "total_fields": 29,
+   "categories": [
+     "required_fields",
+     "metadata",
+     "component_basic",
+     "component_model_card",
+     "external_references"
+   ],
+   "field_count_by_category": {
+     "required_fields": 4,
+     "metadata": 5,
+     "component_basic": 5,
+     "component_model_card": 14,
+     "external_references": 1
+   },
+   "registry_manager_loaded": true
+ }
+ ```
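A minimal shell sketch for checking the registry before sending a batch of requests. `registry_ready` looks for `"registry_available": true` in a status body, and `check_registry` fetches that body with cURL. The `AIBOM_BASE_URL` variable and both function names are hypothetical. The `grep` test assumes the key and its value sit on one line, as in the sample response above. It is a quick check, not a JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; AIBOM_BASE_URL defaults to the public Space used in the examples.
AIBOM_BASE_URL=${AIBOM_BASE_URL:-https://aetheris-ai-aibom-generator.hf.space}

# Succeeds (exit 0) when the JSON text in $1 reports "registry_available": true.
registry_ready() {
  printf '%s' "$1" | grep -Eq '"registry_available"[[:space:]]*:[[:space:]]*true'
}

# Fetch /api/registry/status and report whether the field registry is loaded.
check_registry() {
  local status
  status=$(curl -fsS "$AIBOM_BASE_URL/api/registry/status") || return 1
  if registry_ready "$status"; then
    echo "field registry available"
  else
    echo "field registry unavailable; expect basic field extraction" >&2
    return 1
  fi
}
```

For example, `check_registry && ./generate_batch.sh` (a placeholder script name) stops a batch early when the registry is not loaded.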
+
+ ---
+
+ ### Generate AI SBOM

  **Purpose**: Generate an AI SBOM for a specified Hugging Face model.

  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
+   "model_id": "deepseek-ai/DeepSeek-R1",
    "include_inference": true,
    "use_best_practices": true
  }'

    "aibom": {
      "bomFormat": "CycloneDX",
      "specVersion": "1.6",
+     "serialNumber": "urn:uuid:deepseek-ai-DeepSeek-R1",
+     "version": "1.0.0",
+     "metadata": {
+       "timestamp": "2025-07-15T18:31:18Z",
+       "tools": [
+         {
+           "vendor": "Aetheris AI",
+           "name": "AI SBOM Generator",
+           "version": "1.0.0"
+         }
+       ],
+       "properties": [
+         {
+           "name": "primaryPurpose",
+           "value": "text-generation"
+         },
+         {
+           "name": "suppliedBy",
+           "value": "deepseek-ai"
+         }
+       ]
+     },
+     "components": [
+       {
+         "type": "machine-learning-model",
+         "name": "DeepSeek-R1",
+         "purl": "pkg:huggingface/deepseek-ai/DeepSeek-R1",
+         "description": "Advanced reasoning model with enhanced capabilities",
+         "licenses": [
+           {
+             "license": {
+               "name": "DeepSeek License"
+             }
+           }
+         ],
+         "modelCard": {
+           "limitation": "Model may have limitations in certain domains"
+         }
+       }
+     ],
+     "externalReferences": [
+       {
+         "type": "distribution",
+         "url": "https://huggingface.co/deepseek-ai/DeepSeek-R1"
+       }
+     ]
    },
+   "model_id": "deepseek-ai/DeepSeek-R1",
+   "generated_at": "2025-07-15T18:31:18Z",
+   "request_id": "550e8400-e29b-41d4-a716-446655440000",
+   "download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json"
  }
  ```
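A shell sketch of calling `/api/generate` and keeping only the relative `download_url` from the response. The helper names and `AIBOM_BASE_URL` are hypothetical. The `sed` extraction assumes the key and its string value sit on one line, which holds for the sample response and for compact JSON; prefer `jq` where it is installed. The 180-second timeout is an assumption based on the response times quoted in the notes (up to 2 minutes for large models).

```bash
#!/usr/bin/env bash
AIBOM_BASE_URL=${AIBOM_BASE_URL:-https://aetheris-ai-aibom-generator.hf.space}

# Build the JSON request body. No JSON escaping is done, so MODEL_ID must not
# contain double quotes or backslashes (valid Hugging Face model IDs do not).
generate_payload() {
  local model_id=$1 include_inference=${2:-true} use_best_practices=${3:-true}
  printf '{"model_id": "%s", "include_inference": %s, "use_best_practices": %s}\n' \
    "$model_id" "$include_inference" "$use_best_practices"
}

# Print the first "download_url" string value found in a response body.
extract_download_url() {
  printf '%s\n' "$1" \
    | sed -n 's/.*"download_url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    | head -n 1
}

# POST to /api/generate and print the relative download path (e.g. /output/...json).
generate_aibom() {
  local response
  response=$(curl -fsS --max-time 180 -X POST "$AIBOM_BASE_URL/api/generate" \
    -H "Content-Type: application/json" \
    -d "$(generate_payload "$@")") || return 1
  extract_download_url "$response"
}
```

Usage: `generate_aibom deepseek-ai/DeepSeek-R1` should print a path like the `download_url` shown above.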

+ ---
+
+ ### Generate AI SBOM with Completeness Score Report

+ **Purpose**: Generate an AI SBOM along with a detailed completeness score report.

  **Endpoint**: `/api/generate-with-report`

  **Method**: POST

+ **Parameters**: Same as Generate AI SBOM

  **cURL Example**:
  ```bash
  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/api/generate-with-report" \
  -H "Content-Type: application/json" \
  -d '{
+   "model_id": "deepseek-ai/DeepSeek-R1",
    "include_inference": true,
    "use_best_practices": true
  }'
  ```

+ **Expected Response**: Same as Generate AI SBOM plus completeness score details.
  ```json
  {
    "aibom": { ... },
+   "model_id": "deepseek-ai/DeepSeek-R1",
+   "generated_at": "2025-07-15T18:31:18Z",
+   "request_id": "550e8400-e29b-41d4-a716-446655440000",
+   "download_url": "/output/deepseek-ai_DeepSeek-R1_ai_sbom.json",
+   "completeness_score": {
+     "total_score": 62.3,
+     "section_scores": {
+       "required_fields": 20.0,
+       "metadata": 8.0,
+       "component_basic": 20.0,
+       "component_model_card": 4.3,
+       "external_references": 10.0
      },
+     "max_scores": {
+       "required_fields": 20,
+       "metadata": 20,
+       "component_basic": 20,
+       "component_model_card": 30,
+       "external_references": 10
      },
+     "field_checklist": {
+       "bomFormat": "present",
+       "specVersion": "present",
+       "serialNumber": "present",
+       "version": "present",
+       "primaryPurpose": "present",
+       "suppliedBy": "present",
+       "standardCompliance": "missing",
+       "domain": "missing",
+       "autonomyType": "missing",
+       "name": "present",
+       "type": "present",
+       "purl": "present",
+       "description": "present",
+       "licenses": "present",
+       "energyConsumption": "missing",
+       "hyperparameter": "missing",
+       "limitation": "present",
+       "safetyRiskAssessment": "missing",
+       "typeOfModel": "present",
+       "modelExplainability": "missing",
+       "energyQuantity": "missing",
+       "energyUnit": "missing",
+       "informationAboutTraining": "missing",
+       "informationAboutApplication": "missing",
+       "metric": "missing",
+       "metricDecisionThreshold": "missing",
+       "modelDataPreprocessing": "missing",
+       "useSensitivePersonalInformation": "missing",
+       "downloadLocation": "present"
+     },
+     "category_details": {
+       "required_fields": {
+         "present_fields": 4,
+         "total_fields": 4,
+         "percentage": 100.0
+       },
+       "metadata": {
+         "present_fields": 2,
+         "total_fields": 5,
+         "percentage": 40.0
+       },
+       "component_basic": {
+         "present_fields": 5,
+         "total_fields": 5,
+         "percentage": 100.0
+       },
+       "component_model_card": {
+         "present_fields": 2,
+         "total_fields": 14,
+         "percentage": 14.3
+       },
+       "external_references": {
+         "present_fields": 1,
+         "total_fields": 1,
+         "percentage": 100.0
+       }
+     }
    }
  }
  ```

+ ---
+
+ ### Get Completeness Score Only

  **Purpose**: Get the completeness score for a model without generating a full AI SBOM.

  **Parameters**:
  - `model_id` (path parameter): The Hugging Face model ID
  - `use_best_practices` (query parameter, optional): Whether to use industry best practices for scoring (default: true)
+ - `hf_token` (query parameter, optional): Hugging Face API token for accessing private models

  **cURL Example**:
  ```bash
+ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/api/models/deepseek-ai/DeepSeek-R1/score?use_best_practices=true"
  ```

+ **Expected Response**:
  ```json
  {
+   "model_id": "deepseek-ai/DeepSeek-R1",
+   "total_score": 62.3,
    "section_scores": {
+     "required_fields": 20.0,
+     "metadata": 8.0,
+     "component_basic": 20.0,
+     "component_model_card": 4.3,
+     "external_references": 10.0
    },
    "max_scores": {
      "required_fields": 20,

      "component_basic": 20,
      "component_model_card": 30,
      "external_references": 10
+   },
+   "field_checklist": {
+     "bomFormat": "present",
+     "specVersion": "present",
+     "name": "present",
+     "downloadLocation": "present"
+   },
+   "generated_at": "2025-07-15T18:31:18Z",
+   "request_id": "550e8400-e29b-41d4-a716-446655440000"
  }
  ```
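A shell sketch for the score-only endpoint. `score_url` builds the path and query string exactly as in the cURL example, keeping the slash in the model ID. `fetch_score` adds the optional `hf_token` only when a `HF_TOKEN` environment variable is set. It uses cURL's `-G`/`--data-urlencode` pair, which appends the URL-encoded token to the existing query string. The function names and both variables are hypothetical. Because `hf_token` is a query parameter, the token will appear in the request URL.

```bash
#!/usr/bin/env bash
AIBOM_BASE_URL=${AIBOM_BASE_URL:-https://aetheris-ai-aibom-generator.hf.space}

# score_url MODEL_ID [true|false] -> full URL of the score endpoint
score_url() {
  local model_id=$1 best_practices=${2:-true}
  case $best_practices in
    true|false) ;;
    *) echo "use_best_practices must be true or false" >&2; return 1 ;;
  esac
  printf '%s/api/models/%s/score?use_best_practices=%s\n' \
    "$AIBOM_BASE_URL" "$model_id" "$best_practices"
}

# Fetch the score JSON; pass hf_token only for private models (HF_TOKEN set).
fetch_score() {
  local url
  url=$(score_url "$@") || return 1
  if [ -n "${HF_TOKEN:-}" ]; then
    curl -fsS -G "$url" --data-urlencode "hf_token=$HF_TOKEN"
  else
    curl -fsS "$url"
  fi
}
```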

+ ---
+
  ### Download Generated AI SBOM

  **Purpose**: Download a previously generated AI SBOM file.

+ **Endpoint**: `/output/{filename}`

  **Method**: GET

  **cURL Example**:
  ```bash
+ curl -X GET "https://aetheris-ai-aibom-generator.hf.space/output/deepseek-ai_DeepSeek-R1_ai_sbom.json" \
+ -o "deepseek_r1_aibom.json"
  ```
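A shell sketch that turns the relative `download_url` returned by the generate endpoints into a full URL and downloads it. `curl -f` makes cURL exit non-zero on HTTP errors, such as a file that has passed its retention period, so an error response is not treated as a successful download. `AIBOM_BASE_URL`, `full_download_url` and `download_aibom` are hypothetical names.

```bash
#!/usr/bin/env bash
AIBOM_BASE_URL=${AIBOM_BASE_URL:-https://aetheris-ai-aibom-generator.hf.space}

# Join the base URL (trailing slash tolerated) with a path such as /output/<file>.json.
full_download_url() {
  local path=$1
  case $path in
    /*) ;;               # already rooted
    *) path="/$path" ;;  # tolerate a missing leading slash
  esac
  printf '%s%s\n' "${AIBOM_BASE_URL%/}" "$path"
}

# download_aibom PATH [DEST]; DEST defaults to the file name in PATH.
download_aibom() {
  local path=$1 dest=${2:-$(basename "$1")}
  curl -fsS "$(full_download_url "$path")" -o "$dest"
}
```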

+ ---

  ### Form-Based Generation (Web UI)

+ **Purpose**: Generate an AI SBOM by submitting the web interface form.

  **Endpoint**: `/generate`

  **Method**: POST

+ **Content-Type**: `application/x-www-form-urlencoded`
+
  **Parameters**:
+ - `model_id` (required): The Hugging Face model ID
+ - `g-recaptcha-response` (required): reCAPTCHA response token

  **cURL Example**:
  ```bash
  curl -X POST "https://aetheris-ai-aibom-generator.hf.space/generate" \
+ -H "Content-Type: application/x-www-form-urlencoded" \
+ -d "model_id=deepseek-ai/DeepSeek-R1&g-recaptcha-response=YOUR_RECAPTCHA_TOKEN"
  ```

+ ---

  ## Web UI

+ The API also provides a user-friendly web interface accessible at the base URL. The web UI includes:

+ - **Model ID input field** with validation
+ - **reCAPTCHA protection** against automated abuse
+ - **Real-time generation** with progress indicators
+ - **Downloadable results** with completeness scoring
+ - **Field checklist visualization** showing extraction results
+ - **Category-based scoring breakdown**

+ ---

+ ## Security Features

+ ### Rate Limiting
+ - **10 requests per minute** per IP address
+ - **5 concurrent requests** maximum
+ - **1MB request size limit**

+ ### reCAPTCHA Protection
+ - **Google reCAPTCHA v2** integration for web UI
+ - **Automated bot detection** and prevention
+ - **Configurable through environment variables**

+ ### Input Validation
+ - **Model ID format validation** (alphanumeric, hyphens, underscores, forward slashes)
+ - **XSS protection** through HTML escaping
+ - **SQL injection prevention** through parameterized queries
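A client-side pre-check that mirrors the documented model ID character set (letters, digits, hyphens, underscores, forward slashes), so obviously malformed IDs fail before they cost a rate-limited request. This is a sketch of the documented rule only, and `valid_model_id` is a hypothetical name. The server's exact validation may differ. Many Hugging Face repository names also contain dots, which this documented set does not list.

```bash
#!/usr/bin/env bash
# Succeeds when $1 is non-empty and uses only the characters listed under Input Validation.
valid_model_id() {
  local id=$1
  local re='^[A-Za-z0-9_/-]+$'
  [[ -n $id && $id =~ $re ]]
}
```

For example, `valid_model_id deepseek-ai/DeepSeek-R1 && echo ok` prints `ok`.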

+ ---
+
+ ## Field Registry System
+
+ The AI SBOM Generator is built on a configurable field registry with the following characteristics:
+
+ ### **29 Configurable Fields** across 5 categories:
+ - **Required Fields (4)**: bomFormat, specVersion, serialNumber, version
+ - **Metadata (5)**: primaryPurpose, suppliedBy, standardCompliance, domain, autonomyType
+ - **Component Basic (5)**: name, type, purl, description, licenses
+ - **Component Model Card (14)**: energyConsumption, hyperparameter, limitation, safetyRiskAssessment, typeOfModel, modelExplainability, energyQuantity, energyUnit, informationAboutTraining, informationAboutApplication, metric, metricDecisionThreshold, modelDataPreprocessing, useSensitivePersonalInformation
+ - **External References (1)**: downloadLocation
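The 29-field list above also works as a lookup table. For example, a client could group a `field_checklist` by scoring section. The sketch below encodes that list in a `case` statement. `field_category` is a hypothetical helper, and unknown names print `unknown` and return a non-zero status.

```bash
#!/usr/bin/env bash
# Map a registry field name to its scoring category (per the list above).
field_category() {
  case $1 in
    bomFormat|specVersion|serialNumber|version)
      echo required_fields ;;
    primaryPurpose|suppliedBy|standardCompliance|domain|autonomyType)
      echo metadata ;;
    name|type|purl|description|licenses)
      echo component_basic ;;
    energyConsumption|hyperparameter|limitation|safetyRiskAssessment|typeOfModel)
      echo component_model_card ;;
    modelExplainability|energyQuantity|energyUnit|informationAboutTraining)
      echo component_model_card ;;
    informationAboutApplication|metric|metricDecisionThreshold)
      echo component_model_card ;;
    modelDataPreprocessing|useSensitivePersonalInformation)
      echo component_model_card ;;
    downloadLocation)
      echo external_references ;;
    *)
      echo unknown
      return 1 ;;
  esac
}
```

For example, `field_category limitation` prints `component_model_card`.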

+ ### **Multi-Strategy Extraction**:
+ 1. **Hugging Face API** → Direct metadata extraction (High confidence)
+ 2. **Model Card** → Structured documentation parsing (Medium-high confidence)
+ 3. **Config Files** → Technical details from JSON files (High confidence)
+ 4. **Text Patterns** → Regex extraction from README (Medium confidence)
+ 5. **Intelligent Inference** → Smart defaults from context (Medium confidence)
+ 6. **Fallback Values** → Placeholders when no data is available (Low confidence)
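A hypothetical sketch of how a priority-ordered cascade like this can work. It assumes the numbered order above is the priority order, which the list does not state outright. Each argument is one strategy's result written as `confidence=value`, and an empty value means that strategy found nothing. The first non-empty value wins and is printed together with its confidence. `first_extraction` is an illustrative name, not the generator's code.

```bash
#!/usr/bin/env bash
# first_extraction "conf=value" ... -> prints "value<TAB>confidence" for the first hit.
first_extraction() {
  local result
  for result in "$@"; do
    if [ -n "${result#*=}" ]; then
      printf '%s\t%s\n' "${result#*=}" "${result%%=*}"
      return 0
    fi
  done
  printf '\tnone\n'   # no strategy produced a value
  return 1
}
```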

+ ### **SPDX 3.0 Compatibility**:
+ - **100% field coverage** with the SPDX 3.0 AI Profile specification
+ - **59% exact field name matches** with official SPDX 3.0 fields
+ - **Planned dual-format support** for both CycloneDX and SPDX output
+ - **Current limitation**: SPDX output is not generated yet

+ ---

+ ## Completeness Score
+
+ The completeness score is calculated using a weighted scoring system across five categories:
+
+ ### **Scoring Categories**:
+ - **Required Fields (20%)**: Essential CycloneDX infrastructure
+ - **Metadata (20%)**: AI-specific metadata and provenance
+ - **Component Basic (20%)**: Core component identification
+ - **Component Model Card (30%)**: Advanced AI model documentation
+ - **External References (10%)**: Distribution and reference links
+
+ ### **Field Tiers**:
+ - **Critical (C)**: Essential fields with 3x weight multiplier
+ - **Important (I)**: Valuable fields with 2x weight multiplier
+ - **Supplementary (S)**: Additional fields with 1x weight multiplier
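The sample responses in this document match a simple unweighted rule. Each section earns `max × present / total`, and the total is the sum of the sections: 20.0 + 8.0 + 20.0 + 4.3 + 10.0 = 62.3 for DeepSeek-R1. The sketch below reproduces that arithmetic with `awk`. It does not apply the C/I/S tier multipliers, so treat it as a way to sanity-check responses, not as the service's scoring code. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# section_score MAX PRESENT TOTAL -> MAX * PRESENT / TOTAL, one decimal place
section_score() {
  awk -v max="$1" -v present="$2" -v total="$3" \
    'BEGIN { printf "%.1f\n", (total > 0 ? max * present / total : 0) }'
}

# total_score S1 S2 ... -> sum of the section scores, one decimal place
total_score() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}
```

With the `category_details` from the report example, `section_score 30 2 14` gives `4.3` and `section_score 20 2 5` gives `8.0`.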

+ ### **Score Interpretation**:
+ - **90-100**: Exceptional documentation quality
+ - **80-89**: Comprehensive documentation
+ - **70-79**: Good documentation with minor gaps
+ - **60-69**: Adequate documentation with some missing elements
+ - **50-59**: Basic documentation with significant gaps
+ - **Below 50**: Insufficient documentation
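A small helper that maps a score onto the interpretation bands above. Scores are reported with one decimal place, so this sketch treats each band's lower bound as inclusive and lets fractional values fall into the band below the next threshold (89.5 counts as "80-89"). That boundary handling is an assumption, and `score_band` is a hypothetical name.

```bash
#!/usr/bin/env bash
# score_band SCORE -> interpretation label for a 0-100 completeness score
score_band() {
  awk -v s="$1" 'BEGIN {
    x = s + 0   # force numeric comparison
    if      (x >= 90) print "Exceptional documentation quality"
    else if (x >= 80) print "Comprehensive documentation"
    else if (x >= 70) print "Good documentation with minor gaps"
    else if (x >= 60) print "Adequate documentation with some missing elements"
    else if (x >= 50) print "Basic documentation with significant gaps"
    else              print "Insufficient documentation"
  }'
}
```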

+ ### **Confidence-Based Filtering**:
+ - Only fields extracted with **medium** or **high** confidence contribute to the score
+ - Extractions with **low** or **none** confidence are excluded to keep the score reliable
+ - A failure to extract an individual field does not prevent SBOM generation
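A sketch of the filtering rule. Given `field:confidence` pairs, it prints only the fields whose confidence counts toward the score. Counting the model card's "medium-high" level is an assumption, because the text names only medium and high. `scored_fields` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# scored_fields FIELD:CONFIDENCE ... -> one field name per line for counted fields
scored_fields() {
  local pair
  for pair in "$@"; do
    case ${pair##*:} in
      high|medium|medium-high) printf '%s\n' "${pair%:*}" ;;
      *) ;;  # low, none, or unrecognised confidence: not scored
    esac
  done
}
```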
+
+ ---

  ## Notes on Using the API

+ ### **Model ID Format**
+ - Use the exact Hugging Face model identifier (e.g., `meta-llama/Llama-2-7b-chat-hf`)
+ - Model IDs are case-sensitive
+ - Private models require a valid `hf_token`
+
+ ### **Response Times**
+ - **Simple models**: 5-15 seconds
+ - **Complex models with inference**: 30-60 seconds
+ - **Large models**: Up to 2 minutes
+
+ ### **File Storage**
+ - Generated AI SBOMs are stored for 7 days
+ - Download URLs are valid for the file retention period
+ - Files are automatically cleaned up to manage storage
+
+ ### **Best Practices**
+ - Use `use_best_practices=true` for industry-standard scoring
+ - Include `include_inference=true` for enhanced field extraction
+ - Cache results locally to avoid repeated API calls for the same model
+ - Use the registry status endpoint to verify system configuration
+
+ ---

  ## Error Handling

+ ### **Common HTTP Status Codes**:
+ - **200 OK**: Successful request
+ - **400 Bad Request**: Invalid model ID format or missing parameters
+ - **404 Not Found**: Model not found on Hugging Face
+ - **429 Too Many Requests**: Rate limit exceeded
+ - **500 Internal Server Error**: Server-side processing error

+ ### **Error Response Format**:
  ```json
  {
+   "detail": "Error description",
+   "error_code": "SPECIFIC_ERROR_CODE",
+   "timestamp": "2025-07-15T18:31:18Z"
  }
  ```

+ ### **Common Error Scenarios**:
+ - **Invalid Model ID**: Check the format and that the model exists on Hugging Face
+ - **Private Model Access**: Ensure a valid `hf_token` is provided
+ - **Rate Limiting**: Wait before retrying or implement exponential backoff
+ - **Registry Unavailable**: System falls back to basic field extraction
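A shell sketch of the exponential backoff suggested for rate limiting. The wrapped command must print only the HTTP status code, for example `curl -s -o response.json -w '%{http_code}' ...`. The wrapper retries while that code is 429 and doubles the wait each time. `with_backoff` and the `BACKOFF_BASE_SECONDS` knob are hypothetical. The 6-second default is simply 60 s divided by the documented 10 requests per minute.

```bash
#!/usr/bin/env bash
# with_backoff MAX_ATTEMPTS COMMAND [ARGS...] -> prints the final HTTP status code
with_backoff() {
  local max_attempts=$1; shift
  local delay=${BACKOFF_BASE_SECONDS:-6} attempt=1 status
  while :; do
    status=$("$@")
    if [ "$status" != "429" ] || [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$status"  # success, a non-retryable error, or the last 429
      return 0
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Usage: `with_backoff 4 curl -s -o aibom.json -w '%{http_code}' "$URL"` writes the response body to `aibom.json` and prints the final status code.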
 
 
 
 
 
 
+ ---

+ ## Support and Documentation

+ For additional support, documentation updates, or feature requests:
+ - **GitHub Issues**: [aetheris-ai/aibom-generator issues](https://github.com/aetheris-ai/aibom-generator/issues)
+ - **API Status**: Use the `/status` and `/api/registry/status` endpoints
+ - **Web Interface**: Available at the base URL for interactive testing

+ This API generates CycloneDX 1.6 AI SBOMs for Hugging Face models, covers 29 registry fields mapped to the SPDX 3.0 AI Profile, and scores completeness with a configurable, tier-weighted system.

docs/AI_SBOM_Fields_Mapping_Reference.md ADDED
@@ -0,0 +1,204 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # AI SBOM Fields Mapping Reference
2
+
3
+ ## Table of Contents
4
+
5
+ - [Overview](#overview)
6
+ - [Legend](#legend)
7
+ - [Field Categories](#field-categories)
8
+ - [Required Fields Category](#required-fields-category)
9
+ - [Metadata Category](#metadata-category)
10
+ - [Component Basic Category](#component-basic-category)
11
+ - [Component Model Card Category](#component-model-card-category)
12
+ - [External References Category](#external-references-category)
13
+ - [Scoring Summary](#scoring-summary)
14
+ - [Field Extraction Strategies](#field-extraction-strategies)
15
+ - [Standards Compatibility](#standards-compatibility)
16
+ - [Usage Notes](#usage-notes)
17
+
18
+ ---
19
+
20
+ ## Overview
21
+
22
+ This document provides a comprehensive mapping of all 29 fields used in the AI SBOM Generator, organized by category to match the UI structure. Each field includes its CycloneDX 1.6 location, scoring weight, tier classification, and description. SPDX 3.0 compatibility information is included for reference.
23
+
24
+ The AI SBOM Generator uses a configurable field registry to extract, validate, and score AI model documentation across multiple sources, providing comprehensive Bill of Materials for AI systems.
25
+
26
+ ---
27
+
28
+ ## Legend
29
+
30
+ ### Tiers
31
+ - **C**: Critical - Essential fields (weight: 3x, 4.0-10.0 points)
32
+ - **I**: Important - Valuable fields (weight: 2x, 2.0-3.0 points)
33
+ - **S**: Supplementary - Additional fields (weight: 1x, 1.0-2.0 points)
34
+
35
+ ### SPDX 3.0 Alignment Status (AS)
36
+ - **🎯**: Exact Match - Matched field name and type
37
+ - **βœ…**: Standard Field - Core SPDX compatibility
38
+ - **πŸ”„**: Semantic Match - Same concept, different name
39
+
40
+ ---
41
+
42
+ ## Field Categories
43
+
44
+ ### Required Fields Category
45
+
46
+ Essential CycloneDX infrastructure fields that form the foundation of every SBOM document. These fields are mandatory for proper SBOM identification and compliance.
47
+
48
+ | # | Field Name | CycloneDX Location | SPDX 3.0 Equivalent | Tier | AS | Points | Description |
49
+ |---|------------|-------------------|---------------------|------|--------|--------|-------------|
50
+ | 1 | **bomFormat** | `$.bomFormat` | Core SPDX field | C | βœ… | 4.0 | Format identifier for the SBOM (always "CycloneDX") |
51
+ | 2 | **specVersion** | `$.specVersion` | `spdxVersion` | C | βœ… | 4.0 | CycloneDX specification version (e.g., "1.6") |
52
+ | 3 | **serialNumber** | `$.serialNumber` | `spdxId` | C | βœ… | 4.0 | Unique identifier for this SBOM instance |
53
+ | 4 | **version** | `$.version` | `releaseTime` | C | βœ… | 4.0 | Version of this SBOM document |
54
+
55
+ **Category Result:** 4/4 fields β€’ **20.0/20 points** β€’ **100% weight**
56
+
57
+ ---
58
+
59
+ ### Metadata Category
60
+
61
+ AI-specific metadata and provenance information that provides context about the model's purpose, supply chain, and compliance. These fields help establish the model's intended use and regulatory context.
62
+
63
+ | # | Field Name | CycloneDX Location | SPDX 3.0 Equivalent | Tier | AS | Points | Description |
64
+ |---|------------|-------------------|---------------------|------|--------|--------|-------------|
65
+ | 5 | **primaryPurpose** | `$.metadata.properties[name="primaryPurpose"]` | `ai_intendedUse` | C | πŸ”„ | 4.0 | Primary intended use of the AI model |
66
+ | 6 | **suppliedBy** | `$.metadata.properties[name="suppliedBy"]` | `supplier` | C | βœ… | 4.0 | Organization or individual who supplied the model |
67
+ | 7 | **standardCompliance** | `$.metadata.properties[name="standardCompliance"]` | `ai_standardCompliance` | S | 🎯 | 1.0 | Compliance with AI/ML standards and regulations |
68
+ | 8 | **domain** | `$.metadata.properties[name="domain"]` | `ai_domain` | S | 🎯 | 1.0 | Application domain or industry vertical |
69
+ | 9 | **autonomyType** | `$.metadata.properties[name="autonomyType"]` | `ai_autonomyType` | S | 🎯 | 1.0 | Level of autonomy in decision-making |
70
+
71
+ **Category Result:** 5/5 fields β€’ **20.0/20 points** β€’ **100% weight**
72
+
+ ---
+
+ ### Component Basic Category
+
+ Core component identification and description fields that define the essential characteristics of the AI model. These fields provide fundamental information needed for model identification and basic documentation.
+
+ | # | Field Name | CycloneDX Location | SPDX 3.0 Equivalent | Tier | AS | Points | Description |
+ |---|------------|-------------------|---------------------|------|--------|--------|-------------|
+ | 10 | **name** | `$.components[0].name` | `name` | C | ✅ | 4.0 | Human-readable name of the model |
+ | 11 | **type** | `$.components[0].type` | `ai_AIPackage` type | I | ✅ | 2.0 | Component type (always "machine-learning-model") |
+ | 12 | **purl** | `$.components[0].purl` | `externalRefs[type="purl"]` | I | ✅ | 2.0 | Package URL for unique identification |
+ | 13 | **description** | `$.components[0].description` | `summary` | I | ✅ | 2.0 | Brief description of the model's purpose |
+ | 14 | **licenses** | `$.components[0].licenses` | `licenseConcluded` | I | ✅ | 2.0 | License information for the model |
+
+ **Category Result:** 5/5 fields • **20.0/20 points** • **100% weight**
+
+ ---
+
+ ### Component Model Card Category
+
+ Advanced AI model documentation fields that provide detailed information about model characteristics, training, performance, and usage considerations. This category represents the most comprehensive AI-specific documentation.
+
+ | # | Field Name | CycloneDX Location | SPDX 3.0 Equivalent | Tier | AS | Points | Description |
+ |---|------------|-------------------|---------------------|------|--------|--------|-------------|
+ | 15 | **energyConsumption** | `$.components[0].modelCard.properties[name="energyConsumption"]` | `ai_energyConsumption` | I | 🎯 | 2.0 | Energy consumption information |
+ | 16 | **hyperparameter** | `$.components[0].modelCard.properties[name="hyperparameter"]` | `ai_hyperparameter` | I | 🎯 | 2.0 | Key hyperparameters used in training |
+ | 17 | **limitation** | `$.components[0].modelCard.limitation` | `ai_limitation` | I | 🎯 | 2.0 | Known limitations and constraints |
+ | 18 | **safetyRiskAssessment** | `$.components[0].modelCard.properties[name="safetyRiskAssessment"]` | `ai_safetyRiskAssessment` | I | 🎯 | 2.0 | Safety and risk assessment information |
+ | 19 | **typeOfModel** | `$.metadata.properties[name="typeOfModel"]` | `ai_typeOfModel` | I | 🎯 | 2.0 | Technical classification of the model type |
+ | 20 | **modelExplainability** | `$.components[0].modelCard.properties[name="modelExplainability"]` | `ai_modelExplainability` | S | 🎯 | 1.0 | Information about model interpretability |
+ | 21 | **energyQuantity** | `$.components[0].modelCard.properties[name="energyQuantity"]` | `ai_energyQuantity` | S | 🎯 | 1.0 | Quantitative energy consumption metrics |
+ | 22 | **energyUnit** | `$.components[0].modelCard.properties[name="energyUnit"]` | `ai_energyUnit` | S | 🎯 | 1.0 | Units for energy consumption measurements |
+ | 23 | **informationAboutTraining** | `$.components[0].modelCard.properties[name="informationAboutTraining"]` | `ai_informationAboutTraining` | S | 🎯 | 1.0 | Details about the training process |
+ | 24 | **informationAboutApplication** | `$.components[0].modelCard.properties[name="informationAboutApplication"]` | `ai_informationAboutApplication` | S | 🎯 | 1.0 | Information about intended applications |
+ | 25 | **metric** | `$.components[0].modelCard.properties[name="metric"]` | `ai_metric` | S | 🎯 | 1.0 | Performance metrics and evaluation results |
+ | 26 | **metricDecisionThreshold** | `$.components[0].modelCard.properties[name="metricDecisionThreshold"]` | `ai_metricDecisionThreshold` | S | 🎯 | 1.0 | Decision thresholds for model outputs |
+ | 27 | **modelDataPreprocessing** | `$.components[0].modelCard.properties[name="modelDataPreprocessing"]` | `ai_modelDataPreprocessing` | S | 🎯 | 1.0 | Data preprocessing and preparation steps |
+ | 28 | **useSensitivePersonalInformation** | `$.components[0].modelCard.properties[name="useSensitivePersonalInformation"]` | `ai_useSensitivePersonalInformation` | S | 🎯 | 1.0 | Information about sensitive data usage |
+
+ **Category Result:** 14/14 fields • **30.0/30 points** • **100% weight**
+
+ ---
+
+ ### External References Category
+
+ Links and distribution information that provide access to the model and related resources. This category's single field enables model discovery and access.
+
+ | # | Field Name | CycloneDX Location | SPDX 3.0 Equivalent | Tier | AS | Points | Description |
+ |---|------------|-------------------|---------------------|------|--------|--------|-------------|
+ | 29 | **downloadLocation** | `$.externalReferences[type="distribution"]` | `downloadLocation` | C | ✅ | 10.0 | Primary location to download the model |
+
+ **Category Result:** 1/1 field • **10.0/10 points** • **100% weight**
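A similar sketch for the External References category, assuming `downloadLocation` is emitted as a top-level `externalReferences` entry whose `type` is `"distribution"`, as the table states; `distribution_urls` and the example URLs are illustrative only:

```python
from typing import Any


def distribution_urls(bom: dict[str, Any]) -> list[str]:
    """Collect the URLs of top-level external references of type "distribution"."""
    return [
        ref["url"]
        for ref in bom.get("externalReferences", [])
        if ref.get("type") == "distribution" and "url" in ref
    ]


bom = {
    "externalReferences": [
        {"type": "website", "url": "https://example.org/about"},
        {"type": "distribution", "url": "https://huggingface.co/example-org/example-model"},
    ]
}

urls = distribution_urls(bom)
# The category's single field counts as present if at least one URL was found.
has_download_location = bool(urls)
print(urls, has_download_location)
```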
+
+ ---
+
+ ## Scoring Summary
+
+ The AI SBOM Generator uses a weighted scoring system to assess documentation completeness across five categories:
+
+ | Category | Fields | Max Points | Weight | Description |
+ |----------|--------|------------|--------|-------------|
+ | **Required Fields** | 4 | 20.0 | 20% | Essential CycloneDX infrastructure |
+ | **Metadata** | 5 | 20.0 | 20% | AI-specific metadata and provenance |
+ | **Component Basic** | 5 | 20.0 | 20% | Core component identification |
+ | **Component Model Card** | 14 | 30.0 | 30% | Advanced AI model documentation |
+ | **External References** | 1 | 10.0 | 10% | Distribution and reference links |
+ | **TOTAL** | **29** | **100.0** | **100%** | Maximum possible completeness score |
+
+ ### Tier Impact on Scoring
+ - **Critical fields** (C) have a 3x weight multiplier and have the largest impact on the score
+ - **Important fields** (I) have a 2x weight multiplier and enhance documentation quality
+ - **Supplementary fields** (S) have a 1x weight multiplier and provide additional context
+
+ ---
+
+ ## Field Extraction Strategies
+
+ The AI SBOM Generator uses a multi-strategy extraction approach for each field, trying the strategies in the following priority order:
+
+ 1. **HuggingFace API** → Direct metadata extraction (High confidence)
+ 2. **Model Card** → Structured documentation parsing (Medium-high confidence)
+ 3. **Config Files** → Technical details from JSON files (High confidence)
+ 4. **Text Patterns** → Regex extraction from README (Medium confidence)
+ 5. **Intelligent Inference** → Smart defaults from context (Medium confidence)
+ 6. **Fallback Values** → Placeholders when no data is available (Low/no confidence)
+
+ This approach maximizes field coverage while recording a confidence level for each extracted value.
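The priority order above behaves like a fallback chain: the first strategy that yields a value wins, and its source determines the recorded confidence. A minimal Python sketch of that idea; the `Extraction` record, the strategy tuples, the confidence labels, and the `"NOASSERTION"` placeholder are assumptions for illustration, and only three of the six strategies are shown:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Extraction:
    value: Any
    source: str
    confidence: str


# A strategy receives the gathered model data and returns a value, or None on failure.
Strategy = Callable[[dict], Optional[Any]]


def extract_field(data: dict, strategies: list[tuple[str, str, Strategy]], fallback: Any = None) -> Extraction:
    """Try (source, confidence, strategy) entries in priority order; first non-None value wins."""
    for source, confidence, strategy in strategies:
        try:
            value = strategy(data)
        except Exception:
            value = None  # a failing strategy must not abort extraction of this field
        if value is not None:
            return Extraction(value, source, confidence)
    # Nothing found: keep the SBOM structure intact with a low-confidence placeholder.
    return Extraction(fallback, "fallback", "low")


# Hypothetical strategies for a "license" field, ordered like the list above.
strategies = [
    ("huggingface_api", "high", lambda d: d.get("api", {}).get("license")),
    ("model_card", "medium-high", lambda d: d.get("card", {}).get("license")),
    ("text_pattern", "medium",
     lambda d: "apache-2.0" if "Apache License 2.0" in d.get("readme", "") else None),
]

result = extract_field({"api": {}, "card": {"license": "mit"}}, strategies, fallback="NOASSERTION")
print(result)  # Extraction(value='mit', source='model_card', confidence='medium-high')
```

Catching exceptions per strategy mirrors the graceful degradation described under Usage Notes: one broken source only moves extraction on to the next strategy.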
+
+ ---
+
+ ## Standards Compatibility
+
+ ### CycloneDX 1.6 (Primary Format)
+ - **Primary structure** follows the CycloneDX 1.6 specification
+ - **Model Card extension** provides AI-specific documentation
+ - **Properties mechanism** allows flexible field addition
+ - **JSON Schema validation** ensures structural compliance
+
+ ### SPDX 3.0 AI Profile (Reference Compatibility)
+ - **Full field coverage**: every one of the 29 fields has an SPDX 3.0 equivalent, as listed in the tables above
+ - **17/29 fields (59%)** have exact field name matches
+ - **Compatible data types** aligned with the SPDX type system
+ - **Future dual-format support**: the mapping lays the groundwork for SPDX 3.0 output
+
+ ### Interoperability
+ - **Standards-compliant output** can be converted between formats
+ - **AI field preservation** maintains semantic meaning across standards
+ - **Tool compatibility** with both CycloneDX and SPDX ecosystems
+
+ ---
+
+ ## Usage Notes
+
+ ### Configuration and Customization
+ - **Registry-driven extraction**: All fields are configurable via the JSON field registry
+ - **Scoring weights**: Adjustable per field and category
+ - **Tier assignments**: Customizable based on use-case requirements
+ - **Extraction strategies**: Priority order and methods are configurable
+
+ ### Field Addition and Modification
+ - **New fields**: Can be added to the registry without code changes
+ - **Weight adjustments**: Modify scoring impact through configuration
+ - **Category organization**: Fields can be reorganized by category
+ - **Validation rules**: Configurable per field
+
+ ### Performance Characteristics
+ - **Automatic field discovery**: The system attempts extraction for every field in the registry
+ - **Graceful degradation**: Individual field failures don't stop overall extraction
+ - **Confidence scoring**: Each field extraction includes a confidence assessment
+ - **Comprehensive logging**: Detailed extraction results for debugging
+
+ This field mapping is the definitive reference for the AI SBOM Generator's field extraction, scoring, and documentation capabilities, and its SPDX 3.0 equivalents support future interoperability.
docs/AI_SBOM_Generator_System_Architecture.md ADDED
@@ -0,0 +1,423 @@
+ # AI SBOM Generator System Architecture
+
+ ## Overview
+
+ The AI SBOM Generator is a configurable system that automatically generates Software Bill of Materials (SBOM) documents for AI models hosted on HuggingFace. It uses a registry-driven architecture: fields are defined in configuration, so they can be added or changed without code changes.
+
+ ## System Architecture
+
+ ### Core Components
+
+ ```
+ ┌────────────────────────────────────────────────────────────┐
+ │                     AI SBOM Generator                      │
+ ├────────────────────────────────────────────────────────────┤
+ │  Web Interface (FastAPI + HTML Templates)                  │
+ ├────────────────────────────────────────────────────────────┤
+ │  API Layer                                                 │
+ │  ├── Generation Endpoints                                  │
+ │  ├── Scoring Endpoints                                     │
+ │  └── Batch Processing                                      │
+ ├────────────────────────────────────────────────────────────┤
+ │  Core Generation Engine                                    │
+ │  ├── AIBOMGenerator (generator.py)                         │
+ │  ├── Enhanced Extractor (enhanced_extractor.py)            │
+ │  └── Field Registry Manager (field_registry_manager.py)    │
+ ├────────────────────────────────────────────────────────────┤
+ │  Configuration Layer                                       │
+ │  ├── Field Registry (field_registry.json)                  │
+ │  ├── Scoring Configuration                                 │
+ │  └── AIBOM Generation Rules                                │
+ ├────────────────────────────────────────────────────────────┤
+ │  Data Sources                                              │
+ │  ├── HuggingFace API                                       │
+ │  ├── Model Cards                                           │
+ │  ├── Configuration Files                                   │
+ │  └── README Content                                        │
+ └────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Key Features
+
+ - **Registry-Driven Configuration**: All fields and scoring rules are defined in JSON
+ - **Multi-Strategy Extraction**: Up to six extraction strategies per field, tried in priority order
+ - **Standards Compliance**: CycloneDX 1.6 compatible output
+ - **Configurable Scoring**: Weighted scoring system with tier-based multipliers
+ - **Automatic Field Discovery**: New fields added to the registry are processed automatically
+ - **Comprehensive Logging**: Detailed extraction and scoring logs for debugging
+
+ ## Process Workflow
+
+ ### 1. System Initialization
+
+ ```
+ System Initialization Process:
+
+ ┌───────────────────┐
+ │  System Startup   │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │    Load Field     │
+ │     Registry      │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │    Initialize     │
+ │ Registry Manager  │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │   Load Scoring    │
+ │   Configuration   │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │    Initialize     │
+ │     Enhanced      │
+ │     Extractor     │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │   System Ready    │
+ └───────────────────┘
+ ```
+
+ **Steps:**
+ 1. **Load Field Registry**: Read `field_registry.json`, which contains all field definitions
+ 2. **Initialize Registry Manager**: Create the manager instance with the loaded configuration
+ 3. **Load Scoring Configuration**: Parse scoring weights, tiers, and category definitions
+ 4. **Initialize Enhanced Extractor**: Create the extractor with registry-driven field discovery
+ 5. **System Ready**: All components are initialized and ready for SBOM generation
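Steps 1 to 3 amount to parsing the registry and indexing its fields. A minimal Python sketch, assuming a hypothetical registry layout with a top-level `fields` object whose entries carry `category` and `tier` keys; the actual schema of `field_registry.json` is not documented here and may differ, and `load_registry` / `fields_by_category` are illustrative names, not the generator's API:

```python
import json
from collections import defaultdict

# Hypothetical registry layout; the real field_registry.json schema may differ.
REGISTRY_TEXT = """
{
  "fields": {
    "primaryPurpose":   {"category": "metadata", "tier": "critical"},
    "domain":           {"category": "metadata", "tier": "supplementary"},
    "downloadLocation": {"category": "external_references", "tier": "critical"}
  },
  "scoring": {"category_weights": {"metadata": 20, "external_references": 10}}
}
"""


def load_registry(text: str) -> dict:
    """Step 1: parse the registry and fail fast if the field section is missing."""
    registry = json.loads(text)
    if "fields" not in registry:
        raise ValueError("registry has no 'fields' section")
    return registry


def fields_by_category(registry: dict) -> dict[str, list[str]]:
    """Index field names by category so extraction and scoring can iterate over them."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for name, spec in registry["fields"].items():
        grouped[spec["category"]].append(name)
    return dict(grouped)


registry = load_registry(REGISTRY_TEXT)
print(fields_by_category(registry))
# {'metadata': ['primaryPurpose', 'domain'], 'external_references': ['downloadLocation']}
```

In practice the text would come from the file itself, for example `load_registry(Path("field_registry.json").read_text())` with `pathlib.Path`. Because extraction iterates over whatever the registry lists, a new entry is picked up without code changes, which is the "automatic field discovery" property described above.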
+
+ ### 2. SBOM Generation Process
+
+ ```
+ SBOM Generation Workflow:
+
+             User Request
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │         Validate Model ID         │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │         Fetch Model Info          │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │   Initialize Enhanced Extractor   │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │    Registry-Driven Extraction     │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │  Multi-Strategy Field Processing  │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │     Generate AIBOM Structure      │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │   Calculate Completeness Score    │
+ └─────────────────┬─────────────────┘
+                   │
+                   ▼
+ ┌───────────────────────────────────┐
+ │        Return SBOM + Score        │
+ └───────────────────────────────────┘
+ ```
+
+ #### 2.1 Model Information Gathering
+
+ **Input**: HuggingFace model ID (e.g., `deepseek-ai/DeepSeek-R1`)
+
+ **Process**:
+ 1. **Validate Model ID**: Check format and accessibility
+ 2. **Fetch Model Info**: Retrieve metadata from the HuggingFace API
+ 3. **Download Model Card**: Get structured model documentation
+ 4. **Fetch Configuration Files**: Download `config.json` and `tokenizer_config.json`
+ 5. **Extract README Content**: Parse the model description and documentation
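Steps 1 to 5 can be sketched as a format check followed by building the Hub URLs to fetch. The `https://huggingface.co/api/models/{id}` metadata endpoint and the `/resolve/main/<file>` raw-file URLs are standard Hugging Face Hub URLs; the regular expression is a simplified assumption (the Hub's real repo-ID rules are stricter, and historically some models such as `gpt2` were addressed without a namespace), and `validate_model_id` / `source_urls` are hypothetical names. Network calls are left out so the sketch stays self-contained.

```python
import re

HUB = "https://huggingface.co"
# Simplified "namespace/name" pattern; the Hub's own repo-ID rules are stricter.
MODEL_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]*/[A-Za-z0-9][A-Za-z0-9_.-]*")


def validate_model_id(model_id: str) -> bool:
    """Format check only; accessibility needs an actual request to the Hub."""
    return MODEL_ID_RE.fullmatch(model_id) is not None and ".." not in model_id


def source_urls(model_id: str) -> dict[str, str]:
    """URLs for the data sources fetched in steps 2-5."""
    if not validate_model_id(model_id):
        raise ValueError(f"invalid model ID: {model_id!r}")
    files = f"{HUB}/{model_id}/resolve/main"
    return {
        "model_info": f"{HUB}/api/models/{model_id}",
        "model_card": f"{files}/README.md",  # the Hub stores the model card in README.md
        "config": f"{files}/config.json",
        "tokenizer_config": f"{files}/tokenizer_config.json",
    }


urls = source_urls("deepseek-ai/DeepSeek-R1")
print(urls["config"])
```

Accessibility (for example gated or private repositories) can only be confirmed by actually requesting the `model_info` URL, e.g. with `urllib.request.urlopen`.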
+
+ #### 2.2 Registry-Driven Field Extraction
+
+ **For each of the 29 registry fields:**
+
+ ```
+ Multi-Strategy Field Extraction:
+
+  Field from Registry
+           │
+           ▼
+ ┌───────────────────┐
+ │    Strategy 1:    │── success ──┐
+ │  HuggingFace API  │             │
+ └─────────┬─────────┘             │
+           │ failure               │
+           ▼                       │
+ ┌───────────────────┐             │
+ │    Strategy 2:    │─────────────┤
+ │    Model Card     │             │
+ └─────────┬─────────┘             │
+           │ failure               │
+           ▼                       │
+ ┌───────────────────┐             │
+ │    Strategy 3:    │─────────────┤
+ │   Config Files    │             │
+ └─────────┬─────────┘             │
+           │ failure               │
+           ▼                       │
+ ┌───────────────────┐             │
+ │    Strategy 4:    │─────────────┤
+ │   Text Patterns   │             │
+ └─────────┬─────────┘             │
+           │ failure               │
+           ▼                       │
+ ┌───────────────────┐             │
+ │    Strategy 5:    │─────────────┤
+ │    Intelligent    │             │
+ │     Inference     │             │
+ └─────────┬─────────┘             │
+           │ failure               │
+           ▼                       │
+ ┌───────────────────┐             │
+ │    Strategy 6:    │             │
+ │  Fallback Value   │             │
+ └─────────┬─────────┘             │
+           │                       │
+           ▼                       │
+ ┌───────────────────┐             │
+ │  Store Result &   │◀────────────┘
+ │    Log Outcome    │
+ └───────────────────┘
+ ```
+
+ **Extraction Strategies**:
+
+ 1. **HuggingFace API Extraction**
+    - Direct field mapping from the API response
+    - High confidence, structured data
+    - Fields: `name`, `author`, `license`, `tags`, etc.
+
+ 2. **Model Card Extraction**
+    - Parse structured model card YAML/metadata
+    - Medium-high confidence
+    - Fields: `limitation`, `metrics`, `datasets`, etc.
+
+ 3. **Configuration File Extraction**
+    - Mine technical details from config files
+    - High confidence for technical fields
+    - Fields: `typeOfModel`, `hyperparameter`, etc.
+
+ 4. **Text Pattern Extraction**
+    - Regex-based extraction from README content
+    - Medium confidence, requires validation
+    - Fields: `safetyRiskAssessment`, `informationAboutTraining`, etc.
+
+ 5. **Intelligent Inference**
+    - Smart defaults based on model characteristics
+    - Medium confidence, contextual
+    - Fields: `primaryPurpose`, `domain`, etc.
+
+ 6. **Fallback Values**
+    - Placeholder values when no data is available
+    - Low/no confidence, maintains structure
+    - Ensures a complete SBOM structure
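Strategy 3 is the easiest to picture concretely. The sketch below maps a few keys that commonly appear in a transformers-style `config.json` (`model_type`, `architectures`, `hidden_size`, and so on) onto the `typeOfModel` and `hyperparameter` fields. Which keys the generator actually reads, and how it formats the results, are assumptions here; `extract_from_config` is a hypothetical helper.

```python
import json

# Keys that commonly appear in transformers-style config.json files (assumed mapping).
HYPERPARAMETER_KEYS = ("hidden_size", "num_hidden_layers", "num_attention_heads", "vocab_size")


def extract_from_config(config: dict) -> dict:
    """Derive typeOfModel and hyperparameter values from a parsed config.json."""
    result: dict = {}
    architectures = config.get("architectures") or []
    # Prefer the short model_type; fall back to the first architecture class name.
    model_type = config.get("model_type") or (architectures[0] if architectures else None)
    if model_type:
        result["typeOfModel"] = model_type
    hyper = {key: config[key] for key in HYPERPARAMETER_KEYS if key in config}
    if hyper:
        result["hyperparameter"] = hyper
    return result


config = json.loads(
    '{"model_type": "llama", "architectures": ["LlamaForCausalLM"],'
    ' "hidden_size": 4096, "num_hidden_layers": 32, "vocab_size": 32000}'
)
print(extract_from_config(config))
# {'typeOfModel': 'llama', 'hyperparameter': {'hidden_size': 4096, 'num_hidden_layers': 32, 'vocab_size': 32000}}
```

Returning an empty dict when no keys match lets the caller fall through to the next strategy instead of recording a misleading value.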
+
+ #### 2.3 AIBOM Structure Generation
+
+ **Process**:
+ 1. **Create Base Structure**: Initialize a CycloneDX 1.6 compliant structure
+ 2. **Populate Metadata Section**: Add extracted metadata fields
+ 3. **Build Component Section**: Create the model component with extracted data
+ 4. **Add Model Card**: Include AI-specific model card information
+ 5. **Generate External References**: Add distribution and repository links
+ 6. **Create Dependencies**: Define model dependencies and relationships
+ 7. **Validate Structure**: Ensure CycloneDX compliance
+
+ **Output Structure**:
+ ```json
+ {
+   "bomFormat": "CycloneDX",
+   "specVersion": "1.6",
+   "serialNumber": "urn:uuid:...",
+   "version": 1,
+   "metadata": {
+     "timestamp": "...",
+     "tools": [...],
+     "component": {...},
+     "properties": [...]
+   },
+   "components": [{
+     "type": "machine-learning-model",
+     "name": "...",
+     "modelCard": {...},
+     "properties": [...]
+   }],
+   "externalReferences": [...],
+   "dependencies": [...]
+ }
+ ```
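A minimal sketch of step 1 (Create Base Structure) producing the skeleton shown above, using only the standard library. The CycloneDX schema requires `serialNumber` to be a `urn:uuid:` URN, hence `uuid.uuid4()`. The `base_aibom` name, and the choice to leave `tools` and `metadata.component` for later steps, are illustrative assumptions rather than the generator's code.

```python
import uuid
from datetime import datetime, timezone


def base_aibom(model_name: str) -> dict:
    """Return an empty CycloneDX 1.6 skeleton with one machine-learning-model component."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",  # unique per generated document
        "version": 1,
        "metadata": {
            # UTC timestamp in ISO 8601 form, e.g. 2025-04-24T20:30:00Z
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "properties": [],  # filled in step 2
        },
        "components": [
            {
                "type": "machine-learning-model",
                "name": model_name,
                "modelCard": {},  # filled in step 4
                "properties": [],
            }
        ],
        "externalReferences": [],  # filled in step 5
        "dependencies": [],  # filled in step 6
    }


bom = base_aibom("DeepSeek-R1")
print(bom["serialNumber"].startswith("urn:uuid:"))  # True
```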
+
+ ### 3. Completeness Scoring Process
+
+ ```
+ Completeness Scoring Process:
+
+ ┌───────────────────┐
+ │ Extracted Fields  │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │    Categorize     │
+ │      Fields       │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │    Apply Tier     │
+ │      Weights      │
+ │ • Critical: 3x    │
+ │ • Important: 2x   │
+ │ • Supplement: 1x  │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │     Calculate     │
+ │  Category Scores  │
+ │ • Required: 20    │
+ │ • Metadata: 20    │
+ │ • Basic: 20       │
+ │ • ModelCard: 30   │
+ │ • ExtRefs: 10     │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │   Sum Weighted    │
+ │      Scores       │
+ │    (Max: 100)     │
+ └─────────┬─────────┘
+           │
+           ▼
+ ┌───────────────────┐
+ │  Generate Score   │
+ │      Report       │
+ └───────────────────┘
+ ```
+
+ **Scoring Algorithm**:
+
+ 1. **Field Categorization**: Group fields by category (required_fields, metadata, etc.)
+ 2. **Tier Weight Application**: Apply multipliers (Critical: 3x, Important: 2x, Supplementary: 1x)
+ 3. **Category Score Calculation**: `(Fields Present / Total Fields) × Category Weight`
+ 4. **Final Score**: Sum of all category scores (max 100)
+
+ **Category Weights**:
+ - Required Fields: 20 points
+ - Metadata: 20 points
+ - Component Basic: 20 points
+ - Component Model Card: 30 points
+ - External References: 10 points
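The formula in step 3 does not say exactly how the tier multipliers from step 2 enter it. One plausible reading, used in this sketch as an explicit assumption, is a weighted fraction: each field contributes its multiplier (3/2/1) and the category earns `(weighted present / weighted total) × category weight`. When all fields in a category share a tier, this reduces to the documented `(Fields Present / Total Fields) × Category Weight`. The names `category_score` and `completeness` are hypothetical.

```python
# Category maxima from the Category Weights list; tier multipliers from step 2.
CATEGORY_MAX = {
    "required_fields": 20.0,
    "metadata": 20.0,
    "component_basic": 20.0,
    "component_model_card": 30.0,
    "external_references": 10.0,
}
TIER_MULTIPLIER = {"critical": 3, "important": 2, "supplementary": 1}

# field name -> (tier, present?)
FieldStatus = dict[str, tuple[str, bool]]


def category_score(fields: FieldStatus, max_points: float) -> float:
    """Weighted share of max_points earned by the fields that are present (assumed formula)."""
    total = sum(TIER_MULTIPLIER[tier] for tier, _ in fields.values())
    earned = sum(TIER_MULTIPLIER[tier] for tier, present in fields.values() if present)
    return max_points * earned / total if total else 0.0


def completeness(categories: dict[str, FieldStatus]) -> float:
    """Sum of category scores, rounded to one decimal (max 100)."""
    return round(sum(category_score(f, CATEGORY_MAX[c]) for c, f in categories.items()), 1)


# Metadata category with tiers from the field mapping; two fields missing.
metadata: FieldStatus = {
    "primaryPurpose": ("critical", True),
    "suppliedBy": ("critical", True),
    "standardCompliance": ("supplementary", False),
    "domain": ("supplementary", True),
    "autonomyType": ("supplementary", False),
}
print(completeness({"metadata": metadata}))  # 15.6  (20 * 7/9)
```

With these Metadata fields the weighted reading gives about 15.6 of 20 points, whereas the unweighted formula gives 3/5 × 20 = 12.0. Comparing both against the generator's reported scores is a quick way to check which interpretation it implements.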
+
+ ### 4. Output Generation
+
+ **Generated Artifacts**:
+ 1. **AIBOM JSON**: CycloneDX 1.6 compliant SBOM document
+ 2. **Completeness Score**: Numerical score (0-100) with breakdown
+ 3. **Field Checklist**: Detailed field-by-field analysis
+ 4. **Extraction Report**: Confidence levels and data sources
+ 5. **Validation Results**: Compliance and quality checks
+
+ ## Configuration Management
+
+ ### Field Registry Structure
+
+ The system is driven by `field_registry.json`, which defines:
+
+ - **Field Definitions**: All 29 extractable fields
+ - **Scoring Configuration**: Weights, tiers, and categories
+ - **AIBOM Generation Rules**: Structure and validation rules
+ - **Extraction Strategies**: How each field should be extracted
+
+ ### Dynamic Configuration
+
+ **Adding New Fields**:
+ 1. Add the field definition to `field_registry.json`
+ 2. The system automatically discovers the field and attempts extraction
+ 3. No code changes are required
+
+ **Updating Scoring**:
+ 1. Modify weights in the registry configuration
+ 2. Changes take effect immediately
+ 3. Scoring stays consistent across all models
+
+ ## Quality Assurance
+
+ ### Validation Layers
+
+ 1. **Input Validation**: Model ID format and accessibility
+ 2. **Extraction Validation**: Data type and format checking
+ 3. **Structure Validation**: CycloneDX schema compliance
+ 4. **Scoring Validation**: Mathematical correctness
+ 5. **Output Validation**: JSON schema and completeness
+
+ ### Error Handling
+
+ - **Individual Field Failures**: Don't stop overall processing
+ - **Graceful Degradation**: Fallback to lower-confidence strategies
+ - **Comprehensive Logging**: Detailed error tracking and debugging
+ - **Recovery Mechanisms**: Automatic retry and alternative approaches
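The Recovery Mechanisms entry mentions automatic retry. A common way to implement that for transient network errors is exponential backoff with jitter; the sketch below is a generic version of that technique, not the generator's code, and which exceptions count as transient (`ConnectionError` and `TimeoutError` here) is an assumption.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient errors with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
            time.sleep(delay + random.uniform(0, delay / 10))  # up to 10% jitter
    raise AssertionError("unreachable")


# Usage: a call that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network failure")
    return "ok"


result = with_retry(flaky_fetch, attempts=3, base_delay=0.0)
print(result, calls["n"])  # ok 3
```

Only exceptions listed in `retry_on` are retried; anything else (for example a request for a nonexistent model) fails immediately, since retrying cannot fix it. Which exception classes signal a transient failure depends on the HTTP client in use.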
+
+ ## Performance Characteristics
+
+ ### Typical Processing Times
+
+ - **Single Model**: 2-5 seconds
+ - **Batch Processing**: 10-50 models/minute
+ - **Registry Loading**: <1 second
+ - **Field Extraction**: 1-3 seconds per model
+
+ ### Scalability Features
+
+ - **Concurrent Processing**: Multiple models processed simultaneously
+ - **Caching**: Model metadata and configuration caching
+ - **Rate Limiting**: Respectful API usage
+ - **Resource Management**: Memory and connection pooling
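For the Caching entry, a time-to-live cache is one simple way to avoid refetching model metadata for repeated requests. This sketch is an assumption about how such caching could work (the generator's actual cache policy is not described here); the injectable `clock` parameter exists only so the expiry logic can be exercised without waiting.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Minimal time-based cache for per-model metadata lookups (illustrative)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: no upstream call
        value = fetch(key)  # miss or expired: refetch and restamp
        self._store[key] = (now, value)
        return value


# Usage with a fake clock so expiry can be shown without sleeping.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
fetched: list[str] = []


def fetch_model_info(model_id: str) -> dict:
    fetched.append(model_id)  # stands in for an HTTP call to the Hub
    return {"id": model_id}


cache.get_or_fetch("deepseek-ai/DeepSeek-R1", fetch_model_info)
cache.get_or_fetch("deepseek-ai/DeepSeek-R1", fetch_model_info)  # served from cache
fake_now[0] = 301.0
cache.get_or_fetch("deepseek-ai/DeepSeek-R1", fetch_model_info)  # expired, refetched
print(len(fetched))  # 2
```

The class is not thread-safe; with concurrent processing it would need a lock around `_store`.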
+
+ ## Integration Points
+
+ ### APIs
+
+ - **Generation API**: `/api/generate` - Single-model AI SBOM generation, with a download URL
+ - **Generation with Completeness Score Report API**: `/api/generate-with-report` - Generation API plus a completeness scoring report
+ - **Completeness Score Report Only API**: `/api/models/{model_id}/score` - Returns the completeness score for a model without generating an AI SBOM
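A minimal Python client sketch for these endpoints using only `urllib.request`. The base URL is the example deployment from the API documentation, and the JSON body mirrors that document's cURL example for `/api/generate`. That `/api/generate-with-report` accepts the same body, and that the score endpoint takes the `org/name` ID unencoded in its path, are assumptions to verify against your deployment.

```python
import json
import urllib.request

# Example deployment from the API documentation; replace with your own.
BASE_URL = "https://aetheris-ai-aibom-generator.hf.space"


def generate_request(model_id: str, with_report: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/generate or /api/generate-with-report."""
    path = "/api/generate-with-report" if with_report else "/api/generate"
    body = {"model_id": model_id, "include_inference": True, "use_best_practices": True}
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_request(model_id: str) -> urllib.request.Request:
    """Build a GET for the score-only endpoint (assumes the slash in the ID is accepted)."""
    return urllib.request.Request(f"{BASE_URL}/api/models/{model_id}/score", method="GET")


req = generate_request("deepseek-ai/DeepSeek-R1")
print(req.get_method(), req.full_url)
# Sending it:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     result = json.load(resp)
```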
+
+ ### Data Sources
+
+ - **HuggingFace Hub**: Primary model metadata source
+ - **Model Repositories**: Direct file access for configurations
+ - **Model Cards**: Structured documentation parsing
+
+ ### Output Formats
+
+ - **CycloneDX JSON**: Primary SBOM format
+ - **Field Reports**: Human-readable analysis
+ - **CSV Exports**: Batch processing results
+ - **API Responses**: Structured JSON for integration
+
+ This architecture provides a robust, configurable, and standards-compliant solution for AI model SBOM generation with comprehensive field extraction and scoring capabilities.
+