AI Security Wire

AI Software Bill of Materials: Tracking Model Components

The software bill of materials (SBOM), a machine-readable inventory of software components and their dependencies, is now a well-established security practice for traditional software. The AI equivalent, the AI-SBOM, applies the same principle to the more complex component graph of deployed AI systems. EU AI Act Article 11 requires technical documentation for high-risk systems, which covers model and training data provenance; the NIST AI RMF 2.0 supply chain section treats model component tracking as a baseline control. This guide covers what an AI-SBOM should contain and how to implement it.

Why AI Systems Need an Extended SBOM

Traditional SBOMs cover code and library dependencies. An AI system has additional components that standard SBOM tools don’t capture:

| Component | Why It Matters |
| --- | --- |
| Base model (weights + architecture) | Backdoors, biases, and capabilities are properties of the weights |
| Fine-tuning dataset | Dataset provenance affects copyright, PII, and poisoning risk |
| RLHF / alignment data | Determines safety behaviour; manipulation here affects all downstream uses |
| LoRA / adapter weights | Can override base model behaviour; need independent provenance |
| Prompt templates / system prompts | Define application behaviour; versioning and integrity matter |
| Inference framework | Serialisation vulnerabilities, hardware-specific behaviour |
| Embedding model (for RAG) | Affects retrieval; poisoning here affects all downstream queries |

A security incident or compliance audit may require answering questions about any of these components. Without an AI-SBOM, organisations cannot answer them.

Minimum AI-SBOM Fields

Drawing on CycloneDX ML extensions and emerging regulatory guidance, a minimum AI-SBOM for a deployed model should capture:

Base Model Record

{
  "type": "machine-learning-model",
  "name": "llama-3-70b-instruct",
  "version": "3.0",
  "purl": "pkg:huggingface/meta-llama/Meta-Llama-3-70B-Instruct@3.0",
  "hashes": [
    { "alg": "SHA-256", "content": "a3f2...b91c" }
  ],
  "supplier": { "name": "Meta AI", "url": "https://ai.meta.com" },
  "licenses": [{ "id": "LLAMA-3-COMMUNITY" }],
  "properties": [
    { "name": "training-compute-flops", "value": "1.8e24" },
    { "name": "training-data-cutoff", "value": "2023-12" },
    { "name": "parameters", "value": "70000000000" }
  ]
}
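The `hashes` entry in a record like this has to be computed from the artefact itself. The following is a minimal sketch, assuming the weights live in a single file; sharded checkpoints would need one entry per shard. The helper names `sha256_of_file` and `hash_entry` are illustrative and not part of any library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so multi-gigabyte weights never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_entry(path: Path) -> dict:
    """Build one entry for a record's "hashes" array (hypothetical helper)."""
    return {"alg": "SHA-256", "content": sha256_of_file(path)}
```

Recomputing the digest at deployment time and comparing it with the recorded value is what makes the field useful as an integrity check.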

Fine-Tune / Adapter Record

{
  "type": "machine-learning-model",
  "name": "llama-3-70b-customer-service-lora",
  "version": "1.4.2",
  "hashes": [{ "alg": "SHA-256", "content": "7c4a...e230" }],
  "dependencies": ["pkg:huggingface/meta-llama/Meta-Llama-3-70B-Instruct@3.0"],
  "properties": [
    { "name": "adapter-type", "value": "LoRA" },
    { "name": "training-dataset-id", "value": "ds-customer-service-v3" },
    { "name": "training-date", "value": "2026-03-15" },
    { "name": "trainer", "value": "[email protected]" }
  ]
}

Training Dataset Record

{
  "type": "data",
  "name": "customer-service-training-v3",
  "version": "3.0",
  "hashes": [{ "alg": "SHA-256", "content": "1b9f...4d72" }],
  "properties": [
    { "name": "record-count", "value": "142000" },
    { "name": "pii-assessed", "value": "true" },
    { "name": "pii-assessment-date", "value": "2026-02-28" },
    { "name": "data-sources", "value": "internal-crm,synthetic-generation" },
    { "name": "collection-date-range", "value": "2024-01/2026-01" },
    { "name": "data-controller", "value": "example-corp" }
  ]
}
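The three records can be combined into one document. The sketch below assumes a CycloneDX 1.5 style JSON layout, where relationships are expressed in a top-level `dependencies` array of `ref` / `dependsOn` entries rather than inside each component. The function name `build_ai_sbom`, the `name@version` scheme for `bom-ref`, and the choice to link the adapter to both the base model and the dataset are assumptions made for illustration.

```python
from datetime import datetime, timezone


def build_ai_sbom(base_model: dict, adapter: dict, dataset: dict) -> dict:
    """Combine three component records into one CycloneDX-style document."""
    components = []
    for record in (base_model, adapter, dataset):
        # Copy the record without any inline "dependencies" key; the graph
        # is expressed once, at the top level of the document.
        entry = {k: v for k, v in record.items() if k != "dependencies"}
        # Assumption: name@version is unique enough to serve as a bom-ref.
        entry["bom-ref"] = f'{record["name"]}@{record["version"]}'
        components.append(entry)

    base_ref, adapter_ref, dataset_ref = (c["bom-ref"] for c in components)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "metadata": {"timestamp": datetime.now(timezone.utc).isoformat()},
        "components": components,
        # The adapter depends on the base model it modifies and on the
        # dataset it was trained with.
        "dependencies": [
            {"ref": adapter_ref, "dependsOn": [base_ref, dataset_ref]},
        ],
    }
```

Keeping the graph at the top level means a consumer can walk dependencies without parsing every component's properties.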

Tooling

CycloneDX ML

The CycloneDX specification includes machine learning extensions (cdx:ml) that extend the standard BOM format. The cyclonedx-python-lib supports generating AI-SBOMs programmatically:

# Keyword names follow recent cyclonedx-python-lib releases; check them
# against the version you have installed.
from cyclonedx.model import HashAlgorithm, HashType
from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component, ComponentType
from packageurl import PackageURL

bom = Bom()

# Describe the base model as a machine-learning-model component.
model_component = Component(
    type=ComponentType.MACHINE_LEARNING_MODEL,  # older releases called this component_type
    name='llama-3-70b-instruct',
    version='3.0',
    purl=PackageURL(
        type='huggingface',
        namespace='meta-llama',
        name='Meta-Llama-3-70B-Instruct',
        version='3.0'
    ),
    # Placeholder digest; replace with the full SHA-256 of the weights.
    hashes=[HashType(
        alg=HashAlgorithm.SHA_256,
        content='a3f2...b91c'
    )]
)

bom.components.add(model_component)

Model Registry Integration

AI-SBOMs should be generated at model registration time and stored alongside the model artefact. An example MLflow integration:

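import mlflow


def register_model_with_sbom(model_path: str, sbom: dict, model_name: str):
    """Log a model artefact and its AI-SBOM in one run, then register the model."""
    with mlflow.start_run() as run:
        mlflow.log_artifact(model_path, "model")
        # Store the SBOM next to the model artefact in the same run.
        mlflow.log_dict(sbom, "ai-sbom.json")
        # Tags make the SBOM searchable. This assumes the base model is the
        # first component; metadata.timestamp is when the BOM was generated.
        mlflow.set_tags({
            "sbom.version": str(sbom["version"]),
            "sbom.base-model": sbom["components"][0]["name"],
            "sbom.timestamp": sbom["metadata"]["timestamp"]
        })
        mlflow.register_model(
            f"runs:/{run.info.run_id}/model",
            model_name
        )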

Using the AI-SBOM

Incident Response

When a vulnerability is disclosed in a base model or dependency, query the AI-SBOM registry to identify all deployed systems using that component:

def find_deployments_using_model(base_model_purl: str, registry) -> list:
    return [
        deployment for deployment in registry.all_deployments()
        if base_model_purl in deployment.sbom.dependency_graph()
    ]
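The `registry` object and its `dependency_graph()` method above are left abstract. One way such a lookup could work over plain SBOM documents is sketched below, assuming each deployment's SBOM names its root in `metadata.component` and lists edges in a top-level `dependencies` array of `ref` / `dependsOn` entries. The names `transitive_refs` and `affected_deployments` are hypothetical.

```python
from collections import deque


def transitive_refs(sbom: dict, root_ref: str) -> set:
    """Return every ref reachable from root_ref via "dependsOn" edges."""
    edges = {d["ref"]: d.get("dependsOn", []) for d in sbom.get("dependencies", [])}
    seen = set()
    queue = deque([root_ref])
    while queue:
        ref = queue.popleft()
        for child in edges.get(ref, []):
            # The seen set also protects against cycles in the graph.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def affected_deployments(sboms: dict, vulnerable_ref: str) -> list:
    """List deployments whose dependency graph reaches vulnerable_ref.

    sboms maps a deployment name to its SBOM document (hypothetical layout).
    """
    hits = []
    for deployment, sbom in sboms.items():
        root = sbom["metadata"]["component"]["bom-ref"]
        if vulnerable_ref in transitive_refs(sbom, root):
            hits.append(deployment)
    return sorted(hits)
```

Walking the graph transitively matters here: an application usually depends on an adapter, and only the adapter depends on the base model.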

Regulatory Compliance

The EU AI Act requires documentation of training data and model provenance for high-risk systems. An AI-SBOM that captures the dataset records above, including PII assessment status and data controller identity, supplies much of the provenance evidence that the Article 11 technical documentation calls for.
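A simple completeness check can flag dataset records that lack these fields before an audit does. This is a sketch only: the required property names are taken from the example dataset record in this guide, not from any regulatory text, and `missing_dataset_properties` is a hypothetical helper.

```python
# Property names from the example dataset record; adjust to your own schema.
REQUIRED_DATASET_PROPERTIES = ("pii-assessed", "pii-assessment-date", "data-controller")


def missing_dataset_properties(sbom: dict) -> dict:
    """Map each dataset component name to the required properties it lacks."""
    gaps = {}
    for component in sbom.get("components", []):
        if component.get("type") != "data":
            continue
        # A property counts as present only if it has a non-empty value.
        present = {
            p["name"] for p in component.get("properties", []) if p.get("value")
        }
        missing = [n for n in REQUIRED_DATASET_PROPERTIES if n not in present]
        if missing:
            gaps[component["name"]] = missing
    return gaps
```

It only confirms that the fields are populated; whether the recorded values are accurate still needs review.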

Supply Chain Auditing

Before deploying a third-party model or adapter, require a signed AI-SBOM from the supplier. Verify that:

  • The base model hash matches the published release
  • The training dataset provenance is documented
  • No known-vulnerable model versions are referenced
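The three checks above can be automated once the supplier's SBOM is parsed. The sketch below assumes two inputs that you maintain yourself: `published_hashes`, mapping a purl to the SHA-256 the publisher released, and `vulnerable_purls`, a set of versions known to be affected. Both names, and the function `audit_supplier_sbom`, are hypothetical. Signature verification is out of scope here and should happen before the document is trusted.

```python
def audit_supplier_sbom(sbom: dict, published_hashes: dict, vulnerable_purls: set) -> list:
    """Return human-readable findings; an empty list means all checks passed."""
    findings = []
    for component in sbom.get("components", []):
        name = component.get("name", "<unnamed>")
        purl = component.get("purl")
        sha256 = {
            h["content"].lower()
            for h in component.get("hashes", [])
            if h.get("alg") == "SHA-256"
        }

        # Check 1: the recorded hash matches the published release.
        expected = published_hashes.get(purl)
        if expected is not None and expected.lower() not in sha256:
            findings.append(f"{name}: SHA-256 does not match the published release")

        # Check 2: dataset provenance is documented.
        if component.get("type") == "data":
            props = {p["name"]: p.get("value") for p in component.get("properties", [])}
            if not props.get("data-sources"):
                findings.append(f"{name}: no data-sources property recorded")

        # Check 3: no known-vulnerable version is referenced.
        if purl in vulnerable_purls:
            findings.append(f"{name}: references known-vulnerable version {purl}")
    return findings
```

Returning findings rather than raising on the first failure lets a reviewer see every problem with a supplier's submission in one pass.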