COMMENTARY
The days of large, monolithic apps are numbered. Today's applications rely on microservices and code reuse, which makes development easier but complicates the job of tracking and managing the components they use.
This is why the software bill of materials (SBOM) has emerged as an indispensable tool for identifying what's in a software app, including the components, versions, and dependencies that reside within systems. SBOMs also deliver deep insights into dependencies, vulnerabilities, and risks that factor into cybersecurity.
An SBOM allows CISOs and other enterprise leaders to focus on what really matters by providing an up-to-date inventory of software components. This makes it easier to establish and enforce strong governance and spot potential problems before they spiral out of control.
Yet in the age of artificial intelligence (AI), the classic SBOM has some limitations. Emerging machine learning (ML) frameworks introduce remarkable opportunities, but they also push the envelope on risk and introduce a new asset to organizations: the machine learning model. Without strong oversight and controls over these models, an array of practical, technical, and legal problems can arise.
That's where machine learning bills of materials (MLBOMs) enter the picture. An MLBOM tracks the names, locations, versions, and licensing of the assets that make up an ML model. It also includes overarching information about the nature of the model, training configurations embedded in metadata, who owns it, various feature sets, hardware requirements, and more.
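To make the idea concrete, here is a minimal sketch of what a single MLBOM record could look like in Python. The class names, field names, and sample values are all illustrative assumptions rather than part of any standard; they simply mirror the kinds of information described above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Asset:
    """One ingredient of a model, such as a dataset, weights file, or library."""
    name: str
    location: str  # where the asset lives (URI or path)
    version: str
    license: str


@dataclass
class MLBOM:
    """Hypothetical MLBOM record; the field names are illustrative only."""
    model_name: str
    model_version: str
    owner: str
    description: str
    training_config: dict
    feature_sets: list
    hardware_requirements: dict
    assets: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Asset dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Sample values below are invented for illustration.
bom = MLBOM(
    model_name="churn-classifier",
    model_version="1.4.0",
    owner="data-science-team",
    description="Predicts customer churn from account activity",
    training_config={"epochs": 20, "learning_rate": 0.001},
    feature_sets=["account_age", "monthly_spend"],
    hardware_requirements={"gpu": False, "memory_gb": 8},
    assets=[
        Asset("crm-export-2023", "s3://example-bucket/crm/2023.parquet",
              "2023.12", "proprietary"),
        Asset("scikit-learn", "https://pypi.org/project/scikit-learn/",
              "1.4.0", "BSD-3-Clause"),
    ],
)

print(bom.to_json())
```

Serializing the record to JSON keeps it easy to store next to the model artifact and to compare between versions.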
Why MLBOMs Matter
CISOs are realizing that AI and ML require a different security model, and that the training data and models behind these systems are frequently not tracked or governed. An MLBOM can help an organization avoid security risks and failures. It addresses critical factors like model and data provenance, safety ratings, and dynamic changes that extend beyond the scope of an SBOM.
Because ML environments are in a constant state of flux and changes can take place with little or no human interaction, issues related to data consistency — including where it originated, how it was cleaned, and how it was labeled — are a constant concern.
For example, if a business analyst or data scientist determines that a data set is poisoned, the MLBOM simplifies the task of finding all the various touch points and models that were trained with that data.
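A rough sketch of that lookup in Python follows. It assumes each MLBOM has been reduced to a simple dictionary listing the datasets a model was trained on; the model names, dataset names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical MLBOM summaries: each model lists its training datasets.
mlboms = [
    {"model": "churn-classifier", "version": "1.4.0",
     "datasets": ["crm-export-2023", "support-tickets"]},
    {"model": "fraud-detector", "version": "2.1.0",
     "datasets": ["transactions-q1"]},
    {"model": "ticket-router", "version": "0.9.2",
     "datasets": ["support-tickets"]},
]


def build_dataset_index(boms):
    """Map each dataset name to the (model, version) pairs trained on it."""
    index = defaultdict(list)
    for bom in boms:
        for dataset in bom["datasets"]:
            index[dataset].append((bom["model"], bom["version"]))
    return index


def models_trained_on(index, dataset):
    """Return every model that used the given dataset, sorted by name."""
    return sorted(index.get(dataset, []))


index = build_dataset_index(mlboms)
affected = models_trained_on(index, "support-tickets")
print(affected)
# [('churn-classifier', '1.4.0'), ('ticket-router', '0.9.2')]
```

Building the index once means that each later "which models used this data?" question is a single dictionary lookup rather than a scan of every record.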
MLBOMs Can Elevate Protection
Transparency, auditability, control, and forensic insight are all hallmarks of an MLBOM. With a comprehensive view of the "ingredients" that go into an ML model, an organization is equipped to manage its ML models safely.
Here are some ways to build a best practice framework around an MLBOM:
Recognize the need for an MLBOM: It's no secret that ML fuels business innovation and even disruption. Yet it also introduces significant risks that can extend to reputation, regulatory compliance, and legal issues. Having visibility into ML models is critically important.
Conduct essential due diligence: An MLBOM should integrate with the CI/CD pipeline and deliver a high level of clarity. Support for a standard data format like JSON and a standard BOM specification like OWASP's CycloneDX can unify SBOM and MLBOM processes.
Analyze policies, processes, and governance: It's essential to sync an MLBOM with an organization's workflows and business processes. This increases the odds that ML pipelines will work as intended, while minimizing risks related to cybersecurity, data privacy, compliance, and other risk-associated areas.
Use an MLBOM with machine learning gates: Rigorous controls and gateways lead to essential AI and ML guardrails. In this way, the business and the CISO can build on successes and harness ML to unlock greater cost savings, performance gains, and business value.
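As one illustration of how a gate in a CI/CD pipeline might consume an MLBOM, the Python sketch below checks a JSON document before a model is promoted. The document is only loosely modeled on CycloneDX's machine-learning-model component type, so the field names should be verified against the CycloneDX specification. The policy rules, the quarantine list, and the gate_violations function are assumptions made for the example.

```python
import json

# Loosely modeled on a CycloneDX BOM; verify field names against the spec.
BOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "churn-classifier",
      "version": "1.4.0",
      "licenses": [{"license": {"id": "Apache-2.0"}}],
      "modelCard": {
        "modelParameters": {
          "datasets": [{"type": "dataset", "name": "support-tickets"}]
        }
      }
    },
    {
      "type": "machine-learning-model",
      "name": "ticket-router",
      "version": "0.9.2",
      "modelCard": {"modelParameters": {"datasets": []}}
    }
  ]
}
"""

# Hypothetical policy input: datasets flagged as poisoned or under review.
QUARANTINED_DATASETS = {"support-tickets"}


def gate_violations(bom: dict, quarantined: set) -> list:
    """Return human-readable policy violations for every ML model in the BOM."""
    violations = []
    for comp in bom.get("components", []):
        if comp.get("type") != "machine-learning-model":
            continue  # ordinary software components are out of scope here
        label = f'{comp.get("name", "<unnamed>")}@{comp.get("version", "?")}'
        if not comp.get("licenses"):
            violations.append(f"{label}: no license recorded")
        params = comp.get("modelCard", {}).get("modelParameters", {})
        datasets = [d.get("name") for d in params.get("datasets", [])]
        if not datasets:
            violations.append(f"{label}: no training datasets recorded")
        for name in datasets:
            if name in quarantined:
                violations.append(
                    f"{label}: trained on quarantined dataset {name}")
    return violations


violations = gate_violations(json.loads(BOM_JSON), QUARANTINED_DATASETS)
for violation in violations:
    print(violation)
# A CI/CD job could fail the build whenever this list is non-empty.
```

Returning a list of violations instead of stopping at the first failure gives reviewers the full picture in a single pipeline run.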
Machine learning is radically changing the business and IT landscape. By extending proven SBOM methodologies to ML through MLBOMs, it's possible to take a giant step toward boosting machine learning performance and protecting data and assets.