It's 10 p.m. Do You Know Where Your AI Models Are Tonight?

Image: A silvery robot looking at a pocket watch showing 10 p.m. (Source: Kirsty Pargeter via Alamy Stock Photo)

If you thought the software supply chain security problem was difficult enough today, buckle up. The explosive growth in AI use is about to make those supply chain issues exponentially harder to navigate in the years to come. 

Developers, application security pros, and DevSecOps professionals are called on to fix the highest-risk flaws lurking in what seem like endless combinations of open source and proprietary components woven into their applications and cloud infrastructure. But it's a constant battle just to understand which components they have, which ones are vulnerable, and which flaws put them most at risk. Clearly, they're already struggling to sanely manage these dependencies in their software as it is.

What's going to get harder is the multiplier effect that AI stands to add to the situation.

AI Models as Self-Executing Code

AI and machine learning (ML)-enabled tools are software just the same as any other kind of application—and their code is just as likely to suffer from supply chain insecurities. However, they add another asset variable to the mix that greatly increases the attack surface of the AI software supply chain: AI/ML models.

"What separates AI applications from every other form of software is that it relies in some way or fashion on a thing called a machine learning model," explains Daryan Dehghanpisheh, co-founder of Protect AI. "As a result, that machine learning model itself is now an asset in your infrastructure. When you have an asset in your infrastructure, you need the ability to scan your environment, identify where they are, what they contain, who has permissions, and what they do. And if you can't do that with models today, you can't manage them."

AI/ML models provide the foundation for an AI system's ability to recognize patterns, make predictions, make decisions, trigger actions, or create content. But the truth is that most organizations don't even know how to start gaining visibility into all of the AI models embedded in their software. Models and the infrastructure around them are built differently than other software components, and traditional security and software tooling isn't built to scan for AI models or to understand how they work or how they're flawed. This is what makes them unique, says Dehghanpisheh, who explains that they're essentially hidden pieces of self-executing code.
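
As a rough illustration of that visibility gap, consider a minimal inventory sweep like the hypothetical sketch below: it walks a directory tree looking for files with common model-serialization extensions and records their location, format, and hash. The extension list and hashing choice are assumptions for the example, not any vendor's method, but they show the kind of asset discovery step that traditional scanners do not perform for AI artifacts.

# Hypothetical sketch: build a naive inventory of model artifacts on disk.
# The extension list and the use of SHA-256 are assumptions, not a vendor's method.
import hashlib
from pathlib import Path

MODEL_EXTENSIONS = {".pkl", ".pt", ".pth", ".h5", ".onnx", ".safetensors", ".joblib"}

def inventory_models(root: str) -> list[dict]:
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in MODEL_EXTENSIONS:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            findings.append({
                "path": str(path),
                "format": path.suffix.lower(),
                "sha256": digest,
                "size_bytes": path.stat().st_size,
            })
    return findings

if __name__ == "__main__":
    for item in inventory_models("."):
        print(item["sha256"][:12], item["format"], item["path"])

Even a crude listing like this makes the later questions (who owns the asset, what it contains, who can touch it) possible to ask.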

"A model, by design, is a self-executing piece of code. It has a certain amount of agency," says Dehghanpisheh. "If I told you that you have assets all over your infrastructure that you can't see, you can't identify, you don't know what they contain, you don't know what the code is, and they self-execute and have outside calls, that sounds suspiciously like a permission virus, doesn't it?"

An Early Observer of AI Insecurities

Getting ahead of this issue was the big impetus behind Dehghanpisheh and his co-founders launching Protect AI in 2022, one of a spate of new firms cropping up to address the model security and data lineage issues looming in the AI era. Dehghanpisheh and co-founder Ian Swanson saw a glimpse of the future when they previously worked together building AI/ML solutions at AWS, where Dehghanpisheh was the global leader for AI/ML solution architects.

"During the time that we spent together at AWS we saw customers building AI/ML systems at an incredibly rapid pace long before generative AI captured the hearts and minds of everyone from the C-suite to Congress," he says, explaining that he worked with a range of engineers and business development experts, and also worked with customers extensively. "That's when we realized how and where the security vulnerabilities unique to AI/ML systems are."

He says they observed three basic things about AI/ML that had incredible implications for the future of cybersecurity. The first was that the pace of adoption was so fast that they saw firsthand how quickly shadow IT was cropping up around AI development and business use, escaping the kind of governance that would oversee any other type of development in the enterprise.

The second was that the majority of tools that were being used—whether commercial or open source—were built by data scientists and up-and-coming machine learning engineers who had never been trained in security concepts.

"As a result, you had really useful, very popular, very distributed, widely adopted tools that weren't built with a security-first mindset," he says.

AI Systems Not Built 'Security-First'

As a result, many AI/ML systems and shared tools lack basic authentication and authorization and often grant too much read and write access to file systems, he explains. Couple that with insecure network configurations and the inherent problems in the models themselves, and organizations get bogged down in cascading security issues across these highly complex, difficult-to-understand systems.
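
As a small, concrete example of the kind of basic hygiene being described, the sketch below flags world-writable files under a model or tooling directory, one of the overly permissive defaults shared ML tooling often leaves behind. The path and the specific permission bit checked are assumptions for a POSIX system, and this is an illustration rather than a checklist from any specific tool.

# Illustrative check: flag world-writable files under a model/tooling directory.
# Assumes a POSIX filesystem; the root path is a placeholder for the example.
import stat
from pathlib import Path

def find_world_writable(root: str) -> list[str]:
    flagged = []
    for path in Path(root).rglob("*"):
        try:
            mode = path.stat().st_mode
        except OSError:
            continue  # broken symlinks, permission errors, etc.
        if mode & stat.S_IWOTH:  # writable by "other"
            flagged.append(str(path))
    return flagged

for p in find_world_writable("/opt/ml"):
    print("world-writable:", p)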

"That made us realize that the existing security tools, processes, frameworks, no matter how shift left you went, were missing the context that machine learning engineers, data scientists, and AI builders would need," he says

Finally, the third major observation he and Swanson made during those AWS days was that AI breaches weren't coming. They had already arrived.

"We saw customers have breaches on a variety of AI/ML systems that should have been caught but weren't," he says. "What that told us is that the set and the processes, as well as the incident response management elements, were not purpose-built for the way AI/ML was being architected. That problem has become much worse as generative AI picked up momentum"

Dehghanpisheh and Swanson also started seeing how models and training data were creating a unique new AI supply chain that would need to be taken just as seriously as the rest of the software supply chain. As with the rest of modern software development and cloud-native innovation, data scientists and AI experts have fueled advancements in AI/ML systems through rampant use of open source and shared componentry, including AI models and the data used to train them. Many AI systems, whether academic or commercial, are built on someone else's model. And as with the rest of modern development, the explosion in AI work keeps driving a huge daily influx of new model assets proliferating across the supply chain, which means keeping track of them all just keeps getting harder.

Take Hugging Face, for example. This is one of the most widely used repositories of open source AI models online today—its founders say they want to be the GitHub of AI. Back in November 2022, Hugging Face users had shared 93,501 different models with the community. A year later, in November 2023, that had blown up to 414,695 models. Just three months later, that number has expanded to 527,244. This is an issue whose scope is snowballing by the day. And it is going to put the software supply chain security problem 'on steroids,' says Dehghanpisheh.

A recent analysis by his firm found thousands of models openly shared on Hugging Face that can execute arbitrary code on model load or inference. While Hugging Face does some basic scanning of its repository for security issues, many models slip through: at least half of the high-risk models discovered in the research were not deemed unsafe by the platform, and Hugging Face makes clear in its documentation that determining the safety of a model is ultimately the responsibility of its users.
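
One mitigation many teams reach for, independent of any single vendor's scanner, is to treat pickle-backed formats as untrusted by default and prefer tensor-only formats such as safetensors. The hypothetical check below flags pickle-based files in a downloaded model directory before anything is loaded; the extension heuristic is an assumption and is no substitute for a real content-level scanner.

# Hypothetical pre-load check: flag pickle-backed artifacts in a downloaded model directory.
# Extension heuristics are an assumption here; a real scanner inspects file contents too.
from pathlib import Path

PICKLE_BACKED = {".pkl", ".pickle", ".pt", ".pth", ".bin", ".joblib"}  # may contain pickled objects
TENSOR_ONLY = {".safetensors"}  # weights only, no executable payload on load

def review_model_dir(model_dir: str) -> None:
    for path in Path(model_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() in PICKLE_BACKED:
            print(f"REVIEW  {path}  (pickle-backed format; can execute code on load)")
        elif path.suffix.lower() in TENSOR_ONLY:
            print(f"OK      {path}  (tensor-only format)")

review_model_dir("./downloaded-model")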

Steps for Tackling AI Supply Chain

Dehghanpisheh believes the linchpin of cybersecurity in the AI era will be creating a structured understanding of AI lineage. That includes model lineage and data lineage: the origin and history of these assets, how they've been changed, and the metadata associated with them.

"That's the first place to start. You can't fix what you can't see and what you can't know and what you can't define, right?" he says.

Meanwhile, on the daily operational level, he believes organizations need to build out capabilities to scan their models, looking for flaws that can affect not only the hardening of the system but also the integrity of its output. That includes issues like AI bias and malfunction that could cause real-world physical harm from, say, an autonomous car crashing into a pedestrian.

"The first thing is you need to scan," he says. "The second thing is you need to understand those scans. And the third is then once you have something that's flagged, you essentially need to stop that model from activating. You need to restrict its agency."

The Push for MLSecOps

This is where MLSecOps comes in: a vendor-neutral movement that mirrors the DevSecOps movement in the traditional software world.

"Similar to the move from DevOps to DevSecOps, you've got to do two things at once. The first thing you've got to do is make the practitioners aware that security is a challenge and that it is a shared responsibility," he says. "The second thing you've got to do is give context and put security into tools that keep data scientists, machine learning engineers, AI builders on the bleeding edge and constantly innovating, but allowing the security concerns to disappear into the background."

In addition, he says organizations are going to have to start adding governance, risk, and compliance policies, enforcement capabilities, and incident response procedures that govern what happens when insecurities are discovered. As with a solid DevSecOps ecosystem, that means MLSecOps will need strong involvement from business stakeholders all the way up the executive ladder.

The good news is that AI/ML security is benefiting from one thing that no other rapid technology innovation has had: regulatory mandates right out of the gate.

"Think about any other technology transition. Name one time that a federal regulator or even state regulators have said this early on, 'Whoa, whoa, whoa, you've got to tell me everything that's in it. You've got to prioritize knowledge of that system. You have to prioritize a bill of materials,'" he says. "There isn't any."

This means many security leaders are more likely to get buy-in to build out AI security capabilities much earlier in the innovation lifecycle. One of the most obvious signs of that support is how quickly organizations are sponsoring new job functions.

"The biggest difference that the regulatory mentality has brought to the table is that in January of 2023, the concept of a director of AI security was novel and didn't exist. But by June, you started seeing those roles," he says. "Now, they're everywhere and they're funded."
