
It has become common wisdom to describe cybersecurity as one of the biggest strategic challenges confronting the United States. And that challenge is growing in scope: in 2023, US federal agencies reported more than 32,000 cybersecurity incidents, a 5% increase over the previous year.
The US Intelligence Community’s annual global threat assessment has repeatedly ranked cybersecurity as a top threat facing the nation, and it now includes “disruptive technology” like AI as well.
There is, however, one major fault with the commonly accepted wisdom about cybersecurity in the AI era: It has a blind spot.
More specifically, traditional cybersecurity measures, and by extension data governance as a whole, all too frequently fail to account for data science and the security vulnerabilities that are unique to AI systems. Put simply, the policies being developed and deployed to secure our software systems do not clearly apply to data science activities and the AI systems they give rise to.
This means that just as we are collectively encouraging the integration of next-gen AI technologies, we are also seeking to secure software in ways that are fundamentally blind to the challenges they create. This paradox is reflected not just in our policies, but also in the software we are collectively adopting: We now spend more than ever on both AI systems and cybersecurity tools, and yet the state of our information security has never been worse. It seems that every day brings the announcement of a new vulnerability, hack, or breach.
Analyzing our blind spots
The fact is that we cannot have both more AI and more security, at least not at the same time, without a change in the way we approach securing software and data. We therefore believe that an analysis of our most prominent blind spots is the necessary first step toward a solution, and that is what we hope to accomplish in this article.
Specifically, we use the Executive Order (EO) on Improving the Nation’s Cybersecurity as our inspiration. It’s a document that’s ambitious and intelligently executed…but also symptomatic of the ways in which data science’s impact on cybersecurity is more generally overlooked. While our goal is not to criticize the EO — which we believe to be a laudable attempt to improve our collective cybersecurity — it does contain significant gaps, a discussion of which will help drive future improvements.
Ultimately, we hope to help the right hand of cybersecurity, so to speak, develop a better understanding of what the left hand of data science is doing.
We begin with the idea of zero trust principles.
Zero trust principles in practice
How can you maintain security in an environment plagued by, and asymmetrically friendly to, threat actors? The current, widely accepted answer is to assume “zero trust,” which requires assuming breaches in nearly all scenarios. Here is how the EO defines zero trust:

“…a security model, a set of system design principles, and a coordinated cybersecurity and system management strategy based on an acknowledgement that threats exist both inside and outside traditional network boundaries. The Zero Trust security model eliminates implicit trust in any one element, node, or service…. In essence, a Zero Trust Architecture allows users full access but only to the bare minimum they need to perform their jobs. If a device is compromised, zero trust can ensure that the damage is contained. The Zero Trust Architecture security model assumes that a breach is inevitable or has likely already occurred, so it constantly limits access to only what is needed and looks for anomalous or malicious activity.”
Exactly what this means in practice is clear in the world of traditional software and traditional software controls: implementing risk-based access controls, enforcing least-privilege access by default, embedding resiliency requirements into network architectures to minimize single points of failure, and more. Conceptually, zero trust can be thought of as the culmination of years of experience developing IT infrastructure and, all the while, watching attackers succeed.
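To make this concrete, here is a minimal sketch of the deny-by-default logic at the heart of zero trust, written in Python with entirely hypothetical roles, resources, and risk thresholds (it illustrates the concept, not any particular product’s API): a request is granted only if the device passes posture checks, the session’s risk score is acceptable, and the resource appears on an explicit allow-list for the requester’s role.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_role: str          # e.g. "analyst" or "admin"
    resource: str           # e.g. "billing_db"
    device_trusted: bool    # did the device pass posture checks?
    risk_score: float       # 0.0 (low) to 1.0 (high), from a hypothetical risk engine

# Explicit allow-list: each role maps to the only resources it may touch.
POLICY = {
    "analyst": {"sales_reports"},
    "admin": {"sales_reports", "billing_db"},
}

def authorize(req: AccessRequest) -> bool:
    """Deny by default; grant access only when every zero-trust check passes."""
    if not req.device_trusted:      # never trust the device or network implicitly
        return False
    if req.risk_score > 0.7:        # risk-based control: block anomalous sessions
        return False
    allowed = POLICY.get(req.user_role, set())
    return req.resource in allowed  # least privilege: only explicitly granted resources

# An analyst on a trusted, low-risk device asking for billing data is still refused.
print(authorize(AccessRequest("analyst", "billing_db", True, 0.1)))  # False
```

Everything not expressly permitted is refused, and the model assumes you can enumerate in advance exactly what each user should be able to touch.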
The problem, however, is that none of this applies clearly to data science, which requires continuous access to data — and lots of it. Indeed, it’s rare that a data scientist even knows all the data they require at the beginning of any one analytics project. Instead, and in practice, data scientists frequently demand all the data available, and only then will they be able to deliver a model that sufficiently solves the problem at hand. “Give me all the data, and then I’ll tell you what I’ll do with it” might as well be the motto of every data scientist.
And this dynamic makes sense: Analytics models in general, and AI more specifically, require data to train upon. As one of us has written elsewhere, “Machine learning models are shaped by the data they train on. In simpler terms, they eat data for breakfast, lunch, and supper too” (with credit to co-author Dan Geer).
So how does zero trust fit into this environment, where users building AI systems actively require access to massive amounts of data in various formats? The simple answer is that it does not. The more complicated answer is that zero trust works for applications and production-ready AI models, but it does not work for training AI — a pretty significant carve-out, if we are serious about our investment in and adoption of AI.
A new kind of supply chain
The idea that software systems suffer from a supply chain problem is also common wisdom: software systems are complex, and it can be easy to hide or obscure vulnerabilities within that complexity. Commendable studies have been conducted on the subject, such as Olav Lysne’s examination of Huawei and the difficulty of fully certifying third-party software. This is, at least in part, why the EO so forcefully emphasizes the importance of managing the supply chain of both the physical hardware and the software running on it.
Here’s how the EO summarizes the problem:
“The development of commercial software often lacks transparency, sufficient focus on the ability of the software to resist attack, and adequate controls to prevent tampering by malicious actors. There is a pressing need to implement more rigorous and predictable mechanisms for ensuring that products function securely, and as intended. The security and integrity of “critical software” — software that performs functions critical to trust (such as affording or requiring elevated system privileges or direct access to networking and computing resources) — is a particular concern. Accordingly, the Federal Government must take action to rapidly improve the security and integrity of the software supply chain, with a priority on addressing critical software.”
The problem, however, is again one of mismatch: efforts to secure software do not map onto data science environments, which are predicated on access to the data that, in turn, forms the foundation of AI systems. Whereas traditional software is programmed by humans, line by line and through painstaking effort, AI is largely “programmed” by the data it is trained on, which creates new kinds of vulnerabilities and new challenges from a cybersecurity perspective.
MITRE, Microsoft, and a host of other organizations have released a well-known adversarial threat matrix outlining the ways in which machine learning systems can be attacked, and it has since evolved to include AI. A few highlights of these AI risks include:
- Model poisoning, in which the model itself is tampered with so that it underperforms or misbehaves when certain triggers are present.
- Data poisoning, in which malicious records are inserted into the underlying training dataset to undermine the system’s performance (illustrated in the toy sketch after this list).
- Model extraction, in which an attacker steals the model itself or reconstructs the data it was trained on.
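Data poisoning is the easiest of these to see in miniature. The toy sketch below is purely illustrative: it uses fabricated NumPy data and a deliberately simple nearest-centroid “model,” and real attacks are subtler and operate at much larger scale. But the mechanism is the same: an attacker who can slip a handful of mislabeled records into the training set can flip the model’s behavior on a targeted input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean training data: class 0 clustered near (0, 0), class 1 clustered near (4, 4).
X_clean = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(4.0, 0.5, (50, 2))])
y_clean = np.array([0] * 50 + [1] * 50)

def predict(X_train, y_train, x):
    """Toy nearest-centroid classifier: pick the class whose mean is closest to x."""
    centroids = [X_train[y_train == c].mean(axis=0) for c in (0, 1)]
    return int(np.argmin([np.linalg.norm(x - c) for c in centroids]))

target = np.array([2.2, 2.2])             # the input the attacker wants misclassified
print(predict(X_clean, y_clean, target))  # 1 -- target sits closer to class 1's centroid

# Poisoning: inject 20 points labeled class 0 but placed near the target,
# dragging class 0's centroid toward it.
X_poisoned = np.vstack([X_clean, rng.normal(3.0, 0.1, (20, 2))])
y_poisoned = np.concatenate([y_clean, np.zeros(20, dtype=int)])
print(predict(X_poisoned, y_poisoned, target))  # 0 -- flipped by 20 bad rows out of 120
```

Notice that nothing in the classifier’s code changed; only the data did. The defense therefore cannot live in the code alone; it has to follow the data.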
Georgetown University’s CSET has documented risks to the AI supply chain as well, underscoring that the ways in which AI systems can be attacked are expansive — and likely to grow over time.
So, what can we do about these types of AI security issues? The answer, like so many other things in the world of AI, is to focus on the data.
Knowing where the data came from, monitoring how it has been accessed and by whom, and tracking that access in real time are the only long-term ways to keep tabs on these types of systems and to proactively address new and evolving vulnerabilities. In other words, we must add efforts to track data to the already complicated supply chain if we really seek to ensure that both our software and our AI are secure.
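As a rough sketch of what “tracking data” can mean in practice, consider an append-only provenance ledger that records every read of a training dataset. The file and field names below are hypothetical, and a real deployment would rely on a governed data catalog or policy engine rather than a flat file, but the idea is the same: when a source is later found to be poisoned or improperly accessed, every affected user and model can be enumerated.

```python
import json
import time
from pathlib import Path

AUDIT_LOG = Path("data_access_audit.jsonl")   # append-only provenance ledger

def log_data_access(user: str, dataset: str, purpose: str) -> None:
    """Record who read which dataset, when, and for which model or project."""
    event = {
        "timestamp": time.time(),
        "user": user,
        "dataset": dataset,
        "purpose": purpose,
    }
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(event) + "\n")

def access_history(dataset: str) -> list[dict]:
    """Answer, after the fact: who touched this dataset, and for what purpose?"""
    if not AUDIT_LOG.exists():
        return []
    events = [json.loads(line) for line in AUDIT_LOG.read_text().splitlines()]
    return [e for e in events if e["dataset"] == dataset]

# A data scientist pulls a dataset to train a model ...
log_data_access("analyst@agency.gov", "claims_2023.parquet", "fraud-model-v2 training")
# ... and if that dataset is later found to be poisoned, the ledger tells us
# which users and which downstream models were exposed to it.
print(access_history("claims_2023.parquet"))
```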
A new kind of scale — and urgency
Perhaps most importantly, as AI becomes more widely adopted, cybersecurity vulnerabilities are unlikely to grow in proportion to the size of the underlying code base. Instead, they will scale in proportion to the data the AI systems are trained on, meaning threats will grow exponentially.
At a high level, as bad as things seem today in the world of cybersecurity, they’re bound to get worse: Software systems have been limited in their size and complexity by the time it takes humans to manually write the code. Manual programming requires painstaking and careful effort, as any software developer will tell you, and as we noted above.
But as we move to a world in which data itself is the code — and therefore, more humans without technical expertise can access, use, and manipulate data — this limiting factor is likely to disappear. Based simply on the growing volume of data we generate, the opportunities to exploit digital systems are likely to increase beyond our imagination. We are approaching a world where there is no boundary between safe and unsafe systems, no clear tests to determine that any system is trustworthy. Instead, huge amounts of data are creating an ever-expanding attack surface as we deploy more AI.
The good news is that this new AI-driven world will give rise to boundless opportunities for innovation. Our Intelligence Community will know more about our adversaries in as close to real time as possible. Our Armed Forces will benefit from a new type of strategic intelligence, which will reshape the speed, and even the boundaries, associated with the battlefield. But for the reasons we’ve described above, this future is also likely to be afflicted with insecurities that are destined to grow at rates faster than human comprehension allows.
Which brings us back to the central thesis of this short piece: if we are to take cybersecurity seriously, we must understand and address how AI creates and exacerbates these vulnerabilities. The same goes for our strategic investments in AI. We simply cannot have AI without a better understanding of its impact on security.
More simply stated, the long-term success of our cybersecurity policies will rest on how clearly they apply to the world of AI.