How Instacart Streamlined Its Approach to Data Policy Authoring

PETER KEOUGH
on August 15, 2023
Last edited: November 4, 2024

As organizations continue migrating data to the cloud, they often focus on the anticipated benefits: enhanced analytics, scalability, and data democratization, to name a few. With attention fixed on results, the challenges that come with this transformation are easy to overlook.

One such challenge is maintaining an access management framework that delivers comprehensive, dynamic data access control at scale. During their cloud migration and optimization process, the Instacart team faced exactly this challenge. In his recent webinar Why Engineers Shouldn’t Write Data Governance Policies, Instacart Senior Software Engineer Kieran Taylor described the data access challenges his team faced and explained how they overcame them.

Instacart’s Cloud Data Challenge

In the pursuit of providing customers access to the food they love, the Instacart team found themselves accruing a massive volume of data. As Taylor put it, “one of the unpredicted side effects we’ve observed…is that you end up accruing a lot of data. We have over 250,000 tables and views, and that is in a single [database].”

The Instacart team needed to find ways to control access across many data warehouses, platforms, and tools, including both Databricks and Snowflake. This all had to be done in a nascent cloud environment that was subject to internal and external stakeholders’ evolving data sharing and consumption needs. According to Taylor, the fundamental question they faced was “Should this consumer be able to access this data, and how can we build a system which actually does that?”

While seemingly straightforward, this question proved harder to answer than the team anticipated. Relying on traditional, manual access management processes, the team soon fell victim to what Taylor calls the “get a manager’s approval flow”: a system that requires a request and an approval for every data access scenario, leading to approval bottlenecks, deferred responsibility, and long time-to-data. This manual process and the resulting slow time-to-data limited Instacart’s ability to provide a timely and tailored experience for its millions of active users. Until these obstacles were removed, the team could not make effective and efficient use of its abundant data resources.

The Instacart Access Identity Crisis

These challenges came to a head when Instacart noticed that its access control model was causing a technological identity crisis for team members.

The team’s RBAC (role-based access control) approach required users to adopt a range of different “identities” within their different platforms in order to access particular data sets. Taylor compared their process to the Jason Bourne film series, which follows an amnesia-stricken spy who takes on countless identities in order to fulfill his missions. But when Taylor framed their access permissions in this way to a developer on his team, the blood drained from their face. This convoluted means of access was clearly having a more detrimental effect than they’d anticipated.

The team needed a solution. As a first step, they identified the most important factors that should determine data access:

  • Business-Meaningful Attributes: Information about the data itself, understood best by data owners who have intimate knowledge of data and schemas.
  • User Attributes: Includes factors like users’ geographic location, title, position in the organization, and management.
  • Use Case Attributes: Includes factors like project timelines, purposes for data use, and context relative to company objectives.
  • Business Rules: Compliance laws and regulations, business rules and agreements, and any other regulatory concerns from governance, risk, and compliance (GRC) teams.
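To make these four factor types concrete, here is a minimal Python sketch of an attribute-based access check. Everything in it is hypothetical: the `User` and `Dataset` types, the `eu_restricted` and `pii` tags, the approved purposes, and the rules themselves are illustrative assumptions, not Instacart’s actual policies or any product’s API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    region: str      # user attribute, e.g. "US" or "EU"
    department: str  # user attribute, e.g. "finance"


@dataclass(frozen=True)
class Dataset:
    name: str
    tags: frozenset[str]  # business-meaningful attributes supplied by data owners


# Hypothetical use-case attribute: purposes approved for PII access.
APPROVED_PII_PURPOSES = {"billing", "fraud_review"}


def can_access(user: User, dataset: Dataset, purpose: str) -> bool:
    """Hypothetical attribute-based check; every rule here is illustrative."""
    # Business rule (assumed): EU-restricted data stays with EU-based users.
    if "eu_restricted" in dataset.tags and user.region != "EU":
        return False
    # Use-case rule (assumed): PII may only be read for an approved purpose.
    if "pii" in dataset.tags and purpose not in APPROVED_PII_PURPOSES:
        return False
    # User-attribute rule (assumed): only these departments get default access.
    return user.department in {"analytics", "finance"}


alice = User("alice", region="US", department="finance")
orders = Dataset("orders", frozenset({"pii"}))
print(can_access(alice, orders, purpose="billing"))    # True
print(can_access(alice, orders, purpose="marketing"))  # False
```

Note that no rule mentions a role: the decision is derived from attributes of the user, the data, and the purpose, so a new table tagged `pii` is covered without anyone writing a new grant.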

With dynamic, multifaceted access requirements and a role-based framework that could not scale with their massive amounts of data, Instacart needed to rethink its data access model. Taylor recognized that access management, a process involving far more than data attributes alone, requires collaboration across the business, not just engineering effort. How could this kind of collaboration be built and maintained in Instacart’s data ecosystem?

How Instacart Streamlined Data Policy Authoring

Taylor and the Instacart team knew they needed to build an access control model that would involve an array of crucial stakeholders without compromising the ability to scale with business and data needs. Describing this scalable model, Taylor specified “When I say scalable, I mean that when your team and your data sets grow beyond the size where any single person or any single team can be intimately familiar with each of the data sources that you’re working with, the approach still works and doesn’t fall apart.”

To make access management scale, the team focused on three areas:

Separating Policy from Platform

When working with a data ecosystem that spans multiple platforms and databases, achieving consistent access control enforcement can be complicated. Many platforms have native controls, which are typically not written or applied in the same manner. The more platforms, users, and data sets you add to this system, the less confident your team can be about the consistency of your controls.

To address this challenge, Taylor’s team viewed policy authoring and implementation through the lens of a popular software engineering principle: the separation of concerns. As defined by SAP, this is a concept used to “separate an application into units, with minimal overlapping between the functions of the individual units.”

In Instacart’s case, “what we really want to separate out here is the individual mechanisms that we use for making requests on each of the data platforms,” said Taylor. “So a grant statement on Snowflake, to a grant to an individual user on Databricks, maybe an AWS IAM access policy: we don’t want any of that to be involved with answering the question of should this consumer be able to access this data.”

With access decisions made in a single layer that spans every platform and database, the team was ready to implement dynamic policies across the data ecosystem.
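The separation Taylor describes can be sketched as one platform-neutral decision function plus thin per-platform adapters that only translate its answer into native statements. This is an illustrative Python sketch under stated assumptions: `decide`, `Decision`, the enforcer classes, the `R_<USER>` role-naming convention, and the placeholder rule are all hypothetical, and the generated SQL is printed rather than executed. It is not how Immuta or Instacart implement enforcement.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Decision:
    principal: str  # platform-neutral identity (assumed: an email address)
    table: str      # fully qualified table name
    allowed: bool


def decide(principal: str, table: str) -> Decision:
    """The one place that answers "should this consumer access this data?".

    Placeholder rule (hypothetical): company accounts may read orders tables.
    """
    allowed = principal.endswith("@example.com") and table.endswith(".orders")
    return Decision(principal, table, allowed)


class Enforcer(Protocol):
    def render(self, d: Decision) -> str: ...


class SnowflakeEnforcer:
    """Translates a decision into a Snowflake statement (role-based grants)."""

    def render(self, d: Decision) -> str:
        # Assumed convention: one Snowflake role per person, named R_<USER>.
        role = "R_" + d.principal.split("@")[0].upper()
        if d.allowed:
            return f"GRANT SELECT ON TABLE {d.table} TO ROLE {role};"
        return f"REVOKE SELECT ON TABLE {d.table} FROM ROLE {role};"


class DatabricksEnforcer:
    """Translates a decision into a Unity Catalog statement (grant to user)."""

    def render(self, d: Decision) -> str:
        if d.allowed:
            return f"GRANT SELECT ON TABLE {d.table} TO `{d.principal}`;"
        return f"REVOKE SELECT ON TABLE {d.table} FROM `{d.principal}`;"


enforcers: list[Enforcer] = [SnowflakeEnforcer(), DatabricksEnforcer()]
decision = decide("kim@example.com", "analytics.public.orders")
for enforcer in enforcers:
    print(enforcer.render(decision))
# GRANT SELECT ON TABLE analytics.public.orders TO ROLE R_KIM;
# GRANT SELECT ON TABLE analytics.public.orders TO `kim@example.com`;
```

The design choice is that changing who may see a table only touches `decide`, while supporting another platform only means adding another adapter; neither change leaks into the other.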

Authoring Dynamic, Comprehensive & Scalable Policies

Instacart’s team needed to be sure that the policies they wrote were understandable by all stakeholders, dynamically applicable across platforms, and manageable at scale. “When we’re writing these policies,” shared Taylor, “sometimes it’s going to be non-technical folks who are going to be involved in that process. We want to give them as many guardrails as possible and a nice GUI environment for actually making changes to that policy.”

To achieve this, the team focused on authoring and applying two types of policies:

  1. Subscription Policies: These policies are written and applied at the data set level. They determine whether a given user should have access to a specific data set, rather than which types of information within that data set the user can see.
  2. Data Policies: These are slightly more technical controls, determining which actual types of data a user should be able to see once they’ve been granted access to a table or data set. This granular policy helps ensure that users see just what they need, and nothing more.

For example, an online marketplace service might have a data set containing customers’ personal data, such as names, addresses, and payment information. Users across departments might need access to this table to serve those customers, from processing payments to shipping and tracking orders. Access to the data set would be determined by a subscription policy, but visibility into specific columns would depend on intended usage. A user on the finance team would need to see the payment information in this data set to bill customers, while someone in the shipping department would have no need to see it. A data policy would mask that information for shipping users so they see only what they need.
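The marketplace example can be modeled as two layers: a subscription policy that gates the whole data set, and a data policy that masks columns inside it. In this minimal Python sketch, the customer rows, the department names, and the masking rule are hypothetical assumptions made for illustration; real platforms typically enforce masking inside the query engine rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    department: str


# Hypothetical rows from the customer data set in the example.
CUSTOMERS = [
    {"name": "Ada", "address": "1 Main St", "card_last4": "4242"},
    {"name": "Lin", "address": "9 Elm Ave", "card_last4": "1881"},
]
PAYMENT_COLUMNS = {"card_last4"}  # assumed tag: payment information


def subscription_policy(user: User) -> bool:
    """Data-set level (assumed rule): finance and shipping may query the table."""
    return user.department in {"finance", "shipping"}


def data_policy(user: User, row: dict[str, str]) -> dict[str, str]:
    """Column level (assumed rule): mask payment columns outside finance."""
    if user.department == "finance":
        return dict(row)
    return {k: ("***" if k in PAYMENT_COLUMNS else v) for k, v in row.items()}


def query_customers(user: User) -> list[dict[str, str]]:
    # The subscription policy decides whether the user sees the table at all...
    if not subscription_policy(user):
        raise PermissionError(f"{user.name} is not subscribed to customers")
    # ...and the data policy decides what they see inside it.
    return [data_policy(user, row) for row in CUSTOMERS]


print(query_customers(User("fay", "finance"))[0])
# {'name': 'Ada', 'address': '1 Main St', 'card_last4': '4242'}
print(query_customers(User("sam", "shipping"))[0])
# {'name': 'Ada', 'address': '1 Main St', 'card_last4': '***'}
```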

Achieving a Unified View of Access & Activity

With policies decoupled from platforms and comprehensively authored, the Instacart team had one final requirement: proper monitoring. “We wanted to have that single pane of glass,” noted Taylor, “so instead of having to go onto each of the platforms that we have [and] listing out all of the roles that each of the grants made to them, having a single view across all platforms.”

With such a diverse data ecosystem, the team needed a way to track how access controls were applied across platforms, users, and tools for security and auditing purposes. It was also crucial that all stakeholders, not just technical ones, had visibility into policy creation and enforcement. The GRC team needed to know that policies aligned with regulatory requirements, security stakeholders needed to understand threat mitigation efforts, and data users needed to know why their access to certain data was restricted. By offering this visibility throughout the organization, the Instacart team gained a better understanding of how and why their data was being used.
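A “single pane of glass” can be approximated by pulling each platform’s grant listing into one view keyed by person. In this Python sketch, the grant lists and the `IDENTITY` mapping from platform-specific principals back to one person are hypothetical sample data; in practice these would come from each platform’s own grant metadata.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical grant listings exported from each platform: (principal, table).
SOURCES: dict[str, list[tuple[str, str]]] = {
    "snowflake": [("R_KIM", "analytics.public.orders")],
    "databricks": [
        ("kim@example.com", "main.sales.orders"),
        ("lee@example.com", "main.sales.refunds"),
    ],
}

# Hypothetical mapping from platform-specific identities to one person.
IDENTITY = {"R_KIM": "kim@example.com"}


def unified_view(
    sources: dict[str, list[tuple[str, str]]],
) -> dict[str, list[tuple[str, str]]]:
    """Group every grant on every platform under the person who holds it."""
    view: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for platform, grants in sources.items():
        for principal, table in grants:
            person = IDENTITY.get(principal, principal)  # unmapped: keep as-is
            view[person].append((platform, table))
    return dict(view)


for person, access in unified_view(SOURCES).items():
    print(person, access)
# kim@example.com [('snowflake', 'analytics.public.orders'), ('databricks', 'main.sales.orders')]
# lee@example.com [('databricks', 'main.sales.refunds')]
```

Resolving identities is the step that makes the view useful: without it, the same person would appear once per platform, which is exactly the identity sprawl the team set out to eliminate.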

How to Achieve Scalable Data Policy Authoring & Application

By recognizing the need to scale data policy authoring and access controls, and gaining an understanding of their organization’s access needs, the Instacart team has created an access control model that can keep up with the speed of their business and users.

“If we can build the policies upfront which give default access to developers [for] the data which they are intended to have access to,” said Taylor, “we can actually reduce it to zero for time-to-data.” Even in cases where access requirements are outside of existing policies, the Instacart team has “a very clear path in place for people to get that access, and that reduces [time-to-data] as well. That makes the developers happy. It increases their productivity. It’s a good thing all around.”

This allows Instacart to avoid the dreaded “get a manager’s approval flow,” bypassing the bottlenecks of manual approvals and enabling automated data access management through dynamic subscription and data policies. All of this is done in a way that gives every necessary stakeholder, from users and managers to GRC, security, and engineering, visibility into the policy authoring and application process.

To achieve this workflow, the Instacart team adopted the Immuta Data Security Platform as a unified platform for policy authoring, implementation, and continuous data monitoring. This enabled the team to separate policies from the individual platforms in their ecosystem, author plain-language controls that all stakeholders can understand, and maintain a constant view of access activity to ensure compliance with legal and business rules, all without slowing time-to-data for users.

To learn more about Instacart’s data access control journey, check out the full Why Engineers Shouldn’t Write Data Governance Policies webinar with Kieran Taylor. You can also find out more about how Instacart is evolving their data use in their blog The Next Era of Data at Instacart. If you’d like to see dynamic access controls in action, schedule a demo of Immuta today.
