How To: Access Controls in Databricks Using Immuta

Sumit Sarkar on January 20, 2021
Last edited: November 4, 2024

The pros and cons of role-based access control (RBAC) and attribute-based access control (ABAC) have been well documented and debated. There are even implementations of ABAC that use static attributes, defeating the intent of safely scaling user adoption. But understanding the distinctions between these approaches is not the same as soundly implementing one or both, any more than reading a recipe is the same as having a fully cooked meal on the table.

Immuta integrates with Databricks to enable customers to dynamically control data access using fine-grained access controls. This allows Databricks customers using Immuta to ensure the right people have access to the right data at the right time – for only appropriate and approved purposes.

A Guide to Data Access Governance with Immuta and Databricks spells out in detail how exactly this works. But, given the potential value that Databricks users can unlock from their data when they use attribute-based access control — not to mention the scalability and efficiency it can provide — we wanted to share a sneak peek of what you can accomplish in Databricks with Immuta’s native access controls.

Databricks Architecture

To understand how Immuta’s attribute-based access control policy is deployed within Databricks, it’s important to understand Databricks’ architecture.

Databricks allows organizations to separate compute and storage, so they can store data in a cloud-based storage platform of their choice and spin up Databricks clusters on-demand when they’re ready to process the data. Decoupling storage and compute increases flexibility and cost efficiency, in addition to being more friendly to a data ecosystem that includes multiple cloud data platforms.

Access Control Options

With compute and storage decoupled, Databricks users have three points at which to control data access:

  1. Credential passthrough, or the cloud provider’s storage layer
  2. Databricks table ACL, or the Databricks compute layer
  3. Immuta

The credential passthrough option allows users to spin up Databricks and continue using existing access controls from cloud storage platforms natively in Databricks, without requiring any new role decisions or updates.

This is a good option for organizations that have already created cloud provider roles at the level of granularity required for accessing data in ADLS or S3, have simple access control requirements, or work with a single cloud provider through which they can manage all roles. For organizations that plan to adopt multiple cloud data platforms (more than half of data teams) and/or manage identities across cloud and on-prem data sources, the credential passthrough option is less attractive, since it works with individual providers but not across data sources or platforms. This option also does not enable dynamic data access control, including row-, column-, or cell-level security and data masking techniques, which are core aspects of secure sensitive data use.
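To make this concrete, here is a minimal sketch of what a passthrough read looks like from a Databricks notebook, assuming a cluster with credential passthrough enabled; the storage accounts, buckets, and paths are hypothetical:

```python
# Runs in a Databricks notebook on a cluster with credential passthrough
# enabled. No keys or service principals are configured on the cluster;
# each read succeeds or fails based on the querying user's own cloud
# identity. All paths below are illustrative.

# ADLS Gen2: authorization comes from the user's Azure AD permissions
# on the storage account and container.
adls_df = spark.read.parquet(
    "abfss://raw@examplestorage.dfs.core.windows.net/sales/2021/"
)

# S3: authorization comes from the IAM permissions mapped to the user.
s3_df = spark.read.parquet("s3a://example-bucket/sales/2021/")

adls_df.show(5)
```

Note that the controls here are whatever the storage layer offers: a user can either read the files or not, with no notion of rows, columns, or masking.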

Meanwhile, the Databricks table ACL approach allows users to manage table access in a manner similar to relational database systems, with the only difference being that the data exists in external cloud storage.

This is a good option for organizations starting from scratch in the cloud that do not have established access controls on ADLS or S3, have only table-level access control requirements, or do not anticipate using compute services other than Databricks. For organizations with access control requirements beyond just table-level access controls — row- or column-level access controls, for instance — and/or dynamic data masking needs, the table ACL approach can be limiting. Additionally, if organizations plan to use a compute platform in addition to Databricks, this approach requires recreating and managing policies in each of those platforms.
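For illustration, here is a sketch of what table ACL management looks like from a notebook on a cluster with table access control enabled; the table and group names are made up:

```python
# On a Databricks cluster with table access control enabled, access is
# managed with SQL-standard GRANT/DENY statements. The table and
# principal names below are illustrative.

# Allow an analyst group to read a table.
spark.sql("GRANT SELECT ON TABLE default.customer_orders TO `analysts`")

# Explicitly block a group, overriding any broader grant.
spark.sql("DENY SELECT ON TABLE default.customer_orders TO `interns`")

# Review what has been granted on the table.
spark.sql("SHOW GRANT ON TABLE default.customer_orders").show()
```

Notice that every statement operates on a whole object; there is no way in this model to express “only these rows” or “mask this column.”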

Finally, Immuta enables customers with data access control needs — which, in this day and age, is essentially all organizations — to abstract policy decisions from the policy enforcement point(s). This allows user identities to be managed wherever the customer prefers, policy logic to be curated using attributes from other systems, and policies to be natively enforced at query time.

Unlike the credential passthrough and table ACL approaches, Immuta doesn’t restrict policy enforcement to Databricks or require users to create policies for each individual cloud data platform. Dynamic attribute-based access control enables row-, column-, and cell-level access controls, while dynamic data masking capabilities, such as k-anonymization, randomized response, and differential privacy, allow sensitive data to be protected and used securely at scale, without risking inconsistent implementation or dealing with manual protection techniques. Immuta’s flexibility and scalability across Databricks and other cloud data platforms in an organization’s ecosystem make it the most dynamic, secure option.
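To see the difference in the decision model, consider the following conceptual sketch. It is not Immuta’s API; it only illustrates how an attribute-based decision compares user attributes (pulled from any identity system) against data tags, rather than matching static role names:

```python
# Conceptual sketch only -- not Immuta's API. It illustrates the ABAC
# decision model: policy logic compares user attributes against data
# tags, instead of enumerating users or static roles per table.
from dataclasses import dataclass

@dataclass(frozen=True)
class Subject:
    name: str
    attributes: dict  # e.g. {"department": "Oncology", "purpose": "Research"}

@dataclass(frozen=True)
class ColumnMeta:
    name: str
    tags: frozenset = frozenset()  # e.g. {"PII"} from sensitive data discovery

def visible_columns(user: Subject, columns: list[ColumnMeta]) -> list[str]:
    """Return the columns this user may see unmasked at query time.

    Example policy: columns tagged PII are visible only to users acting
    under an approved 'Research' purpose; everything else passes through.
    """
    visible = []
    for col in columns:
        if "PII" in col.tags and user.attributes.get("purpose") != "Research":
            continue  # the enforcement layer would mask or drop this column
        visible.append(col.name)
    return visible

# Same table, two users, two different query-time results:
cols = [ColumnMeta("patient_name", frozenset({"PII"})), ColumnMeta("diagnosis_code")]
print(visible_columns(Subject("ana", {"purpose": "Research"}), cols))   # both columns
print(visible_columns(Subject("bob", {"purpose": "Marketing"}), cols))  # diagnosis_code only
```

In a real deployment the attributes, tags, and policy logic live outside the code entirely, which is what lets a single policy scale across many users and tables.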

Let’s take a closer look at how to implement attribute-based access controls in Databricks using Immuta.

Access Control Implementation

From the moment new data is introduced into Databricks, Immuta’s native integration streamlines access control implementation without requiring manual, risk-prone processes.

  1. First, when a Databricks user uploads any data source to their Databricks workspace, their workflow management solution detects and moves the new data so it can be analyzed in Notebooks.
  2. Next, once the analysis in Notebooks is complete, Databricks notifies the user with a link to a newly created Dashboard, which the user can access to verify the results.
  3. Upon verification of the Notebooks analysis, the data is moved into raw and trusted data stages.
  4. At this point, Databricks and Immuta create data sources and scan them. Immuta’s sensitive data discovery feature automatically detects and tags sensitive data, streamlining the process of flagging it for human inspection.
  5. Finally, attribute-based access controls specific to the sensitive data’s attributes are dynamically enforced on Spark jobs.

This process means that data access is determined at query time. Applying data access controls at runtime eliminates the need to have all data users defined in a specific system, which vastly reduces the burden on data engineers and architects. Consequently, data teams can quickly and securely scale data access for both internal and external data consumers. They are also able to monitor and audit data usage.
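As a rough sketch of what this looks like in practice (the table and column names are illustrative), the same notebook code run by two different users on an Immuta-enabled cluster returns different results, because policy is resolved per user when the query executes:

```python
# Identical code for every user: nothing here encodes who may see what.
# On an Immuta-enabled Databricks cluster, policy is applied per user at
# query time. Table and column names are illustrative.
df = spark.table("clinical.patient_visits")

# A user with an approved research purpose sees the data unmasked; a user
# without it gets PII columns masked and out-of-scope rows filtered before
# the Spark job returns results.
df.select("visit_date", "diagnosis_code", "patient_name").show(5)
```

Because no user- or role-specific logic is baked into the query, the same code can ship to every data consumer and the policy layer does the rest.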

To learn more about access controls and the other capabilities of Immuta’s native integration with Databricks, download A Guide to Data Access Governance with Immuta and Databricks.

If you’re a Databricks user, experience Immuta for yourself by starting a free trial today.
