How to Implement Snowflake Data Masking Across Platforms

As cloud data platform adoption accelerates and organizations become more reliant on data, teams using Snowflake as the primary platform for BI must have a tool that enables data masking across Snowflake and any other platform in their data stack. This article will walk through how Immuta delivers on this need with centralized, universal data access control, sensitive data detection and classification, and dynamic data masking.

Regardless of what technologies you use, these concepts apply across cloud services such as Databricks, Starburst, Amazon Redshift, Azure Synapse, and others, in addition to different relational databases hosted in AWS, Azure, or GCP.

Snowflake Data Masking for Sensitive Data

Let’s assume that you have been asked to mask all personally identifiable information (PII) data in Snowflake and across the cloud data ecosystem, which can be hundreds or thousands of tables. However, you also must include an exception for the HR department to see PII.

If you query the data from a Snowflake worksheet, you can see some of the PII data with indirect identifiers, such as COUNTY_ID and MIDDLE_NAME, and direct identifiers, such as RESIDENTIAL_ADDRESS. Querying data in another platform might also return results containing PII data, so it’s important to address how to manage this sensitive data regardless of where it lives.

Automate Sensitive Data Discovery & Classification

Immuta provides sensitive data discovery capabilities to automate the detection and classification of sensitive attributes across Snowflake and your entire cloud data ecosystem. After registering data sources with Immuta, the catalog will standardize classification and tagging of direct, indirect, and sensitive identifiers consistently. This enables you to create policies in a dynamic and scalable way across Snowflake and other data platforms.

Consistent Snowflake Data Access Control

Using Immuta’s policy-as-code capabilities, you can create a global masking policy to apply dynamic data masking across all fields in Snowflake and any other platform. Supported masking types include hashing, regular expressions, rounding, conditional masking, replacement with null or a constant, reversible masking, format-preserving masking, k-anonymization, and external masking.
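Several of these masking types correspond to ordinary SQL expressions. As a rough illustration only, the query below applies hashing, a regular expression, rounding, and null or constant replacement to made-up literal values using standard Snowflake functions. It is a sketch of the underlying techniques, not a description of how Immuta implements its policies.

```sql
-- All input values are invented sample data for illustration.
SELECT
  SHA2('Alice', 256)                         AS hashed_name,       -- hashing
  REGEXP_REPLACE('123-45-6789', '[0-9]', 'X') AS regex_masked_id,  -- regular expression
  ROUND(37482, -3)                           AS rounded_income,    -- rounding to the nearest thousand
  NULL                                       AS nulled_value,      -- replace with null
  'REDACTED'                                 AS constant_value;    -- replace with a constant
```

Hashing keeps values joinable while hiding the original, whereas rounding and regular expressions trade precision for partial readability.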

Note that the policy applies to “everyone except” those possessing an attribute where “Department” is “Human Resources,” which is pulled from an external system. This dynamic approach is also known as attribute-based access control, and it can reduce roles by 100x, making data more manageable and reducing risk for data engineers and architects.
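To make the “everyone except” idea concrete, here is a minimal sketch of an attribute-driven masking policy written in plain Snowflake SQL. It is a conceptual analogue, not Immuta's implementation. The table GOVERNANCE.USER_ATTRIBUTES, its USER_NAME and DEPARTMENT columns, and the policy name pii_attribute_mask are all hypothetical; the sketch assumes user attributes have already been synced from the external system into that table.

```sql
-- Hypothetical lookup table: one row per user with a DEPARTMENT attribute.
-- The decision depends on who the user is, not on which role they hold.
CREATE OR REPLACE MASKING POLICY pii_attribute_mask AS
(val STRING) RETURNS STRING ->
CASE
  WHEN EXISTS (
    SELECT 1
    FROM GOVERNANCE.USER_ATTRIBUTES
    WHERE USER_NAME = CURRENT_USER()
      AND DEPARTMENT = 'Human Resources'
  ) THEN val   -- HR users see the real value
  ELSE NULL    -- everyone else sees a masked value
END;
```

Because the exception is driven by an attribute rather than a role, adding a new HR user means updating one attribute row instead of creating or granting another role.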

In addition to column controls, Immuta supports row-level filtering and dynamic privacy-enhancing technologies (PETs), such as differential privacy or randomized response.

Natively Enforced Snowflake Data Masking Policy

If we run the query from the Snowflake worksheet, we now see that all the columns tagged as PII during sensitive data discovery are dynamically masked, without your having to make copies or create and manage views yourself. The policy is enforced natively in Snowflake through a secure view that Immuta manages, so the underlying data is not copied or modified. This helps avoid the risk and confusion associated with having multiple copies of data.

DIY Approaches for Snowflake Data Masking

If you want to take a DIY approach to Snowflake data masking, you can start by creating three roles – one for HR, one for non-HR and one to own secure views.
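The role setup might look like the sketch below. HR_ANALYST and NON_HR_ANALYST are hypothetical names, as are the two users. The sketch assumes the view-owning role is the MY_PRIVACY_ADMIN role that the masking policy in the next step checks for.

```sql
-- Hypothetical role names for the HR and non-HR audiences.
CREATE ROLE IF NOT EXISTS HR_ANALYST;
CREATE ROLE IF NOT EXISTS NON_HR_ANALYST;

-- Assumed to be the role that will own the secure views.
CREATE ROLE IF NOT EXISTS MY_PRIVACY_ADMIN;

-- Hypothetical users; replace with real user names.
GRANT ROLE HR_ANALYST TO USER SOME_HR_USER;
GRANT ROLE NON_HR_ANALYST TO USER SOME_OTHER_USER;
```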

Next, create a masking policy:

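-- The policy signature must match the data type of the columns it protects;
-- STRING fits text columns such as MIDDLE_NAME.
CREATE OR REPLACE MASKING POLICY voter_mask AS
(val STRING) RETURNS STRING ->
CASE
-- Return the real value only when the invoking role is MY_PRIVACY_ADMIN
WHEN INVOKER_ROLE() = 'MY_PRIVACY_ADMIN' THEN val
ELSE NULL
END;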

Apply the policy on all of the PII columns. Below is an example for one of them:

ALTER TABLE PUBLIC.VOTER_TBL MODIFY COLUMN MIDDLE_NAME SET MASKING POLICY voter_mask;
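With many PII columns it is easy to lose track of where a policy has been attached. The sketch below lists the columns a policy is attached to and shows how to detach it. MY_DB is a hypothetical database name, and the sketch assumes the policy was created in the PUBLIC schema. Keep in mind that a masking policy can only be set on columns whose data type matches the policy signature.

```sql
-- List every column the policy is currently attached to.
-- MY_DB is a hypothetical database name; replace it with your own.
SELECT *
FROM TABLE(MY_DB.INFORMATION_SCHEMA.POLICY_REFERENCES(
  POLICY_NAME => 'MY_DB.PUBLIC.VOTER_MASK'));

-- Detach the policy from a column if it was applied by mistake.
ALTER TABLE PUBLIC.VOTER_TBL MODIFY COLUMN MIDDLE_NAME UNSET MASKING POLICY;
```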

Finally, create a secure view.
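A secure view for the HR audience could be sketched as follows. The view name VOTER_HR_V and the role HR_ANALYST are hypothetical, and the column list uses only the columns mentioned earlier in this article. The sketch assumes MY_PRIVACY_ADMIN has SELECT on the table and CREATE VIEW on the schema. The approach appears to rely on INVOKER_ROLE() evaluating to the view owner's role when a query reaches the table through a view, so verify that behavior in your own account before depending on it.

```sql
-- Create the view as the role the masking policy checks for.
USE ROLE MY_PRIVACY_ADMIN;

CREATE OR REPLACE SECURE VIEW PUBLIC.VOTER_HR_V AS
SELECT COUNTY_ID, MIDDLE_NAME, RESIDENTIAL_ADDRESS
FROM PUBLIC.VOTER_TBL;

-- Only the HR role (hypothetical name) may query the view.
GRANT SELECT ON VIEW PUBLIC.VOTER_HR_V TO ROLE HR_ANALYST;
```

Non-HR roles would query PUBLIC.VOTER_TBL directly and receive NULL for the masked columns.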

Here is a good write-up from the chief BI architect at EA (Electronic Arts) for additional context.

Controlling access to sensitive data in the cloud can be challenging as the amount of data, users, and cloud platforms grows. The DIY example above is specific to a single table, and the approach differs considerably between Snowflake and the other platforms in your data ecosystem. Immuta provides a consistent, secure way to automate these steps across your cloud data ecosystem.

You can explore Snowflake data masking capabilities in Immuta further when you request a demo.