Data masking is one of the most important tasks in data governance. It may seem simple in theory, but getting it right can be the difference between a straightforward path to properly secured data and a complex, confusing experience that leaves your data exposed to breaches. To add to the complexity, dozens of platforms and products claim to offer comprehensive, dynamic data masking capabilities. How do you choose the right option for your needs?
Here’s everything you need to know about the best data masking tools and techniques.
What is Data Masking?
Data masking is a data access control and security measure that involves creating a fake but highly convincing version of your organization’s secure data. Its purpose is to protect data from breaches or leaks in instances where you need functional data sets for demonstration, training, or testing, but you don’t want to reveal actual user data.
Essentially, data masking uses the same format as your existing databases, but changes the data values. This process is done in such a way that the data cannot be reverse-engineered in order to reveal the original data points. Numbers and characters may be scrambled, hidden, substituted, or even encrypted in data masking.
Why Does Data Masking Matter?
Data masking is essential because it protects against a wide range of threats, including data loss and exfiltration, compromised accounts, and insecure third-party connections. It also helps reduce the inherent risks of cloud data storage and allows data to be shared with authorized users without exposing private information.
What are the Leading Data Masking Tools?
Immuta
Going beyond simply providing data masking, Immuta is a universal cloud data access control solution that lets businesses automate data access control on any cloud, using any data platform, at any scale. Our platform also provides sensitive data discovery, active data cataloging, dynamic data masking, and advanced auditing capabilities.
Privitar
Privitar is a popular data masking and de-identification tool, though it’s relatively limited in terms of its total capabilities. It does allow data owners, data consumers, and data guardians to collaborate and deliver safe data more quickly than traditional methods.
Redgate
Redgate is another leading data masking solution, providing a number of customization options and an interface many users describe as intuitive, though outdated. Redgate is often seen as a go-to solution when users are seeking a simple approach with a limited feature set.
How to Choose the Right Data Masking Tool
While every company’s data masking needs are unique, there are several criteria that you’ll want to keep in mind whenever you’re considering implementing a new data masking platform. In fact, your best bet is to use a platform that offers not only data masking, but also a full suite of data access control tools to streamline and fully secure your data management and privacy needs. Here are the four top criteria for an effective data masking tool.
Dynamic Data Masking
Dynamic data masking allows you to manage data access and privacy so that you stay compliant with your own internal rules and with federal or industry regulations, all without having to copy or move data. Manually removing or copying data can be time-consuming and inefficient, leading to delays or reduced data utility. Dynamic policies enable robust data masking capabilities without slowing down time to data access and use.
Sensitive Data Discovery
The best data masking and data access control tools provide the ability to automatically classify and tag identifiers as direct, indirect, or sensitive. This sensitive data discovery reduces manual processes, and accelerates and simplifies human inspection when necessary.
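To make the idea of automatic classification concrete, here is a minimal sketch that samples column values and tags columns whose values all match a known pattern. The pattern set, the tag names, and the tag_columns helper are hypothetical and chosen for illustration only; real discovery tools rely on far richer detection than two regular expressions.

```python
import re

# Hypothetical patterns for illustration; not taken from any specific product.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def tag_columns(rows):
    """Return {column: set of tags} for columns whose sampled values all match a pattern."""
    tags = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        # Ignore empty values so sparse columns can still be classified.
        values = [str(r[col]) for r in rows if r[col] not in (None, "")]
        for tag, pattern in PATTERNS.items():
            if values and all(pattern.match(v) for v in values):
                tags.setdefault(col, set()).add(tag)
    return tags


sample = [
    {"name": "Ana", "contact": "ana@example.com", "tax_id": "123-45-6789"},
    {"name": "Bo", "contact": "bo@example.org", "tax_id": "987-65-4321"},
]
print(tag_columns(sample))
```

Columns that receive no tag, such as name here, are the ones that would still need human inspection.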
Universal Data Access Control
One of the most basic but important tenets of data governance is data access control, and to keep sensitive data truly secure, you need a tool that automates access control with fine-grained parameters. With platforms like Immuta, data engineering and operations teams can use fine-grained access control to ensure only the people who need access to data have it, without making copies or working within complicated, static role-based access parameters. The ability to enforce these access controls universally across any cloud compute platform helps ensure policies are consistent and scalable so no data falls through the cracks.
Automated Auditing & Reporting
Platforms such as Immuta offer automated data policy enforcement, an essential tool that makes auditing data usage easier and faster than ever. The platform you choose should allow you to gather real-time insights into data usage across your organization with detailed, auto-generated reports that show which data was accessed, by whom, when it was accessed, and for what purpose.
What are the Different Types of Data Masking?
When it comes to data masking, there are several different types available. Each type has its own advantages and disadvantages, and may be most efficient for specific applications. Here are some of the most common types of data masking for analytical databases.
Static
Static data masking allows you to create a copy of your existing database and sanitize it to remove all identifying information. The data is altered in the copy to the point that it can be safely shared without risk of privacy breaches. In this instance, an entirely new database is created, moved to a separate location, stripped of unnecessary information, then masked and shared with the intended audience or location.
A major limitation with static data masking is that by creating an altered copy of the original data, it can become difficult to maintain a single source of truth. As new data products are created, this may lead to confusion and data silos.
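As a rough illustration of the static approach, the sketch below copies a small in-memory table, drops one column, and redacts another. The static_mask function, the column names, and the "REDACTED" placeholder are assumptions made for this example. Note that the result is a second data set that has to be kept in sync with the original, which is exactly the limitation described above.

```python
import copy


def static_mask(table, drop_columns, redact_columns):
    """Return a sanitized copy of the table; the original is left untouched."""
    masked = copy.deepcopy(table)
    for row in masked:
        for col in drop_columns:
            row.pop(col, None)  # strip columns the audience does not need
        for col in redact_columns:
            if col in row:
                row[col] = "REDACTED"  # overwrite identifying values
    return masked


production = [{"id": 1, "name": "Ana Ruiz", "ssn": "123-45-6789", "plan": "gold"}]
shared_copy = static_mask(production, drop_columns=["ssn"], redact_columns=["name"])
print(shared_copy)
```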
Dynamic
With dynamic data masking, data never needs to be stored in a secondary data storage environment the way it is with static masking. Instead, the data is effectively streamed from its original location directly to another system within the testing or development environment. This enables data engineering and operations teams to maintain control over the data and to preserve a single source of truth. Dynamic data masking also allows for scalability, as it does not require copying or moving data as data sets or users increase.
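One way to picture dynamic masking is as a policy applied at read time: the stored data stays as it is, and each reader sees a view shaped by who they are. The sketch below assumes a hypothetical POLICY mapping, a hypothetical privacy_officer role, and simple in-memory rows; it is not how any particular platform implements the feature.

```python
def mask_email(value):
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def mask_ssn(value):
    """Show only the last four characters."""
    return "***-**-" + value[-4:]


# Hypothetical policy: which masking function applies to which column.
POLICY = {"email": mask_email, "ssn": mask_ssn}


def read_rows(table, user_roles):
    """Yield rows as the given user should see them; nothing is copied to a second store."""
    for row in table:
        if "privacy_officer" in user_roles:
            yield dict(row)  # this assumed role sees unmasked values
        else:
            yield {
                col: POLICY[col](val) if col in POLICY else val
                for col, val in row.items()
            }


TABLE = [{"id": 1, "email": "ana@example.com", "ssn": "123-45-6789"}]
print(list(read_rows(TABLE, {"analyst"})))
# [{'id': 1, 'email': 'a***@example.com', 'ssn': '***-**-6789'}]
```

Because the mask is computed per request, changing the policy changes what every reader sees immediately, with no copies to regenerate.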
Deterministic
Deterministic data masking is a limited form of data masking that simply replaces every instance of one value with a different, predetermined value. For example, this may mean replacing every instance of the value ‘20’ with ‘25.’ While this approach is a fast and simple way to mask data, it carries the potential for reverse engineering of masked data to discover the original values. It’s essentially a simple ‘code’ in the traditional sense, and codes can often be cracked.
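The following sketch shows both the technique and its weakness, using a hypothetical substitution table that extends the ‘20’ to ‘25’ example. Because the mapping is fixed, the masked column keeps the same frequency distribution as the original, which is one way an attacker with outside knowledge could work backward.

```python
from collections import Counter

# Hypothetical fixed substitution table (the 20 -> 25 pair mirrors the example above).
SUBSTITUTIONS = {"20": "25", "31": "47"}


def deterministic_mask(values, mapping):
    """Replace each value with its predetermined substitute; unmapped values pass through."""
    return [mapping.get(v, v) for v in values]


ages = ["20", "31", "20", "20"]
masked = deterministic_mask(ages, SUBSTITUTIONS)
print(masked)

# The most common masked value lines up with the most common original value.
print(Counter(ages).most_common(1), Counter(masked).most_common(1))
```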
Real-time (On-the-Fly)
On-the-fly data masking involves sending small batches of masked data only when that data is requested. Each batch or subset is stored in the development or testing environment until needed. On-the-fly masking is usually applied at the start of a development project, between the production system and the dev environment, in order to prevent problems with security or compliance.
Though more secure than deterministic and static data masking, this approach is more ad hoc than dynamic data masking and is less scalable, particularly as demand for sensitive data analytics grows.
What are Data Masking Policies?
Here are some of the most common techniques used in data masking, from the most secure and complex to the most straightforward.
Encryption
Encryption masks data with a cryptographic algorithm so that it cannot be read unless the recipient is given a decryption key. This type of data policy is considered to be the most secure form of data masking, but it’s also complicated. Encryption requires advanced technology in order to continually encrypt data, create and share encryption keys, and keep data secure.
Scrambling
In data scrambling, the characters of each data point are randomized into a new order. This approach is simple and easy to implement, but it only applies to some types of data. For instance, scrambling works for numerical or character-based data points, but not for values drawn from a small fixed set, such as Male/Female, where a rearranged value is either meaningless or easy to guess.
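A minimal sketch of scrambling, assuming a hypothetical scramble helper that shuffles the characters of a string. The optional rng parameter exists only so the behavior can be reproduced in tests; by default the sketch draws from the operating system’s randomness source.

```python
import random


def scramble(value, rng=None):
    """Return the characters of value in a random order."""
    rng = rng or random.SystemRandom()
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


account = "4111111111111234"
scrambled = scramble(account)
# Same length and same characters, so the field format is preserved.
print(len(scrambled) == len(account), sorted(scrambled) == sorted(account))
```

Since the output always contains the same characters as the input, short or highly repetitive values can come out unchanged or nearly so, which is worth checking before relying on this method.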
Nulling
Nulling is straightforward: certain data points are simply Xed out, missing, or listed as null when viewed by an unauthorized data user. This approach is relatively secure, but its major pitfall is that the data tends to be less useful when used for development or testing. When considering the privacy vs. utility tradeoff, nulling skews toward privacy over utility.
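In code, nulling amounts to replacing sensitive fields with an empty value for anyone who is not authorized. The null_out function, the authorized flag, and the column names below are hypothetical and only meant to show the shape of the idea.

```python
def null_out(row, sensitive_columns, authorized):
    """Return the row with sensitive fields set to None for unauthorized viewers."""
    if authorized:
        return dict(row)
    return {
        col: (None if col in sensitive_columns else val)
        for col, val in row.items()
    }


record = {"id": 7, "email": "ana@example.com", "salary": 91000}
print(null_out(record, {"email", "salary"}, authorized=False))
# {'id': 7, 'email': None, 'salary': None}
```

The output also shows the utility cost: a tester receiving this row can no longer exercise any logic that depends on the email or salary values.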
Value Variance
In value variance, data points are replaced by ranges that include the original data point. This is a form of generalization. For example, if an individual is 25 years old, the data could be generalized to ‘21-30 years old.’ This maintains the data’s utility while still anonymizing the individual data subjects. However, one must be careful: if the data is not sufficiently generalized, quasi-identifiers from other data sets can be combined to identify users even with value variance.
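The age example can be sketched as a small bucketing function. The generalize_age name and the ten-year default width are assumptions for illustration; the bucket boundaries are chosen so that 25 falls into ‘21-30’ as in the example above.

```python
def generalize_age(age, width=10):
    """Map an exact age (1 or older) to a range such as '21-30' that contains it."""
    if age < 1:
        raise ValueError("this sketch only handles ages of 1 or more")
    low = ((age - 1) // width) * width + 1
    high = low + width - 1
    return f"{low}-{high}"


print(generalize_age(25))  # 21-30
print(generalize_age(30))  # 21-30
print(generalize_age(31))  # 31-40
```

A wider bucket gives more privacy and less analytical detail, so the width is the knob that controls the tradeoff described above.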
Pseudonymization
This term was brought into wide use by the GDPR, and it’s actually less a data masking policy than an aggregate of policies that includes masking, encryption, and hashing. The GDPR defines pseudonymization as processing personal data so that it can no longer be attributed to a specific individual without the use of additional information, which must be kept separately and protected. In practice, this generally entails replacing or removing direct identifiers, and often quasi-identifiers, within that data.
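As one possible illustration of the hashing component, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256 from Python’s standard library). The pseudonymize function and the key values are hypothetical. The secret key plays the role of separately held information: without it, the tokens cannot be recomputed from known identifiers, so it should be stored apart from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable token for the identifier; the same key always yields the same token."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration; a real key would be generated randomly and stored separately.
KEY = b"example-key-kept-outside-the-dataset"

token = pseudonymize("ana@example.com", KEY)
print(token != "ana@example.com", len(token))
```

Because the same input and key always produce the same token, records belonging to one person can still be joined across tables without exposing the identifier itself.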
Conclusion
Immuta not only offers dynamic data masking capabilities, but also a full suite of powerful tools to streamline and empower all of your data governance needs. If you’re ready to discover how Immuta can serve you, contact us today.