Automate HIPAA De-identification Methods on Amazon RDS

Sumit Sarkar on May 11, 2020
Last edited: November 4, 2024

Data engineers and product managers are often responsible for implementing various controls and audit capabilities when managing healthcare data. To enable faster, data-driven innovation, these data professionals – particularly those who come to healthcare from other industries like tech or financial services – apply best practices such as deploying a proven data analytics stack and establishing compliance with a set of regulations.

But then the reality sets in: healthcare data platforms – without proper data access control and security – may be responsible for re-identification of individuals from patient data. Data engineers and PMs may be personally liable for any re-identification.

This tutorial is written for innovators new to healthcare who want to implement modern data architectures in a legal and compliant way, while building trust with their compliance team. The goal is to enable speed of innovation without sacrificing security and governance.

It’s important to first know that HIPAA Safe Harbor requires:

  • That the 18 types of identifiers listed in the rule be removed from data sources.
  • That data owners not have actual knowledge that data users could re-identify individuals from the remaining data.
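For reference, the 18 identifier categories in 45 CFR §164.514(b)(2)(i) can be kept as a simple checklist. The Python sketch below is only an illustration: the short keys are labels made up for this article, not Immuta tag names, and the helper function is hypothetical.

```python
# The 18 Safe Harbor identifier categories from 45 CFR 164.514(b)(2)(i).
# The short keys are hypothetical labels chosen for this sketch.
SAFE_HARBOR_IDENTIFIERS = {
    "name": "Names",
    "geography": "Geographic subdivisions smaller than a state",
    "date": "Elements of dates (except year) directly related to an individual",
    "phone": "Telephone numbers",
    "fax": "Fax numbers",
    "email": "Email addresses",
    "ssn": "Social Security numbers",
    "mrn": "Medical record numbers",
    "health_plan_id": "Health plan beneficiary numbers",
    "account": "Account numbers",
    "license": "Certificate/license numbers",
    "vehicle": "Vehicle identifiers and serial numbers, including license plates",
    "device": "Device identifiers and serial numbers",
    "url": "Web URLs",
    "ip": "IP addresses",
    "biometric": "Biometric identifiers, including finger and voice prints",
    "photo": "Full-face photographs and comparable images",
    "other_unique_id": "Any other unique identifying number, characteristic, or code",
}


def unhandled_categories(handled):
    """Return the Safe Harbor categories not yet covered by a masking rule."""
    return sorted(set(SAFE_HARBOR_IDENTIFIERS) - set(handled))
```

Calling unhandled_categories with the categories your rules already cover gives a quick list of what is still missing before you certify a data source.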

Using Immuta’s automated policy enforcement and auditing, this tutorial demonstrates how to de-identify a data set stored in Amazon RDS for PostgreSQL in accordance with Safe Harbor. The policy is enforced on-read, without requiring data to be copied or moved. The same automation steps apply to other data sources on Amazon, such as Athena, MySQL, EMR, S3, Databricks, Microsoft SQL Server, and Redshift, as well as others supported by Immuta.

Loading a Sample Healthcare Data Set into AWS RDS for PostgreSQL

The public data set from SMRT Columbus lists licensed healthcare facilities in the state of Ohio and has been loaded into AWS RDS for PostgreSQL. Because it is a public data set that anyone can access, it does NOT contain patient data. For the purposes of this article, however, it has sensitive identifiers that would otherwise be classified as protected health information (PHI). If this were patient data being used for analytics, identifiers such as names, geographic subdivisions smaller than a state, dates directly related to an individual, phone numbers, and email addresses would need to be removed to de-identify it under the section on uses and disclosures of PHI, 45 CFR §164.514(a)-(b).

Register the Amazon RDS Data Set with Immuta

From the Immuta console, click the data sources icon on the left, then click + New Data Source to create a new PostgreSQL connection and select the table “OHIO_LICENSED_HEALTHCARE_FACILITIES.” No data is ever stored in Immuta, since this is a logical table.

Automatically Discover & Tag Sensitive Data

After the Amazon RDS for PostgreSQL data source is set up, Immuta discovers sensitive information in the data set. Sensitive Data Discovery is a capability built into Immuta that discovers and tags sensitive fields such as names, dates, and geographic locations.
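To give a feel for what tagging by discovery means, here is a minimal Python sketch that samples column values and tags a column when most values match a pattern. It is not Immuta’s detection logic. The patterns, the threshold, the sample rows, and every column name except effectivedate are assumptions invented for this illustration.

```python
import re

# Hypothetical patterns for this sketch; real discovery uses far richer detection.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
}


def discover_tags(rows, threshold=0.8):
    """Tag a column when at least `threshold` of its non-empty sampled values match."""
    tags = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]).strip() for r in rows if r[col] not in (None, "")]
        if not values:
            continue
        for tag, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                tags.setdefault(col, []).append(tag)
    return tags


# Made-up sample rows, not taken from the real data set.
sample = [
    {"facility": "Example Clinic A", "phone": "(614) 555-0100", "effectivedate": "4/1/19"},
    {"facility": "Example Clinic B", "phone": "614-555-0101", "effectivedate": "12/15/2018"},
]
print(discover_tags(sample))  # {'phone': ['phone'], 'effectivedate': ['date']}
```

Note that the facility name is not tagged here: free-text names rarely match a simple pattern, which is one reason a human still reviews the tags before certifying.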

Create a HIPAA Safe Harbor Global Policy

From the policies tab, you will find that the policy is available by default; here it has been moved from the staged state to the active state at global scope. The rules in the policy are enforced based on the tagged fields and are displayed in plain English, so you can show them to your compliance team for full transparency. Immuta includes a no-code policy builder if you need to create additional policies specific to your organization or to other regulations.
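The idea of rules that key off tags and read as plain English can be sketched in a few lines of Python. This is an assumed, simplified model, not Immuta’s policy engine: the Rule class, the tag names, and the masking choices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    tag: str                                     # tag the rule applies to
    description: str                             # plain-English wording
    mask: Callable[[object], Optional[object]]   # masking function


# Hypothetical rules for this sketch.
RULES = [
    Rule("name", "replacing the value with a constant", lambda v: "REDACTED"),
    Rule("phone", "making the value null", lambda v: None),
    Rule("date", "making the value null", lambda v: None),
]


def describe(rules):
    """Render each rule as a sentence a compliance reviewer can read."""
    return [f"Mask columns tagged '{r.tag}' by {r.description}." for r in rules]


def apply_rules(row, column_tags, rules):
    """Return a masked copy of `row`; the first matching tag wins per column."""
    by_tag = {r.tag: r for r in rules}
    masked = dict(row)
    for column, tags in column_tags.items():
        for tag in tags:
            if tag in by_tag and column in masked:
                masked[column] = by_tag[tag].mask(masked[column])
                break
    return masked
```

Because the rules depend only on tags, the same rule set can be applied to any data source whose columns carry those tags, which is what makes a global scope practical.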

Certify the Policy Against Amazon RDS for PostgreSQL

Navigate to the data source “OHIO_LICENSED_HEALTHCARE_FACILITIES”, certify that all 18 identifiers are properly tagged, and confirm that you have no knowledge that the data set can be used to identify individuals by clicking “Sign and Certify”. If you prefer to use an external data catalog, its tags can also be integrated into Immuta.

Gain Acknowledgement from Business Analysts

Let’s change hats from the data engineer or architect to the business analyst who wants to consume the data. When a business analyst requests access to the project that contains the data source “OHIO_LICENSED_HEALTHCARE_FACILITIES”, that person must agree to use the data set for the stated purpose of the project, refrain from sharing data outside of that project, and not re-identify or take any steps to re-identify individual health information. This combination of steps serves as official acknowledgement of HIPAA Safe Harbor policy compliance.
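The three acknowledgements act as a gate: access is granted only when all of them are on record. A minimal Python sketch of that check follows. The statement keys and the function are hypothetical and do not reflect how Immuta stores acknowledgements.

```python
# Hypothetical keys for the three statements an analyst must accept.
REQUIRED_ACKNOWLEDGEMENTS = frozenset({
    "use_for_stated_purpose",
    "no_sharing_outside_project",
    "no_reidentification",
})


def can_access(acknowledged):
    """Grant access only when every required statement has been acknowledged."""
    return REQUIRED_ACKNOWLEDGEMENTS <= set(acknowledged)
```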

Access the De-identified Data Set

In the previous step, the business analyst agreed not to attempt to re-identify individual health information. But for the sake of this article, let’s assume he or she clicked through without reading anything, and then tries to re-identify personally identifiable information at medical facilities in Columbus, Ohio with a “CL” code and an effective date of 4/1/19, based on an article from the Columbus Dispatch. The attempt would involve exporting data to Tableau Desktop, adding [effectivedate] to the details, and filtering by the date. But the automated global policy has masked the effective date, so the control is enforced technically when drilling into the data, which mitigates the risk of re-identification.

Note that this data is being accessed with HIPAA Safe Harbor policies being enforced on-read, without copying any data. The identifiers in the data set remain in the database if access is required for different purposes.
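To make “enforced on-read, without copying” concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL and a view as a stand-in for policy enforcement. It illustrates the general idea only and is not how Immuta applies policies. The table name, the sample row, and all column names except effectivedate are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table with one made-up row; the identifiers stay here untouched.
conn.execute(
    "CREATE TABLE facilities (facility TEXT, license_type TEXT, effectivedate TEXT)"
)
conn.execute("INSERT INTO facilities VALUES ('Example Clinic A', 'CL', '2019-04-01')")

# The view masks values at query time; no masked copy of the data is stored.
conn.execute(
    "CREATE VIEW facilities_deidentified AS "
    "SELECT 'REDACTED' AS facility, license_type, NULL AS effectivedate "
    "FROM facilities"
)

masked = conn.execute("SELECT * FROM facilities_deidentified").fetchall()
raw = conn.execute("SELECT * FROM facilities").fetchall()

print(masked)  # [('REDACTED', 'CL', None)]
print(raw)     # [('Example Clinic A', 'CL', '2019-04-01')]
```

An analyst who only has access to the masked view cannot filter on the effective date, while the original values remain available in the base table for other, authorized purposes.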

Prove Compliance with Audit Reports/Logs

When your compliance team or an auditor needs to understand more about the interaction, Immuta’s audit log and reporting capabilities provide instant evidence of compliance with the Safe Harbor policy. The extract query from Tableau was logged at 23 Apr 2020 11:33:06 -0400 under the purpose of “Re-identification Prohibited”. If you click into the log, you can see the user details and understand what data was accessed and which policies were applied for de-identification.
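If you export audit events for your own reporting, it helps to normalize them into a consistent structure. The Python sketch below shows one possible shape, using the timestamp and purpose from the example above. The field names and the user value are invented for illustration and are not Immuta’s audit log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(user, data_source, purpose, policies, logged_at):
    """Build a normalized audit entry with the timestamp converted to UTC."""
    ts = datetime.strptime(logged_at, "%d %b %Y %H:%M:%S %z")
    return {
        "timestamp_utc": ts.astimezone(timezone.utc).isoformat(),
        "user": user,
        "data_source": data_source,
        "purpose": purpose,
        "policies_applied": list(policies),
    }


record = audit_record(
    user="analyst@example.com",  # hypothetical user
    data_source="OHIO_LICENSED_HEALTHCARE_FACILITIES",
    purpose="Re-identification Prohibited",
    policies=["HIPAA Safe Harbor"],
    logged_at="23 Apr 2020 11:33:06 -0400",
)
print(json.dumps(record, indent=2))
```

Storing the timestamp in UTC (here 2020-04-23T15:33:06+00:00) makes it easier to line up events from tools running in different time zones.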

Additional Resources

The prebuilt Safe Harbor data policy demonstrated in this article was a collaborative effort by our legal and software engineering teams. Legal engineers are lawyers with deep regulatory expertise who track regulations and map them to Immuta capabilities, in order to manage the impact on the data architects and data engineers who work with the data. In contrast, I don’t claim to be an expert on healthcare regulation, and you may catch me spelling it HIPPA rather than HIPAA. This combination of legal and engineering expertise (sans spelling mistakes) is increasingly common as the healthcare industry continues to innovate.

The HIPAA Safe Harbor Global Policy can also be used to protect data sets on the AWS Data Exchange.

If you find value in this approach, you can request a demo.

Or, if you are interested in learning more about the automated HIPAA Safe Harbor Policy available in Immuta, check out the documentation. If you are interested in learning more about how we can help with HIPAA Expert Determination, please contact us.
