Data Owners: Privacy is YOUR Problem

Stephen Bailey

Published June 23, 2021

Last edited: November 4, 2024

Data has been called “the new gold” for its ability to transform and automate business processes; it has also been called “the new uranium” for its ability to violate the human right to privacy on a massive scale. And just as nuclear engineers could effortlessly enumerate the fundamental differences between gold and uranium, so too must data engineers learn to instinctively identify and separate dangerous data from the benign.

Take, for instance, the famous “link attack” that re-identified the medical records of several high-profile patients of Massachusetts General Hospital. In 1997, MGH released about 15,000 records from which names and patient IDs had been stripped. Despite these precautions, Harvard researcher Latanya Sweeney was able to connect publicly available voter information to the anonymized medical records by joining them on three indirect identifiers: zip code, birth date, and gender. The join left Sweeney with only a handful of candidate records to sift through, which was enough to re-identify many individuals, most notably the Massachusetts governor.
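
To make the mechanics concrete, here is a minimal sketch of a link attack on invented toy data. The record layout, the field names (zip, birth_date, gender), and the helper link_records are assumptions made for illustration; they are not taken from the original study.

```python
from collections import defaultdict

# Indirect (quasi-) identifiers; the field names are hypothetical.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Pair a named person with a de-identified record whenever the
    quasi-identifier combination is unique in both datasets."""
    people_by_key = defaultdict(list)
    for person in public:
        people_by_key[tuple(person[k] for k in keys)].append(person)

    records_by_key = defaultdict(list)
    for record in anonymized:
        records_by_key[tuple(record[k] for k in keys)].append(record)

    matches = []
    for key, records in records_by_key.items():
        people = people_by_key.get(key, [])
        # Exactly one record and one person: the record is re-identified.
        if len(records) == 1 and len(people) == 1:
            matches.append((people[0]["name"], records[0]))
    return matches


# Invented data: names and IDs are stripped, indirect identifiers are not.
medical_records = [
    {"zip": "00001", "birth_date": "1950-01-02", "gender": "M", "diagnosis": "condition A"},
    {"zip": "00001", "birth_date": "1950-01-02", "gender": "F", "diagnosis": "condition B"},
    {"zip": "00002", "birth_date": "1971-03-04", "gender": "F", "diagnosis": "condition C"},
    {"zip": "00002", "birth_date": "1971-03-04", "gender": "F", "diagnosis": "condition D"},
]
voter_roll = [
    {"name": "Alex Example", "zip": "00001", "birth_date": "1950-01-02", "gender": "M"},
    {"name": "Blake Sample", "zip": "00002", "birth_date": "1971-03-04", "gender": "F"},
]

reidentified = link_records(medical_records, voter_roll)
```

In this toy run only the first record is linked to a name. The two records that share a zip code, birth date, and gender stay ambiguous, which is the intuition behind grouping-based protections.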

More than twenty years later, every business is an MGH and every person with internet access is a potential Latanya Sweeney. Yet, we all want a world where data is handled responsibly, shared cautiously, and leveraged only for the right purposes. Our greatest limitation in realizing that world is not one of possibility but responsibility; it’s not a question of “How?” but “Who?”.

I believe data engineers must be the ones to take ownership of the problem and lead. Controlling the re-identifiability of records in a single dashboard is good analytics hygiene, but preserving privacy in the platform delivering the data is crucial. Managing privacy loss is a systemic problem demanding systemic solutions — and data engineers build the systems.

The mandate to protect privacy does not translate to a boring exercise in implementing business logic; it presents exciting new technical challenges. How can we quantify the degree of privacy protection we are providing? How can we rebuild data products — and guarantee they still function — after an individual requests that their data be deleted? How can we translate sprawling legal regulations into comprehensible data policies while satisfying data-hungry consumers?
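
One rough way to approach the first question is to measure k-anonymity: the size of the smallest group of records that share the same indirect identifiers. The sketch below is only an illustration under assumed field names (zip, age_band); the helpers k_anonymity and generalize_zip are hypothetical, and a low k is a warning sign rather than a complete privacy measure.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_zip(record, digits=3):
    """Return a copy of the record with the zip code coarsened to its
    first `digits` characters; the rest is masked with '*'."""
    masked = dict(record)
    zip_code = record["zip"]
    masked["zip"] = zip_code[:digits] + "*" * (len(zip_code) - digits)
    return masked


# Invented rows for illustration.
rows = [
    {"zip": "37201", "age_band": "30-39"},
    {"zip": "37203", "age_band": "30-39"},
    {"zip": "37212", "age_band": "40-49"},
    {"zip": "37215", "age_band": "40-49"},
]

k_before = k_anonymity(rows, ("zip", "age_band"))
k_after = k_anonymity([generalize_zip(r) for r in rows], ("zip", "age_band"))
```

Here every row is unique before generalization (k = 1), and masking the last two zip digits raises k to 2 at the cost of geographic precision.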

We will need to formulate a new set of engineering best practices that extends beyond the familiar domains of security and system design. Determining what is best practice requires much practice, though. It is essential that engineering leaders push their teams to understand and address the pertinent issues: the strengths and weaknesses of data masking, techniques like k-anonymity and differential privacy, and emerging technologies such as federated learning. Ultimately, data engineers should know the practice of privacy by design as intuitively as they do the principle of least privilege.
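
As a taste of one of these techniques, the following sketch applies the Laplace mechanism, a standard building block of differential privacy, to a simple count query. It uses only the Python standard library; the function names, the sample ages, and the choice of epsilon are assumptions for illustration, and a production system would more likely rely on a vetted library.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two independent
    exponential draws, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values matching `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Invented ages; smaller epsilon means more noise and stronger protection.
ages = [23, 35, 41, 67, 70, 29, 81, 52]
noisy_seniors = private_count(ages, lambda a: a >= 65, epsilon=0.5)
```

Each call returns a different value near the true count of 3. The analyst gets a useful aggregate while any single person's presence in the data is masked by the noise.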

The alternative, if history is any guide, is a world in which institutions publish “anonymized” data openly, and clever people and organizations reconstruct and repurpose private data for their own ends. Managing privacy, far from being an abstract concept just for philosophers and lawyers, has become a concrete problem perfectly suited for data engineers. It’s time they made it their own.

To read more about this topic and other essential tips for data engineers, check out 97 Things Every Data Engineer Should Know.

Stephen Bailey is Director of Data & Analytics at Immuta, where he strives to implement privacy best practices while delivering business value from data. He loves to teach and learn, on just about any subject. He holds a PhD in educational cognitive neuroscience from Vanderbilt and enjoys reading philosophy.
