How to Run SageMaker Jobs with Immuta + Amazon S3 Access Grants

Sam Carroll, Solutions Architect, Alliances, Immuta

In today’s competitive landscape, leveraging AI and machine learning tools has become a mandate for virtually every organization. According to one survey, 79% of organizations have increased their budgets for AI systems, applications, and development in the last 12 months.

During the fast-paced adoption and implementation of AI and ML, however, data security becomes much more complicated. To ensure effective AI security while keeping up with the speed of innovation, you need security methods that seamlessly integrate with both your data storage and AI/ML platforms.

Immuta’s integration with Amazon S3 Access Grants enables AWS users to build on existing security controls, allowing downstream data consumers to safely leverage S3 data across cloud platforms. In this guide, we’ll demonstrate how to securely use S3 data to build, train, and deploy AI and ML tools in Amazon SageMaker.

Managing Amazon S3 Access at Scale

Managing data access and security is not always straightforward. Whoever owns data access management responsibilities – whether central IT, data engineering, or another team – may bottleneck access requests and slow time-to-data. Reliance on traditional policy creation can become cumbersome and difficult to manage at scale, leading to policy bloat, security gaps, and inefficiencies.

AWS launched IAM Identity Center and S3 Access Grants to provide standard controls for accessing S3 data. Immuta adds dynamic capabilities that streamline data access and management across S3 buckets, especially as the number of buckets and users grows rapidly.

Many modern teams leverage data stored in Amazon S3 to train and inform their business-critical AI objectives. Amazon SageMaker – a cloud-based machine learning platform used for building, training, and deploying AI & ML models – requires efficient access to this S3 data in order to operate at its fullest capacity. With secure access to S3 data, SageMaker can be leveraged to train and deploy new AI tools.

Immuta + S3 Access Grants for SageMaker Jobs

Immuta integrates with Amazon S3 to bolster the capabilities of S3 Access Grants, allowing you to map object access to users or IAM roles based on user and object attributes, and centralize access management across your S3 buckets.

In the following scenario, we’ll demonstrate how to leverage Immuta’s Amazon S3 integration to facilitate efficient, secure access to S3 data for SageMaker jobs.

1. Set Up Your S3 Bucket

To start, you’ll need to create a bucket in Amazon S3. In this example, we’ve created a bucket called “samc-s3-access-grants-demo,” with sub-folders for “finance,” “hr,” and “medical” S3 data sets.
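If you prefer to script this step, the layout above could be created with something like the following sketch. It assumes an S3 client that behaves like the one returned by boto3.client("s3"). The function name and the use of zero-byte "folder" marker objects are illustrative choices, not part of the Immuta integration.

```python
from typing import Any, Iterable, List

BUCKET = "samc-s3-access-grants-demo"   # bucket name used in this walkthrough
FOLDERS = ("finance", "hr", "medical")  # sub-folders used in this walkthrough


def create_demo_layout(s3: Any, bucket: str = BUCKET,
                       folders: Iterable[str] = FOLDERS) -> List[str]:
    """Create the bucket plus one zero-byte marker object per folder prefix.

    `s3` is assumed to behave like boto3.client("s3"). Outside us-east-1,
    create_bucket also needs a CreateBucketConfiguration argument.
    """
    s3.create_bucket(Bucket=bucket)
    prefixes = [f"{name.strip('/')}/" for name in folders]
    for prefix in prefixes:
        # A key ending in "/" is displayed as a folder in the S3 console.
        s3.put_object(Bucket=bucket, Key=prefix, Body=b"")
    return prefixes
```

With boto3 installed and AWS credentials configured, calling create_demo_layout(boto3.client("s3")) would create the bucket and the three prefixes. Bucket names are globally unique, so you would need to substitute your own.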

If you open the S3 Access Grants page, you will see that Immuta has automatically created an Access Grants instance through our integration. No grants have been created at this point in the process, however.

2. Register the S3 Bucket with Immuta

Once you’ve created a bucket, it will be registered automatically with Immuta. If you log into Immuta, you’ll see that the bucket – along with its folder-specific prefixes – has been registered with the platform.

3. Tag Your S3 Data in Immuta

Next, you’ll tag the S3 data appropriately in Immuta. For example, you can apply the “Company Hierarchy.Finance” tag to the Finance subfolder. Tagging this S3 data allows Immuta to map user attributes to the underlying S3 objects.

Immuta also allows you to push tags on data from external systems, such as Amazon Macie, for additional data classification.

4. Define Access Policies in Immuta

Now that this S3 folder has been tagged, you can start defining rules that control who has access to this data based on user attributes from the identity system. This dynamic attribute-based access control will allow you to grant access to your Finance users in a scalable, easy-to-understand way.

To get started, build a subscription policy using Immuta’s policy editor. In this case, you will specify that you want any user who is a member of the “Finance” group to be able to access any data in S3 that has a “Finance” tag associated with it.
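The matching rule behind such a policy can be modeled roughly as follows. This is a plain-Python illustration of attribute-based matching, not Immuta's policy engine or API. The class names, the users "alice" and "bob," and the HR tag are hypothetical, and the Finance tag is assumed to be the “Company Hierarchy.Finance” tag applied in step 3.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class DataSource:
    prefix: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class SubscriptionPolicy:
    """Subscribe members of `group` to any data source carrying `tag`."""
    group: str
    tag: str

    def allows(self, user: User, source: DataSource) -> bool:
        # Access depends only on attributes, never on a per-user list.
        return self.group in user.groups and self.tag in source.tags


policy = SubscriptionPolicy(group="Finance", tag="Company Hierarchy.Finance")
finance_data = DataSource("finance/", frozenset({"Company Hierarchy.Finance"}))
hr_data = DataSource("hr/", frozenset({"Company Hierarchy.HR"}))  # hypothetical tag
alice = User("alice", frozenset({"Finance"}))  # hypothetical user
bob = User("bob", frozenset({"HR"}))           # hypothetical user

print(policy.allows(alice, finance_data))  # True
print(policy.allows(alice, hr_data))       # False
print(policy.allows(bob, finance_data))    # False
```

Because the rule refers to a group and a tag rather than to individual users or paths, adding a new Finance user or tagging a new prefix requires no change to the policy itself.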

5. Implement the Subscription Policy

Next, you’ll activate the subscription policy in Immuta. Before enabling this policy, you’ll notice that there are no existing grants available on the S3 Access Grants page of the S3 console.

After enabling the Immuta subscription policy, you will notice that a user has been granted access to the bucket.

This is because the user is a member of the “Finance” group. With the policy applied, they are automatically granted access to the files within the Finance folder in S3.
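The same before-and-after check can be made outside the console. The sketch below assumes a client that behaves like boto3.client("s3control") and uses its list_access_grants call. The function name and the shape of the returned summary are illustrative.

```python
from typing import Any, Dict, List


def summarize_grants(s3control: Any, account_id: str) -> List[Dict[str, str]]:
    """Return grantee, permission, and scope for each S3 Access Grant.

    `s3control` is assumed to behave like boto3.client("s3control").
    """
    summaries: List[Dict[str, str]] = []
    token = None
    while True:
        kwargs = {"AccountId": account_id}
        if token:
            kwargs["NextToken"] = token
        page = s3control.list_access_grants(**kwargs)
        for grant in page.get("AccessGrantsList", []):
            summaries.append({
                "grantee": grant["Grantee"]["GranteeIdentifier"],
                "permission": grant["Permission"],
                "scope": grant["GrantScope"],
            })
        # Keep paging until the service stops returning a continuation token.
        token = page.get("NextToken")
        if not token:
            return summaries
```

Before the subscription policy is enabled, this should return an empty list. Afterward, it should include an entry whose scope covers the Finance prefix.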

6. Request Temporary Credentials for SageMaker Access

To access this S3 data from a SageMaker notebook, you first need to request temporary credentials from S3 Access Grants. To do this, supply the user information from your AWS command-line configuration to your Python library and use it to request temporary credentials.

Note that if you try to access this bucket without first obtaining the temporary credentials, your request will be denied.
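One way to make this request from Python is sketched below. It assumes a client that behaves like boto3.client("s3control") and uses its get_data_access call. The function name, the default READ permission, and the target format shown in the comments are assumptions made for illustration.

```python
from typing import Any, Dict


def request_s3_credentials(s3control: Any, account_id: str, target: str,
                           permission: str = "READ") -> Dict[str, str]:
    """Ask S3 Access Grants for temporary credentials scoped to `target`.

    `s3control` is assumed to behave like boto3.client("s3control").
    The returned dict uses the keyword names boto3.client("s3", ...) accepts.
    """
    response = s3control.get_data_access(
        AccountId=account_id,
        Target=target,
        Permission=permission,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Example use in a notebook (requires boto3 and a matching grant):
#   import boto3
#   creds = request_s3_credentials(
#       boto3.client("s3control"),
#       "<your-account-id>",
#       "s3://samc-s3-access-grants-demo/finance/*",
#   )
#   s3 = boto3.client("s3", **creds)
```

If no grant matches the caller and target, the service rejects the request, and boto3 surfaces that rejection as an exception rather than returning credentials.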

7. Access the S3 Bucket in SageMaker

Once you have the required temporary credentials, you can call the S3 list_objects API to list the files in the Finance folder of your S3 bucket. In this bucket, some of the data contains transaction-level information. The data has been partitioned by country code, so each file contains only the transaction details for its respective country.

If a user tries to open one of these files, they can now see the data because they are a member of the “Finance” group in the identity system.
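The listing and reading steps might look like the sketch below. It assumes an S3 client that behaves like boto3.client("s3") built from the temporary credentials, uses list_objects_v2 (the newer variant of list_objects), and assumes the files are UTF-8 CSV with a header row, which the walkthrough does not specify. The function names are illustrative.

```python
import csv
import io
from typing import Any, Dict, List


def list_keys(s3: Any, bucket: str, prefix: str) -> List[str]:
    """Return object keys under `prefix`, skipping folder marker keys.

    A single call returns at most 1,000 keys; larger prefixes need paging.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])
            if not obj["Key"].endswith("/")]


def read_csv_rows(s3: Any, bucket: str, key: str) -> List[Dict[str, str]]:
    """Download one object and parse it as CSV with a header row."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))
```

A notebook could call list_keys(s3, "samc-s3-access-grants-demo", "finance/") to find the per-country files, then pass one of the returned keys to read_csv_rows to inspect its transactions.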

Benefits of Using Immuta + S3 Access Grants for SageMaker

By using Immuta with S3 Access Grants, your team gains dynamic controls that help ensure security and privacy for your AI-driven objectives. These controls provide:

  • Consistency: Policies are applied across users and buckets, ensuring that there are no gaps in your access management and security.
  • Scalability: Dynamic policies can scale alongside growing buckets, data sources, and users, without creating policy bloat or bottlenecking access with a specific team or individual.
  • Centralized Management: Policies are controlled in one location but applied globally, making their creation, application, and maintenance much easier for your busy teams.

To learn more about Immuta + Amazon S3, check out The Amazon S3 Security & Access Handbook today. A recorded walkthrough of this demo is also available.