💫 EMR SaaS Installation

Summary

This guide shows how to grant DataFlint read-only access to Amazon EMR metadata through cross-account IAM role assumption. It applies to EMR on EC2, EMR Serverless, and EMR on EKS (which IAM calls "EMR Containers", with the service prefix emr-containers).

For the broader SaaS threat model and stability notes, see SaaS Security & Stability.

You will:

  1. Create a dedicated IAM role in your AWS account.

  2. Add a trust policy that allows the DataFlint service role to assume it.

  3. Attach a minimal read-only policy for EMR / EMR Containers APIs.

  4. Share the role ARN and regions with DataFlint.

The entire process should take a few minutes.


Repeat this process for each AWS account you want DataFlint to read from.


What DataFlint needs

Send DataFlint:

  • The ARN of the role you created (one per AWS account).

  • The region(s) where you run EMR (and/or EMR on EKS / EMR Containers).

  • (Optional) The role name, which helps with troubleshooting.

How it works

DataFlint assumes the role you create and calls read-only EMR APIs to:

  • Discover clusters / virtual clusters.

  • List and describe job runs and steps.

  • Fetch application UI links when applicable (read-only).
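To make the flow concrete, here is a minimal sketch of what the assuming side does: call sts:AssumeRole with the external ID, then use the temporary credentials for a read-only EMR call. This is not DataFlint's code. The clients are assumed to be boto3 clients (or any object with the same methods), and the session name "dataflint-emr-read" is hypothetical.

```python
from typing import Any, Dict, List

# Illustrative sketch of cross-account role assumption; not DataFlint's implementation.
# sts_client / emr_client are assumed to be boto3 clients or objects with the same methods.


def assume_customer_role(sts_client: Any, role_arn: str, external_id: str,
                         session_name: str = "dataflint-emr-read") -> Dict[str, str]:
    """Call sts:AssumeRole and return temporary credentials as boto3 client kwargs."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # hypothetical session name
        ExternalId=external_id,  # must match the sts:ExternalId condition in the trust policy
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_cluster_ids(emr_client: Any) -> List[str]:
    """Read-only discovery: page through ListClusters and collect cluster IDs."""
    ids: List[str] = []
    kwargs: Dict[str, str] = {}
    while True:
        page = emr_client.list_clusters(**kwargs)
        ids.extend(cluster["Id"] for cluster in page.get("Clusters", []))
        marker = page.get("Marker")
        if not marker:
            return ids
        kwargs = {"Marker": marker}
```

With boto3, the returned dictionary can be passed as keyword arguments when creating an "emr" client for a given region_name. EMR APIs are regional, so one client per region is needed, which is why the list of regions matters.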


AWS classifies a few EMR "UI helper" APIs as write actions (for example elasticmapreduce:CreatePersistentAppUI). DataFlint uses them only to generate read-only UI access links.

Required IAM permissions (minimal)

Use a dedicated policy attached to the role. This is the minimal set we currently require for EMR + EMR Containers read access:
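DataFlint provides the authoritative policy. Purely as an illustration of the shape such a policy takes, the sketch below builds a read-only policy document from real EMR, EMR Serverless, and EMR on EKS action names. The specific action list and the Sid "DataflintEmrRead" are assumptions, not the set DataFlint requires; use DataFlint's version when you create the policy.

```python
import json
from typing import Any, Dict, Iterable

# Illustrative action list only; DataFlint supplies the authoritative set.
EXAMPLE_EMR_READ_ACTIONS = (
    # EMR on EC2
    "elasticmapreduce:ListClusters",
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListSteps",
    "elasticmapreduce:DescribeStep",
    # EMR "UI helper" actions that AWS classifies as writes
    "elasticmapreduce:CreatePersistentAppUI",
    "elasticmapreduce:DescribePersistentAppUI",
    "elasticmapreduce:GetPersistentAppUIPresignedURL",
    # EMR Serverless
    "emr-serverless:ListApplications",
    "emr-serverless:GetApplication",
    "emr-serverless:ListJobRuns",
    "emr-serverless:GetJobRun",
    # EMR on EKS (IAM service prefix "emr-containers")
    "emr-containers:ListVirtualClusters",
    "emr-containers:DescribeVirtualCluster",
    "emr-containers:ListJobRuns",
    "emr-containers:DescribeJobRun",
)


def build_read_policy(actions: Iterable[str] = EXAMPLE_EMR_READ_ACTIONS) -> Dict[str, Any]:
    """Return an identity-based policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataflintEmrRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # deduplicated, stable order
                "Resource": "*",  # scope down by ARN if your constraints allow it
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_policy(), indent=2))
```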


If you want to scope down further (by region, tags, or resource ARNs), tell us your constraints and we’ll help tighten it.

Installation

Pick one method. All methods create the same resources.

Step 1: Create the IAM policy

  1. Open IAM → Policies.

  2. Click Create policy.

  3. Choose JSON.

  4. Paste the minimal policy from Required IAM permissions.

  5. Click Next.

  6. Policy name: DataflintEmrContainersReadOnly

  7. Create the policy.
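If you prefer to script Step 1, a minimal sketch follows. It assumes a boto3 IAM client (or an object with the same create_policy method) and that policy_document is the JSON from Required IAM permissions, loaded into a dictionary. The description string is an arbitrary example.

```python
import json
from typing import Any, Dict

POLICY_NAME = "DataflintEmrContainersReadOnly"


def create_read_only_policy(iam_client: Any, policy_document: Dict[str, Any]) -> str:
    """Create the customer-managed policy from Step 1 and return its ARN."""
    response = iam_client.create_policy(
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(policy_document),  # IAM expects a JSON string
        Description="Read-only EMR access for DataFlint",  # example description
    )
    return response["Policy"]["Arn"]
```

Keep the returned ARN; you attach it to the role in the next step.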

Step 2: Create the IAM role

  1. Open IAM → Roles.

  2. Click Create role.

  3. Trusted entity type: AWS account.

  4. Select Another AWS account.

  5. Account ID: DATAFLINT_ACCOUNT_ID (get it from DataFlint).

  6. Enable Require external ID.

  7. External ID: CUSTOMER_EXTERNAL_ID (get it from DataFlint).

  8. In Add permissions, attach DataflintEmrContainersReadOnly.

  9. Role name: dataflint-emr-read-only-role

  10. Create the role.
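The scripted equivalent of Step 2 is two IAM calls: create the role with a trust policy, then attach the policy from Step 1. This sketch assumes a boto3 IAM client; trust_policy is a dictionary you supply. If you pass the DataFlint service-role trust policy described in Step 3, the role is created with the correct principal from the start.

```python
import json
from typing import Any, Dict

ROLE_NAME = "dataflint-emr-read-only-role"


def create_role_with_policy(iam_client: Any, trust_policy: Dict[str, Any],
                            policy_arn: str) -> str:
    """Create the role, attach the read-only policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(trust_policy),  # IAM expects a JSON string
        Description="Assumed by DataFlint for read-only EMR access",  # example description
    )
    # Attach the customer-managed policy created in Step 1.
    iam_client.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```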

Step 3: Update the trust policy (Principal role)

The console may create the trust policy with the DataFlint account's root principal (arn:aws:iam::DATAFLINT_ACCOUNT_ID:root). DataFlint requires its service role as the principal instead.

  1. Open the new role.

  2. Go to Trust relationships → Edit trust policy.

  3. Use this trust policy (replace placeholders):
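For reference, a standard cross-account trust policy with a role principal and an external ID condition has the structure built below. DATAFLINT_ACCOUNT_ID and CUSTOMER_EXTERNAL_ID are the placeholders from Step 2; DATAFLINT_SERVICE_ROLE is a hypothetical placeholder for the service role name DataFlint gives you. Prefer the exact policy DataFlint provides if it differs.

```python
import json
from typing import Any, Dict


def build_trust_policy(dataflint_account_id: str, dataflint_role_name: str,
                       external_id: str) -> Dict[str, Any]:
    """Trust policy letting one specific role in another account assume this role."""
    principal_arn = f"arn:aws:iam::{dataflint_account_id}:role/{dataflint_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},  # service role, not the account root
                "Action": "sts:AssumeRole",
                # AssumeRole succeeds only when the caller passes this external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholders: replace with the values DataFlint gives you.
    policy = build_trust_policy("DATAFLINT_ACCOUNT_ID", "DATAFLINT_SERVICE_ROLE",
                                "CUSTOMER_EXTERNAL_ID")
    print(json.dumps(policy, indent=2))
```

To apply it without the console, a boto3 IAM client's update_assume_role_policy method takes RoleName and the JSON string as PolicyDocument.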

Step 4: Copy the role ARN

Open the role summary and copy the ARN. You’ll share it with DataFlint.

You can validate that the role exists and has the expected policies attached.
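One way to script this check, sketched under the assumption of a boto3 IAM client: get_role fails if the role does not exist, and list_attached_role_policies (paginated with Marker) shows whether the Step 1 policy is attached. The default names match Steps 1 and 2.

```python
from typing import Any, Dict, List


def check_role(iam_client: Any, role_name: str = "dataflint-emr-read-only-role",
               expected_policy: str = "DataflintEmrContainersReadOnly") -> str:
    """Return the role ARN if the role exists and has the expected policy attached."""
    # With boto3, get_role raises a NoSuchEntity ClientError if the role is missing.
    arn = iam_client.get_role(RoleName=role_name)["Role"]["Arn"]

    attached: List[str] = []
    kwargs: Dict[str, str] = {"RoleName": role_name}
    while True:
        page = iam_client.list_attached_role_policies(**kwargs)
        attached.extend(p["PolicyName"] for p in page["AttachedPolicies"])
        if not page.get("IsTruncated"):
            break
        kwargs["Marker"] = page["Marker"]  # fetch the next page

    if expected_policy not in attached:
        raise RuntimeError(f"{role_name} is missing {expected_policy}; attached: {attached}")
    return arn
```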

You can also validate the role's permissions with the IAM policy simulator (iam:SimulatePrincipalPolicy).
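A minimal sketch of the simulation, assuming a boto3 IAM client and a caller allowed to run iam:SimulatePrincipalPolicy. It evaluates the role's identity-based policies for each action and reports any action whose decision is not "allowed" (for example implicitDeny). Which actions to test is up to you; use the ones from the policy DataFlint provides.

```python
from typing import Any, Dict, List


def simulate_actions(iam_client: Any, role_arn: str, actions: List[str]) -> Dict[str, str]:
    """Map each action name to its EvalDecision for the role's attached policies."""
    decisions: Dict[str, str] = {}
    kwargs: Dict[str, Any] = {"PolicySourceArn": role_arn, "ActionNames": actions}
    while True:
        page = iam_client.simulate_principal_policy(**kwargs)
        for result in page["EvaluationResults"]:
            decisions[result["EvalActionName"]] = result["EvalDecision"]
        if not page.get("IsTruncated"):
            return decisions
        kwargs["Marker"] = page["Marker"]  # fetch the next page of results


def not_allowed(decisions: Dict[str, str]) -> List[str]:
    """Actions whose decision is anything other than "allowed"."""
    return sorted(action for action, decision in decisions.items() if decision != "allowed")
```

An empty result from not_allowed means every tested action is permitted by the role's policies.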

Send the details to DataFlint

Share over your approved secure channel:

  • Role ARN

  • Regions for EMR / EMR Containers
