💊 DataProc SaaS Installation

Summary

This guide shows how to grant DataFlint read-only access to Spark event logs and cluster metadata in Google Cloud Dataproc.

For the broader SaaS threat model and stability notes, see SaaS Security & Stability.

You will:

  1. Create a dedicated GCP service account.

  2. Grant minimal IAM roles for Dataproc + Cloud Storage.

  3. Generate a JSON key (or use your preferred secret flow).

  4. Share the credentials with DataFlint.

The entire process should take a few minutes.

What DataFlint needs

Send DataFlint:

  • Project ID

  • Region(s) you run Dataproc in

  • Service account key (JSON) or an agreed alternative credential method

  • Dataproc temp bucket name(s) if you use custom buckets

How it works

Dataproc writes Spark event logs into a GCS bucket. By default this is the Dataproc temp bucket.

Common layout:

DataFlint reads only these logs and cluster metadata.
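If you are unsure which buckets to report, the sketch below filters a list of bucket names down to those that look like Dataproc temp buckets. It assumes the naming shape mentioned in Step 4 (dataproc-temp-<region>-<project-number>-...) and treats whatever follows the project number as opaque. The function name and sample names are hypothetical.

```python
import re

# Assumed shape: dataproc-temp-<region>-<project-number>-<anything>.
# The trailing part is not interpreted; it is only required to be non-empty.
TEMP_BUCKET_RE = re.compile(
    r"^dataproc-temp-(?P<region>[a-z0-9-]+?)-(?P<project_number>\d+)-(?P<suffix>.+)$"
)


def find_temp_buckets(bucket_names):
    """Return (name, region, project_number) for names that look like temp buckets."""
    matches = []
    for name in bucket_names:
        m = TEMP_BUCKET_RE.match(name)
        if m:
            matches.append((name, m.group("region"), m.group("project_number")))
    return matches


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    sample = [
        "dataproc-temp-us-central1-123456789012-abcd1234",
        "my-app-data",
    ]
    for name, region, project_number in find_temp_buckets(sample):
        print(f"{name} (region={region}, project number={project_number})")
```

Buckets configured manually for event logs will not match this pattern and need to be listed separately.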

Required IAM permissions (minimal)

Grant the DataFlint service account:

  • Project-level: roles/dataproc.viewer

  • Bucket-level (on the event-log buckets): roles/storage.objectViewer

Note: If you configured Spark event logs to write to a custom bucket via spark:spark.eventLog.dir or History Server settings, grant roles/storage.objectViewer on that bucket too.
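As a rough illustration of the permissions above, the following sketch lays out the two grants as IAM-style bindings (a role plus a list of members). The function name and the bucket names are hypothetical. It only builds a data structure for review and does not call any Google Cloud API.

```python
import json


def dataflint_bindings(service_account_email, event_log_buckets):
    """Describe the grants DataFlint needs, grouped by where each one is applied."""
    member = f"serviceAccount:{service_account_email}"
    return {
        # One project-level binding for Dataproc metadata.
        "project": [{"role": "roles/dataproc.viewer", "members": [member]}],
        # One bucket-level binding per event-log bucket.
        "buckets": {
            bucket: [{"role": "roles/storage.objectViewer", "members": [member]}]
            for bucket in event_log_buckets
        },
    }


if __name__ == "__main__":
    # Placeholder project and bucket names, for illustration only.
    plan = dataflint_bindings(
        "dataflint-events-reader@my-project.iam.gserviceaccount.com",
        ["my-temp-bucket", "my-custom-event-log-bucket"],
    )
    print(json.dumps(plan, indent=2))
```

Keeping the storage role at bucket level, rather than project level, limits read access to the buckets that actually hold event logs.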

Installation

Pick one method. All methods create the same resources.

Step 1: Create a service account

  1. Open IAM & Admin → Service Accounts.

  2. Click Create service account.

  3. Use:

    • Name: dataflint-events-reader

    • Description: Read-only access to Dataproc Spark event logs

  4. Click Done.

Step 2: Create a service account key (JSON)

  1. Open the service account.

  2. Go to Keys.

  3. Click Add key → Create new key.

  4. Select JSON.

  5. Download and store the file securely.
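Before sharing the key, a quick sanity check of the downloaded file can catch a wrong account or a truncated download. The sketch below checks a few fields that service account key files normally contain (type, project_id, private_key, client_email). It assumes the account name from Step 1 and the standard <name>@<PROJECT_ID>.iam.gserviceaccount.com email format. The function names are hypothetical.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(key, expected_account="dataflint-events-reader"):
    """Return a list of problems found in a parsed service account key (empty if none)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if not key.get(field)]

    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")

    email = key.get("client_email", "")
    project = key.get("project_id", "")
    # Assumes the standard email format; adjust if your project uses a different one.
    expected_email = f"{expected_account}@{project}.iam.gserviceaccount.com"
    if email and project and email != expected_email:
        problems.append(f"unexpected client_email: {email}")

    return problems


def check_key_file(path):
    """Load a key file from disk and run the same checks."""
    with open(path, encoding="utf-8") as f:
        return check_key(json.load(f))
```

For example, check_key_file("dataflint-key.json") returns an empty list when the file looks as expected. The path is a placeholder for wherever you stored the key.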

Step 3: Grant Dataproc viewer permissions

  1. Open IAM & Admin → IAM.

  2. Click Grant access.

  3. Principal: dataflint-events-reader@<PROJECT_ID>.iam.gserviceaccount.com

  4. Role: Dataproc Viewer (roles/dataproc.viewer)

Step 4: Grant bucket read access (event logs)

  1. Open Cloud Storage → Buckets.

  2. Find the Dataproc temp bucket.

    • It usually looks like dataproc-temp-<region>-<project-number>-...

  3. Open Permissions.

  4. Click Grant access.

  5. Add the same principal as above.

  6. Role: Storage Object Viewer (roles/storage.objectViewer)
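If you prefer the command line to the console, Steps 1 to 4 map roughly onto four gcloud commands. The sketch below only assembles and prints those commands so you can review them before running anything. The project ID, bucket name, and key file path are placeholders, and the function name is hypothetical.

```python
import shlex

SA_NAME = "dataflint-events-reader"


def setup_commands(project_id, buckets, key_path="dataflint-key.json"):
    """Build gcloud commands that mirror the console steps (nothing is executed)."""
    email = f"{SA_NAME}@{project_id}.iam.gserviceaccount.com"
    member = f"serviceAccount:{email}"
    commands = [
        # Step 1: create the service account.
        ["gcloud", "iam", "service-accounts", "create", SA_NAME,
         f"--project={project_id}",
         "--description=Read-only access to Dataproc Spark event logs"],
        # Step 2: create a JSON key for it.
        ["gcloud", "iam", "service-accounts", "keys", "create", key_path,
         f"--iam-account={email}"],
        # Step 3: project-level Dataproc Viewer.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member={member}", "--role=roles/dataproc.viewer"],
    ]
    # Step 4: bucket-level Storage Object Viewer on each event-log bucket.
    for bucket in buckets:
        commands.append(
            ["gcloud", "storage", "buckets", "add-iam-policy-binding", f"gs://{bucket}",
             f"--member={member}", "--role=roles/storage.objectViewer"]
        )
    return commands


if __name__ == "__main__":
    for cmd in setup_commands("my-project", ["my-event-log-bucket"]):
        print(shlex.join(cmd))
```

Printing with shlex.join keeps the description argument correctly quoted when the commands are pasted into a shell.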

Use the key to verify that listing clusters and reading objects from the event-log bucket both work.
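One way to run this check is with the Google Cloud Python client libraries. The sketch below assumes the google-cloud-dataproc and google-cloud-storage packages are installed. The key path, project ID, region, and bucket are values you supply, and the function names are hypothetical. It lists cluster names in one region and the first few object names in one bucket.

```python
def regional_endpoint(region):
    """Dataproc API endpoint for a given region."""
    return f"{region}-dataproc.googleapis.com:443"


def verify_access(key_path, project_id, region, bucket, max_objects=5):
    """Return (cluster names, object names) visible to the service account key."""
    # Imported here so the helper above works without the client libraries installed.
    from google.cloud import dataproc_v1, storage

    # Dataproc metadata: requires roles/dataproc.viewer on the project.
    clusters_client = dataproc_v1.ClusterControllerClient.from_service_account_file(
        key_path, client_options={"api_endpoint": regional_endpoint(region)}
    )
    clusters = [
        cluster.cluster_name
        for cluster in clusters_client.list_clusters(
            request={"project_id": project_id, "region": region}
        )
    ]

    # Event logs: requires roles/storage.objectViewer on the bucket.
    storage_client = storage.Client.from_service_account_json(key_path)
    objects = [blob.name for blob in storage_client.list_blobs(bucket, max_results=max_objects)]

    return clusters, objects
```

A permission error from either call usually points at the corresponding role: the Dataproc call depends on Step 3 and the storage call on Step 4. Repeat the check for each region and bucket you plan to report.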
