Grant minimal IAM roles for Dataproc + Cloud Storage.
Generate a JSON key (or use your preferred secret flow).
Share the credentials with DataFlint.
The entire process should take a few minutes.
Service account keys are sensitive credentials. Store them like passwords and share them only over an approved secure channel.
What DataFlint needs
Send DataFlint:
Project ID
Region(s) you run Dataproc in
Service account key (JSON) or an agreed alternative credential method
Dataproc temp bucket name(s) if you use custom buckets
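The items above can be gathered from the CLI. A sketch, assuming an authenticated gcloud session (the region value is only an example):

```shell
#!/bin/bash
# Commands that surface each item (require gcloud auth; shown for reference):
#   gcloud config get-value project                      # Project ID
#   gcloud dataproc clusters list --region=us-central1   # confirms active region(s)
#   gsutil ls | grep dataproc-temp                       # candidate temp buckets
# Collect those outputs, plus the key JSON, into the handoff for DataFlint:
CHECKLIST="project-id regions key-json temp-buckets"
echo "${CHECKLIST}"
```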
How it works
Dataproc writes Spark event logs into a GCS bucket. By default this is the Dataproc temp bucket.
Common layout: gs://&lt;temp-bucket&gt;/&lt;cluster-uuid&gt;/spark-job-history/
DataFlint reads only these logs and cluster metadata.
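A quick way to confirm the logs exist is to assemble the expected path and list it. A sketch, where the bucket name and cluster UUID are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical values -- substitute your real temp bucket and cluster UUID.
TEMP_BUCKET="gs://dataproc-temp-us-central1-123456789012-abcd1234"
CLUSTER_UUID="1234-abcd"

# Dataproc typically writes Spark event logs under <temp-bucket>/<cluster-uuid>/spark-job-history/
EVENT_LOG_DIR="${TEMP_BUCKET}/${CLUSTER_UUID}/spark-job-history"
echo "${EVENT_LOG_DIR}"

# To browse the logs for real (requires an authenticated gsutil session):
#   gsutil ls "${EVENT_LOG_DIR}/"
```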
Required IAM permissions (minimal)
Grant the DataFlint service account:
Project-level: roles/dataproc.viewer
Bucket-level (on the event-log buckets): roles/storage.objectViewer
If you configured Spark event logs to a custom bucket via spark:spark.eventLog.dir or History Server settings, grant storage.objectViewer on that bucket too.
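To find a custom event-log bucket, read the cluster's Spark properties and keep only the gs://&lt;bucket&gt; part. A sketch: the describe command is shown as a commented reference, and the property value below is a hypothetical example:

```shell
#!/bin/bash
# For a real cluster, read the property with (requires gcloud auth):
#   gcloud dataproc clusters describe CLUSTER_NAME --region=REGION \
#     --format='value(config.softwareConfig.properties)'
# Example property value (hypothetical custom bucket):
PROP='spark:spark.eventLog.dir=gs://my-custom-spark-logs/events'

# Keep only gs://<bucket> -- that is the bucket to grant objectViewer on.
BUCKET=$(printf '%s' "${PROP}" | sed -E 's#.*=(gs://[^/]+).*#\1#')
echo "${BUCKET}"
```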
Installation
Pick one method. All methods create the same resources.
Step 1: Create a service account
Open IAM &amp; Admin → Service Accounts.
Click Create service account.
Use:
Name: dataflint-events-reader
Description: Read-only access to Dataproc Spark event logs
# List candidate temp buckets
gsutil ls | grep dataproc-temp || true
# Grant read-only access (repeat for every relevant bucket)
gsutil iam ch \
  serviceAccount:dataflint-events-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com:objectViewer \
  gs://dataproc-temp-REGION-PROJECT_NUMBER-SUFFIX
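To confirm a grant took effect, inspect the bucket's IAM policy for the service account's member string. A sketch, with placeholder project values; the gsutil check is shown as a commented reference:

```shell
#!/bin/bash
PROJECT_ID="your-project-id"            # placeholder
SA_NAME="dataflint-events-reader"
MEMBER="serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
echo "${MEMBER}"

# Look for that member in the bucket policy (requires gcloud auth):
#   gsutil iam get gs://dataproc-temp-REGION-PROJECT_NUMBER-SUFFIX | grep "${MEMBER}"
```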
setup-dataflint-dataproc-reader.sh
#!/bin/bash
set -euo pipefail
# EDIT THESE
PROJECT_ID="your-project-id"
REGION="us-central1"
SA_NAME="dataflint-events-reader"
KEY_OUTPUT_PATH="./dataflint-sa-key.json"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
echo "Creating service account (if missing)..."
# Idempotent: ignore "already exists" errors so the script can be re-run.
gcloud iam service-accounts create "${SA_NAME}" \
  --display-name="DataFlint Spark Events Reader" \
  --description="Read-only access to Dataproc Spark event logs" \
  --project="${PROJECT_ID}" 2>/dev/null || true

echo "Creating key..."
gcloud iam service-accounts keys create "${KEY_OUTPUT_PATH}" \
  --iam-account="${SA_EMAIL}"

echo "Granting Dataproc Viewer..."
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/dataproc.viewer" \
  --condition=None \
  --quiet

echo "Granting bucket access for dataproc temp buckets in region ${REGION}..."
for bucket in $(gsutil ls | grep "dataproc-temp-${REGION}" || true); do
  echo "  ${bucket}"
  gsutil iam ch "serviceAccount:${SA_EMAIL}:objectViewer" "${bucket}"
done
echo "Done."
echo "Service account: ${SA_EMAIL}"
echo "Key file: ${KEY_OUTPUT_PATH}"
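Before sending the key, a quick sanity check is that the JSON declares type "service_account". A sketch using a minimal stand-in file (a real key has many more fields; never commit one to source control), with the end-to-end verification commands shown as commented references:

```shell
#!/bin/bash
set -euo pipefail

# Minimal stand-in for a key file, just to demonstrate the check.
cat > /tmp/sample-key.json <<'EOF'
{
  "type": "service_account",
  "client_email": "dataflint-events-reader@your-project-id.iam.gserviceaccount.com"
}
EOF

# The sanity check you would run on the real key file:
if grep -q '"type": "service_account"' /tmp/sample-key.json; then
  echo "key looks like a service account key"
fi

# End-to-end verification (requires gcloud auth; region/project are examples):
#   gcloud auth activate-service-account --key-file=./dataflint-sa-key.json
#   gcloud dataproc clusters list --region=us-central1 --project=your-project-id
```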