Spark on K8s SaaS Installation
Summary
This guide shows how to grant DataFlint read-only access to Spark event logs for Spark on Kubernetes deployments, or any other Spark deployment that writes its event logs to S3.
You will:
- Find your Spark event log location (spark.eventLog.dir).
- Add an S3 bucket policy that lets DataFlint read that location.
- Add the bucket + path in the DataFlint UI (one config per location).
The entire process should take a few minutes.
This page documents the SaaS model where DataFlint reads event logs from your object store. For BYOC installations, the DataFlint AWS account / role can be different.
What DataFlint needs
Send DataFlint (or fill in the UI):
- Region of the bucket.
- Bucket name.
- Path/prefix inside the bucket (from spark.eventLog.dir).
Each bucket + path is a separate configuration. If you have multiple clusters or environments with different spark.eventLog.dir, add each one.
How it works
Spark writes event logs to the directory configured in spark.eventLog.dir.
DataFlint reads those logs (read-only) and builds run summaries and insights.
Common S3 layouts:
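For example (illustrative bucket and prefix names):

```
s3a://my-spark-events/prod/spark-events/    →  bucket: my-spark-events,    prefix: prod/spark-events/
s3a://data-platform-logs/spark/eventlogs/   →  bucket: data-platform-logs, prefix: spark/eventlogs/
```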
Step 1: Find your spark.eventLog.dir
You need the exact bucket + prefix that Spark writes to.
Pick one way:
- Spark UI → Environment tab. Look for spark.eventLog.dir.
- Your Spark submit / operator manifest. Look for --conf spark.eventLog.dir=... or spec.sparkConf.spark.eventLog.dir: ... (Spark Operator).
- Your Spark defaults (spark-defaults.conf). Look for spark.eventLog.dir ...
Translate spark.eventLog.dir to bucket + path
If your value looks like s3a://my-spark-events/prod/spark-events/, then:
- Bucket: my-spark-events
- Path/prefix: prod/spark-events/
If you use a History Server, you might see the same path configured as spark.history.fs.logDirectory. In most setups it matches spark.eventLog.dir.
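The translation above can be sketched in shell; the event log value here is the hypothetical example from this section:

```shell
# Split a spark.eventLog.dir value into bucket + prefix.
EVENT_LOG_DIR="s3a://my-spark-events/prod/spark-events/"

# Strip the scheme (s3://, s3a://, s3n://), then split on the first "/".
no_scheme="${EVENT_LOG_DIR#*://}"
BUCKET="${no_scheme%%/*}"    # everything before the first slash
PREFIX="${no_scheme#*/}"     # everything after the first slash

echo "Bucket: ${BUCKET}"     # Bucket: my-spark-events
echo "Prefix: ${PREFIX}"     # Prefix: prod/spark-events/
```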
Step 2: Allow DataFlint to read the bucket (S3 bucket policy)
DataFlint reads Spark event logs via a dedicated role in the DataFlint AWS account.
Use the same principal as in EMR SaaS Installation:
- DataFlint AWS account ID: 975050001706
- DataFlint service role: arn:aws:iam::975050001706:role/eks-dataflint-service-role
For BYOC installations, this principal can be different. If you're not sure, ask DataFlint for the correct AWS account ID / role ARN.
Minimal bucket policy statement (recommended)
Add this policy (or merge its statements) into the bucket policy of the bucket used by spark.eventLog.dir.
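A minimal sketch of such a policy, assuming the service-role principal listed above (verify the exact statements with DataFlint):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DataFlintListEventLogs",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::975050001706:role/eks-dataflint-service-role"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Sid": "DataFlintReadEventLogs",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::975050001706:role/eks-dataflint-service-role"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/YOUR_PREFIX/*"
    }
  ]
}
```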
Replace:
- YOUR_BUCKET_NAME
- YOUR_PREFIX (no leading /)
In S3, "list objects" is done via s3:ListBucket (there is no separate ListObject action).
Optional: restrict listing to only the event-log prefix
The policy above grants s3:ListBucket on the bucket. To restrict listing to the specific prefix, add this to the statement:
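One way to express that restriction is an s3:prefix condition on the ListBucket statement (sketch, using the same placeholders):

```json
"Condition": {
  "StringLike": {
    "s3:prefix": ["YOUR_PREFIX/*", "YOUR_PREFIX"]
  }
}
```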
If the bucket uses SSE-KMS, you also need to allow kms:Decrypt for the KMS key.
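In that case, the KMS key policy needs a statement along these lines (sketch; scope the principal and key as your security requirements dictate):

```json
{
  "Sid": "DataFlintDecryptEventLogs",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::975050001706:role/eks-dataflint-service-role"
  },
  "Action": "kms:Decrypt",
  "Resource": "*"
}
```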
Apply the bucket policy
Open S3 → Buckets → YOUR_BUCKET_NAME.
Go to Permissions.
Under Bucket policy, click Edit.
Paste or merge the policy from above.
Click Save changes.
Fetch the current bucket policy:
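For example, with the AWS CLI (run with credentials that can read the policy):

```shell
aws s3api get-bucket-policy \
  --bucket YOUR_BUCKET_NAME \
  --query Policy --output text
```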
Update the bucket policy from a local file:
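For example, with the AWS CLI, where bucket-policy.json is a local file containing the full merged policy:

```shell
aws s3api put-bucket-policy \
  --bucket YOUR_BUCKET_NAME \
  --policy file://bucket-policy.json
```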
Step 3: Add the event log locations in DataFlint
Create one configuration per spark.eventLog.dir location.
You'll typically provide:
Cloud: AWS
Region
Bucket
Path / prefix
Example (using the screenshot)
- Paste the bucket name from spark.eventLog.dir.
- Paste the path/prefix from spark.eventLog.dir.
- Choose the region where the bucket lives.
- Add the AWS account ID where the bucket lives.
- Add another config if you have another bucket/prefix.

Misc
Optional: add an EKS role
You can also add an EKS (Kubernetes) role integration. It lets DataFlint enrich runs with more cluster metadata.
Ask DataFlint for the exact role requirements for your setup.
Optional: SQS-based ingestion
For high volume environments, you can enable SQS-based ingestion. It reduces S3 listing and can speed up discovery of new event logs.
Ask DataFlint for the S3 → SQS notification setup that matches your bucket layout.