Release Notes

Version 0.9.8

Bug fixes:

  • #74: Fix NullPointerException in Block.code when from_json (or any other CodegenFallback expression) is used under whole-stage codegen with DataFlint instrumentation enabled. TimedWithCodegenExec now reports supportCodegen = false for wrapped operators that contain a CodegenFallback, mirroring Spark's own CollapseCodegenStages check that the transparent wrapper had been hiding.

  • Compatible with Spark 3.0 → 4.x.

Hardening:

  • TimedExec.postRddId now overwrites the rddId metric instead of summing across re-executions of the same plan instance.

  • TimedExec and TimedWithCodegenExec no longer compare equal. This fixes a corner case in plan canonicalization / AQE plan reuse.

  • The executeCollect write path is now bounds-safe; it falls back to the standard path on unexpected plan shapes (vendor write commands, future Spark layouts).

  • The rddId metric switched from a "size" type to a plain sum, so it is no longer rendered as bytes ("12 B") in the Spark UI.
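The #74 fix can be pictured with a small Python model. This is only a sketch of the idea, not DataFlint's Scala code: Expr, Operator, and TimedWrapper are hypothetical stand-ins, and the is_fallback flag plays the role of Spark's CodegenFallback marker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical expression node; is_fallback stands in for CodegenFallback."""
    name: str
    is_fallback: bool = False
    children: List["Expr"] = field(default_factory=list)

    def contains_fallback(self) -> bool:
        # Walk the whole expression tree, not just the root.
        return self.is_fallback or any(c.contains_fallback() for c in self.children)


@dataclass
class Operator:
    """Hypothetical physical operator holding its expressions."""
    name: str
    expressions: List[Expr]
    supports_codegen: bool = True


@dataclass
class TimedWrapper:
    """Hypothetical transparent wrapper around an operator."""
    child: Operator

    @property
    def supports_codegen(self) -> bool:
        # Report False when any wrapped expression would fall back, so the
        # planner does not fuse this node into generated code.
        has_fallback = any(e.contains_fallback() for e in self.child.expressions)
        return self.child.supports_codegen and not has_fallback


plain = TimedWrapper(Operator("Project", [Expr("upper", children=[Expr("col")])]))
with_json = TimedWrapper(
    Operator("Project", [Expr("alias", children=[Expr("from_json", is_fallback=True)])])
)
print(plain.supports_codegen, with_json.supports_codegen)
```

The point of the model is that the wrapper answers the codegen question on behalf of the operator it hides, including expressions nested below the top level.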

Version 0.9.7

This release adds first-class support for Databricks Runtime 17.3 LTS and newer, and fixes a metric formatting issue that could blank parts of the DataFlint SQL plan view on Databricks.

What changed:

🆕 New artifact: dataflint-spark4-databricks_2.13

Databricks Runtime 17.3 is Spark 4–based but ships javax.servlet instead of the standard jakarta.servlet. The regular dataflint-spark4_2.13 jar was crashing the cluster at startup with NoClassDefFoundError: jakarta/servlet/Servlet.

We now publish a separate jar built for Databricks runtimes:

io.dataflint:dataflint-spark4-databricks_2.13:0.9.7

It’s the same plugin, same spark.plugins class — only the jar coordinate changes.
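A minimal sketch of choosing the coordinate per environment follows. The Databricks coordinate is quoted from these notes; the stock coordinate assumes the same group and version, and the helper names are hypothetical. spark.jars.packages is the standard Spark setting for Maven coordinates; on Databricks you may prefer to install the jar as a cluster library instead.

```python
# Coordinates for 0.9.7. The Databricks one is quoted from the notes; the
# stock one assumes the same group and version (an assumption of this sketch).
DATABRICKS_ARTIFACT = "io.dataflint:dataflint-spark4-databricks_2.13:0.9.7"
STOCK_ARTIFACT = "io.dataflint:dataflint-spark4_2.13:0.9.7"


def pick_artifact(on_databricks_17_3_plus: bool) -> str:
    """Return the Maven coordinate to use for the given environment."""
    return DATABRICKS_ARTIFACT if on_databricks_17_3_plus else STOCK_ARTIFACT


def packages_conf(on_databricks_17_3_plus: bool) -> dict:
    # Only the jar coordinate differs; spark.plugins stays whatever you already use.
    return {"spark.jars.packages": pick_artifact(on_databricks_17_3_plus)}


print(packages_conf(True))
print(packages_conf(False))
```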

🛠 Duration metric now displays correctly on Databricks

DataFlint’s TimedExec instrumentation wraps each operator with a "duration" timing metric. On Databricks runtimes, this was previously appearing as a bare number (e.g. 1058) in the Spark UI rather than the expected 5s (1s, 2s, 3s) formatting, which also crashed the DataFlint SQL plan view with Unsupported time unit. Fixed.

The DataFlint UI is also more defensive about unexpected metric formats — it now logs a warning and skips the value rather than blanking the page.
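The "warn and skip" behavior can be sketched as below. This is an illustrative Python parser, not the DataFlint UI code; the unit table and the function name are assumptions made for the example.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger("metric_parser")

# Assumed unit table for the sketch; the real UI may accept more or fewer units.
UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000, "min": 60_000, "h": 3_600_000}

_VALUE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_duration_ms(text: str) -> Optional[float]:
    """Return milliseconds, or None (with a warning) when the format is unexpected."""
    match = _VALUE.match(text)
    if match is None:
        # e.g. a bare number such as "1058" with no unit
        logger.warning("Skipping duration metric with unexpected format: %r", text)
        return None
    number, unit = match.groups()
    factor = UNIT_TO_MS.get(unit)
    if factor is None:
        logger.warning("Skipping duration metric with unsupported unit: %r", text)
        return None
    return float(number) * factor
```

Returning None instead of raising lets the caller render the rest of the plan even when one metric is malformed.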

Stock Spark 4.x (non-Databricks)

Unchanged. dataflint-spark4_2.13 continues to be the right artifact for stock Spark 4, EMR, and any other vanilla Spark 4 deployment.

Version 0.9.6

  • Fixed a startup crash on Databricks Runtime 17.3 and up (Spark 4). On these runtimes, the DataFlint tab will not be registered.

Version 0.9.0

UI and usability

  • Added a YouTube tutorial link in the footer.

  • Added a new version notification.

  • DataFlint now fetches the latest version from Sonatype Central and shows a chip when a newer version is available. Version checks use semver comparison. The check fails silently if Sonatype Central is unreachable.

  • Fixed duration display for zero values. Nodes with duration=0 now show 0 ms instead of being hidden.
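The version check described above might look roughly like this. It is a sketch under stated assumptions: fetch_latest is a hypothetical callable standing in for the Sonatype Central request, and only plain MAJOR.MINOR.PATCH versions are handled.

```python
from typing import Callable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = version.strip().lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def newer_version_available(current: str, fetch_latest: Callable[[], str]) -> Optional[str]:
    """Return the latest version if it is newer than current, else None."""
    try:
        latest = fetch_latest()
        if parse_version(latest) > parse_version(current):
            return latest
    except Exception:
        pass  # unreachable registry or malformed version: fail silently
    return None


def unreachable() -> str:
    raise ConnectionError("registry unreachable")


print(newer_version_available("0.9.0", lambda: "0.10.0"))
print(newer_version_available("0.9.0", unreachable))
```

Comparing integer tuples rather than strings matters here: as strings, "0.10.0" sorts before "0.9.0".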

Instrumentation

  • DataFlintRDDUtils now uses a custom RDD instead of mapPartitions for duration timing.

  • The custom RDD captures startTime inside compute(), before calling firstParent.iterator(). This now captures eager parent work correctly, including operators like SortExec and HashAggregateExec. Previously, mapPartitions started timing too late and missed full partition sort work and hash map build time.

  • TimedExec.executeCollect now records per-partition write duration. It reconstructs DataWritingCommandExec with the data plan wrapped in RDDTimingWrapper. On Spark 3.4+, this happens inside WriteFilesExec; on older Spark versions, it wraps the data plan directly. The write command then consumes the timed RDD via sparkContext.runJob. This captures both data production time and write I/O per partition. Previously, write duration used driver-side wall-clock timing, which was inconsistent with other per-partition metrics.

  • doProduce now wraps child code in try/finally for blocking operators. Duration metrics now flush even when operators exit early through shouldStop() or return. This fixes duration=0 in codegen paths for operators like SortExec.

  • doProduce now sanitizes ctx.freshNamePrefix. This strips non-alphanumeric characters from generated code variable prefixes and fixes invalid Java identifiers for nodes with spaces in nodeName, such as Scan ExistingRDD and Execute InsertIntoHadoopFsRelationCommand.

  • Added RDDScanExec to instrumented nodes. Scan ExistingRDD nodes now get duration metrics on Spark 3 and Spark 4.
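The timing change in this list can be illustrated with plain Python generators. This is an analogy, not the Scala implementation: sorted_partition, timed_late, and timed_early are hypothetical names, and the sleep stands in for a parent operator that does its work when its iterator is created.

```python
import time
from typing import Callable, Dict, Iterator, List


def sorted_partition(rows: List[int]) -> Iterator[int]:
    """Parent with eager work: the sort runs when the iterator is created."""
    time.sleep(0.05)  # stands in for a full partition sort
    return iter(sorted(rows))


def timed_late(parent_iter: Iterator[int], metrics: Dict[str, float]) -> Iterator[int]:
    """mapPartitions-style timing: the parent iterator already exists here."""
    start = time.perf_counter()
    try:
        yield from parent_iter
    finally:
        metrics["duration"] = time.perf_counter() - start


def timed_early(make_parent: Callable[[], Iterator[int]], metrics: Dict[str, float]) -> Iterator[int]:
    """Custom-RDD-style timing: start the clock, then ask the parent for its iterator."""
    start = time.perf_counter()
    try:
        yield from make_parent()
    finally:
        # finally also runs when the consumer stops early and closes the generator
        metrics["duration"] = time.perf_counter() - start


late: Dict[str, float] = {}
early: Dict[str, float] = {}
list(timed_late(sorted_partition([3, 1, 2]), late))
list(timed_early(lambda: sorted_partition([3, 1, 2]), early))
print(late["duration"] < early["duration"])
```

The try/finally in both wrappers mirrors the doProduce change: the metric is written even if the consumer stops before the iterator is exhausted.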

Version 0.8.9

Instrumentation

  • A single generic TimedExec wrapper now replaces 19 per-type DataFlint*Exec classes. TimedExec adds a duration metric and an rddId metric.

  • Existing Spark metrics stay intact on the wrapped node. The wrapper exposes child.children as its own children, so the SQL graph stays as one node. You do not get double nodes in Spark UI or DataFlint UI.

  • InMemoryTableScanExec and all Exchange nodes are never wrapped.

  • Version-specific nodes are matched by class name string. This avoids NoClassDefFoundError on older Spark versions.

  • Join codegen is cancelled where codegen instrumentation does not work.

  • DataWritingCommandExec now gets duration support through doPrepare delegation.
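A toy model of the transparent wrapper is sketched below. PlanNode, TimedNode, and instrument are hypothetical names, and the never-wrap rule is simplified to a name check; the intent is only to show why exposing the wrapped node's children keeps the node count unchanged.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)


class TimedNode(PlanNode):
    """Hypothetical wrapper: keeps the wrapped node's name and exposes its children."""

    def __init__(self, wrapped: PlanNode):
        super().__init__(name=wrapped.name, children=wrapped.children)
        self.wrapped = wrapped
        self.metrics = {"duration": 0}


def should_wrap(node: PlanNode) -> bool:
    # Matching on the name string, simplified from the rules in the notes.
    return node.name != "InMemoryTableScan" and "Exchange" not in node.name


def instrument(node: PlanNode) -> PlanNode:
    rebuilt = PlanNode(node.name, [instrument(c) for c in node.children])
    return TimedNode(rebuilt) if should_wrap(rebuilt) else rebuilt


def count_nodes(node: PlanNode) -> int:
    return 1 + sum(count_nodes(c) for c in node.children)


plan = PlanNode("Sort", [PlanNode("ShuffleExchange", [PlanNode("Filter")])])
instrumented = instrument(plan)
print(count_nodes(plan), count_nodes(instrumented))
```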

Stage grouping and duration attribution

  • Stage grouping is now topology-based from the SQL plan graph. Exchange boundaries define the stage graph. The result is deterministic across live runs and history server.

  • Stage view now supports Inclusive and Exclusive duration modes. Inclusive shows native Spark metrics as-is. Exclusive is the default and normalizes stage durations to executorRunTime.

  • Attribution mode auto-enables when any instrumented node exists.

  • Exchange read and write durations now come from shuffle metrics.

  • Producer and consumer stages split those durations correctly.

  • Metric reduction is now null-safe throughout the reducer.
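One way to picture Exclusive normalization and null-safe reduction is the sketch below. Proportional scaling is an assumption of this example; the notes only say that durations are normalized to executorRunTime, and both function names are hypothetical.

```python
from typing import Dict, Optional


def null_safe_sum(*values: Optional[float]) -> float:
    """Treat missing metrics as 0 instead of raising."""
    return sum(v for v in values if v is not None)


def normalize_to_run_time(
    node_durations: Dict[str, Optional[float]], executor_run_time: float
) -> Dict[str, float]:
    """Scale node durations proportionally so they add up to executor_run_time."""
    total = null_safe_sum(*node_durations.values())
    if total == 0:
        return {name: 0.0 for name in node_durations}
    scale = executor_run_time / total
    return {name: (d or 0.0) * scale for name, d in node_durations.items()}


result = normalize_to_run_time({"Scan": 30.0, "Filter": None, "Sort": 90.0}, 60.0)
print(result)
```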

Version 0.8.8

  1. Fix for instrumentation on Spark 3.1.

  2. Better extraction for Iceberg and BigQuery save operations.

Version 0.8.7

  1. New Spark instrumentation, which adds duration metrics for more Python actions and window operation nodes (Spark versions 3.1+). Enable it using:

    • spark.dataflint.instrument.spark.window.enabled

    • spark.dataflint.instrument.spark.arrowEvalPython.enabled

    • spark.dataflint.instrument.spark.batchEvalPython.enabled

    • spark.dataflint.instrument.spark.flatMapGroupsInPandas.enabled

    • spark.dataflint.instrument.spark.flatMapCoGroupsInPandas.enabled
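As a rough illustration, the 0.8.7 flags can be collected into a settings map and rendered as spark-submit arguments. The flag names are copied from the notes; the helper functions are hypothetical, and enabling every flag at once is just one possible choice.

```python
# Flag names as listed in the 0.8.7 notes.
INSTRUMENTATION_FLAGS = [
    "spark.dataflint.instrument.spark.window.enabled",
    "spark.dataflint.instrument.spark.arrowEvalPython.enabled",
    "spark.dataflint.instrument.spark.batchEvalPython.enabled",
    "spark.dataflint.instrument.spark.flatMapGroupsInPandas.enabled",
    "spark.dataflint.instrument.spark.flatMapCoGroupsInPandas.enabled",
]


def instrumentation_conf(enabled: bool = True) -> dict:
    """Build a {flag: "true"/"false"} map for all instrumentation flags."""
    value = "true" if enabled else "false"
    return {flag: value for flag in INSTRUMENTATION_FLAGS}


def as_submit_args(conf: dict) -> list:
    """Render the settings as spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = instrumentation_conf()
print(" ".join(as_submit_args(conf)))
```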

Version 0.8.3

  1. New Spark instrumentation flags, spark.dataflint.instrument.spark.mapInArrow.enabled and spark.dataflint.instrument.spark.mapInPandas.enabled, which add duration metrics for these operators on Spark 3.3+, so you can see exactly how long a UDF took.

  2. Add full table name to Iceberg table reads.

Version 0.8.3

  • Fix to stage identification algorithm

  • Fix to stage read visuals to show hashed/ranges fields

  • Visual improvements to stage sidebar

Version 0.8.2

  • Improvement to sql plan layout with stage nodes

    • Button to switch to plan without stage nodes

  • Improvements to stage identification

Version 0.8.1

  • Small fix for Spark 3.3.x support

  • Small fix for shuffle write metrics real time update

Version 0.8.0

✨ New Features

  • New Flow Graph UI with Stage Parent Nodes - Completely redesigned the SQL flow graph visualization with stage parent nodes for better understanding of query execution

  • Task View in Stage Nodes - Added task progress indicators directly within stage nodes for real-time task monitoring

  • "Rows Aggregated" Metric - Added a new metric to track aggregated row counts

  • Exchange Node Separation - Shuffle read and write operations are now displayed separately when 2 stages exist, providing clearer visibility into data exchange operations

🎨 UI Improvements

  • Enhanced visual consistency across UI components

  • Replaced CheckIcon with CancelIcon for clearer error representation

  • Updated FlowLegend to include task progress indicators

  • Improved styling for node elements in SQL flow

  • General SQL plan UI improvements

⚡ Performance Improvements

  • SQLNodeStageReducer Optimization - Implemented O(1) access with lookup maps for nodes and stages, significantly improving efficiency in SQL node stage calculations

  • Smarter Update Cycles - Skip OnCycleEnd calculations when there are no changes in SQL and stages

  • Stage Map for Alerts - Use stage map for faster alert processing
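The lookup-map optimization mentioned under Performance Improvements follows a common pattern, sketched here in Python. The record fields (stageId, status) and function names are hypothetical and do not reflect the reducer's real data model.

```python
from typing import Dict, List, Optional


def build_lookup(items: List[dict], key: str) -> Dict[int, dict]:
    """Index a list of records once so later lookups are O(1) on average."""
    return {item[key]: item for item in items}


def attach_stage(nodes: List[dict], stages: List[dict]) -> List[dict]:
    stage_by_id = build_lookup(stages, "stageId")  # one O(n) pass over stages
    enriched = []
    for node in nodes:
        # Dictionary lookup instead of scanning the stage list for every node.
        stage: Optional[dict] = stage_by_id.get(node.get("stageId"))
        enriched.append({**node, "stageStatus": stage["status"] if stage else None})
    return enriched


stages = [{"stageId": 1, "status": "COMPLETE"}, {"stageId": 2, "status": "ACTIVE"}]
nodes = [{"nodeId": 10, "stageId": 2}, {"nodeId": 11}]
print(attach_stage(nodes, stages))
```

With n nodes and m stages, this replaces an O(n·m) nested scan with O(n + m) work.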

🐛 Bug Fixes

  • Fixed wall clock duration calculation

  • Fixed duration node hover display issue

  • Fixed idle cores bug where idle cores showed 0% when all executors closed (Spark incorrectly detected local mode)

🔧 Other Changes

  • Removed Scala distribution from Spark distribution package

Version 0.7.0

  1. Delta lake collector (experimental)

  2. Stage identification improvements

  3. UI Improvements

Version 0.6.1

  1. Add better delta lake support - delta write command, optimize command, optimize shuffle before write

  2. Add support for SortAggregate nodes

  3. Fix Maven dependency issues with Spark 3 POM files

Version 0.6.0

  1. Spark 4 support

  2. Better stage identification using metrics with statistics

  3. Shrinking metrics text in case of high number of metrics

Version 0.5.1

Visual improvements for the new SQL plan UI

Version 0.5.0

New and updated design for the sql plan nodes

Version 0.4.4

  1. Improvement to query presentation

  2. Support Extended node:

Version 0.4.3

Support query params, like sql-id and node-ids in link

Version 0.4.2

Supports partition pruning

Enriching filter/select nodes with the UDF python function names:

Add support for Generate node - explode, inline and more!

Version 0.4.1

Support newer DBX versions - 14 and up.

Version 0.4.0

  1. Support map by pandas and arrow functions

  2. Added new flag to silence alert for a job -

  3. Added short recommendation on top of alert

  4. Updated DataFlint logo

  5. Support better stage identification for various readers

  6. Shows stage failures with an orange V on sql node and list of complete stage failures

Version 0.3.2

  1. New alerts - cross joins, change join to broadcast, large partition size

  2. Better support for cross joins

  3. Additional metrics for joins and shuffles

Version 0.3.1

  1. Support Distinct, skewed join, coalesce nodes

  2. Sorting by default by duration on history server mode

  3. Automatically filter out SQL queries without any job/stage, and add a switch to show them.

  4. Add rows filtered percentage for filter and distinct sql nodes

Version 0.3.0

  1. Support for window functions

  2. Better support for databricks

  3. Detecting resource configuration better in configuration tab

  4. Bug and UI fixes

Version 0.2.7

  1. DataFusion Comet support

  2. bug fixes

Version 0.2.6

  1. Nvidia RAPIDS for Spark support

  2. bug fixes

Version 0.2.5

  1. Bug Fixes

Version 0.2.4

  1. New visibility & alerts - Driver's memory

  2. Updated README

  3. Bug Fixes

Version 0.2.3

  1. Driver memory monitoring & alert

  2. Updated readme

  3. Bug fixes

Version 0.2.3

  1. New alert - Large data Broadcast, for requesting to broadcast large data sets with the broadcast() function

  2. New alert - Large filter conditions, for writing long filter conditions instead of using join logic

  3. UI Improvements

Version 0.2.2

  1. Support Spark 2.4 logs in a history server running a version later than 3.2. Only a limited feature set is available, because these events carry less data than those of Spark 3.0 and up.

Version 0.2.1

  1. Better Databricks stage to node support

  2. Support spark.dataflint.runId in custom history server providers when appId is not the spark appId

Version 0.2.0

  1. Better support for Databricks Photon plans

  2. Input nodes show partition filters and push-down filters

  3. Stage Breakdown - press the blue down arrow on sql node to see stage information

  4. New alert - large number of small tasks (see Large Number Of Small Tasks)

Version 0.1.7

  1. Apache Iceberg alerts improvements

  2. Add avg file size in read/write

  3. More information when hovering on stage

Version 0.1.6

  1. Apache Iceberg support

    1. Better node naming

    2. Read metrics and reading small files alerts

    3. Write metrics and overwriting most of table alerts

      1. Require enabling iceberg metric reporter, can be done for you by enabling spark.dataflint.iceberg.autoCatalogDiscovery to true, or setting the iceberg metric reporter manually for each catalog, for example:

Replacing entire table only to change 1% of records (1 in a 100)

Version 0.1.5

  1. Add support for history server with cluster-mode jobs (i.e. with attempt number)

  2. Fix "wasted cores" calculation

  3. Fix status tab SQL flickering when there is SQL with subqueries

Version 0.1.4

Fix scala 2.13 support

Version 0.1.3

  1. DataFlint SaaS support

  2. partition Skew Alert:

Version 0.1.2

  1. Scala 2.13 support

  2. A Spark flag to disable the web app's Mixpanel telemetry: spark.dataflint.telemetry.enabled (true/false)

  3. Renamed Core Activity Rate to Wasted Cores Ratio (which is 100 - Core Activity Rate), and added an alert for wasted cores too high
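The relationship between the two metrics is simple arithmetic, shown here as a short sketch. The alert threshold is a made-up value for illustration, not DataFlint's real cutoff.

```python
def wasted_cores_ratio(core_activity_rate: float) -> float:
    """Both values are percentages in the range 0..100."""
    if not 0.0 <= core_activity_rate <= 100.0:
        raise ValueError("core_activity_rate must be between 0 and 100")
    return 100.0 - core_activity_rate


def wasted_cores_alert(core_activity_rate: float, threshold: float = 50.0) -> bool:
    # threshold is a hypothetical default for this sketch
    return wasted_cores_ratio(core_activity_rate) > threshold


print(wasted_cores_ratio(72.0))
print(wasted_cores_alert(30.0))
```

For example, a Core Activity Rate of 72% corresponds to a Wasted Cores Ratio of 28%.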

Version 0.1.1

  1. Resources tab - see a graph of your cluster executors count over time, use it to tune your resource allocation settings and save cost!

  2. Minor visual fixes

DataFlint Resource Tab:

Version 0.1.0

  1. Small fix to platform identification

Version 0.0.8

  1. Databricks support

  2. Visual improvements

  3. public release

Version 0.0.7

Heat map

Version 0.0.6

Flint Assistant, require OpenAI Key

Version 0.0.5

Syntax highlighting for SQL plan parts

Show selected fields

Calculating container memory usage and using it for GB memory/hour calculations

Version 0.0.4

  1. Minor fix related to Spark operator and nginx

Version 0.0.3

SQL plan modes

IO only, shows only input, joins and output:

Basic mode (default), shows also transformations like filters, aggregations and selects:

Advanced, shows repartitions, broadcasts and sorts

There is also plan information for:

  1. Joins

  2. Sorts

  3. Selects

  4. Repartitions

Version 0.0.2

DBU calculation instead of core/hour in summary bar

Add memory config to configuration tab

Filter Nodes has condition:

Advanced mode for SQL plan, that also presents shuffle nodes

Additional changes

  1. Support both HTTP and HTTPS access, enabling mixed content only in HTTPS mode

  2. Support for spark 3.5.X

Version 0.0.1

Initial version, includes:

  1. Status page

  2. Summary page

  3. Configuration Page

  4. Alerts page
