Release Notes

Version 0.4.2

Support for partition pruning

Filter/select nodes now show the names of the Python UDF functions they use

Added support for the Generate node: explode, inline and more!

Version 0.4.1

Support newer DBX versions - 14 and up.

Version 0.4.0

Support for map functions that use pandas and Arrow

Added a new flag to silence alerts for a job:

spark.dataflint.alert.disabled
It accepts a comma-separated list of alert names, for example:
smallTasks,idleCoresTooHigh
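As a minimal sketch, assuming you assemble Spark settings in Python before handing them to spark-submit, the flag value is simply the alert names joined with commas. The helper names below are hypothetical; only the config key and the two alert names come from these notes.

```python
# Hypothetical helpers: only the config key and the alert names
# smallTasks / idleCoresTooHigh come from the release notes.
DISABLED_ALERTS_KEY = "spark.dataflint.alert.disabled"


def disabled_alerts_conf(alerts):
    """Return a Spark conf entry that silences the given DataFlint alerts."""
    # Trim whitespace and drop empty names so the value stays a clean
    # comma-separated list.
    names = [a.strip() for a in alerts if a.strip()]
    return {DISABLED_ALERTS_KEY: ",".join(names)}


def to_spark_submit_args(conf):
    """Render a conf dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    conf = disabled_alerts_conf(["smallTasks", "idleCoresTooHigh"])
    print(to_spark_submit_args(conf))
```

With PySpark, the same dict could instead be applied by calling SparkSession.builder.config(key, value) for each entry.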

Added a short recommendation at the top of each alert
Updated the DataFlint logo
Better stage identification for various readers
Stage failures are shown with an orange V on the SQL node, along with a list of all stage failures

Version 0.3.2

New alerts: cross joins, changing a join to a broadcast join, large partition size
Better support for cross joins
Additional metrics for joins and shuffles

Version 0.3.1

Support for Distinct, skewed join and Coalesce nodes
SQLs are sorted by duration by default in history server mode
SQLs without any job/stage are filtered out automatically, with a switch to show them
Added the percentage of rows filtered for Filter and Distinct SQL nodes
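The notes do not spell out how the rows filtered percentage is computed. A natural reading, assumed here, is the share of input rows that a Filter or Distinct node did not pass on. A minimal sketch with a hypothetical function name:

```python
def rows_filtered_percentage(rows_in, rows_out):
    """Percentage of input rows removed by a Filter/Distinct node.

    Assumed definition (not confirmed by the release notes):
        (rows_in - rows_out) / rows_in * 100
    Returns 0.0 when the node received no rows, to avoid dividing by zero.
    """
    if rows_in < 0 or rows_out < 0:
        raise ValueError("row counts must be non-negative")
    if rows_out > rows_in:
        raise ValueError("a filter cannot output more rows than it reads")
    if rows_in == 0:
        return 0.0
    return (rows_in - rows_out) / rows_in * 100.0


# A node that reads 1,000 rows and emits 250 has filtered out 75% of them.
print(rows_filtered_percentage(1_000, 250))  # 75.0
```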

Version 0.3.0

Support for window functions
Better support for Databricks
Better detection of resource configuration in the configuration tab
Bug and UI fixes

Version 0.2.7

DataFusion Comet support
Bug fixes

Version 0.2.6

NVIDIA RAPIDS for Spark support
Bug fixes

Version 0.2.5

Bug Fixes

Version 0.2.4

New visibility and alerts for driver memory
Updated README
Bug Fixes

Version 0.2.3

Driver memory monitoring & alert
Updated README
Bug fixes

Version 0.2.3

New alert: large data broadcast, for when the broadcast() function is used to broadcast large datasets
New alert: large filter conditions, for when long filter conditions are written instead of using join logic
UI Improvements

Version 0.2.2

Support for Spark 2.4 event logs in a history server running a Spark version later than 3.2. Only a limited feature set is available, because Spark 2.4 events contain less data than those of Spark 3.0 and up

Version 0.2.1

Better mapping of Databricks stages to nodes
Support for spark.dataflint.runId in custom history server providers, for cases where the appId is not the Spark appId

Version 0.2.0

Better support for Databricks Photon plans
Input nodes show partition filters and pushed-down filters
Stage breakdown: press the blue down arrow on a SQL node to see stage information
New alert - large number of small tasks (see Large Number Of Small Tasks)

Version 0.1.7

Improvements to Apache Iceberg alerts
Added average file size for reads and writes
More information when hovering over a stage
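Average file size is presumably the total bytes read or written divided by the number of files; that definition and the function names below are assumptions for illustration. A small average is the kind of signal a small-files alert would look for.

```python
def average_file_size_bytes(total_bytes, num_files):
    """Assumed definition: total bytes divided by file count.

    Returns None when no files were read or written.
    """
    if num_files <= 0:
        return None
    return total_bytes / num_files


def human_readable(num_bytes):
    """Format a byte count using binary units (B, KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


# 1 GiB spread over 1,024 files averages 1 MiB per file.
avg = average_file_size_bytes(1024**3, 1024)
print(human_readable(avg))  # 1.0 MiB
```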

Version 0.1.6

Apache Iceberg support
1. Better node naming
2. Read metrics and reading small files alerts
3. Write metrics and overwriting most of table alerts
  1. Requires enabling the Iceberg metrics reporter. DataFlint can do this for you if you set spark.dataflint.iceberg.autoCatalogDiscovery to true; alternatively, set the Iceberg metrics reporter manually for each catalog, for example:
    spark.sql.catalog.[catalog name].metrics-reporter-impl org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter
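The two options above can be sketched as a small Python helper that produces Spark conf entries. The helper and the catalog name in the example are hypothetical; the config keys and the reporter class are taken from the note, with `[catalog name]` filled in per catalog.

```python
# Reporter class as given in the release notes.
REPORTER_CLASS = (
    "org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter"
)


def iceberg_metrics_conf(catalogs=None):
    """Spark conf entries that enable DataFlint's Iceberg metrics reporter.

    Hypothetical helper: with no catalogs, rely on automatic catalog
    discovery; otherwise set the reporter explicitly for each catalog.
    """
    if not catalogs:
        return {"spark.dataflint.iceberg.autoCatalogDiscovery": "true"}
    return {
        f"spark.sql.catalog.{name}.metrics-reporter-impl": REPORTER_CLASS
        for name in catalogs
    }


# "my_catalog" is a placeholder for your own Iceberg catalog name.
print(iceberg_metrics_conf(["my_catalog"]))
```

Automatic discovery is the simpler route; the explicit per-catalog form is useful when only some catalogs should report metrics.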

Version 0.1.5

Add support for history server with cluster-mode jobs (i.e. with an attempt number)
Fix "wasted cores" calculation
Fix SQL flickering in the status tab when there is a SQL with subqueries

Version 0.1.4

Fix Scala 2.13 support

Version 0.1.3

DataFlint SaaS support
Partition skew alert

Version 0.1.2

Scala 2.13 support
A Spark flag to disable the web app's Mixpanel telemetry: spark.dataflint.telemetry.enabled (true/false)
Renamed Core Activity Rate to Wasted Cores Ratio (100 - Core Activity Rate), and added an alert for when wasted cores are too high
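Wasted Cores Ratio is defined in the note as 100 - Core Activity Rate, so the two are complementary percentages. A one-line sketch (the function name and the range check are assumptions):

```python
def wasted_cores_ratio(core_activity_rate):
    """Wasted Cores Ratio = 100 - Core Activity Rate (both in percent).

    The [0, 100] range check is an assumption added for safety.
    """
    if not 0.0 <= core_activity_rate <= 100.0:
        raise ValueError("core activity rate must be a percentage in [0, 100]")
    return 100.0 - core_activity_rate


# A job whose cores were active 70% of the time wasted the other 30%.
print(wasted_cores_ratio(70.0))  # 30.0
```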

Version 0.1.1

Resources tab: see a graph of your cluster's executor count over time, and use it to tune your resource allocation settings and save costs!
Minor visual fixes
