βͺRelease Notes

Version 0.2.7

  1. DataFusion Comet support

  2. Bug fixes

Version 0.2.6

  1. NVIDIA RAPIDS Accelerator for Apache Spark support

  2. Bug fixes

Version 0.2.5

  1. Bug fixes

Version 0.2.4

  1. New visibility and alerts for driver memory

  2. Updated README

  3. Bug fixes

Version 0.2.3

  1. Driver memory monitoring & alert

  2. Updated README

  3. Bug fixes

Version 0.2.3

  1. New alert: Large data broadcast, raised when large data sets are explicitly broadcast with the broadcast() function

  2. New alert: Large filter conditions, raised when long filter conditions are written instead of using join logic

  3. UI improvements

Version 0.2.2

  1. Support for Spark 2.4 event logs in history servers running Spark later than 3.2. Only a limited feature set is available, because Spark 2.4 events contain less data than those of Spark 3.0 and up

Version 0.2.1

  1. Better stage-to-node support for Databricks

  2. Support for spark.dataflint.runId in custom history server providers, for cases where the appId is not the Spark appId

Version 0.2.0

  1. Better support for Databricks Photon plans

  2. Input nodes show partition filters and pushdown filters

  3. Stage breakdown: press the blue down arrow on an SQL node to see stage information

Version 0.1.7

  1. Apache Iceberg alert improvements

  2. Added average file size for reads and writes

  3. More information when hovering over a stage

Version 0.1.6

  1. Apache Iceberg support

    1. Better node naming

    2. Read metrics and alerts for reading small files

    3. Write metrics and alerts for overwriting most of a table

      1. Requires enabling the Iceberg metrics reporter. This can be done for you by setting spark.dataflint.iceberg.autoCatalogDiscovery to true, or manually by setting the Iceberg metrics reporter for each catalog, for example:

        spark.sql.catalog.[catalog name].metrics-reporter-impl org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter
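When several catalogs need the reporter set manually, the per-catalog property above can be generated rather than typed out. The sketch below is a hypothetical helper (the name iceberg_reporter_configs and the placeholder catalog "my_catalog" are assumptions, not part of DataFlint); it only builds the property key shown above for each catalog name and maps it to the DataFlint reporter class.

```python
# Hypothetical helper (not part of DataFlint): builds the per-catalog Spark
# property that registers DataFlint's Iceberg metrics reporter.
REPORTER_CLASS = "org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter"


def iceberg_reporter_configs(catalog_names):
    """Return a dict with one metrics-reporter-impl entry per catalog name."""
    return {
        f"spark.sql.catalog.{name}.metrics-reporter-impl": REPORTER_CLASS
        for name in catalog_names
    }


if __name__ == "__main__":
    # "my_catalog" is a placeholder catalog name.
    for key, value in iceberg_reporter_configs(["my_catalog"]).items():
        print(key, value)
```

Each key/value pair could then be passed to SparkSession.builder.config(key, value) in PySpark, or to spark-submit with --conf. Setting spark.dataflint.iceberg.autoCatalogDiscovery to true avoids the manual step entirely.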

Version 0.1.5

  1. Added support for the history server with cluster-mode jobs (i.e., jobs with an attempt number)

  2. Fix "wasted cores" calculation

  3. Fixed SQL flickering in the status tab when an SQL query contains subqueries

Version 0.1.4

Fixed Scala 2.13 support

Version 0.1.3

  1. DataFlint SaaS support

  2. Partition skew alert

Version 0.1.2

  1. Scala 2.13 support

  2. A Spark flag to disable web app Mixpanel telemetry: spark.dataflint.telemetry.enabled (true/false)

  3. Renamed Core Activity Rate to Wasted Cores Ratio (which is 100 - Core Activity Rate), and added an alert for when wasted cores are too high
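The renamed metric is a simple complement of the old one. A minimal sketch of that relationship, assuming both values are percentages in the 0 to 100 range (the function name wasted_cores_ratio is hypothetical, and the threshold used by the wasted cores alert is not stated here, so it is not modeled):

```python
# Hypothetical helper illustrating the 0.1.2 note:
# Wasted Cores Ratio = 100 - Core Activity Rate, both as percentages.
def wasted_cores_ratio(core_activity_rate):
    """Convert a Core Activity Rate percentage (0-100) to a Wasted Cores Ratio."""
    if not 0 <= core_activity_rate <= 100:
        raise ValueError("core_activity_rate must be between 0 and 100")
    return 100 - core_activity_rate


if __name__ == "__main__":
    # A job whose cores were active 65% of the time wastes 35% of them.
    print(wasted_cores_ratio(65))
```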

Version 0.1.1

  1. Resources tab: see a graph of your cluster's executor count over time, and use it to tune your resource allocation settings and save cost

  2. Minor visual fixes


Version 0.1.0

  1. Small fix to platform identification

Version 0.0.8

  1. Databricks support

  2. Visual improvements

  3. Public release

Version 0.0.7

Heat map

Version 0.0.6

Flint Assistant (requires an OpenAI key)

Version 0.0.5

Syntax highlighting for SQL plan parts

Calculating container memory usage and using it for GB memory/hour calculations

Version 0.0.4

  1. Minor fix related to the Spark operator and nginx

Version 0.0.3

SQL plan modes

IO only: shows only inputs, joins and outputs

Basic (default): also shows transformations such as filters, aggregations and selects

Advanced: shows repartitions, broadcasts and sorts

Plan information is also shown for:

  1. Joins

  2. Sorts

  3. Selects

  4. Repartitions

Version 0.0.2

DBU calculation instead of core/hour in the summary bar

Added memory configuration to the configuration tab

Filter nodes show their condition

Advanced mode for the SQL plan, which also shows shuffle nodes

Additional changes

  1. Support for both HTTP and HTTPS access, enabling mixed content only in HTTPS mode

  2. Support for Spark 3.5.x

Version 0.0.1

Initial version, includes:

  1. Status page

  2. Summary page

  3. Configuration page

  4. Alerts page
