Release Notes

Last updated 18 hours ago

Version 0.4.0

  1. Support map by pandas and arrow functions

  2. Added a new flag to silence alerts for a job -

    spark.dataflint.alert.disabled
    It accepts a comma-separated list of alerts, such as:
    smallTasks,idleCoresTooHigh
    
  3. Added short recommendation on top of alert

  4. Updated DataFlint logo

  5. Better stage identification for various readers

  6. Show stage failures with an orange V on the SQL node, along with a list of complete stage failures
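
As a rough illustration of the spark.dataflint.alert.disabled flag from item 2, the sketch below builds the matching `--conf` argument for spark-submit. The helper name `alert_disabled_conf` is hypothetical and not part of DataFlint; only the flag name and the two alert names come from the notes above.

```python
def alert_disabled_conf(alert_names):
    """Build a spark-submit --conf pair that silences the given alerts.

    Hypothetical helper for illustration; DataFlint only reads the
    resulting spark.dataflint.alert.disabled value.
    """
    # Trim whitespace and drop empty entries, keeping the caller's order.
    cleaned = [name.strip() for name in alert_names if name.strip()]
    return ["--conf", "spark.dataflint.alert.disabled=" + ",".join(cleaned)]


args = alert_disabled_conf(["smallTasks", "idleCoresTooHigh"])
print(" ".join(args))
```

The two list elements can be appended to an existing spark-submit argument list as they are.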

Version 0.3.2

  1. New alerts - cross joins, change join to broadcast, large partition size

  2. Better support for cross joins

  3. Additional metrics for joins and shuffles

Version 0.3.1

  1. Support Distinct, skewed join, coalesce nodes

  2. Sorting by default by duration on history server mode

  3. Automatically filter out SQL queries without any job/stage, with a switch to show them

  4. Add rows filtered percentage for filter and distinct sql nodes
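
The notes do not define how the rows filtered percentage in item 4 is computed. A minimal sketch, assuming it is the share of input rows that a filter or distinct node drops, could look like this; the function name and the formula are assumptions, not DataFlint's confirmed implementation.

```python
def rows_filtered_percentage(input_rows, output_rows):
    """Percentage of input rows removed by a node (assumed definition)."""
    if input_rows < 0 or output_rows < 0:
        raise ValueError("row counts must be non-negative")
    if output_rows > input_rows:
        raise ValueError("a filter cannot output more rows than it reads")
    if input_rows == 0:
        # Nothing was read, so nothing was filtered.
        return 0.0
    return (input_rows - output_rows) / input_rows * 100.0


print(rows_filtered_percentage(1000, 250))
```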

Version 0.3.0

  1. Support for window functions

  2. Better support for databricks

  3. Detecting resource configuration better in configuration tab

  4. Bug and UI fixes

Version 0.2.7

  1. DataFusion Comet support

  2. Bug fixes

Version 0.2.6

  1. Nvidia RAPIDS for Spark support

  2. Bug fixes

Version 0.2.5

  1. Bug Fixes

Version 0.2.4

  1. New visibility & alerts - Driver's memory

  2. Updated README

  3. Bug Fixes

Version 0.2.3

  1. Driver memory monitoring & alert

  2. Updated readme

  3. Bug fixes

Version 0.2.3

  1. New alert - Large data Broadcast, for requesting to broadcast large data sets with the broadcast() function

  2. New alert - Large filter conditions, for writing long filter conditions instead of using join logic

  3. UI Improvements

Version 0.2.2

  1. Support Spark 2.4 logs in a history server running a version later than 3.2. Only a limited feature set is available, because these events carry less data than those from Spark 3.0 and up

Version 0.2.1

  1. Better Databricks stage to node support

  2. Support spark.dataflint.runId in custom history server providers when appId is not the spark appId

Version 0.2.0

  1. Better support for Databricks Photon plans

  2. Input nodes show partition filters and push-down filters

  3. Stage Breakdown - press the blue down arrow on sql node to see stage information

Version 0.1.7

  1. Apache Iceberg alerts improvements

  2. Add avg file size in read/write

  3. More information when hovering on stage

Version 0.1.6

  1. Apache Iceberg support

    1. Better node naming

    2. Read metrics and reading small files alerts

    3. Write metrics and overwriting most of table alerts

      1. Requires enabling the Iceberg metrics reporter. This can be done for you by setting spark.dataflint.iceberg.autoCatalogDiscovery to true, or by setting the Iceberg metrics reporter manually for each catalog, for example:

        spark.sql.catalog.[catalog name].metrics-reporter-impl org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter
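
The two ways of enabling the Iceberg metrics reporter described above can be sketched as plain configuration dictionaries. The helper name `iceberg_reporter_conf` and the catalog names are hypothetical; the property keys and the reporter class are taken from the note above.

```python
# Reporter class named in the release note.
REPORTER = "org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter"

# Option 1: let DataFlint discover the catalogs on its own.
AUTO_DISCOVERY_CONF = {"spark.dataflint.iceberg.autoCatalogDiscovery": "true"}


def iceberg_reporter_conf(catalog_names):
    """Option 2: set the metrics reporter manually, one entry per catalog."""
    return {
        f"spark.sql.catalog.{name}.metrics-reporter-impl": REPORTER
        for name in catalog_names
    }


# "my_catalog" is a placeholder catalog name.
for key, value in iceberg_reporter_conf(["my_catalog"]).items():
    print(f"--conf {key}={value}")
```

Either dictionary can be passed as Spark configuration entries; the manual form needs one key for each catalog in use.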

Version 0.1.5

  1. Add support for history server with cluster-mode jobs (i.e. with attempt number)

  2. Fix "wasted cores" calculation

  3. Fix SQL flickering in the status tab when a SQL query has subqueries

Version 0.1.4

Fix Scala 2.13 support

Version 0.1.3

  1. DataFlint SaaS support

  2. Partition skew alert

Version 0.1.2

  1. Scala 2.13 support

  2. A Spark flag to disable the web app's Mixpanel telemetry - spark.dataflint.telemetry.enabled (true/false)

  3. Renamed Core Activity Rate to Wasted Cores Ratio (which is 100 - Core Activity Rate), and added an alert for wasted cores too high
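
The renaming in item 3 is a simple complement: Wasted Cores Ratio = 100 - Core Activity Rate, both in percent. A minimal sketch of that arithmetic follows; the function name is hypothetical, and how DataFlint measures Core Activity Rate itself is not covered here.

```python
def wasted_cores_ratio(core_activity_rate):
    """Return the Wasted Cores Ratio (percent) for a Core Activity Rate (percent)."""
    if not 0.0 <= core_activity_rate <= 100.0:
        raise ValueError("core activity rate must be between 0 and 100")
    return 100.0 - core_activity_rate


print(wasted_cores_ratio(72.5))  # 27.5
```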

Version 0.1.1

  1. Resources tab - see a graph of your cluster executors count over time, use it to tune your resource allocation settings and save cost!

  2. Minor visual fixes

DataFlint Resource Tab:

Version 0.1.0

  1. Small fix to platform identification

Version 0.0.8

  1. Databricks support

  2. Visual improvements

  3. Public release

Version 0.0.7

Heat map

Version 0.0.6

Flint Assistant, requires an OpenAI key

Version 0.0.5

Syntax highlighting for SQL plan parts

Calculating container memory usage and using it for GB memory/hour calculations

Version 0.0.4

  1. Minor fix related to Spark operator and nginx

Version 0.0.3

SQL plan modes

IO only, shows only input, joins and output:

Basic mode (default), shows also transformations like filters, aggregations and selects:

Advanced, shows repartitions, broadcasts and sorts

Plan information is also available for:

  1. Joins

  2. Sorts

  3. Selects

  4. Repartitions

Version 0.0.2

DBU calculation instead of core/hour in summary bar

Add memory config to configuration tab

Filter nodes show their condition

Advanced mode for SQL plan, that also presents shuffle nodes

Additional changes

  1. Support both HTTP and HTTPS access, enabling mixed content only in HTTPS mode

  2. Support for Spark 3.5.X

Version 0.0.1

Initial version, includes:

  1. Status page

  2. Summary page

  3. Configuration Page

  4. Alerts page

New alert - large number of small tasks (see )

βͺ
Large Number Of Small Tasks
Replacing entire table only to change 1% of records (1 in a 100)
Show selected fields