# Release Notes

### Version 0.8.9 <a href="#version-0.8.9" id="version-0.8.9"></a>

#### Instrumentation

* A single generic `TimedExec` wrapper now replaces 19 per-type `DataFlint*Exec` classes.
* `TimedExec` adds a `duration` metric and an `rddId` metric.
* Existing Spark metrics stay intact on the wrapped node.
* The wrapper exposes `child.children` as its own children.
* The SQL graph stays as one node.
* You do not get double nodes in Spark UI or DataFlint UI.
* `InMemoryTableScanExec` and all `Exchange` nodes are never wrapped.
* Version-specific nodes are matched by class name string.
* This avoids `NoClassDefFoundError` on older Spark versions.
* Join codegen is cancelled where codegen instrumentation does not work.
* `DataWritingCommandExec` now gets duration support through `doPrepare` delegation.
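
The wrapper idea above can be sketched in a few lines. This is only an illustration in Python, not DataFlint's Scala implementation: `PlanNode`, `TimedNode`, `should_wrap` and `wrap_plan` are hypothetical names, and the duration measured here includes the time spent in children.

```python
import time


class PlanNode:
    """Hypothetical stand-in for a Spark physical plan node."""

    def __init__(self, name, children=(), metrics=None):
        self.name = name
        self.children = list(children)
        self.metrics = dict(metrics or {})

    def execute(self):
        for child in self.children:
            child.execute()
        return self.name


class TimedNode:
    """One generic wrapper for every node type (hypothetical sketch)."""

    def __init__(self, child):
        self.child = child
        self.name = child.name
        # Share the child's metrics dict so existing metrics stay intact.
        self.metrics = child.metrics
        self.metrics.setdefault("duration", 0.0)

    @property
    def children(self):
        # Expose the wrapped node's children so the graph keeps one node.
        return self.child.children

    def execute(self):
        start = time.perf_counter()
        try:
            return self.child.execute()
        finally:
            self.metrics["duration"] += time.perf_counter() - start


def should_wrap(node):
    # Match on the name string only, so no node class has to be imported.
    return node.name != "InMemoryTableScanExec" and "Exchange" not in node.name


def wrap_plan(node):
    node.children = [wrap_plan(child) for child in node.children]
    return TimedNode(node) if should_wrap(node) else node
```

Because the wrapper reports the child's name and children, a plan viewer walking the tree sees the same shape as before, with an extra `duration` entry in the metrics.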

#### Stage grouping and duration attribution

* Stage grouping is now topology-based from the SQL plan graph.
* `Exchange` boundaries define the stage graph.
* The result is deterministic across live runs and history server.
* Stage view now supports **Inclusive** and **Exclusive** duration modes.
* **Inclusive** shows native Spark metrics as-is.
* **Exclusive** is the default.
* **Exclusive** normalizes stage durations to `executorRunTime`.
* Attribution mode auto-enables when any instrumented node exists.
* Exchange read and write durations now come from shuffle metrics.
* Producer and consumer stages split those durations correctly.
* Metric reduction is now null-safe throughout the reducer.
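
As a rough sketch of topology-based grouping, the function below cuts the plan graph at `Exchange` nodes and labels each remaining connected component as one stage group. It is an assumption about the general approach, not DataFlint's actual algorithm; `group_into_stages` and its input shapes are hypothetical, and the returned indexes are local labels rather than Spark stage IDs.

```python
from collections import defaultdict


def group_into_stages(nodes, edges):
    """Label plan nodes with a stage group index.

    nodes: dict of node id -> node name
    edges: list of (child_id, parent_id) pairs
    Exchange nodes act as boundaries and map to None.
    """

    def is_exchange(node_id):
        return "Exchange" in nodes[node_id]

    # Keep only edges that do not touch an exchange node.
    adjacent = defaultdict(list)
    for a, b in edges:
        if not is_exchange(a) and not is_exchange(b):
            adjacent[a].append(b)
            adjacent[b].append(a)

    stage_of = {}
    next_stage = 0
    # Visit ids in sorted order so the labelling is deterministic.
    for node_id in sorted(nodes):
        if is_exchange(node_id):
            stage_of[node_id] = None
            continue
        if node_id in stage_of:
            continue
        stage_of[node_id] = next_stage
        stack = [node_id]
        while stack:
            current = stack.pop()
            for neighbour in adjacent[current]:
                if neighbour not in stage_of:
                    stage_of[neighbour] = next_stage
                    stack.append(neighbour)
        next_stage += 1
    return stage_of


nodes = {0: "Scan", 1: "Filter", 2: "ShuffleExchange", 3: "HashAggregate", 4: "Project"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
stages = group_into_stages(nodes, edges)
# stages == {0: 0, 1: 0, 2: None, 3: 1, 4: 1}
```

Since the result depends only on the plan graph, the same plan yields the same grouping whether it is read from a live application or from the history server.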

### Version 0.8.8 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Fix for instrumentation on Spark 3.1.
2. Better extraction for Iceberg and BigQuery save operations.

### Version 0.8.7 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. New Spark instrumentation, which adds duration metrics for more Python actions and window operation nodes (Spark versions 3.1+), enabled using:
   * `spark.dataflint.instrument.spark.window.enabled`
   * `spark.dataflint.instrument.spark.arrowEvalPython.enabled`
   * `spark.dataflint.instrument.spark.batchEvalPython.enabled`
   * `spark.dataflint.instrument.spark.flatMapGroupsInPandas.enabled`
   * `spark.dataflint.instrument.spark.flatMapCoGroupsInPandas.enabled`
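
One way to pass these flags is as `--conf` arguments to `spark-submit`. The helper below is a small sketch that builds that argument list; `to_submit_args` is a hypothetical name, and the value `true` is an assumption based on the `.enabled` suffix of the flags.

```python
# Flag names as listed in the release note above.
INSTRUMENTATION_FLAGS = [
    "spark.dataflint.instrument.spark.window.enabled",
    "spark.dataflint.instrument.spark.arrowEvalPython.enabled",
    "spark.dataflint.instrument.spark.batchEvalPython.enabled",
    "spark.dataflint.instrument.spark.flatMapGroupsInPandas.enabled",
    "spark.dataflint.instrument.spark.flatMapCoGroupsInPandas.enabled",
]


def to_submit_args(flags, enabled=True):
    """Turn flag names into spark-submit style ["--conf", "key=value", ...]."""
    value = "true" if enabled else "false"  # assumed boolean string values
    args = []
    for flag in flags:
        args += ["--conf", f"{flag}={value}"]
    return args


submit_args = to_submit_args(INSTRUMENTATION_FLAGS)
```

The resulting list can be appended to an existing `spark-submit` command line, or the same key/value pairs can be set on the Spark configuration directly.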

### Version 0.8.3 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. New Spark instrumentation flags, `spark.dataflint.instrument.spark.mapInArrow.enabled` and `spark.dataflint.instrument.spark.mapInPandas.enabled`, which add duration metrics for these operators on Spark versions 3.3+, so you can see exactly how long a UDF took.
2. Add full table name to Iceberg table reads

### Version 0.8.3 <a href="#version-0.3.0" id="version-0.3.0"></a>

Fix to stage identification algorithm\
Fix to stage read visuals to show hashed/ranges fields\
Visual improvements to stage sidebar

### Version 0.8.2 <a href="#version-0.3.0" id="version-0.3.0"></a>

* Improvement to SQL plan layout with stage nodes
  * Button to switch to plan without stage nodes
* Improvements to stage identification

### Version 0.8.1 <a href="#version-0.3.0" id="version-0.3.0"></a>

* Small fix to support Spark version 3.3.X
* Small fix for real-time updates of shuffle write metrics

### Version 0.8.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

✨ New Features\
New Flow Graph UI with Stage Parent Nodes - Completely redesigned the SQL flow graph visualization with stage parent nodes for better understanding of query execution\
Task View in Stage Nodes - Added task progress indicators directly within stage nodes for real-time task monitoring\
"Rows Aggregated" Metric - Added a new metric to track aggregated row counts\
Exchange Node Separation - Shuffle read and write operations are now displayed separately when 2 stages exist, providing clearer visibility into data exchange operations

🎨 UI Improvements\
Enhanced visual consistency across UI components\
Replaced CheckIcon with CancelIcon for clearer error representation\
Updated FlowLegend to include task progress indicators\
Improved styling for node elements in SQL flow\
General SQL plan UI improvements

⚡ Performance Improvements\
SQLNodeStageReducer Optimization - Implemented O(1) access with lookup maps for nodes and stages, significantly improving efficiency in SQL node stage calculations\
Smarter Update Cycles - Skip OnCycleEnd calculations when there are no changes in SQL and stages\
Stage Map for Alerts - Use stage map for faster alert processing

🐛 Bug Fixes\
Fixed wall clock duration calculation\
Fixed duration node hover display issue\
Fixed idle cores bug where idle cores showed 0% when all executors closed (Spark incorrectly detected local mode)

🔧 Other Changes\
Removed Scala distribution from Spark distribution package

### Version 0.7.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Delta lake collector (experimental)
2. Stage identification improvements
3. UI Improvements

### Version 0.6.1 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Add better Delta Lake support - delta write command, optimize command, optimize shuffle before write
2. Add support for SortAggregate nodes
3. Fix Maven dependency issues with Spark 3 POM files

### Version 0.6.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Spark 4 support
2. Better stage identification using metrics with statistics
3. Shrinking metrics text in case of high number of metrics

### Version 0.5.1 <a href="#version-0.3.0" id="version-0.3.0"></a>

Visual improvements for the new SQL plan UI

### Version 0.5.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

New and updated design for the sql plan nodes

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FD62vQJ2DEPM4eQOsKrSo%2Fimage.png?alt=media&#x26;token=926a13d5-7160-4e26-8aef-1a8cb10e709d" alt=""><figcaption></figcaption></figure>

### Version 0.4.4 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Improvement to query presentation
2. Support Extended node:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2Fvt6s3ZkFkaeoI4Kr2YBA%2Fimage.png?alt=media&#x26;token=72e7d6ea-557b-4c33-a7de-ab6974af21b9" alt=""><figcaption></figcaption></figure>

### Version 0.4.3 <a href="#version-0.3.0" id="version-0.3.0"></a>

Support query params, like sql-id and node-ids in link

### Version 0.4.2 <a href="#version-0.3.0" id="version-0.3.0"></a>

Supports partition pruning

Enriching filter/select nodes with the UDF python function names:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FvgZDaa1KJXDiMowzEJ3W%2Fimage.png?alt=media&#x26;token=12adc961-b007-436d-a614-2e8dcbc17b38" alt=""><figcaption></figcaption></figure>

Add support for Generate node - explode, inline and more!

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FLyLKV44RZd9jeTQqiiYJ%2Fimage.png?alt=media&#x26;token=8349652a-477e-4741-9ad0-c264ebc8a5dc" alt=""><figcaption></figcaption></figure>

### Version 0.4.1 <a href="#version-0.3.0" id="version-0.3.0"></a>

Support newer DBX versions - 14 and up.

### Version 0.4.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Support map by pandas and arrow functions
2. Added a new flag to silence alerts for a job, `spark.dataflint.alert.disabled`, which accepts a comma-separated list of alerts such as:

   ```
   smallTasks,idleCoresTooHigh
   ```
3. Added short recommendation on top of alert
4. Updated DataFlint logo
5. Better stage identification for various readers
6. Shows stage failures with an orange V on sql node and list of complete stage failures
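
As a small illustration of the `spark.dataflint.alert.disabled` flag from item 2, the helper below joins alert names into the single comma-separated value the flag expects. The function name `disabled_alerts_conf` is hypothetical; the alert names are the two given in the example above.

```python
def disabled_alerts_conf(alert_names):
    """Build the config entry that silences the given alerts."""
    # Drop empty entries and stray whitespace before joining with commas.
    cleaned = [name.strip() for name in alert_names if name.strip()]
    return {"spark.dataflint.alert.disabled": ",".join(cleaned)}


conf = disabled_alerts_conf(["smallTasks", " idleCoresTooHigh "])
# conf == {"spark.dataflint.alert.disabled": "smallTasks,idleCoresTooHigh"}
```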

### Version 0.3.2 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. New alerts - cross joins, change join to broadcast, large partition size
2. Better support for cross joins
3. Additional metrics for joins and shuffles

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FpO2asUeALtWyrGVLvsPo%2Fimage.png?alt=media&#x26;token=f8d08936-334e-4fa6-9232-fa01c9670007" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FGcy2lxSuFZ5nFrstV7VH%2Fimage.png?alt=media&#x26;token=bc9ac1d3-fe9e-49df-9190-5b38d9a4a98d" alt=""><figcaption></figcaption></figure>

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FVt7UhJ2EilqKM2NeJnPh%2Fimage.png?alt=media&#x26;token=5cc2ba3b-3cc1-4bbf-979b-e3eb8a469632" alt=""><figcaption></figcaption></figure>

### Version 0.3.1 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Support Distinct, skewed join, coalesce nodes
2. Sorting by default by duration on history server mode
3. Automatically filter out SQL queries without any job/stage, with a switch to show them.
4. Add rows filtered percentage for filter and distinct sql nodes

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FAsDTx9zxwS0jkCNvbi2J%2Fimage.png?alt=media&#x26;token=0c7ab3ec-ac47-42b0-bf96-ce8c506618f1" alt="" width="563"><figcaption></figcaption></figure>

### Version 0.3.0 <a href="#version-0.3.0" id="version-0.3.0"></a>

1. Support for window functions
2. Better support for databricks
3. Detecting resource configuration better in configuration tab
4. Bug and UI fixes

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FJz58gpY8rv2lBUTtKiyi%2Fimage.png?alt=media&#x26;token=12b792d2-5978-4012-a3c1-b7a6a6694950" alt="" width="292"><figcaption></figcaption></figure>

## Version 0.2.7

1. DataFusion Comet support
2. bug fixes

## Version 0.2.6

1. Nvidia RAPIDS for Spark support
2. bug fixes

## Version 0.2.5

1. Bug Fixes

## Version 0.2.4

1. New visibility & alerts - Driver's memory
2. Updated README
3. Bug Fixes

## Version 0.2.3

1. Driver memory monitoring & alert
2. Updated readme
3. Bug fixes

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F9eN2Rb02x7MYKTaGBjsF%2Fimage.png?alt=media&#x26;token=20f22e66-a888-4e88-8e00-f8a70fb1baf9" alt="" width="325"><figcaption></figcaption></figure>

## Version 0.2.3

1. New alert - Large data Broadcast, for requesting to broadcast large data sets with the broadcast() function
2. New alert - Large filter conditions, for writing long filter conditions instead of using join logic
3. UI Improvements

## Version 0.2.2

1. Support Spark 2.4 logs in a history server running a version later than 3.2. A limited feature set is available because the events carry less data than in Spark 3.0 and up.

## Version 0.2.1

1. Better Databricks stage to node support
2. Support spark.dataflint.runId in custom history server providers when appId is not the spark appId

## Version 0.2.0

1. Better support for Databricks Photon plans
2. Input nodes show partition filters and pushed-down filters
3. Stage Breakdown - press the blue down arrow on sql node to see stage information
4. New alert - large number of small tasks (see [#large-number-of-small-tasks](https://dataflint.gitbook.io/dataflint-for-spark/advanced/alerts#large-number-of-small-tasks "mention"))

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FSh0P7NGaiAVUlnbKdmLc%2Fimage.png?alt=media&#x26;token=0aa1d85b-51bb-4517-ad12-9be7e672f682" alt=""><figcaption></figcaption></figure>

## Version 0.1.7

1. Apache Iceberg alerts improvements
2. Add avg file size in read/write
3. More information when hovering on stage

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FJ4WzeWUOaBLP3GZ9ZxCi%2Fimage.png?alt=media&#x26;token=a8e16eb4-d045-4ce9-be79-42a86f1e592d" alt=""><figcaption></figcaption></figure>

## Version 0.1.6

1. Apache Iceberg support
   1. Better node naming
   2. Read metrics and reading small files alerts
   3. Write metrics and overwriting most of table alerts
      1. Requires enabling the Iceberg metrics reporter. This can be done for you by setting **spark.dataflint.iceberg.autoCatalogDiscovery** to true, or by setting the Iceberg metrics reporter manually for each catalog, for example:

         ```
         spark.sql.catalog.[catalog name].metrics-reporter-impl org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter
         ```
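
When several catalogs need the reporter set manually, the per-catalog keys can be generated rather than typed by hand. The sketch below assumes the key pattern shown above; `iceberg_reporter_conf` and the catalog names are hypothetical.

```python
# Reporter class name as given in the release note above.
REPORTER_CLASS = "org.apache.spark.dataflint.iceberg.DataflintIcebergMetricsReporter"


def iceberg_reporter_conf(catalog_names):
    """Return one metrics-reporter-impl entry per Iceberg catalog."""
    return {
        f"spark.sql.catalog.{name}.metrics-reporter-impl": REPORTER_CLASS
        for name in catalog_names
    }


# "lake" and "archive" are placeholder catalog names for illustration only.
reporter_conf = iceberg_reporter_conf(["lake", "archive"])
for key, value in reporter_conf.items():
    print(f"{key} {value}")
```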

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FNpzDDSPDoB3V6aUGNqow%2Fimage.png?alt=media&#x26;token=68b114f3-5367-45c8-932f-3c46b0863630" alt=""><figcaption><p>Replacing entire table only to change 1% of records (1 in a 100)</p></figcaption></figure>

## Version 0.1.5

1. Add support for history server with cluster-mode jobs (i.e. with attempt number)
2. Fix "wasted cores" calculation
3. Fix status tab SQL flickering when there is SQL with subqueries

## Version 0.1.4

Fix scala 2.13 support

## Version 0.1.3

1. DataFlint SaaS support
2. Partition skew alert:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FOk7qojKHCIJ34XISY9FU%2Fimage.png?alt=media&#x26;token=65a7125f-0c59-485e-acf7-7b2bfd438c86" alt=""><figcaption></figcaption></figure>

## Version 0.1.2

1. Scala 2.13 support
2. A Spark flag to disable the web app's Mixpanel telemetry - `spark.dataflint.telemetry.enabled` (true/false)
3. Renamed Core Activity Rate to Wasted Cores Ratio (which is 100 - Core Activity Rate), and added an alert for wasted cores too high

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FfRtqPlTpwStQaW7hlaEc%2Fimage.png?alt=media&#x26;token=34733ceb-d830-4569-842c-987b9ab734ff" alt=""><figcaption></figcaption></figure>

## Version 0.1.1

1. Resources tab - see a graph of your cluster executors count over time, use it to tune your resource allocation settings and save cost!
2. Minor visual fixes

DataFlint Resource Tab:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F7sHKQLyxH9JyBbGCP4Hy%2Fimage.png?alt=media&#x26;token=0e84649d-d0d0-41ba-b0ce-f59c11585526" alt=""><figcaption></figcaption></figure>

## Version 0.1.0

1. Small fix to platform identification

## Version 0.0.8

1. Databricks support
2. Visual improvements
3. public release

## Version 0.0.7

Heat map

<img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F5IxRZvdq1wlp8pTYexzo%2Fimage.png?alt=media&#x26;token=f74c3ef6-023d-40a2-9d4d-3b604df8edf1" alt="" data-size="original">

## Version 0.0.6

Flint Assistant, requires an OpenAI key

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F1dR2Sx5jWRJxEU72Cq5i%2Fimage.png?alt=media&#x26;token=7b038381-bd83-4e87-9842-7b0e93420d06" alt=""><figcaption></figcaption></figure>

## Version 0.0.5

#### Syntax highlighting for SQL plan parts

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FM6v9udJyqi52zq60mV6q%2Fimage.png?alt=media&#x26;token=582c94b7-53ae-47ee-9143-b10e3f2fa4df" alt="" width="359"><figcaption><p>Show selected fields</p></figcaption></figure>

#### Calculating container memory usage and using it for GB memory/hour calculations

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F0ote8UAW0sVfrmzCOVvn%2Fimage.png?alt=media&#x26;token=f21d4b57-6193-4051-a9d4-7d4c4c6e2ae8" alt=""><figcaption></figcaption></figure>

## Version 0.0.4

1. Minor fix related to Spark operator and nginx

## Version 0.0.3

#### SQL plan modes

IO only, shows only input, joins and output:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FT6DJujjnn5gTvbI1ZxfD%2Fimage.png?alt=media&#x26;token=4c432350-610b-428a-be06-c86765af148a" alt=""><figcaption></figcaption></figure>

Basic mode (default), shows also transformations like filters, aggregations and selects:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F7YjGaJBHl53AkEWSq1xh%2Fimage.png?alt=media&#x26;token=c4c02c14-37c4-4d74-ab03-94fe5247756c" alt=""><figcaption></figcaption></figure>

Advanced mode, shows also repartitions, broadcasts and sorts:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FMxDjkSqfOmRyeFmv0yVv%2Fimage.png?alt=media&#x26;token=72c0aa0b-d86d-4a0f-a111-a5cd3bc58cd1" alt=""><figcaption></figcaption></figure>

Plan information is also available for:

1. Joins
2. Sorts
3. Selects
4. Repartitions

## Version 0.0.2

#### DBU calculation instead of core/hour in summary bar

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F9A7VxinZQ3SVukftNjX0%2Fimage.png?alt=media&#x26;token=c94f4752-1259-4a2e-990d-c74dee2501da" alt="" width="563"><figcaption></figcaption></figure>

#### Add memory config to configuration tab

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FYlr3hjYozlOdIgYw8iL5%2Fimage.png?alt=media&#x26;token=e74753ae-8415-4d69-a372-a0b0a0e25f8a" alt=""><figcaption></figcaption></figure>

#### Filter nodes show their condition

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FoHQP2EX26HlBz3L2PsHT%2Fimage.png?alt=media&#x26;token=1220ea44-17ce-4399-bb31-a35d8023a790" alt="" width="239"><figcaption></figcaption></figure>

#### Advanced mode for SQL plan, that also presents shuffle nodes

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2Fw95ximBELAJC0GCnrLKK%2Fimage.png?alt=media&#x26;token=993ff051-ab90-4394-95c3-71770fb0bd35" alt=""><figcaption></figcaption></figure>

#### Additional changes

1. Support both HTTP and HTTPS access, enabling mixed content only in HTTPS mode
2. Support for spark 3.5.X

## Version 0.0.1

Initial version, includes:

1. Status page
2. Summary page
3. Configuration Page
4. Alerts page
