Fix to stage identification algorithm
Fix to stage read visuals to show hashed/ranges fields
Visual improvements to stage sidebar
Improvement to sql plan layout with stage nodes
Button to switch to plan without stage nodes
Improvements to stage identification
Small fix for Spark 3.3.x support
Small fix for shuffle write metrics real time update
✨ New Features
New Flow Graph UI with Stage Parent Nodes - Completely redesigned the SQL flow graph visualization with stage parent nodes for better understanding of query execution
Task View in Stage Nodes - Added task progress indicators directly within stage nodes for real-time task monitoring
"Rows Aggregated" Metric - Added a new metric to track aggregated row counts
Exchange Node Separation - Shuffle read and write operations are now displayed separately when 2 stages exist, providing clearer visibility into data exchange operations
🎨 UI Improvements
Enhanced visual consistency across UI components
Replaced CheckIcon with CancelIcon for clearer error representation
Updated FlowLegend to include task progress indicators
Improved styling for node elements in SQL flow
General SQL plan UI improvements
⚡ Performance Improvements
SQLNodeStageReducer Optimization - Implemented O(1) access with lookup maps for nodes and stages, significantly improving efficiency in SQL node stage calculations
Smarter Update Cycles - Skip OnCycleEnd calculations when there are no changes in SQL and stages
Stage Map for Alerts - Use stage map for faster alert processing
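The lookup-map idea behind the SQLNodeStageReducer and alert changes above can be sketched roughly as follows. This is an illustration only, not DataFlint's actual code: the `SqlNode` and `Stage` shapes and the `indexBy` and `stageForNode` helpers are hypothetical names chosen for this example.

```typescript
// Hypothetical shapes; the real DataFlint types are not shown in these notes.
interface SqlNode { nodeId: number; name: string; stageId?: number; }
interface Stage { stageId: number; status: string; }

// Build an index once per update: O(n) to build, O(1) per lookup afterwards.
function indexBy<T>(items: T[], key: (item: T) => number): Map<number, T> {
  const index = new Map<number, T>();
  for (const item of items) {
    index.set(key(item), item);
  }
  return index;
}

// Resolve a node's stage through the maps instead of Array.find,
// which would rescan the whole array on every call.
function stageForNode(
  nodeId: number,
  nodesById: Map<number, SqlNode>,
  stagesById: Map<number, Stage>,
): Stage | undefined {
  const node = nodesById.get(nodeId);
  if (node === undefined || node.stageId === undefined) return undefined;
  return stagesById.get(node.stageId);
}

// Sample data, made up for the example.
const nodes: SqlNode[] = [
  { nodeId: 1, name: "Scan", stageId: 10 },
  { nodeId: 2, name: "Filter", stageId: 11 },
  { nodeId: 3, name: "Exchange" }, // no stage assigned yet
];
const stages: Stage[] = [
  { stageId: 10, status: "COMPLETE" },
  { stageId: 11, status: "ACTIVE" },
];

const nodesById = indexBy(nodes, (n) => n.nodeId);
const stagesById = indexBy(stages, (s) => s.stageId);
```

With the indexes in place, each node-to-stage resolution is a pair of map lookups, so the cost of a full pass grows linearly with the number of nodes rather than with nodes times stages.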
🐛 Bug Fixes
Fixed wall clock duration calculation
Fixed duration node hover display issue
Fixed idle cores bug where idle cores showed 0% when all executors closed (Spark incorrectly detected local mode)
🔧 Other Changes
Removed Scala distribution from Spark distribution package
Delta lake collector (experimental)
Stage identification improvements
Add better delta lake support - delta write command, optimize command, optimize shuffle before write
Add support for SortAggregate nodes
Fix Maven dependency issues with Spark 3 POM files
Better stage identification using metrics with statistics
Shrink metrics text when a node has a large number of metrics
Visual improvements for the new SQL plan UI
New and updated design for the sql plan nodes
Improvement to query presentation
Support query params, like sql-id and node-ids in link
Supports partition pruning
Enrich filter/select nodes with Python UDF function names
Add support for Generate node - explode, inline and more!
Support newer DBX versions - 14 and up.
Support map by pandas and arrow functions
Added new flag to silence alert for a job -
Added short recommendation on top of alert
Better stage identification for various readers
Shows stage failures with an orange V on sql node and list of complete stage failures
New alerts - cross joins, change join to broadcast, large partition size
Better support for cross joins
Additional metrics for joins and shuffles
Support Distinct, skewed join, coalesce nodes
Sorting by default by duration on history server mode
Automatically filter out SQL queries without any job/stage; add a switch to show them.
Add rows filtered percentage for filter and distinct sql nodes
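As a rough illustration of the rows filtered percentage, the sketch below assumes it is the share of input rows that a filter or distinct node dropped. The notes do not state the exact formula, so both the definition and the `rowsFilteredPercentage` name are assumptions.

```typescript
// Assumed definition: percentage of input rows that did not reach the output.
function rowsFilteredPercentage(inputRows: number, outputRows: number): number {
  if (inputRows <= 0) return 0; // avoid dividing by zero on empty input
  return ((inputRows - outputRows) / inputRows) * 100;
}

// A filter that reads 1000 rows and emits 250 dropped 75% of its input.
const filteredPct = rowsFilteredPercentage(1000, 250);
```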
Version 0.3.0
Support for window functions
Better support for databricks
Detecting resource configuration better in configuration tab
Nvidia RAPIDS for Spark support
New visibility & alerts - Driver's memory
Driver memory monitoring & alert
New alert - Large data broadcast, raised when a large dataset is explicitly broadcast with the broadcast() function
New alert - Large filter conditions, for writing long filter conditions instead of using join logic
Support Spark 2.4 logs in a history server running a version later than 3.2. A limited feature set is available because the events carry less data than in Spark 3.0 and up
Better Databricks stage to node support
Support spark.dataflint.runId in custom history server providers when appId is not the spark appId
Better support for Databricks Photon plans
Input nodes show partition filters and pushed-down filters
Stage Breakdown - press the blue down arrow on sql node to see stage information
Apache Iceberg alerts improvements
Add avg file size in read/write
More information when hovering on stage
Apache Iceberg support
Read metrics and reading small files alerts
Write metrics and overwriting most of table alerts
Requires enabling the Iceberg metric reporter. This can be done for you by setting spark.dataflint.iceberg.autoCatalogDiscovery to true, or by setting the Iceberg metric reporter manually for each catalog, for example:
Replacing an entire table only to change 1% of records (1 in 100)
Add support for history server with cluster-mode jobs (i.e. with attempt number)
Fix "wasted cores" calculation
Fix status tab SQL flickering when there is SQL with subqueries
Fix scala 2.13 support
A Spark flag to disable the web app's Mixpanel telemetry - spark.dataflint.telemetry.enabled (true/false)
Renamed Core Activity Rate to Wasted Cores Ratio (which is 100 - Core Activity Rate), and added an alert for wasted cores too high
Resources tab - see a graph of your cluster executors count over time, use it to tune your resource allocation settings and save cost!
DataFlint Resource Tab:
Small fix to platform identification
Heat map

Flint Assistant, requires an OpenAI key
Syntax highlighting for SQL plan parts
Calculating container memory usage and using it for GB memory/hour calculations
Minor fix related to Spark operator and nginx
IO only, shows only input, joins and output:
Basic mode (default), shows also transformations like filters, aggregations and selects:
Advanced, shows also repartitions, broadcasts and sorts
There is also plan information for:
DBU calculation instead of core/hour in summary bar
Add memory config to configuration tab
Filter Nodes has condition:
Advanced mode for SQL plan, that also presents shuffle nodes
Additional changes
Support both HTTP and HTTPS access, enabling mixed content only in HTTPS mode
Initial version, includes: