
Alerts

Last updated 1 month ago

Summary

One of DataFlint's distinguishing features is its alerting system. Traditional Spark monitoring gives you a collection of metrics and leaves you to figure out what they mean. DataFlint instead raises alerts that point to what is wrong and suggest potential fixes.

Alerts

Reading Small Files

[Also works for Apache Iceberg tables]

Writing Small Files

Apache Iceberg - inefficient replace of data

Partition Skew

Large Number Of Small Tasks

Memory Over-Provisioning

Memory Under-Provisioning

High wasted cores rate

Large Data Broadcast

Broadcast small table in Sort Merge Join

Large Cross Join Scan

Large Partition Size

Long Filter Conditions

Query Failures

Another type of "alert" is a query failure. When you hover over the alert icon, DataFlint extracts the error from the JVM stack trace and shows it at the top of the message.
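To illustrate the general idea of pulling a readable error out of a JVM stack trace, here is a minimal Python sketch. It is not DataFlint's implementation: the function name `extract_root_cause`, the rule of preferring the deepest "Caused by:" line, and the sample trace (with made-up `com.example` class names) are all assumptions for illustration.

```python
import re

# Hypothetical helper, not DataFlint's code: find the most specific
# error message in the text of a JVM stack trace.
_CAUSED_BY = re.compile(r"^\s*Caused by:\s*(.+)$")


def extract_root_cause(stack_trace: str) -> str:
    """Return the deepest 'Caused by:' message, else the first non-frame line."""
    root_cause = None
    first_line = None
    for line in stack_trace.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        match = _CAUSED_BY.match(line)
        if match:
            # Later "Caused by:" lines describe deeper (more specific) causes.
            root_cause = match.group(1).strip()
        elif first_line is None and not stripped.startswith(("at ", "...")):
            # The first line that is not a stack frame is the top-level error.
            first_line = stripped
    return root_cause or first_line or ""


# Made-up trace for illustration only; class names are placeholders.
sample_trace = """com.example.JobError: Job aborted
\tat com.example.Writer.write(Writer.scala:231)
Caused by: com.example.TaskError: Task failed while writing rows
\tat com.example.Writer.executeTask(Writer.scala:296)
\t... 10 more
Caused by: java.io.IOException: No space left on device
\tat com.example.Output.writeBytes(Output.java:12)
"""

print(extract_root_cause(sample_trace))
# prints "java.io.IOException: No space left on device"
```

Preferring the last "Caused by:" line is one possible design choice. It tends to surface the underlying problem (here, a full disk) rather than the generic wrapper exception at the top of the trace.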

When you click the query, you can see the exact place in the logical plan where the query failed. In this example, the failure is in the stage that writes files at the end of the query plan.

Alerts roadmap

  1. High task error rate.

  2. High executor error rate.

  3. High disk spill relative to input size and available memory.

  4. Repartition before write with low cardinality, which causes a lack of parallelism or huge files.

  5. Executor memory overhead that is too low and causes container failure.
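As a rough illustration of what a check like roadmap item 3 could look like, the sketch below compares disk spill against both input size and available memory. Everything in it is hypothetical: the `StageMetrics` fields, the `disk_spill_alert` function, and both thresholds are assumptions, not DataFlint's planned logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageMetrics:
    """Hypothetical per-stage totals; the field names are illustrative."""
    input_bytes: int
    disk_spill_bytes: int
    available_memory_bytes: int


def disk_spill_alert(
    metrics: StageMetrics,
    max_spill_to_input: float = 0.5,   # assumed threshold, not a DataFlint default
    max_spill_to_memory: float = 1.0,  # assumed threshold, not a DataFlint default
) -> Optional[str]:
    """Return an alert message when spill is high on both ratios, else None."""
    if metrics.input_bytes <= 0 or metrics.available_memory_bytes <= 0:
        return None  # ratios are undefined without input or memory figures
    spill_to_input = metrics.disk_spill_bytes / metrics.input_bytes
    spill_to_memory = metrics.disk_spill_bytes / metrics.available_memory_bytes
    if spill_to_input > max_spill_to_input and spill_to_memory > max_spill_to_memory:
        return (
            f"High disk spill: {spill_to_input:.0%} of input, "
            f"{spill_to_memory:.0%} of available memory"
        )
    return None


print(disk_spill_alert(StageMetrics(100, 80, 40)))
# prints "High disk spill: 80% of input, 200% of available memory"
```

Requiring both ratios to exceed their thresholds is one way to avoid flagging stages that spill a little on a very small input. A real alert would need thresholds tuned against actual workloads.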

🏹
⏰