⏰ Alerts
Summary
One of DataFlint's unique selling points is its alerting system. Traditional Spark monitoring gives you a pile of metrics and leaves you to figure out what they mean. DataFlint instead raises alerts that point to what is wrong and suggest potential fixes.
Alerts
Reading Small Files
[Also works for Apache Iceberg tables]
Writing Small Files
Apache Iceberg - Inefficient Replace of Data
Partition Skew
Large Number Of Small Tasks
Memory Over-Provisioning
Memory Under-Provisioning
High Wasted Cores Rate
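To make an alert such as Partition Skew more concrete, here is a minimal sketch of how a skew check could be expressed over per-task durations. It is an illustration only and not DataFlint's actual logic: the function name `is_partition_skewed`, the 10x ratio, and the one-minute floor are all assumptions chosen for the example.

```python
from statistics import median


def is_partition_skewed(task_durations_ms, ratio_threshold=10.0, min_slowest_ms=60_000):
    """Return True when the slowest task dwarfs the median task of a stage.

    All thresholds are hypothetical defaults, not values taken from DataFlint.
    """
    if not task_durations_ms:
        return False

    slowest = max(task_durations_ms)
    typical = median(task_durations_ms)

    # Ignore stages whose slowest task is short: skew there costs little.
    if slowest < min_slowest_ms:
        return False

    # Clamp the median to 1 ms so a zero median cannot make every stage look skewed.
    return slowest > ratio_threshold * max(typical, 1)
```

Comparing against the median rather than the mean keeps a single very slow task from hiding itself by inflating the baseline.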
Query Failures
Another type of "alert" is a query failure. When you hover over the alert icon, DataFlint extracts the error from the scary JVM stack trace and shows it at the top of the message.
When you click the query, you can see the exact place in the logical plan where the query failed. In our case, it is in the stage that writes files at the end of the query plan.
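The idea of pulling the real error out of a long JVM stack trace can be sketched as follows. This is a hedged illustration, not DataFlint's implementation: `extract_root_cause` is a hypothetical helper, it assumes the most useful message is the deepest `Caused by:` entry, and `SAMPLE_TRACE` is a hand-written trace made up for the example.

```python
def extract_root_cause(stack_trace: str) -> str:
    """Return the deepest 'Caused by:' message, or the first line if there is none."""
    lines = [line.strip() for line in stack_trace.splitlines() if line.strip()]
    if not lines:
        return ""

    # In a JVM trace, each "Caused by:" entry wraps the one after it,
    # so the last entry is usually the original error.
    prefix = "Caused by:"
    causes = [line for line in lines if line.startswith(prefix)]
    if causes:
        return causes[-1][len(prefix):].strip()

    # No nested causes: the top line already names the exception.
    return lines[0]


# Hand-written trace for illustration only; not real DataFlint or Spark output.
SAMPLE_TRACE = """org.apache.spark.SparkException: Job aborted.
\tat example.Writer.write(Writer.scala)
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
\t... 12 more
Caused by: java.io.IOException: No space left on device
\tat example.Output.flush(Output.java)
"""

print(extract_root_cause(SAMPLE_TRACE))
```

For the sample above, the helper skips the two wrapping `SparkException` messages and surfaces the `IOException` line, which is the part a reader actually needs.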
Alerts roadmap
High task error rate
High executors error rate
High disk spill relative to input size and available memory
Nested for-loop join with high cardinality
Broadcasting huge datasets
Repartition before write with low cardinality, causing a lack of parallelism or huge files
Executor memory overhead is too low and causes container failure
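As an illustration of what a roadmap item like "high disk spill relative to input size" might look like as a rule, here is a minimal sketch. It is an assumption about one possible shape of such a check, not a description of how DataFlint will implement it: the names `spill_ratio` and `has_high_disk_spill` and the 0.5 threshold are hypothetical, and the "available memory" part of the item is left out for brevity.

```python
def spill_ratio(disk_spill_bytes: int, input_bytes: int) -> float:
    """Fraction of the input size that was spilled to disk."""
    if input_bytes <= 0:
        # No measurable input: treat the ratio as zero rather than dividing by zero.
        return 0.0
    return disk_spill_bytes / input_bytes


def has_high_disk_spill(disk_spill_bytes: int, input_bytes: int,
                        ratio_threshold: float = 0.5) -> bool:
    """Flag a query whose spill exceeds a (hypothetical) share of its input."""
    return spill_ratio(disk_spill_bytes, input_bytes) > ratio_threshold
```

Expressing the rule as a ratio rather than an absolute byte count lets the same threshold apply to both small and large jobs.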