πŸ—ΊοΈHow It Works

Existing Spark Monitoring Solution

Spark ships with a simple built-in visibility solution that includes:

  1. A static web app (Spark UI) that shows information about the current Spark run

  2. The ability to write events to a log and load it via the History Server to view past runs
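As a rough illustration of the second item, event logging in vanilla Spark is switched on with the `spark.eventLog.enabled` and `spark.eventLog.dir` settings, and the History Server is pointed at the same location through `spark.history.fs.logDirectory`. The sketch below only assembles those settings into `spark-submit` arguments; the helper names and the log directory are illustrative assumptions, not part of Spark or DataFlint.

```python
def event_log_conf(log_dir: str) -> dict:
    """Spark settings that make the driver write its events to log_dir."""
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }


def to_submit_args(conf: dict) -> list:
    """Turn a settings dict into repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical log directory; the History Server would need
    # spark.history.fs.logDirectory set to the same location.
    print(" ".join(to_submit_args(event_log_conf("file:///tmp/spark-events"))))
```

The resulting arguments can be appended to an ordinary `spark-submit` command line.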

DataFlint solution:

Our solution builds on the existing Apache Spark infrastructure: by loading a plugin into both the Spark driver and the History Server, we can expose a modern, real-time web app.
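On the driver side, Spark (3.0 and later) loads plugins listed in the comma-separated `spark.plugins` setting. The following is a minimal sketch of adding a plugin class to that setting without clobbering plugins that are already configured. The class name shown for DataFlint is an assumption and should be checked against the installation guide; `with_plugin` is a hypothetical helper, and the History Server setup is not covered here.

```python
# Assumed plugin class name; confirm it in the DataFlint installation docs.
DATAFLINT_PLUGIN = "io.dataflint.spark.SparkDataflintPlugin"


def with_plugin(conf: dict, plugin_class: str) -> dict:
    """Return a copy of conf with plugin_class present in spark.plugins."""
    updated = dict(conf)
    # spark.plugins is a comma-separated list of plugin class names.
    existing = [
        name.strip()
        for name in updated.get("spark.plugins", "").split(",")
        if name.strip()
    ]
    if plugin_class not in existing:
        existing.append(plugin_class)
    updated["spark.plugins"] = ",".join(existing)
    return updated


if __name__ == "__main__":
    print(with_plugin({"spark.app.name": "demo"}, DATAFLINT_PLUGIN))
```

Appending rather than overwriting matters when a job already relies on another plugin, since setting `spark.plugins` twice would silently drop the first value.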

The plugin also exposes additional REST endpoints that surface information not available in the vanilla Spark UI.
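For context, vanilla Spark already serves a monitoring REST API under `/api/v1` on the driver UI and on the History Server, and the additional endpoints sit alongside it. The sketch below queries only that vanilla API, since the DataFlint endpoint paths are not listed on this page; the host, port, and helper names are assumptions for illustration.

```python
import json
from urllib.request import urlopen


def spark_api_url(base_url: str, *parts: str) -> str:
    """Build a URL under Spark's /api/v1 monitoring REST API."""
    return "/".join([base_url.rstrip("/"), "api/v1", *parts])


def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumes a driver UI on the default local port 4040.
    url = spark_api_url("http://localhost:4040", "applications")
    for app in fetch_json(url):
        print(app["id"], app["name"])
```

Pointing `base_url` at a History Server instead of a live driver returns the same structure for past runs.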

[Future solution]

The main issues with the existing infrastructure are that hosting a History Server is hard, your Spark runs are not analyzed, and there is no authentication, run sharing, alerting, or filtering by team or application name, among other gaps.

In our future solution, we want the Spark driver to send a summary of each run to a SaaS solution that will index and analyze all your Spark runs for you.

The web app will be able to access the SaaS solution to show you the app's behavior compared with past runs, and to analyze your run and supply recommendations using AI in real time!
