🔓 Security & Stability

DataFlint for Spark is secure and stable. This page explains why:

Security

  1. DataFlint runs locally on your Spark driver or Spark History Server.

  2. DataFlint uses the existing Spark UI endpoint, so no new endpoints or ports are exposed.

  3. DataFlint is open source, so you can inspect exactly what the plugin does. No black-box wizardry!

  4. The DataFlint library jar is published as a stable release on the Maven Central OSS repository, and Maven Central does not allow released versions to be edited or replaced. This means the code running in your cluster for a given version cannot change.
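Because a released version on Maven Central is immutable, you can also confirm that the jar you deployed is byte-for-byte the one that was published. Maven Central serves checksum files (such as `.sha1`) next to each artifact. The following is a minimal Python sketch, not part of DataFlint: it hashes a local jar and compares it with the text of the published `.sha1` file. The function names `sha1_of_file` and `verify_jar` are hypothetical, and the jar path is whatever you pass in.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_jar(jar_path, published_sha1):
    """Return True if the local jar matches the published .sha1 checksum text."""
    # Some checksum files contain "<hash>  <filename>"; keep only the hash token.
    expected = published_sha1.strip().split()[0].lower()
    return sha1_of_file(jar_path) == expected
```

For example, download the `.jar.sha1` file from the same Maven Central directory as the jar and pass its contents as `published_sha1`.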

Stability

  1. If DataFlint fails on startup, it logs a warning and lets the application continue.

  2. DataFlint only runs code in Spark when you access the Web UI.

  3. Errors raised by DataFlint in the driver occur in a separate thread and should not affect the application's runtime.
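The stability points above follow a common "fail open" pattern: plugin failures are caught and logged instead of being allowed to stop the host application. DataFlint itself runs on the JVM inside Spark, so the following is only a language-neutral illustration written in Python, not DataFlint's actual code. The names `start_plugin_fail_open` and `run_isolated` are hypothetical.

```python
import logging
import threading

log = logging.getLogger("plugin-sketch")


def start_plugin_fail_open(init):
    """Run a plugin's init callable; on any error, log a warning and keep going."""
    try:
        init()
        return True
    except Exception as exc:  # deliberately broad: the plugin must never stop the app
        log.warning("plugin failed to start, continuing without it: %s", exc)
        return False


def run_isolated(task, errors):
    """Run `task` in a separate daemon thread; its exceptions are recorded and logged,
    never raised into the caller's thread."""
    def wrapper():
        try:
            task()
        except Exception as exc:
            errors.append(exc)
            log.warning("plugin background task failed: %s", exc)

    thread = threading.Thread(target=wrapper, daemon=True)
    thread.start()
    return thread
```

The main thread keeps running whether or not `init` or `task` raises; the only trace of a failure is the warning (and, in this sketch, the `errors` list).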

Performance

  1. Most of the computation is done on the DataFlint Web UI side, in your browser.

  2. DataFlint only runs compute on the driver, not on the executors

  3. DataFlint runs compute on the driver only when the DataFlint Web UI is open and the tab is active

  4. DataFlint queries the driver API every 1 second, so the performance impact is similar to keeping the existing Spark UI open and refreshing it constantly.
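The polling behavior described in items 3 and 4 can be sketched as a loop that requests data from the driver once per second, but only while the UI tab is active. DataFlint's real polling lives in its Web UI, so this Python version is an illustration under assumptions: it targets Spark's monitoring REST API (`/api/v1/applications`) on the default driver UI port 4040, and `fetch_json`, `poll`, and the `is_tab_active` callback are hypothetical names.

```python
import json
import time
import urllib.request

# Assumption: the Spark UI (and its REST API) is served on the driver's default port 4040.
SPARK_API = "http://localhost:4040/api/v1/applications"


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def poll(fetch, is_tab_active, interval_s=1.0, max_ticks=None, sleep=time.sleep):
    """Every `interval_s` seconds, call `fetch` only if the tab is active.

    Returns the collected results; `max_ticks` and `sleep` exist so the loop can be tested.
    """
    results = []
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        if is_tab_active():
            results.append(fetch())  # no driver work at all while the tab is hidden
        ticks += 1
        sleep(interval_s)
    return results
```

Against a running driver you might call `poll(lambda: fetch_json(SPARK_API), lambda: True, max_ticks=5)`. Each fetch is one ordinary REST request to the driver, which is why the cost is comparable to refreshing the Spark UI.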

Telemetry

We collect anonymous usage metrics for DataFlint via Mixpanel. We do not collect any data about your actual Spark job besides:

  1. Spark version

  2. Application ID
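Limiting telemetry to a fixed set of fields is usually enforced with an allowlist: everything not explicitly permitted is dropped before an event is sent. The Python sketch below illustrates that idea only. It is hypothetical, does not use the Mixpanel SDK, and does not reproduce DataFlint's code. The field names `spark_version` and `app_id` are assumptions that mirror the list above.

```python
# Hypothetical allowlist: the only job-related fields permitted in a usage event.
ALLOWED_JOB_FIELDS = ("spark_version", "app_id")


def build_usage_event(event_name, job_info):
    """Build an anonymous usage event that keeps only allowlisted job fields."""
    event = {"event": event_name}
    for key in ALLOWED_JOB_FIELDS:
        if key in job_info:
            event[key] = job_info[key]
    return event
```

Any other key in `job_info`, such as a query string or table name, never reaches the event.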
