
Install on DataBricks

Install on DataBricks from an app/notebook

This method works on both DataBricks Community and the paid version of DataBricks.

  1. Go to your cluster's "Libraries" tab and click the blue "Install new" button.

  2. Choose "Maven" and enter the DataFlint coordinates:

io.dataflint:spark_2.12:0.3.2

  3. In your notebook or app, run these two lines (in a Python notebook, also add the %scala magic command as the first line of the cell):

%scala
import io.dataflint.spark.SparkDataflint
SparkDataflint.install(spark.sparkContext)
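The Maven coordinate from step 2 uses the standard `group:artifact:version` format. As a small sketch, the shell function below (`maven_jar_url` is a hypothetical helper for illustration, not part of DataFlint) shows how such a coordinate maps to a jar URL on Maven Central, which is the same URL pattern the init script in the plugin section downloads from.

```shell
# Hypothetical helper (not part of DataFlint): convert a Maven coordinate
# "group:artifact:version" into its Maven Central jar URL.
maven_jar_url() {
  coord="$1"
  group=$(printf '%s' "$coord" | cut -d: -f1)
  artifact=$(printf '%s' "$coord" | cut -d: -f2)
  version=$(printf '%s' "$coord" | cut -d: -f3)
  # Dots in the group id become path separators (io.dataflint -> io/dataflint)
  group_path=$(printf '%s' "$group" | tr '.' '/')
  printf '%s\n' "https://repo1.maven.org/maven2/${group_path}/${artifact}/${version}/${artifact}-${version}.jar"
}

# prints https://repo1.maven.org/maven2/io/dataflint/spark_2.12/0.3.2/spark_2.12-0.3.2.jar
maven_jar_url "io.dataflint:spark_2.12:0.3.2"
```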

DataFlint is now installed! Go to the "Spark UI" tab and you will see a "dataflint" tab. We highly recommend using the "Open in new tab" link for the best experience.

DataFlint is only available while the cluster is running; once the cluster stops, DataFlint is no longer available.

Install on DataBricks as a spark plugin

DataBricks Community does not support init scripts, so on Community you need to install DataFlint from a notebook/app instead.

Add the following section to your init script:

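# DataFlint version to install (use the newest release)
DATAFLINT_VERSION="0.3.2"
# Custom Spark defaults file for the driver; the plugin config is appended here
SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-custom-spark-driver-defaults.conf"

mkdir -p /databricks/jars/

# Download the DataFlint jar from Maven Central
wget --quiet \
  -O "/databricks/jars/spark_2.12-${DATAFLINT_VERSION}.jar" \
  "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/${DATAFLINT_VERSION}/spark_2.12-${DATAFLINT_VERSION}.jar"

# Only the driver node needs the plugin
if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  # Copy the jar to the driver daemon's jar directory so the driver can load the plugin
  mkdir -p /mnt/driver-daemon/jars/
  cp "/databricks/jars/spark_2.12-${DATAFLINT_VERSION}.jar" "/mnt/driver-daemon/jars/spark_2.12-${DATAFLINT_VERSION}.jar"
  # Append a driver-only config block that enables the DataFlint Spark plugin
  echo "[driver] {" >> "$SPARK_DEFAULTS_FILE"
  echo "  spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> "$SPARK_DEFAULTS_FILE"
  echo "}" >> "$SPARK_DEFAULTS_FILE"
fi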
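To confirm that the init script ran, you could check its two side effects once the cluster is up, for example from a `%sh` notebook cell (which runs on the driver). The sketch below is a hypothetical helper, not part of DataFlint; it assumes the default paths used in the script above and only checks that the jar was downloaded and that the plugin line was appended to the driver defaults file.

```shell
# Hypothetical check (not part of DataFlint): verify the init script's side effects.
# Usage: check_dataflint_install VERSION [JAR_DIR] [CONF_FILE]
check_dataflint_install() {
  version="$1"
  jar_dir="${2:-/databricks/jars}"
  conf_file="${3:-/databricks/driver/conf/00-custom-spark-driver-defaults.conf}"
  jar="${jar_dir}/spark_2.12-${version}.jar"

  # The init script should have downloaded a non-empty jar
  if [ ! -s "$jar" ]; then
    echo "missing or empty jar: $jar"
    return 1
  fi
  # The init script should have registered the plugin in the driver defaults
  if ! grep -qF "io.dataflint.spark.SparkDataflintPlugin" "$conf_file" 2>/dev/null; then
    echo "plugin not configured in: $conf_file"
    return 1
  fi
  echo "DataFlint ${version} looks installed"
}
```

For example, `check_dataflint_install 0.3.2` checks the default locations. A passing check only means the files are in place; the "dataflint" tab in the Spark UI is the real confirmation.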
Note: the version numbers on this page may be out of date; you should always use the newest DataFlint version :)
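One way to look up the newest release is Maven Central's standard `maven-metadata.xml` file for the artifact, which lists the latest release in a `<release>` element. The sketch below (`latest_release` is a hypothetical helper, and it assumes the metadata keeps `<release>` on a single line, as Maven Central does) extracts that value from metadata read on stdin.

```shell
# Hypothetical helper: print the <release> version from Maven metadata on stdin.
latest_release() {
  sed -n 's:.*<release>\(.*\)</release>.*:\1:p' | head -n 1
}

# Example (requires network access):
# wget -qO- https://repo1.maven.org/maven2/io/dataflint/spark_2.12/maven-metadata.xml | latest_release
```

You could then paste the printed version into the Maven coordinates or into `DATAFLINT_VERSION`. Keeping an explicit version in the init script, rather than resolving it at every cluster start, keeps cluster startups reproducible.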