🧱Install on Databricks

Install on Databricks from an app/notebook

This method works on both Databricks Community Edition and paid Databricks workspaces.

  1. Go to your cluster's "Libraries" tab and click the blue "Install new" button:

  2. Choose "Maven" and enter the coordinates for DataFlint:

io.dataflint:spark_2.12:0.4.1

The screenshot shows version 0.1.0; use the newest DataFlint version :)
  3. In your notebook or app, run these two lines (in a Python notebook, prefix the cell with %scala):

%scala
import io.dataflint.spark.SparkDataflint
SparkDataflint.install(spark.sparkContext)

DataFlint is now installed! Go to the "Spark UI" tab and you will see a "dataflint" tab. It's highly recommended to use the "Open in new tab" link so you get the best experience.
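Outside a Databricks notebook, the same Maven artifact can be attached at submit time. A minimal sketch using standard spark-submit flags (my_app.py is a placeholder for your application; the plugin class is the same one used by the init-script method):

```shell
# Sketch: attach DataFlint to a plain spark-submit run (my_app.py is hypothetical)
spark-submit \
  --packages io.dataflint:spark_2.12:0.4.1 \
  --conf spark.plugins=io.dataflint.spark.SparkDataflintPlugin \
  my_app.py
```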

Install on Databricks as a Spark plugin

Add the following section to your cluster init script:

DATAFLINT_VERSION="0.4.1"
SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-custom-spark-driver-defaults.conf"

mkdir -p /databricks/jars/

wget --quiet \
  -O /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar \
  https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar

if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  mkdir -p /mnt/driver-daemon/jars/
  cp /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar /mnt/driver-daemon/jars/spark_2.12-$DATAFLINT_VERSION.jar
  echo "[driver] {" >> $SPARK_DEFAULTS_FILE
  echo "  spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> $SPARK_DEFAULTS_FILE
  echo "}" >> $SPARK_DEFAULTS_FILE
fi
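If the DataFlint tab does not appear after the cluster restarts, a quick sanity check is to confirm the init script actually appended the driver-scoped config block. A sketch of that check (reproducing the echo lines against a temp file here for illustration; on a real driver you would grep the SPARK_DEFAULTS_FILE path from the script above):

```shell
# Reproduce the block the init script appends (temp file used here for illustration)
SPARK_DEFAULTS_FILE="$(mktemp)"
echo "[driver] {" >> "$SPARK_DEFAULTS_FILE"
echo "  spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> "$SPARK_DEFAULTS_FILE"
echo "}" >> "$SPARK_DEFAULTS_FILE"

# On a real driver, the same grep against the real conf path confirms the plugin line
grep -q "SparkDataflintPlugin" "$SPARK_DEFAULTS_FILE" && echo "DataFlint plugin configured"
```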
