🧱Install on Databricks
Install on Databricks as a Spark plugin (recommended)
Databricks init scripts are not supported on Databricks Community Edition; in that case, install DataFlint from a notebook instead (see below).
Add the following section to your init script:
#!/bin/bash
DATAFLINT_VERSION="0.5.1"
SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-custom-spark-driver-defaults.conf"
mkdir -p /databricks/jars/
wget --quiet \
-O /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar \
https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  mkdir -p /mnt/driver-daemon/jars/
  cp /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar /mnt/driver-daemon/jars/spark_2.12-$DATAFLINT_VERSION.jar
  echo "[driver] {" >> $SPARK_DEFAULTS_FILE
  echo " spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> $SPARK_DEFAULTS_FILE
  echo "}" >> $SPARK_DEFAULTS_FILE
fi
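To see what the driver-only branch writes, here is a minimal local sketch of it. The temp file and hard-coded DB_IS_DRIVER are stand-ins for illustration; on a real cluster Databricks sets the variable and the script writes to the defaults path shown above.

```shell
#!/bin/bash
# Local sketch of the driver branch: appends the same driver-scoped
# config block, but into a temp file instead of
# /databricks/driver/conf/00-custom-spark-driver-defaults.conf,
# so it can be run anywhere.
SPARK_DEFAULTS_FILE="$(mktemp)"
DB_IS_DRIVER="TRUE"   # on Databricks this is set by the platform on the driver node
if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  echo "[driver] {" >> "$SPARK_DEFAULTS_FILE"
  echo " spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> "$SPARK_DEFAULTS_FILE"
  echo "}" >> "$SPARK_DEFAULTS_FILE"
fi
cat "$SPARK_DEFAULTS_FILE"
```

On a real cluster the driver picks up this file at startup, which is what registers the DataFlint plugin.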
How to create an init script
If you do not have an existing init script, this is how you create one in Databricks:
Go to your workspace and press "Create -> File"

Paste the DataFlint snippet and save. Afterward, in your cluster config, go to Advanced options -> Init scripts and add your newly created init script:

Install on Databricks from a notebook
Go to your cluster's "Libraries" tab and click on the blue "Install new" button:

Choose "Maven" and enter the DataFlint coordinates:
io.dataflint:spark_2.12:0.5.1

In your notebook or app, run these two lines (in a Python notebook, also add the %scala magic):
%scala
import io.dataflint.spark.SparkDataflint
SparkDataflint.install(spark.sparkContext)

Now DataFlint is installed! You can go to the "Spark UI" tab and see a "DataFlint" tab there. It's highly recommended to use the "Open in new tab" link for the best experience.

DataFlint is only available while the cluster is running; once the cluster stops, DataFlint will no longer be accessible.