# Install on Databricks

### Install on Databricks as a Spark plugin (recommended)

{% hint style="warning" %}
Databricks init scripts are not supported in Databricks Community Edition; in that case, install DataFlint from a notebook instead.
{% endhint %}

Add the following section to your init script:

```bash
DATAFLINT_VERSION="0.8.8"
SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-custom-spark-driver-defaults.conf"

mkdir -p /databricks/jars/

wget --quiet \
  -O "/databricks/jars/spark_2.12-${DATAFLINT_VERSION}.jar" \
  "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/${DATAFLINT_VERSION}/spark_2.12-${DATAFLINT_VERSION}.jar"

if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  mkdir -p /mnt/driver-daemon/jars/
  cp "/databricks/jars/spark_2.12-${DATAFLINT_VERSION}.jar" "/mnt/driver-daemon/jars/spark_2.12-${DATAFLINT_VERSION}.jar"
  echo "[driver] {" >> "$SPARK_DEFAULTS_FILE"
  echo "  spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> "$SPARK_DEFAULTS_FILE"
  echo "}" >> "$SPARK_DEFAULTS_FILE"
fi
```
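The download URL in the script follows the standard Maven Central layout, so upgrading DataFlint only means changing the version variable. A minimal sketch of how the URL is assembled (`dataflint_jar_url` is an illustrative helper, not part of DataFlint):

```shell
# Build the Maven Central URL for a given DataFlint version.
# dataflint_jar_url is an illustrative helper name, not a DataFlint command.
dataflint_jar_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/${version}/spark_2.12-${version}.jar"
}

dataflint_jar_url "0.8.8"
# prints https://repo1.maven.org/maven2/io/dataflint/spark_2.12/0.8.8/spark_2.12-0.8.8.jar
```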

### How to create an init script

If you do not have an existing init script, here is how to create one in Databricks:

Go to your workspace and press "Create -> File".

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F8ujE0Emq82qn6jx94T0K%2Fimage.png?alt=media&#x26;token=6d2f0f33-ad83-47bb-bea4-678e1b20d541" alt=""><figcaption></figcaption></figure>

Paste the DataFlint snippet and save. Then, in your cluster configuration, go to "Advanced -> Init scripts" and add your newly created init script:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FtYabEwyjJYRhJd7gm3vG%2Fimage.png?alt=media&#x26;token=09cf6c6a-5ab9-4bef-b8bc-74fb949fb29e" alt=""><figcaption></figcaption></figure>

### Install on Databricks from a notebook

{% hint style="info" %}
This method works on both Databricks Community Edition and paid Databricks plans.
{% endhint %}

1. Go to your cluster's "Libraries" tab and click the blue "Install new" button:

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FrHOoQHXHcnQcLeBzDuAN%2Fimage.png?alt=media&#x26;token=8d56d401-7705-49c0-a61f-87f56aa0a812" alt=""><figcaption></figcaption></figure>

2. Choose "Maven" and enter the Maven coordinates for DataFlint:

```
io.dataflint:spark_2.12:0.8.8
```

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FlkMaXn285KyevOs5hN61%2Fimage.png?alt=media&#x26;token=ec4dd9ee-f72e-4e4c-8b81-f8511eae37ac" alt=""><figcaption><p>The screenshot shows version 0.1.0; use the newest DataFlint version :)</p></figcaption></figure>
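The coordinate above follows the standard Maven `groupId:artifactId:version` format. If you ever script library installs instead of using the UI, the string splits cleanly on `:` (the variable names here are illustrative):

```shell
# Split a Maven coordinate into its three parts.
# COORD and the variable names below are illustrative, not DataFlint-specific.
COORD="io.dataflint:spark_2.12:0.8.8"
IFS=':' read -r GROUP_ID ARTIFACT_ID VERSION <<< "$COORD"
echo "groupId=$GROUP_ID artifactId=$ARTIFACT_ID version=$VERSION"
# prints groupId=io.dataflint artifactId=spark_2.12 version=0.8.8
```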

3. In your notebook or app, run these two lines (in a Python notebook, prefix the cell with `%scala`):

```scala
%scala
import io.dataflint.spark.SparkDataflint
SparkDataflint.install(spark.sparkContext)
```

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FEPa3ojjlVVHbMaVhcl8c%2Fimage.png?alt=media&#x26;token=ad35d753-fb1a-4fa5-a19e-8168acb25164" alt=""><figcaption></figcaption></figure>

DataFlint is now installed! Go to the "Spark UI" tab and you will see a "DataFlint" tab. It is highly recommended to use the "Open in new tab" link for the best experience.

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FdECmJXnieWVBV8MagLJC%2Fimage.png?alt=media&#x26;token=ac749385-1f76-463f-bba6-2271077b28ac" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
DataFlint is only available while the cluster is running; once the cluster terminates, DataFlint will no longer be accessible.
{% endhint %}
