# Install on Databricks

### Install on Databricks as a Spark plugin (recommended)

{% hint style="warning" %}
Databricks init scripts are not supported in Databricks Community Edition; in that case, you will need to install DataFlint from a notebook instead (see below).
{% endhint %}

Add the following section to your init script:

```bash
DATAFLINT_VERSION="0.9.0"
SPARK_DEFAULTS_FILE="/databricks/driver/conf/00-custom-spark-driver-defaults.conf"

# Download the DataFlint jar into the cluster's jar directory
mkdir -p /databricks/jars/

wget --quiet \
  -O /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar \
  https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar

# On the driver node only: copy the jar to the driver daemon's jar directory
# and register the DataFlint plugin in the driver's Spark defaults
if [[ "$DB_IS_DRIVER" = "TRUE" ]]; then
  mkdir -p /mnt/driver-daemon/jars/
  cp /databricks/jars/spark_2.12-$DATAFLINT_VERSION.jar /mnt/driver-daemon/jars/spark_2.12-$DATAFLINT_VERSION.jar
  echo "[driver] {" >> $SPARK_DEFAULTS_FILE
  echo "  spark.plugins = io.dataflint.spark.SparkDataflintPlugin" >> $SPARK_DEFAULTS_FILE
  echo "}" >> $SPARK_DEFAULTS_FILE
fi
```
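
When the script runs on the driver node, the `echo` lines above append a driver-scoped block like the following to `00-custom-spark-driver-defaults.conf`, which tells the Spark driver to load the DataFlint plugin on startup:

```
[driver] {
  spark.plugins = io.dataflint.spark.SparkDataflintPlugin
}
```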

### How to create an init script

If you do not have an existing init script, this is how you create one in Databricks:

Go to your workspace and select "Create -> File":

<figure><img src="/files/65L0PyYDdMOC0qOLopqy" alt=""><figcaption></figcaption></figure>

Paste the DataFlint snippet and save. Then, in your cluster configuration, go to "Advanced -> Init scripts" and add your newly created init script:

<figure><img src="/files/5W60iwGesx3nXegRvKWe" alt=""><figcaption></figcaption></figure>

### Install on Databricks from a notebook

{% hint style="info" %}
This method works on both Databricks Community Edition and the paid Databricks plans
{% endhint %}

1. Go to your cluster's "Libraries" tab and click the blue "Install new" button:

<figure><img src="/files/1mOtAOIn0NN9RxvS4h2o" alt=""><figcaption></figcaption></figure>

2. Choose "Maven" and enter the DataFlint coordinates:

```
io.dataflint:spark_2.12:0.9.0
```

<figure><img src="/files/hcIWwNOiNy17jURqJh2t" alt=""><figcaption><p>image is using version 0.1.0, you should use the newest dataflint version :)</p></figcaption></figure>

3. In your notebook or app, run these two lines (in a Python notebook, also keep the `%scala` magic at the top of the cell):

```scala
%scala
// Install DataFlint into the currently running Spark application
import io.dataflint.spark.SparkDataflint
SparkDataflint.install(spark.sparkContext)
```

<figure><img src="/files/D1f5rVoxjFotkfNTs3AZ" alt=""><figcaption></figcaption></figure>

DataFlint is now installed! Go to the "Spark UI" tab and you will see a "dataflint" tab. It's highly recommended to use the "Open in new tab" link to get the best experience.

<figure><img src="/files/BHwFbJSSKFUDW69ubzHN" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
DataFlint is only available while the cluster is running; once the cluster is terminated, DataFlint will no longer be accessible
{% endhint %}

