# Install on Spark History Server

DataFlint is implemented as a file system plugin that is loaded into the Spark History Server.

{% hint style="warning" %}

### For Spark 4.0 users: **Replace spark\_2.12 in the artifact/package name with dataflint\_spark4\_2.13**

{% endhint %}

#### Spark History Server Installation Script

```bash
#!/bin/bash

# Abort if SPARK_HOME is unset or does not exist
cd "$SPARK_HOME" || exit 1
DATAFLINT_VERSION="0.8.9"

# Step 1: Download the jar to the history server machine
wget -O "/tmp/spark_2.12-$DATAFLINT_VERSION.jar" \
  "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar"

# Step 2: Add the DataFlint jar to the history server classpath
export SPARK_DAEMON_CLASSPATH="/tmp/spark_2.12-$DATAFLINT_VERSION.jar"

# Step 3: If the history server is already running, stop it and start it again
./sbin/stop-history-server.sh
./sbin/start-history-server.sh
```

#### Alternative installation

Instead of setting the environment variable, you can download the jar into the $SPARK\_HOME/jars folder, where the Spark History Server loads it automatically.
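
As a minimal sketch of this alternative, assuming `SPARK_HOME` is already set and `wget` is available, the snippet below defines a helper that downloads the jar straight into the jars folder. The function name `install_dataflint_jar` is illustrative and not part of DataFlint.

```bash
#!/bin/bash
# Sketch: place the DataFlint jar in $SPARK_HOME/jars instead of
# setting SPARK_DAEMON_CLASSPATH.
DATAFLINT_VERSION="0.8.9"
JAR_NAME="spark_2.12-$DATAFLINT_VERSION.jar"
JAR_URL="https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/$JAR_NAME"

# Illustrative helper name (an assumption, not a DataFlint command)
install_dataflint_jar() {
  # Refuse to run without SPARK_HOME, otherwise the jar would land in /jars
  [ -n "$SPARK_HOME" ] || { echo "SPARK_HOME is not set" >&2; return 1; }
  # Download the jar next to Spark's own jars
  wget -O "$SPARK_HOME/jars/$JAR_NAME" "$JAR_URL"
}
```

Call `install_dataflint_jar`, then restart the history server with `./sbin/stop-history-server.sh` and `./sbin/start-history-server.sh` so the new jar is picked up.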

#### How it works

The jar includes a history server plugin that adds the DataFlint UI when a Spark application's UI is loaded from logs.

Unlike a live Spark application, the History Server does not support loading packages (via Apache Ivy), so you need to download the jar and add it to the history server manually.

### Install on EMR history server

{% hint style="warning" %}
This method does not work on the persistent EMR history server. For the on-cluster history server, the default AWS proxy currently doesn't work correctly, so you need to use port forwarding.
{% endhint %}

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2FEk90CFX3Av8RZqOsYEwl%2FScreenshot%202024-02-06%20at%2018.51.38.png?alt=media&#x26;token=bedd5448-232a-4453-af4a-5ee4bd4bc21b" alt=""><figcaption><p>Location of the On-Cluster Spark History Server link you should use</p></figcaption></figure>
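
The port forwarding mentioned above can be sketched with a plain SSH tunnel. This is an illustration under assumptions, not an official procedure: it assumes the on-cluster history server listens on Spark's default port 18080, that you log in as the `hadoop` user, and the key file path and primary node DNS name below are placeholders you need to replace. The function name `forward_history_server` is hypothetical.

```bash
#!/bin/bash
# Placeholder values (assumptions): replace with your own key and primary node DNS
EMR_KEY_FILE="$HOME/my-emr-key.pem"
EMR_PRIMARY_DNS="ec2-xx-xx-xx-xx.compute-1.amazonaws.com"
HISTORY_SERVER_PORT=18080  # Spark's default history server port

# Illustrative helper name
forward_history_server() {
  # -N: do not run a remote command, only forward ports
  # -L: forward the local port to the same port on the primary node
  ssh -i "$EMR_KEY_FILE" -N \
    -L "$HISTORY_SERVER_PORT:localhost:$HISTORY_SERVER_PORT" \
    "hadoop@$EMR_PRIMARY_DNS"
}
```

While `forward_history_server` is running, the on-cluster history server should be reachable at `http://localhost:18080` in your browser.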

#### Via bootstrap script

```bash
#!/bin/bash
DATAFLINT_VERSION="0.8.9"
# Install the jar only on the primary (master) node
if sudo grep isMaster /mnt/var/lib/info/instance.json | grep -q true; then
    sudo wget \
        -O "/usr/lib/spark/jars/spark_2.12-$DATAFLINT_VERSION.jar" \
        "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar"
fi
```

#### Via SSH

Connect to your EMR cluster via SSH and run the following commands:

```bash
sudo su
DATAFLINT_VERSION="0.8.9"
# Download the jar into Spark's jars folder
sudo wget \
    -O "/usr/lib/spark/jars/spark_2.12-$DATAFLINT_VERSION.jar" \
    "https://repo1.maven.org/maven2/io/dataflint/spark_2.12/$DATAFLINT_VERSION/spark_2.12-$DATAFLINT_VERSION.jar"

# Restart the history server so it picks up the new jar
sudo systemctl stop spark-history-server.service
sudo systemctl start spark-history-server.service

#### From EMR terminated cluster

Go to your terminated EMR cluster, open the Applications tab, and press "Spark History Server" to open the persistent history server.

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2Fbat4RMOY6qpyTzY8jZiN%2Fimage.png?alt=media&#x26;token=0a292cb1-bff9-4af6-929b-7fb657d8f67f" alt=""><figcaption></figcaption></figure>

Press the "Download" button on the relevant application or applications

<figure><img src="https://2982210886-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Fcg8pTm3VgVaeMncRl8LP%2Fuploads%2F2ndOKpSXWNbKUN7IW8Au%2Fimage.png?alt=media&#x26;token=cb180e91-421f-4a21-8ad8-4873acc823c0" alt=""><figcaption></figcaption></figure>

Extract the zip file to /tmp/spark-events, and then run the [install-on-spark-history-server](https://dataflint.gitbook.io/dataflint-for-spark/getting-started/install-on-spark-history-server "mention") script locally on your machine.
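
The extraction step might look like the sketch below. It assumes the `unzip` tool is installed. The function name `extract_event_logs` is hypothetical, and the zip path you pass in depends on where your browser saved the download.

```bash
#!/bin/bash
# Directory the history server reads event logs from in this guide
EVENT_LOG_DIR="/tmp/spark-events"

# Illustrative helper name; $1 is the path to the downloaded zip file
extract_event_logs() {
  # Create the target directory if it does not exist yet
  mkdir -p "$EVENT_LOG_DIR"
  # -o: overwrite existing files without prompting, -d: target directory
  unzip -o "$1" -d "$EVENT_LOG_DIR"
}
```

For example, `extract_event_logs ~/Downloads/eventLogs.zip`, where the file name is a placeholder for whatever you downloaded.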

#### Running Spark History Server on your machine

To run the Spark History Server on your machine, you first need to:

1. Have Java (version 8 or 11) installed on your machine, and set JAVA\_HOME
2. Download Spark from <https://spark.apache.org/downloads.html> and extract the downloaded archive somewhere on your machine
3. Set the SPARK\_HOME environment variable to the directory where you extracted Spark
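
As a rough sketch of the steps above, the snippet below sets both environment variables and checks that they point at usable installations. Both paths are hypothetical examples and will differ on your machine, and the function name `check_history_server_prereqs` is illustrative.

```bash
#!/bin/bash
# Hypothetical locations (assumptions): adjust to your own installation
export JAVA_HOME="/usr/lib/jvm/java-11-openjdk"
export SPARK_HOME="$HOME/spark"

# Illustrative helper: verify both variables point at usable installations
check_history_server_prereqs() {
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "java not found under JAVA_HOME=$JAVA_HOME" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/sbin/start-history-server.sh" ]; then
    echo "start-history-server.sh not found under SPARK_HOME=$SPARK_HOME" >&2
    return 1
  fi
  echo "prerequisites look ok"
}
```

If `check_history_server_prereqs` succeeds, the installation script at the top of this page can be run locally.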
