Building Your Own Spark UI

Debugging Apache Spark jobs can be challenging. The Spark UI is one of the most crucial tools at your disposal for quickly investigating issues. In this guide, we’ll walk through setting up a local Spark environment and submitting a test job.

Environment Setup

First, let’s install Apache Spark from the prebuilt binary distribution. We’ll use version 3.2.2 for this tutorial.

Prerequisites

Install Java 11 (Zulu distribution) using SDKMAN:

Terminal window
curl -s "https://get.sdkman.io" | bash
# Load SDKMAN into the current shell before using the sdk command
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk list java
sdk install java 11.0.17-zulu
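
Once the install finishes, it’s worth confirming the new JDK is active before moving on (the exact version string in your output may differ):

Terminal window
sdk current java
java -version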

Installing and Starting Spark

  1. Download and extract Spark:
Terminal window
wget https://downloads.apache.org/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
tar -xvzf spark-3.2.2-bin-hadoop3.2.tgz
cd spark-3.2.2-bin-hadoop3.2/sbin
  2. Start a standalone Spark master:
Terminal window
./start-master.sh
  3. Open http://localhost:8080 in your browser to verify the Spark Master UI is running.

  4. Start a Spark worker (replace Sais-Air:7077 with your own master URL, shown at the top of the Master UI):

Terminal window
./start-worker.sh spark://Sais-Air:7077

You should now see a worker registered in the Master UI.
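
If the worker doesn’t appear, it’s worth checking that both daemons are actually running. jps (part of the JDK) lists running JVMs, and the master also prints its spark:// URL in its log under the logs directory. This assumes you’re still in the sbin directory; the log file names vary by user and hostname, so the glob below is only a sketch:

Terminal window
jps                       # expect a Master and a Worker entry, e.g. "41234 Master"
grep "Starting Spark master at" ../logs/*Master*.out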

Submitting a Test Job

Let’s submit a sample job to make sure everything is working. Run this from the Spark root directory (spark-3.2.2-bin-hadoop3.2), since spark-submit lives under bin/ and the example jars under examples/jars/:

Terminal window
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Sais-Air:7077 \
examples/jars/*-examples_*.jar \
1000

This job estimates Pi with a Monte Carlo simulation; the final argument (1000) is the number of partitions the work is split into, so it generates enough tasks to exercise the cluster.
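
The result is printed to the driver’s standard output, while Spark’s own logging goes to stderr by default, so you can filter for it directly. The digits will vary from run to run, and a smaller slice count keeps the test quick:

Terminal window
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Sais-Air:7077 \
examples/jars/*-examples_*.jar \
100 2>/dev/null | grep "Pi is roughly"
# Pi is roughly 3.141...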

Troubleshooting

If you’re running the driver on Java 17 (for example 17.0.5-zulu) instead of Java 11, you may encounter an error like cannot access class sun.nio.ch.DirectBuffer, because newer JDKs no longer open the sun.nio.ch internals that Spark 3.2 touches. To fix this, add the --driver-java-options flag to your spark-submit command:

Terminal window
./bin/spark-submit \
--driver-java-options "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED" \
--class org.apache.spark.examples.SparkPi \
--master spark://Sais-Air:7077 \
examples/jars/*-examples_*.jar \
1000
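
If you’d rather not pass the flag on every submission, the same JVM option can be set once in conf/spark-defaults.conf via spark.driver.extraJavaOptions, the standard property for extra driver JVM flags. A sketch (create the file from conf/spark-defaults.conf.template if it doesn’t exist yet):

# conf/spark-defaults.conf
spark.driver.extraJavaOptions --add-opens=java.base/sun.nio.ch=ALL-UNNAMED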

Conclusion

You now have a working Spark environment with the Master UI accessible at http://localhost:8080. This setup allows you to experiment with Spark jobs and utilize the Spark UI for debugging and performance analysis.

In future posts, we’ll dive deeper into using the Spark UI for performance tuning and troubleshooting common issues in Spark applications.

Remember to stop your Spark cluster when you’re done by running ./stop-master.sh and ./stop-worker.sh in the sbin directory.
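
For example, from the sbin directory (stopping the worker before the master keeps the shutdown clean, though either order works for a local setup):

Terminal window
./stop-worker.sh
./stop-master.sh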