Debugging Apache Spark jobs can be challenging. The Spark UI is one of the most crucial tools at your disposal for quickly investigating issues. In this guide, we’ll walk through setting up a local Spark environment and submitting a test job.
## Environment Setup
First, let’s install Apache Spark from binaries. We’ll use version 3.2.2 for this tutorial.
### Prerequisites
Install Java 11 (Zulu distribution) using SDKMAN:
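A minimal sketch of the install, assuming SDKMAN isn't set up yet; the exact Zulu 11 version identifier changes over time, so check `sdk list java` for the current one:

```bash
# Install SDKMAN (skip if already installed)
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install a Zulu build of Java 11 and make it the active JDK
# (the identifier below is an assumption; pick one from `sdk list java`)
sdk install java 11.0.17-zulu
sdk use java 11.0.17-zulu
```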
### Installing and Starting Spark
- Download and extract Spark (the commands for these steps are sketched after this list).
- Start a standalone Spark master.
- Open http://localhost:8080 in your browser to verify the Spark Master UI is running.
- Start a Spark worker (replace `Sais-Air:7077` with your master URL).
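Here's a minimal sketch of those commands, assuming the prebuilt Spark 3.2.2 package for Hadoop 3.2 from the Apache archive; the download URL and the `Sais-Air` hostname are assumptions to adapt to your machine:

```bash
# Download and extract the prebuilt Spark 3.2.2 binaries
wget https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
tar -xzf spark-3.2.2-bin-hadoop3.2.tgz
cd spark-3.2.2-bin-hadoop3.2

# Start the standalone master; the Master UI comes up on port 8080
# and the master itself listens on spark://<hostname>:7077
./sbin/start-master.sh

# Start a worker and point it at the master URL shown in the Master UI
./sbin/start-worker.sh spark://Sais-Air:7077
```

You should now see a worker registered in the Master UI.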
## Submitting a Test Job
Let’s submit a sample job to ensure everything is working correctly:
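A sketch using the `SparkPi` example that ships with the distribution; the jar name matches the Spark 3.2.2 build for Scala 2.12, and the trailing `100` (the number of partitions to spread the work across) is an arbitrary choice:

```bash
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://Sais-Air:7077 \
  examples/jars/spark-examples_2.12-3.2.2.jar \
  100
```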
This job estimates Pi using a Monte Carlo method; the result is printed in the driver output, and the run appears as a completed application in the Master UI.
## Troubleshooting
If you're using Java `17.0.5-zulu`, you may encounter an error: `cannot access class sun.nio.ch.DirectBuffer`. To fix this, add the `--driver-java-options` flag to your `spark-submit` command:
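A sketch of the adjusted command, assuming the standard Java 17 module fix of re-exporting the JDK-internal `sun.nio.ch` package that Spark relies on:

```bash
./bin/spark-submit \
  --driver-java-options "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED" \
  --class org.apache.spark.examples.SparkPi \
  --master spark://Sais-Air:7077 \
  examples/jars/spark-examples_2.12-3.2.2.jar \
  100
```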
## Conclusion
You now have a working Spark environment with the Master UI accessible at http://localhost:8080. This setup lets you experiment with Spark jobs and use the Spark UI for debugging and performance analysis.
In future posts, we’ll dive deeper into using the Spark UI for performance tuning and troubleshooting common issues in Spark applications.
## Additional Resources
Remember to stop your Spark cluster when you're done by running `./stop-master.sh` and `./stop-worker.sh` in the `sbin` directory.
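For example, from the Spark installation directory:

```bash
# Shut down the standalone cluster
./sbin/stop-master.sh
./sbin/stop-worker.sh
```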