Quick Start - Spark 3.5.0 Documentation
Pull the Spark Docker Image: Apache Spark publishes an official Docker image on Docker Hub. Pull it with the following command:
docker pull apache/spark
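The command above pulls the latest published image. To match the version this page documents, you can pin a tag explicitly (assuming the 3.5.0 tag is available on Docker Hub):
docker pull apache/spark:3.5.0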
Create a Docker Network: Creating a dedicated Docker network lets the master and worker containers reach each other by container name. Create one with:
docker network create spark-net
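To confirm the network exists before starting any containers, inspect it:
docker network inspect spark-net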
Run the Spark Master Container:
Start the Spark Master container. Replace <your_host_ip> with the hostname or IP address the master should bind to and advertise; inside the spark-net network, the container name spark-master also works.
docker run -it --rm --name spark-master --network spark-net -p 8080:8080 -p 7077:7077 -e SPARK_MASTER_HOST=<your_host_ip> apache/spark bin/spark-class org.apache.spark.deploy.master.Master
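Once the master is up, its web UI is reachable from the host through the -p 8080:8080 mapping. A quick smoke test from the host (assuming curl is installed) should return a page mentioning the Spark Master:
curl -s http://localhost:8080 | grep -i "spark master"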
Run a Spark Worker Container:
Open a new terminal and run the Spark Worker container. Because the worker joins the same spark-net network, it can reach the master by its container name, so the master URL is spark://spark-master:7077.
docker run -it --rm --name spark-worker-1 --network spark-net -e SPARK_MASTER=spark://spark-master:7077 apache/spark bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
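You can add capacity by starting more workers with the same command; only the container name needs to be unique (spark-worker-2 below is just an example name):
docker run -it --rm --name spark-worker-2 --network spark-net -e SPARK_MASTER=spark://spark-master:7077 apache/spark bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077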
Submit Spark Applications:
Now you can submit Spark applications to the Spark Master, either interactively from the Spark shell (see the example at the end of this section) or non-interactively with spark-submit. Here's a spark-submit example:
docker run -it --rm --name spark-submit --network spark-net -v /path/to/your/spark/application:/opt/spark/app -e SPARK_MASTER=spark://spark-master:7077 apache/spark bin/spark-submit --class your.main.Class --master spark://spark-master:7077 /opt/spark/app/your-spark-application.jar
Replace /path/to/your/spark/application with the path to your Spark application code, and your.main.Class with the main class of your Spark application.
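To use the Spark shell mentioned above, start an interactive container on the same network and point it at the master:
docker run -it --rm --name spark-shell --network spark-net apache/spark bin/spark-shell --master spark://spark-master:7077
If you do not have an application jar yet, the image ships with example jobs that make a convenient smoke test. The exact jar name under /opt/spark/examples/jars depends on the image's Spark and Scala versions, so adjust it if needed:
docker run -it --rm --network spark-net apache/spark bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:7077 /opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar 100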