Quick Start - Spark 3.5.0 Documentation

Installation

  1. Pull the Spark Docker Image: Apache Spark publishes an official Docker image on Docker Hub. Pull it with the following command (you can pin a version tag such as apache/spark:3.5.0 to match the release this guide covers):

    docker pull apache/spark
    
  2. Create a Docker Network: Create a user-defined Docker network so the containers can reach each other by name; the later steps rely on Docker's built-in DNS to resolve the container name spark-master:

    docker network create spark-net
    
  3. Run the Spark Master Container: Start the Spark Master container. Because the worker and driver will address the master by its container name on the spark-net network, set SPARK_MASTER_HOST to spark-master rather than to an IP outside the container. Port 8080 is the master's web UI and 7077 is the port workers and drivers connect to:

    docker run -it --rm --name spark-master --network spark-net -p 8080:8080 -p 7077:7077 -e SPARK_MASTER_HOST=spark-master apache/spark bin/spark-class org.apache.spark.deploy.master.Master
    
  4. Run a Spark Worker Container: Open a new terminal and start a Spark Worker. The master URL spark://spark-master:7077 is passed as the final argument; Docker's DNS resolves the container name spark-master on the spark-net network, so no IP address is needed. (To confirm the master and worker are up, see the checks after these steps.)

    docker run -it --rm --name spark-worker-1 --network spark-net apache/spark bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    
  5. Submit Spark Applications: You can now run Spark applications against the master, either interactively with the Spark shell or in batch mode with spark-submit (examples of both follow these steps). To submit your own application jar:

    docker run -it --rm --name spark-submit --network spark-net -v /path/to/your/spark/application:/opt/spark/app apache/spark bin/spark-submit --class your.main.Class --master spark://spark-master:7077 /opt/spark/app/your-spark-application.jar
    

    Replace /path/to/your/spark/application with the host directory that contains your application jar, and your.main.Class with your application's main class; the -v flag mounts that directory into the container at /opt/spark/app so spark-submit can read the jar.
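
Before submitting anything, it can help to confirm that the cluster actually came up. The commands below are a quick sanity check run from the host, assuming the container names and the 8080 port mapping from steps 3 and 4; the grep patterns are illustrative rather than exact log formats:

    # The master web UI was published with -p 8080:8080, so it is reachable from the host.
    curl -sf http://localhost:8080 > /dev/null && echo "master UI is up"

    # The master should log that it started, and the worker that it registered with the master.
    docker logs spark-master 2>&1 | grep -i "starting spark master"
    docker logs spark-worker-1 2>&1 | grep -i "registered"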
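
For the interactive route mentioned in step 5, one option is to start the Spark shell in another container on the same network and point it at the master; this is a sketch that assumes the image's default /opt/spark layout:

    docker run -it --rm --name spark-shell --network spark-net apache/spark bin/spark-shell --master spark://spark-master:7077

Anything you evaluate in the shell is scheduled on the worker from step 4, and quitting the shell (:quit) stops and removes the container because of --rm.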
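
If you do not have an application jar yet, the examples bundled with the Spark distribution make a convenient smoke test. The command below assumes the image keeps Spark under /opt/spark and names the examples jar for Spark 3.5.0 built against Scala 2.12; check /opt/spark/examples/jars/ in your image and adjust the filename if it differs:

    docker run -it --rm --name spark-pi --network spark-net apache/spark bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:7077 /opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar 100

A successful run prints a line of the form "Pi is roughly 3.14..." in the driver output, and the application appears as completed in the master web UI at http://localhost:8080.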