Running a ZIP-packaged PySpark job using Add Step EMR JobFlow

We package PySpark utilities into a ZIP file and use Sparkflows to build the EMR workflow, with the Add Step – EMR JobFlow node adding steps to the cluster.

Example of running a Python script packaged in a ZIP file. The my_zip_test.zip archive has the following structure:

my_zip_test.zip
├── main.py
└── utils.py
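For illustration, the two files might look like the sketch below. This is purely hypothetical: the real main.py and utils.py are job-specific, and the normalize helper is an invented example used only to show how main.py imports from utils.py once the ZIP is on the PYTHONPATH.

```python
# --- utils.py: a hypothetical shared helper (real contents are job-specific) ---
def normalize(record: dict) -> dict:
    """Lower-case keys and strip whitespace from string values."""
    return {k.lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in record.items()}


# --- main.py: the entry point handed to spark-submit ---
def main():
    # These imports resolve because the step both unzips the archive
    # (so main.py exists on disk) and passes the same ZIP via --py-files
    # (so `utils` is importable on the driver and executors).
    from pyspark.sql import SparkSession
    from utils import normalize

    spark = SparkSession.builder.appName("my_zip_test").getOrCreate()
    rows = [{"Name": " Ana "}, {"Name": "Bo"}]
    cleaned = spark.sparkContext.parallelize(rows).map(normalize).collect()
    print(cleaned)
    spark.stop()

# In the real main.py, main() runs under the usual guard:
# if __name__ == "__main__":
#     main()
```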

To run main.py in the ZIP file via Add Step – EMR JobFlow:

Cleanup → Download → Unzip → Run
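Expanded with one command per stage, the step's shell payload is the following sketch (the S3 path and /tmp locations are the ones used in this example and are deployment-specific; the whole script only runs on an EMR node, where hdfs dfs can read s3:// paths via EMRFS):

```shell
# Cleanup: remove leftovers from any previous run
rm -f /tmp/my_zip_test.zip && rm -rf /tmp/my_zip_test

# Download: copy the archive from S3 to the local filesystem
hdfs dfs -copyToLocal \
  s3://sparkflows-airflows-2026/TestingDirectory/my_zip_test.zip \
  /tmp/my_zip_test.zip

# Unzip: extract so main.py exists as a plain file for spark-submit
unzip -o /tmp/my_zip_test.zip -d /tmp/my_zip_test

# Run: submit main.py; --py-files ships the ZIP so `import utils` works
spark-submit --master yarn --deploy-mode client \
  --py-files /tmp/my_zip_test.zip \
  /tmp/my_zip_test/main.py
```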

Argument:

bash,-c,"rm -f /tmp/my_zip_test.zip && rm -rf /tmp/my_zip_test && hdfs dfs -copyToLocal s3://sparkflows-airflows-2026/TestingDirectory/my_zip_test.zip /tmp/my_zip_test.zip && unzip -o /tmp/my_zip_test.zip -d /tmp/my_zip_test && spark-submit --master yarn --deploy-mode client --py-files /tmp/my_zip_test.zip /tmp/my_zip_test/main.py"

The above command is added as a single step on the EMR cluster, where it runs via command-runner.jar.
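Under the hood this corresponds to an EMR AddJobFlowSteps call. A minimal boto3 sketch follows; the cluster ID, step name, and ActionOnFailure value are assumptions for illustration, and credentials/region are taken from the environment:

```python
def build_jobflow_step(name: str, args: list) -> dict:
    """Build an EMR step dict that runs `args` via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumption: adjust to your failure policy
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


# The same cleanup → download → unzip → run payload as a bash -c one-liner.
ARGS = [
    "bash", "-c",
    "rm -f /tmp/my_zip_test.zip && rm -rf /tmp/my_zip_test && "
    "hdfs dfs -copyToLocal "
    "s3://sparkflows-airflows-2026/TestingDirectory/my_zip_test.zip "
    "/tmp/my_zip_test.zip && "
    "unzip -o /tmp/my_zip_test.zip -d /tmp/my_zip_test && "
    "spark-submit --master yarn --deploy-mode client "
    "--py-files /tmp/my_zip_test.zip /tmp/my_zip_test/main.py",
]


def submit(cluster_id: str) -> str:
    """Add the step to a running cluster and return its step ID."""
    import boto3  # assumption: boto3 installed, AWS credentials configured
    emr = boto3.client("emr")
    resp = emr.add_job_flow_steps(
        JobFlowId=cluster_id,  # e.g. "j-XXXXXXXXXXXXX" (placeholder)
        Steps=[build_jobflow_step("run-zip-job", ARGS)],
    )
    return resp["StepIds"][0]
```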