How to Pass Comma-Separated Arguments in Add Step EMR JobFlow Without Breaking spark-submit?

Question

I am using the Sparkflows Add Step EMR JobFlow pipeline node to run a spark-submit job.

One of my arguments is a comma-delimited list of values. In the example below, it is the value for the --batch argument:

spark-submit,--deploy-mode,client,--class,,,--batch,"2024010100,2024010101,2024010102"

Sparkflows splits the comma-delimited value into separate spark-submit arguments, which causes the step to fail.

Is there any way to quote the parameter value or escape the commas so that the value "2024010100,2024010101,2024010102" will be treated as a single spark-submit argument?
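The splitting behavior can be illustrated with a short sketch (the `--class` value is omitted here for brevity; EMR step arguments are supplied as one comma-delimited string, as in the command above):

```python
# The Add Step node splits the step-argument string on every comma --
# including the commas inside the --batch value.
step = "spark-submit,--deploy-mode,client,--batch,2024010100,2024010101,2024010102"
args = step.split(",")
print(args)
# The --batch value is broken into three separate arguments: the flag
# receives only '2024010100', and the other two timestamps become stray
# positional arguments, so spark-submit fails.
```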

Answer

AddStepEMRJobFlow uses command-runner.jar by default to run commands as a step in EMR.

Here is an example of running the spark-submit command in AddStepEMRJobFlow:

bash,-c,"spark-submit --master yarn --deploy-mode client --class com.fire.SparkPi s3://sparkflows-release/fire/library-jar/livy-jar/fire-spark-test-1.0-jar-with-dependencies.jar {arg}"

In the above command:

  • {arg} is parameterized; its value is passed at execution time.
  • Because the entire spark-submit command line is a single argument to bash -c, only the two top-level commas (after bash and -c) are treated as step-argument delimiters. Commas inside {arg}, such as 2024010100,2024010101,2024010102, reach bash untouched.
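A sketch of why the bash -c wrapper preserves the comma list, using Python's shlex module to mimic the shell's whitespace-based word splitting (the jar path and batch value are the ones from the question):

```python
import shlex

# EMR only splits the step arguments on the commas in "bash,-c,<command>",
# so the whole spark-submit command line reaches bash as one string.
command = (
    "spark-submit --master yarn --deploy-mode client "
    "--class com.fire.SparkPi "
    "s3://sparkflows-release/fire/library-jar/livy-jar/"
    "fire-spark-test-1.0-jar-with-dependencies.jar "
    "2024010100,2024010101,2024010102"
)

# bash splits the command on whitespace; the comma-separated batch value
# contains no spaces, so it stays a single argument to spark-submit.
words = shlex.split(command)
print(words[-1])  # the batch list arrives intact as one argument
```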