Question
I want to know about the Sparkflows Add Step EMR JobFlow pipeline node. I am trying to run a spark-submit job.
One of my arguments is a comma-delimited list of values. In the example below, it is the value for the --batch argument:
spark-submit,--deploy-mode,client,--class,,,--batch,"2024010100,2024010101,2024010102"
Sparkflows parses the comma-delimited list into separate spark-submit arguments, which causes the step to fail.
Is there any way to quote the parameter value or escape the commas so that the value "2024010100,2024010101,2024010102" will be treated as a single spark-submit argument?
Answer
AddStepEMRJob uses command-runner.jar by default to run commands as a step in EMR.
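Because the step's Args field is comma-delimited, each comma-separated field reaches command-runner.jar as a separate argument. The failure mode can be sketched locally with a minimal bash simulation (this mimics the comma-splitting, it is not the EMR parser itself):

```shell
# Simulate how a comma-delimited Args field is split into separate
# arguments before they reach spark-submit.
args="--batch,2024010100,2024010101,2024010102"
IFS=',' read -r -a fields <<< "$args"

# --batch plus three separate values: 4 arguments instead of the
# intended 2, so spark-submit sees only "2024010100" as the batch value.
echo "argument count: ${#fields[@]}"
```

This is why quoting inside the Args field alone does not help: the split happens on the raw comma-delimited string before any shell quoting is interpreted.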
Here is an example of running the spark-submit command in AddStepEMRJobFlow. Wrapping the entire command in a single quoted string passed to bash -c keeps the commas inside the --batch value from being treated as argument separators:
bash,-c,"spark-submit --master yarn --deploy-mode client --class com.fire.SparkPi s3://sparkflows-release/fire/library-jar/livy-jar/fire-spark-test-1.0-jar-with-dependencies.jar {arg}"
In the above command:
- {arg} is a parameter; its value is passed at execution time.
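The reason the bash -c wrapping works can be demonstrated locally: when the comma-delimited list is passed as a single quoted string, bash delivers it to the command as exactly one argument. A minimal check (the placeholder `_` fills `$0`; the batch value stands in for the {arg} parameter):

```shell
# Pass the comma-delimited value as one quoted string, the same way
# bash -c "spark-submit ... {arg}" delivers it to spark-submit.
batch="2024010100,2024010101,2024010102"

# Inside the bash -c command, $# counts the arguments received:
# the whole comma-delimited value arrives as a single argument.
bash -c 'echo "arg count: $#"; echo "first arg: $1"' _ "$batch"
```

The same principle applies on EMR: inside the bash -c string, only whitespace and shell quoting determine argument boundaries, so the commas in the batch value are preserved.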

