When running on Incorta use the below configuration for increasing the driver memory.
--conf spark.driver.memory=12g
Below are practical Spark configs for memory and cores that work well in most production setups.
Core Principles (quick rules)
-
Executors > 1 core (2–5 is ideal)
-
Avoid very large executors (GC pain)
-
Leave headroom for OS + YARN
-
Memory = executor memory + overhead
Standard “Balanced” Configuration (Most workloads)
Executors
--conf spark.executor.instances=8
--conf spark.executor.cores=4
--conf spark.executor.memory=16g
--conf spark.executor.memoryOverhead=4g
Driver
--conf spark.driver.cores=4
--conf spark.driver.memory=8g
Why this works
-
4 cores per executor → good parallelism
-
16G heap → manageable GC
-
4G overhead → safe for shuffle & Python
Memory-Heavy Workloads (joins, aggregations, caching)
--conf spark.executor.instances=6
--conf spark.executor.cores=4
--conf spark.executor.memory=24g
--conf spark.executor.memoryOverhead=6g
Optional tuning:
--conf spark.memory.fraction=0.6
--conf spark.memory.storageFraction=0.3
Use this when:
-
Large shuffles
-
Broadcast joins failing
-
Frequent OOMs
CPU-Heavy Workloads (UDFs, ML, parsing)
--conf spark.executor.instances=10
--conf spark.executor.cores=5
--conf spark.executor.memory=12g
--conf spark.executor.memoryOverhead=3g
Why:
-
More cores → more parallel compute
-
Moderate memory → avoids GC stalls
Small Cluster / Dev Environment
--conf spark.executor.instances=2
--conf spark.executor.cores=2
--conf spark.executor.memory=4g
--conf spark.executor.memoryOverhead=1g
--conf spark.driver.memory=4g
Spark Submit Example (Complete)
spark-submit \
–conf spark.executor.instances=8 \
–conf spark.executor.cores=4 \
–conf spark.executor.memory=16g \
–conf spark.executor.memoryOverhead=4g \
–conf spark.driver.cores=4 \
–conf spark.driver.memory=8g
Quick Sizing Formula (YARN / EMR)
If node = 64 GB RAM, 16 cores
-
Leave 8–10 GB for OS
-
Usable ≈ 54 GB
-
Example:
-
3 executors per node
-
Each executor:
-
4 cores
-
16G + 4G overhead = 20G
-
-
Total: 60G → too much
-
Reduce to 2 executors per node
-