How to increase driver and executor memory when running on Incorta

When running on Incorta, use the configuration below to increase the driver memory.

--conf spark.driver.memory=12g

Below are practical Spark configs for memory and cores that work well in most production setups.

:one: Core Principles (quick rules)

  • Give each executor more than 1 core (2–5 is ideal)

  • Avoid very large executors (they cause GC pain)

  • Leave headroom for the OS and YARN

  • Total memory per executor = executor memory + memory overhead
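The last rule above can be sketched as a quick calculator. This is a hypothetical helper, not part of Spark; the 10% fallback it encodes is Spark's own default when `spark.executor.memoryOverhead` is unset:

```python
def container_size_gb(executor_memory_gb, overhead_gb=None):
    """Total memory the cluster manager must grant per executor container.

    If spark.executor.memoryOverhead is not set explicitly, Spark
    defaults to max(384 MiB, 10% of executor memory).
    """
    if overhead_gb is None:
        overhead_gb = max(384 / 1024, 0.10 * executor_memory_gb)
    return executor_memory_gb + overhead_gb

# 16g heap with an explicit 4g overhead -> a 20 GB container request
print(container_size_gb(16, 4))   # 20
```

This is why a "16g" executor actually occupies 20 GB of the node: the scheduler grants heap plus overhead as one container.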

:two: Standard “Balanced” Configuration (Most workloads)

Executors

--conf spark.executor.instances=8

--conf spark.executor.cores=4

--conf spark.executor.memory=16g

--conf spark.executor.memoryOverhead=4g

Driver

--conf spark.driver.cores=4

--conf spark.driver.memory=8g

Why this works

  • 4 cores per executor → good parallelism

  • 16G heap → manageable GC

  • 4G overhead → safe for shuffle & Python

:three: Memory-Heavy Workloads (joins, aggregations, caching)

--conf spark.executor.instances=6

--conf spark.executor.cores=4

--conf spark.executor.memory=24g

--conf spark.executor.memoryOverhead=6g

Optional tuning:

--conf spark.memory.fraction=0.6

--conf spark.memory.storageFraction=0.3

Use this when:

  • Large shuffles

  • Broadcast joins failing

  • Frequent OOMs
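As a rough guide to what those two optional knobs control, the sketch below mirrors Spark's unified memory model (the ~300 MB reserved heap, then the fraction/storageFraction split); the function name is hypothetical and the numbers are approximations:

```python
def unified_memory_mb(heap_gb, memory_fraction=0.6, storage_fraction=0.5):
    """Approximate Spark's unified (execution + storage) memory pool.

    Spark reserves roughly 300 MB of the heap, applies
    spark.memory.fraction to the remainder, and protects
    spark.memory.storageFraction of that pool for cached blocks.
    """
    usable_mb = (heap_gb * 1024 - 300) * memory_fraction
    storage_mb = usable_mb * storage_fraction
    return usable_mb, storage_mb

# 24g heap with the optional tuning above (fraction=0.6, storageFraction=0.3)
usable, storage = unified_memory_mb(24, 0.6, 0.3)
```

Lowering `storageFraction` to 0.3 leaves more of the pool free for shuffle and join execution, which is the point of this memory-heavy profile.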

:four: CPU-Heavy Workloads (UDFs, ML, parsing)

--conf spark.executor.instances=10

--conf spark.executor.cores=5

--conf spark.executor.memory=12g

--conf spark.executor.memoryOverhead=3g

Why:

  • More cores → more parallel compute

  • Moderate memory → avoids GC stalls
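The parallelism this profile buys is simple arithmetic (a hypothetical one-liner, ignoring dynamic allocation):

```python
def total_task_slots(instances, cores_per_executor):
    """Maximum number of tasks Spark can run concurrently with this sizing."""
    return instances * cores_per_executor

print(total_task_slots(10, 5))   # 50 concurrent tasks
```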

:five: Small Cluster / Dev Environment

--conf spark.executor.instances=2

--conf spark.executor.cores=2

--conf spark.executor.memory=4g

--conf spark.executor.memoryOverhead=1g

--conf spark.driver.memory=4g

:six: Spark Submit Example (Complete)

spark-submit \
  --conf spark.executor.instances=8 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=16g \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.driver.cores=4 \
  --conf spark.driver.memory=8g \
  your_application.py

(Replace your_application.py with your job's script or jar.)

:seven: Quick Sizing Formula (YARN / EMR)

If a node has 64 GB RAM and 16 cores:

  • Leave 8–10 GB for OS

  • Usable ≈ 54 GB

  • Example:

    • 3 executors per node

    • Each executor:

      • 4 cores

      • 16G + 4G overhead = 20G

    • Total: 3 × 20G = 60G → exceeds the ~54 GB usable

    • So reduce to 2 executors per node (2 × 20G = 40G)
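The walkthrough above can be packaged as a small sizing helper (a hypothetical function, assuming the same OS headroom rule; executors must fit both the RAM and the core budget):

```python
def executors_per_node(node_ram_gb, node_cores, cores_per_executor,
                       executor_mem_gb, overhead_gb, os_headroom_gb=10):
    """Number of executors that fit on one node, bounded by RAM and cores."""
    usable_ram_gb = node_ram_gb - os_headroom_gb
    fit_by_ram = usable_ram_gb // (executor_mem_gb + overhead_gb)
    fit_by_cores = node_cores // cores_per_executor
    return int(min(fit_by_ram, fit_by_cores))

# 64 GB / 16-core node, 4-core executors sized 16g + 4g overhead
print(executors_per_node(64, 16, 4, 16, 4))   # 2, matching the conclusion above
```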