How does Sparkflows decide between INTEGER, LONG, and DOUBLE in Read Advanced Excel?

What users observe

The same numeric column may be inferred as:

  • INTEGER in some cases

  • LONG in others

  • DOUBLE when decimals appear


How Sparkflows decides

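Sparkflows samples values from each column and parses them with BigDecimal. Parsing into BigDecimal neither overflows nor introduces floating-point rounding, so the type checks that follow are exact.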

For each sampled value:

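  1. Commas (thousands separators) and other formatting are removed

  2. The cleaned value is parsed as a BigDecimal

  3. Its precision (total number of significant digits) and scale (number of digits after the decimal point) are evaluated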
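The three steps above can be sketched in Java as follows. This is a minimal illustration that assumes commas and surrounding whitespace are the only formatting to strip. The class NumericSampleParser and its methods are hypothetical names, not part of Sparkflows.

```java
import java.math.BigDecimal;

// Hypothetical sketch of the sampling steps; not Sparkflows source code.
// Assumption: commas and surrounding whitespace are the only formatting removed.
final class NumericSampleParser {

    // Step 1: strip thousands separators and surrounding whitespace.
    static String normalize(String raw) {
        return raw.replace(",", "").trim();
    }

    // Step 2: parse the cleaned text as a BigDecimal.
    // Returns null when the cell is empty or not numeric.
    static BigDecimal parse(String raw) {
        String cleaned = normalize(raw);
        if (cleaned.isEmpty()) {
            return null;
        }
        try {
            return new BigDecimal(cleaned);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"1,234", "3,000,000,000", "12.50"}) {
            BigDecimal value = parse(raw);
            // Step 3: precision = significant digits, scale = digits after the point.
            System.out.println(raw + " -> precision=" + value.precision()
                    + ", scale=" + value.scale());
        }
    }
}
```

Running main prints:

1,234 -> precision=4, scale=0
3,000,000,000 -> precision=10, scale=0
12.50 -> precision=4, scale=2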

Decision rules:

  • Whole numbers within the Integer range (-2,147,483,648 to 2,147,483,647) → INTEGER

  • Whole numbers outside the Integer range → LONG

  • Any value with digits after the decimal point → DOUBLE

  • Percentages → DOUBLE
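The decision rules for a single sampled value might look roughly like this. The InferredType enum, the NumericTypeRules class, and several details are assumptions made for illustration. A trailing % is taken to mark a percentage, and any positive BigDecimal scale counts as "decimal", so 1.0 is classified like 12.5. The sketch does not check the Long range because the rules above do not say how larger whole numbers are handled.

```java
import java.math.BigDecimal;

// Hypothetical types illustrating the decision rules; not Sparkflows source code.
enum InferredType { INTEGER, LONG, DOUBLE }

final class NumericTypeRules {

    private static final BigDecimal INT_MIN = BigDecimal.valueOf(Integer.MIN_VALUE);
    private static final BigDecimal INT_MAX = BigDecimal.valueOf(Integer.MAX_VALUE);

    // Classifies one sampled cell. Returns null for non-numeric text.
    static InferredType classify(String raw) {
        String cleaned = raw.replace(",", "").trim();
        // Percentages -> DOUBLE (assumption: a trailing '%' marks a percentage).
        if (cleaned.endsWith("%")) {
            String number = cleaned.substring(0, cleaned.length() - 1);
            return isNumber(number) ? InferredType.DOUBLE : null;
        }
        BigDecimal value;
        try {
            value = new BigDecimal(cleaned);
        } catch (NumberFormatException e) {
            return null;
        }
        // Digits after the decimal point (positive scale) -> DOUBLE.
        if (value.scale() > 0) {
            return InferredType.DOUBLE;
        }
        // Whole numbers: INTEGER if inside the Integer range, otherwise LONG.
        // Note: values beyond the Long range are not handled in this sketch.
        boolean fitsInt = value.compareTo(INT_MIN) >= 0 && value.compareTo(INT_MAX) <= 0;
        return fitsInt ? InferredType.INTEGER : InferredType.LONG;
    }

    private static boolean isNumber(String s) {
        try {
            new BigDecimal(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"42", "-2,147,483,649", "3,000,000,000", "12.5", "50%"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Running main prints 42 -> INTEGER, -2,147,483,649 -> LONG, 3,000,000,000 -> LONG, 12.5 -> DOUBLE and 50% -> DOUBLE, one per line. Comparing against BigDecimal bounds, rather than trying Integer.parseInt, keeps the range check exact for inputs of any length.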


Key design goal

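Choose the tightest numeric type that safely represents every sampled value: INTEGER when all values fit, widening to LONG or DOUBLE only when some value requires it.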
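At the column level, one way to read this goal is that the column takes the widest of its per-value types, which is the tightest type that still holds every sampled value. Under that reading, a single large or decimal value in one file's sample would be enough to change the column's type in that file. The ordering INTEGER < LONG < DOUBLE and the ColumnTypeResolver class are assumptions for this sketch, not confirmed Sparkflows behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical enum; declaration order encodes the assumed widening order
// INTEGER < LONG < DOUBLE (Enum.compareTo compares by declaration order).
enum InferredType { INTEGER, LONG, DOUBLE }

final class ColumnTypeResolver {

    // Returns the widest per-value type in the sample, i.e. the tightest type
    // that can hold every sampled value. Returns null for an empty sample.
    static InferredType resolve(List<InferredType> sampledTypes) {
        InferredType result = null;
        for (InferredType type : sampledTypes) {
            if (result == null || type.compareTo(result) > 0) {
                result = type;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // File A: every sampled value fits in an Integer.
        System.out.println(resolve(Arrays.asList(InferredType.INTEGER, InferredType.INTEGER)));
        // File B: one value outside the Integer range widens the column to LONG.
        System.out.println(resolve(Arrays.asList(InferredType.INTEGER, InferredType.LONG)));
        // File C: a single decimal value widens the column to DOUBLE.
        System.out.println(resolve(Arrays.asList(InferredType.LONG, InferredType.DOUBLE, InferredType.INTEGER)));
    }
}
```

Running main prints INTEGER, LONG and DOUBLE on separate lines.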


Recommendation

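Inferred types depend on the values that happen to be sampled, so the same column can come out as INTEGER in one file and LONG or DOUBLE in another. For deterministic typing across files: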

  • Use Enforce Schema

  • Avoid relying on inference for production workflows