Why did Read Excel Advanced infer INTEGER in one file but DOUBLE in another?

Sparkflows uses sample-based inference:

  • Samples a fixed number of rows

  • Evaluates numeric precision

  • Chooses the narrowest type that can safely hold every sampled value

Rules:

  • Whole numbers → INTEGER or LONG

  • Decimal values → DOUBLE

  • Whole numbers outside the INTEGER range → LONG

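If any sampled value contains a decimal part, the column becomes DOUBLE. Two files with the same layout can therefore be inferred differently: a file whose sampled rows hold only whole numbers yields INTEGER, while a file with even one fractional value in the sample yields DOUBLE.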
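
The rules above can be sketched in a few lines of Python. This is only an illustration of the logic described here, not Sparkflows' actual implementation: the function name `infer_numeric_type`, the default `sample_size` of 100, and the 32-bit INTEGER bounds are all assumptions made for the example.

```python
from typing import Iterable

# Assumed 32-bit bounds for INTEGER; anything wider is treated as LONG.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def infer_numeric_type(values: Iterable[float], sample_size: int = 100) -> str:
    """Return 'INTEGER', 'LONG', or 'DOUBLE' based on the first sample_size values.

    Hypothetical helper: assumes every value is a finite number.
    """
    inferred = "INTEGER"
    for index, value in enumerate(values):
        if index >= sample_size:
            break  # rows past the sample are never inspected
        if value != int(value):
            return "DOUBLE"  # one fractional value is enough
        if not (INT_MIN <= int(value) <= INT_MAX):
            inferred = "LONG"  # whole number, but too large for INTEGER
    return inferred


file_a = [10, 20, 30]    # only whole numbers in the sample
file_b = [10, 20.5, 30]  # one decimal value in the sample

print(infer_numeric_type(file_a))
print(infer_numeric_type(file_b))
```

Note that in this sketch a decimal value sitting outside the sampled rows does not affect the result, which is one way the same column can be typed differently from file to file.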


Key point

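Inference reflects the values actually stored in the cells, not how Excel displays them. For example, a cell that stores 12.5 but is formatted to show 13 still counts as a decimal value.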


Recommendation

For consistency across files:

  • Use Enforce Schema

  • Avoid relying on inference for production pipelines
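
To illustrate why a declared schema gives consistent results, here is a minimal plain-Python sketch. It does not show how the Enforce Schema option works internally; the `SCHEMA` mapping, the `apply_schema` helper, and the column names are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical declared schema: column name -> target type.
SCHEMA: Dict[str, Callable] = {"order_id": int, "amount": float}


def apply_schema(rows: List[dict], schema: Dict[str, Callable] = SCHEMA) -> List[dict]:
    """Cast every column to its declared type, regardless of the values seen."""
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in rows]


file_a = [{"order_id": 1, "amount": 10}]    # inference alone would see a whole number
file_b = [{"order_id": 2, "amount": 10.5}]  # inference alone would see a decimal

typed_a = apply_schema(file_a)
typed_b = apply_schema(file_b)

# Both files now carry the same type for "amount".
print(type(typed_a[0]["amount"]).__name__, type(typed_b[0]["amount"]).__name__)
```

Because the types come from the declaration rather than from the sampled data, both files end up with the same column types.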