Sparkflows uses sample-based inference:
- Samples a fixed number of rows
- Evaluates numeric precision
- Chooses the tightest safe type
Rules:
- Whole numbers → INTEGER or LONG
- Decimal values → DOUBLE
- Overflow beyond Integer → LONG
If any sampled value contains decimals, the column becomes DOUBLE.
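The rules above can be sketched in a few lines of Python. This is an illustration only, not Sparkflows' actual implementation: the function name `infer_numeric_type`, the default `sample_size` of 100, and the assumption that INTEGER means a 32-bit signed range are all hypothetical.

```python
from decimal import Decimal
from itertools import islice

# Assumption: INTEGER is a 32-bit signed integer, as in Spark SQL.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def infer_numeric_type(values, sample_size=100):
    """Return 'INTEGER', 'LONG', or 'DOUBLE' based on the first `sample_size` values.

    Hypothetical helper; the sample size is an assumed default.
    An empty sample falls through to 'INTEGER' in this sketch.
    """
    inferred = "INTEGER"
    for raw in islice(values, sample_size):
        number = Decimal(str(raw))
        # Any fractional part in the sample makes the whole column DOUBLE.
        if number != number.to_integral_value():
            return "DOUBLE"
        # A whole number outside the 32-bit range widens the column to LONG.
        if not INT_MIN <= number <= INT_MAX:
            inferred = "LONG"
    return inferred


print(infer_numeric_type([1, 2, 3]))
print(infer_numeric_type([1, 3_000_000_000]))
print(infer_numeric_type([1, 2.5, 3_000_000_000]))
```

In this sketch a value such as `1.0` counts as a whole number, because the check looks at the stored value rather than how it is displayed. Rows beyond the sample are never inspected, so a decimal that first appears after the sampled rows does not affect the result.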
Key point
Inference reflects actual data, not visual Excel formatting.
Recommendation
For consistency across files:
- Use Enforce Schema
- Avoid relying on inference for production pipelines
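To see why inference can drift across files, consider the simplified Python sketch below. It is not Sparkflows code: `infer_type` and `enforce` are hypothetical stand-ins, and the two lists represent the same column read from two different files.

```python
def infer_type(values):
    """Simplified stand-in for inference: any fractional value makes the column DOUBLE."""
    return "DOUBLE" if any(float(v) % 1 != 0 for v in values) else "INTEGER"


def enforce(values, declared_type):
    """Hypothetical stand-in for an enforced schema: cast every value to the declared type."""
    casts = {"INTEGER": int, "DOUBLE": float}
    return [casts[declared_type](v) for v in values]


# The same "amount" column in two files (made-up illustrative values).
january = [100, 250, 75]
february = [100.5, 250, 75]

# Inference yields a different type per file.
print(infer_type(january), infer_type(february))

# A declared type yields the same type for both files.
print(enforce(january, "DOUBLE"))
print(enforce(february, "DOUBLE"))
```

With inference, the column type depends on whichever values happen to be in each file; with a declared type, downstream steps always see the same schema.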