What users observe
A numeric column is inferred as:
- INTEGER in some cases
- LONG in others
- DOUBLE when decimals appear
How Sparkflows decides
Sparkflows uses BigDecimal-based sampling logic to infer numeric types safely.
For each sampled value:
- Commas and formatting are removed
- Values are parsed using BigDecimal
- Precision and scale are evaluated
Decision rules:
- Whole numbers within Integer range → INTEGER
- Whole numbers exceeding Integer range → LONG
- Any decimal precision → DOUBLE
- Percentages → DOUBLE
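The per-value steps and decision rules above can be sketched roughly as follows. This is an illustrative Python approximation, not Sparkflows' actual code: Python's decimal.Decimal stands in for Java's BigDecimal, and the function name infer_value_type is hypothetical. It also assumes that "any decimal precision" means a scale greater than zero (so "1.0" counts as DOUBLE) and that "formatting" means thousands separators and surrounding whitespace.

```python
from decimal import Decimal, InvalidOperation

# Bounds of a 32-bit signed Integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def infer_value_type(raw: str) -> str:
    """Classify one sampled string as INTEGER, LONG, or DOUBLE (hypothetical helper)."""
    text = raw.strip()

    # Percentages map to DOUBLE; strip the sign so the number itself can be validated.
    is_percent = text.endswith("%")
    if is_percent:
        text = text[:-1]

    # Remove thousands separators before parsing (assumed meaning of "formatting").
    cleaned = text.replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a numeric value: {raw!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite numeric value: {raw!r}")

    # Scale is the number of digits after the decimal point.
    # Assumption: any positive scale counts as "decimal precision".
    scale = -value.as_tuple().exponent
    if is_percent or scale > 0:
        return "DOUBLE"

    # Whole number: pick the tightest integral type.
    # Values beyond the 64-bit range are not covered by the rules above,
    # so this sketch does not treat them specially.
    if INT_MIN <= int(value) <= INT_MAX:
        return "INTEGER"
    return "LONG"
```

For example, infer_value_type("1,234") yields INTEGER, infer_value_type("2147483648") yields LONG, and both infer_value_type("3.14") and infer_value_type("45%") yield DOUBLE.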
Key design goal
Choose the tightest numeric type that safely represents the data.
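One way to picture how this goal leads to different results across files: if each sampled value gets its own type, the column type would be the widest of them. The text above does not say how Sparkflows combines per-value results, so the widening order and the name widen_column_type below are assumptions made purely for illustration.

```python
# Assumed order from narrowest to widest numeric type.
WIDENING_ORDER = ["INTEGER", "LONG", "DOUBLE"]


def widen_column_type(value_types):
    """Return the widest type found among the sampled values (hypothetical helper)."""
    types = list(value_types)
    if not types:
        raise ValueError("no sampled values to infer from")
    # The column needs the widest type that any sampled value required.
    return max(types, key=WIDENING_ORDER.index)
```

Under this assumption, a file whose sample holds only small whole numbers yields INTEGER, while a file with a single large value or a single decimal in the sample yields LONG or DOUBLE for the same column.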
Recommendation
For deterministic typing across files:
- Use Enforce Schema
- Avoid relying on inference for production workflows