Common assumption: when numeric parsing fails, values should revert to STRING.
What Sparkflows actually does
Sparkflows never downgrades a numeric column to STRING once a numeric type is chosen.
Instead:
- Valid values are parsed
- Invalid values become null
- Column type is preserved
This applies to:
- INTEGER
- LONG
- DOUBLE
- TIMESTAMP
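The parse-or-null rule described above can be sketched in plain Python. This is an illustration only, not Sparkflows code: the `PARSERS` table, the function names `parse_or_null` and `parse_column`, and the timestamp format are all assumptions made for the example.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical parser table (an assumption for illustration, not Sparkflows
# internals). Python ints are unbounded, so this sketch does not model the
# range limits of INTEGER or LONG. The timestamp format is also assumed.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "INTEGER": int,
    "LONG": int,
    "DOUBLE": float,
    "TIMESTAMP": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}


def parse_or_null(raw: Optional[str], column_type: str) -> Any:
    """Return the parsed value, or None when the value cannot be parsed."""
    if raw is None:
        return None
    try:
        return PARSERS[column_type](raw.strip())
    except ValueError:
        # Invalid input becomes null; it is never kept as a string.
        return None


def parse_column(values: List[Optional[str]], column_type: str) -> Tuple[str, List[Any]]:
    """Parse every value; the declared column type is returned unchanged."""
    return column_type, [parse_or_null(v, column_type) for v in values]


col_type, parsed = parse_column(["42", "abc", "7"], "INTEGER")
print(col_type, parsed)  # INTEGER [42, None, 7]
```

The point of the sketch is that the column type goes in and comes out unchanged, whatever the values look like; only individual cells can become null.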
Why this is intentional
Falling back to STRING:
- Breaks aggregations
- Corrupts numeric semantics
- Creates silent data quality issues
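A small plain-Python comparison may help make these three problems concrete. It is a sketch under stated assumptions rather than Sparkflows behavior: the helper `to_int_or_none` is hypothetical, and skipping nulls in the sum mimics how SQL-style aggregates typically treat null.

```python
from typing import List, Optional

raw = ["10", "oops", "9"]


def to_int_or_none(value: str) -> Optional[int]:
    """Hypothetical helper: parse an integer, or return None on failure."""
    try:
        return int(value)
    except ValueError:
        return None


# Option A: fall back to STRING. Every value stays text.
as_strings: List[str] = list(raw)

try:
    sum(as_strings)  # adding str to int is not defined
    string_sum_works = True
except TypeError:
    string_sum_works = False  # breaks aggregations

# Comparison is lexicographic, so "9" sorts above "10".
string_max = max(["10", "9"])  # corrupts numeric semantics

# Option B: keep the numeric type and turn invalid values into null.
as_ints = [to_int_or_none(v) for v in raw]

# Nulls are skipped in the sum (an assumption mirroring SQL-style aggregates).
total = sum(v for v in as_ints if v is not None)

# Bad rows stay visible and countable instead of hiding inside strings.
invalid_count = sum(1 for v in as_ints if v is None)

print(string_sum_works, string_max)  # False 9
print(as_ints, total, invalid_count)  # [10, None, 9] 19 1
```

With the STRING fallback the sum fails outright and the maximum is wrong, while the typed column still aggregates and exposes exactly how many values were invalid.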
Sparkflows enforces schema correctness over permissiveness.
Result
Cleaner data, predictable pipelines, safer analytics.