Column names are:
- Renamed
- Suffixed (_1, _2)
- Auto-generated (F1, F2)
Why this happens
Sparkflows guarantees:
- Valid column identifiers
- Unique column names
- Schema-safe output
It automatically:
- Pads missing headers
- Replaces empty headers with F#
- Deduplicates duplicate names
- Optionally cleans special characters
Example:
Amount, Amount, <blank>
Becomes:
Amount, Amount_1, F3
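
The renaming steps described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not Sparkflows' actual implementation: the function name `normalize_headers` and its `width` and `clean_special` parameters are hypothetical, and the exact cleaning rule (non-word characters replaced with underscores) is an assumption.

```python
import re


def normalize_headers(headers, width=None, clean_special=False):
    """Illustrative sketch: pad, fill, optionally clean, then deduplicate headers.

    Hypothetical helper, not part of any Sparkflows API.
    """
    names = [(h or "").strip() for h in headers]

    # Pad missing headers so every data column has a slot.
    if width is not None and len(names) < width:
        names += [""] * (width - len(names))

    result, seen = [], set()
    for position, name in enumerate(names, start=1):
        if clean_special:
            # Assumed cleaning rule: collapse non-word characters into "_".
            name = re.sub(r"\W+", "_", name).strip("_")
        if not name:
            # Empty header: generate a name from the 1-based column position.
            name = f"F{position}"
        # Append _1, _2, ... until the name is unique.
        candidate, suffix = name, 0
        while candidate in seen:
            suffix += 1
            candidate = f"{name}_{suffix}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(normalize_headers(["Amount", "Amount", ""]))
# prints ['Amount', 'Amount_1', 'F3']
```

The generated name F3 comes from the column position, which is why the blank third header does not become F1.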
Why this is required
Spark and SQL require:
- Unique column names
- Non-empty identifiers
Recommendation
Clean headers in Excel if exact names matter,
or disable column name cleaning when appropriate.