What users observe
When the fileName and sheetName metadata columns are enabled, they always appear after all data columns, never in the middle.
Why Sparkflows does this
Sparkflows treats metadata columns as non-data attributes.
By design:
- Data columns are inferred or enforced first
- Metadata columns are appended last
- Core schema remains stable and predictable
This guarantees:
- Column order consistency
- No accidental interference with joins, ML models, or aggregations
- Deterministic schemas across runs
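The ordering rule can be illustrated with a minimal plain-Python sketch. This is not Sparkflows code: the function name output_columns, the include_metadata flag, and the sample column names are hypothetical and exist only to show that the data columns keep the same positions whether or not the metadata columns are enabled.

```python
# Illustrative only: hypothetical helper, not part of Sparkflows.
METADATA_COLUMNS = ["fileName", "sheetName"]


def output_columns(data_columns, include_metadata):
    """Return the output column order: data columns first, metadata last."""
    columns = list(data_columns)
    if include_metadata:
        # Metadata columns are appended after every data column.
        columns.extend(METADATA_COLUMNS)
    return columns


# Sample column names are made up for the example.
data_columns = ["order_id", "amount", "region"]
without_metadata = output_columns(data_columns, include_metadata=False)
with_metadata = output_columns(data_columns, include_metadata=True)

# The data columns occupy the same positions in both schemas.
assert with_metadata[:len(data_columns)] == without_metadata
```

Because the metadata columns only ever extend the end of the list, toggling them changes the schema's length but not the position of any data column.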
Why this matters
If metadata were inserted dynamically:
- Schema would shift when toggled
- Downstream nodes would break
- Pipelines would become fragile
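To make the fragility concrete, the following plain-Python sketch models the counterfactual in which metadata columns are spliced into the middle of the schema. The helper insert_in_middle and the sample column names are hypothetical; Sparkflows does not behave this way, which is the point of the example.

```python
# Illustrative counterfactual: hypothetical helper, not Sparkflows behavior.
def insert_in_middle(data_columns, metadata_columns, position):
    """Splice metadata columns into the data columns at the given position."""
    return data_columns[:position] + metadata_columns + data_columns[position:]


# Sample column names are made up for the example.
data_columns = ["order_id", "amount", "region"]
shifted = insert_in_middle(data_columns, ["fileName", "sheetName"], position=1)

# A downstream step that reads "the second column" now gets a different field.
second_before = data_columns[1]
second_after = shifted[1]
assert second_before != second_after
```

Any downstream step that refers to columns by position would silently read a metadata value in place of a data value as soon as the option was toggled.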
Recommendation
Treat fileName and sheetName as traceability fields, not business columns.
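One way to follow this recommendation is to separate the traceability fields from the business fields before any calculation or model input is built. The sketch below is plain Python with a hypothetical helper, split_record, and made-up sample values; it is one possible pattern, not a Sparkflows API.

```python
# Illustrative only: hypothetical helper and made-up sample record.
TRACEABILITY_FIELDS = ("fileName", "sheetName")


def split_record(record):
    """Split a record into (business fields, traceability fields)."""
    business = {k: v for k, v in record.items() if k not in TRACEABILITY_FIELDS}
    trace = {k: v for k, v in record.items() if k in TRACEABILITY_FIELDS}
    return business, trace


record = {
    "order_id": 17,
    "amount": 42.5,
    "fileName": "orders_q1.xlsx",
    "sheetName": "January",
}
business, trace = split_record(record)

# Business logic sees only the data fields; the trace fields are kept
# alongside for lineage and debugging.
assert "fileName" not in business and "sheetName" not in business
```

Keeping the two groups apart means joins, aggregations, and model features are built only from business fields, while fileName and sheetName remain available to trace a row back to its source.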