Why doesn’t Select Records preserve the original Excel row order?

It does preserve an order, just not the one you might expect.

Spark DataFrames have no guaranteed row order.
To make row selection deterministic, the node assigns an internal row number using:

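  • monotonically_increasing_id(), which tags each row with a unique, increasing (but not consecutive) ID

  • followed by row_number(), which turns those IDs into consecutive positions 1, 2, 3, ...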

This creates a stable processing order, not a semantic one: it reflects how Spark read and partitioned the data, not what the rows mean.
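
To make the mechanism concrete, here is a toy model in plain Python (no Spark required). It is a sketch, not the node's actual code. It assumes the ID layout that Spark's documentation describes for monotonically_increasing_id() (partition index in the upper bits, record number within the partition in the lower 33 bits). The helper names and the two-partition split of a five-row sheet are hypothetical, and whether a real Excel reader produces such a split depends on the reader.

```python
# Toy model of the two Spark functions named above (plain Python, no Spark).
# Assumption: IDs follow the documented layout of monotonically_increasing_id():
# partition index in the upper bits, record number in the lower 33 bits.

def monotonically_increasing_ids(partitions):
    """Return (id, row) pairs, walking the partitions in order."""
    tagged = []
    for partition_index, rows in enumerate(partitions):
        for offset, row in enumerate(rows):
            tagged.append(((partition_index << 33) + offset, row))
    return tagged

def row_numbers(tagged):
    """Mimic row_number() over a window ordered by the generated ID."""
    ordered = sorted(tagged, key=lambda pair: pair[0])
    return [(position, row) for position, (_, row) in enumerate(ordered, start=1)]

# Hypothetical sheet with rows A..E that ended up in two partitions
# whose arrangement does not match the sheet order.
partitions = [["C", "D", "E"], ["A", "B"]]

tagged = monotonically_increasing_ids(partitions)
numbered = row_numbers(tagged)
print(numbered)  # [(1, 'C'), (2, 'D'), (3, 'E'), (4, 'A'), (5, 'B')]
```

In this model, "row 1" is C, the first row Spark happened to process, even though A is the first row of the sheet.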

Key insight:
Row position here means execution order, not “Excel row number”.

Practical tip:
If row order is business-critical, sort on an explicit column in a node placed before Select Records.
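
The idea behind this tip can be sketched in plain Python (no Spark): if each row carries its own sheet position, any processing order can be undone by sorting on it. The column name excel_row and the sample data are hypothetical. Nothing here is provided by the node itself.

```python
# Toy illustration: rows carry their own sheet position, so an explicit
# sort restores the original order whatever order they were processed in.
# "excel_row" is a hypothetical column name chosen for this sketch.

sheet = ["A", "B", "C", "D", "E"]

# Capture the sheet position while the order is still known.
records = [{"excel_row": n, "value": v} for n, v in enumerate(sheet, start=1)]

# Simulate an arbitrary processing order.
shuffled = [records[2], records[3], records[4], records[0], records[1]]

# Sorting on the explicit column restores the sheet order.
restored = sorted(shuffled, key=lambda r: r["excel_row"])

# Selecting "the first two records" now means the first two sheet rows.
first_two = [r["value"] for r in restored[:2]]
print(first_two)  # ['A', 'B']
```

In PySpark, the corresponding step would be an orderBy on such a column before any position-based selection, provided the column is populated when the file is read.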