What users observe
Pipelines slow down when recursive directory search is enabled.
Why this happens
Recursive search:
- Traverses every subdirectory
- Evaluates every file
- Performs filesystem metadata calls
On distributed storage (S3, HDFS, ADLS):
- Each listing or metadata operation is a remote call, so it is expensive
- Latency accumulates quickly as the number of directories and files grows
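
To make the cost concrete, here is a minimal sketch that counts the operations a recursive walk performs. It runs against a local temporary directory as a stand-in for remote storage; the helper names (count_traversal_ops, build_tree) and the part=N folder layout are hypothetical and used only for illustration. On S3, HDFS, or ADLS, each counted operation would be a network request rather than a local system call.

```python
import os
import tempfile

def count_traversal_ops(root):
    """Walk root recursively and count the filesystem operations involved."""
    listings = 0  # one directory listing per directory visited
    stats = 0     # one metadata call per file evaluated
    for dirpath, _dirnames, filenames in os.walk(root):
        listings += 1
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            stats += 1
    return listings, stats

def build_tree(root, n_dirs, files_per_dir):
    """Create n_dirs subdirectories, each holding files_per_dir empty files."""
    for d in range(n_dirs):
        sub = os.path.join(root, f"part={d}")
        os.makedirs(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, f"file_{f}.csv"), "w").close()

with tempfile.TemporaryDirectory() as root:
    build_tree(root, n_dirs=20, files_per_dir=50)
    listings, stats = count_traversal_ops(root)
    print(listings, stats)  # 21 listings (root + 20 subdirectories), 1000 stats
```

Every file here is empty, yet the walk still issues more than a thousand operations.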
Best practices
- Avoid recursion unless required
- Use controlled folder structures
- Prefer explicit paths or glob patterns
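
The difference between a recursive search and an explicit glob pattern can be sketched with Python's pathlib on a local temporary directory. The part=N layout and the build_partitions helper are assumptions made for this example, not part of any particular pipeline tool.

```python
import tempfile
from pathlib import Path

def build_partitions(root, n_partitions, files_per_partition):
    """Create a controlled folder structure: one folder per partition."""
    for p in range(n_partitions):
        part = root / f"part={p}"
        part.mkdir()
        for f in range(files_per_partition):
            (part / f"file_{f}.csv").touch()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_partitions(root, n_partitions=20, files_per_partition=10)

    # Recursive search: descends into all 20 partitions.
    recursive = sorted(root.rglob("*.csv"))

    # Explicit glob: names the one partition that is actually needed.
    targeted = sorted(root.glob("part=3/*.csv"))

    print(len(recursive), len(targeted))  # 200 10
```

A predictable folder structure is what makes the targeted pattern possible: when the partition name is known in advance, the other 19 folders never need to be searched.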
Design principle
Filesystem traversal cost scales with the number of files and directories visited, not with file size.
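
A rough back-of-envelope model illustrates the principle. It assumes one remote call per directory listing and one per file metadata lookup, each with the same fixed latency. The function name and the 20 ms figure are arbitrary illustrative assumptions, not measurements of any real storage system, and real object stores often return listings in pages, which changes the constants but not the shape.

```python
def estimate_traversal_seconds(n_dirs, n_files, latency_ms=20):
    """Estimate traversal time as (listings + metadata calls) * latency.

    File size does not appear anywhere in the model: traversal only
    touches metadata, never file contents.
    """
    return (n_dirs + n_files) * latency_ms / 1000

# Same total data volume (10 GB), very different traversal cost.
one_big_file = estimate_traversal_seconds(n_dirs=1, n_files=1)             # 1 x 10 GB
many_small_files = estimate_traversal_seconds(n_dirs=1, n_files=100_000)   # 100,000 x 100 KB

print(one_big_file, many_small_files)
```

Under these assumptions the single large file costs a fraction of a second to discover, while the many small files cost over half an hour, even though both layouts hold the same amount of data.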