In Sparkflows, the Sample processor extracts a sample from an incoming dataset. The size of the sample is specified as a fraction of the rows in the incoming dataset. Sampling is useful for ML training or analysis when a subset of the data is sufficient.
To use the ‘Sample’ Processor, do the following:
- Select True or False for ‘With Replacement’. When set to True, a selected record is returned to the population and can be picked again, so the sample may contain duplicates. When set to False, a record is removed from the population once selected, so it appears at most once.
- Enter a value for ‘Fraction’. It should be a decimal less than 1 (for example, 0.1 for roughly 10% of the rows). It determines the size of the sample relative to the incoming dataset.
- Set a ‘Seed’ value. Running the processor again with the same seed, input, and settings reproduces the same sample.
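The effect of the three settings can be illustrated with a minimal, plain-Python sketch. This is not the Sparkflows implementation: `sample_rows` is a hypothetical helper written only for this illustration, and it draws exactly `round(fraction * n)` rows for simplicity, whereas Spark-based samplers (for instance PySpark's `DataFrame.sample(withReplacement, fraction, seed)`, which the processor presumably resembles) generally return only approximately that many rows.

```python
import random


def sample_rows(rows, fraction, with_replacement=False, seed=None):
    """Hypothetical helper: draw round(fraction * len(rows)) rows from a list."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)          # a fixed seed makes the draw repeatable
    k = round(fraction * len(rows))    # sample size derived from the fraction
    if with_replacement:
        # Each picked row goes back into the pool, so duplicates are possible.
        return rng.choices(rows, k=k)
    # Each row can be picked at most once.
    return rng.sample(rows, k)


rows = list(range(100))

without_repl = sample_rows(rows, 0.1, with_replacement=False, seed=42)
with_repl = sample_rows(rows, 0.1, with_replacement=True, seed=42)
repeat = sample_rows(rows, 0.1, with_replacement=False, seed=42)

print(len(without_repl), len(with_repl))  # both sizes follow the fraction
print(without_repl == repeat)             # same seed, same sample
```

With ‘With Replacement’ set to False, every row in the result is distinct; with it set to True, the same row can appear more than once. Reusing the seed yields an identical sample, which is what makes an experiment repeatable.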
For more information, read the Sparkflows Documentation here: