I want to use only a smaller subset of my data for analysis. How do I achieve this in Sparkflows?

In Sparkflows, use the Sample processor to extract a sample of an incoming dataset. The number of rows in the sample is a fraction of the rows in the incoming dataset. The sample can then be used for ML training or analysis.

To use the ‘Sample’ processor, configure the following:

  • Select True or False for ‘With Replacement’. With True, a selected record is returned to the population and can be picked again, so the same record may appear more than once in the sample. With False, a record is removed from the population once it is selected, so it appears at most once.

  • Enter a value for ‘Fraction’. It should be a decimal greater than 0 and less than 1. It determines the size of the sample relative to the incoming dataset; for example, 0.1 requests about 10% of the rows.

  • Set a ‘Seed’ value. Using the same seed on the same input makes the selected sample reproducible.
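
The effect of the three settings can be illustrated with a small, self-contained Python sketch. This is not the Sparkflows implementation: the function `sample_rows` and its parameter names are hypothetical and simply mirror the ‘With Replacement’, ‘Fraction’, and ‘Seed’ fields using Python's standard `random` module. It also assumes the sample contains exactly `round(fraction * n)` rows, which is a simplification; the processor itself may return only approximately that many.

```python
import random


def sample_rows(rows, fraction, with_replacement=False, seed=None):
    """Hypothetical helper: draw round(fraction * len(rows)) rows.

    Mirrors the 'With Replacement', 'Fraction' and 'Seed' settings
    for illustration only; it is not the Sparkflows code.
    """
    rng = random.Random(seed)          # a fixed seed makes the draw repeatable
    k = round(fraction * len(rows))    # assumed exact sample size
    if with_replacement:
        # Each pick goes back into the population, so duplicates are possible.
        return rng.choices(rows, k=k)
    # Each row can be picked at most once.
    return rng.sample(rows, k)


rows = list(range(100))

first = sample_rows(rows, fraction=0.1, with_replacement=False, seed=42)
second = sample_rows(rows, fraction=0.1, with_replacement=False, seed=42)
with_dupes = sample_rows(rows, fraction=0.1, with_replacement=True, seed=42)

print(len(first))          # 10 rows, i.e. 10% of 100
print(first == second)     # True: same seed, same sample
print(len(with_dupes))     # also 10 rows, but repeats are allowed
```

Running the sketch twice with the same seed yields the same rows, which is the behaviour the ‘Seed’ setting is meant to provide; changing the seed or leaving it unset gives a different sample.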

For more information, read the Sparkflows Documentation here: