I have an employee department dataset containing salary information. I want to identify the minimum and maximum salary for each department and location. How can I achieve this in Sparkflows?

In Sparkflows, the Multi Windows Analytics processor can compute minimum and maximum values. It first partitions the dataset by department and location, then computes the minimum and maximum salary within each partition.
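The partition-then-aggregate logic described above can be sketched in plain Python. This only illustrates the idea and is not Sparkflows code: the sample rows, the output column names, and the helper `add_min_max` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the values are made up for illustration.
rows = [
    {"department": "Sales", "location": "NY", "salary": 5000},
    {"department": "Sales", "location": "NY", "salary": 7000},
    {"department": "Sales", "location": "SF", "salary": 6000},
    {"department": "Eng", "location": "SF", "salary": 9000},
    {"department": "Eng", "location": "SF", "salary": 8000},
]


def add_min_max(rows, partition_keys, value_key):
    """Annotate every row with the min and max of value_key in its partition."""
    # Step 1: group the values by the partition columns.
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in partition_keys)
        partitions[key].append(row[value_key])

    # Step 2: attach the per-partition min and max to each original row.
    result = []
    for row in rows:
        values = partitions[tuple(row[k] for k in partition_keys)]
        result.append({
            **row,
            f"min_{value_key}": min(values),
            f"max_{value_key}": max(values),
        })
    return result


result = add_min_max(rows, ["department", "location"], "salary")
for row in result:
    print(row)
```

As with a window function, every input row is kept and gains the extra columns; nothing is collapsed into one row per group.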

To use the ‘Multi Windows Analytics’ Processor, do the following:

  • Select the column of interest in ‘Analytics Column’. In this case it would be the Salary
    column.

  • Select first_value or last_value in ‘Windows Function’ for the minimum
    and maximum respectively.

  • Enter the columns used for partitioning the dataset in ‘PartitionBy’. In this case it
    would be ‘department’, ‘location’.

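  • Enter the columns used for sorting the dataset in ‘Order By’. Because first_value and
    last_value depend on the row order within each partition, this would be ‘salary’
    rather than the partition columns ‘department’, ‘location’.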

  • Enter the name of the output column that will hold the result in the outgoing DataFrame.
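To see how the sort order affects first_value and last_value, here is a minimal plain-Python sketch of those two functions evaluated over a whole partition. It assumes the window frame spans the entire partition; whether the processor uses that frame is not stated here, so treat this as an illustration of the general technique rather than of Sparkflows itself. The sample data and the helper `first_last_by_partition` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (department, location, salary).
rows = [
    ("Sales", "NY", 5000),
    ("Sales", "NY", 7000),
    ("Eng", "SF", 9000),
    ("Sales", "NY", 3000),
    ("Eng", "SF", 8000),
    ("Sales", "NY", 6000),
]

partition_key = itemgetter(0, 1)  # department, location
salary = itemgetter(2)


def first_last_by_partition(rows, order_key):
    """Return {partition: (first salary, last salary)} after sorting by order_key."""
    out = {}
    # groupby needs its input sorted by the grouping key.
    for key, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        ordered = sorted(group, key=order_key)
        out[key] = (salary(ordered[0]), salary(ordered[-1]))
    return out


# Ordered by salary: the first and last values are the minimum and maximum.
by_salary = first_last_by_partition(rows, order_key=salary)

# Ordered only by the partition columns: every row in a partition ties, so the
# first and last values just reflect the incoming row order.
by_partition_cols = first_last_by_partition(rows, order_key=partition_key)

print(by_salary)
print(by_partition_cols)
```

In this sketch, only the salary ordering yields the true minimum and maximum for each department and location.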

For more information read the Sparkflows Documentation here: