I have an employee department dataset containing salary information. I want a salary-based ranking within each department and location. How can I achieve this in Sparkflows?

In Sparkflows, you can use the Multi Windows Ranking processor to rank rows within a partition. It first partitions the dataset by department and location, then ranks the rows in each partition by salary.

To use the ‘Multi Windows Ranking’ Processor, do the following:

  • Select Rank in ‘Windows Function’. This outputs a rank value for each row.

  • Enter the columns used for partitioning the dataset in ‘PartitionBy’. In this case, these are ‘department’ and ‘location’.

  • Enter the columns used for sorting the dataset in ‘Order By’. In this case, it is ‘Salary’.

  • Enter a name for the output column. This column appears in the outgoing DataFrame and contains each row's rank within its partition.
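
To make the expected result concrete, here is a minimal plain-Python sketch of what a rank within a partition computes. It is not Sparkflows code and does not call the processor. The function name `rank_within_partitions`, the output column name `salary_rank`, and the sample rows are all hypothetical. The sketch also assumes that the highest salary gets rank 1 and that ties follow the usual SQL RANK behavior (tied rows share a rank and the next rank is skipped); the sort direction in your workflow may differ.

```python
from collections import defaultdict


def rank_within_partitions(rows, partition_by, order_by, output_col="salary_rank"):
    """Return copies of the rows with a SQL-style RANK added per partition."""
    # Group rows by the values of the partition columns.
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in partition_by)
        partitions[key].append(row)

    ranked = []
    for group in partitions.values():
        # Assumption: sort descending so the highest value gets rank 1.
        ordered = sorted(group, key=lambda r: r[order_by], reverse=True)
        previous_value, current_rank = None, 0
        for position, row in enumerate(ordered, start=1):
            # Tied values share a rank; the next distinct value jumps
            # to its position in the sorted partition (1, 1, 3, ...).
            if row[order_by] != previous_value:
                current_rank = position
                previous_value = row[order_by]
            ranked.append({**row, output_col: current_rank})
    return ranked


# Hypothetical sample data using the column names from the steps above.
employees = [
    {"name": "Asha", "department": "Sales", "location": "NY", "Salary": 90000},
    {"name": "Ben", "department": "Sales", "location": "NY", "Salary": 90000},
    {"name": "Chen", "department": "Sales", "location": "NY", "Salary": 70000},
    {"name": "Dina", "department": "Sales", "location": "SF", "Salary": 80000},
    {"name": "Eli", "department": "Engineering", "location": "NY", "Salary": 120000},
    {"name": "Fay", "department": "Engineering", "location": "NY", "Salary": 110000},
]

result = rank_within_partitions(employees, ["department", "location"], "Salary")
for row in result:
    print(row)
```

In this sample, the two Sales/NY employees with the same salary both receive rank 1 and the third receives rank 3, while each other department and location combination starts again from rank 1.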

For more information, read the Sparkflows Documentation here: