How do I split my data into unique and duplicate records in Sparkflows?

In Sparkflows, the Find Duplicate node performs exactly the operation you’re looking for. Connect the node to your input dataset and specify the column(s) that should determine uniqueness. The node produces two output dataframes: the upper edge carries the unique records, while the lower edge carries the duplicate records that were found.
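Conceptually, the node's split can be sketched in plain Python. This is an illustrative sketch only, not Sparkflows' actual implementation; it assumes the common semantics where the first occurrence of each key goes to the unique output and later repeats go to the duplicate output (the node's exact tie-breaking may differ):

```python
def split_unique_duplicates(records, key_cols):
    """Split records into (unique, duplicates) by the given key columns.

    First occurrence of a key -> unique list; subsequent repeats -> duplicates.
    Hypothetical helper for illustration, not a Sparkflows API.
    """
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        key = tuple(rec[c] for c in key_cols)
        if key in seen:
            duplicates.append(rec)   # key already seen: route to lower edge
        else:
            seen.add(key)
            unique.append(rec)       # first time seen: route to upper edge
    return unique, duplicates


rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": "b@x.com"},
    {"id": 3, "email": "a@x.com"},  # duplicate email
]
uniq, dups = split_unique_duplicates(rows, ["email"])
# uniq holds ids 1 and 2; dups holds id 3
```

In the Sparkflows UI you only pick the key columns; the two resulting dataframes then flow out of the node's two edges into whatever downstream nodes you attach.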