How can Feature Engineering, Data Profiling and Data Cleansing be performed in Sparkflows?

Tarika · December 22, 2025, 2:09pm

The following nodes in Sparkflows help perform Data Profiling:

Correlation - It computes the correlation between pairs of features, for example between the independent features and the dependent (target) feature, and plots the result as a heatmap.
Summary - It calculates summary statistics for each feature, such as Count, Mean, Min, Max and so on.
Using various ML Model nodes, we can also gain insight into the importance of each feature.
Flag Outlier - It flags outliers in the dataset.
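
Sparkflows nodes are configured visually rather than coded, so the following is only a plain-Python sketch of the kind of computation these profiling nodes perform, run on a small in-memory dataset. All function names and the toy data are hypothetical. The outlier rule used here (Tukey's fences at 1.5 * IQR) is an assumption; the Flag Outlier node may use a different or configurable method.

```python
import statistics
from math import sqrt


def summarize(values):
    """Count, mean, sample standard deviation, min and max of one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Correlation for every pair of columns: the values a heatmap would colour."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


def flag_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical toy dataset: one list per column.
data = {
    "age": [23, 35, 31, 44, 52],
    "income": [28.0, 41.0, 39.0, 55.0, 61.0],
}
age_summary = summarize(data["age"])
corr = correlation_matrix(data)
flags = flag_outliers([10, 12, 11, 13, 12, 95])
print(age_summary)
print(round(corr[("age", "income")], 3))
print(flags)
```

On this toy data, age and income are strongly positively correlated, and only the value 95 falls outside the fences.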

The following nodes in Sparkflows help perform Data Cleansing:

Imputing - Various imputing nodes are available to handle missing values. With these nodes, missing values can be replaced with either a constant or the mean, median or mode of the column.
Dedup - It resolves duplicate entity data.
Drop Duplicate Rows - It removes duplicate rows from the dataset.
Null Value handling - Various nodes are available to handle null values in the dataset.
Find And Replace - Various nodes are available to remove unwanted characters, replace one string pattern with another and so on.
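
To make the cleansing steps concrete, here is a plain-Python sketch (not Sparkflows code) of imputation, null-row removal, exact duplicate removal and regex find-and-replace over a list of row dictionaries. The function names, the use of `None` for a missing value, and the toy rows are all assumptions for illustration. Entity-level Dedup, which matches records that are similar but not identical, is not covered here.

```python
import re
import statistics


def impute(rows, column, strategy="mean", constant=None):
    """Replace None in `column` with a constant or the mean/median/mode of present values."""
    present = [r[column] for r in rows if r[column] is not None]
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":
        fill = statistics.mean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_null_rows(rows, column):
    """Null handling by removal: keep only rows where `column` is present."""
    return [r for r in rows if r[column] is not None]


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen, kept = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def find_and_replace(rows, column, pattern, replacement):
    """Replace every regex match of `pattern` in a string column."""
    rx = re.compile(pattern)
    return [{**r, column: rx.sub(replacement, r[column])} for r in rows]


# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"id": 1, "city": "New-York", "age": 30},
    {"id": 2, "city": "Boston", "age": None},
    {"id": 2, "city": "Boston", "age": None},
    {"id": 3, "city": "Chicago", "age": 40},
]
rows = drop_duplicate_rows(rows)                  # the repeated Boston row is dropped
rows = impute(rows, "age", strategy="mean")       # missing age becomes mean(30, 40)
rows = find_and_replace(rows, "city", r"-", " ")  # "New-York" becomes "New York"
print(rows)
```

Deduplicating before imputing matters: imputing first would still work here, but in general duplicates would bias the mean toward the repeated rows.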

The following nodes in Sparkflows help perform Feature Engineering:

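String Indexer - It encodes string categorical data as numeric indices.
Min Max Scaler and Standard Scaler - Min Max Scaler rescales each feature to a common range (typically [0, 1]); Standard Scaler scales each feature to unit standard deviation, optionally also centering it on the mean.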
Feature Extraction nodes
Feature Transformation nodes
Feature Selection nodes
Splitting Dataset nodes
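
The following plain-Python sketch (not Sparkflows code) illustrates what string indexing, min-max scaling, standard scaling and a train/test split compute. The index ordering, where the most frequent category gets 0 and ties are broken alphabetically, is an assumption modelled on Spark ML's default `StringIndexer` behaviour; the Sparkflows node may order labels differently. The split assigns each row independently at random with a fixed seed, so the resulting sizes are only approximately the requested fraction. All names and data are hypothetical.

```python
import random
import statistics
from collections import Counter


def string_index(values):
    """Map categories to indices: most frequent gets 0.0, ties broken alphabetically."""
    counts = Counter(values)
    order = sorted(counts, key=lambda c: (-counts[c], c))
    mapping = {c: float(i) for i, c in enumerate(order)}
    return [mapping[v] for v in values], mapping


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the column spans [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; map it to the middle of the target range.
        return [0.5 * (new_min + new_max)] * len(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def standard_scale(values, with_mean=True):
    """Divide by the sample standard deviation, optionally subtracting the mean first."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    shift = mean if with_mean else 0.0
    return [(v - shift) / std if std else 0.0 for v in values]


def random_split(rows, train_fraction=0.8, seed=42):
    """Assign each row to train with probability train_fraction, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for r in rows:
        (train if rng.random() < train_fraction else test).append(r)
    return train, test


colors = ["red", "blue", "red", "green", "red", "blue"]
indexed, mapping = string_index(colors)
scaled = min_max_scale([10, 20, 30])
standardized = standard_scale([10, 20, 30])
train, test = random_split(list(range(100)), train_fraction=0.8, seed=7)
print(mapping, indexed)
print(scaled, standardized)
print(len(train), len(test))
```

Note that the scaling parameters (min, max, mean, standard deviation) should be computed on the training split only and then applied to the test split, so that no information from the test data leaks into training.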