Here, we cover how to perform some of the most common data exploration tasks in Sparkflows.
The workflow below:
- Reads the Housing dataset
- Calculates summary statistics for important variables
- Creates a histogram to show the distribution of the Sale Price variable
- Creates a graph to show the relationship between Sale Price and Basement Square Footage
- Creates a matrix to show the correlation between important variables
- Flags outliers in Ground Living Area and graphs the results
The tutorial for this workflow is available in the Sparkflows 3.0 documentation, under "Data Exploration of Housing Data".