Data Exploration of Housing Data

This page covers some of the most common data exploration tasks in Sparkflows.

The workflow below:

  • Reads the Housing dataset
  • Calculates summary statistics for important variables
  • Creates a histogram to show the distribution of the Sale Price variable
  • Creates a graph to show the relationship between Sale Price and Basement Square Footage
  • Creates a matrix to show the correlation between important variables
  • Flags outliers in Ground Living Area and graphs the results
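
In Sparkflows these steps are configured as workflow nodes, so no code is required. For readers who want to see what the steps compute, the following is a minimal plain-Python sketch, not the Sparkflows API. The column names (SalePrice, TotalBsmtSF, GrLivArea) and the sample rows are hypothetical, and the 1.5 × IQR rule for outliers is an assumption, since the workflow description does not say which outlier method it uses. Graphing is omitted; the sketch only produces the numbers a chart would display.

```python
import statistics

# Hypothetical sample rows. The column names and values are assumptions
# for illustration only; they are not taken from the Sparkflows dataset.
ROWS = [
    {"SalePrice": 208500, "TotalBsmtSF": 856, "GrLivArea": 1710},
    {"SalePrice": 181500, "TotalBsmtSF": 1262, "GrLivArea": 1262},
    {"SalePrice": 223500, "TotalBsmtSF": 920, "GrLivArea": 1786},
    {"SalePrice": 140000, "TotalBsmtSF": 756, "GrLivArea": 1717},
    {"SalePrice": 250000, "TotalBsmtSF": 1145, "GrLivArea": 2198},
    {"SalePrice": 143000, "TotalBsmtSF": 796, "GrLivArea": 1362},
    {"SalePrice": 307000, "TotalBsmtSF": 1686, "GrLivArea": 1694},
    {"SalePrice": 160000, "TotalBsmtSF": 6110, "GrLivArea": 5642},
]
COLUMNS = ["SalePrice", "TotalBsmtSF", "GrLivArea"]


def column(rows, name):
    """Pull one variable out of the row dictionaries."""
    return [row[name] for row in rows]


def summarize(values):
    """Basic summary statistics for one numeric variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def histogram(values, bins=4):
    """Count values in equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def flag_outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Summary statistics for each variable.
stats = {name: summarize(column(ROWS, name)) for name in COLUMNS}

# Distribution of Sale Price as bin counts.
counts = histogram(column(ROWS, "SalePrice"), bins=4)

# Correlation matrix as a nested dictionary.
corr = {
    a: {b: pearson(column(ROWS, a), column(ROWS, b)) for b in COLUMNS}
    for a in COLUMNS
}

# Outlier flags for Ground Living Area, in row order.
flags = flag_outliers_iqr(column(ROWS, "GrLivArea"))

print(stats["SalePrice"])
print(counts)
print(round(corr["SalePrice"]["TotalBsmtSF"], 3))
print(flags)
```

The relationship between Sale Price and Basement Square Footage is represented here by a single correlation value; in the workflow it is shown as a graph of the two variables.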

The tutorial for this workflow is available in the Sparkflows 3.0 documentation: Data Exploration of Housing Data