Clustering Real Estate Listings

This workflow reads a dataset of houses listed for sale and uses K-Means clustering from Apache Spark ML to group the listings.

Below is the workflow for creating a K-Means model for clustering the houses. It does the following:

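  • Reads data from a sample dataset.
  • Prints the loaded data.
  • Assembles the feature columns into a feature vector for the model.
  • Splits the dataset into training and test sets.
  • Performs K-Means clustering.
  • Generates predictions (cluster assignments) with the fitted model.
  • Prints the prediction results.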
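
The steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the workflow's actual implementation and does not use Spark: the sample listings, the column names (price, sqft, bedrooms), the 75/25 split, and k=2 are all assumptions made for illustration. In Spark ML the corresponding pieces would typically be VectorAssembler for feature assembly, DataFrame.randomSplit for splitting, and KMeans for clustering.

```python
import random

# Hypothetical sample listings (values invented for illustration):
# price in thousands, living area in square feet, number of bedrooms.
LISTINGS = [
    {"price": 150.0, "sqft": 900.0, "bedrooms": 2.0},
    {"price": 165.0, "sqft": 1000.0, "bedrooms": 2.0},
    {"price": 180.0, "sqft": 1100.0, "bedrooms": 3.0},
    {"price": 175.0, "sqft": 1050.0, "bedrooms": 2.0},
    {"price": 620.0, "sqft": 3200.0, "bedrooms": 5.0},
    {"price": 640.0, "sqft": 3400.0, "bedrooms": 5.0},
    {"price": 600.0, "sqft": 3000.0, "bedrooms": 4.0},
    {"price": 655.0, "sqft": 3500.0, "bedrooms": 6.0},
]
FEATURE_COLUMNS = ["price", "sqft", "bedrooms"]  # assumed column names


def assemble_features(rows, columns):
    """Combine the selected columns of each row into one feature vector."""
    return [[float(row[c]) for c in columns] for row in rows]


def split(vectors, train_fraction, seed):
    """Shuffle a copy of the vectors and cut it into two parts."""
    shuffled = vectors[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centers, vector):
    """Return the index of the cluster center nearest to the vector."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], vector))


def fit_kmeans(vectors, k, iterations=20, seed=0):
    """Basic Lloyd's algorithm: assign points, then recompute the means."""
    centers = random.Random(seed).sample(vectors, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[predict(centers, v)].append(v)
        # An empty cluster keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


# Read and print the data.
for row in LISTINGS:
    print(row)

# Assemble features, split, fit the model, and predict.
vectors = assemble_features(LISTINGS, FEATURE_COLUMNS)
train, test = split(vectors, train_fraction=0.75, seed=42)
centers = fit_kmeans(train, k=2)
predictions = [(v, predict(centers, v)) for v in test]

# Print the prediction result.
for vector, cluster in predictions:
    print(vector, "-> cluster", cluster)
```

The sketch leaves the features unscaled, so the columns with the largest numeric range dominate the distance calculation. Whether the actual workflow scales its features is not stated here.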