This section shows how to build a Random Forest model that predicts customer churn in the telecommunications market.
The workflow:
- Reads the dataset from a tab-separated file.
- Applies StringIndexer to the field “intl_plan”.
- Applies VectorAssembler to the fields we want to model on.
- Splits the dataset into training (80%) and test (20%) subsets.
- Trains a Random Forest classifier on the 80% subset.
- Uses the trained model to predict on the remaining 20% of the dataset.
- Evaluates the prediction results.
The full tutorial is available here: Telco Churn Prediction — Sparkflows 3.0 documentation