The platform offers a range of supervised and unsupervised machine learning algorithms, each optimized for different data types, business goals, and modeling constraints. The comparison below helps you understand when to use each algorithm, how it works at a high level, and which parameters matter most when tuning models in practice.
## Supervised Learning: Boosting & Forests
| Algorithm | When to use it | Internal Logic (The “Why”) | Top Tuning Parameters |
|---|---|---|---|
| XGBoost | Tabular data where accuracy is the #1 priority. | Builds trees sequentially; each new tree fits the residual errors (gradients) of the ensemble so far. | learnRate (eta): Lower values generalize better but need more trees. maxDepth: Keep low (3-6) to limit overfitting. |
| GBM | When you need specific distributions (e.g., Tweedie for insurance). | Same sequential residual-fitting as XGBoost, but with native support for a wider family of loss distributions. | learnRateAnnealing: Shrinks the learning rate after each tree, taking smaller steps as training approaches the optimum. |
| DRF | A robust baseline that needs little tuning and resists overfitting. | Bagging (parallel trees). Reduces variance by averaging many independent deep trees. | mtries: Crucial knob. Features sampled per split; higher = stronger trees, lower = a more diverse (decorrelated) forest. |
| GLM | Regulated industries or when you need a fast, interpretable linear baseline. | Fits a hyperplane through a link function. Uses Elastic Net (L1/L2) regularization. | lambdaSearch: Automates the search for the best penalty strength. standardize: Required so penalties apply evenly across features with different scales. |
| Deep Learning | High-dimensional data with complex, non-linear interactions. | Multi-layer feed-forward network trained with stochastic gradient descent and backpropagation. | hidden: Layer sizes (neurons per layer). activation: Use RectifierWithDropout for built-in regularization. |
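The sequential residual-fitting logic behind XGBoost and GBM can be seen in a few lines of framework-free numpy. This is a minimal sketch, not the platform's implementation: `fit_stump` and `boost` are illustrative helpers, and `learn_rate` stands in for the table's learnRate (eta) parameter.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split stump on 1-D feature x for residuals r (min SSE)."""
    best = (np.inf, None, 0.0, 0.0)
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # threshold, left value, right value

def boost(x, y, n_trees=50, learn_rate=0.1):
    """Sequential boosting: each stump fits the residuals of the ensemble so far."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_trees):
        t, lv, rv = fit_stump(x, y - pred)             # residuals = current errors
        pred += learn_rate * np.where(x <= t, lv, rv)  # shrunken (eta-scaled) update
    return pred

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
pred = boost(x, y)
mse = float(np.mean((y - pred) ** 2))  # training MSE shrinks as trees accumulate
```

A smaller `learn_rate` makes each correction gentler, which is why lowering it demands more trees but tends to generalize better.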
## Unsupervised Learning: Anomalies & Structure
| Algorithm | When to use it | Internal Logic (The “Why”) | Top Tuning Parameters |
|---|---|---|---|
| Isolation Forest | Rare event detection (fraud, network intrusions). | Isolates anomalies via random partitioning; anomalies need fewer splits to isolate. | sampleSize: The default of 256 usually suffices; larger samples cause swamping and masking, hiding true anomalies. |
| K-Means | Customer segmentation or data grouping. | Iteratively minimizes the Within-Cluster Sum of Squares (WCSS) around centroids. | standardize: Must be ON; Euclidean distances are meaningless when features sit on different scales. |
| GLRM | Missing value imputation or mixed-type dimensionality reduction. | Low-rank matrix factorization that generalizes PCA to heterogeneous (numeric + categorical) data. | imputeOriginal: Reconstructs missing entries in the raw data from the low-rank model. |
| PCA | Compressing numeric features for faster downstream modeling. | Projects data onto orthogonal axes that maximize Variance Explained. | k: Number of components. Use enough to capture 90%+ of total variance. |
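To make the K-Means row concrete, here is a bare-bones Lloyd's algorithm in numpy, with standardization applied up front. This is a sketch of the technique, not the platform's K-Means; `kmeans` is an illustrative function, and the dollar/year feature scales are invented to show why standardize matters.

```python
import numpy as np

def kmeans(X, k, iters=30, seed=0):
    """Plain Lloyd's algorithm: assign points to nearest centroid, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                         # nearest-centroid assignment
        centers = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    wcss = float(((X - centers[labels]) ** 2).sum())      # Within-Cluster Sum of Squares
    return labels, centers, wcss

rng = np.random.default_rng(1)
# Two features on wildly different scales: income in dollars, age in years.
raw = np.vstack([rng.normal([20_000, 30], [2_000, 3], (100, 2)),
                 rng.normal([80_000, 55], [2_000, 3], (100, 2))])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)            # standardize first
labels, centers, wcss = kmeans(X, k=2)
```

Without the standardization step, the income column would dominate every distance and the age feature would contribute essentially nothing to the clustering.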
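The "enough components to capture 90%+ of total variance" rule for PCA's k parameter can also be sketched directly. `components_for_variance` is an illustrative helper (not a platform API), and the five-feature dataset is fabricated so that nearly all variance lives in the first two columns.

```python
import numpy as np

def components_for_variance(X, target=0.90):
    """Smallest k whose leading principal components explain >= target variance."""
    Xc = X - X.mean(axis=0)                    # PCA requires centered data
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    explained = s ** 2 / (s ** 2).sum()        # variance-explained ratio per component
    cum = np.cumsum(explained)
    return int(np.searchsorted(cum, target) + 1), cum

rng = np.random.default_rng(0)
# Five numeric features, but almost all variance lives in the first two.
X = rng.normal(0, [10.0, 5.0, 0.1, 0.1, 0.1], size=(200, 5))
k, cum = components_for_variance(X, target=0.90)
```

Here the curve of cumulative variance flattens after two components, so k=2 keeps the downstream model fast while discarding almost no signal.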
There is no single “best” algorithm—the right choice depends on your data, accuracy requirements, interpretability needs, and operational constraints. Start with simpler or more robust models as baselines, then move to more complex algorithms as needed, focusing first on the key tuning parameters highlighted above for the biggest performance gains.