Version: 0.35.7

Train

Train ML Model lets you train machine learning models by choosing a target column, then visualizing the impact of all other columns in the dataset on the target column. It trains a set of machine learning models and chooses the best model among the set, which you can then use to carry out further analysis and predictions.

Train Time Series lets you train time series models by choosing the measure columns to predict, then selecting the number of time intervals to predict based on a temporal column.

By default, Train ML Model optimizes the data in your target dataset by automatically:

  • Grouping continuous, numeric values into optimally sized bins for tree-based models.
  • Oversampling or auto-weighting to help ensure your data is balanced.
  • Pruning columns whose data type isn't supported, columns that have one unique value, temporal columns when temporal slicing is disabled, and columns that have more than 95% null values.
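
The pruning rules in the last item can be sketched in plain Python. This is an illustrative sketch, not the product's implementation: the function name prune_columns, the type labels ("number", "text", "temporal", "blob"), and the sample columns are all hypothetical, and it assumes that nulls are ignored when counting unique values.

```python
def prune_columns(columns, supported_types, temporal_slicing=False, max_null_ratio=0.95):
    """Return the names of the columns that survive the pruning rules.

    `columns` maps a column name to a (type_name, values) pair.
    A value of None marks a null. All names here are hypothetical.
    """
    kept = []
    for name, (type_name, values) in columns.items():
        non_null = [v for v in values if v is not None]
        null_ratio = 1 - len(non_null) / len(values) if values else 1.0
        if type_name not in supported_types:
            continue  # data type isn't supported
        if type_name == "temporal" and not temporal_slicing:
            continue  # temporal column while temporal slicing is disabled
        if null_ratio > max_null_ratio:
            continue  # more than 95% null values
        if len(set(non_null)) <= 1:
            continue  # at most one unique non-null value
        kept.append(name)
    return kept


# Made-up sample data with 40 rows per column.
sample = {
    "id": ("number", list(range(1, 41))),
    "label": ("number", [0, 1] * 20),
    "constant": ("text", ["same"] * 40),
    "created_at": ("temporal", ["2024-01-01"] * 20 + ["2024-01-02"] * 20),
    "payload": ("blob", [b"x", b"y"] * 20),
    "mostly_null": ("text", [None] * 39 + ["value"]),
}

kept = prune_columns(sample, supported_types={"number", "text", "temporal"})
kept_with_slicing = prune_columns(
    sample, supported_types={"number", "text", "temporal"}, temporal_slicing=True
)
```

With temporal slicing disabled, only id and label survive. Enabling it also keeps created_at, because that column has more than one unique value.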

Format

Train uses two distinct formats:

Training an ML Model

Train an ML model on <target column> (and generate charts for data visualization) (using <feature columns> | excluding <feature columns>)

Training a Time Series Model

Train time series with measure columns <measure columns> for the next <steps> values of <temporal column> (for each <partition columns>) (using <feature columns>)

Parameters

Train uses the following parameters:

  • target column (required for ML models). The column you want to analyze and receive insights on.
  • measure columns (required for time series models). A list of columns whose values you want to predict over a period of time.
  • steps (required for time series models). The number of steps into the future you want to predict.
  • temporal column (required for time series models). The time-based element of your prediction, such as a date or time column.
  • partition columns (optional). A list of columns you want to group your predictions by.
  • feature columns (optional). The columns to include or exclude as the features used to train the model.
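
To show how the parameters slot into the two formats, here is a minimal Python sketch that assembles the sentence you would enter. The helper names train_ml_command and train_time_series_command are hypothetical and not part of the product. The sketch assumes that parenthesized clauses in the formats are optional, that "|" means one or the other, and that lists are comma-separated.

```python
def train_ml_command(target, using=None, excluding=None, charts=False):
    """Build a 'Train an ML model' sentence (hypothetical helper)."""
    if using and excluding:
        raise ValueError("Pass either 'using' or 'excluding', not both.")
    parts = [f"Train an ML model on {target}"]
    if charts:
        parts.append("and generate charts for data visualization")
    if using:
        parts.append("using " + ", ".join(using))
    elif excluding:
        parts.append("excluding " + ", ".join(excluding))
    return " ".join(parts)


def train_time_series_command(measures, steps, temporal, partitions=None, features=None):
    """Build a 'Train time series' sentence (hypothetical helper)."""
    parts = [
        "Train time series with measure columns " + ", ".join(measures),
        f"for the next {steps} values of {temporal}",
    ]
    if partitions:
        parts.append("for each " + ", ".join(partitions))
    if features:
        parts.append("using " + ", ".join(features))
    return " ".join(parts)


ml_sentence = train_ml_command("Survived", excluding=["PassengerID", "Ticket", "Cabin"])
ts_sentence = train_time_series_command(["allRiders"], 7, "date")
print(ml_sentence)
print(ts_sentence)
```

The helpers only build text that follows the format lines above. They do not call the product.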

Output

Training an ML Model

If the model is successfully trained, a tabbed output appears with tabs to view the impact chart, model types, model scores, confusion matrix, and pipeline report. The log shows a success message which includes details and statistics about the model. Otherwise, the log shows a failure message.

The model is saved as BestFit1. Each additional model you train is saved under the next sequential number, such as BestFit2. To view the models in the current session that can be used with Predict, enter List all the models.
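
The sequential naming can be pictured with a short Python sketch. The function next_model_name is hypothetical. The sketch assumes the next name is one higher than the largest existing number, and the documentation does not say how gaps or deleted models are handled.

```python
import re


def next_model_name(existing_names, prefix="BestFit"):
    """Return the next sequential model name (assumed rule: highest number + 1)."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for name in existing_names:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{prefix}{max(numbers, default=0) + 1}"


first = next_model_name([])
second = next_model_name(["BestFit1"])
```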

Training a Time Series Model

If the time series model is successfully trained, a tabbed output appears with tabs to view the prediction chart, model stats, scores, model introspection, and pipeline report. The log shows a success message which includes details and statistics about the model. Otherwise, the log shows a failure message.

Examples

Training an ML Model

Consider a dataset containing information about the passengers aboard the Titanic. Its columns include the following:

  • PassengerID.
  • Survived.
  • Pclass.
  • Name.
  • Gender.
  • Age.

To train an ML model on this dataset to predict whether a passenger would survive the disaster, enter Train an ML model on Survived excluding PassengerID, Ticket, Cabin

Training a Time Series Model

Consider a dataset containing bike share data. Some of its columns are:

  • date.
  • casualRiders.
  • registeredRiders.
  • allRiders.
  • holiday.
  • hour.
  • workingDay.

To train a time series model on this dataset to predict the average number of riders to expect over the next week, enter Train a time series with measure columns allRiders for the next 7 values of date
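
To make "the next 7 values of date" concrete, the following Python sketch lists the timestamps that a 7-step forecast would cover. It assumes one row per day and a made-up last observed date. The function next_values is hypothetical, and how the product determines the interval between values is not stated here.

```python
from datetime import date, timedelta


def next_values(last_value, steps, interval=timedelta(days=1)):
    """Return the `steps` values that follow `last_value` at a fixed interval."""
    return [last_value + interval * i for i in range(1, steps + 1)]


# Made-up last observed date; daily spacing is an assumption.
horizon = next_values(date(2024, 1, 31), steps=7)
for day in horizon:
    print(day.isoformat())
```

Under these assumptions, steps counts rows of the temporal column, so 7 steps span one week only when the data is daily.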

Going Deeper

Depending on the type of model, you can also view the following plots after you run Train ML Model:

Regression Models:

  • View the residual statistics by clicking Residual Plot from the tabbed output.
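
As background for reading a residual plot, a residual is the actual value minus the predicted value. The Python sketch below computes residuals and a few summary statistics from made-up numbers. The function residual_stats and the choice of statistics are illustrative assumptions, not the exact contents of the Residual Plot tab.

```python
from statistics import mean, pstdev


def residual_stats(actual, predicted):
    """Compute residuals (actual - predicted) and simple summary statistics."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "residuals": residuals,
        "mean": mean(residuals),
        "std": pstdev(residuals),  # population standard deviation
        "max_abs": max(abs(r) for r in residuals),
    }


# Made-up values for illustration only.
stats = residual_stats(actual=[10, 12, 15, 20], predicted=[11, 12, 13, 21])
```

A mean residual near zero with no visible pattern in the plot generally suggests the model is not systematically over- or under-predicting.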

Classification Models:

  • View the confusion matrix by clicking Confusion Matrix from the tabbed output.
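
As background for reading a confusion matrix, the Python sketch below counts how often each actual class was predicted as each class. The labels and predictions are made up, the function confusion_matrix is a hypothetical helper, and the layout (rows for actual classes, columns for predicted classes) is an assumption that may differ from the product's chart.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts actual labels[i] predicted as labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = labels or sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Made-up outcomes: 1 = survived, 0 = did not survive.
labels, matrix = confusion_matrix(
    actual=[1, 0, 1, 1, 0, 0],
    predicted=[1, 0, 0, 1, 0, 1],
)
for label, row in zip(labels, matrix):
    print(label, row)
```

Values on the diagonal are correct predictions, and off-diagonal values are misclassifications.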