Train a Time Series Model

Time series prediction forecasts future values based on a target and a temporal column.

Prediction Types

Univariate

A univariate time series analysis examines one or more measure variables over time, without any covariates or feature variables influencing them. A univariate time series prediction requires:

  • At least one measure column that contains target values to predict.
  • The number of time intervals to predict.
  • One temporal column that contains time interval variables, of date/time type, with one measure variable per time interval.

Multivariate

A multivariate time series analysis examines one or more measure variables, incorporating covariates or feature variables that may influence the target measures. A multivariate time series prediction requires:

  • At least one measure column that contains target values to predict. More than one column generates multiple time series.
  • The number of time intervals to predict.
  • One temporal column that contains time interval variables, of date/time type.
  • At least one column that could influence the measure column.

Multiple Time Series

You can analyze multiple time series by selecting more than one measure variable or specifying a grouping variable. DataChat runs a separate time series analysis, either univariate or multivariate, for each unique combination of selected variables. All time series must share the same time axis.
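As a rough illustration of "one analysis per unique combination", the sketch below splits a table into one series per (measure, group value) pair. The function name `split_into_series` and the sample columns (`month`, `region`, `sales`, `returns`) are hypothetical and are not part of DataChat.

```python
from collections import defaultdict


def split_into_series(rows, time_col, measure_cols, group_col=None):
    """Return {(measure, group_value): [(time, value), ...]} sorted by time."""
    series = defaultdict(list)
    for row in rows:
        group = row[group_col] if group_col else None
        for measure in measure_cols:
            series[(measure, group)].append((row[time_col], row[measure]))
    return {key: sorted(points) for key, points in series.items()}


# Hypothetical sample data: two measures and one grouping column.
rows = [
    {"month": "2024-01", "region": "East", "sales": 10, "returns": 1},
    {"month": "2024-01", "region": "West", "sales": 7, "returns": 2},
    {"month": "2024-02", "region": "East", "sales": 12, "returns": 0},
    {"month": "2024-02", "region": "West", "sales": 9, "returns": 3},
]

series = split_into_series(rows, "month", ["sales", "returns"], group_col="region")
for key, points in series.items():
    print(key, points)
```

Two measures and two regions yield four series, all on the same monthly time axis.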

Train a Time Series Model

To train a time series model, select Machine Learning > Train Time Series in the skill menu. You can use this form to perform both univariate and multivariate predictions.

note

If you're connected to a BigQuery database, you can leverage BigQuery ML within DataChat.

Feature Selection

At a minimum, complete the required fields in the Feature Selection section.

ML Train Time Series form

  1. Select at least one column that contains measure variables. This is the value you want to predict.
  2. Enter the number of values to predict. This is how many steps into the future you want to predict.
  3. Select the column that contains your temporal variable.
  4. Optionally, select a column that contains a variable that groups your data for better predictions.
  5. Optionally, select feature columns to use in your prediction. If any feature columns are selected, the prediction becomes multivariate.
  6. Optionally, choose whether to include a feature importance plot as part of the prediction. Note that this option is only available for multivariate analysis without BigQuery ML enabled.
  7. If you're ready, click Submit to run the prediction. Otherwise, refer to the Advanced Options or Model Selection sections for more ways to fine-tune your prediction.

Advanced Options

Optionally, you can change some more advanced settings in the Advanced Options section.

The advanced options form

  1. Choose the filling method. This is how missing values in the measure or feature columns are handled. The options are:
    • Linear
    • Polynomial
    • Quadratic
    • Spline
  2. Choose the aggregation method. This is how duplicate values for a given temporal value are handled. By default, duplicate values are averaged. The options are:
    • Average
    • Maximum
    • Median
    • Minimum
    • Total
  3. Choose the validation method. This is how the system validates the model. The options are:
    • Cross-validation
    • Holdout. If this option is selected, you also need to specify the percentage of the dataset that should be held out for validation purposes. By default, 10 percent of the dataset is held out.
  4. Choose the selection criterion. This is the scoring criterion that is used to choose the best model. By default, the SMAPE criterion is used. The options are:
    • SMAPE
    • MAE
    • Mean Squared Error
    • Root Mean Squared Error
    • r2
  5. Choose whether to use recent relevant data. When selected, the system uses only the 1,000 most recent data points to make a prediction. This can be useful when making long-term predictions. This option is enabled by default.
  6. Choose whether to use smart data interpretation. When selected, the system interprets string type temporal columns that use the YYYY-MM or YYYY-Q format as datetime columns. This option is enabled by default.
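To make the validation method and selection criterion more concrete, here is a minimal sketch of a holdout split and the listed scoring criteria. It assumes that the held-out portion is the most recent slice of the series and that SMAPE is expressed in percent (several SMAPE definitions exist); DataChat's exact formulas are not documented here. The two candidate forecasters are toy stand-ins, not the models DataChat trains.

```python
import math


def holdout_split(values, holdout_pct=10):
    """Hold out the most recent holdout_pct percent of points (assumption)."""
    n_holdout = max(1, round(len(values) * holdout_pct / 100))
    return values[:-n_holdout], values[-n_holdout:]


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination; undefined when actual is constant."""
    avg = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


history = [float(v) for v in range(1, 21)]   # hypothetical series: 1, 2, ..., 20
train, test = holdout_split(history)          # default: 10 percent held out

# Toy candidates (hypothetical): repeat the last value, or extend the average slope.
slope = (train[-1] - train[0]) / (len(train) - 1)
candidates = {
    "naive_last": [train[-1]] * len(test),
    "drift": [train[-1] + slope * (h + 1) for h in range(len(test))],
}

# Lower is better for SMAPE, MAE, MSE, and RMSE; higher is better for r2.
scores = {name: smape(test, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With cross-validation, the same scoring would be repeated over several train/validation splits instead of a single held-out slice.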

Model Selection

Optionally, select the method used to predict the values. If you don't choose a method, DataChat chooses one of ARIMA, MLForecast, Prophet, or Theta.

  • ARIMA. Uses a statistical model.
  • MLForecast. Uses a machine learning model (instead of a statistical model) to optimize your time component for predictions. Note that this method works only for univariate predictions.
  • Prophet. Uses a statistical model.
  • Theta. Uses a statistical model and is best for short-term predictions.

Each method's hyperparameters can be tuned to your needs.

parameter options for the ARIMA model

  1. Choose whether to use Auto ARIMA, which automatically assigns values for each hyperparameter. If you choose to turn this off, you can then specify your own hyperparameter values.
  2. Set a value for the "p" hyperparameter. This parameter determines the non-seasonal autoregression order.
  3. Set a value for the "d" hyperparameter. This parameter determines the non-seasonal degree of differencing.
  4. Set a value for the "q" hyperparameter. This parameter determines the non-seasonal moving average order.
  5. Set a value for the "P" hyperparameter. This parameter determines the seasonal autoregression order.
  6. Set a value for the "D" hyperparameter. This parameter determines the seasonal degree of differencing.
  7. Set a value for the "Q" hyperparameter. This parameter determines the seasonal moving average order.
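In general ARIMA terms, "p" and "P" count the lagged observations the model regresses on, "q" and "Q" count the lagged forecast errors, and "d" and "D" say how many times the series is differenced before fitting. The sketch below illustrates only the differencing part. The seasonal period (4 in the example) is an assumption for illustration and is not a field described on this form.

```python
def difference(values, lag=1, order=1):
    """Apply lag-`lag` differencing `order` times.

    lag=1 corresponds to non-seasonal differencing (the "d" setting);
    lag=<season length> corresponds to seasonal differencing (the "D" setting).
    """
    result = list(values)
    for _ in range(order):
        result = [result[i] - result[i - lag] for i in range(lag, len(result))]
    return result


# A quadratic trend: one difference leaves a linear trend, two leave a constant.
trend = [t * t for t in range(6)]            # [0, 1, 4, 9, 16, 25]
print(difference(trend, order=1))            # [1, 3, 5, 7, 9]
print(difference(trend, order=2))            # [2, 2, 2, 2]

# A pattern that repeats every 4 steps with a small upward shift.
seasonal = [10, 20, 30, 40, 11, 21, 31, 41]
print(difference(seasonal, lag=4, order=1))  # [1, 1, 1, 1]
```

Each round of differencing shortens the series by `lag` points, which is one reason larger "d" and "D" values need more history.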

BigQuery ML

note

For information on BigQuery ML permissions, refer to Database Types.

If your dataset came from a BigQuery connection, you can optionally toggle the Enable BigQuery ML option under Feature Selection, which is enabled by default for BigQuery datasets. When enabled, DataChat leverages BigQuery ML to train time series models.

BQML toggle

By default, DataChat explores BigQuery ARIMA Plus models. Optionally, you can specify the hyperparameter types and values under Model Selection:

BQML ARIMA

  1. Choose whether to use Auto ARIMA, which automatically assigns values for each hyperparameter. If you choose to turn this off, you can then specify your own hyperparameter values.
  2. Set a value for the "p" hyperparameter. This parameter determines the non-seasonal autoregression order.
  3. Set a value for the "d" hyperparameter. This parameter determines the non-seasonal degree of differencing.
  4. Set a value for the "q" hyperparameter. This parameter determines the non-seasonal moving average order.
  5. Choose whether to clean spikes and dips automatically. By default, this parameter is set to "true".
  6. Choose whether to adjust step changes automatically. By default, this parameter is set to "true".
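For readers who know BigQuery ML, the settings above correspond to options of its `CREATE MODEL` statement for `ARIMA_PLUS` models. The helper below builds such a statement as a string. It is only an illustration of how the form fields map to BigQuery ML options, not necessarily the statement DataChat issues; the function name and the model, table, and column names are hypothetical.

```python
def build_arima_plus_sql(model_name, source_table, time_col, measure_col,
                         horizon, auto_arima=True, order=None,
                         clean_spikes_and_dips=True, adjust_step_changes=True):
    """Build a BigQuery ML CREATE MODEL statement for an ARIMA_PLUS model."""

    def sql_bool(flag):
        return "TRUE" if flag else "FALSE"

    options = [
        "MODEL_TYPE = 'ARIMA_PLUS'",
        f"TIME_SERIES_TIMESTAMP_COL = '{time_col}'",
        f"TIME_SERIES_DATA_COL = '{measure_col}'",
        f"HORIZON = {horizon}",
        f"AUTO_ARIMA = {sql_bool(auto_arima)}",
        f"CLEAN_SPIKES_AND_DIPS = {sql_bool(clean_spikes_and_dips)}",
        f"ADJUST_STEP_CHANGES = {sql_bool(adjust_step_changes)}",
    ]
    # Mirror the form: manual p, d, q values apply only when Auto ARIMA is off.
    if not auto_arima:
        if order is None:
            raise ValueError("order=(p, d, q) is required when auto_arima is off")
        p, d, q = order
        options.append(f"NON_SEASONAL_ORDER = ({p}, {d}, {q})")

    option_block = ",\n  ".join(options)
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(\n  {option_block}\n) AS\n"
        f"SELECT {time_col}, {measure_col}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical names, with Auto ARIMA turned off and p=2, d=1, q=1.
print(build_arima_plus_sql(
    "my_dataset.sales_forecast", "my_dataset.sales",
    "order_date", "total_sales", horizon=12,
    auto_arima=False, order=(2, 1, 1),
))
```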

Univariate Prediction Outputs

If only one measure variable is specified, the univariate analysis generates a new dataset called PredictedTimeSeries_<measure variable>. The new dataset is used to generate a chart, which displays in the Chart tab.

DataChat also provides a number of important results that can be found in the tabbed output:

Train Time Series univariate

Prediction

The chart shows:

  • Data points for the specified measure variable.
  • The confidence interval for time series prediction.
  • Predicted values for the specified number of time intervals.

Pipeline Report

The pipeline report shows detailed information about each step of the model training process (known as a “pipeline”). From this report, you can see important information about date interpretation, missing value interpolation, model training, and model selection.

pipeline report
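Two of the steps named in the report, date interpretation and missing value interpolation, can be pictured with the sketch below. It assumes that quarter strings look like `2024-Q3`, that each period maps to its first day, and that the Linear filling method is a straight line between the nearest known neighbors. These are illustrative assumptions, not a description of DataChat's internals.

```python
import re
from datetime import date


def parse_period(text):
    """Interpret 'YYYY-MM' or 'YYYY-Qn' strings as the first day of the period."""
    month = re.fullmatch(r"(\d{4})-(\d{2})", text)
    if month:
        return date(int(month.group(1)), int(month.group(2)), 1)
    quarter = re.fullmatch(r"(\d{4})-Q([1-4])", text)
    if quarter:
        first_month = 3 * (int(quarter.group(2)) - 1) + 1
        return date(int(quarter.group(1)), first_month, 1)
    raise ValueError(f"unrecognized period string: {text!r}")


def fill_linear(values):
    """Fill None gaps by linear interpolation between the nearest known values.

    Leading and trailing gaps have no neighbor on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


print(parse_period("2024-02"))                # 2024-02-01
print(parse_period("2024-Q3"))                # 2024-07-01
print(fill_linear([1.0, None, None, 4.0]))    # [1.0, 2.0, 3.0, 4.0]
```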

Model Stats

The Model Stats section of the univariate Train Time Series output contains two sections:

  • Scores
  • Model Introspection

Scores

This section displays a table of the model scores to provide context about the success of the prediction model, including:

scores

Model Introspection

This section includes a preview for the Model Introspection table. This table provides detailed information about candidate models and their parameters:

Model introspection table

Multivariate Prediction Outputs

If more than one measure variable is specified, the multivariate analysis generates a new dataset called PredictedTimeSeries. The new dataset is used to generate a chart, which displays in the Chart tab.

note

For charts created with PredictedTimeSeries datasets containing more than 5,000 rows:

  • Non-partitioned datasets retain only the most recent 5,000 rows.
  • For partitioned datasets, DataChat dynamically allocates historical rows to each partition based on availability, so that all forecast rows are fully included and the 5,000-row limit is used optimally.
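The exact allocation rule is not specified here. One plausible reading, sketched below purely as an assumption, is to reserve room for every forecast row first and then share the remaining budget evenly among partitions, passing on whatever a short partition cannot use. The function names and the sample counts are hypothetical.

```python
def keep_recent(rows, limit=5000):
    """Non-partitioned case: keep only the most recent `limit` rows."""
    return rows[-limit:]


def allocate_history(history_counts, forecast_counts, limit=5000):
    """Return {partition: historical rows to keep} under a total row limit.

    Hypothetical rule: forecast rows are always kept; the rest of the budget
    is split evenly, and unused shares are redistributed to other partitions.
    """
    budget = limit - sum(forecast_counts.values())
    if budget < 0:
        raise ValueError("forecast rows alone exceed the row limit")
    allocation = {name: 0 for name in history_counts}
    remaining = dict(history_counts)
    open_parts = [name for name, count in remaining.items() if count > 0]
    while budget > 0 and open_parts:
        share = max(1, budget // len(open_parts))
        for name in list(open_parts):
            if budget == 0:
                break
            take = min(share, remaining[name], budget)
            allocation[name] += take
            remaining[name] -= take
            budget -= take
            if remaining[name] == 0:
                open_parts.remove(name)
    return allocation


history = {"A": 100, "B": 10000, "C": 10000}   # historical rows per partition
forecast = {"A": 12, "B": 12, "C": 12}         # forecast rows per partition
print(allocate_history(history, forecast))
```

In this example, partition A keeps all 100 of its historical rows, and B and C split the rest of the budget equally, so the chart holds exactly 5,000 rows including the 36 forecast rows.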

DataChat also provides a number of important results that can be found in the tabbed output:

ML Predict Time Series multivariate

Prediction

The chart shows:

  • Data points for the specified measure variable.
  • Predicted values for the specified number of time intervals.

Model Stats

The Model Stats section of the multivariate Train Time Series output contains two sections:

  • Scores
  • Model Introspection

Scores

This section displays a table of the model scores to provide context about the success of the prediction model, including:

scores

Model Introspection

This section includes a preview for the Model Introspection table. This table provides detailed information about candidate models and their parameters:

Model introspection table

Feature Importance

Training a multivariate time series model produces an impact chart, which illustrates the average impact each input feature has on the target feature. The features are sorted from most to least impactful. In the example below, we can see that the “Lag_11_allRiders” feature has the most impact on “allRidersPercentage”:

feature importance
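A lagged feature such as “Lag_11_allRiders” is a copy of a column shifted back in time, so that each row can see an earlier value. The sketch below builds one, assuming the name follows the pattern `Lag_<steps>_<column>` (inferred from the example above) and that rows are already sorted by time. The helper name is hypothetical.

```python
def add_lag_feature(rows, column, lag):
    """Return copies of rows with a Lag_<lag>_<column> field added.

    The new field holds the value of `column` from `lag` rows earlier,
    or None for the first `lag` rows, which have no earlier value.
    """
    name = f"Lag_{lag}_{column}"
    lagged_rows = []
    for i, row in enumerate(rows):
        lagged = rows[i - lag][column] if i >= lag else None
        lagged_rows.append({**row, name: lagged})
    return lagged_rows


rows = [{"allRiders": value} for value in [5, 6, 7, 8]]
for row in add_lag_feature(rows, "allRiders", 2):
    print(row)
```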

Pipeline Report

The pipeline report shows detailed information about each step of the model training process (known as a “pipeline”). From this report, you can see important information about date interpretation, missing value interpolation, model training, and model selection. If enabled, the pipeline report will also explain the model stage, impact scores for each feature column, and lagged features.

pipeline report

Group Temporal Predictions

If DataChat detects that the column containing your temporal variable has at least one repeated value, meaning that at least two rows contain different measure variables for the same temporal variable, DataChat will prompt you with options to aggregate the measure variables. You can choose by clicking one of the following options, displayed as links:

  • Average
  • Maximum
  • Median
  • Minimum
  • Total

For each duplicated temporal variable, the measure variables are aggregated as selected in a new dataset called <dataset name>_Compute. After you've addressed the repetitions, try your prediction again.
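The aggregation options can be sketched as follows. This is a minimal illustration using Python's standard library; the function name and sample data are hypothetical, and the output is a plain list rather than a DataChat dataset.

```python
from collections import defaultdict
from statistics import mean, median

# The five options offered in the prompt, mapped to standard functions.
AGGREGATORS = {
    "Average": mean,
    "Maximum": max,
    "Median": median,
    "Minimum": min,
    "Total": sum,
}


def aggregate_duplicates(points, method="Average"):
    """Collapse (time, value) pairs so that each time value appears once."""
    grouped = defaultdict(list)
    for time_value, measure in points:
        grouped[time_value].append(measure)
    aggregate = AGGREGATORS[method]
    return [(t, aggregate(values)) for t, values in sorted(grouped.items())]


# "2024-01" appears twice, so it needs to be aggregated before training.
points = [("2024-01", 10), ("2024-01", 20), ("2024-02", 5)]
print(aggregate_duplicates(points, "Average"))
print(aggregate_duplicates(points, "Total"))
```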