Time Series Prediction
Time series prediction generates predictions on your dataset given a target and a temporal column.
- Predict Time Series. Predict using a form or the chat box.
- Univariate Analysis. Prediction with one measure variable.
- Multivariate Analysis. Prediction with one measure variable and one or more variables that affect the measure variable.
- Multiple Time Series. Predictions with multiple measure variables or with one or more grouping variables.
- Group Temporal Repetitions. Adjust datasets with multiple values per temporal variable.
Predict Time Series
To use the Predict form, select ML > Predict Time Series in the sidebar. The Predict form appears.
- Select at least one column that contains measure variables.
- Enter the number of values to predict.
- Select the column that contains your temporal variable.
- Optionally, select the method to use to predict the values. If you don't choose a method, DataChat chooses ARIMA, Prophet, or Theta for you. See Understand the Available Methods for more information on each method. The options are:
- ARIMA. Uses a statistical model.
- MLForecast. Uses a machine learning model (instead of a statistical model) to optimize your time component for predictions. Note that this method works only for univariate predictions.
- Prophet. Uses a statistical model.
- Theta. Uses a statistical model and is best for short-term predictions.
- Optionally, select a column that contains a variable that groups your data for better predictions.
- Click Submit.
DataChat applies the selected forecasting method (or automatically selects one) to generate the specified number of predicted values for the measure column, and creates a new dataset that includes the predicted values. If only one measure variable is specified, the new dataset is named "PredictedTimeSeries_<measure variable>". If more than one measure variable is specified (multiple time series, each of which can be univariate or multivariate), the new dataset is named "PredictedTimeSeries".
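The naming rule above can be sketched in a few lines of Python. This is only an illustration: `predicted_dataset_name` is a hypothetical helper, not a DataChat function, and it covers only the two cases the text describes (one measure variable, or more than one).

```python
def predicted_dataset_name(measure_columns):
    """Return the output dataset name for a list of measure columns.

    Hypothetical helper that mirrors the naming rule described in the text.
    """
    if not measure_columns:
        raise ValueError("at least one measure column is required")
    if len(measure_columns) == 1:
        # Single measure variable: the name carries the measure's name.
        return f"PredictedTimeSeries_{measure_columns[0]}"
    # More than one measure variable: a shared, generic name.
    return "PredictedTimeSeries"


print(predicted_dataset_name(["Bike_North"]))
print(predicted_dataset_name(["Bike_North", "Ped_North"]))
```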
The current dataset is set to the new, generated dataset. To run a different analysis on the initial dataset, set the current dataset to the initial dataset.
The new dataset is used to generate a visualization, which displays as a popup by default. After you close the popup, the visualization appears in the chart panel.
To build a DataChat sentence in the chat box, see Predict Time Series.
Understand the Available Methods
When predicting time series values, there are four methods available:
- ARIMA
- MLForecast
- Prophet
- Theta
Each method has its own advantages and disadvantages. If you don't manually select a method when using Predict, DataChat looks at your data and decides which method would work best for the prediction. In this section, we'll cover each method in more detail.
ARIMA
The autoregressive integrated moving average (ARIMA) method uses a statistical analysis model with time series data to either better understand the dataset or to predict future trends. This method works best for short-term predictions on data recorded at short intervals.
Strengths:
- Short-term forecasting
- Needs only historical data
Weaknesses:
- Long-term forecasting
- Predicting turning points
MLForecast
The MLForecast method uses machine learning regression models instead of statistical models for time series forecasting. It works best with large datasets that have well-engineered features.
Strengths:
- Fast and accurate
- Flexible and tunable for accuracy
Weaknesses:
- No confidence intervals
- Performance depends on the size of the dataset and how well the dataset's features were engineered.
Prophet
The Prophet method is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality along with holiday effects. It works best with datasets that have strong seasonal effects and several seasons of historical data.
Strengths:
- Supports forecasting within a range.
- Automatically finds seasonal trends.
- Fast and accurate.
Weaknesses:
- Does not work well outside of seasonal predictions.
- Inputs must be date or datetime values.
Theta
The Theta method is a simple method that uses statistical models to smooth out the data and create predictions.
Strengths:
- Short-term forecasting
- Works with seasonal and stationary data.
Weaknesses:
- Long-term forecasting
- Not as flexible as other methods.
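To give a feel for what "smoothing out the data" means, here is a minimal sketch of simple exponential smoothing, a building block commonly associated with the Theta method. It is not DataChat's implementation and it is not the full Theta method: it omits the trend (drift) component, and the function names and the `alpha` default are assumptions made for illustration.

```python
def smoothed_level(values, alpha=0.5):
    """Simple exponential smoothing: blend each new value into a running level.

    alpha close to 1 follows the latest values; alpha close to 0 smooths more.
    """
    if not values:
        raise ValueError("values must not be empty")
    level = values[0]
    for x in values[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def flat_forecast(values, horizon, alpha=0.5):
    """Repeat the final smoothed level for each future interval (no trend)."""
    return [smoothed_level(values, alpha)] * horizon


print(flat_forecast([10, 12, 14], horizon=3))
```

Because the forecast is flat, a sketch like this is only reasonable a few intervals ahead, which matches the short-term focus listed above.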
Univariate Prediction
A univariate analysis involves a single measure variable for each time variable. A univariate time series prediction requires:
- At least one measure column that contains target values to predict.
- The number of time intervals to predict.
- One temporal column that contains time interval variables, of date/time type, with one measure variable per time interval.
If DataChat detects repetitions in the temporal columns, you are prompted to aggregate repetitions in the measure variable.
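If you want to check for repetitions before running a prediction, the idea can be sketched as follows. The helper name and the string dates are illustrative assumptions; DataChat performs its own detection.

```python
from collections import Counter


def repeated_temporal_values(temporal_values):
    """Return the temporal values that appear on more than one row."""
    counts = Counter(temporal_values)
    return sorted(t for t, n in counts.items() if n > 1)


times = ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"]
print(repeated_temporal_values(times))
```

An empty result means that each time interval has exactly one row, which is what a univariate prediction expects.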
You can use the Predict form to perform univariate predictions.
Upon success:
The chat box includes a dropdown of the model scores to provide context about the success of the prediction model:
- SMAPE.
- Mean absolute error.
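For reference, both scores can be computed as in the sketch below. SMAPE (symmetric mean absolute percentage error) has several variants in common use; this version divides each absolute error by the mean of the absolute actual and predicted values and reports a percentage. Whether DataChat uses exactly this variant is an assumption.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the measure variable."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, as a percentage (0 to 200)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a pair of zeros as a perfect prediction to avoid dividing by zero.
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return 100 * total / len(actual)


actual = [100, 200]
predicted = [110, 180]
print(mean_absolute_error(actual, predicted))
print(round(smape(actual, predicted), 2))
```

Lower values are better for both scores. Mean absolute error depends on the scale of the data, while SMAPE is relative and easier to compare across measure variables.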
The chat box includes an additional dropdown for model introspection. This includes a link to preview the Model Introspection table. This table provides detailed information about candidate models and their parameters.
The chart's legend shows:
- Data points for the specified measure variable.
- Interpolated data points for the specified measure variable.
- Predicted values for the specified number of time intervals.
- Data points for the percentage of the dataset that was used as a testing holdout, if specified. Otherwise, validation points are displayed to show model performance on the validation set (the last 10%).
- The confidence interval for time series prediction.
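The validation points in the legend come from holding back the end of the series. A minimal sketch of that split, assuming the holdout is simply the last 10% of rows in time order (the helper name and rounding rule are illustrative, not DataChat's):

```python
def split_validation(values, fraction=0.10):
    """Split a time-ordered list into (training, validation) parts.

    The validation part is the final `fraction` of the values, at least one.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to split")
    n_val = max(1, int(len(values) * fraction))
    return values[:-n_val], values[-n_val:]


series = list(range(20))
train, validation = split_validation(series)
print(len(train), len(validation))
```

The split keeps time order intact: the model is fit on the earlier values and scored on the later ones, never the other way around.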
If you specify multiple time series, DataChat includes links to other charts.
Multivariate Prediction
A multivariate analysis involves at least two variables for each time variable. A multivariate time series prediction requires:
- At least one measure column that contains target values to predict. More than one column generates multiple time series.
- The number of time intervals to predict.
- One temporal column that contains time interval variables, of date/time type.
- At least one column that could influence the measure column.
If DataChat detects multiple measure variables per time interval, you are prompted to group temporal repetitions.
You can use the Predict form to perform multivariate predictions.
Upon success:
The chat box includes a dropdown of the model scores to provide context about the success of the prediction model:
- SMAPE.
- Mean absolute error.
The chat box includes an additional dropdown for model introspection. This includes a link to preview the Model Introspection table. This table provides detailed information about candidate models and their parameters.
Click here to run a univariate prediction. DataChat provides this link to run a univariate analysis without the selected columns that could influence the measure column. The link appears after a successful multivariate analysis, and also when the specified variables are unsuitable for multivariate analysis, for example because there isn't enough data.
The chart displays the multivariate time series prediction. The chart's legend displays:
- Blue circles. The data points for the specified measure variable.
- Teal stars. The predicted values for the specified number of time intervals.
- Orange triangles. The validation values.
If you specify multiple time series, DataChat includes links to other charts.
Multiple Time Series
You can run multiple time series analyses if you either select more than one measure variable or specify a grouping variable. DataChat runs a separate time series analysis—either univariate or multivariate—for each variable. Every variable must share the same time axis.
You can use the Predict form to perform both univariate and multivariate predictions.
Examples:
- Generate two univariate time series predictions with two measure variables:
Predict time series with measure columns Bike_North, Ped_North for the next 6 values of Time
- Generate multiple univariate time series predictions with a grouping variable:
Predict time series with measure columns Total_Counts for the next 6 values of Time for each Bike_North
- Generate multiple multivariate time series predictions with one grouping variable and another variable that might affect the measure variable:
Predict time series with measure columns Total_Counts for the next 7 values of Time for each Bike_North using the columns Ped_North
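The three example sentences follow a common pattern, which the sketch below assembles from its parts. `build_predict_sentence` is a hypothetical helper written for this page, not part of DataChat, and the pattern is inferred only from the examples above. Joining several influencing columns with commas is an assumption.

```python
def build_predict_sentence(measures, n_values, temporal,
                           group_by=None, influencers=None):
    """Assemble a Predict Time Series sentence like the examples above."""
    sentence = (
        f"Predict time series with measure columns {', '.join(measures)} "
        f"for the next {n_values} values of {temporal}"
    )
    if group_by:
        # Grouping variable: one time series per group value.
        sentence += f" for each {group_by}"
    if influencers:
        # Columns that might affect the measure (multivariate).
        sentence += f" using the columns {', '.join(influencers)}"
    return sentence


print(build_predict_sentence(["Bike_North", "Ped_North"], 6, "Time"))
print(build_predict_sentence(["Total_Counts"], 7, "Time",
                             group_by="Bike_North", influencers=["Ped_North"]))
```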
Group Temporal Repetitions
If DataChat detects that the column containing your temporal variable has at least one repeated value, meaning that at least two rows contain different measure variables for the same temporal variable, DataChat will prompt you with options to aggregate the measure variables. You can choose by clicking one of the following options, displayed as links:
- Average
- Maximum
- Median
- Minimum
- Total
For each duplicated temporal variable, the measure variables are aggregated as selected in a new dataset called <dataset name>_Compute. After you've addressed the repetitions, try your prediction again.
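To see what each aggregation option does to repeated rows, here is a minimal sketch in plain Python. The row layout (pairs of temporal value and measure value) and the helper name are assumptions made for illustration. The option names match the links listed above.

```python
from collections import defaultdict
from statistics import mean, median

# Map each option shown by DataChat to an equivalent Python function.
AGGREGATIONS = {
    "Average": mean,
    "Maximum": max,
    "Median": median,
    "Minimum": min,
    "Total": sum,
}


def aggregate_repetitions(rows, how="Average"):
    """Collapse rows that share a temporal value into one row each.

    rows: iterable of (temporal_value, measure_value) pairs.
    """
    grouped = defaultdict(list)
    for temporal_value, measure_value in rows:
        grouped[temporal_value].append(measure_value)
    aggregate = AGGREGATIONS[how]
    return [(t, aggregate(vs)) for t, vs in sorted(grouped.items())]


rows = [("2024-01-01", 10), ("2024-01-01", 20), ("2024-01-02", 5)]
print(aggregate_repetitions(rows, "Average"))
print(aggregate_repetitions(rows, "Total"))
```

After aggregation, each temporal value appears exactly once, which is the shape the prediction needs.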