Version: 0.32.2

# Create, Explore, and Predict Moving Averages with Time Data

In this example, we'll explore trends in bike share data to learn how to create moving averages and predict how the average might change in the future.

To start, `Load` the "BikeShare.xlsx Dataset" into your session. We're given a sample of our data that looks something like this:

### Create Exploratory Plots​

Let's try to plot some of this data and see if we can see any trends:

1. Click Plot in the sidebar to open the Chart Builder.

2. Select "Bar" for the Type.

3. Under Required Fields, enter "date" for the X-Axis and "allRiders" for the Y-Axis.

4. Under Optional Fields, enter "weatherSituation" for the slider.

5. Click Submit.

We can see some broad trends over time, but the data is noisy and we can see that there are some outliers that can skew our trend lines.

### Create a Moving Average​

With data like this, a moving average helps smooth out our data so it's not so affected by outliers and errors. Let's now calculate a seven day moving average of daily ridership (indicated by the value of the allRiders column). We can do by using the Create form to create a new window column that takes a specified range of rows (which represent a date in our case) and computes the average number of new riders over that time frame:

1. Click Column > Create > Using a Window Query in the sidebar.

2. Enter "7dayMovingAverage" for the name.

3. Enter "Create a new window column 7dayMovingAverage as average allRiders for each weatherSituation sorted by the columns date computed over the row range that looks back from the current position by 7 and looks forward by 0

tip

It's best practice to keep your time variable (such as dates) on the x-axis when you're exploring trends over time.

Before we continue, let's break down this step a little bit:

• `Create a new window column 7DayMovingAverage` tells DataChat what type of column it should be creating and what it should be called.

• `as average allRiders` tells DataChat what the primary calculation should be. In this case, we want to compute the average number of riders.

• `for each weatherSituation` tells DataChat to compute the average number of "allRiders" for each value of the "weatherSituation" column.

• `sorted by the columns called date` tells DataChat how to sort the final results. In our case, we want to sort the resulting dataset by the date column.

• `computed over the row range that looks back from the current position by 7 and looks forward by 0` is the other key piece of this step. This section defines the time frame, or window, DataChat should use when it calculates the average number of "allRiders" for each "weatherSituation". In this case, we want to use the number of "allRiders" from the last seven days (each of which is a row) and include the current day in the calculation.

Now that we understand how we created our moving average, let's see how it helps our plots. Use the step below to create another bar chart. This time, we'll use our moving average instead of the raw "allRiders" count:

1. Click Plot in the sidebar.

2. Select "Bar" for Type

3. Under Required Fields, enter "date" for the X-Axis and "7DayMovingAverage" for the Y-Axis.

4. Under Optional Fields, enter "weatherSituation" for the slider.

5. Under Sample, select "All Rows".

6. Click Submit.

Notice how much more smooth and pronounced the trend is with the moving average. Also, notice that while we still have some outliers, they aren't nearly as drastic. With the moving average, we can clearly see "allRiders" dropping quickly and staying low over as weatherSituation values increase.

### Predict Time Series Data​

Looking at this dataset, we might ask, "What does the future look like?" In DataChat, we can quickly answer that question with the `Train` skill. In this case, let's focus on how the moving average will change over the next seven days:

1. Click ML > Train Time Series in the sidebar.

2. Enter "7DayMovingAverage" for the Measure Column.

3. Enter "7" for the Number of Values to Predict.

4. Enter "allRiders" for the Temporal Column.

5. Enter "weatherSituation" for the Partition Column.

6. Click Submit.

The orange triangles represent the predicted value for each day of the next week. The gray area represents the confidence interval, which means that DataChat is 80% confident that the real value on that day will fall within that shaded area.

### Full Recipe​

You can copy and paste the following steps into your session to recreate this recipe:

``Load data from the file <strong>BikeShare.xlsx</strong>Plot a chart with the specification {...}Create a new window column  7dayMovingAverage as average allRiders for each weatherSituation sorted by the columns date computed over the row range that looks back from the current position by 7 and looks forward by 0Plot a chart with the specification <strong>{"caption":"","typeVersion":2,"plot":{"series":[{"type":"bar","mark":{"x":"date","y":"7dayMovingAverage"},"group":{"slider":"weatherSituation"}}],"presentation":{"title":{"textStyle":{"fontSize":18},"text":["Average 7dayMovingAverage vs. date"]},"xaxis":{"nameTextStyle":{"fontSize":18},"axisLabel":{"fontSize":12},"text":"date"},"yaxis":{"nameTextStyle":{"fontSize":18},"axisLabel":{"fontSize":12},"text":"Average 7dayMovingAverage"},"annotations":[],"colorOverride":[]}},"values":{"complete":false,"metadata":{},"columns":[{"name":"7dayMovingAverage","type":"Float","unique":3734},{"name":"allRiders","type":"Integer","unique":869},{"name":"casualRiders","type":"Integer","unique":322},{"name":"date","type":"Date","unique":731},{"name":"holiday","type":"Integer","unique":2},{"name":"hour","type":"Integer","unique":24},{"name":"normalizedWindSpeed","type":"Float","unique":30},{"name":"registeredRiders","type":"Integer","unique":776},{"name":"relativeHumidity","type":"Float","unique":89},{"name":"seasonCode","type":"Integer","unique":4},{"name":"temperatureRelativeTo41C","type":"Float","unique":50},{"name":"weatherSituation","type":"Integer","unique":4},{"name":"weekDay","type":"Integer","unique":7},{"name":"workingDay","type":"Integer","unique":2}],"aggregate":[{"expression":"Average","columns":["7dayMovingAverage"],"groupBy":["date","weatherSituation"]}],"transforms":[],"bins":[],"reference":{"type":"dataset","params":{"session_id":"birkb8x7yyoglvg2","dataset":"BikeShare","version":2}}}}</strong>Train time series with measure columns 7dayMovingAverage for the next 7 values of allRiders for each weatherSituation with the config {...}``