Skip to main content
Version: 0.32.2

Summarize Large Datasets with Pivot and Reshape

Summarizing large datasets can help you to better understand, identify, and compare specific aspects of your data. In this example, we'll demonstrate how to create pivot tables and reshape large datasets to compare registered riders from a bike share dataset.

Load Our Data

To start, Load the "BikeShare.xlsx Dataset" into your session. We're given a sample of our data that looks something like this:

Bikeshare Data

Extract Our Data

Let's prepare our data by using the Extract skill to find the quarter of the year for each row. Later, we'll use this information to dig into the data further. We can:

  1. Click Column > Create > Extract to open the Extract form.
  2. Enter "date" for the column, "week" for the date parts to extract, and "Week" for the name of the extracted column.
  3. Click Submit.

Extract form

This gives us the following table. As we can see, the dates of recorded registered riders are now listed with their corresponding week.

extracted weeks table

Create a Pivot Table

Currently, our dataset has over 17,000 rows. Let's create a pivot table to make this dataset more manageable. Let's say that we'd like to show the total number of registered riders for each week and weather situation. We can:

  1. Click Dataset > Pivot to open the Pivot form.
  2. Select "total" for the Aggregate and "registeredRiders" for the Column.
  3. Select "Week" for the Pivot Row and "weatherSituation" for the Pivot Column.
  4. Click Submit.

pivot form

This step creates a new pivot table, "totalRegisteredRidersPivot", that summarizes the total number of registered riders for each week in each weather situation.

the pivot table

Reshape Our Data

We can then investigate these findings a bit more using the Reshape skill. This shows us a different way to look at our data, helping us to better visualize comparisons. To compare the registered riders data between weather situation 1 and 3:

  1. From "registeredRidersPivot" click More options > Use this dataset.

  2. Click Dataset > Reshape in the sidebar to open the Reshape form.

  3. Select Long Form.

  4. Enter "Week" for the Row Identifier.

  5. Enter "1" and "3" for the Values.

  6. Click Submit.

    reshape form

Our new table is called "longReshape" and looks something like this:

the reshape table

Reshape takes our previous table from wide form to long form, showing us a comparison between registered riders for weather situations 1 and 3.

From here, we can create a line chart to illustrate the change in registered riders for both weather situations over each week:

  1. Click Plot in the sidebar to open the Chart Builder.
  2. Select "Line Chart" for type.
  3. Under Required Fields, enter "Week" for the X-Axis and "Values" for the Y-Axis.
  4. Under Optional Fields, enter "Variables" for Group.
  5. Click Submit.

the bar chart

Using this chart, we are able to view a side by side comparison of total registered riders between weather situations 1 and 3. Registered riders is consistently higher under weather situation 1.

Full Recipe

Here is the entire recipe we used in this topic:

Load data from the file <strong>BikeShare.xlsx</strong>
Extract the week from the column date and call the output columns Week
Pivot on total registeredRiders creating rows from Week and columns from weatherSituation
Use the dataset called totalRegisteredRidersPivot version 1
Reshape the dataset current to long form, creating row IDs for each Week, and creating a Values column from 1, 3,
Drop the rows where <strong>Week</strong> contains <strong>TOTAL</strong>
Plot a chart with the specification <strong>{"caption":"","typeVersion":2,"plot":{"series":[{"type":"line","mark":{"x":"Week","y":"Values"},"group":{"subplot":"Variables"}}],"presentation":{"title":{"textStyle":{"fontSize":18},"text":["Average Values vs. Week"]},"xaxis":{"nameTextStyle":{"fontSize":18},"axisLabel":{"fontSize":12},"text":"Week"},"yaxis":{"nameTextStyle":{"fontSize":18},"axisLabel":{"fontSize":12},"text":"Average Values"},"annotations":[],"colorOverride":[]}},"values":{"complete":false,"metadata":{},"columns":[{"name":"Values","type":"Integer","unique":103},{"name":"Variables","type":"Integer","unique":2},{"name":"Week","type":"Integer","unique":52}],"aggregate":[{"expression":"Average","columns":["Values"],"groupBy":["Week","Variables"]}],"transforms":[],"bins":[],"reference":{"type":"dataset","params":{"session_id":"u3oip42f5csuwkuv","dataset":"longReshape","version":1}}}}</strong>