When we use DataChat, analysis is augmented with visualization and vice versa. Visualizations provide an easier way to understand your data and simplify how you share your insights with others.
We'll step through how to build and annotate a chart from datasets.
Open the BikeShare Dataset
The downloaded file, "BikeShare.xlsx", contains the sheets "BikeShare", "SeasonDecode", and "WeatherDecode". When an Excel file loads into a new session, each sheet loads as a separate dataset. The "All Datasets" table displays all the datasets loaded into the session.
Once we close the "All Datasets" table, the "BikeShare" dataset displays in the dataset panel of Grid mode. By default, all new sessions open in Grid mode. You can change the default in Settings > Sessions.
Name the session – Double-click on "Unnamed" and enter "BikeSharePlots".
Before diving into visualization, explore the dataset. Use the scrollbar at the bottom of the dataset panel to see the full dataset. Adjust the divider between columns to expand the column name. Hover over the column name and click the menu. See Guided Learning in the Work with Your Data chapter for more details on how to explore your dataset.
The BikeShare dataset captures the activity of bike riders during the period Jan. 1, 2011 through Feb. 26, 2012. It also captures weather data and information about dates and times.
Click Plot in the sidebar to open the Chart Builder.
Explore Your Data with Chart Builder
The Chart Builder provides an interactive interface for refining our analysis through visualization. Let's ask a few questions of our data.
When do people ride their bikes?
Select "hour" from the X-Axis dropdown and "allRiders" from the Y-Axis dropdown.
By default, the Chart Builder starts with the scatter chart option.
Hover over the options to see what other plot types are available.
In the chart display panel, hover over the lines for each hour to see the individual data points that make up each line.
We can see the lowest number of riders at 4 am, ramping up to one local maximum at 8 am, and an overall maximum at 5 pm – two peaks per weekday. This corresponds to what we'd guess for rush-hour travel.
Change the chart type to "Violin".
The violin chart gives a more nuanced perspective of the distribution of riders. But for now, let's return to the scatter chart.
Click the "Scatter" chart type.
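The hour-of-day pattern read off the scatter chart above can also be checked numerically. The following is a minimal Python sketch, not a DataChat feature: it averages allRiders per hour and reports the quietest and busiest hours. The rows are made-up illustrative values, not data from BikeShare.xlsx, and the variable names are assumptions chosen for this example.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows only (NOT taken from BikeShare.xlsx): (hour, allRiders).
rows = [
    (4, 5), (4, 7),
    (8, 300), (8, 340),
    (12, 180), (12, 200),
    (17, 420), (17, 460),
    (22, 90), (22, 110),
]

# Collect every allRiders value observed for each hour.
riders_by_hour = defaultdict(list)
for hour, all_riders in rows:
    riders_by_hour[hour].append(all_riders)

# Average ridership per hour: the summary we otherwise estimate by eye
# from the column of points above each hour in the scatter chart.
mean_by_hour = {
    hour: mean(values) for hour, values in sorted(riders_by_hour.items())
}

# Hours with the lowest and highest average ridership.
quietest_hour = min(mean_by_hour, key=mean_by_hour.get)
busiest_hour = max(mean_by_hour, key=mean_by_hour.get)

print(mean_by_hour)
print("quietest:", quietest_hour, "busiest:", busiest_hour)
```

With real data, the same grouping would run over every row of the BikeShare dataset rather than this handful of sample points.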
Does it look the same on holidays?
In Optional Fields, select "holiday" from Subplot.
We can see that on holidays, riders start riding at about the same time, but stay out later.
Setting a subplot separates the main plot into subplots based on the values of the column chosen in Subplot. In this case, holiday is of Boolean type – click Describe at the bottom of the chart display panel to verify. When we hover over a data point, we can see that the leftmost chart includes points where holiday is 0, while for the rightmost chart, holiday is 1.
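Conceptually, a subplot split is a group-by on the chosen column: each distinct value gets its own panel holding only the matching rows. The sketch below illustrates that idea in plain Python. It is not how DataChat implements Subplot, the helper name split_by is hypothetical, and the rows are made-up values rather than data from BikeShare.xlsx.

```python
from collections import defaultdict

# Illustrative rows only (NOT from BikeShare.xlsx): (hour, allRiders, holiday).
rows = [
    (8, 320, 0), (17, 450, 0), (22, 100, 0),
    (8, 60, 1), (17, 210, 1), (22, 150, 1),
]

def split_by(rows, key_index):
    """Group rows into one list per distinct value in the chosen column."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[key_index]].append(row)
    # Sort by key so panels come out in a stable left-to-right order (0, then 1).
    return dict(sorted(panels.items()))

# Split on the holiday column (index 2), as selecting "holiday" in Subplot does.
panels = split_by(rows, key_index=2)

for holiday_value, panel_rows in panels.items():
    # Each panel plots the same X and Y columns, restricted to its own rows.
    points = [(hour, riders) for hour, riders, _ in panel_rows]
    print(f"holiday={holiday_value}: {points}")
```

Because holiday has only two values, the split yields exactly two panels, matching the left (0) and right (1) charts described above.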
Do registered riders and casual riders have different hours?
Change Y-Axis to "registeredRiders".
We can see that the rush-hour peaks we saw earlier for allRiders remain.
Change Y-Axis to "casualRiders".
The profile of rides changes from two rush-hour peaks to a single bell-shaped curve that rises between 10 am and 8 pm, with a maximum around 2 pm.
Note how the number of casualRiders is far lower than the number of allRiders. If we used Dataset > Describe during exploration, we would see that there are 322 unique casualRiders and 776 unique registeredRiders – with a total of 1098 unique allRiders.
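A "unique" count of this kind is the number of distinct values that appear in a column. As a rough illustration of what such a summary computes, here is a minimal Python sketch. It is not DataChat's Describe implementation, the helper name distinct_counts is hypothetical, and the rows are made-up values rather than data from BikeShare.xlsx.

```python
# Illustrative rows only (NOT from BikeShare.xlsx):
# (casualRiders, registeredRiders, allRiders).
rows = [
    (3, 13, 16),
    (8, 32, 40),
    (3, 37, 40),
    (0, 1, 1),
]
columns = ("casualRiders", "registeredRiders", "allRiders")

def distinct_counts(rows, columns):
    """Return the number of distinct values found in each named column."""
    return {
        name: len({row[i] for row in rows})  # a set keeps each value once
        for i, name in enumerate(columns)
    }

counts = distinct_counts(rows, columns)
print(counts)
```

A distinct count describes how many different values a column takes, not how many riders there were, so it is best read alongside the chart rather than in place of it.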
So far, we have been building the chart within the Chart Builder. When it's in a form we want to save, click Submit.