Guided Learning
When we use DataChat, analysis is augmented with visualization and vice versa. Visualizations provide an easier way to understand your data and simplify how you share your insights with others.
We'll step through how to build and annotate a chart from datasets.
Open the BikeShare Dataset
From the homepage, open a new session. Download "BikeShare dataset (Excel)" from DataChat Training and load it into the session.
BikeShare.xlsx contains three sheets: "BikeShare", "SeasonDecode", and "WeatherDecode". When an Excel file loads into a session, each sheet loads as a separate dataset, and the "All Datasets" table lists every dataset loaded into the session. Once we close the "All Datasets" table, the "BikeShare" dataset displays in the dataset panel of Grid mode.
The BikeShare dataset captures the activity of bike riders during the period Jan. 1, 2011 through Feb. 26, 2012. It also captures weather and seasonal data.
Explore Your Data with Chart Builder
The Chart Builder provides an interactive interface for refining our analysis through visualization. To open it, click Plot in the sidebar.
Now, let's ask a few questions about our data.
When do people ride their bikes?
In the Chart Builder, select "hour" for the X-Axis and "allRiders" for the Y-Axis.
By default, the Chart Builder starts with the scatter chart. Hover over the other options to see which plot types are available.
In the chart display panel, hover over the lines for each hour to see the individual data points that make up each line.
Ridership is lowest at 4 AM, ramps up to a local maximum at 8 AM, and reaches its overall maximum at 5 PM: two peaks per weekday. This matches what we'd expect from rush-hour commuting.
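The scatter chart plots every row, so the "low at 4 AM, peaks at 8 AM and 5 PM" reading is visual. As a minimal sketch of how that reading could be checked numerically outside DataChat, assuming the BikeShare rows have been exported (for example, to CSV and read with csv.DictReader) as dictionaries keyed by the column names hour and allRiders, the code below averages riders per hour and lists the hours that are local maxima. The function names are hypothetical and are not part of DataChat.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(rows, value_column="allRiders"):
    """Hypothetical helper: return {hour: mean of value_column}, sorted by hour."""
    by_hour = defaultdict(list)
    for row in rows:
        by_hour[int(row["hour"])].append(float(row[value_column]))
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def local_maxima(profile):
    """Return hours whose mean is strictly greater than both neighbouring hours.

    Neighbours are the adjacent hours present in the profile, so gaps in the
    data are skipped rather than treated as zero.
    """
    hours = sorted(profile)
    peaks = []
    for i in range(1, len(hours) - 1):
        prev_h, h, next_h = hours[i - 1], hours[i], hours[i + 1]
        if profile[h] > profile[prev_h] and profile[h] > profile[next_h]:
            peaks.append(h)
    return peaks
```

The hour with the lowest average is then `min(profile, key=profile.get)`, and passing `value_column="casualRiders"` or `"registeredRiders"` gives the profiles explored later in this section.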
Let's change the chart type to "Violin".
The violin chart gives a more nuanced view of how riders are distributed within each hour. For now, though, let's return to the scatter chart.
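A violin chart draws the full shape of each hour's distribution. A coarse numeric stand-in is the quartiles of riders per hour. The sketch below is plain Python using the standard library's statistics.quantiles, not DataChat's implementation, and it assumes the same exported rows (dictionaries keyed by hour and allRiders) as any outside-DataChat check would need; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def hourly_quartiles(rows, value_column="allRiders"):
    """Hypothetical helper: return {hour: (Q1, median, Q3)} of value_column."""
    by_hour = defaultdict(list)
    for row in rows:
        by_hour[int(row["hour"])].append(float(row[value_column]))
    summary = {}
    for hour, values in sorted(by_hour.items()):
        if len(values) < 2:
            continue  # skip hours with a single observation; no spread to summarize
        q1, q2, q3 = quantiles(values, n=4)  # three cut points split the data into quarters
        summary[hour] = (q1, q2, q3)
    return summary
```

A wide gap between Q1 and Q3 for an hour corresponds to a wide violin at that hour; this summary does not reproduce the density curve itself.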
Does it look the same on holidays?
Under Optional Fields, select "holiday" for Subplot.
On holidays, riders start riding at about the same time but stay out later.
Setting a subplot splits the main plot into one subplot per distinct value of the column chosen in Subplot. Here, holiday is an Integer column that takes only the values 0 and 1, so we get two subplots. To verify the column type, click Describe at the bottom of the chart display.
Looking at the chart titles, we can see that the leftmost chart includes points where holiday is 0, while for the rightmost chart, holiday is 1.
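The Subplot field amounts to grouping rows by each distinct value of a column. A hypothetical plain-Python sketch of that split, assuming exported rows as dictionaries, is shown below; sorting the groups in ascending order matches the left-to-right order described above (0, then 1), although how DataChat orders panels in general is an assumption.

```python
from collections import defaultdict


def split_for_subplots(rows, subplot_column):
    """Hypothetical helper: group rows by each distinct value of subplot_column.

    Returns an ordered dict-like mapping {value: [rows...]}, ascending by value,
    with one entry per panel.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[subplot_column]].append(row)
    return dict(sorted(groups.items()))
```

Calling `split_for_subplots(rows, "holiday")` on a 0/1 column yields exactly two groups, one per panel.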
Do registered riders and casual riders have different hours?
Let's change Y-Axis to "registeredRiders".
The rush-hour peaks we saw with "allRiders" remain.
We can also change Y-Axis to "casualRiders".
The ride profile changes from rush-hour peaks to a bell curve: ridership rises between 10 AM and 8 PM and peaks around 2 PM.
Note how much smaller the casualRiders values are than the allRiders values. If we ran Dataset > Describe during exploration, we would see 322 unique values for casualRiders and 776 for registeredRiders, and 1098 for allRiders.
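A count of unique values measures how many different values a column takes, not how many riders it records; comparing ridership volume calls for sums instead. A minimal plain-Python sketch of the two statistics, assuming exported rows as dictionaries keyed by column name (the helper names are hypothetical), makes the distinction concrete.

```python
def unique_count(rows, column):
    """Hypothetical helper: number of distinct values in column."""
    return len({row[column] for row in rows})


def column_total(rows, column):
    """Hypothetical helper: sum of column across all rows (total riders recorded)."""
    return sum(float(row[column]) for row in rows)
```

Two rows with 3 casual riders each have one unique casualRiders value but a total of 6 casual riders, so unique counts should not be added together or read as rider totals.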
What does the weather look like when people ride?
Let's create a couple more charts to investigate how weather conditions impact riders. Select Single Metric Chart and enter "relativeHumidity" for the column.
This chart displays a single value: in this case, the average relative humidity, 63%. By default, the aggregate for the selected column is average. To change it, click the column name and select a different aggregate.
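A Single Metric Chart collapses a whole column into one number. The sketch below shows that idea in plain Python with a swappable aggregate that defaults to the average; the aggregate names in AGGREGATES and the function name are hypothetical, and DataChat's own list of aggregates may differ.

```python
from statistics import mean, median

# Hypothetical mapping of aggregate names to functions (not DataChat's list).
AGGREGATES = {
    "average": mean,
    "median": median,
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def single_metric(rows, column, aggregate="average"):
    """Collapse one column to a single number, defaulting to the average."""
    values = [float(row[column]) for row in rows]
    return AGGREGATES[aggregate](values)
```

Changing the aggregate in the chart corresponds here to passing a different `aggregate` name, for example `single_metric(rows, "relativeHumidity", "median")`.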
Let's try a different chart type. Click Line Chart and enter "temperatureRelativeto41C" for the X-Axis and "allRiders", "casualRiders", and "registeredRiders" for the Y-Axis.
As temperature increases, so does ridership for every type of rider. Both registered riders and all riders, however, decrease significantly at the highest temperatures.
Let's create one last chart to explore weather situations. Select Stacked Bar Chart, then enter "weatherSituation" for the X-Axis, average "allRiders" for the Y-Axis, and "seasonCode" for the Partition.
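A line chart with several Y columns draws one series per column against a shared X-Axis. As a rough sketch, assuming rows that share an X value are averaged (an assumption; the chart's own aggregation may differ) and exported rows keyed by column name, the hypothetical helper below builds those series in plain Python.

```python
from collections import defaultdict
from statistics import mean


def multi_series(rows, x_column, y_columns):
    """Hypothetical helper: return {y_column: [(x, mean y), ...]} sorted by x."""
    # grouped[y][x] collects every value of column y observed at that x.
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        x = float(row[x_column])
        for y in y_columns:
            grouped[y][x].append(float(row[y]))
    return {
        y: [(x, mean(vals)) for x, vals in sorted(grouped[y].items())]
        for y in y_columns
    }
```

For this chart the call would be `multi_series(rows, "temperatureRelativeto41C", ["allRiders", "casualRiders", "registeredRiders"])`.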
This chart shows the average number of riders in each season for each weather situation. Seasons 2 and 3 (summer and fall) have the highest ridership in weather situations 1 and 2 (clear and cloudy), and there is almost no ridership in weather situation 4 (heavy rain).
So far, we have built the chart within the Chart Builder. When the chart is ready to save, click Submit.
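Each bar in the stacked chart is one X value split into segments by the Partition column, and each segment's height is the average of the Y column for that pair. A minimal plain-Python sketch of that table, assuming exported rows keyed by column name (the helper name is hypothetical), looks like this.

```python
from collections import defaultdict
from statistics import mean


def stacked_bar_data(rows, x_column, y_column, partition_column):
    """Hypothetical helper: return {x: {partition: mean y}}, both levels sorted."""
    # cells[x][partition] collects every y value for that (x, partition) pair.
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[x_column]][row[partition_column]].append(float(row[y_column]))
    return {
        x: {p: mean(vals) for p, vals in sorted(parts.items())}
        for x, parts in sorted(cells.items())
    }
```

For this chart the call would be `stacked_bar_data(rows, "weatherSituation", "allRiders", "seasonCode")`; translating the numeric codes into names would require the SeasonDecode and WeatherDecode datasets.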