Explore
After you've loaded data into your session, explore it before diving into analysis. Exploring your data first helps you understand the types of data you're working with, the quality of your data, and some general statistics about it.
When a skill is applied to a dataset:
- If the skill creates a new dataset, the new dataset is named using the convention [dataset]_[Skill].
- If the skill alters your existing dataset, the result is saved to a new version named using the convention [dataset] v[x].
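As a minimal illustration of the two naming conventions, the Python sketch below builds the names a skill would produce. The function names are hypothetical and not part of DataChat, and the sketch assumes [x] is a plain integer version number.

```python
def created_dataset_name(dataset: str, skill: str) -> str:
    """Name for a new dataset created by a skill: [dataset]_[Skill] (hypothetical helper)."""
    return f"{dataset}_{skill}"


def new_version_name(dataset: str, version: int) -> str:
    """Name for a new version of an altered dataset: [dataset] v[x] (hypothetical helper)."""
    return f"{dataset} v{version}"


print(created_dataset_name("sales", "Describe"))  # sales_Describe
print(new_version_name("sales", 2))               # sales v2
```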
Interactive Dataset Panel
The Dataset Panel provides a number of ways to adjust your data.
Adjust Column Width
You can adjust the width of columns in the dataset panel by clicking on the divider between columns and dragging to expand or compress the column. You can also double-click the divider between columns to automatically resize the column to fit its contents.
Expand Cell Contents
If the contents of a cell are truncated (indicated by an ellipsis [...]), you can click the cell to expand it vertically without needing to change the column width.
Rename Columns
To rename a column, double-click the column name and enter the new name.
Use the More Options Menu
The More options menu has several options for exploring and organizing your data. From top to bottom, you can:
- Sort columns in ascending or descending order.
- Hide columns.
- Format numeric columns.
- Analyze a column.
- Change the column type.
- Describe a column.
- Drop a column.
- Rename a column.
Interactive Display Panel
The display panel provides a number of ways to interact with your objects.
Minimize and Expand Objects
Objects in the display panel, such as charts and tables, can be minimized or expanded as you work in DataChat. To minimize an object, click More options > Minimize <object>. To expand a minimized object, click More options > Expand. You can also minimize or expand multiple objects at once by selecting their checkboxes and then clicking Minimize or Expand in the sidebar.
View Objects in a Larger Window
To view an object from the display panel in a larger window, click More options > View in a larger window.
Describe
The Describe skill provides summary statistics and details about your dataset. Use it to confirm that each column's type is correct before continuing your analysis.
Describe a Dataset
There are several ways to show summary statistics about your data. The quickest is to click the Show Descriptive Statistics button in a table's header, which expands each column to display quick statistics about its values.
Clicking Dataset > Describe from the sidebar opens a popup table that shows statistics about each column in the current dataset along with counts, unique counts, and column types. The table is named "<dataset name>_Describe". The columns are listed in the same order as the columns in the dataset. Once you close the popup to continue working with your data, the output table appears in the chat history.
You can also enter one of the following in the chat box:
- Describe, which operates on the current dataset.
- Describe the dataset <dataset name>, to specify a dataset.
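To make the idea of a per-column summary concrete, here is a minimal Python sketch of a Describe-style table: one row per column, in the dataset's column order, with a count, a unique count, and a column type. It is not DataChat's implementation. The function names, the treatment of None as a missing value, and the simple type-inference rules are all assumptions for illustration.

```python
from typing import Any


def infer_type(values: list[Any]) -> str:
    """Rough type inference over non-missing values (assumed rules, not DataChat's)."""
    present = [v for v in values if v is not None]
    if not present:
        return "Unknown"
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        return "Integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "Float"
    return "String"


def describe(name: str, columns: dict[str, list[Any]]) -> tuple[str, list[dict[str, Any]]]:
    """Build a hypothetical '<dataset name>_Describe' summary table."""
    rows = []
    # dicts preserve insertion order, so rows follow the dataset's column order
    for column, values in columns.items():
        present = [v for v in values if v is not None]  # None treated as missing
        rows.append({
            "column": column,
            "count": len(present),
            "unique": len(set(present)),
            "type": infer_type(values),
        })
    return f"{name}_Describe", rows
```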
Describe a Column
There are several ways to view further details about a column.
- From the table generated by Describe, click a column's link to open a popup that shows the column's distribution chart along with further details. Once you close the popup to continue working with your data, the distribution chart appears in the chat history.
- Click Column > Describe in the sidebar and choose a column from the current dataset.
- In the dataset panel, click the column's More options menu and then click Describe.
- In the chat box, enter Describe the column <column name>.
If a column has few unique values (low cardinality), such as a Boolean column, a donut chart containing the count of records for each unique value is returned along with a table containing detailed statistics.
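The choice between a donut chart and a distribution chart can be sketched as a cardinality check. The Python below is a hypothetical illustration, not DataChat's logic: the documentation only says "few unique values," so the cutoff LOW_CARDINALITY_MAX = 10 is an assumed value, as is ignoring missing values.

```python
from collections import Counter
from typing import Any, Optional

# Hypothetical cutoff: the documentation only says "few unique values".
LOW_CARDINALITY_MAX = 10


def describe_column(values: list[Any]) -> tuple[str, Optional[dict[Any, int]]]:
    """Return the chart kind and, for low-cardinality columns, the record count per value."""
    counts = Counter(v for v in values if v is not None)  # missing values ignored (assumption)
    if len(counts) <= LOW_CARDINALITY_MAX:
        # Few unique values: one donut slice per value, sized by its record count
        return "donut", dict(counts)
    # Many unique values: fall back to a distribution chart
    return "distribution", None
```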
Describe a Dataset in Detail
You can also describe a dataset in detail to see more in-depth summary statistics on your data.
Enter one of the following in the chat box:
- Describe the current dataset in detail
- Describe the dataset <dataset name> in detail
Preview
Display a portion of a given dataset with Preview. The output appears as a new dataset, <dataset name>_Preview, and doesn't change the current dataset. You can preview the entire dataset, a portion of the dataset (a random sample or a percentage), or rows that meet a given condition. By default, a portion of the dataset is displayed in its original row order.
To preview your data, enter Preview the dataset <dataset name> in the chat box. A popup appears that shows the preview. When you close the popup, the table appears in the chat history. To apply more options, see Preview.
Sample
Display a portion of a given dataset with Sample. The output appears as a new dataset, <dataset name>_Sample, which becomes the current dataset. You can sample the entire dataset, a portion of the dataset (a random sample or a percentage), or rows that meet a given condition. By default, a portion of the dataset is displayed in its original row order.
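The practical difference between Preview and Sample is which dataset is current afterward. The toy Python model below is an illustrative assumption, not DataChat's API: the Session class, its methods, and the default of five rows are hypothetical, and both methods simply take the first rows in their original order.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Session:
    """Toy model of a session: named datasets plus a pointer to the current one."""
    datasets: dict[str, list[Any]] = field(default_factory=dict)
    current: Optional[str] = None

    def preview(self, name: str, n: int = 5) -> str:
        out = f"{name}_Preview"
        self.datasets[out] = self.datasets[name][:n]  # first n rows, original order
        return out  # self.current is left unchanged

    def sample(self, name: str, n: int = 5) -> str:
        out = f"{name}_Sample"
        self.datasets[out] = self.datasets[name][:n]  # first n rows, original order
        self.current = out  # the new sample becomes the current dataset
        return out
```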
To sample the current dataset, click Dataset > Sample in the sidebar. The Sample form opens:
- Select either a number of rows or a percentage of the dataset.
- Enter the number of rows or the percentage of the dataset to sample.
- Select either random or sequential sampling. For sequential sampling, the sample starts with the first row and moves down the dataset until the row or percentage limit is met.
- Optionally, click Conditions to specify which rows to sample.
- Optionally, to add another condition, click Add Another Option.
- Click Submit.
You can also enter Sample the dataset <dataset name> in the chat box. To apply more options, see Sample.
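The options in the Sample form can be sketched as a single Python function. This is a hypothetical illustration, not DataChat's implementation. It assumes that conditions are combined with AND, that a percentage is taken of the rows that satisfy the conditions and rounded down, and that a random sample keeps its rows in their original order.

```python
import math
import random
from typing import Any, Callable, Optional, Sequence

Row = dict[str, Any]


def sample_rows(
    rows: Sequence[Row],
    *,
    count: Optional[int] = None,
    percent: Optional[float] = None,
    method: str = "sequential",
    conditions: Sequence[Callable[[Row], bool]] = (),
    seed: Optional[int] = None,
) -> list[Row]:
    """Hypothetical Sample-style helper: pick rows by count or by percentage."""
    if (count is None) == (percent is None):
        raise ValueError("specify exactly one of count or percent")
    # Keep only rows that satisfy every condition (AND is an assumption).
    eligible = [r for r in rows if all(cond(r) for cond in conditions)]
    # A percentage is taken of the eligible rows and rounded down (assumption).
    limit = count if count is not None else math.floor(len(eligible) * percent / 100)
    limit = max(0, min(limit, len(eligible)))
    if method == "sequential":
        # Start with the first row and move down until the limit is met.
        return eligible[:limit]
    if method == "random":
        rng = random.Random(seed)
        # Pick row positions at random, then restore the original row order.
        picked = sorted(rng.sample(range(len(eligible)), limit))
        return [eligible[i] for i in picked]
    raise ValueError(f"unknown method: {method!r}")
```

For example, `sample_rows(rows, percent=10, method="random")` corresponds to choosing a percentage and random sampling in the form.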
Search
To search your data, click Dataset > Search in the sidebar, then:
- Select whether to search all datasets or a single dataset.
- If you choose to search a single dataset, select the dataset to search.
- Choose whether to search for a specific string or a pattern.
- Enter the string or pattern.
- Click Submit.
The Search skill has many variations. You can search the columns in a specific dataset or across all datasets to find rows with data that fit certain criteria.
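A minimal Python sketch of the two search modes follows. It is not DataChat's implementation: it assumes a string search is a case-sensitive substring match and that a pattern is a Python regular expression (DataChat's pattern syntax may differ). It returns the dataset name, row index, and column of each matching cell.

```python
import re
from typing import Any, Optional


def search(
    datasets: dict[str, list[dict[str, Any]]],
    query: str,
    *,
    pattern: bool = False,
    dataset: Optional[str] = None,
) -> list[tuple[str, int, str]]:
    """Hypothetical search over one dataset (if named) or all datasets."""
    names = [dataset] if dataset is not None else list(datasets)
    matcher = re.compile(query) if pattern else None  # regex syntax is an assumption
    hits = []
    for name in names:
        for index, row in enumerate(datasets[name]):
            for column, value in row.items():
                text = "" if value is None else str(value)
                if matcher is not None:
                    found = matcher.search(text) is not None
                else:
                    found = query in text  # case-sensitive substring match
                if found:
                    hits.append((name, index, column))
    return hits
```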