Version: 0.18.3

Describe

Describe helps you understand your data by showing some summary statistics about a given column or dataset. These statistics include:

  • Count. The number of rows with valid values in the column.
  • Unique. The number of unique values in the column.
  • Mean. The average value (if applicable).
  • Min. The smallest value in the column.
  • Max. The largest value in the column.
  • Representation. The type of the values in the column, such as integers, floats, or strings.
  • Category. The column’s category, which is one of:
      • Enum. A categorical value.
      • Measure. A float value that can have either very large or very small density. For example, a distance in miles or a temperature in degrees.
      • Nominal. A named value that has no particular order.
      • Ordinal. A named value that has a particular order.
  • Display. How the values in the column are displayed, such as currency or percentages.

At skill level 3, Describe can also dig a bit deeper to show you even more summary statistics (we call this describing a column or dataset in detail), including:

  • Std. The standard deviation of the values in the column.
  • Top. The most frequent value in the column.
  • Freq. The number of times the Top value appears in the column.
  • 25%. The 25th percentile. 25 percent of the values in the column are below this value.
  • Median. The 50th percentile. 50 percent of the values in the column are below this value and 50 percent of the values are above this value.
  • 75%. The 75th percentile. 75 percent of the values in the column are below this value.
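
To make the relationship between these statistics concrete, the following Python sketch computes most of them for a single numeric column using only the standard library. It is not Describe's implementation: the function name describe_column, the use of None for missing values, the sample standard deviation, and the inclusive quantile method are all assumptions made for illustration.

```python
import statistics
from collections import Counter


def describe_column(values, detail=False):
    """Summarize a numeric column (hypothetical helper, not part of Describe).

    None marks a missing value. Expects at least two valid values.
    """
    valid = [v for v in values if v is not None]
    summary = {
        "Count": len(valid),          # rows with valid values
        "Unique": len(set(valid)),    # distinct values
        "Mean": statistics.mean(valid),
        "Min": min(valid),
        "Max": max(valid),
    }
    if detail:
        # Most frequent value and how often it appears.
        top, freq = Counter(valid).most_common(1)[0]
        # Quartile cut points: 25th, 50th, and 75th percentiles.
        q1, median, q3 = statistics.quantiles(valid, n=4, method="inclusive")
        summary.update({
            "Std": statistics.stdev(valid),  # assumption: sample std
            "Top": top,
            "Freq": freq,
            "25%": q1,
            "Median": median,
            "75%": q3,
        })
    return summary


stats = describe_column([1, 2, 2, 3, None, 4], detail=True)
```

Here the missing value is excluded from Count, so the column has five valid rows and four unique values. Like Describe itself, the sketch only reads the data and leaves it unchanged.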

Note that, unlike some other skills, Describe does not change your dataset.

Format

Describe has several utterance variations you can use to show metadata of a column or a whole dataset:

  • Describe the column <column name> shows the summary statistics of the given column along with a distribution chart.
  • Describe the dataset <dataset name> shows the summary statistics of all columns in the given dataset.
  • Describe the dataset <dataset name> in detail shows more detailed summary statistics of all columns in the given dataset.

Parameters

The parameters used to describe a dataset include:

  • Dataset name or column name (required). The name of the dataset or column to describe.
  • In detail (optional). Provides a more detailed description of a dataset.
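
If you generate utterances programmatically, the way the parameters combine can be sketched as a small Python helper. The function build_describe_utterance and its argument names are hypothetical and exist only for illustration; the utterance strings themselves are the ones listed on this page.

```python
def build_describe_utterance(kind, name=None, in_detail=False):
    """Return a Describe utterance string (hypothetical helper).

    kind: "column" or "dataset".
    name: column or dataset name; None means the current dataset.
    in_detail: append "in detail" (dataset utterances only).
    """
    if kind == "column":
        if not name:
            raise ValueError("a column name is required")
        if in_detail:
            # This page lists no "in detail" phrasing for columns.
            raise ValueError("'in detail' is only listed for datasets")
        return f"Describe the column {name}"
    if kind == "dataset":
        target = f"the dataset {name}" if name else "the current dataset"
        suffix = " in detail" if in_detail else ""
        return f"Describe {target}{suffix}"
    raise ValueError(f"unknown kind: {kind!r}")


column_utterance = build_describe_utterance("column", "Age")
detail_utterance = build_describe_utterance("dataset", "Sales", in_detail=True)
```

The names Age and Sales are placeholders for your own column and dataset names.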

Output

If a column is successfully described, a two-column table with the column’s statistics appears in the chat box. A chart is also plotted: a histogram if the column is continuous, or a bar chart of counts if the column is categorical.

If a dataset is successfully described or described in detail, the statistics for each column in the dataset are shown in a table. Note that clicking any column name listed in the Column column runs Describe on that column.

Note

In datasets larger than 100,000 rows, string-type columns display "N/A" instead of tallying the number of Unique values.
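
The rule in this note can be sketched in Python as follows. This is only an illustration of the documented behavior, not Describe's code: the names UNIQUE_ROW_LIMIT and unique_display, and the treatment of None as a missing value, are assumptions.

```python
UNIQUE_ROW_LIMIT = 100_000  # threshold taken from the note above


def unique_display(values):
    """Return the Unique entry for a column (hypothetical helper)."""
    present = [v for v in values if v is not None]
    is_string_column = all(isinstance(v, str) for v in present)
    # String columns in large datasets are not tallied.
    if is_string_column and len(values) > UNIQUE_ROW_LIMIT:
        return "N/A"
    return len(set(present))


small = unique_display(["a", "b", "a"])
large = unique_display(["a"] * (UNIQUE_ROW_LIMIT + 1))
```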

Examples

To describe a column, enter Describe the column <column name>

To describe the current dataset, enter Describe or Describe the current dataset

To describe a named dataset, enter Describe the dataset <dataset name>

To describe the current dataset in detail, enter Describe the current dataset in detail

To describe a named dataset in detail, enter Describe the dataset <dataset name> in detail