Skip to main content
Version: 0.20.7

Shape

After you've explored your data, you're ready to prepare it for analysis and visualization. Some ways to shape your data include:

note

When a skill is applied to a dataset:

  • If the skill creates a new dataset, it will use the convention [dataset]_[Skill].
  • If the skill alters your existing dataset, it will use the convention [dataset] v[x] to save to a new version.

Drop or Keep Rows or Columns

You might want to limit the number of rows or columns in your dataset with:

Drop Columns

There are a couple of ways to drop columns:

  • The column header
  • The Column button

Column Header

In the dataset panel, click the menu next to the column name in the dataset panel, then click Drop.

Dataset panel column menu Drop

The chosen column is removed from your dataset.

Column Button

You can also select Column > Drop in the sidebar, which is useful for dropping multiple columns. The Drop Columns form appears:

  1. Select the columns to drop.
  2. Click Submit.

Column Drop form

Drop Rows

There are a couple of ways to drop rows:

  • The Row button
  • The cell menu

Row Button

To Drop a row, select Row > Drop in the sidebar. The Drop form appears.

  1. Select the column that selects the rows to be dropped.
  2. Select an expression that identifies the rows to be dropped.
  3. Fill out the values of the expression.
  4. Optionally, click the + button to add more columns and conditions. When you enter more than one column and condition, a toggle appears choose whether all or any of the conditions drop the row.
  5. Click Submit.

Row Drop form

To build a DataChat sentence in the chat box, see Drop.

Cell Menu

To conditionally drop rows that match the value of a specific cell:

  1. Right-click the cell containing the value you want to drop across all rows.
  2. To drop all of the rows that match that value, click Drop rows matching {value}.
  3. To open the Drop form to add more conditions, click Drop rows matching {value} and ... and complete the form.

the context menu for dropping rows

Keep Columns

To Keep a column, select Column > Keep in the sidebar. The Keep form appears.

  1. Select the rows to keep.
  2. Click Submit.

Column Keep form

Keep Rows

There are a couple of ways to keep rows:

  • The Row button
  • The cell menu

Row Button

To Keep a row, select Row > Keep in the sidebar. The Keep form appears.

  1. Select the column that selects the rows to be kept.
  2. Select an expression that identifies the rows to be kept.
  3. Optionally, click the + button to add more columns and conditions. Click the - button to remove a column and condition.
  4. When you enter more than one column and condition, choose whether all or any of the conditions keep the row.
  5. Click Submit.

Row Keep form

To build a DataChat sentence in the chat box, see Keep.

Cell Menu

To conditionally keep rows that match the value of a specific cell:

  1. Right-click the cell containing the value you want to keep across all rows.
  2. To keep all of the rows that match that value, click Keep rows matching {value}.
  3. To open the Keep form to add more conditions, click Keep rows matching {value} and ... and complete the form.

the context menu for keeping rows

Add Columns to a Dataset

To create a new column, select the method with which to create the column.

form for the Create Column

Choose from among five methods:

To create a column using an expression:

  1. Enter a name for your column.
  2. Enter an expression that operates on existing columns. Or, fill the column with an increasing sequence.

form for Create Column - Expression

Adjust Column Characteristics

You can change the column characteristics in your dataset:

Change the Column Type

To view your dataset's column types, see the "Representation" column of the Describe skill's table output.

To change the type of the column with a form, click the Column button in the sidebar and select Change. The Change the Column Type form appears:

  1. Select the column whose type to change.
  2. Select the new type for the column.
  3. Optionally, add another column to change and its new type.
  4. Click Submit.

form for Create Column - Window

To use the chat box, enter: Change the type of the column <column name> to <type>

Rename a Column

Rename a column in several different ways:

  1. Click the Column > Rename button.

    Column Rename button

  2. Fill out the "Rename Using the Dataset" form to rename one or more columns.

    Column Rename form

Dataset Panel

Double-click on the column name in the dataset panel to make the column name editable and enter the new name.

Column Rename in dataset panel

Or, click on the menu next to the column name and select "Rename". The cursor relocates to inside the editable column name as above.

Column Rename in dataset panel menu

Chat Box

To compose a DataChat sentence with the Rename skill, enter in the chat box:

Rename the column <column> to <new name>

Column Rename in chat box

note

Column names are case-insensitive.

Convert an Interval Column

Interval types are time-based:

  • days
  • hours
  • minutes
  • months
  • seconds
  • weeks
  • years

To convert a column from one type of interval to another, enter into the chat box: Convert the interval column <column> to <interval type>. You might first use Extract to separate date parts from your Date type column.

Bundle Columns into Groups

Some skills in DataChat can use a bundle of columns as a group (as opposed to a single column or an explicit list) as an input. This bundle of columns is called a "column group" in DataChat. For example, the for each clause in Compute takes a single column, a list of columns, or a column group. Column groups are essentially nicknames you assign to a list of related columns (such as a group of columns that together make up demographic information). You can then refer to those columns using the name of the column group instead of entering each of their names every time you want to use the group in a skill.

But, in order to use a column group, you must create it first. You can use the Define skill to do that. For example, you could define and use a column group that contains the demographic information of the passengers aboard the Titanic:

  1. Load the dataset.

    Load data from the file called titanic.csv

  2. Define the column group as the Age and Gender columns.

    Define a column group called Demographics as the columns called Age, Gender

  3. Use the column group in a skill like Compute.

    Compute the count of records for each column in column group Demographics

    instead of Compute the count of records for each Age, Gender

To use Define with expressions, see Define Expressions.

Group Rows into Bins

To bin values into categories like “High” or “Low” you can use the Bin skill. Note, this will leave your original column and add a new column with the bins. You will need to decide whether each bin should contain an equal number of records (based on percentile), an equal range of values (based on width), or use custom intervals. The Bin skill also creates a secondary dataset that contains metadata of the bins, such as the start and end boundaries of each bin, that you can use as you would any other dataset.

To group a column's values into bins:

  1. Select the column whose values to bin.
  2. Select and configure how the values should be binned. Refer to the reference page for Bin for more information on these options.
  3. Enter the values used to bin the column.
  4. Optionally, enter a list of names to use for the new bins.
  5. Click Submit.

the form for the Bin skill

You can also use Bin in the chat box. For example, you can say Bin the column called Age based on percentile setting the number of intervals to 3 and call the bins Low, Medium, High.

Adjust Dataset Values

The Clean skill can be used in most cases to manipulate values.

To clean one or more columns:

  1. Select whether to clean all columns of a specific type or a specific string or numeric column.
  2. Select the columns you'd like to clean.
  3. Enter the new value you want to use.
  4. Click Submit.

the form for the Clean skill

You can also use Clean in the chat box. For example, enter: Clean the string column Name by deleting the phrase Mr.

Adjust Dataset Appearance

Sort

To sort your data:

  1. Select the columns by which you want to sort the data.
  2. Select whether you want to sort the data in ascending or descending order.
  3. Click Submit.

the sort form

Or, click on the menu next to the column name:

Column sort in dataset panel menu

Bring

To bring a column to the front of your dataset, enter into the chat box: Bring to the front the columns <columns>

Add a Caption

To add a caption to a table, switch to Notebook mode and describe your table in the Caption this table field. Click the Save button to save your changes.

the table caption field

Edit Data

To modify your dataset directly, switch to Notebook mode, then click the Edit Mode button:

the edit mode button

Toggle Notations

To display the values of numeric columns in standard or scientific notation, switch to Notebook mode, then click the Convert Notation toggle next to the column name. Numbers in scientific notation are shown with two significant digits. Numbers in standard notation are shown with up to 15 significant digits. Click the Save button to save your changes:

the toggle button

Create a New Dataset

You can create a dataset with the Dataset Builder or with the Create skill.

Use the Dataset Builder

The recommended way to create a new dataset is to use the Dataset Builder and it's familiar spreadsheet layout.

It works just like other spreadsheet tools, with a couple caveats:

  • Your column names must be in the first row.
  • You must remove any empty columns that are between populated columns before saving the dataset.

Create a Dataset from Scratch

To create a new dataset from scratch with the Dataset Builder:

  1. In the sidebar, click Dataset > Create.
  2. Click the Create a Dataset from Scratch button.

the dataset builder

Create a Dataset from an Existing Dataset

You can use the Dataset Builder to create a dataset from the contents of an existing dataset:

  1. Load the dataset you want to use as the foundation of the new dataset.
  2. Open the Dataset Builder
  3. Click the dataset you loaded in step 1.
  4. Make and save your changes in the Dataset Builder.

Create Multiple Datasets

You can create multiple datasets in the Dataset Builder by simply adding a "sheet:"

adding a dataset

Use Functions

The Dataset Builder supports a wide array functions for a performing a variety of calculations. To use a function:

  1. Click the cell that should hold the output of the function.
  2. Add your function using the syntax =<function name>(<cell or start of cell range>:<end of cell range>). For example, to calculate the sum of the cells A2 and B2, enter =SUM(A1:B2).
  3. Press Enter to save your function.

Refer to this page for a comprehensive list of all the available functions.

Name Your Dataset

Your dataset's name is the same as its "sheet" in the Builder. You can change the name of your dataset by either double-clicking or right-clicking the sheet's tab and clicking Rename:

renaming a dataset

Save Your Dataset

Before you save, make sure:

  • Your column names are in the first row.
  • You've removed any empty columns that are between populated columns.

When you're finished creating your datasets, click Save Dataset to save them in your session.

note

You can continue to make changes to your dataset after saving as long as the Dataset Builder is open.

Remove Datasets

To remove a dataset from your session, click the X at the upper right corner of a dataset's tab in the dataset panel to Forget that dataset.

X button to Forget datasets

Or, enter in the chat box:

Forget the datasets <dataset names>

To remove a specific version of the dataset, click the version dropdown of a dataset in the dataset panel and click the X next to version to forget.

X button to Forget datasets

Or, enter in the chat box:

Forget the datasets <dataset name>, version <dataset version>