
Wrangle

Wrangling helps to refine and organize your data for analysis. You can wrangle your data in several ways:

  • Using Sort to sort your dataset based on one or more columns.
  • Using Change Type to modify a column's type or how the values in a column are shown.
  • Using Sample to see a small number of rows from a given dataset.
  • Using Clean to replace existing values with new values.
  • Using Keep/Drop Columns to keep or remove specified columns.
  • Using Keep/Drop Rows to keep or remove specified rows.
  • Using Remove Duplicates to remove duplicate data.
  • Using Reshape to view your data in "wide" or "long" form.


Note

When a skill is applied to a dataset:

  • New datasets follow the naming convention [dataset]_[Skill].
  • Altered datasets are saved as a new version named [dataset] v[x].

If the Data Assistant creates or alters a dataset, it assigns a new, semantically meaningful name.
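The two naming rules can be sketched as small string helpers. This is only an illustration: the functions `new_dataset_name` and `next_version_name` are hypothetical, and the assumption that version numbers are integers that increase by one per alteration is not stated in the rules above.

```python
# Hypothetical helpers that mirror the naming rules in the note above.
# Assumption: version numbers are integers that increase by one each time
# a dataset is altered.


def new_dataset_name(dataset: str, skill: str) -> str:
    """Name for a new dataset created by a skill: [dataset]_[Skill]."""
    return f"{dataset}_{skill}"


def next_version_name(dataset: str, current_version: int) -> str:
    """Name for the next saved version of an altered dataset: [dataset] v[x]."""
    return f"{dataset} v{current_version + 1}"


print(new_dataset_name("orders", "Sample"))  # orders_Sample
print(next_version_name("orders", 1))        # orders v2
```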

Sort

You can use the Sort skill to sort your dataset based on one or more columns.

To sort your data:

  1. Click Wrangle > Sort in the skill menu.

  2. Select the columns by which you want to sort the data.

  3. Select whether you want to sort the data in ascending or descending order.

  4. Click Submit.

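A multi-column sort with a direction per column can be sketched in plain Python, with each row held as a dictionary keyed by column name. This is not the product's API: the function `sort_rows` and the list-of-dicts representation are hypothetical. The sketch relies on Python's sort being stable, so sorting by the lowest-priority column first and the highest-priority column last gives the combined order.

```python
from typing import Any

Row = dict[str, Any]  # one dataset row, keyed by column name


def sort_rows(rows: list[Row], keys: list[tuple[str, bool]]) -> list[Row]:
    """Sort rows by several columns.

    `keys` holds (column, ascending) pairs, highest priority first. Python's
    sort is stable, so sorting by the lowest-priority key first and the
    highest-priority key last produces the combined multi-column order.
    """
    result = list(rows)
    for column, ascending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result


rows = [
    {"region": "West", "sales": 30},
    {"region": "East", "sales": 10},
    {"region": "East", "sales": 25},
]
# Region ascending, then sales descending within each region.
for row in sort_rows(rows, [("region", True), ("sales", False)]):
    print(row)
```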

Change Type

You can use the Change Type skill to modify a column's type, such as integer or string, or how the values in a column are shown, such as currency or percentages. Note that not all values can be converted to a different type.

To change a column's type:

  1. Click Wrangle > Change Type in the skill menu.

  2. Select the column whose type you'd like to change.

  3. Select the new type.

  4. Optionally, click the + button to add more columns to change.

  5. Click Submit.


You can also change a column's type from its More menu.
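The difference between changing a column's type and changing only how its values are shown can be sketched in plain Python. The helpers `change_type`, `as_currency`, and `as_percentage` are hypothetical, and mapping values that cannot be converted to `None` is an assumption of this sketch; the product may handle such values differently.

```python
from typing import Any, Callable, Optional


def change_type(values: list[Any], convert: Callable[[Any], Any]) -> list[Optional[Any]]:
    """Convert each value with `convert`; values that cannot be converted become None."""
    converted: list[Optional[Any]] = []
    for value in values:
        try:
            converted.append(convert(value))
        except (TypeError, ValueError):
            converted.append(None)  # assumption: unconvertible values become empty
    return converted


def as_currency(value: float) -> str:
    """Change only how a number is shown; the stored value is untouched."""
    return f"${value:,.2f}"


def as_percentage(value: float) -> str:
    """Show a fraction such as 0.25 as a percentage."""
    return f"{value:.0%}"


print(change_type(["1", "2", "three"], int))  # [1, 2, None]
print(as_currency(1234.5))                    # $1,234.50
print(as_percentage(0.25))                    # 25%
```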

Sample

You can use the Sample skill to display a portion of a given dataset. You can sample the entire dataset, a portion of it (a number of rows or a percentage, chosen randomly or sequentially), or only rows that meet given conditions. By default, the sampled rows are displayed in the dataset's original order.

To sample a dataset, first select it, then:

  1. Click Wrangle > Sample in the skill menu.

  2. Select either a number of rows or a percentage of the dataset.

  3. Enter the number of rows or the percentage of the dataset to sample.

  4. Select either random or sequential sampling. For sequential sampling, the sample starts with the first row and moves down the dataset until the row or percentage limit is met.

  5. Optionally, click Conditions to specify which rows to sample.

  6. Optionally, to add another condition, click Add Another Option.

  7. Click Submit.

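The sampling options can be sketched as one plain-Python function. It is an illustration, not the product's implementation: `sample_rows` and its parameters are hypothetical, and three behaviors are assumptions of this sketch: conditions are applied before sampling, percentages round down to a whole number of rows, and random picks are returned in the dataset's original order.

```python
import random
from typing import Any, Callable, Optional

Row = dict[str, Any]


def sample_rows(
    rows: list[Row],
    count: Optional[int] = None,
    percent: Optional[float] = None,
    randomize: bool = False,
    condition: Optional[Callable[[Row], bool]] = None,
    seed: Optional[int] = None,
) -> list[Row]:
    """Return a sample of rows by count or percentage, sequential or random."""
    if (count is None) == (percent is None):
        raise ValueError("specify exactly one of count or percent")
    # Apply optional conditions first so only matching rows are sampled.
    pool = [row for row in rows if condition is None or condition(row)]
    limit = count if count is not None else int(len(pool) * percent / 100)
    limit = min(limit, len(pool))
    if randomize:
        # Draw row positions without replacement, then sort them so the
        # sample keeps the dataset's original order.
        picked = sorted(random.Random(seed).sample(range(len(pool)), limit))
        return [pool[i] for i in picked]
    # Sequential: start at the first row and move down until the limit is met.
    return pool[:limit]


rows = [{"id": i} for i in range(10)]
print(sample_rows(rows, count=3))  # [{'id': 0}, {'id': 1}, {'id': 2}]
print(sample_rows(rows, percent=40, randomize=True, seed=1))  # 4 random rows, original order
```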

Clean

You can use the Clean skill to manipulate values, such as replacing existing values with new ones.

To clean one or more columns:

  1. Click Wrangle > Clean in the skill menu.

  2. Select whether to clean all columns of a specific type or a specific string or numeric column.

  3. Select the columns you'd like to clean.

  4. Enter the old value to replace.

  5. Choose whether to replace the old value with a new value or with an aggregation.

  6. Optionally, specify matching options.

  7. Optionally, specify a condition that limits where the replacement applies.

  8. Click Submit.

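Replacing a value with either a fixed new value or an aggregation can be sketched in plain Python. The function `clean_column` is hypothetical. Computing the aggregation over the column's remaining (non-replaced) values and matching by exact equality are assumptions of this sketch; the product's matching options may behave differently.

```python
from statistics import mean
from typing import Any, Callable, Optional

Row = dict[str, Any]


def clean_column(
    rows: list[Row],
    column: str,
    old_value: Any,
    new_value: Any = None,
    aggregate: Optional[Callable[[list[Any]], Any]] = None,
    condition: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Replace `old_value` in `column` with `new_value` or with an aggregate.

    If `aggregate` is given (for example statistics.mean), it is computed over
    the column's remaining values and used as the replacement. `condition`
    optionally limits which rows are cleaned. Matching is exact equality.
    """
    if aggregate is not None:
        others = [row[column] for row in rows if row[column] != old_value]
        new_value = aggregate(others)
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are not modified
        if row[column] == old_value and (condition is None or condition(row)):
            row[column] = new_value
        cleaned.append(row)
    return cleaned


prices = [{"item": "a", "price": 10}, {"item": "b", "price": -1}, {"item": "c", "price": 20}]
print(clean_column(prices, "price", -1, new_value=0))     # -1 replaced with 0
print(clean_column(prices, "price", -1, aggregate=mean))  # -1 replaced with the mean, 15
```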

Keep or Drop Columns

You can use the Keep Columns and Drop Columns skills to keep or remove columns in your dataset.

Keep Columns

To keep specified columns:

  1. Click Wrangle > Keep Columns in the skill menu.

  2. Select the columns to keep.

  3. Click Submit.


Drop Columns

To drop specified columns:

  1. Click Wrangle > Drop Columns in the skill menu.

  2. Select the columns to drop.

  3. Click Submit.


You can also drop a column from its More menu.
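Keeping and dropping columns are complementary projections, which a plain-Python sketch makes concrete. The functions `keep_columns` and `drop_columns` are hypothetical, and returning kept columns in the order they are listed is a choice of this sketch, not documented product behavior.

```python
from typing import Any

Row = dict[str, Any]


def keep_columns(rows: list[Row], columns: list[str]) -> list[Row]:
    """Return rows containing only the listed columns, in the listed order."""
    return [{col: row[col] for col in columns} for row in rows]


def drop_columns(rows: list[Row], columns: list[str]) -> list[Row]:
    """Return rows with the listed columns removed."""
    dropped = set(columns)
    return [{col: val for col, val in row.items() if col not in dropped} for row in rows]


rows = [{"id": 1, "name": "Ada", "notes": "n/a"}]
print(keep_columns(rows, ["name", "id"]))  # [{'name': 'Ada', 'id': 1}]
print(drop_columns(rows, ["notes"]))       # [{'id': 1, 'name': 'Ada'}]
```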

Keep or Drop Rows

You can use the Keep Rows and Drop Rows skills to keep or remove rows in your dataset.

Keep Rows

To keep specified rows:

  1. Click Wrangle > Keep Rows in the skill menu.

  2. Select the column whose values determine which rows are kept.

  3. Select an expression that identifies the rows to be kept.

  4. Optionally, click the + button to add more columns and conditions. Click the - button to remove a column and condition.

  5. When you enter more than one column and condition, choose whether a row is kept when all of the conditions are met or when any of them is met.

  6. Click Submit.


You can also keep rows that match the value of a specific cell:

  1. Right-click a cell that contains the value to match.
  2. To keep all rows that match that value, click Keep rows matching {value}.
  3. To add more conditions, click Keep rows matching {value} and ... and complete the form.
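The all/any choice for combining row conditions can be sketched in plain Python, with each condition written as a (column, comparison, value) triple using functions from the standard `operator` module. The function `keep_rows` and the triple format are hypothetical illustrations, not the product's API. The right-click shortcut corresponds to a single equality condition.

```python
import operator
from typing import Any, Callable

Row = dict[str, Any]
# A condition is (column, comparison, value), for example ("sales", operator.gt, 20).
Condition = tuple[str, Callable[[Any, Any], bool], Any]


def keep_rows(rows: list[Row], conditions: list[Condition], match: str = "all") -> list[Row]:
    """Keep rows for which all (match="all") or any (match="any") conditions hold."""
    if match not in ("all", "any"):
        raise ValueError('match must be "all" or "any"')
    combine = all if match == "all" else any
    return [
        row
        for row in rows
        if combine(compare(row[column], value) for column, compare, value in conditions)
    ]


rows = [
    {"region": "East", "sales": 10},
    {"region": "West", "sales": 15},
    {"region": "East", "sales": 25},
]
conditions = [("region", operator.eq, "East"), ("sales", operator.gt, 20)]
print(keep_rows(rows, conditions, match="all"))  # only the East row with sales 25
print(keep_rows(rows, conditions, match="any"))  # both East rows
# "Keep rows matching East" on a cell is a single equality condition:
print(keep_rows(rows, [("region", operator.eq, "East")]))
```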

Drop Rows

To drop specified rows:

  1. Click Wrangle > Drop Rows in the skill menu.

  2. Select the column whose values determine which rows are dropped.

  3. Select an expression that identifies the rows to be dropped.

  4. Optionally, click the + button to add more columns and conditions.

  5. When you enter more than one column and condition, choose whether a row is dropped when all of the conditions are met or when any of them is met.

  6. Click Submit.


You can also drop rows that match the value of a specific cell:

  1. Right-click a cell that contains the value to match.
  2. To drop all rows that match that value, click Drop rows matching {value}.
  3. To add more conditions, click Drop rows matching {value} and ... and complete the form.
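Dropping rows is the complement of keeping them: a row is removed when its conditions hold. The plain-Python sketch below shows this, plus the single-value case that the right-click shortcut describes. The functions `drop_rows` and `drop_rows_matching` and the (column, comparison, value) condition format are hypothetical.

```python
import operator
from typing import Any, Callable

Row = dict[str, Any]
Condition = tuple[str, Callable[[Any, Any], bool], Any]  # (column, comparison, value)


def drop_rows(rows: list[Row], conditions: list[Condition], match: str = "all") -> list[Row]:
    """Drop rows for which all (match="all") or any (match="any") conditions hold."""
    if match not in ("all", "any"):
        raise ValueError('match must be "all" or "any"')
    combine = all if match == "all" else any
    return [
        row
        for row in rows
        if not combine(compare(row[column], value) for column, compare, value in conditions)
    ]


def drop_rows_matching(rows: list[Row], column: str, value: Any) -> list[Row]:
    """Single equality condition, like choosing Drop rows matching {value} on a cell."""
    return drop_rows(rows, [(column, operator.eq, value)])


rows = [{"status": "ok"}, {"status": "error"}, {"status": "ok"}]
print(drop_rows_matching(rows, "status", "error"))  # [{'status': 'ok'}, {'status': 'ok'}]
```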

Remove Duplicates

You can use the Remove Duplicates skill to remove duplicate rows based on the values in the columns you specify.

To remove duplicates:

  1. Click Wrangle > Remove Duplicates in the skill menu.
  2. Enter the columns to remove duplicates from.
  3. Click Submit.

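Duplicate removal over a chosen set of columns can be sketched in plain Python. The function `remove_duplicates` is hypothetical, and keeping the first occurrence of each duplicate is an assumption of this sketch; the values in the chosen columns must be hashable because they are stored in a set.

```python
from typing import Any

Row = dict[str, Any]


def remove_duplicates(rows: list[Row], columns: list[str]) -> list[Row]:
    """Keep the first row for each distinct combination of values in `columns`."""
    seen: set[tuple[Any, ...]] = set()
    unique: list[Row] = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 1, "city": "Rome"},
    {"id": 2, "city": "Oslo"},
]
print(remove_duplicates(rows, ["id"]))          # keeps the first id 1 row and the id 2 row
print(remove_duplicates(rows, ["id", "city"]))  # all three rows are distinct
```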

Reshape

You can use the Reshape skill to convert your data to either wide or long form. You specify one or more columns to act as unique row identifiers. In wide form, new columns are created from the values of the columns you specify, and each row in the resulting dataset represents a unique combination of the identifier values. In long form, the specified value columns are stacked into a variables column and a value column.

To reshape a dataset:

  1. Click Wrangle > Reshape in the skill menu.

  2. Select either Wide Form or Long Form.

  3. Enter the Row Identifiers.

  4. Enter the Values.

  5. Optionally, enter a name for the variables column and the value column.

  6. Click Submit.

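The two directions of reshaping can be sketched in plain Python. The functions `to_long` and `to_wide`, their parameters, and the default names `variable` and `value` are hypothetical; how the form's Row Identifiers and Values fields map onto these parameters is an assumption. In this sketch, a repeated (identifier, variable) pair in long data overwrites the earlier value when converting to wide form.

```python
from typing import Any

Row = dict[str, Any]


def to_long(
    rows: list[Row],
    id_columns: list[str],
    value_columns: list[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> list[Row]:
    """Wide -> long: one output row per (input row, value column) pair."""
    long_rows: list[Row] = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col in value_columns:
            long_rows.append({**ids, var_name: col, value_name: row[col]})
    return long_rows


def to_wide(
    rows: list[Row],
    id_columns: list[str],
    var_name: str = "variable",
    value_name: str = "value",
) -> list[Row]:
    """Long -> wide: one output row per unique combination of identifier values."""
    wide: dict[tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[col] for col in id_columns)
        out = wide.setdefault(key, {col: row[col] for col in id_columns})
        # Each distinct variable becomes a column; a repeated (key, variable) pair overwrites.
        out[row[var_name]] = row[value_name]
    return list(wide.values())


wide = [{"store": "A", "jan": 1, "feb": 2}, {"store": "B", "jan": 3, "feb": 4}]
long = to_long(wide, ["store"], ["jan", "feb"], var_name="month", value_name="units")
print(long[0])  # {'store': 'A', 'month': 'jan', 'units': 1}
print(to_wide(long, ["store"], var_name="month", value_name="units") == wide)  # True
```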