Wrangle
Wrangling helps to refine and organize your data for analysis. You can wrangle your data in several ways:
- Using Sort to sort your dataset based on one or more columns.
- Using Change Type to modify a column's type or how the values in a column are shown.
- Using Sample to see a small number of rows from a given dataset.
- Using Clean to replace existing values with new values.
- Using Keep/Drop Columns to keep or remove specified columns.
- Using Keep/Drop Rows to keep or remove specified rows.
- Using Remove Duplicates to remove duplicate data.
- Using Reshape to view your data in "wide" or "long" form.
When a skill is applied to a dataset:
- New datasets follow the naming convention [dataset]_[Skill].
- Altered datasets are saved as a new version, using [dataset] v[x].
If the Data Assistant creates or alters a dataset, it assigns a new, semantically meaningful name.
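As a rough illustration of the two naming patterns above, the following Python sketch builds both kinds of names. The function names, the example dataset name "sales", and the choice of skill and version number are hypothetical; the product's exact capitalization and numbering may differ.

```python
def new_dataset_name(dataset: str, skill: str) -> str:
    """Name for a dataset newly created by a skill: [dataset]_[Skill]."""
    return f"{dataset}_{skill}"


def versioned_name(dataset: str, version: int) -> str:
    """Name for an altered dataset saved as a new version: [dataset] v[x]."""
    return f"{dataset} v{version}"


# Hypothetical examples; "sales" is an illustrative dataset name.
print(new_dataset_name("sales", "Sample"))  # sales_Sample
print(versioned_name("sales", 2))           # sales v2
```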
Sort
You can use the Sort skill to sort your dataset based on one or more columns.
To sort your data:
- Click Wrangle > Sort in the skill menu.
- Select the columns by which you want to sort the data.
- Select whether you want to sort the data in ascending or descending order.
- Click Submit.
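For readers who think in code, a multi-column sort can be sketched in plain Python as follows. This is only an analogy for what the Sort skill does, not the product's implementation; the `sort_rows` helper and the sample data are hypothetical.

```python
from operator import itemgetter

# Hypothetical sample data: each dict is one row of the dataset.
rows = [
    {"region": "West", "sales": 120},
    {"region": "East", "sales": 300},
    {"region": "West", "sales": 90},
    {"region": "East", "sales": 150},
]


def sort_rows(rows, columns, descending=False):
    """Return a new list sorted by the given columns, in priority order."""
    return sorted(rows, key=itemgetter(*columns), reverse=descending)


# Sort by region first, then by sales within each region (ascending).
sorted_rows = sort_rows(rows, ["region", "sales"])
for row in sorted_rows:
    print(row)
```

The first column listed takes priority; later columns only break ties among rows that share the same earlier values.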
Change Type
You can use the Change Type skill to modify a column's type, such as integer or string, or how the values in a column are shown, such as currency or percentages. Note that not all values can be converted to a different type.
To change a column's type:
- Click Wrangle > Change Type in the skill menu.
- Select the column whose type you'd like to change.
- Select the new type.
- Optionally, click the + button to add additional columns to change.
- Click Submit.
Alternatively, you can use a column's More menu to change its type.
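The distinction between changing a column's type and changing how its values are shown, along with the caveat that some values cannot be converted, can be sketched in plain Python. The helpers and data below are hypothetical, and turning an unconvertible value into `None` is an assumption made for illustration; the product may handle such values differently.

```python
def change_type(rows, column, new_type):
    """Convert one column to new_type, returning new rows.

    Assumption for this sketch: values that cannot be converted become None.
    """
    converted = []
    for row in rows:
        try:
            value = new_type(row[column])
        except (TypeError, ValueError):
            value = None  # not every value can be converted
        converted.append({**row, column: value})
    return converted


def as_percentage(value):
    """Change how a number is shown without changing the number itself."""
    return f"{value:.1%}"


# Hypothetical sample data.
rows = [
    {"id": "1", "rate": 0.25},
    {"id": "2", "rate": 0.5},
    {"id": "n/a", "rate": 0.125},
]

typed_rows = change_type(rows, "id", int)
print([r["id"] for r in typed_rows])             # [1, 2, None]
print([as_percentage(r["rate"]) for r in rows])  # ['25.0%', '50.0%', '12.5%']
```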
Sample
You can use the Sample skill to display a portion of a given dataset. You can sample the entire dataset, a portion of the dataset (a random sample or a percentage), or rows that meet a given condition. By default, the sample shows rows in their original dataset order.
To sample a dataset, select the dataset to sample from, then:
- Click Wrangle > Sample in the skill menu.
- Select either a number of rows or a percentage of the dataset.
- Enter the number of rows or the percentage of the dataset to sample.
- Select either random or sequential sampling. For sequential sampling, the sample starts with the first row and moves down the dataset until the row or percentage limit is met.
- Optionally, click Conditions to specify which rows to sample.
- Optionally, to add another condition after the first, click Add Another Option.
- Click Submit.
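The difference between sequential and random sampling, by row count or by percentage, can be sketched in plain Python. This is an illustrative analogy only: the `sample_rows` helper and its parameters are hypothetical, and applying the percentage to the rows that remain after the condition is an assumption of this sketch.

```python
import random


def sample_rows(rows, count=None, percent=None, mode="sequential",
                condition=None, seed=None):
    """Return a sample of rows by count or by percentage.

    Assumptions for this sketch: the condition is applied first, and a
    random sample is returned in original row order.
    """
    if count is None and percent is None:
        raise ValueError("provide either count or percent")
    candidates = [r for r in rows if condition is None or condition(r)]
    if percent is not None:
        count = int(len(candidates) * percent / 100)
    count = min(count, len(candidates))
    if mode == "random":
        rng = random.Random(seed)
        picked = sorted(rng.sample(range(len(candidates)), count))
        return [candidates[i] for i in picked]
    # Sequential: start at the first row and move down until the limit is met.
    return candidates[:count]


# Hypothetical dataset with ids 1 through 10.
rows = [{"id": i} for i in range(1, 11)]

first_three = sample_rows(rows, count=3)
half_of_even = sample_rows(rows, percent=50,
                           condition=lambda r: r["id"] % 2 == 0)
random_three = sample_rows(rows, count=3, mode="random", seed=0)

print([r["id"] for r in first_three])   # [1, 2, 3]
print([r["id"] for r in half_of_even])  # [2, 4]
```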
Clean
You can use the Clean skill to manipulate values, such as replacing existing values with new ones.
To clean one or more columns:
- Click Wrangle > Clean in the skill menu.
- Select whether to clean all columns of a specific type or a specific string or numeric column.
- Select the columns you'd like to clean.
- Enter the old value to replace.
- Choose whether to replace it with a new value or with an aggregation.
- Optionally, specify matching options.
- Optionally, specify a conditional statement to follow.
- Click Submit.
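To make the idea of replacing an old value with either a new value or an aggregation concrete, here is a minimal Python sketch. The `clean_column` helper, the case-insensitive matching option, and the choice to compute the aggregation over the values that are not being replaced are all assumptions for illustration; they are not taken from the product.

```python
from statistics import mean


def clean_column(rows, column, old_value, new_value=None,
                 aggregate=None, ignore_case=False):
    """Replace old_value in one column with new_value or an aggregation."""

    def matches(value):
        if ignore_case and isinstance(value, str) and isinstance(old_value, str):
            return value.lower() == old_value.lower()
        return value == old_value

    if aggregate is not None:
        # Assumption: aggregate over the values that are not being replaced.
        kept = [r[column] for r in rows if not matches(r[column])]
        new_value = aggregate(kept)
    return [{**r, column: new_value} if matches(r[column]) else r
            for r in rows]


# Hypothetical data: -999 is a placeholder for a missing temperature.
rows = [
    {"city": "NYC", "temp": 70},
    {"city": "nyc", "temp": -999},
    {"city": "Boston", "temp": 60},
]

filled = clean_column(rows, "temp", -999, aggregate=mean)
renamed = clean_column(rows, "city", "nyc", "New York", ignore_case=True)

print([r["temp"] for r in filled])   # [70, 65, 60]
print([r["city"] for r in renamed])  # ['New York', 'New York', 'Boston']
```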
Keep or Drop Columns
You can use the Keep Columns and Drop Columns skills to keep or remove columns in your dataset.
Keep Columns
To keep specified columns:
- Click Wrangle > Keep Columns in the skill menu.
- Select the columns to keep.
- Click Submit.
Drop Columns
To drop specified columns:
- Click Wrangle > Drop Columns in the skill menu.
- Select the columns to drop.
- Click Submit.
Alternatively, you can drop a column from that column's More menu.
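Keeping and dropping columns are complementary operations, which the following plain-Python sketch illustrates. The helper names and sample data are hypothetical and stand in for the skills only by analogy.

```python
def keep_columns(rows, columns):
    """Return rows containing only the listed columns."""
    return [{c: r[c] for c in columns} for r in rows]


def drop_columns(rows, columns):
    """Return rows with the listed columns removed."""
    dropped = set(columns)
    return [{k: v for k, v in r.items() if k not in dropped} for r in rows]


# Hypothetical sample data.
rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Lin", "email": "lin@example.com"},
]

kept = keep_columns(rows, ["id", "name"])
dropped = drop_columns(rows, ["email"])
print(kept == dropped)  # True: both leave only id and name
```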
Keep or Drop Rows
You can use the Keep Rows and Drop Rows skills to keep or remove rows in your dataset.
Keep Rows
To keep specified rows:
- Click Wrangle > Keep Rows in the skill menu.
- Select the column used to identify the rows to be kept.
- Select an expression that identifies the rows to be kept.
- Optionally, click the + button to add more columns and conditions. Click the - button to remove a column and condition.
- When you enter more than one column and condition, choose whether all or any of the conditions keep the row.
- Click Submit.
Alternatively, you can keep rows that match the value of a specific dataset cell:
- Right-click the cell containing the value you want to keep across all rows.
- To keep all of the rows that match that value, click Keep rows matching {value}.
- To add more conditions, click Keep rows matching {value} and ... and complete the form.
Drop Rows
To drop specified rows:
- Click Wrangle > Drop Rows in the skill menu.
- Select the column used to identify the rows to be dropped.
- Select an expression that identifies the rows to be dropped.
- Optionally, click the + button to add more columns and conditions.
- When you enter more than one column and condition, choose whether all or any of the conditions drop the row.
- Click Submit.
Alternatively, you can drop rows that match the value of a specific dataset cell:
- Right-click the cell containing the value you want to drop across all rows.
- To drop all of the rows that match that value, click Drop rows matching {value}.
- To add more conditions, click Drop rows matching {value} and ... and complete the form.
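The way multiple column conditions combine under "all" or "any" when keeping or dropping rows can be sketched in plain Python. The helpers, the operator table, and the `(column, operator, value)` condition format below are hypothetical conventions for this sketch, not the product's expression syntax.

```python
import operator

# Hypothetical mapping from expression symbols to comparison functions.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def row_matches(row, conditions, match="all"):
    """Check a row against (column, operator, value) conditions."""
    results = (OPERATORS[op](row[col], val) for col, op, val in conditions)
    return all(results) if match == "all" else any(results)


def keep_rows(rows, conditions, match="all"):
    """Keep only the rows that satisfy the conditions."""
    return [r for r in rows if row_matches(r, conditions, match)]


def drop_rows(rows, conditions, match="all"):
    """Remove the rows that satisfy the conditions."""
    return [r for r in rows if not row_matches(r, conditions, match)]


# Hypothetical sample data.
rows = [
    {"region": "East", "sales": 300},
    {"region": "West", "sales": 120},
    {"region": "East", "sales": 90},
]
conditions = [("region", "==", "East"), ("sales", ">", 100)]

print(keep_rows(rows, conditions, match="all"))       # only East with sales 300
print(len(keep_rows(rows, conditions, match="any")))  # 3
print(drop_rows(rows, conditions, match="all"))       # West 120 and East 90
```

With "all", a row must satisfy every condition; with "any", one satisfied condition is enough, so the same conditions can select very different sets of rows.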
Remove Duplicates
You can use the Remove Duplicates skill to remove duplicate data from specified columns.
To remove duplicates:
- Click Wrangle > Remove Duplicates in the skill menu.
- Enter the columns to remove duplicates from.
- Click Submit.
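Removing duplicates based on specified columns can be sketched in plain Python as follows. The `remove_duplicates` helper and the data are hypothetical, and keeping the first occurrence of each duplicate is an assumption of this sketch; the product may choose differently.

```python
def remove_duplicates(rows, columns):
    """Drop rows whose values in the given columns were already seen.

    Assumption for this sketch: the first occurrence is the one kept.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data with a repeated email address.
rows = [
    {"email": "ada@example.com", "visits": 3},
    {"email": "lin@example.com", "visits": 1},
    {"email": "ada@example.com", "visits": 7},
]

deduped = remove_duplicates(rows, ["email"])
print([r["visits"] for r in deduped])  # [3, 1]
```

Note that only the listed columns are compared, so rows that differ elsewhere (here, in `visits`) still count as duplicates.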
Reshape
You can use the Reshape skill to convert your data to either wide or long form. This allows you to specify one or more columns to act as unique identifiers for rows and create new columns from the values of your specified columns. Each row in the resulting dataset represents a combination of the column values.
To reshape a dataset:
- Click Wrangle > Reshape in the skill menu.
- Select either Wide Form or Long Form.
- Enter the Row Identifiers.
- Enter the Values.
- Optionally, enter a name for the variables column and the value column.
- Click Submit.
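The relationship between wide and long form can be illustrated with a small plain-Python sketch. The `to_long` and `to_wide` helpers, their parameter names, and the sample data are hypothetical; they mirror the Row Identifiers, Values, and variable/value column names from the steps above only by analogy.

```python
def to_long(rows, id_columns, value_columns,
            variable_name="variable", value_name="value"):
    """Wide to long: one output row per (identifier, value column) pair."""
    long_rows = []
    for row in rows:
        for col in value_columns:
            new_row = {c: row[c] for c in id_columns}
            new_row[variable_name] = col
            new_row[value_name] = row[col]
            long_rows.append(new_row)
    return long_rows


def to_wide(rows, id_columns, variable_column, value_column):
    """Long to wide: one output row per identifier, one column per variable."""
    wide = {}
    for row in rows:
        key = tuple(row[c] for c in id_columns)
        record = wide.setdefault(key, {c: row[c] for c in id_columns})
        record[row[variable_column]] = row[value_column]
    return list(wide.values())


# Hypothetical wide-form data: one row per store, one column per quarter.
wide_rows = [
    {"store": "A", "q1": 10, "q2": 20},
    {"store": "B", "q1": 5, "q2": 8},
]

long_rows = to_long(wide_rows, ["store"], ["q1", "q2"],
                    variable_name="quarter", value_name="sales")
print(long_rows[0])  # {'store': 'A', 'quarter': 'q1', 'sales': 10}
print(to_wide(long_rows, ["store"], "quarter", "sales") == wide_rows)  # True
```

In this sketch the two functions are inverses: reshaping to long form and back to wide form returns the original rows.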