Keep lets you keep certain columns or rows in your dataset while dropping the remaining rows or columns. Columns can be selected by name and rows can be selected by providing an expression or condition those rows must partially or wholly satisfy.
Keep has several utterances you can use:
Keep the columns <column one name>, <column two name> ... <column N name> where <column name> <condition>. Keep the listed columns and only the rows that satisfy the condition.
Keep the rows where <condition>. Keep all rows that satisfy the condition.
Keep the rows satisfying all of the conditions <condition one>, <condition two> ... <condition N>. Keep all rows that satisfy all of the conditions.
Keep the rows satisfying any of the conditions <condition one>, <condition two> ... <condition N>. Keep all rows that satisfy any of the conditions.
Keep uses the following parameters:
column names (required). The names of the columns to keep.
condition (required for rows, optional for columns). A condition has two parts:
column name. The column whose values to use in the condition.
expression. A row is kept if the value of the row in the specified column satisfies the condition. The available expressions include:
(not) between the numbers <X> to <Y>
(not) equal to the aggregate value...
(not) equal to the column <column name>
(not) equal to the math expression <expression>
(not) equal to the value <value>
(not) greater than <value>
(not) less than <value>
(not) one of <value>
does not contain
does not end with
matches the value
does not match the value
does not start with
is (on or) before/after the date <date>
is (not) in the last X <time part> (and Y <time part>). Keeps the rows that meet the criteria. Note that this expression is available only for temporal columns. The available options include:
If the columns or rows are successfully kept, the resulting dataset becomes [Dataset] v2 or the next incremental version value.
If the columns or rows can't be kept, an error message is logged in the chat box.
Consider a dataset called "Titanic" that contains information on each passenger, including the following columns:
Age. Their age.
Gender. Their gender.
Name. Their name.
PClass. Their class.
Survived. Whether they survived the disaster.
To keep only the Age and Gender columns, enter
Keep the columns Age, Gender.
To keep only the Age and Gender columns where the passenger is between 20 and 40 years old, enter
Keep the columns Age, Gender where Age is between the numbers 20 to 40.
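For readers who think in code, the column-plus-condition form can be approximated in plain Python. This is only a sketch of the semantics, not how Keep is implemented: the sample rows are invented, `keep_columns` is a hypothetical helper, and the range check assumes both bounds are inclusive, which this page does not confirm.

```python
# Invented sample rows that reuse the Titanic column names from the text.
rows = [
    {"Name": "Passenger A", "Age": 19, "Gender": "female", "PClass": 1, "Survived": 1},
    {"Name": "Passenger B", "Age": 25, "Gender": "male", "PClass": 3, "Survived": 0},
    {"Name": "Passenger C", "Age": 40, "Gender": "female", "PClass": 2, "Survived": 1},
    {"Name": "Passenger D", "Age": 52, "Gender": "male", "PClass": 1, "Survived": 0},
]


def keep_columns(rows, columns, condition=None):
    """Hypothetical helper: filter rows by `condition`, then keep only `columns`."""
    kept = [row for row in rows if condition is None or condition(row)]
    return [{name: row[name] for name in columns} for row in kept]


# Keep the columns Age, Gender where Age is between the numbers 20 to 40.
# Assumption: both ends of the range are inclusive.
result = keep_columns(rows, ["Age", "Gender"], lambda row: 20 <= row["Age"] <= 40)
print(result)
```

Calling `keep_columns(rows, ["Age", "Gender"])` with no condition corresponds to the plain "Keep the columns Age, Gender." utterance: every row survives and only the two named columns remain.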
To keep all rows where the passenger is female, enter
Keep the rows where Gender is equal to the value female.
To keep all rows where the passenger is female and under 40 years old, enter
Keep the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40.
To keep all rows where the passenger is female or first class, enter
Keep the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1.
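The difference between "all of the conditions" and "any of the conditions" can be illustrated with a short Python sketch. The rows are invented for illustration and `keep_rows` is a hypothetical helper, not part of the product; it only mirrors the AND/OR behavior described above.

```python
# Invented sample rows that reuse the Titanic column names from the text.
rows = [
    {"Name": "Passenger A", "Age": 19, "Gender": "female", "PClass": 1},
    {"Name": "Passenger B", "Age": 25, "Gender": "male", "PClass": 3},
    {"Name": "Passenger C", "Age": 40, "Gender": "female", "PClass": 2},
    {"Name": "Passenger D", "Age": 52, "Gender": "male", "PClass": 1},
]


def keep_rows(rows, conditions, mode="all"):
    """Hypothetical helper: keep rows satisfying all (or any) of `conditions`."""
    combine = all if mode == "all" else any
    return [row for row in rows if combine(cond(row) for cond in conditions)]


def is_female(row):
    return row["Gender"] == "female"


def under_40(row):
    return row["Age"] < 40


def first_class(row):
    return row["PClass"] == 1


# "satisfying all of the conditions": female AND under 40.
both = [row["Name"] for row in keep_rows(rows, [is_female, under_40], mode="all")]
# "satisfying any of the conditions": female OR first class.
either = [row["Name"] for row in keep_rows(rows, [is_female, first_class], mode="any")]

print(both)    # ['Passenger A']
print(either)  # ['Passenger A', 'Passenger C', 'Passenger D']
```

Passenger C is female but exactly 40, so she is dropped by the "all" form (which requires an age under 40) and kept by the "any" form.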
Consider a different dataset with a "Date" column whose values run up to and including today's date. To keep all of the rows dated more than two weeks and eight days (22 days) ago, enter
Keep the rows where Date is not in the last 2 weeks and 8 days.
or
Keep the rows where Date is not in the last 22 days.
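The two utterances are equivalent because 2 weeks and 8 days add up to 22 days. A minimal Python sketch of that arithmetic follows; `not_in_last` is a hypothetical helper, the dates are invented, and treating the cutoff day itself as still "in the last 22 days" is an assumption about the boundary that this page does not specify.

```python
from datetime import date, timedelta


def not_in_last(day, today, weeks=0, days=0):
    """Hypothetical helper: True if `day` falls before the start of the window."""
    cutoff = today - timedelta(weeks=weeks, days=days)
    # Assumption: the cutoff day itself still counts as inside the window.
    return day < cutoff


today = date(2024, 3, 31)  # fixed "today" so the example is reproducible
dates = [date(2024, 3, 30), date(2024, 3, 9), date(2024, 3, 8), date(2024, 1, 15)]

# "not in the last 2 weeks and 8 days" and "not in the last 22 days"
by_weeks_and_days = [d for d in dates if not_in_last(d, today, weeks=2, days=8)]
by_days = [d for d in dates if not_in_last(d, today, days=22)]

print(by_weeks_and_days == by_days)  # True
print([d.isoformat() for d in by_days])  # ['2024-03-08', '2024-01-15']
```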