Version: 0.21.2

Keep

Keep retains the columns or rows you specify and drops the rest of the dataset. Select columns by name. Select rows by providing one or more conditions, and specify whether a row must satisfy all of them or at least one of them.

Format

Keep has several utterances you can use:

  • Keep the columns <column one name>, <column two name> ... <column N name> where <column name> <condition>. Keep the listed columns and, when a condition is given, only the rows that satisfy it. The where clause is optional.
  • Keep the rows where <condition>. Keep all rows that satisfy the condition.
  • Keep the rows satisfying all of the conditions <condition one>, <condition two> ... <condition N>. Keep all rows that satisfy all of the conditions.
  • Keep the rows satisfying any of the conditions <condition one>, <condition two> ... <condition N>. Keep all rows that satisfy any of the conditions.

Parameters

Keep uses the following parameters:

  • column names (required). The names of the columns to keep.
  • condition (required for rows, optional for columns). A condition has two parts:
    • column name. The column whose values to use in the condition.
    • expression. A row is kept if its value in the specified column satisfies the expression. The available expressions include:
      • (not) between the numbers <X> to <Y>
      • (not) equal to the aggregate value...
      • (not) equal to the column <column name>
      • (not) equal to the math expression <expression>
      • (not) equal to the value <value>
      • (not) greater than <value>
      • (not) less than <value>
      • (not) null
      • (not) one of <value>
      • contains
      • does not contain
      • ends with
      • does not end with
      • matches the value
      • does not match the value
      • starts with
      • does not start with
      • is (on or) before/after the date <date>
      • is (not) in the last X <time part> (and Y <time part>). Keeps the rows whose value falls inside (or, with "not", outside) the given time window. This expression is available only for temporal columns. The available time parts are:
        • second(s)
        • minute(s)
        • hour(s)
        • day(s)
        • week(s)
        • month(s)
        • year(s)
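
As a rough illustration of how a few of these expressions behave, the sketch below models them as plain Python predicates. The function names are hypothetical and are not part of Keep. Treating "between" as inclusive of both endpoints is an assumption made here, not something this page states.

```python
# Hypothetical predicate builders that mimic a few Keep expressions.
# None of these names belong to the product; they only illustrate semantics.

def between(low, high):
    # Assumption: both endpoints are included in the range.
    return lambda value: value is not None and low <= value <= high

def one_of(*options):
    # True when the value equals any of the listed options.
    return lambda value: value in options

def starts_with(prefix):
    # Only string values can start with a prefix.
    return lambda value: isinstance(value, str) and value.startswith(prefix)

def is_null():
    # Models the "null" expression using Python's None.
    return lambda value: value is None

def negate(predicate):
    # Models the "(not)" prefix that many expressions accept.
    return lambda value: not predicate(value)

ages = [19, 20, 35, 40, 41]
kept = [a for a in ages if between(20, 40)(a)]
dropped = [a for a in ages if negate(between(20, 40))(a)]
```

Here `kept` holds the values inside the range and `dropped` holds the values a "not between" expression would keep instead.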

Output

If the columns or rows are successfully kept, the result is saved as a new version of the dataset: [Dataset] v2, or the next version number in sequence.

If the columns or rows can't be kept, an error message is logged in the chat box.

Examples

Consider a dataset called "Titanic" that contains information on each passenger, including the following columns:

  • Age. Their age.
  • Gender. Their gender.
  • Name. Their name.
  • PClass. Their class.
  • Survived. Whether they survived the disaster.

To keep only the Age and Gender columns, enter Keep the columns Age, Gender.

To keep only the Age and Gender columns, and only the rows where the passenger is between 20 and 40 years old, enter Keep the columns Age, Gender where Age is between the numbers 20 to 40.
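
For readers who think in code, keeping columns with a condition can be pictured as a row filter followed by a column projection. The sketch below is only an analogy in plain Python: the `keep_columns` helper and the sample rows are made up for illustration, and the inclusive range is an assumption.

```python
# Made-up rows standing in for the "Titanic" dataset.
passengers = [
    {"Name": "P1", "Age": 22, "Gender": "male", "PClass": 3, "Survived": 0},
    {"Name": "P2", "Age": 38, "Gender": "female", "PClass": 1, "Survived": 1},
    {"Name": "P3", "Age": 54, "Gender": "male", "PClass": 1, "Survived": 0},
    {"Name": "P4", "Age": 4, "Gender": "female", "PClass": 3, "Survived": 1},
]

def keep_columns(rows, columns, condition=None):
    """Hypothetical helper: filter rows (if a condition is given), then project columns."""
    selected = rows if condition is None else [r for r in rows if condition(r)]
    return [{c: r[c] for c in columns} for r in selected]

# Analogue of: Keep the columns Age, Gender
all_rows = keep_columns(passengers, ["Age", "Gender"])

# Analogue of: Keep the columns Age, Gender where Age is between the numbers 20 to 40
# Assumption: the range includes both endpoints.
adults = keep_columns(passengers, ["Age", "Gender"], lambda r: 20 <= r["Age"] <= 40)
```

Without a condition every row survives and only the column set shrinks. With a condition, the condition may reference a column that is itself kept, as Age is here.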

To keep all rows where the passenger is female, enter Keep the rows where Gender is equal to the value female.

To keep all rows where the passenger is female and under 40 years old, enter Keep the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40.

To keep all rows where the passenger is female or first class, enter Keep the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1.
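
The difference between "all of the conditions" and "any of the conditions" is the difference between a logical AND and a logical OR across the listed conditions. A minimal sketch in plain Python, using a hypothetical `keep_rows` helper and made-up passenger rows:

```python
# Made-up rows standing in for the "Titanic" dataset.
passengers = [
    {"Name": "P1", "Age": 29, "Gender": "female", "PClass": 1},
    {"Name": "P2", "Age": 45, "Gender": "female", "PClass": 3},
    {"Name": "P3", "Age": 35, "Gender": "male", "PClass": 1},
    {"Name": "P4", "Age": 50, "Gender": "male", "PClass": 2},
]

def keep_rows(rows, conditions, mode="all"):
    """Hypothetical helper: keep rows satisfying all (or any) of the conditions."""
    combine = all if mode == "all" else any
    return [r for r in rows if combine(cond(r) for cond in conditions)]

def is_female(row):
    return row["Gender"] == "female"

def under_40(row):
    return row["Age"] < 40

def first_class(row):
    return row["PClass"] == 1

# Analogue of: Keep the rows satisfying all of the conditions ...
female_under_40 = keep_rows(passengers, [is_female, under_40], mode="all")

# Analogue of: Keep the rows satisfying any of the conditions ...
female_or_first = keep_rows(passengers, [is_female, first_class], mode="any")
```

With this sample data, the "all" variant keeps only P1, while the "any" variant keeps P1, P2, and P3.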

Consider a different dataset that contains a "Date" column with values up to and including today's date. To keep all rows dated more than two weeks and eight days ago, enter Keep the rows where Date is not in the last 2 weeks and 8 days or, equivalently, Keep the rows where Date is not in the last 22 days.
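
The equivalence of the two utterances comes from simple date arithmetic: 2 weeks and 8 days is 14 + 8 = 22 days. The sketch below checks this with Python's standard `datetime` module. The `in_last` helper is hypothetical, and treating the window as inclusive at both ends is an assumption. Note that `timedelta` has no month or year units, so this sketch covers only the fixed-length time parts.

```python
from datetime import date, timedelta

def in_last(value, today, **parts):
    """Hypothetical helper: True if `value` lies in the window ending at `today`.

    `parts` takes timedelta keywords such as weeks=2, days=8.
    Assumption: the window includes both its start and `today`.
    """
    start = today - timedelta(**parts)
    return start <= value <= today

today = date(2024, 3, 31)  # fixed date so the example is reproducible
dates = [date(2024, 3, 30), date(2024, 3, 9), date(2024, 3, 8), date(2024, 1, 1)]

# Analogue of: Keep the rows where Date is not in the last 2 weeks and 8 days
older_a = [d for d in dates if not in_last(d, today, weeks=2, days=8)]

# Analogue of: Keep the rows where Date is not in the last 22 days
older_b = [d for d in dates if not in_last(d, today, days=22)]
```

Both lists contain the same dates, because the two windows start on the same day.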