Version: 0.21.2

Drop

Drop removes selected columns or rows from your dataset and keeps the rest. Columns are selected by name. Rows are selected by one or more conditions; depending on the utterance you use, a row is dropped if it satisfies all of the conditions or any one of them.

Format

Drop has several utterance variations you can use, including:

  • Drop the columns <column one name>, <column two name> … <column N name>
  • Drop the rows satisfying all of the conditions <condition one>, <condition two> … <condition N>
  • Drop the rows satisfying any of the conditions <condition one>, <condition two> … <condition N>

Parameters

Drop uses two parameters:

  • column names (required for columns). The names of the columns to drop.
  • condition (required for rows). A condition has two parts:
    • column name. The column whose values to use in the condition.
    • expression. A row is dropped if its value in the specified column satisfies the expression. The available expressions include:
      • (not) Between the numbers <X> to <Y>
      • (not) Equal to the aggregate value...
      • (not) Equal to the column <column name>
      • (not) Equal to the math expression <expression>
      • (not) Equal to the value <value>
      • (not) Greater than <value>
      • (not) Less than <value>
      • (not) null
      • (not) One of <value>
      • contains
      • does not contain
      • ends with
      • does not end with
      • matches the value
      • does not match the value
      • starts with
      • does not start with
      • is (on or) before/after the date <date>
      • is (not) in the last X <time part> (and Y <time part>). Drops the rows whose value falls (or does not fall) within that period. This expression is available only for temporal columns. The available time parts are:
        • second(s)
        • minute(s)
        • hour(s)
        • day(s)
        • week(s)
        • month(s)
        • year(s)
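
To make the expression semantics concrete, here is a minimal Python sketch of a few of them as predicates on a single cell value. It is an illustration only, not how Drop is implemented: every function name is hypothetical, and the inclusive bounds for "between" and the case-sensitive string matching are assumptions, since this page does not specify either.

```python
from typing import Any, Callable, Iterable

Predicate = Callable[[Any], bool]

def between(low: float, high: float) -> Predicate:
    # Assumption: both bounds are inclusive; null values never match.
    return lambda v: v is not None and low <= v <= high

def one_of(values: Iterable[Any]) -> Predicate:
    allowed = set(values)
    return lambda v: v in allowed

def is_null() -> Predicate:
    return lambda v: v is None

def starts_with(prefix: str) -> Predicate:
    # Assumption: matching is case-sensitive and applies to strings only.
    return lambda v: isinstance(v, str) and v.startswith(prefix)

def negate(pred: Predicate) -> Predicate:
    # Models the "(not)" and "does not" variants of an expression.
    return lambda v: not pred(v)

# Drop the values between 30 and 50; everything else is kept.
ages = [22, None, 38, 61]
kept = [a for a in ages if not between(30, 50)(a)]
print(kept)
```

Building each expression as a function that returns a predicate keeps the negated forms trivial: `negate(is_null())` stands in for "not null" without a second implementation.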

Output

If the columns or rows are successfully dropped, the resulting dataset becomes [Dataset] v2 or the next incremental version value. The number of rows that were dropped is also reported in the chat box.

If the columns or rows can’t be dropped, an error message is logged in the chat box.

Examples

Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:

  • Age. Their age.
  • Gender. Their gender.
  • Name. Their name.
  • PClass. Their class.
  • Survived. Whether they survived the disaster.

To drop the Age and Gender columns, enter Drop the columns Age, Gender.
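
As a rough sketch of what this utterance does to the data, the Python below removes the named columns from a list of row dictionaries. The `drop_columns` helper and the two sample rows are invented for illustration, and raising an error for an unknown column is an assumption standing in for the error message described under Output.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def drop_columns(rows: List[Row], names: List[str]) -> List[Row]:
    """Return copies of the rows without the named columns; no rows are removed."""
    if rows:
        missing = [n for n in names if n not in rows[0]]
        if missing:
            # Assumption: an unknown column makes the whole drop fail.
            raise KeyError(f"unknown columns: {missing}")
    to_drop = set(names)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

# Invented sample rows using the column names listed above.
titanic = [
    {"Age": 29, "Gender": "female", "Name": "Passenger A", "PClass": 1, "Survived": 1},
    {"Age": 35, "Gender": "male", "Name": "Passenger B", "PClass": 3, "Survived": 0},
]

result = drop_columns(titanic, ["Age", "Gender"])
print(list(result[0].keys()))
```

The helper returns new rows rather than modifying the input, which loosely mirrors how Drop produces a new dataset version instead of changing the existing one.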

To drop all rows where the passenger is female, enter Drop the rows where Gender is equal to the value female.

To drop all rows where the passenger is female and under 40 years old, enter Drop the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40.

To drop all rows where the passenger is female or first class, enter Drop the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1.
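
The difference between "all of the conditions" and "any of the conditions" can be sketched in a few lines of Python. The `drop_rows` helper and the four sample rows are hypothetical, made up for this illustration; it returns the kept rows together with the number of rows dropped, echoing the count that Drop reports in the chat box.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Condition = Callable[[Row], bool]

def drop_rows(rows: List[Row], conditions: List[Condition],
              mode: str = "all") -> Tuple[List[Row], int]:
    """Drop rows that satisfy all (or any) of the conditions."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    if not conditions:
        raise ValueError("at least one condition is required")
    combine = all if mode == "all" else any
    kept = [r for r in rows if not combine(c(r) for c in conditions)]
    return kept, len(rows) - len(kept)

# Invented sample rows; only the columns used by the conditions are shown.
titanic = [
    {"Name": "Passenger A", "Gender": "female", "Age": 29, "PClass": 1},
    {"Name": "Passenger B", "Gender": "male", "Age": 35, "PClass": 1},
    {"Name": "Passenger C", "Gender": "female", "Age": 52, "PClass": 3},
    {"Name": "Passenger D", "Gender": "male", "Age": 40, "PClass": 2},
]

is_female = lambda r: r["Gender"] == "female"
under_40 = lambda r: r["Age"] < 40
first_class = lambda r: r["PClass"] == 1

kept_all, dropped_all = drop_rows(titanic, [is_female, under_40], mode="all")
kept_any, dropped_any = drop_rows(titanic, [is_female, first_class], mode="any")
print(dropped_all, [r["Name"] for r in kept_all])
print(dropped_any, [r["Name"] for r in kept_any])
```

With these rows, the "all" variant drops only Passenger A (female and under 40), while the "any" variant drops Passengers A, B, and C, because each of them is female or in first class.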

Now consider a different dataset with a "Date" column whose values run up to and including today's date. To drop all rows dated more than two weeks and eight days (22 days) ago, enter Drop the rows where Date is not in the last 2 weeks and 8 days or, equivalently, Drop the rows where Date is not in the last 22 days.
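
The equivalence of the two utterances, and the effect of the "not in the last" filter, can be checked with a short Python sketch. The `in_last` helper is hypothetical, the reference date is fixed arbitrarily so the example is reproducible, and treating the window as inclusive at both ends is an assumption that this page does not confirm.

```python
from datetime import date, timedelta

def in_last(d: date, today: date, weeks: int = 0, days: int = 0) -> bool:
    """True if d falls within the window ending today."""
    # Assumption: the window includes both its start date and today.
    start = today - timedelta(weeks=weeks, days=days)
    return start <= d <= today

today = date(2024, 3, 31)  # arbitrary fixed "today" for reproducibility
offsets = [0, 10, 22, 23, 60]  # days before today
dates = [today - timedelta(days=n) for n in offsets]

# "Drop the rows where Date is not in the last 2 weeks and 8 days"
kept = [d for d in dates if in_last(d, today, weeks=2, days=8)]
# "Drop the rows where Date is not in the last 22 days"
kept_22 = [d for d in dates if in_last(d, today, days=22)]

print(kept == kept_22)
print([(today - d).days for d in kept])
```

Only weeks and days are modeled here. Months and years have variable lengths, so they would need calendar-aware arithmetic rather than a fixed `timedelta`.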