
Drop

Drop lets you remove specific columns or rows from your dataset while keeping the rest. Columns are selected by name. Rows are selected either by a predicate or by a list of conditions, of which a row must satisfy all or any to be dropped.

Format

Drop uses several formats:

  • Drop the columns <column one name>, <column two name> … <column N name>
  • Drop the rows satisfying (all | any) of the conditions <condition one>, <condition two> … <condition N>
  • Drop the rows where <predicate>
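The three formats amount to three filtering operations. The following is a minimal sketch of their semantics in plain Python. It assumes a dataset is modeled as a list of dicts (one dict per row) and that each condition or predicate is a Python function returning True or False for a row. The function names are hypothetical, and this is an illustration, not the product's implementation.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Condition = Callable[[Row], bool]


def drop_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def drop_rows_satisfying(rows: List[Row], conditions: List[Condition], mode: str) -> List[Row]:
    """Drop rows that satisfy all (mode="all") or any (mode="any") of the conditions."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    combine = all if mode == "all" else any
    return [row for row in rows if not combine(cond(row) for cond in conditions)]


def drop_rows_where(rows: List[Row], predicate: Condition) -> List[Row]:
    """Drop rows for which the predicate is true and keep the rest."""
    return [row for row in rows if not predicate(row)]
```

With "all", a row survives unless every condition holds. With "any", a single matching condition is enough to drop it.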

Parameters

Drop uses three parameters:

  • column names (required for dropping columns). The names of the columns to drop.
  • condition (required for rows). A condition has two parts:
    • column name. The column whose values to use in the condition.
    • expression. A row is dropped if its value in the specified column satisfies the expression. The available expressions include:
      • (not) Between the numbers <X> to <Y>
      • (not) Equal to the aggregate value...
      • (not) Equal to the column <column name>
      • (not) Equal to the math expression <expression>
      • (not) Equal to the value <value>
      • (not) Greater than <value>
      • (not) Less than <value>
      • (not) null
      • (not) One of <value>
      • contains
      • does not contain
      • ends with
      • does not end with
      • matches the value
      • does not match the value
      • starts with
      • does not start with
      • is (on or) before/after the date <date>
      • is (on or) before/after <temporal expression>
        • See Create for more information on temporal expressions.
      • is (not) in the last <X> <time part> (and <Y> <time part>). Available only for temporal columns. The available time parts are:
        • second(s)
        • minute(s)
        • hour(s)
        • day(s)
        • week(s)
        • month(s)
        • year(s)
  • predicate (required for the "Drop the rows where" format). An expression that uses operators to compare two values. Refer to Compute for more information.
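To make the expression list concrete, here is a minimal sketch that models a few of the expressions as Python functions that build row conditions. It rests on several assumptions that are not documented behavior: rows are dicts, nulls are represented as None, "Between" includes both bounds, and any comparison against a null value is false. All function names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]
Condition = Callable[[Row], bool]


def negate(cond: Condition) -> Condition:
    """Model the "(not)" prefix by inverting any condition."""
    return lambda row: not cond(row)


def between(column: str, low: Any, high: Any) -> Condition:
    # Assumption: inclusive bounds; a null (None) value never matches.
    return lambda row: row.get(column) is not None and low <= row[column] <= high


def equal_to_value(column: str, value: Any) -> Condition:
    return lambda row: row.get(column) == value


def greater_than(column: str, value: Any) -> Condition:
    return lambda row: row.get(column) is not None and row[column] > value


def less_than(column: str, value: Any) -> Condition:
    return lambda row: row.get(column) is not None and row[column] < value


def is_null(column: str) -> Condition:
    # Assumption: a missing key is treated the same as a null value.
    return lambda row: row.get(column) is None


def one_of(column: str, values: Iterable[Any]) -> Condition:
    allowed = tuple(values)  # materialize once so generators can be passed in
    return lambda row: row.get(column) in allowed


def contains(column: str, text: str) -> Condition:
    return lambda row: isinstance(row.get(column), str) and text in row[column]


def starts_with(column: str, prefix: str) -> Condition:
    return lambda row: isinstance(row.get(column), str) and row[column].startswith(prefix)


def ends_with(column: str, suffix: str) -> Condition:
    return lambda row: isinstance(row.get(column), str) and row[column].endswith(suffix)
```

Under these assumptions, negate(greater_than("Age", 40)) is also true for rows whose Age is null. Drop itself may treat nulls differently, so check the behavior on your own data.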

Output

If the columns or rows are successfully dropped, the result is saved as a new version of the dataset, [Dataset] v2 (or the next version number if the dataset is already versioned). The number of dropped rows is also reported in the conversation history.
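The versioning and row-count reporting can be sketched as follows. The sketch assumes a naming scheme of "<name> v<N>" and treats an unversioned dataset as version 1. Both the scheme and the function names are hypothetical illustrations of the behavior described above, not the product's code.

```python
import re
from typing import Tuple


def next_version_name(name: str) -> str:
    """Hypothetical scheme: "Titanic" -> "Titanic v2", "Titanic v2" -> "Titanic v3"."""
    match = re.fullmatch(r"(.*) v(\d+)", name)
    if match is None:
        # Assumption: an unversioned dataset counts as v1, so the result is v2.
        return f"{name} v2"
    base, number = match.group(1), int(match.group(2))
    return f"{base} v{number + 1}"


def drop_report(name: str, rows_before: int, rows_after: int) -> Tuple[str, int]:
    """Return the new dataset name and the number of rows that were dropped."""
    return next_version_name(name), rows_before - rows_after
```

For example, drop_report("Titanic", 10, 7) returns ("Titanic v2", 3).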

If the columns or rows can’t be dropped, an error message is shown in the conversation history.

Examples

Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:

  • Age. Their age.
  • Gender. Their gender.
  • Name. Their name.
  • PClass. Their passenger class.
  • Survived. Whether they survived the disaster.

To drop the Age and Gender columns, enter Drop the columns Age, Gender.

To drop all rows where the passenger is female, enter Drop the rows where Gender is equal to the value female.

To drop all rows where the passenger is female and under 40 years old, enter Drop the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40.

To drop all rows where the passenger is female or first class, enter Drop the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1.
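The difference between "all" and "any" is easiest to see on a few rows. The sketch below applies the three Titanic examples to four made-up passenger rows, with invented values rather than real Titanic data. Each condition is modeled as a Python function, and the sketch illustrates the described semantics only.

```python
# Four invented rows; the column names follow the Titanic example above.
titanic = [
    {"Name": "Passenger A", "Gender": "female", "Age": 29, "PClass": 1, "Survived": 1},
    {"Name": "Passenger B", "Gender": "female", "Age": 45, "PClass": 3, "Survived": 0},
    {"Name": "Passenger C", "Gender": "male", "Age": 35, "PClass": 1, "Survived": 0},
    {"Name": "Passenger D", "Gender": "male", "Age": 22, "PClass": 3, "Survived": 0},
]


def is_female(row):
    return row["Gender"] == "female"


def under_40(row):
    return row["Age"] < 40


def first_class(row):
    return row["PClass"] == 1


# Drop the columns Age, Gender
no_age_gender = [{k: v for k, v in row.items() if k not in ("Age", "Gender")} for row in titanic]

# ...satisfying all of the conditions: a row is dropped only if every condition holds.
after_all = [row for row in titanic if not (is_female(row) and under_40(row))]

# ...satisfying any of the conditions: a row is dropped if at least one condition holds.
after_any = [row for row in titanic if not (is_female(row) or first_class(row))]

print([row["Name"] for row in after_all])  # ['Passenger B', 'Passenger C', 'Passenger D']
print([row["Name"] for row in after_any])  # ['Passenger D']
```

Only Passenger A is both female and under 40, so "all" drops one row. "Any" drops every row that is female or first class, which leaves only Passenger D.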

Consider a different dataset with a "Date" column whose values go up to and including today's date. To drop all rows with a date more than two weeks and eight days (22 days) ago, enter Drop the rows where Date is not in the last 2 weeks and 8 days or, equivalently, Drop the rows where Date is not in the last 22 days.
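Python's datetime module can illustrate the arithmetic behind the two equivalent commands and the window they test. The sketch assumes the window runs from today minus the period up to today, with both ends inclusive; this boundary handling is an assumption. It also fixes "today" so the result is reproducible. Month and year periods are omitted because timedelta has no calendar-month unit.

```python
from datetime import date, timedelta

# "2 weeks and 8 days" and "22 days" describe the same period.
window = timedelta(weeks=2, days=8)
assert window == timedelta(days=22)


def in_last(value: date, today: date, period: timedelta) -> bool:
    """True if value falls within the last `period` (assumed inclusive at both ends)."""
    return today - period <= value <= today


today = date(2024, 3, 31)  # fixed for reproducibility
dates = [today - timedelta(days=n) for n in (0, 10, 22, 23, 40)]

# "Drop the rows where Date is not in the last 22 days" keeps only rows inside the window.
kept = [d for d in dates if in_last(d, today, window)]
dropped = [d for d in dates if not in_last(d, today, window)]
print([(today - d).days for d in kept])     # [0, 10, 22]
print([(today - d).days for d in dropped])  # [23, 40]
```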