Drop
Drop lets you remove specific columns or rows from your dataset while keeping the remaining ones. Columns are selected by name. Rows are selected either by providing a set of conditions that a row must satisfy wholly (all of the conditions) or partially (any of the conditions), or by providing a single predicate.
Format
Drop uses several formats:
Drop the columns <column one name>, <column two name> ... <column N name>
Drop the rows satisfying (all | any) of the conditions <condition one>, <condition two> ... <condition N>
Drop the rows where <predicate>
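To make the difference between the formats concrete, here is a minimal sketch in plain Python. It assumes a dataset is modeled as a list of dictionaries (one per row) and each condition as a function that returns True or False. The helper names (drop_columns, drop_rows_satisfying, drop_rows_where) are hypothetical and are not part of the product.

```python
from typing import Callable

Row = dict[str, object]
Condition = Callable[[Row], bool]


def drop_columns(rows: list[Row], names: list[str]) -> list[Row]:
    # Keep every row but omit the named columns.
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def drop_rows_satisfying(rows: list[Row], mode: str, conditions: list[Condition]) -> list[Row]:
    # "all": drop a row only if it satisfies every condition.
    # "any": drop a row if it satisfies at least one condition.
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    combine = all if mode == "all" else any
    return [row for row in rows if not combine(cond(row) for cond in conditions)]


def drop_rows_where(rows: list[Row], predicate: Condition) -> list[Row]:
    # Drop every row for which the single predicate is true.
    return [row for row in rows if not predicate(row)]


data: list[Row] = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}, {"x": 5, "y": "b"}, {"x": 9, "y": "a"}]
is_a: Condition = lambda r: r["y"] == "a"
big: Condition = lambda r: r["x"] > 4

print(drop_rows_satisfying(data, "all", [is_a, big]))  # only the x=9 row is dropped
print(drop_rows_satisfying(data, "any", [is_a, big]))  # only the x=2 row remains
```

With "all", a row must meet every condition to be dropped, so fewer rows are usually removed than with "any".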
Parameters
Drop uses three parameters:
column names (required). The names of the columns to drop.
condition (required for rows). A condition has two parts:
- column name. The column whose values are used in the condition.
- expression. A row is dropped if its value in the specified column satisfies the expression. The available expressions include:
(not) Between the numbers <X> to <Y>
(not) Equal to the aggregate value...
(not) Equal to the column <column name>
(not) Equal to the math expression <expression>
(not) Equal to the value <value>
(not) Greater than <value>
(not) Less than <value>
(not) null
(not) One of <value>
contains
does not contain
ends with
does not end with
matches the value
does not match the value
starts with
does not start with
is (on or) before/after the date <date>
is (on or) before/after <temporal expression>. See Create for more information on temporal expressions.
is (not) in the last X <time part> (and Y <time part>). Drops the rows whose value falls (or, with not, does not fall) within that period before the current date and time. This expression is available only for temporal columns. The available time parts are:
- second(s)
- minute(s)
- hour(s)
- day(s)
- week(s)
- month(s)
- year(s)
predicate (required). An expression that uses operators to compare two values, for example Gender is equal to the value female. Refer to Compute for more information.
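The "in the last X <time part> (and Y <time part>)" expression can be pictured as a cutoff counted back from the current time. The sketch below assumes both ends of the window are inclusive, which the product may or may not do, and it only handles seconds through weeks, because months and years have no fixed length in Python's timedelta. The function in_the_last is hypothetical.

```python
from datetime import datetime, timedelta

# Map singular time parts to timedelta keyword arguments.
# Month(s) and year(s) are omitted: they have no fixed length in timedelta.
UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def in_the_last(value: datetime, now: datetime, *parts: tuple[int, str]) -> bool:
    # Add up every (amount, time part) pair, e.g. (1, "hour") and (30, "minutes").
    window = sum(
        (timedelta(**{UNITS[unit.rstrip("s")]: amount}) for amount, unit in parts),
        timedelta(),
    )
    # Assumption: both ends of the window are inclusive.
    return now - window <= value <= now


now = datetime(2024, 3, 31, 12, 0)
print(in_the_last(datetime(2024, 3, 31, 11, 0), now, (1, "hour"), (30, "minutes")))  # True
print(in_the_last(datetime(2024, 3, 31, 10, 0), now, (1, "hour"), (30, "minutes")))  # False
```

Under this model, "is in the last ..." drops rows for which the function returns True, and "is not in the last ..." drops rows for which it returns False.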
Output
If the columns or rows are successfully dropped, the resulting dataset is saved as [Dataset] v2, or as the next incremental version. The number of rows that were dropped is also reported in the conversation history.
If the columns or rows can’t be dropped, an error message is shown in the conversation history.
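The versioning described above can be illustrated with a hypothetical helper. It assumes that a name without a "vN" suffix counts as version 1 and that each successful drop appends or increments that suffix; the product's actual naming rules may differ.

```python
import re


def next_version_name(name: str) -> str:
    # Hypothetical convention: "Titanic" -> "Titanic v2", "Titanic v2" -> "Titanic v3".
    match = re.fullmatch(r"(.+) v(\d+)", name)
    if match is None:
        return f"{name} v2"
    return f"{match.group(1)} v{int(match.group(2)) + 1}"


print(next_version_name("Titanic"))     # Titanic v2
print(next_version_name("Titanic v2"))  # Titanic v3
```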
Examples
Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:
Age. Their age.
Gender. Their gender.
Name. Their name.
PClass. Their class.
Survived. Whether they survived the disaster.
To drop the Age and Gender columns, enter Drop the columns Age, Gender.
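As a rough model of what this command does, the sketch below applies the same column drop to two hypothetical sample rows that use the Titanic column names listed above. It is plain Python for illustration only, not how the product stores data.

```python
# Hypothetical sample rows that use the Titanic column names from the example.
titanic = [
    {"Age": 29, "Gender": "female", "Name": "Passenger A", "PClass": 1, "Survived": 1},
    {"Age": 45, "Gender": "male", "Name": "Passenger B", "PClass": 3, "Survived": 0},
]

# Equivalent of "Drop the columns Age, Gender": every row is kept, minus those keys.
to_drop = {"Age", "Gender"}
result = [{k: v for k, v in row.items() if k not in to_drop} for row in titanic]
print(list(result[0]))  # ['Name', 'PClass', 'Survived']
```

Dropping columns never changes the number of rows.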
To drop all rows where the passenger is female, enter Drop the rows where Gender is equal to the value female.
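The following sketch models this predicate on four hypothetical passengers (not real Titanic records): a row is kept only when the predicate is false.

```python
# Hypothetical sample rows; only the columns used in the examples are included.
titanic = [
    {"Name": "A", "Age": 29, "Gender": "female", "PClass": 1},
    {"Name": "B", "Age": 45, "Gender": "male", "PClass": 3},
    {"Name": "C", "Age": 52, "Gender": "female", "PClass": 2},
    {"Name": "D", "Age": 35, "Gender": "male", "PClass": 1},
]

# "Drop the rows where Gender is equal to the value female".
kept = [row for row in titanic if row["Gender"] != "female"]
print([row["Name"] for row in kept])  # ['B', 'D']
```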
To drop all rows where the passenger is female and under 40 years old, enter Drop the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40.
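With all, a row is dropped only when both conditions hold. The sketch below uses the same kind of hypothetical sample rows: passenger C is female but not under 40, so only passenger A is dropped.

```python
# Hypothetical sample rows; only the columns used in the examples are included.
titanic = [
    {"Name": "A", "Age": 29, "Gender": "female", "PClass": 1},
    {"Name": "B", "Age": 45, "Gender": "male", "PClass": 3},
    {"Name": "C", "Age": 52, "Gender": "female", "PClass": 2},
    {"Name": "D", "Age": 35, "Gender": "male", "PClass": 1},
]

# "Drop the rows satisfying all of the conditions Gender is equal to the value female, Age is less than 40".
kept = [row for row in titanic if not (row["Gender"] == "female" and row["Age"] < 40)]
print([row["Name"] for row in kept])  # ['B', 'C', 'D']
```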
To drop all rows where the passenger is female or in first class, enter Drop the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1.
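With any, a single matching condition is enough to drop a row, so this command usually removes more rows than the all example. In the hypothetical sample below, only the male passenger outside first class survives the drop.

```python
# Hypothetical sample rows; only the columns used in the examples are included.
titanic = [
    {"Name": "A", "Age": 29, "Gender": "female", "PClass": 1},
    {"Name": "B", "Age": 45, "Gender": "male", "PClass": 3},
    {"Name": "C", "Age": 52, "Gender": "female", "PClass": 2},
    {"Name": "D", "Age": 35, "Gender": "male", "PClass": 1},
]

# "Drop the rows satisfying any of the conditions Gender is equal to the value female, PClass is equal to the value 1".
kept = [row for row in titanic if not (row["Gender"] == "female" or row["PClass"] == 1)]
print([row["Name"] for row in kept])  # ['B']
```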
Consider a different dataset with a Date column that contains values up to and including today's date. To drop all rows with a date more than two weeks and eight days ago, enter Drop the rows where Date is not in the last 2 weeks and 8 days or, equivalently, Drop the rows where Date is not in the last 22 days.
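The equivalence of the two commands comes down to simple arithmetic: 2 weeks plus 8 days is 22 days. The sketch below checks that and filters three hypothetical dates against a fixed, hypothetical "today". It assumes the cutoff day itself still counts as being in the last 22 days; the product's boundary handling is not documented here.

```python
from datetime import date, timedelta

today = date(2024, 3, 31)  # hypothetical "today"
# "2 weeks and 8 days" and "22 days" describe the same window.
window = timedelta(weeks=2) + timedelta(days=8)
print(window == timedelta(days=22))  # True

rows = [{"Date": date(2024, 3, 31)}, {"Date": date(2024, 3, 9)}, {"Date": date(2024, 3, 1)}]
# Drop rows NOT in the last 22 days; assumes the cutoff day itself is kept.
kept = [row for row in rows if row["Date"] >= today - window]
print([row["Date"].isoformat() for row in kept])  # ['2024-03-31', '2024-03-09']
```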