Define
Define
lets you create reusable objects, such as patterns, aggregations, predicates, math expressions, and more. You can then use these objects in other skills, such as Compute
, Keep
, or Drop
.
Format
Define
has a utterance for each available object:
Define a math expression <name> as <math expression>
Define a predicate expression <name> that satisfies <any/all> of <predicates>
Define an aggregate query expression <name> to be <aggregation> (<for each> <predicates> <where> <order> <limit>)
Define an extract expression <name> as the expression <date part> from <datetime column>
Define an aggregate math expression <name> as <math expression>
. Compared to a standard math expression, aggregate math expressions allow you to define a math expression that uses aggregations along with standard math expressions, such assum(column A / column B)
.Define a column group <name> as the columns <columns>
Define an aggregate expression <name> as the expression <aggregation>
Define a predicate expression <name> as the expression <predicates>
Define a predicate expression <name> that satisfies <any/all> of the following conditions <predicates>
Define a column reference <phrase> as the column <reference column>
Define a pattern expression <name> satisfying <any/all> of the conditions <expression>
Define a pattern expression <name> that looks like the regular expression <regex>
Define a pattern expression <name> to match strings that begin with <type> repeated <repeat amount>, <followed by/immediately followed by> <type>, ...
Define a pattern expression <name> to match strings that contain repeated <repeat amount>, <followed by/immediately followed by> <type>, ...
Parameters
The parameters used in Define
include:
name
(required). The name of the object.date part
(required). For extract phrases, this is the part of the date or time that should be extracted, such asday
orhour
. See Extract for more information on the available options.datetime column
(required). For extract phrases, this is the column thedate part
should be extracted from.math expression
(required). A math expression, such as(<column x> * <column y>) / <column z>
.predicates
(required). Operators used to compare two values. Refer to Compute for more information.for each
(optional). The columns to include in the aggregate query expression.where
(optional). A condition that must be met for the row to be included.order
(optional). The order, ascending or descending, the results should be sorted.limit
(optional). Limits the resulting dataset to the specified number of rows.columns
(required). A comma-separated list of columns to include in the column group.phrase
(required). A phrase to use as the name of the object.aggregation
(required). A comma-separated list of calculations. Refer to Compute for more information.expression
(required). An already-defined pattern expression.reference column
(required). The column the phrase should reference.regex
(required). A regular expression used to match values in a dataset.type
(required). The character's type. The options include:a character that is of type digit
. The character is a digit (0-9).a character that is of type letter
. The character is a letter (A-Z, case insensitive).the value <X>
. The character is the specified value.
repeat amount
(required). The number of times the character, letter, or value repeats. The options include:any times
. The character, letter, or value can be repeated any number of times.at least <X> times
. The character, letter, or value needs to be repeated at least this many times.exactly <X> times
. The character, letter, or value needs to be repeated exactly this many times.once or more
. The character, letter, or value must be repeated at least once and can be repeated multiple times.
Output
If the object is successfully defined, a success message is returned in the chat history. Otherwise, an error message is returned.
Examples
Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:
Age
. Their age.Gender
. Their gender.Name
. Their name.PClass
. Their class.Survived
. Whether they survived the disaster.
To define a predicate that returns true when the passenger is an adult, enter Define a predicate expression isAdult as the expression Age is greater than or equal to the value 18
.
To define an aggregation that calculates the average age of the passengers, enter Define an aggregate expression AverageAge as the expression average Age
.
To define a pattern you can use to find the rows of the Name column that contain the phrase "Mrs.," enter Define a pattern expression Married to match strings that contain the value Mrs repeated exactly 1 times, immediately followed by the value . repeated exactly 1 times, followed by a character that is of type letter repeated once or more.
. You could also define this same pattern directly by entering Define a pattern expression Married that looks like the regular expression .*(?:Mrs){1}(?:\.){1}.*[a-zA-Z]+.*
.
To define a math expression that calculates the Age to Fare ratio for each row, enter Define a math expression AgeFareRatio as Age / Fare
.
To define an aggregate math expression that calculates the total Age to Fare ratio for the dataset, enter Define an aggregate math expression AgeFareRatio as sum(Age) / sum(Fare)
.
Uses
With the expressions defined above, we can use them in other utterances. For example:
- To compute the total count of passengers who are adults, enter
Compute the count of records where isAdult
. - To visualize the average age for each passenger class, enter
Visualize AverageAge by Pclass
. - To search the dataset for all married women passengers, enter
Search all columns in the dataset Titanic for values that look like a Married
.