Version: 0.35.7

Define

Define lets you create reusable objects, such as patterns, aggregations, predicates, math expressions, and more. You can then use these objects in other skills, such as Compute, Keep, or Drop.

Format

Define has a different format for each available object:

  • Define a column group <name> as the columns <columns>
  • Define a column reference <phrase> as the column <reference column>
  • Define a math expression <name> as <math expression>
  • Define a predicate expression <name> as the expression <predicate>
  • Define a predicate expression <name> that satisfies (any | all) of the following conditions <predicate>
  • Define an aggregate expression <name> as the expression <aggregation>
  • Define an aggregate math expression <name> as <math expression>. Compared to a standard math expression, aggregate math expressions allow you to define a math expression that uses aggregations along with standard math expressions, such as sum(column A / column B).
  • Define an aggregate query expression <name> to be <aggregation> (for each <column> | where <predicate> | such that <predicate> | sorted in (ascending | descending) order | displaying (bottom | first | last | top) <number of rows>)
  • Define an extract expression <name> as the expression <date part> from <datetime column>

Parameters

The parameters used in Define include:

  • name (required). The name of the object.
  • date part (required). For extract expressions, this is the part of the date or time to extract, such as day or hour. See Extract for more information on the available options.
  • datetime column (required). For extract expressions, this is the column the date part is extracted from.
  • math expression (required). A math expression, such as (<column x> * <column y>) / <column z>.
  • predicate (required). A condition that compares two values, such as Age is greater than or equal to the value 18. Refer to Compute for more information.
  • columns (required). A comma-separated list of columns to include in the column group.
  • phrase (required). A phrase to use as the name of the object.
  • aggregation (required). A comma-separated list of calculations. Refer to Compute for more information.
  • expression (required). An already-defined pattern expression.
  • reference column (required). The column the phrase should reference.
  • number of rows (optional). The number of rows to display.

Output

If the object is successfully defined, a success message is returned in the log. Otherwise, an error message is returned.

Examples

Consider a dataset called “Titanic” that contains information on each passenger, including the following columns:

  • Age. Their age.
  • Fare. The fare they paid.
  • Gender. Their gender.
  • Name. Their name.
  • PClass. Their class.
  • Survived. Whether they survived the disaster.

To define a predicate that returns true when the passenger is an adult, enter Define a predicate expression isAdult as the expression Age is greater than or equal to the value 18.
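For readers who think in code, a predicate behaves roughly like a reusable boolean function that is evaluated for each row. The Python sketch below is only an analogy and not how Define is implemented; the sample rows and the is_adult function are hypothetical.

```python
# Hypothetical sample rows; not taken from the real Titanic dataset.
passengers = [
    {"Name": "Passenger A", "Age": 29},
    {"Name": "Passenger B", "Age": 4},
    {"Name": "Passenger C", "Age": 18},
]


def is_adult(row):
    """Rough analogue of the isAdult predicate: Age >= 18."""
    return row["Age"] >= 18


# Evaluate the predicate once per row.
adult_flags = [is_adult(row) for row in passengers]
print(adult_flags)  # [True, False, True]
```

Note that the comparison is inclusive, so a passenger who is exactly 18 counts as an adult.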

To define an aggregation that calculates the average age of the passengers, enter Define an aggregate expression AverageAge as the expression average Age.

To define a math expression that calculates the Age to Fare ratio for each row, enter Define a math expression AgeFareRatio as Age / Fare.

To define an aggregate math expression that calculates the total Age to Fare ratio for the dataset, enter Define an aggregate math expression AgeFareRatio as sum(Age) / sum(Fare).
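The difference between the two expressions above is easiest to see with numbers. The Python sketch below is an analogy under stated assumptions, not the product's implementation: the two sample rows are hypothetical, and the variable names are illustrative only. It also shows that sum(Age) / sum(Fare) is generally not the same as sum(Age / Fare).

```python
# Hypothetical sample rows; values chosen so the arithmetic is easy to follow.
rows = [
    {"Age": 20, "Fare": 10},
    {"Age": 30, "Fare": 5},
]

# Analogue of the math expression Age / Fare: one value per row.
row_ratios = [r["Age"] / r["Fare"] for r in rows]

# Analogue of the aggregate math expression sum(Age) / sum(Fare):
# a single value for the whole dataset.
total_ratio = sum(r["Age"] for r in rows) / sum(r["Fare"] for r in rows)

# Analogue of sum(Age / Fare): the sum of the per-row ratios.
sum_of_ratios = sum(row_ratios)

print(row_ratios)     # [2.0, 6.0]
print(total_ratio)    # 50 / 15, roughly 3.33
print(sum_of_ratios)  # 8.0
```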

Uses

You can use the expressions defined above in other steps. For example:

  • To compute the total count of passengers who are adults, enter Compute the count of records where isAdult.
  • To visualize the average age for each passenger class, enter Visualize AverageAge by PClass.
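As a rough mental model, the two steps above correspond to a filtered count and a grouped average. The Python sketch below illustrates that reading only; the sample rows, the is_adult function, and the variable names are hypothetical, and the grouped averages stand in for the values a chart would plot.

```python
from collections import defaultdict

# Hypothetical sample rows; not taken from the real Titanic dataset.
passengers = [
    {"Age": 30, "PClass": "1st"},
    {"Age": 10, "PClass": "1st"},
    {"Age": 40, "PClass": "3rd"},
]


def is_adult(row):
    """Rough analogue of the isAdult predicate: Age >= 18."""
    return row["Age"] >= 18


# Analogue of "Compute the count of records where isAdult".
adult_count = sum(1 for row in passengers if is_adult(row))

# Analogue of "AverageAge by PClass": group ages by class, then average each group.
ages_by_class = defaultdict(list)
for row in passengers:
    ages_by_class[row["PClass"]].append(row["Age"])
average_age_by_class = {
    pclass: sum(ages) / len(ages) for pclass, ages in ages_by_class.items()
}

print(adult_count)           # 2
print(average_age_by_class)  # {'1st': 20.0, '3rd': 40.0}
```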