Skip to main content
Version: 0.21.2

Create

Create lets you add a standard, window, temporal, or time series column to your dataset, create alerts that can help you monitor your data, APIs to host your data, charts, and models; and create datasets from scratch.

When creating a column, the new column is prepended to the front (the left side) of your dataset and can be populated in several ways, including concatenating the values of two other columns, performing a calculation, or simply filling the column with text. When you create a new column, its type is inferred by the data it’s populated with.

There are five distinct column types:

  • A standard column simply contains text or a numeric value. You can also use a case statement to conditionally set the value of the column.
  • A window column runs calculations based on a “window” or “subset” of the data in the dataset. Refer to the Window Column section below for more information.
  • A datetime column contains date or time values. Unlike temporal columns, you can only add or remove amounts of time (such as days, hours, or minutes) from the values in an existing date or time column.
  • A temporal column contains date or time values and allows you to use math expressions with time values, such as calculating the number of days between dates or adding minutes to a time value. See the Temporal Column section below for more information.
  • A time series column contains date or time values that allows you to create a column where each row is an increasing sequence of time based on a set interval (such as hours or days).

When creating an alert, you are automatically subscribed to the newly-created alert.

Format

Create has several utterance variations depending on what object you're creating.

Standard Columns

  • Create a new column <column name> as <expression>
  • Create a new column <column name> with text <text>
  • Create a new column <column name> by concatenating <column names or symbols>
  • Create a new column <column name> using the delimiter <delimiter> to concatenate <expression> <columns>
  • Create a new column <column name> as an increasing sequence
  • Create a new column <column name> setting it as <column value> when <expression>, as <default value> otherwise. This utterance uses a case statement.

Window Columns

Window columns use a single utterance: Create a new window column <column name> as <aggregation> (for each <grouping columns> | for the entire table) (ordered in no particular way | sorted by the columns <sort columns>) computed over the <row range/value range> that looks back from the current position <look back> and looks forward <look forward>

Datetime Columns

Datetime columns use a single utterance: Create a new datetime column <column name> as <datetime column> <plus/minus> <time interval>.

Temporal Columns

Temporal columns use a single utterance: Create a new temporal column <column name> as <temporal expression>

Time Series Columns

Time series columns use a single utterance: Create a time series column called <column name> starting on <start date> at <start time> and incrementing by <X time interval>

Alerts

Alerts use a single utterance: Create a new alert <alert name> when <condition 1> <operator> <condition 2> on <days> at <time> <time zone> with the message <message>

APIs

APIs use a single utterance: Create a new API <api name>

Datasets

Datasets use a single utterance: Create a new dataset <dataset name> with columns <column names> and values <values>

Parameters

You can use the following parameters with Create.

Standard Column

The following parameters are available. Note that only one can be used for a given column:

  • column name. The name of the new column.
  • column value. For case statements, this is the value to use in the new column when the expression is satisfied. Refer to the list of possible aggregations for more information.
  • default value. For case statements, this is the default value to use in the new column when the expression is not satisfied.
  • expression. An expression is typically a conditional predicate or a mathematical expression. For case statements, the available expressions depend on the type of the column used in the expression. Refer to the parameters for the Compute skill for more information. Note that math expressions are not supported in case statements.
  • columns. The name of an existing column(s) to concatenate with.
  • text. A free-text string.
  • delimiter. A character or string used to delimit the values of two or more concatenated columns.

Window Column

The available parameters include:

  • aggregation (required):
    • lag. Returns the value of the current column a given number of rows behind the current row. If there is no row behind the current row, this parameter returns NULL.
    • lead. Returns the value of the current column a given number of rows after the current row. If there is no row after the current row, this parameter returns NULL.
    • ntile. Similar to the Bin skill, the ntile function divides the data based on what you’re sorting it by into the specified number of groups. For example, to create quartiles, you would use Create a new window column <column name> as ntile 4 <over> <partition> <order>
    • first. Returns the first value of the specified column in the given window.
    • last. Returns the last value of the specified column in the given window.
    • nth Value. Returns the value of the specified column at the given row.
    • rank/row number. Returns the position of this row within the window. For example, if three rows in the window have values 1, 1, and 2, rank number returns 1, 1, 3 and row number returns 1, 2, 3, respectively.
    • Any of the Compute functions, with the exception of median and distinct count.
  • row range/value range (optional). The window to use, such as a range of values.
  • grouping columns (optional). The columns to split the data by.
  • sort columns (optional). The columns to sort the data on before aggregating the data.
  • order (optional). If you're sorting by specific columns, this is the order (ascending or descending) to sort by.
  • look back (required). How far back to look in the data.
  • look forward (required). How forward to look in the data.

Datetime Column

The available parameters include:

  • datetime column (required). The name of an existing date, time, or datetime column in your dataset.
  • plus/minus (required). Whether to add or subtract time from the datetime column.
  • time interval (required). The amount of time to add or subtract. The options include:
    • seconds
    • minutes
    • hours
    • days
    • months
    • years

Temporal Column

The available parameters include:

  • temporal expression (required). A temporal expression is a math expression that works primarily with date and time values. The options include:
    • A column in your dataset.
    • Date(YYYY-MM-DD).
    • Now. The current date and time.
    • Timestamp(YYYY-MM-DD HH:MM:SS). Use the given date and time as a timestamp. Note that the time uses a 24-hour clock.
    • Today. Today's date.
    • Tomorrow. Tomorrow's date.
    • Yesterday. Yesterday's date.

Time Series Column

The available parameters include:

  • start date (required). The starting date and time. This parameter uses the format MM-DD-YY.
  • start time (optional). This parameter uses the format HH:MM and uses either a 12-hour or 24-hour clock.
  • column name (required). The name of the column.
  • X time interval (required). The amount of time each row should increase by. The options include:
    • seconds
    • minutes
    • hours
    • days
    • months
    • years

Alerts

The available parameters include:

  • alert Name (required). The name of the alert.
  • condition 1 (required). A reference to a dataset using the format <dataset name>.<”any” or row index>.<column name>. For example, if you wanted to use the first row of the Age column in a dataset called Titanic, enter Titanic.1.Age. If you wanted to use any row in the Age column, enter Titanic.any.Age.
  • operator (required). The comparison operator to use between the Condition 1 and Condition 2 parameters.
  • condition 2 (required). Another reference to a dataset or a numeric value.
  • days (required). The days on which the alert should check the conditions.
  • time (required). The time at which the alert should check the conditions on the given days. Note that this parameter uses a 24 hour (HH:MM) format.
  • time zone (required). The time zone to use with the time parameter. The options include:
    • CST. Central Standard Time.
    • EST. Eastern Standard Time.
    • MST. Mountain Standard Time.
    • PST. Pacific Standard Time.
  • message (optional). The message that should be sent to the recipient. This can be a combination of a string and a dataset reference. For example, if you wanted to include the age of the first passenger in the Titanic dataset, enter "The age of the first passenger is [Titanic.1.Age]."

APIs

The only available parameter is api name. This is the name to give the API.

Datasets

The available parameters include:

  • dataset name (required). The name of the new dataset.
  • column names (required). A comma-separated list of column names for the new dataset.
  • values (required). A comma-separated list of values to fill the columns. Note that these values populate the columns from left to right and wrap around. For example, if you have two columns, A and B, entering 1, 2, 3, 4 here puts 1 in column A, 2 in column B, 3 in column A (in a new row), and 4 in column B (in the new row).

Output

If a new column is successfully created, it's added to the dataset and the updated dataset becomes [Dataset] v2 or the next incremental version value.

If an alert is successfully created, a success message is returned in the chat history and a text message is sent to confirm that the alert has been created.

If a dataset is successfully created, a success message is returned in the chat history and your new dataset appears in the display panel. Otherwise, an error message is returned.

Examples

Standard Column

In a dataset with a totalPoints column and a numGames column, you could add the averagePoints column to the dataset using the utterance Create a new column averagePoints as totalPoints / numGames.

To concatenate the columns Age and Gender from the dataset Titanic, enter Create a new column GenderAge using the delimeter , to concatenate the column Gender, the column Age.

To concatenate the text "$" with the column Fare from the dataset Titanic, enter Create a new column Cost using the delimeter SPACE to concatenate the text DOLLAR SIGN, the column Fare.

Window Column

In a COVID-19 dataset for each state where each row contains the number of cases reported in a certain state on a certain day, you could create a window column that contains the cases from the previous day using the utterance Create a new window column casesDayBefore as lag Cases by 1 for each State sorted by Date.

Datetime Column

To create a date called followUpDate which is one year after the value in the salesDate column, enter Create a new datetime column called followUpDate as salesDate plus 1 year.

Temporal Column.

To create a column called twoDaysAgo, enter Create a new temporal column called twoDaysAgo as Today - 2 days.

To create a column called daysSinceChristmas2020, enter Create a new temporal column called daysSinceChristmas2020 as Today - Date(2020-12-25).

To create a column called 6Days, enter Create a new temporal column called 6Days as 3 * 2 days.

To create a column called HoursSinceDaylightSavings, enter Create a new temporal column called DaylightSavingsTimeStart as Now - Timestamp(2020-03-14 02:00:00).

Alerts

In a dataset called TicketSales that contains a Price column, you could create an alert to notify you when a high-priced ticket is purchased using the utterance Create a new alert highTicketAlert when TicketSales.any.Price is greater than or equal to the value 50 on all days at 8:00 CST with the message A high-priced ticket has been purchased!.

Datasets

To create a dataset called "Employees," with the columns "Name" and "Age," and add Jack, 21; and Jill, 23; enter Create a new dataset Employees with columns Name, Age and values Jack, 21, Jill, 23.

Behind the Scenes: Window Columns

You can think of a dataset as a single spreadsheet with rows and columns, such as an Excel file. Creating a standard column is as simple as adding a column to that spreadsheet and populating it. Creating a window column, however, is a bit more complicated. Take the utterance above, Create a new window column casesDaysBefore as lag Cases by 1 for each State sorted by Date, as an example. What’s really happening here?

First, we take the data and split it based on the column in the for each clause. In this case, we’re interested in the State column. In our spreadsheet example, this would be the same as breaking our single spreadsheet into new spreadsheets that contain every row for each state.

Second, in each newly-created spreadsheet, we sort the data based on the column in the sort by clause. In this case, we sort the data by the Date column.

Third, we create the new window column and populate it row by row based on the provided column name and aggregation. In this case, the column name is casesDaysBefore and the aggregation is to lag cases by 1. To do this, we take each spreadsheet (which is now sorted chronologically) and compute the difference in cases from the day before by subtracting the value in the current row of the Cases column from the row above it. The resulting value is then placed in the casesDayBefore column.

Then, each state’s spreadsheet is combined together to recreate the original dataset. But, the resulting dataset has the new casesDayBefore column and is sorted chronologically.