Create
Create
lets you add a standard, window, temporal, or time series column to your dataset, create alerts that can help you monitor your data, APIs to host your data, charts, and models; and create datasets from scratch.
When creating a column, the new column is prepended to the front (the left side) of your dataset and can be populated in several ways, including concatenating the values of two other columns, performing a calculation, or simply filling the column with text. When you create a new column, its type is inferred by the data it’s populated with.
There are five distinct column types:
- A standard column simply contains text or a numeric value. You can also use a case statement to conditionally set the value of the column.
- A window column runs calculations based on a “window” or “subset” of the data in the dataset. Refer to the Window Column section below for more information.
- A datetime column contains date or time values. Unlike temporal columns, you can only add or remove amounts of time (such as days, hours, or minutes) from the values in an existing date or time column.
- A temporal column contains date or time values and allows you to use math expressions with time values, such as calculating the number of days between dates or adding minutes to a time value. See the Temporal Column section below for more information.
- A time series column contains date or time values that allows you to create a column where each row is an increasing sequence of time based on a set interval (such as hours or days).
When creating an alert, you are automatically subscribed to the newly-created alert.
Format
Create
has several utterance variations depending on what object you're creating.
Standard Columns
Create a new column <column name> as <expression>
Create a new column <column name> with text <text>
Create a new column <column name> by concatenating <column names or symbols>
Create a new column <column name> using the delimiter <delimiter> to concatenate <expression> <columns>
Create a new column <column name> as an increasing sequence
Create a new column <column name> setting it as <column value> when <expression>, as <default value> otherwise
. This utterance uses a case statement.
Window Columns
Window columns use a single utterance: Create a new window column <column name> as <aggregation> (for each <grouping columns> | for the entire table) (ordered in no particular way | sorted by the columns <sort columns>) computed over the <row range/value range> that looks back from the current position <look back> and looks forward <look forward>
Datetime Columns
Datetime columns use a single utterance: Create a new datetime column <column name> as <datetime column> <plus/minus> <time interval>
.
Temporal Columns
Temporal columns use a single utterance: Create a new temporal column <column name> as <temporal expression>
Time Series Columns
Time series columns use a single utterance: Create a time series column called <column name> starting on <start date> at <start time> and incrementing by <X time interval>
Alerts
Alerts use a single utterance: Create a new alert <alert name> when <condition 1> <operator> <condition 2> on <days> at <time> <time zone> with the message <message>
APIs
APIs use a single utterance: Create a new API <api name>
Datasets
Datasets use a single utterance: Create a new dataset <dataset name> with columns <column names> and values <values>
Parameters
You can use the following parameters with Create
.
Standard Column
The following parameters are available. Note that only one can be used for a given column:
column name
. The name of the new column.column value
. For case statements, this is the value to use in the new column when the expression is satisfied. Refer to the list of possible aggregations for more information.default value
. For case statements, this is the default value to use in the new column when the expression is not satisfied.expression
. An expression is typically a conditional predicate or a mathematical expression. For case statements, the available expressions depend on the type of the column used in the expression. Refer to the parameters for the Compute skill for more information. Note that math expressions are not supported in case statements.columns
. The name of an existing column(s) to concatenate with.text
. A free-text string.delimiter
. A character or string used to delimit the values of two or more concatenated columns.
Window Column
The available parameters include:
aggregation
(required):lag
. Returns the value of the current column a given number of rows behind the current row. If there is no row behind the current row, this parameter returns NULL.lead
. Returns the value of the current column a given number of rows after the current row. If there is no row after the current row, this parameter returns NULL.percentile with X groups
. Similar to the Bin skill, this aggregation divides the data based on what you’re sorting it by into the specified number of groups. For example, to create quartiles, you would useCreate a new window column <column name> as percentile with 4 groups <over> <partition> <order>
the first <column> value
. Returns the first value of the specified column in the given window.the last <column> value
. Returns the last value of the specified column in the given window.The X value at postion Y
. Returns the value of the specified column at the given row.rank/row number
. Returns the position of this row within the window. For example, if three rows in the window have values 1, 1, and 2,rank number
returns 1, 1, 3 androw number
returns 1, 2, 3, respectively.- Any of the
Compute
functions, with the exception ofmedian
anddistinct count
.
row range/value range
(optional). The window to use, such as a range of values.grouping columns
(optional). The columns to split the data by.sort columns
(optional). The columns to sort the data on before aggregating the data.order
(optional). If you're sorting by specific columns, this is the order (ascending or descending) to sort by.look back
(required). How far back to look in the data.look forward
(required). How forward to look in the data.
Datetime Column
The available parameters include:
datetime column
(required). The name of an existing date, time, or datetime column in your dataset.plus/minus
(required). Whether to add or subtract time from thedatetime column
.time interval
(required). The amount of time to add or subtract. The options include:- seconds
- minutes
- hours
- days
- months
- years
Temporal Column
The available parameters include:
temporal expression
(required). A temporal expression is a math expression that works primarily with date and time values. The options include:- A column in your dataset.
- Date(YYYY-MM-DD).
- Now. The current date and time.
- Timestamp(YYYY-MM-DD HH:MM:SS). Use the given date and time as a timestamp. Note that the time uses a 24-hour clock.
- Today. Today's date.
- Tomorrow. Tomorrow's date.
- Yesterday. Yesterday's date.
Time Series Column
The available parameters include:
start date
(required). The starting date and time. This parameter uses the formatMM-DD-YY
.start time
(optional). This parameter uses the formatHH:MM
and uses either a 12-hour or 24-hour clock.column name
(required). The name of the column.X time interval
(required). The amount of time each row should increase by. The options include:- seconds
- minutes
- hours
- days
- months
- years
Alerts
The available parameters include:
alert Name
(required). The name of the alert.condition 1
(required). A reference to a dataset using the format<dataset name>.<”any” or row index>.<column name>
. For example, if you wanted to use the first row of the Age column in a dataset called Titanic, enterTitanic.1.Age
. If you wanted to use any row in the Age column, enterTitanic.any.Age
.operator
(required). The comparison operator to use between the Condition 1 and Condition 2 parameters.condition 2
(required). Another reference to a dataset or a numeric value.days
(required). The days on which the alert should check the conditions.time
(required). The time at which the alert should check the conditions on the given days. Note that this parameter uses a 24 hour (HH:MM) format.time zone
(required). The time zone to use with thetime
parameter. The options include:CST
. Central Standard Time.EST
. Eastern Standard Time.MST
. Mountain Standard Time.PST
. Pacific Standard Time.
message
(optional). The message that should be sent to the recipient. This can be a combination of a string and a dataset reference. For example, if you wanted to include the age of the first passenger in the Titanic dataset, enter "The age of the first passenger is [Titanic.1.Age]."
APIs
The only available parameter is api name
. This is the name to give the API.
Datasets
The available parameters include:
dataset name
(required). The name of the new dataset.column names
(required). A comma-separated list of column names for the new dataset.values
(required). A comma-separated list of values to fill the columns. Note that these values populate the columns from left to right and wrap around. For example, if you have two columns, A and B, entering1, 2, 3, 4
here puts 1 in column A, 2 in column B, 3 in column A (in a new row), and 4 in column B (in the new row).
Output
If a new column is successfully created, it's added to the dataset and the updated dataset becomes [Dataset] v2 or the next incremental version value.
If an alert is successfully created, a success message is returned in the chat history and a text message is sent to confirm that the alert has been created.
If a dataset is successfully created, a success message is returned in the chat history and your new dataset appears in the display panel. Otherwise, an error message is returned.
Examples
Standard Column
In a dataset with a totalPoints
column and a numGames
column, you could add the averagePoints
column to the dataset using the utterance Create a new column averagePoints as totalPoints / numGames
.
To concatenate the columns Age and Gender from the dataset Titanic, enter Create a new column GenderAge using the delimeter , to concatenate the column Gender, the column Age
.
To concatenate the text "$" with the column Fare from the dataset Titanic, enter Create a new column Cost using the delimeter SPACE to concatenate the text DOLLAR SIGN, the column Fare
.
Window Column
In a COVID-19 dataset for each state where each row contains the number of cases reported in a certain state on a certain day, you could create a window column that contains the cases from the previous day using the utterance Create a new window column casesDayBefore as lag Cases by 1 for each State sorted by Date
.
Datetime Column
To create a date called followUpDate
which is one year after the value in the salesDate
column, enter Create a new datetime column called followUpDate as salesDate plus 1 year
.
Temporal Column.
To create a column called twoDaysAgo
, enter Create a new temporal column called twoDaysAgo as Today - 2 days
.
To create a column called daysSinceChristmas2020
, enter Create a new temporal column called daysSinceChristmas2020 as Today - Date(2020-12-25)
.
To create a column called 6Days
, enter Create a new temporal column called 6Days as 3 * 2 days
.
To create a column called HoursSinceDaylightSavings
, enter Create a new temporal column called DaylightSavingsTimeStart as Now - Timestamp(2020-03-14 02:00:00)
.
Alerts
In a dataset called TicketSales that contains a Price
column, you could create an alert to notify you when a high-priced ticket is purchased using the utterance Create a new alert highTicketAlert when TicketSales.any.Price is greater than or equal to the value 50 on all days at 8:00 CST with the message A high-priced ticket has been purchased!
.
Datasets
To create a dataset called "Employees," with the columns "Name" and "Age," and add Jack, 21; and Jill, 23; enter Create a new dataset Employees with columns Name, Age and values Jack, 21, Jill, 23
.
Behind the Scenes: Window Columns
You can think of a dataset as a single spreadsheet with rows and columns, such as an Excel file. Creating a standard column is as simple as adding a column to that spreadsheet and populating it. Creating a window column, however, is a bit more complicated. Take the utterance above, Create a new window column casesDaysBefore as lag Cases by 1 for each State sorted by Date
, as an example. What’s really happening here?
First, we take the data and split it based on the column in the for each
clause. In this case, we’re interested in the State
column. In our spreadsheet example, this would be the same as breaking our single spreadsheet into new spreadsheets that contain every row for each state.
Second, in each newly-created spreadsheet, we sort the data based on the column in the sort by
clause. In this case, we sort the data by the Date
column.
Third, we create the new window column and populate it row by row based on the provided column name and aggregation. In this case, the column name is casesDaysBefore
and the aggregation is to lag cases by 1
. To do this, we take each spreadsheet (which is now sorted chronologically) and compute the difference in cases from the day before by subtracting the value in the current row of the Cases
column from the row above it. The resulting value is then placed in the casesDayBefore
column.
Then, each state’s spreadsheet is combined together to recreate the original dataset. But, the resulting dataset has the new casesDayBefore
column and is sorted chronologically.