Create
Create
lets you add a standard, window, temporal, or time series column to your dataset, APIs to host your data, and create datasets from scratch.
When creating a column, the new column is prepended to the front (the left side) of your dataset and can be populated in several ways, including concatenating the values of two other columns, performing a calculation, or simply filling the column with text. When you create a new column, its type is inferred by the data it’s populated with.
There are five distinct column types:
- A standard column simply contains text or a numeric value. You can also use a case statement to conditionally set the value of the column.
- A window column runs calculations based on a “window” or “subset” of the data in the dataset. Refer to the Window Column section below for more information.
- A datetime column contains date or time values. Unlike temporal columns, you can only add or remove amounts of time (such as days, hours, or minutes) from the values in an existing date or time column.
- A temporal column contains date or time values and allows you to use math expressions with time values, such as calculating the number of days between dates or adding minutes to a time value. See the Temporal Column section below for more information.
- A time series column contains date or time values that allows you to create a column where each row is an increasing sequence of time based on a set interval (such as hours or days).
Format
Create
has several formats depending on what object you're creating.
Standard Columns
Create a new column <column name> as <expression>
Create a new column <column name> with text <text>
Create a new column <column name> by concatenating the (column <column> | text <string>) using the delimeter <delimiter>
Create a new column <column name> setting it as <column> when <predicate>, as <column> otherwise
. This format uses a case statement.
Window Columns
Window columns use a single format: Create a new window column <column name> as <aggregation> (for each <grouping columns> | for the entire table) (ordered in no particular way | sorted by the columns <sort columns>) computed over the (row range | value range) that looks back from the current position (by <look back> | to the beginning) and looks forward (by <look forward> | to the end)
Datetime Columns
Datetime columns use a single format: Create a new datetime column <column name> as <datetime column> <plus/minus> <time interval>
.
Temporal Columns
Temporal columns use a single format: Create a new temporal column <column name> as <temporal expression>
APIs
APIs use a single format: Create a new API <api name>
Datasets
Datasets use a single format: Create a new dataset <dataset name> with columns <column names> and values <values>
Parameters
You can use the following parameters with Create
.
Standard Column
The following parameters are available. Note that only one can be used for a given column:
column name
. The name of the new column.predicate
. For case statements, this is the value to use in the new column when the expression is satisfied. Refer to the list of possible predicates for more information.expression
. An expression is typically a conditional predicate or a mathematical expression. For case statements, the available expressions depend on the type of the column used in the expression. Refer to the parameters for theCompute
skill for more information. Note that math expressions are not supported in case statements.column
. The name of an existing column(s) to concatenate with.string
. A free-text string.delimiter
. A character or string used to delimit the values of two or more concatenated columns.
Window Column
The available parameters include:
aggregation
(required):lag
. Returns the value of the current column a given number of rows behind the current row. If there is no row behind the current row, this parameter returns NULL.lead
. Returns the value of the current column a given number of rows after the current row. If there is no row after the current row, this parameter returns NULL.percentile with X groups
. Similar to the Bin skill, this aggregation divides the data based on what you’re sorting it by into the specified number of groups. For example, to create quartiles, you would useCreate a new window column <column name> as percentile with 4 groups <over> <partition> <order>
the first <column> value
. Returns the first value of the specified column in the given window.the last <column> value
. Returns the last value of the specified column in the given window.The X value at postion Y
. Returns the value of the specified column at the given row.rank/row number
. Returns the position of this row within the window. For example, if three rows in the window have values 1, 1, and 2,rank number
returns 1, 1, 3 androw number
returns 1, 2, 3, respectively.- Any of the
Compute
functions, with the exception ofmedian
anddistinct count
.
row range/value range
(optional). The window to use, such as a range of values.grouping columns
(optional). The columns to split the data by.sort columns
(optional). The columns to sort the data on before aggregating the data.look back
(required). How far back to look in the data.look forward
(required). How forward to look in the data.
Datetime Column
The available parameters include:
datetime column
(required). The name of an existing date, time, or datetime column in your dataset.plus/minus
(required). Whether to add or subtract time from thedatetime column
.time interval
(required). The amount of time to add or subtract. The options include:- seconds
- minutes
- hours
- days
- months
- years
Temporal Column
The available parameters include:
temporal expression
(required). A temporal expression is a math expression that works primarily with date and time values. The options include:- A column in your dataset.
- Date(YYYY-MM-DD).
- Now. The current date and time.
- Timestamp(YYYY-MM-DD HH:MM:SS). Use the given date and time as a timestamp. Note that the time uses a 24-hour clock.
- Today. Today's date.
- Tomorrow. Tomorrow's date.
- Yesterday. Yesterday's date.
Time Series Column
The available parameters include:
start date
(required). The starting date and time. This parameter uses the formatMM-DD-YY
.start time
(optional). This parameter uses the formatHH:MM
and uses either a 12-hour or 24-hour clock.column name
(required). The name of the column.X time interval
(required). The amount of time each row should increase by. The options include:- seconds
- minutes
- hours
- days
- months
- years
APIs
The only available parameter is api name
. This is the name to give the API.
Datasets
The available parameters include:
dataset name
(required). The name of the new dataset.column names
(required). A comma-separated list of column names for the new dataset.values
(required). A comma-separated list of values to fill the columns. Note that these values populate the columns from left to right and wrap around. For example, if you have two columns, A and B, entering1, 2, 3, 4
here puts 1 in column A, 2 in column B, 3 in column A (in a new row), and 4 in column B (in the new row).
Output
If a new column is successfully created, it's added to the dataset and the updated dataset becomes [Dataset] v2 or the next incremental version value.
If a dataset is successfully created, a success message is returned in the conversation history and your new dataset appears in the Data tab. Otherwise, an error message is returned.
Examples
Standard Column
In a dataset with a totalPoints
column and a numGames
column, you could add the averagePoints
column to the dataset using the skill pattern Create a new column averagePoints as totalPoints / numGames
.
To concatenate the columns Age and Gender from the "Titanic Dataset", enter Create a new column GenderAge using the delimeter , to concatenate the column Gender, the column Age
.
To concatenate the text "$" with the column Fare from the dataset Titanic, enter Create a new column Cost using the delimeter SPACE to concatenate the text DOLLAR SIGN, the column Fare
.
Window Column
In a COVID-19 dataset for each state where each row contains the number of cases reported in a certain state on a certain day, you could create a window column that contains the cases from the previous day using the skill pattern Create a new window column casesDayBefore as lag Cases by 1 for each State sorted by Date
.
Datetime Column
To create a date called followUpDate
which is one year after the value in the salesDate
column, enter Create a new datetime column called followUpDate as salesDate plus 1 year
.
Temporal Column
To create a column called twoDaysAgo
, enter Create a new temporal column called twoDaysAgo as Today - 2 days
.
To create a column called daysSinceChristmas2020
, enter Create a new temporal column called daysSinceChristmas2020 as Today - Date(2020-12-25)
.
To create a column called 6Days
, enter Create a new temporal column called 6Days as 3 * 2 days
.
To create a column called HoursSinceDaylightSavings
, enter Create a new temporal column called DaylightSavingsTimeStart as Now - Timestamp(2020-03-14 02:00:00)
.
Datasets
To create a dataset called "Employees," with the columns "Name" and "Age," and add Jack, 21; and Jill, 23; enter Create a new dataset Employees with columns Name, Age and values Jack, 21, Jill, 23
.
Behind the Scenes: Window Columns
A dataset contains rows and columns. Creating a standard column is as simple as adding a column to that dataset and populating it. Creating a window column, however, is a bit more complicated. Take the step above, Create a new window column casesDaysBefore as lag Cases by 1 for each State sorted by Date
, as an example. What’s really happening here?
First, we take the data and split it based on the column in the for each
clause. In this case, we’re interested in the State
column. In our dataset example, this would be the same as breaking our single dataset into new datasets that contain every row for each state.
Second, in each newly-created dataset, we sort the data based on the column in the sort by
clause. In this case, we sort the data by the Date
column.
Third, we create the new window column and populate it row by row based on the provided column name and aggregation. In this case, the column name is casesDaysBefore
and the aggregation is to lag cases by 1
. To do this, we take each dataset (which is now sorted chronologically) and compute the difference in cases from the day before by subtracting the value in the current row of the Cases
column from the row above it. The resulting value is then placed in the casesDayBefore
column.
Then, each state’s dataset is combined together to recreate the original dataset. But, the resulting dataset has the new casesDayBefore
column and is sorted chronologically.