Concatenate
Concatenate lets you combine two similar datasets into a single dataset. You can also choose whether to keep or discard any duplicate rows that are found. In SQL terms, Concatenate performs a union of the two datasets (UNION ALL when duplicates are kept, UNION when they are removed).
The two datasets must contain an equal number of columns, with the same column names. The order of the columns can differ. If two columns of the same name have different types, they will be cast to the most specific shared type. The columns are automatically ordered according to the column order of the dataset listed first in the recorded step.
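The column rules above can be illustrated with a minimal Python sketch. It is not the Concatenate implementation; the function name `align_and_concatenate` and the (columns, rows) layout are assumptions chosen for illustration. It only checks that the column names match and reorders the second dataset to follow the first; type casting is left out.

```python
def align_and_concatenate(first, second):
    """Append the rows of `second` to `first`, using the column order of `first`.

    Each dataset is a (columns, rows) pair: a list of column names and a
    list of row tuples. This layout is an assumption made for illustration.
    """
    first_cols, first_rows = first
    second_cols, second_rows = second

    # Both datasets need the same column names; the order may differ.
    if sorted(first_cols) != sorted(second_cols):
        raise ValueError("datasets must have the same column names")

    # For each column of the first dataset, find its position in the second.
    positions = [second_cols.index(name) for name in first_cols]
    reordered = [tuple(row[i] for i in positions) for row in second_rows]
    return list(first_cols), list(first_rows) + reordered


sales2020 = (["Salesperson", "Volume(units)", "GrossSales"],
             [("Tony", 1000, "$200,000")])
sales2021 = (["Salesperson", "GrossSales", "Volume(units)"],
             [("Sam", "$240,000", 1200)])

columns, rows = align_and_concatenate(sales2020, sales2021)
print(columns)
print(rows)
```

The combined rows follow the column order of the first dataset, even though the second dataset lists "GrossSales" before "Volume(units)".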
Format
Concatenate uses the following formats:
Concatenate the datasets <dataset1>, <version> and <dataset2>, <version>
Concatenate the datasets <dataset1> and <dataset2> (keep | remove) all duplicates
Parameters
Concatenate uses the following parameters:
- dataset1 (required). The first dataset to concatenate with the second dataset.
- dataset2 (required). The second dataset to concatenate with the first dataset.
- version (optional). The specified version of the dataset to use.
- keep/remove all duplicates (optional). Whether to keep all duplicate values in the final dataset or remove them from it. If unspecified, Concatenate keeps all duplicates by default.
Output
If the concatenation is successful, the combined dataset becomes [dataset1]_Concatenate.
Examples
Given:
- a dataset "Sales2020" with a "SaleDate" column of Integer values
- a dataset "Sales2021" with a "SaleDate" column of Date values
Concatenate the datasets Sales2020 and Sales2021
generates a new dataset called "Sales2020_Concatenate", with a "SaleDate" column of String values, because String is the most specific type that Integer and Date share. The columns are automatically ordered according to the column order in "Sales2020".
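One way to picture "most specific shared type" is as a walk up a type hierarchy until both types meet. The sketch below is only an illustration: the `PARENT` hierarchy (including the "Float" type) and the sample values are assumptions, not the product's actual type system. The only fact taken from this page is that Integer and Date resolve to a string type.

```python
from datetime import date

# Hypothetical type hierarchy: each type points to the next more general one.
PARENT = {"Integer": "Float", "Float": "String", "Date": "String", "String": None}


def shared_type(left, right):
    """Return the most specific type that both `left` and `right` can be cast to."""
    # Collect `left` and every more general type above it.
    ancestors = []
    current = left
    while current is not None:
        ancestors.append(current)
        current = PARENT[current]

    # Walk up from `right` until reaching a type that `left` can also become.
    current = right
    while current not in ancestors:
        current = PARENT[current]
    return current


target = shared_type("Integer", "Date")

# Made-up sample values: one Integer-style SaleDate and one Date-style SaleDate.
sale_dates = [20200115, date(2021, 1, 15)]
if target == "String":
    sale_dates = [str(value) for value in sale_dates]

print(target, sale_dates)
```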
Consider the following subsets of the "Sales2020" and "Sales2021" datasets:
Sales2020
Salesperson | Volume(units) | GrossSales |
---|---|---|
Tony | 1000 | $200,000 |
Raj | 1500 | $300,000 |
Gary | 1250 | $250,000 |
Emilee | 1375 | $275,000 |
Sarah | 1400 | $280,000 |
Sales2021
Salesperson | GrossSales | Volume(units) |
---|---|---|
Tony | $180,000 | 900 |
Raj | $300,000 | 1500 |
Sam | $240,000 | 1200 |
Danny | $200,000 | 1000 |
Mariah | $244,000 | 1220 |
To combine these two datasets into one (and keep the duplicates), enter:
Concatenate the datasets Sales2020 and Sales2021 keep all duplicates
The resulting dataset contains two rows each for Tony and Raj; Raj's two rows are exact duplicates:
Sales2020_Concatenate
Salesperson | Volume(units) | GrossSales |
---|---|---|
Tony | 1000 | $200,000 |
Raj | 1500 | $300,000 |
Gary | 1250 | $250,000 |
Emilee | 1375 | $275,000 |
Sarah | 1400 | $280,000 |
Tony | 900 | $180,000 |
Raj | 1500 | $300,000 |
Sam | 1200 | $240,000 |
Danny | 1000 | $200,000 |
Mariah | 1220 | $244,000 |
To keep only the unique rows, enter:
Concatenate the datasets Sales2020 and Sales2021 remove all duplicates
The resulting dataset of unique rows:
Sales2020_Concatenate
Salesperson | Volume(units) | GrossSales |
---|---|---|
Tony | 1000 | $200,000 |
Raj | 1500 | $300,000 |
Gary | 1250 | $250,000 |
Emilee | 1375 | $275,000 |
Sarah | 1400 | $280,000 |
Tony | 900 | $180,000 |
Sam | 1200 | $240,000 |
Danny | 1000 | $200,000 |
Mariah | 1220 | $244,000 |
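Because Concatenate is described in terms of a SQL union, the keep and remove behaviors can be compared with UNION ALL and UNION in plain SQL. The sketch below uses Python's built-in sqlite3 module with the sample rows from this page. The table and column names are assumptions, and SQLite's behavior is shown as an analogy, not as a statement of how Concatenate is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales2020 (salesperson TEXT, volume INTEGER, gross_sales TEXT)")
conn.execute("CREATE TABLE sales2021 (salesperson TEXT, gross_sales TEXT, volume INTEGER)")

conn.executemany("INSERT INTO sales2020 VALUES (?, ?, ?)", [
    ("Tony", 1000, "$200,000"),
    ("Raj", 1500, "$300,000"),
    ("Gary", 1250, "$250,000"),
    ("Emilee", 1375, "$275,000"),
    ("Sarah", 1400, "$280,000"),
])
conn.executemany("INSERT INTO sales2021 VALUES (?, ?, ?)", [
    ("Tony", "$180,000", 900),
    ("Raj", "$300,000", 1500),
    ("Sam", "$240,000", 1200),
    ("Danny", "$200,000", 1000),
    ("Mariah", "$244,000", 1220),
])

# Columns are named explicitly so both queries follow the first table's order.
select_2020 = "SELECT salesperson, volume, gross_sales FROM sales2020"
select_2021 = "SELECT salesperson, volume, gross_sales FROM sales2021"

# UNION ALL keeps every row; UNION removes rows that are identical in every column.
keep_all = conn.execute(f"{select_2020} UNION ALL {select_2021}").fetchall()
unique_rows = conn.execute(f"{select_2020} UNION {select_2021}").fetchall()

print(len(keep_all), "rows with duplicates kept")
print(len(unique_rows), "rows with duplicates removed")
conn.close()
```

UNION compares whole rows, so Raj's two identical rows collapse into one, while Tony's two rows, which differ in volume and gross sales, both remain.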