Version: 0.22.2

Concatenate

Concatenate lets you combine two similar datasets into a single dataset. You can also choose whether to keep or discard any duplicate values that are found. In SQL terms, Concatenate performs a set union between the two datasets.

The two datasets must contain an equal number of columns, with the same column names. The order of the columns can differ. If two columns of the same name have different types, they will be cast to the most specific shared type. The columns are automatically ordered according to the column order of the dataset listed first in the DataChat sentence.

Note

DataChat column names are case-insensitive.
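The column rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not DataChat's implementation; the helper name `align_to_first` is hypothetical.

```python
def align_to_first(first_columns, second_columns):
    """For each column of the first dataset, return the position of the
    matching column in the second dataset (names compared case-insensitively)."""
    if len(first_columns) != len(second_columns):
        raise ValueError("datasets must have the same number of columns")
    # Map lower-cased names in the second dataset to their positions.
    positions = {name.lower(): i for i, name in enumerate(second_columns)}
    if len(positions) != len(second_columns):
        raise ValueError("second dataset has duplicate column names")
    try:
        return [positions[name.lower()] for name in first_columns]
    except KeyError as missing:
        raise ValueError(f"column {missing} not found in second dataset") from None


# The second dataset lists the same columns in a different order and case.
order = align_to_first(
    ["Salesperson", "Volume(units)", "GrossSales"],
    ["salesperson", "GrossSales", "Volume(units)"],
)
print(order)  # [0, 2, 1]
```

Reading the second dataset's values in the returned order yields rows laid out like the first dataset's.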

Format

Concatenate uses the following utterances:

  • Concatenate the datasets <dataset1> and <dataset2>
  • Concatenate the datasets <dataset1> and <dataset2> <keep/remove all duplicates>

Parameters

Concatenate uses the following parameters:

  • dataset1 (required). The first dataset to concatenate with the second dataset.
  • dataset2 (required). The second dataset to concatenate with the first dataset.
  • keep/remove all duplicates (optional). Whether you want to keep all duplicate values in the final dataset or remove them from it. If unspecified, Concatenate keeps all duplicates by default.
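As a rough SQL analogy (an assumption for illustration, not a statement of how DataChat executes the utterance), keeping all duplicates behaves like `UNION ALL` and removing them behaves like `UNION`. The sketch below uses Python's built-in `sqlite3` module with two small made-up tables, `a` and `b`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT, units INTEGER);
    CREATE TABLE b (name TEXT, units INTEGER);
    INSERT INTO a VALUES ('Tony', 1000), ('Raj', 1500);
    INSERT INTO b VALUES ('Raj', 1500), ('Sam', 1200);
""")

# Analogous to "keep all duplicates" (the default): every row from both tables.
keep_all = conn.execute(
    "SELECT name, units FROM a UNION ALL SELECT name, units FROM b"
).fetchall()

# Analogous to "remove all duplicates": identical rows appear only once.
deduped = conn.execute(
    "SELECT name, units FROM a UNION SELECT name, units FROM b"
).fetchall()

print(len(keep_all))  # 4
print(len(deduped))   # 3
```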

Output

If the concatenation is successful, the combined dataset is named <dataset1>_Concatenate.

Examples

Given:

  • a dataset "Sales2020" with a "SaleDate" column of Integer values
  • a dataset "Sales2021" with a "SaleDate" column of Date values

Concatenate the datasets Sales2020 and Sales2021

generates a new dataset called "Sales2020_Concatenate", with a "SaleDate" column of String values. The columns are automatically ordered according to the column order in "Sales2020".
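A minimal sketch of the cast in this example, in Python. The sample values and the `common_type` helper are hypothetical, and the helper simply falls back to strings whenever the two types differ; DataChat's actual rules for choosing the "most specific shared type" may be richer than this.

```python
from datetime import date

# Hypothetical sample values for the two "SaleDate" columns.
sale_dates_2020 = [20200115, 20200302]                  # Integer column
sale_dates_2021 = [date(2021, 1, 15), date(2021, 3, 2)]  # Date column


def common_type(type_a, type_b):
    """Keep the type if both columns agree; otherwise fall back to str."""
    return type_a if type_a is type_b else str


target = common_type(int, date)
combined = [target(value) for value in sale_dates_2020 + sale_dates_2021]
print(combined)  # ['20200115', '20200302', '2021-01-15', '2021-03-02']
```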


Consider the following subsets of the "Sales2020" and "Sales2021" datasets:

Sales2020

Salesperson  Volume(units)  GrossSales
Tony         1000           $200,000
Raj          1500           $300,000
Gary         1250           $250,000
Emilee       1375           $275,000
Sarah        1400           $280,000

Sales2021

Salesperson  GrossSales  Volume(units)
Tony         $180,000    900
Raj          $300,000    1500
Sam          $240,000    1200
Danny        $200,000    1000
Mariah       $244,000    1220

To combine these two datasets into one (and keep the duplicates), enter:

Concatenate the datasets Sales2020 and Sales2021 keep all duplicates

The resulting dataset contains two rows each for Tony and Raj. Raj's two rows are identical (true duplicates), while Tony's differ in volume and gross sales:

Sales2020_Concatenate

Salesperson  Volume(units)  GrossSales
Tony         1000           $200,000
Raj          1500           $300,000
Gary         1250           $250,000
Emilee       1375           $275,000
Sarah        1400           $280,000
Tony         900            $180,000
Raj          1500           $300,000
Sam          1200           $240,000
Danny        1000           $200,000
Mariah       1220           $244,000

To keep only the unique rows, enter:

Concatenate the datasets Sales2020 and Sales2021 remove all duplicates

The resulting dataset of unique rows. Only the repeated Raj row is dropped; Tony's two rows differ, so both remain:

Sales2020_Concatenate

Salesperson  Volume(units)  GrossSales
Tony         1000           $200,000
Raj          1500           $300,000
Gary         1250           $250,000
Emilee       1375           $275,000
Sarah        1400           $280,000
Tony         900            $180,000
Sam          1200           $240,000
Danny        1000           $200,000
Mariah       1220           $244,000
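
The same example can be approximated in plain Python. This is a sketch under two assumptions: rows are compared as whole rows when duplicates are removed, and the first occurrence of a duplicated row is the one kept. The `concatenate` function is a hypothetical stand-in, not DataChat's API.

```python
columns_2020 = ["Salesperson", "Volume(units)", "GrossSales"]
rows_2020 = [
    ("Tony", 1000, "$200,000"),
    ("Raj", 1500, "$300,000"),
    ("Gary", 1250, "$250,000"),
    ("Emilee", 1375, "$275,000"),
    ("Sarah", 1400, "$280,000"),
]
columns_2021 = ["Salesperson", "GrossSales", "Volume(units)"]
rows_2021 = [
    ("Tony", "$180,000", 900),
    ("Raj", "$300,000", 1500),
    ("Sam", "$240,000", 1200),
    ("Danny", "$200,000", 1000),
    ("Mariah", "$244,000", 1220),
]


def concatenate(cols_a, rows_a, cols_b, rows_b, remove_duplicates=False):
    # Reorder each row of the second dataset into the first dataset's column order.
    index = [cols_b.index(col) for col in cols_a]
    combined = list(rows_a) + [tuple(row[i] for i in index) for row in rows_b]
    if remove_duplicates:
        # dict.fromkeys keeps the first occurrence of each row, in order.
        combined = list(dict.fromkeys(combined))
    return combined


kept = concatenate(columns_2020, rows_2020, columns_2021, rows_2021)
unique = concatenate(columns_2020, rows_2020, columns_2021, rows_2021,
                     remove_duplicates=True)
print(len(kept), len(unique))  # 10 9
```

Only the second Raj row matches an earlier row in every column, so it is the only row dropped when duplicates are removed.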