
Combine

Combining data helps to unify your data from multiple sources into a single table. You can combine your data in two ways:

  • Use Join to combine the columns of one dataset with those of another dataset.
  • Use Concatenate to combine the rows of two or more similar datasets that have the same columns (name, number, and type) into a single dataset.


Note

When a skill is applied to a dataset:

  • If the skill creates a new dataset, the new dataset is named using the convention [dataset]_[Skill].
  • If the skill alters your existing dataset, the result is saved to a new version named using the convention [dataset] v[x].

Join

Use the Join skill to create a new dataset that combines two datasets with different columns. Optionally, you can control how rows are matched between the two datasets:

  • Designate existing columns to match between the datasets.
  • Create shared columns to match between the datasets.
  • If there are no shared columns, compute a Cartesian product of both datasets.
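
The three options above boil down to deciding which columns rows are matched on. The following Python sketch models that decision under stated assumptions: designated columns win if given, otherwise columns with the same name in both datasets are used, and an empty result means a Cartesian product. The function `choose_join_columns` and the representation of a dataset as a list of column names are hypothetical illustrations, not part of DataChat.

```python
from typing import List, Optional


def choose_join_columns(left_columns: List[str],
                        right_columns: List[str],
                        designated: Optional[List[str]] = None) -> List[str]:
    """Return the columns to match on; an empty list means a Cartesian product.

    Hypothetical model of the join options described above, not DataChat's code.
    """
    if designated:
        # Assumption: designated columns must exist (by name) in both datasets.
        missing = [c for c in designated
                   if c not in left_columns or c not in right_columns]
        if missing:
            raise ValueError(f"columns not found in both datasets: {missing}")
        return list(designated)
    # Otherwise fall back to the columns both datasets share by name,
    # in the order they appear in the first dataset.
    return [c for c in left_columns if c in right_columns]


print(choose_join_columns(["id", "name", "office"], ["office", "city"]))  # ['office']
print(choose_join_columns(["name"], ["city"]))  # [] -> Cartesian product
```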

To join two datasets together:

  1. Click Combine > Join in the skill menu.
  2. Select the dataset you want to join with your current dataset.
  3. Optionally, specify the columns to use as a join key between the datasets. A join key is used to match values between the two datasets.
  4. Click Submit.


Note

Joining datasets with shared columns:

Ensure that the columns you intend to match have the same names and types in both datasets. The initial dataset is extended by a left join with the second dataset: for each row of the initial dataset, if its values in the matched columns equal those of a row in the second dataset, the row is extended with the columns from the second dataset, populated with the values from that matching row.
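
The left-join behavior described above can be sketched in plain Python, with each dataset represented as a list of dictionaries (one per row). This follows standard left-join semantics and is not DataChat's implementation. Two behaviors are assumptions not stated in this guide: an initial row with no match is kept with empty (`None`) values in the extended columns, and an initial row that matches several rows is repeated once per match. The function `left_join` and the sample data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def left_join(left: List[Row], right: List[Row], keys: List[str]) -> List[Row]:
    """Standard left join of two lists of rows on the given key columns.

    Assumes the datasets share no column names other than the keys.
    """
    # Index the second dataset by its key values for fast lookup.
    index: Dict[tuple, List[Row]] = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    # Columns the second dataset contributes (everything except the keys).
    extra = [c for c in (right[0] if right else {}) if c not in keys]

    result: List[Row] = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys))
        if matches:
            # Extend the row once per matching row in the second dataset.
            for match in matches:
                result.append({**row, **{c: match[c] for c in extra}})
        else:
            # Assumption: unmatched rows are kept with empty (None) values.
            result.append({**row, **{c: None for c in extra}})
    return result


people = [{"name": "Ana", "office": "NYC"}, {"name": "Bo", "office": "SF"}]
offices = [{"office": "NYC", "city": "New York"}]
joined = left_join(people, offices, ["office"])
print(joined)
# [{'name': 'Ana', 'office': 'NYC', 'city': 'New York'},
#  {'name': 'Bo', 'office': 'SF', 'city': None}]
```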

Ensure that columns you don't intend to match have different names in the two datasets. For example, if datasets called "Person" and "Office" each have an "id" column that refers to different things, rename them to "Person_id" and "Office_id" so that they are not matched and the full content of each "id" column is retained.
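
A small Python sketch of why the renaming matters, assuming (as this note implies) that columns with the same name in both datasets are treated as match columns. The helpers `shared_columns` and `rename`, and the extra columns "name", "office", and "city", are hypothetical and only illustrate the "Person"/"Office" example.

```python
from typing import Dict, List


def shared_columns(a: List[str], b: List[str]) -> List[str]:
    """Columns present in both datasets, i.e. the ones that would be matched."""
    return [c for c in a if c in b]


def rename(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return the column names with any names found in `mapping` replaced."""
    return [mapping.get(c, c) for c in columns]


person = ["id", "name", "office"]
office = ["id", "office", "city"]
print(shared_columns(person, office))  # ['id', 'office']: unrelated ids would be matched

person = rename(person, {"id": "Person_id"})
office = rename(office, {"id": "Office_id"})
print(shared_columns(person, office))  # ['office']
```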

Joining datasets without shared columns:

If you join one dataset with another without a shared column or a designated primary/foreign key relationship, the result is a Cartesian product: every row of the first dataset is paired with every row of the second, so joining datasets of m and n rows yields m × n rows. This operation is computationally expensive and time-consuming, and it can produce a very large dataset. Because this is an unusual request, DataChat asks whether you want to continue and computes the Cartesian product if you click "Yes".
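
The following Python sketch shows how a Cartesian product pairs rows and why its size grows so quickly. It uses the standard library's `itertools.product`; the `cartesian_product` helper and the sample datasets are hypothetical illustrations, not DataChat's implementation.

```python
from itertools import product
from typing import Dict, List

Row = Dict[str, object]


def cartesian_product(left: List[Row], right: List[Row]) -> List[Row]:
    """Pair every row of `left` with every row of `right`."""
    return [{**a, **b} for a, b in product(left, right)]


sizes = [{"size": s} for s in ("S", "M", "L")]
colors = [{"color": c} for c in ("red", "blue")]

combined = cartesian_product(sizes, colors)
print(len(combined))  # 6, i.e. 3 * 2
print(combined[0])    # {'size': 'S', 'color': 'red'}

# The row count grows multiplicatively: two 10,000-row datasets
# would produce 100,000,000 rows.
print(10_000 * 10_000)  # 100000000
```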

Concatenate

Use the Concatenate skill to create a new dataset that combines the rows of datasets with the same columns, including name, number, and type.
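
The "same columns" requirement can be expressed as a schema comparison. This Python sketch represents each dataset's schema as a list of (column name, column type) pairs and checks that the names, count, and types agree. The function `can_concatenate` and the type labels "string", "float", and "integer" are hypothetical, and the sketch assumes column order does not matter.

```python
from typing import List, Tuple

Schema = List[Tuple[str, str]]  # (column name, column type) pairs


def can_concatenate(schemas: List[Schema]) -> bool:
    """True if all schemas have the same column names, count, and types."""
    if len(schemas) < 2:
        return False  # Concatenate needs at least two datasets.
    # Sorting the pairs makes the comparison ignore column order (an assumption).
    first = sorted(schemas[0])
    return all(sorted(s) == first for s in schemas[1:])


q1 = [("region", "string"), ("revenue", "float")]
q2 = [("revenue", "float"), ("region", "string")]
q3 = [("region", "string"), ("revenue", "integer")]
print(can_concatenate([q1, q2]))  # True
print(can_concatenate([q1, q3]))  # False: "revenue" has a different type
```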

To concatenate two datasets together:

  1. Click Combine > Concatenate in the skill menu.
  2. Select the datasets to concatenate. You must select at least two datasets, and they must already be loaded into your session.
  3. Choose how to handle duplicates. By default, Concatenate keeps all duplicates.
  4. Click Submit.
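
The steps above can be modeled in Python as stacking rows and optionally removing repeats. This is a sketch, not DataChat's implementation: the `keep_duplicates` flag is a hypothetical stand-in for the duplicate-handling choice in step 3, and it assumes that dropping duplicates means keeping the first occurrence of each identical row.

```python
from typing import Dict, List

Row = Dict[str, object]


def concatenate(datasets: List[List[Row]], keep_duplicates: bool = True) -> List[Row]:
    """Stack the rows of several datasets, optionally dropping exact duplicate rows."""
    if len(datasets) < 2:
        raise ValueError("select at least two datasets")
    rows = [row for dataset in datasets for row in dataset]
    if keep_duplicates:
        return rows  # Default: keep every row, including repeats.
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))  # Hashable form of the row's contents.
        if key not in seen:
            seen.add(key)
            unique.append(row)  # Keep the first occurrence only.
    return unique


jan = [{"region": "East", "sales": 10}, {"region": "West", "sales": 7}]
feb = [{"region": "East", "sales": 10}, {"region": "North", "sales": 4}]
print(len(concatenate([jan, feb])))                         # 4
print(len(concatenate([jan, feb], keep_duplicates=False)))  # 3
```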
