Skip to main content

Use Rename, Keep, and Join

In this example, we'll explore two datasets, learning how to keep or drop specific columns and extend datasets together to test our hypothesis that a country's overall happiness is affected more by alcohol consumption than by GPD per capita.

Load Data

To get started, click on the links below to download the 2018 World Happiness Report dataset and the 2018 Alcohol Consumption dataset.

Now we can Load the datasets into the session. We now see two datasets that look like this:

Alcohol Consumption dataset

2018 Alcohol Consumption

This dataset contains the following columns:

  • Entity. The country.
  • Code. A code given for the country name.
  • Year. The year of the study.
  • TotalAlcoholConsumptionPerCapita. The total alcohol consumption per capita.

World Happiness dataset

2018 Happiness Report

This dataset contains the following columns:

  • OverallRank. The rank of a country's happiness compared to others.
  • CountryOrRegion. The country or region.
  • Score. The overall happiness score.
  • GDPPerCapita. The GDP per capita.
  • SocialSupport. How much support the country provides.
  • HealthyLifeExpectancy. The average life expectancy.
  • FreedomToMakeLifeChoices. How much freedom one has to make life choices.
  • Generosity. How generous their fellow citizens are.
  • PerceptionsOfCorruption. How corrupt the country appears to be.

Rename Datasets

The names of our datasets contain a lot of random characters and are not very clear. Before moving forward, let's rename them to "Alcohol_Consumption" and "Happiness2018". To rename the datasets:

  1. Click the More menu of the dataset name from the list of datasets on the left.
  2. Click Duplicate.
  3. Enter the new name for the dataset and click Submit.
  4. Delete the old version.

renamed datasets

Clean Our Data

To be able to analyze how alcohol consumption affects a country's overall happiness, we must first clean and wrangle our data to make sure it's ready to use with machine learning.

2018 World Happiness Data

We can now begin cleaning the Happiness2018 dataset. Currently, our dataset has information that we don't really need in order to analyze the impact alcohol consumption has on overall happiness. Let's drop the unneeded columns:

  1. Click the More menu of the "OverallRank" column.
  2. Click Drop.

Now our dataset no longer has the OverallRank column:

dropped column

Alcohol Consumption Data

We can now begin cleaning our other dataset. Lets make sure we're working with the right one. Click the dataset name, "Alcohol-Consumption" to make it the current dataset.

Like our Happiness2018 dataset, this dataset has information that we don't really need in order to analyze the impact alcohol consumption has on overall happiness. Since we only want data from 2018, let's use the Keep Rows skill to keep only the rows we need:

  1. Click Wrangle > Keep Rows in the skill menu.
  2. Enter "Year" for the column, "is equal to" for the expression, "the value" for the value type, and "2018" for the value.
  3. Click Submit.

the keep form

Now we see a dataset with reports from only the year 2018.

Keep Rows

From here, we can also Drop columns we don't really need to analyze, including the columns Code and Year. To drop columns:

  1. Click Wrangle > Drop Columns in the skill menu.
  2. Enter "Code" and "Year" for the columns.
  3. Click Submit.

drop form

Our dataset now looks something like this:

Dropped Columns

From here, we can use the Rename skill again to match the column names for the countries between the datasets. This will make it easier to join the datasets later:

  1. Double-click on the column name "Entity".
  2. Enter "CountryOrRegion"
  3. Press Enter.

rename column

Join Our Data

From here, we can now combine the our two datasets using the Join skill. This way, all of the relevant information can be found in one dataset.

Let's join the cleaned 2018 World Happiness Dataset with our cleaned 2018 Alcohol Consumption dataset:

  1. Click the "Happiness2018" dataset to make it the current dataset.
  2. Click Combine > Join in the skill menu.
  3. Enter "Alcohol_Consumption" for the dataset to join with.
  4. Click Submit.

extend form

The new dataset should look something like this:

Extend Dataset

As you can see, TotalAlcoholConsumptionPerCapita has been added to the Happiness2018 dataset, and is now called Happiness2018_Extend.