
Add Weighting to Classification Models

Load Data

To begin, download the Students Adaptability Level in Online Education dataset to your machine. Note that this downloads as a .zip file. Open the .zip file and load the .csv file.

We're then given a dataset that looks something like this:

loaded data
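If you'd like to inspect the file outside the tool, a minimal pandas sketch for extracting and loading it might look like the following; the archive and CSV file names used here are assumptions, so adjust them to match your download.

```python
import zipfile

import pandas as pd

# Extract the downloaded archive (file names below are assumptions).
with zipfile.ZipFile("students_adaptability_level_online_education.zip") as archive:
    archive.extractall("data")

# Load the extracted CSV and preview the first few rows.
df = pd.read_csv("data/students_adaptability_level_online_education.csv")
print(df.head())
```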

Our dataset contains the following columns:

  • Gender. Gender of the student.
  • Age. Age range of the student.
  • Education_Level. Education institution level.
  • Institution_Type. Type of educational institution.
  • IT_Student. Whether the student is studying IT.
  • Location. Whether or not the student is in-town.
  • Load-shedding. Level of load shedding.
  • Financial_Condition. Financial condition of the student's family.
  • Internet_Type. Internet type used on the student's device.
  • Network_Type. Network connectivity type.
  • Class_Duration. Daily class duration.
  • Self_Lms. Whether or not the institution has its own learning management system.
  • Device. Device mostly used in class.
  • Adaptivity_Level. Adaptability level of the student.

Rename Column

Before jumping into machine learning, let's first rename the "Load-shedding" column to instead use an underscore. Simply double-click the column name and enter "Load_shedding".

rename column
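The same rename, expressed in pandas and continuing from the loading sketch above, is a one-liner:

```python
# Rename the "Load-shedding" column to use an underscore.
df = df.rename(columns={"Load-shedding": "Load_shedding"})
```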

Train a Model on the Target Column

With our column cleaned, let's now use some machine learning techniques to reveal patterns. The Train Model skill trains a set of machine learning models, selects the one that best predicts the target column, and shows how the other columns in our dataset impact that target:

  1. Click Machine Learning > Train Model in the skill menu.

  2. Select "Adaptivity_Level" for the target column.

  3. Click Submit.

    train form

    This uses default optimizations for our model. Train Model might take some time to train various models on the data to select the best one.
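For intuition only, here is a rough sketch of what automated training and model selection can look like in code. This is not the tool's actual implementation; it reuses the `df` DataFrame from the loading sketch and assumes the target column is named "Adaptivity_Level".

```python
import pandas as pd
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# One-hot encode the categorical feature columns; keep the target as-is.
features = pd.get_dummies(df.drop(columns=["Adaptivity_Level"]))
target = df["Adaptivity_Level"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Train a few candidate models and keep the one with the best accuracy.
candidates = {
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
print(scores, "best model:", best_name)
```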

Once Train Model is complete, an impact chart appears in the Chart tab, showing how each column in the dataset impacts the target column.

impact chart

Hover over each bar in the chart to view more details. We can see that "Class_Duration" and "Financial_Condition" have the most impact on "Adaptivity_Level".
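The impact chart itself is generated by the tool and may use a more sophisticated attribution method, but a rough analogue using the selected model's built-in feature importances might look like this, continuing from the training sketch:

```python
import pandas as pd

# Rank the encoded features by the best model's built-in importance scores.
best_model = candidates[best_name]
importances = pd.Series(best_model.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False).head(10))
```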

From here, we can click Visualize in the chart header. This provides a dropdown of different auto-generated charts that visualize the most impactful columns. By default, Visualize opens with Chart1A, in this case a stacked bar chart displaying the adaptability level for different class durations.

visualize section

We can see that institutions with class durations less than an hour have significantly higher instances of low adaptability levels.
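A comparable stacked bar chart can be reproduced directly from the data with pandas and matplotlib; a sketch, assuming the column names described above:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Count adaptability levels within each class-duration bucket and stack the bars.
counts = pd.crosstab(df["Class_Duration"], df["Adaptivity_Level"])
counts.plot(kind="bar", stacked=True)
plt.ylabel("Number of students")
plt.tight_layout()
plt.show()
```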

Let's investigate the models used in training. Click on Models to learn more about the model, which is called "BestFit1".

models section

This shows us that two model types were used in training: a LightGBM Classifier with an accuracy score of 91% and a CatBoost Classifier with an accuracy score of 82%.

Tune the Model with Target Weighting

Earlier, we noted the inherent imbalance in the dataset from our initial Visualize chart. To compensate for the imbalance, Train Model automatically uses industry-standard imbalance-handling mechanisms, such as oversampling and automatic label weighting, to focus on high adaptability levels.
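For reference, the general idea behind these mechanisms looks roughly like this in code; this is only a sketch of common techniques, not what Train Model does internally, and it continues from the training sketch above:

```python
from lightgbm import LGBMClassifier

# Automatic label weighting: weight each class inversely to its frequency.
balanced_model = LGBMClassifier(class_weight="balanced")
balanced_model.fit(X_train, y_train)

# Oversampling is another option, e.g. with the imbalanced-learn package:
#   from imblearn.over_sampling import RandomOverSampler
#   X_res, y_res = RandomOverSampler(random_state=42).fit_resample(X_train, y_train)
```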

Let's tune our model to a specific set of adaptivity levels. This will give us a model that determines which factors are important only for "Low" and "High" adaptivity levels. We'll assign weights to these labels:

  1. In the Data tab, click Machine Learning > Train Model in the skill menu.
  2. Select "Adaptivity_Level" for the target column.
  3. Click Advanced Options.
  4. Click Fix Class Imbalances to open a dropdown, then switch the Custom Label Weighting toggle "On".
  5. Enter "High" for the first label and "100" for its weight, "Low" for the second label and "100" for its weight, and "Moderate" for the third label and "10" for its weight (see the sketch after these steps).
  6. Click Submit.
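In code terms, custom label weighting amounts to passing an explicit weight per class. A sketch with LightGBM using the weights from step 5 above (again, not the tool's internals):

```python
from lightgbm import LGBMClassifier

# Emphasize "High" and "Low" and de-emphasize "Moderate" during training.
custom_weights = {"High": 100, "Low": 100, "Moderate": 10}

weighted_model = LGBMClassifier(class_weight=custom_weights)
weighted_model.fit(X_train, y_train)
```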

train form

impact chart

We can see that the impact has changed significantly. Although "Class_Duration" is still the most impactful, "Age" and "Institution_Type" now have much more impact. "Gender" has also increased in overall impact on "High" and "Low" instances of online adaptability.

Let's click Confusion Matrix from the chart header to visualize how the model's predictions compare to the actual values.

confusion matrix

The confusion matrix shows that the predicted values of "Adaptivity_Level" closely match the actual values in our data.
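A similar confusion matrix can be drawn with scikit-learn for the weighted model from the sketch above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Compare predicted and actual labels on the held-out split.
ConfusionMatrixDisplay.from_estimator(weighted_model, X_test, y_test)
plt.show()
```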

Let's use this model, BestFit2, to Predict on our dataset. From the Data tab:

  1. Click Machine Learning > Predict.

  2. Select "BestFit2" for the model.

  3. Select "Dataset" for the content.

  4. Click Submit.

    predict form
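In code terms, predicting over the whole dataset with the weighted model might look like this sketch; the output column names mirror those produced by the tool:

```python
import pandas as pd

# Predict every row and keep the actual values alongside the predictions.
predictions = pd.DataFrame({
    "Adaptivity_LevelPredicted": weighted_model.predict(features),
    "Adaptivity_LevelActual": df["Adaptivity_Level"],
})
print(predictions.head())
```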

We're then given a "PredictionsAdaptivity_Level" dataset that looks something like this:

prediction dataset

We can see that two columns have been added to the left side of the dataset, "Adaptivity_LevelPredicted" and "Adaptivity_LevelActual", which show the predicted and actual values.

Plot Prediction Accuracy

Let's visualize the accuracy of our predictions. First, let's create a new column that compares "Adaptivity_LevelPredicted" and "Adaptivity_LevelActual", setting the value to "true" if they match and "false" if they don't:

  1. Click Add Column > Using Conditional Values in the skill menu.
  2. Enter "Comparison" for the column name.
  3. Enter "True" for the value, "Adaptivity_LevelPredicted" for the column, "is equal to" for the expression, "the column" for the value type, and "Adaptivity_LevelActual" for the column.
  4. Click Add Another Option.
  5. Enter "False" for the value, "Adaptivity_LevelPredicted" for the column, "is not equal to" for the expression, "the column" for the value type, and "Adaptivity_LevelActual" for the column.
  6. Enter "False" for the default value.
  7. Click Submit.
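The equivalent comparison in pandas is a single expression over the prediction columns from the sketch above:

```python
# True where the prediction matches the actual value, False otherwise.
predictions["Comparison"] = (
    predictions["Adaptivity_LevelPredicted"] == predictions["Adaptivity_LevelActual"]
)
```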

comparison column

We're then given a dataset with an added "Comparison" column:

comparison dataset

From here, let's plot a donut chart to visualize this comparison:

  1. Click Plot Chart in the skill menu.
  2. Select "Donut Chart" for the type, and "Comparison" for the split.
  3. Click Submit.
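A matching donut chart, and the accuracy figure behind it, can be sketched with matplotlib:

```python
import matplotlib.pyplot as plt

# Share of matching vs. non-matching predictions, drawn as a donut.
counts = predictions["Comparison"].value_counts()
plt.pie(
    counts,
    labels=counts.index.astype(str),
    autopct="%1.2f%%",
    wedgeprops={"width": 0.4},
)
plt.title("Prediction accuracy")
plt.show()
```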

donut chart

The resulting donut chart shows us that our predictions were 87.88% accurate.