Insurance
In this section, we'll investigate health insurance charges using a US Health Insurance dataset.
Question
Analytics is applied to all stages of insurance coverage, including policy creation, risk determination, and fraud investigations. As a customer seeking health insurance, it's important to know how your demographics and health can affect overall health insurance costs. Can we use the Data Assistant to investigate health insurance data and make informed decisions about insurance charges?
Challenges
The insurance industry is exceptionally large and complex. Its customers range from large enterprises to individuals, and it affects millions of people in the United States. Several challenges can make analyzing this data difficult:
- Technical barriers. Advanced data analytics tools, such as machine learning models and data visualizations, might be needed to analyze large amounts of data effectively.
- Data and result interpretation. Data cannot always reveal the human component of information. An individual may report having cancer, but that does not tell us how long they've had it, what type of cancer it is, how it's been treated, and so on.
- Bias and error. Interpretation of results can be influenced by individual experience and perspectives.
Method
Load Data
Let's get an idea of the data we're working with. Upload the US Health Insurance Dataset into your session. Note that this downloads as a .zip folder from Kaggle. Extract the contents of the .zip folder and upload the .csv file into DataChat.
This dataset contains 1338 rows. Each row records the insurance charges for one insured person along with their attributes, in the following columns:
- age. Age of the primary beneficiary.
- sex. Insurance contractor gender (male/female).
- bmi. Body mass index of the primary beneficiary.
- children. Number of children covered by health insurance.
- smoker. Whether the beneficiary is a smoker.
- region. Beneficiary's residential area in the US.
- charges. Individual medical costs billed by health insurance.
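To sanity-check the extracted file outside DataChat, the column list above can be verified with a short script. This is a minimal sketch using only the Python standard library. The function name and the choice of which columns to parse as numbers are assumptions made for illustration, not part of DataChat or the dataset documentation.

```python
import csv

# Column names come from the list above. Treating these four as numeric
# and the rest as text is an assumption of this sketch.
NUMERIC_COLUMNS = {"age": int, "bmi": float, "children": int, "charges": float}
TEXT_COLUMNS = ("sex", "smoker", "region")


def load_insurance_rows(path):
    """Read the CSV at `path` into a list of dicts with typed numeric fields."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        expected = set(NUMERIC_COLUMNS) | set(TEXT_COLUMNS)
        missing = expected - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        rows = []
        for raw in reader:
            row = {name: raw[name].strip() for name in TEXT_COLUMNS}
            for name, cast in NUMERIC_COLUMNS.items():
                # Go through float first so values like "19.0" still parse as int.
                row[name] = cast(float(raw[name]))
            rows.append(row)
    return rows
```

If the extracted file were saved as insurance.csv (a hypothetical file name), `len(load_insurance_rows("insurance.csv"))` should match the 1338 rows stated above.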
Investigate Statistics
Let's say we'd like to better understand the average insurance charges by region. Using the Data Assistant, we can ask "What are the average insurance charges by region?"
From Ava's output we can see that the southwest region has the lowest average insurance charges, $12,346.94, followed by the northwest region, $12,417.58. The most expensive region is the southeast, $14,735.41, which is roughly $2,300 to $2,400 more than either of the two lowest regions.
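The same aggregate can be cross-checked outside DataChat. The sketch below computes the mean of the charges column for each value of the region column, which is the calculation the question describes. It is not a description of how the Data Assistant computes its answer, and the function name is hypothetical.

```python
import csv
from collections import defaultdict


def average_charges_by_region(path):
    """Return {region: mean charges}, ordered from lowest to highest mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            totals[row["region"]] += float(row["charges"])
            counts[row["region"]] += 1
    means = {region: totals[region] / counts[region] for region in totals}
    # Sort by mean so the cheapest region comes first.
    return dict(sorted(means.items(), key=lambda item: item[1]))
```

Run against the uploaded .csv file, the first and last entries of the returned dictionary can be compared with the lowest and highest regions reported in the chart.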
Let's add this chart to the Chart tab for a larger view by clicking Add to Chart tab.
Use Machine Learning
From here, let's say that we'd now like to dive a bit deeper into understanding what contributes to overall insurance charges. Navigate to the Data Assistant tab. We can ask "What factors are most important in predicting insurance charges?"
The Data Assistant's first output is an impact chart, which ranks the columns from most impactful to least impactful. We can see that whether or not the primary beneficiary is a smoker has the most impact on the overall charges made by health insurance companies. This is followed by the beneficiary's age and BMI.
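For a rough, independent look at which columns move with charges, one simple stand-in is to rank columns by the absolute Pearson correlation with the charges column. This is a swapped-in technique for illustration only: the source does not say how the impact chart is computed, and correlation captures only linear association. The function names and the `binary_columns` parameter are hypothetical, and the text label that marks a smoker (for example "yes") is an assumption to check against the file.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(rows, target="charges", binary_columns=None):
    """Rank columns by |Pearson r| with `target`, strongest first.

    `binary_columns` maps a text column to the value encoded as 1,
    e.g. {"smoker": "yes"} (the label "yes" is an assumption).
    Other non-numeric columns are skipped.
    """
    binary_columns = binary_columns or {}
    ys = [float(row[target]) for row in rows]
    scores = {}
    for name in rows[0]:
        if name == target:
            continue
        if name in binary_columns:
            xs = [1.0 if row[name] == binary_columns[name] else 0.0 for row in rows]
        else:
            try:
                xs = [float(row[name]) for row in rows]
            except (TypeError, ValueError):
                continue  # skip text columns with no encoding supplied
        scores[name] = pearson(xs, ys)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Here `rows` is a list of dictionaries, one per CSV row, such as the output of `csv.DictReader`. The ranking may not match the impact chart exactly, since the two measures are different.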
If we scroll to the third chart, "smoker vs. ageInt6", we can see that non-smokers' charges never exceed $41,121.87 regardless of age, while smokers' overall costs continue to increase above $41,121.87 as age increases.
Scrolling to the fourth chart, "charges vs. bmi", we can view another chart that displays the relationship between BMI, charges, and whether or not the beneficiary is a smoker. We can see a trend: as BMI increases, so do charges. More notably, regardless of BMI, smokers consistently pay more than non-smokers.
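The comparison in this chart can be approximated numerically by grouping rows into BMI bands and averaging charges separately for smokers and non-smokers. The sketch below is one way to do that. The band width of 5, the function name, and the "yes" label for smokers are assumptions made for illustration.

```python
from collections import defaultdict


def mean_charges_by_bmi_band(rows, band_width=5.0, smoker_value="yes"):
    """Return {(band_start, is_smoker): mean charges}, ordered by band.

    A BMI of 27.3 with band_width=5.0 falls in the band starting at 25.0.
    `smoker_value` is the text label marking a smoker (assumed to be "yes").
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        band_start = (float(row["bmi"]) // band_width) * band_width
        key = (band_start, row["smoker"] == smoker_value)
        totals[key] += float(row["charges"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}
```

Comparing the `(band, True)` and `(band, False)` entries for the same band shows whether smokers are charged more at a similar BMI.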
Results
Through our analysis, we have identified several factors that most influence the health insurance charges that individuals face, including:
- Which region of the United States the beneficiary lives in.
- Whether the beneficiary is a smoker.
- The beneficiary's age.
- The beneficiary's BMI.
We have observed that the most influential of these factors is whether or not the beneficiary is a smoker. Although age, BMI, and region have some impact on the overall charges an individual faces, smokers are charged more across the board. From this we can conclude that smoking is the most important factor to consider when analyzing insurance charges in this dataset.