Charts and whatnot

Queenie Pamatian
4 min read · Oct 1, 2020

Let me tell you a story about insurance, but first you have to bear with me, as I am not yet proficient with the tool that I will be using to paint you a picture.

I was able to acquire an anonymized dataset about insurance, and I would like to gather some insights from it. I will try to make use of the pandas and matplotlib libraries. Let’s have a look at the data first.

insurance.csv
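A minimal sketch of that first look. The tiny inline sample below is a made-up stand-in for insurance.csv, and the column names age and children are assumptions (only sex, bmi, region and charges are named in this post):

```python
import io

import pandas as pd

# Hypothetical stand-in for insurance.csv; the real file has many more rows.
sample_csv = io.StringIO(
    "age,sex,bmi,children,region,charges\n"
    "19,female,27.9,0,southwest,16884.92\n"
    "18,man,33.77,1,southeast,1725.55\n"
    "28,male,,3,southeast,4449.46\n"
    "33,woman,22.7,0,northwest,\n"
)

# With the real data this would be pd.read_csv("insurance.csv").
df = pd.read_csv(sample_csv)

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)
```

Checking isnull().sum() per column shows where the gaps are before deciding how to handle them.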

Okay, so there are a few null values. I would like to drop them so that they do not skew the statistics.

code
output
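For readers who cannot see the screenshots, dropping the nulls might look roughly like this. The sample values are made up; only the dropna() step is the point:

```python
import numpy as np
import pandas as pd

# Made-up sample with two incomplete rows.
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 33.8, np.nan, 22.7],
    "charges": [16884.92, 1725.55, 4449.46, np.nan],
})

rows_before = len(df)

# Drop every row that has at least one missing value.
df = df.dropna().reset_index(drop=True)

rows_removed = rows_before - len(df)
print(f"Removed {rows_removed} rows with nulls")
```

Comparing the row count before and after is a quick way to confirm how much data was lost.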

All good, only 4 rows were removed. I also noticed that the gender labels are inconsistent: some rows use male and female, while others use man and woman, so I synced them by running this code:

df["sex"] = df["sex"].replace({"man": "male", "woman": "female"})

Same goes for the region.
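A sketch of the same idea for region. The region spellings below are hypothetical, since the variants in the real file may differ; value_counts() is just one way to spot them:

```python
import pandas as pd

# Hypothetical region labels; the variants in the real file may differ.
df = pd.DataFrame(
    {"region": ["southwest", "Southwest", "south west", "northeast"]}
)

# value_counts() makes stray spellings easy to spot.
print(df["region"].value_counts())

# Map each variant to one canonical label (this mapping is an assumption).
region_fixes = {"Southwest": "southwest", "south west": "southwest"}
df["region"] = df["region"].replace(region_fixes)

print(df["region"].unique())
```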

What I want to know is which group pays the most for insurance. I would like to look at it by the following categories: sex, BMI, and number of children.

code: print(df[["sex", "charges"]].groupby("sex").agg(["mean", "max", "min"]))

charges / sex

As you can see, the male population pays more for insurance. One possible reason is that men’s average number of children and BMI are slightly higher as well.

bmi and number of children / sex
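The averages behind a table like this can be computed with one groupby. This is a sketch on made-up numbers, and the column name children is an assumption:

```python
import pandas as pd

# Made-up sample; "children" is an assumed column name.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "bmi": [31.0, 29.0, 30.0, 28.0],
    "children": [1, 2, 1, 0],
})

# Average BMI and average number of children for each sex.
averages = df.groupby("sex")[["bmi", "children"]].mean()
print(averages)
```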

For the number of children, what is very noticeable is that those with the most children pay the least for insurance, whereas the amounts paid by the other groups do not vary much from one group to another.

charges / number of children
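A sketch of the same summary grouped by number of children, again on made-up numbers and with children as an assumed column name. I added count here because a group with very few people can produce an unusual average:

```python
import pandas as pd

# Made-up sample; "children" is an assumed column name.
df = pd.DataFrame({
    "children": [0, 0, 1, 1, 5],
    "charges": [1000.0, 3000.0, 2000.0, 4000.0, 1500.0],
})

# Summary of charges for each number of children, plus the group size.
by_children = df.groupby("children")["charges"].agg(
    ["count", "mean", "max", "min"]
)
print(by_children)
```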

Meanwhile, let’s look closer at the BMI. I created a new column in my data, the BMI status, which tells us whether the person is underweight, normal, overweight or obese.

I used the code below to create the new column:

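import numpy as np

# Conditions are checked in order; each row gets the label of the first match.
rating = [
    df["bmi"] < 18.5,
    (df["bmi"] >= 18.5) & (df["bmi"] < 24.9),
    (df["bmi"] >= 24.9) & (df["bmi"] < 29.9),
    df["bmi"] >= 29.9,
]

label = ["underweight", "normal", "overweight", "obese"]
# A string default avoids mixing the numeric default (0) with string labels.
df["bmi_stats"] = np.select(rating, label, default="unknown")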
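As an alternative sketch (not what I actually ran), pd.cut can express the same cut-offs in one call. With right=False each bin includes its lower edge and excludes its upper edge, which matches the conditions above:

```python
import numpy as np
import pandas as pd

# Made-up BMI values that sit on and around the cut-offs.
df = pd.DataFrame({"bmi": [17.0, 18.5, 24.9, 29.9, 35.0]})

bins = [0, 18.5, 24.9, 29.9, np.inf]
labels = ["underweight", "normal", "overweight", "obese"]

# right=False makes the bins [0, 18.5), [18.5, 24.9), [24.9, 29.9), [29.9, inf).
df["bmi_stats"] = pd.cut(df["bmi"], bins=bins, labels=labels, right=False)
print(df)
```

The result is a categorical column that remembers the order of the labels, which can be handy when sorting or plotting.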

bmi/sex
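A table like this can be sketched with pd.crosstab, and the charges per BMI status with a groupby. The numbers below are made up for illustration:

```python
import pandas as pd

# Made-up sample that already has the bmi_stats column.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "bmi_stats": ["obese", "overweight", "normal", "obese", "normal", "normal"],
    "charges": [9000.0, 5000.0, 3000.0, 8000.0, 2500.0, 3500.0],
})

# How many people of each sex fall into each BMI status.
counts = pd.crosstab(df["bmi_stats"], df["sex"])
print(counts)

# Average charges for each BMI status.
mean_charges = df.groupby("bmi_stats")["charges"].mean()
print(mean_charges)
```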

There are also more males with a BMI status of overweight or obese. As you can also see, obese people pay the most for insurance.

Let’s try to visualize the data.

charges / gender
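A minimal sketch of how a bar chart like this could be drawn with matplotlib. The data is made up, and the output file name charges_by_sex.png is just a placeholder:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "charges": [3000.0, 5000.0, 2000.0, 4000.0],
})

# Average charges for each sex (groupby sorts the labels alphabetically).
mean_by_sex = df.groupby("sex")["charges"].mean()

fig, ax = plt.subplots()
ax.bar(list(mean_by_sex.index), mean_by_sex.values)
ax.set_xlabel("sex")
ax.set_ylabel("average charges")
ax.set_title("Average charges by sex")

# Placeholder file name; in a notebook plt.show() would display it instead.
fig.savefig("charges_by_sex.png")
```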

Here we can see that the average insurance charged to men is a bit higher than what is charged to women.

We would first like to know whether the number of children affects the difference. Keep in mind that the average number of children is 1.12 for males and 1.07 for females. Looking at the chart below, the average charges paid are quite close across the categories, with the exception of those who have 5 children. Hence, the only thing we can tentatively infer here is that, past a certain point, the more children you have, the less willing you are to pay for high insurance.

charges paid according to number of children

Meanwhile, the graph below shows that obese and overweight people pay higher insurance charges.

Charges paid according to BMI
BMI stats / gender
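A grouped bar chart of BMI status by gender could be sketched like this, plotting a crosstab directly with pandas. The data and the file name bmi_stats_by_sex.png are made up for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample that already has the bmi_stats column.
df = pd.DataFrame({
    "sex": ["male", "male", "male", "female", "female", "female"],
    "bmi_stats": ["obese", "overweight", "normal", "obese", "normal", "normal"],
})

# Count people per BMI status and sex, keeping the statuses in a fixed order.
order = ["underweight", "normal", "overweight", "obese"]
counts = pd.crosstab(df["bmi_stats"], df["sex"]).reindex(order, fill_value=0)

# One group of bars per BMI status, one bar per sex.
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("BMI status")
ax.set_ylabel("number of people")
ax.set_title("BMI status by sex")

# Placeholder file name; in a notebook plt.show() would display it instead.
fig.savefig("bmi_stats_by_sex.png")
```

Reindexing with fill_value=0 keeps empty categories (here, underweight) on the chart instead of silently dropping them.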

And since there are more overweight and obese males than females, we might consider BMI status as a factor in the price difference between the two groups.

Feel free to look at my code here: https://github.com/ftwqueenie/pythoncharts
