Questions & Answers

Is there any way to apply different prediction models to a dataframe depending on the value of a column?

Let's say that I have a dataframe and I want to create a model that estimates the blood pressure of patients. After doing an exploratory analysis with a training set, I've decided that the best approach is to create two models, one for each sex.

    #   weight  height  age  Hours_active  sex    blood_pressure
    #1  74      1.68    22   5             Male   0.8
    #2  85      1.83    25   7             Male   0.95
    #3  58      1.58    21   2             Woman  0.97
    #4  80      1.72    20   4             Woman  0.23
    #5  70      1.72    24   1             Woman  0.40

I split the data into males and females:

data_males = data[data['sex'] == "Male"].drop(['sex'], axis=1)
data_females = data[data['sex'] == "Woman"].drop(['sex'], axis=1)

Then I train a model on each subset:

X_train_m = data_males.drop(['blood_pressure'], axis=1)
y_train_m = data_males['blood_pressure']

X_train_f = data_females.drop(['blood_pressure'], axis=1)
y_train_f = data_females['blood_pressure']

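from sklearn.linear_model import LinearRegression, SGDRegressor

lin = LinearRegression()
model_males = lin.fit(X_train_m, y_train_m)

# SGDRegressor is scikit-learn's linear regressor trained with stochastic gradient descent
SGD = SGDRegressor()
model_females = SGD.fit(X_train_f, y_train_f)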

Now suppose I am given a new dataframe just like the one above and asked to predict the blood pressure values. I could split the dataframe by sex, apply each model and then concat the results, but that changes the order of my index.

Is there any way to apply two models on a dataset depending on the value of one column, or even merge two models to create one that considers the sex?

Answers(1) :

One way to apply a function to a dataframe row by row, selectively or otherwise, is DataFrame.apply with a lambda function and axis=1. In your case, let's say your dataframe is called test_df. Assuming you have two functions that take height, weight and age values as parameters and return the blood pressure, e.g. bp_males and bp_females, you would use:

test_df['blood_pressure'] = test_df.apply(lambda x: bp_males(x['height'], x['weight'], x['age']) if x['sex'] == 'Male' else bp_females(x['height'], x['weight'], x['age']), axis=1)
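
If the two models are fitted scikit-learn estimators rather than plain functions, a vectorised alternative is to call predict once per group on a boolean mask and write the results back by index label, so the original row order is left untouched. The following is only a minimal sketch under assumptions: predict_by_sex, FEATURES and the models dict are hypothetical names, the toy rows are copied from the question, and LinearRegression is used for both groups purely for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical feature list; adjust to the columns your models were trained on
FEATURES = ["weight", "height", "age", "Hours_active"]


def predict_by_sex(df, models, features=FEATURES, group_col="sex"):
    """Return predictions as a Series aligned with df.index."""
    preds = pd.Series(index=df.index, dtype=float)
    for label, model in models.items():
        mask = df[group_col] == label
        if mask.any():  # skip groups that do not occur in this frame
            preds.loc[mask] = model.predict(df.loc[mask, features])
    return preds


# Toy training data copied from the question
train = pd.DataFrame({
    "weight": [74, 85, 58, 80, 70],
    "height": [1.68, 1.83, 1.58, 1.72, 1.72],
    "age": [22, 25, 21, 20, 24],
    "Hours_active": [5, 7, 2, 4, 1],
    "sex": ["Male", "Male", "Woman", "Woman", "Woman"],
    "blood_pressure": [0.8, 0.95, 0.97, 0.23, 0.40],
})

# One fitted model per value of the sex column (assumption: same model type for both)
models = {
    label: LinearRegression().fit(group[FEATURES], group["blood_pressure"])
    for label, group in train.groupby("sex")
}

# Predict on a frame without the target; the index stays exactly as it was
test_df = train.drop(columns="blood_pressure")
test_df["blood_pressure"] = predict_by_sex(test_df, models)
```

Rows whose sex value has no entry in models are left as NaN, which makes unexpected categories easy to spot. This is usually faster than a row-wise apply on large frames, because each model predicts on its whole group at once.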