Let's say that I have a dataframe and I want to create a model that estimates the blood pressure of patients. After doing an exploratory analysis with a training set, I've decided that the better approach is to create two models, one for each sex.
data
#    weight  height  age  Hours_active  sex    Blood_Pressure
#1   74      1.68    22   5             Male   0.8
#2   85      1.83    25   7             Male   0.95
#3   58      1.58    21   2             Woman  0.97
#4   80      1.72    20   4             Woman  0.23
#5   70      1.72    24   1             Woman  0.40
I split the data into males and females:
data_males = data[data['sex'] == "Male"].drop(['sex'], axis=1)
data_females = data[data['sex'] == "Woman"].drop(['sex'], axis=1)
Then I fit one model for each subset:
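from sklearn.linear_model import LinearRegression, SGDRegressor

# Features and target for each sex (the target column is 'Blood_Pressure' in the data)
X_train_m = data_males.drop(['Blood_Pressure'], axis=1)
y_train_m = data_males['Blood_Pressure']
X_train_f = data_females.drop(['Blood_Pressure'], axis=1)
y_train_f = data_females['Blood_Pressure']

# Linear regression for males
lin = LinearRegression()
model_males = lin.fit(X_train_m, y_train_m)

# Stochastic gradient descent regressor for females
SGD = SGDRegressor()
model_females = SGD.fit(X_train_f, y_train_f)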
Now let's imagine that I am given a dataframe just like the one above and asked to predict the blood pressure values. I could split the dataframe by sex, apply each model and then concat the results, but that changes the order of my index.
Is there any way to apply two models on a dataset depending on the value of one column, or even merge two models to create one that considers the sex?
One way to apply a function selectively to the rows of a dataframe is to call apply with a lambda function and axis=1, so that the lambda receives one row at a time. In your case, let's say your dataframe is called test_df. Assuming you have two functions that take height, weight and age values as parameters and return the blood pressure, e.g. bp_males and bp_females, you would use:
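# axis=1 passes each row to the lambda, so x['sex'], x['height'], etc. are per-row values
test_df['blood_pressure'] = test_df.apply(
    lambda x: bp_males(x['height'], x['weight'], x['age'])
    if x['sex'] == 'Male'
    else bp_females(x['height'], x['weight'], x['age']),
    axis=1)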
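If what you have are fitted scikit-learn models rather than plain functions, a row-wise apply can be slow. A possible alternative is to predict once per group using a boolean mask and write the results back with .loc, which leaves the original index and row order untouched. The following is only a sketch under some assumptions: the helper predict_by_sex and the FEATURES list are hypothetical names, the data is the five-row sample from the question, and both models are LinearRegression just to keep the example short.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed feature columns, taken from the sample data in the question
FEATURES = ["weight", "height", "age", "Hours_active"]

train = pd.DataFrame({
    "weight": [74, 85, 58, 80, 70],
    "height": [1.68, 1.83, 1.58, 1.72, 1.72],
    "age": [22, 25, 21, 20, 24],
    "Hours_active": [5, 7, 2, 4, 1],
    "sex": ["Male", "Male", "Woman", "Woman", "Woman"],
    "Blood_Pressure": [0.8, 0.95, 0.97, 0.23, 0.40],
})

# One fitted model per value of 'sex' (any regressor with fit/predict would do)
models = {
    sex: LinearRegression().fit(group[FEATURES], group["Blood_Pressure"])
    for sex, group in train.groupby("sex")
}

def predict_by_sex(df, models, features=FEATURES):
    """Hypothetical helper: predict with the model matching each row's sex."""
    # Start with an empty (NaN) series that shares the index of df
    preds = pd.Series(index=df.index, dtype=float)
    for sex, model in models.items():
        mask = df["sex"] == sex
        if mask.any():
            # Predict only for the matching rows and store them in place
            preds.loc[mask] = model.predict(df.loc[mask, features])
    return preds

# Shuffled copy of the data, to show that the index order does not matter
test_df = train.drop(columns="Blood_Pressure").sample(frac=1, random_state=0)
test_df["blood_pressure"] = predict_by_sex(test_df, models)
```

Rows whose sex value has no matching model stay as NaN, which makes unexpected categories easy to spot.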