I have a pandas DataFrame with the following columns:

```
col1 col2 col3
x 12 abc
x 7 abc
x 5 abc
x 3
y 10 abc
y 9 abc
```

I would like to find the rows that have the maximum value of col2 within each col1 group, after filtering out the rows where col3 is null.

The expected output is:

```
col1 col2 col3
x 12 abc
y 10 abc
```

I have tried the below code so far.

```
df[df[['col3']].notnull().all(1) & df.sort_values('col2').drop_duplicates(['col1'], keep='last')]
```

However, I am getting the following error:

```
TypeError: unsupported operand type(s) for &: 'bool' and 'float'
```

Any help is highly appreciated.

Answers(2) :

Another solution:

```
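import pandas as pd

lstColumns = ["col1", "col2", "col3"]
# The fourth row has no col3 value, as in the question
lstValues = [["x", 12, "abc"], ["x", 7, "abc"], ["x", 5, "abc"], ["x", 3, None], ["y", 10, "abc"], ["y", 9, "abc"]]
df = pd.DataFrame(lstValues, columns=lstColumns)
# Drop rows where col3 is null, then sort so the largest col2 comes last within each col1 group
df = df.dropna(subset=["col3"]).sort_values(["col1", "col2"], ascending=[True, True])
# Keep the last row per col1, i.e. the row with the maximum col2
newdf = df.drop_duplicates(subset="col1", keep="last")
print(newdf)
#   col1  col2 col3
# 0    x    12  abc
# 4    y    10  abc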
```

How does the `max` method calculate the result without mentioning the column?

According to the `pd.DataFrame.max` documentation, it returns the maximum values over the selected axis, with the default being axis 0 (index). In other words, the maximum is computed for each column separately.

In your example, only one column (col2) is numeric and all values in col3 are identical. If col3 were also numeric, the `max` method would return the maximum of that column as well, so the resulting DataFrame may contain rows that do not exist in the original one.

That works in this case, but if you want the output DataFrame to contain only rows from the original one, you need to be specific about the column whose maximum you want to consider.

```
df.loc[df.notnull().all(axis=1)].groupby('col1').max().reset_index()
#   col1  col2 col3
# 0    x    12  abc
# 1    y    10  abc
```

Or you can create a boolean Series first and assign it to a name to improve readability:

```
m = df.notnull().all(axis=1)
df.loc[m].groupby('col1').max().reset_index()
```

Let's say now this is your original DataFrame:

```
col1 col2 col3
0 x 12 2.0
1 x 7 20.0
2 x 5 1.0
3 x 3 NaN
4 y 10 4.0
5 y 9 11.0
```

When you apply `max` to this without specifying the column name, it returns the following:

```
col1 col2 col3
0 x 12 20.0
1 y 10 11.0
```
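
If you need every output row to be an actual row of the original DataFrame, one possible approach is to find the row with the largest col2 in each group with `idxmax` and select those rows by label. This is only a minimal sketch using the numeric example data above; the names `filtered`, `idx` and `result` are illustrative and not part of the code shown earlier.

```python
import numpy as np
import pandas as pd

# Same data as the numeric example above
df = pd.DataFrame({
    "col1": ["x", "x", "x", "x", "y", "y"],
    "col2": [12, 7, 5, 3, 10, 9],
    "col3": [2.0, 20.0, 1.0, np.nan, 4.0, 11.0],
})

# Drop the rows where col3 is null
filtered = df.dropna(subset=["col3"])

# For each col1 group, get the index label of the row with the largest col2
idx = filtered.groupby("col1")["col2"].idxmax()

# Select those rows, so col3 comes from the same row as the maximum col2
result = filtered.loc[idx].reset_index(drop=True)
print(result)
```

Here col3 is 2.0 for x and 4.0 for y, i.e. the values from the rows that hold the maximum col2, rather than the per-column maxima 20.0 and 11.0. Note that `idxmax` returns only the first occurrence when several rows in a group tie for the maximum.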

Comments:

2023-01-18 00:30:18

One question about that: how is the max calculated without mentioning the column name?

2023-01-18 00:30:18

I added some notes; you can check them. Let me know if you need further explanation.