I have a dataset of patient surgery. Many of the patients have had multiple operations and the value_counts aggregation of their multiple operation codes (there are 4 codes) is shown below.
['O011'] 2785
['O012'] 1813
['O011', 'O011'] 811
['O013'] 532
['O012', 'O012'] 522
['O014'] 131
['O013', 'O013'] 125
['O014', 'O014'] 26
['O012', 'O011'] 24
['O011', 'O012'] 20
['O011', 'O011', 'O011'] 14
['O011', 'O013'] 12
['O012', 'O012', 'O011'] 6
['O011', 'O012', 'O012'] 6
['O011', 'O011', 'O011', 'O011'] 5
['O013', 'O013', 'O013'] 5
['O013', 'O011'] 4
['O012', 'O012', 'O012'] 4
['O012', 'O013'] 3
['O013', 'O014'] 3
['O011', 'O013', 'O013'] 3
['O012', 'O014'] 3
['O011', 'O012', 'O011'] 2
['O012', 'O013', 'O013'] 2
['O011', 'O014'] 2
['O013', 'O012', 'O012'] 2
['O014', 'O014', 'O014'] 2
['O013', 'O012'] 1
['O012', 'O012', 'O013', 'O013', 'O013'] 1
['O012', 'O011', 'O012'] 1
['O011', 'O011', 'O012'] 1
['O013', 'O013', 'O011'] 1
['O011', 'O011', 'O012', 'O012'] 1
['O014', 'O013', 'O013'] 1
['O013', 'O013', 'O012'] 1
['O012', 'O011', 'O011'] 1
['O011', 'O012', 'O013'] 1
['O013', 'O011', 'O011'] 1
['O012', 'O012', 'O012', 'O012'] 1
['O013', 'O013', 'O012', 'O012'] 1
['O014', 'O013', 'O011', 'O011'] 1
['O012', 'O011', 'O011', 'O011'] 1
['O013', 'O011', 'O012'] 1
This shows each sequence of operations with its patient count, so 2785 patients have had just the one procedure, O011. I want to create a new column with a boolean: 'Are all the operations the same?'. There is an itertools recipe (all_equal) for comparing the values in a list. I am a surgeon and my Python skills are not up to applying it to the series. How do I create a new column using this function?
The series is OPERTN_01_list
I tried
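from itertools import groupby

def all_equal(iterable):
    # groupby yields one group per run of consecutive equal items
    g = groupby(iterable)
    # True if there is at most one group, i.e. every item is the same
    return next(g, True) and not next(g, False)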
My dataset is mo (multiple operations), so I tried to apply the function all_equal to the series:
mo['eq'] = all_equal(mo['OPERTN_01_list'])
but the new column mo['eq'] had all False values.
I am not sure of the best way to apply the function.
When you call your function like this:
all_equal(mo['OPERTN_01_list'])
it returns a single value, because the function treats the whole mo['OPERTN_01_list'] series as the iterable rather than each row. It is therefore checking something like this:
does row0 == row1? -> False
does row1 == row2? -> False
...
does rowN-1 == rowN? -> False
Since the overall result is the single value False, assigning it to mo['eq'] repeats it for every row.
There are at least three different approaches to getting what I think you want.
.apply
Apply the function to the contents of each row in the mo["OPERTN_01_list"] series.
mo["OPERTN_01_list"].apply(all_equal)
Out
# showing a sample of 5 rows for brevity
26 True
16 False
9 False
11 False
1 True
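Storing the result as the new column could then look roughly like the sketch below. The small frame is made up for illustration and only stands in for the real mo; the column name eq is the one used in the question.

```python
from itertools import groupby

import pandas as pd

def all_equal(iterable):
    """Return True if all elements of the iterable are equal."""
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Made-up sample standing in for the real `mo` DataFrame
mo = pd.DataFrame(
    {
        "OPERTN_01_list": [
            ["O011"],
            ["O011", "O011"],
            ["O012", "O011"],
            ["O013", "O013", "O012"],
        ]
    }
)

# .apply calls all_equal once per row, passing that row's list of codes
mo["eq"] = mo["OPERTN_01_list"].apply(all_equal)

print(mo)
```

Because .apply returns a Series with the same index as mo, the assignment lines up row by row.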
.transform
Pretty much the same as .apply here, but especially useful in .groupby operations.
mo["OPERTN_01_list"].transform(all_equal)
Out
# showing a sample of 5 rows for brevity
1 True
11 False
2 True
8 False
42 False
np.vectorize
Vectorize the function and treat mo["OPERTN_01_list"] as the input. This will allow you to keep your code the same, with one minor change.
import numpy as np

# vectorize the function so it is called once per element of its input
all_equal = np.vectorize(all_equal)
all_equal(mo["OPERTN_01_list"])
Out
# note that this returns a `np.array` instead of a `pandas.Series`
[ True True True True True True True True False False True False
False False True True False True False False False False False False
False False True False False False False False False False False False
False False True False False False False]
Any of these should give you the result you want, but I would suggest one of the first two: they return a pandas.Series that keeps the index of mo, so the values stay matched to the right rows in case your index changes 🙂
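If you do go with np.vectorize, one way to keep the rows aligned is to wrap the array in a Series that reuses the index of mo. This is only a sketch under assumptions: the frame and its index values are made up, and the name `all_equal_vec` is hypothetical, chosen so the original function is not overwritten.

```python
from itertools import groupby

import numpy as np
import pandas as pd

def all_equal(iterable):
    """Return True if all elements of the iterable are equal."""
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Made-up sample with a non-default index (hypothetical row labels)
mo = pd.DataFrame(
    {"OPERTN_01_list": [["O011"], ["O012", "O011"], ["O013", "O013"]]},
    index=[26, 16, 9],
)

# Separate name so the plain all_equal stays usable
all_equal_vec = np.vectorize(all_equal)

# The result is a plain NumPy array, so it carries no index
result = all_equal_vec(mo["OPERTN_01_list"])

# Re-attach the index explicitly before storing the column
mo["eq"] = pd.Series(result, index=mo.index)

print(mo)
```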