menu

Questions & Answers

Pandas groupby throws: TypeError: unhashable type: 'numpy.ndarray'

I have a dataframe as shown in the picture:

problem dataframe: attdf

I would like to group the data by Source class and Destination class, count the number of rows in each group and sum up Attention values.

While trying to achieve that, I am unable to get past this type error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-100-6f2c8b3de8f2> in <module>()
----> 1 attdf.groupby(['Source Class', 'Destination Class']).count()

8 frames
pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
    458     table = hash_klass(size_hint or len(values))
    459     uniques, labels = table.factorize(values, na_sentinel=na_sentinel,
--> 460                                       na_value=na_value)
    461 
    462     labels = ensure_platform_int(labels)

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'numpy.ndarray'
attdf.groupby(['Source Class', 'Destination Class'])

gives me a <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f1e720f2080> which I'm not sure how to use to get what I want.

Dataframe attdf can be imported from : https://drive.google.com/open?id=1t_h4b8FQd9soVgYeiXQasY-EbnhfOEYi

Please advise.

Comments:
2023-01-23 00:30:13
Please don't post images of code/data (or links to them) either provide the code to create a dataframe, or paste the data as text so users can copy them to reproduce your question
2023-01-23 00:30:13
What is print (attdf.info()) ?
2023-01-23 00:30:13
<class 'pandas.core.frame.DataFrame'> RangeIndex: 13264 entries, 0 to 13263 Data columns (total 5 columns): Source 13264 non-null int64 Destination 13264 non-null int64 Attention 13264 non-null float32 Source Class 13264 non-null object Destination Class 13264 non-null object dtypes: float32(1), int64(2), object(2) memory usage: 466.4+ KB None
2023-01-23 00:30:13
included link to csv of the dataframe now.
Answers(3) :

try using .agg as follows:

import pandas as pd

attdf = pd.read_csv("attdf.csv")

print(attdf.groupby(['Source Class', 'Destination Class']).agg({"Attention": ['sum', 'count']}))

Output:


                                 Attention      
                                       sum count
Source Class Destination Class                  
0            0                  282.368908  1419
             1                    7.251101    32
             2                    3.361009    23
             3                   22.482438   161
             4                   14.020189    88
             5                   10.138409    75
             6                   11.377947    80
1            0                    6.172269    32
             1                  181.582437  1035
             2                    9.440956    62
             3                   12.007303    67
             4                    3.025752    20
             5                    4.491725    28
             6                    0.279559     2
2            0                    3.349921    23
             1                    8.521828    62
             2                  391.116034  2072
             3                    9.937170    53
             4                    0.412747     2
             5                    4.441985    30
             6                    0.220316     2
3            0                   33.156251   161
             1                   11.944373    67
             2                    9.176584    53
             3                  722.685180  3168
             4                   29.776050   137
             5                    8.827215    54
             6                    2.434347    16
4            0                   17.431855    88
             1                    4.195519    20
             2                    0.457089     2
             3                   20.401789   137
             4                  378.802604  1746
             5                    3.616083    19
             6                    1.095061     6
5            0                   13.525333    75
             1                    4.289306    28
             2                    6.424412    30
             3                   10.911705    54
             4                    3.896328    19
             5                  250.309764  1132
             6                    8.643153    46
6            0                   15.249959    80
             1                    0.150240     2
             2                    0.413639     2
             3                    3.108417    16
             4                    0.850280     6
             5                    8.655959    46
             6                  151.571505   686
Comments:
2023-01-23 00:30:13
are you doing any other manipulation on the df? I tried with you .csv file and I get correct results with no errors
2023-01-23 00:30:13
actually no.. I'm also getting the right results on importing my own csv.. i don't understand what the problem was..

@Adam.Er8 and @jezarael helped me with their inputs. The unhashable type error in my case was because of the datatypes of the columns in my dataframe.

Original df and df imported from csv

It turned out that the original dataframe had two object columns which i was trying to use up in the groupby. Hence the unhashable type error. But on importing the data into a new dataframe right out of a csv fixed the datatypes. Consequently, no type errors faced anymore.

I had luck with the first answer. I was combining latitude and longitude into a tuple instead of using them independently which allowed me to arrive at the same thing if I did that instead of using the tuple-based approach.