pivot table, aggregate, data, groupby
Aggregate, Group By and Pivot with... Pivot Table!
Ajda Pretnar
Aug 27, 2019
Orange recently welcomed its new Pivot Table widget, which offers functionalities for data aggregation, grouping and, well, pivot tables. The widget is a one-stop-shop for pandas' aggregate, groupby and pivot_table functions.
Let us see how to achieve these tasks in Orange. For all of the below examples we will be using the heart_disease.tab data.
pandas.DataFrame.agg
The first task is computing a simple mean for the column age.
>>>df['age'].agg('mean')
54.43894389438944
In Orange:
In Pivot Table we set Values to age and set aggregations to mean. There is no way to turn off splitting by rows in Pivot Table, but the values in Total report the mean value for the chosen attribute.
An even better way to observe simple statistics is in Feature Statistics widget. The widget also reports on the mean value of each attribute with handy distributions plots included.
Yet another way to observe a mean value is in a Box Plot.
pandas.DataFrame.groupby
The second task is grouping the data by a discrete column. Say we want to group by gender and report the mean value for each column.
>>> df.groupby(['gender']).mean()
age rest SBP ... ST by exer. major ves. col. diameter narrowing
gender
female 55.721649 133.340206 ... 0.867010 0.546392 0.257732
male 53.834951 130.912621 ... 1.120874 0.732673 0.553398
[2 rows x 9 columns]
In Orange:
In Pivot Table set Rows to gender and aggregation method to mean. The Values option in this example has no effect. Now, connect Data Table to Pivot Table. Finally, reset the connections. Pivot Table outputs three types of data - Pivot Table, Filtered Data, and Grouped Data. The output we are looking for here is Grouped Data.
This is our data table. Exactly what we were looking for.
pandas.DataFrame.pivot_table
The third, final task is constructing a pivot table where rows are values of diameter narrowing, columns are values of gender and the values in cells is the mean of age for each subgroup. In other words, we want to see the average age of females with diameter narrowing, males with diameter narrowing, females without diameter narrowing and males without diameter narrowing.
>>> pd.pivot_table(df, values='age', index=['diameter narrowing'], columns=['gender'], aggfunc=np.mean)
gender female male
diameter narrowing
0 54.555556 51.043478
1 59.080000 56.087719
In Orange:
In Pivot Table set Rows to diameter narrowing, Columns to gender, Values to age and aggregation method to mean. The widget already offers a view of the final data table, but we can also output it and use it in other Orange widgets.
Pivot Table widget really versatile - like a Swiss knife for data transformation.