pivot table, aggregate, data, groupby

Aggregate, Group By and Pivot with... Pivot Table!

Ajda Pretnar

Aug 27, 2019

Orange recently welcomed its new Pivot Table widget, which offers functionalities for data aggregation, grouping and, well, pivot tables. The widget is a one-stop-shop for pandas' aggregate, groupby and pivot_table functions.

Let us see how to achieve these tasks in Orange. For all of the below examples we will be using the heart_disease.tab data.

pandas.DataFrame.agg

The first task is computing a simple mean for the column age.

In pandas:

>>>df['age'].agg('mean')
54.43894389438944

In Orange:

In Pivot Table we set Values to age and set aggregations to mean. There is no way to turn off splitting by rows in Pivot Table, but the values in Total report the mean value for the chosen attribute.

An even better way to observe simple statistics is in Feature Statistics widget. The widget also reports on the mean value of each attribute with handy distributions plots included.

Yet another way to observe a mean value is in a Box Plot.

pandas.DataFrame.groupby

The second task is grouping the data by a discrete column. Say we want to group by gender and report the mean value for each column.

In pandas:

>>> df.groupby(['gender']).mean()
              age    rest SBP   ...  ST by exer.  major ves. col.  diameter narrowing
gender
female  55.721649  133.340206   ...   0.867010    0.546392          0.257732
male    53.834951  130.912621   ...   1.120874    0.732673          0.553398

[2 rows x 9 columns]

In Orange:

In Pivot Table set Rows to gender and aggregation method to mean. The Values option in this example has no effect. Now, connect Data Table to Pivot Table. Finally, reset the connections. Pivot Table outputs three types of data - Pivot Table, Filtered Data, and Grouped Data. The output we are looking for here is Grouped Data.

This is our data table. Exactly what we were looking for.

pandas.DataFrame.pivot_table

The third, final task is constructing a pivot table where rows are values of diameter narrowing, columns are values of gender and the values in cells is the mean of age for each subgroup. In other words, we want to see the average age of females with diameter narrowing, males with diameter narrowing, females without diameter narrowing and males without diameter narrowing.

In pandas:

>>> pd.pivot_table(df, values='age', index=['diameter narrowing'], columns=['gender'], aggfunc=np.mean)
gender                 female       male
diameter narrowing
0                   54.555556  51.043478
1                   59.080000  56.087719

In Orange:

In Pivot Table set Rows to diameter narrowing, Columns to gender, Values to age and aggregation method to mean. The widget already offers a view of the final data table, but we can also output it and use it in other Orange widgets.

Pivot Table widget really versatile - like a Swiss knife for data transformation.