A Better Way to Summarize Pandas DataFrames

Sept. 6, 2021

*Describe was the first function I tried on any new dataset. But I have found a better one now.*

I replaced it with Skimpy. It’s a small Python package that shows extended summary results for a dataset. You can also run it from a terminal without entering a Python shell.

You can install it from PyPI using the following command.

pip install skimpy


Why Skimpy?

In a previous post, I shared three Python exploratory data analysis tools. With them, you can generate more complete reports about your datasets in the blink of an eye.

But what if you need a simpler cut?

Whenever I start with a new dataset, I almost always run df.describe() first. It gives you a nice tabular view of the key statistics, such as count, mean, and quartiles.

Results of Pandas describe function.
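
As a quick illustration of that tabular view, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and are not the dataset from the screenshot. By default, describe() covers only the numeric columns of a mixed DataFrame; passing include="all" adds the rest.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "price": [9.5, 12.0, 7.25, None, 15.0, 11.0],
        "quantity": [3, 1, 4, 2, 5, 2],
        "category": ["a", "b", "a", "c", "b", "a"],
    }
)

# Default call: count, mean, std, min, quartiles, and max for numeric columns.
numeric_summary = df.describe()
print(numeric_summary)

# include="all" also reports unique, top, and freq for non-numeric columns.
full_summary = df.describe(include="all")
print(full_summary)
```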

But to study the dataset more closely, I have to create histograms and several other summaries.
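
Those extra steps might look something like the sketch below: missing-value counts, histogram bin counts computed with NumPy, and category frequencies. The DataFrame is made up for illustration, and the choice of four bins is arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "price": [9.5, 12.0, 7.25, None, 15.0, 11.0],
        "quantity": [3, 1, 4, 2, 5, 2],
        "category": ["a", "b", "a", "c", "b", "a"],
    }
)

# Number of missing values in each column.
missing = df.isna().sum()
print(missing)

# Histogram of one numeric column as raw bin counts (no plotting library needed).
counts, edges = np.histogram(df["price"].dropna(), bins=4)
print(counts, edges)

# Frequency of each value in a text column.
category_counts = df["category"].value_counts()
print(category_counts)
```

Each of these is a separate call, and you have to repeat the histogram step for every numeric column.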

This is where Skimpy helps us. With a single command, it generates more metrics and histograms about the dataset.

from skimpy import skim
skim(df)

Output of skimpy dataset summary

The summary above contains more information in a visually organized way.

Each section summarizes variables of the same type. Numerical variables also include histograms. I find the first and last dates and the frequency details for datetime variables especially handy.
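
To see the per-type sections for yourself, a small DataFrame with mixed column types is enough. The sketch below assumes skimpy is installed and uses made-up data; build_sample_frame is a hypothetical helper, not part of skimpy. The import guard is only there so the script prints a hint instead of crashing when the package is missing.

```python
import pandas as pd

try:
    from skimpy import skim
except ImportError:
    # skimpy is not available in this environment.
    skim = None


def build_sample_frame() -> pd.DataFrame:
    """Return a small, made-up dataset with datetime, numeric, and text columns."""
    return pd.DataFrame(
        {
            "order_date": pd.date_range("2021-01-01", periods=8, freq="D"),
            "price": [9.5, 12.0, 7.25, None, 15.0, 11.0, 8.75, 13.5],
            "quantity": [3, 1, 4, 2, 5, 2, 1, 3],
            "category": ["a", "b", "a", "c", "b", "a", "c", "b"],
        }
    )


df = build_sample_frame()

if skim is not None:
    # Prints the summary to the console.
    skim(df)
else:
    print("skimpy is not installed; run `pip install skimpy` first.")
```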


Summarize datasets in a terminal without a Python REPL

You don’t have to open a Python REPL or a Jupyter notebook every time you want to use Skimpy. You can point the Skimpy CLI directly at a dataset file to summarize it.

skimpy iris.csv

Running the above command in a terminal prints the same kind of summary in the window and then returns you to the prompt.
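
If you don't have a CSV file at hand, the sketch below writes a small made-up one and then calls the same command from a script. The file name sample_orders.csv and its contents are hypothetical. The script checks that the skimpy executable is on your PATH first and only prints a hint if it is not.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical sample file, written to the system's temporary directory.
csv_path = Path(tempfile.gettempdir()) / "sample_orders.csv"

rows = [
    ("price", "quantity", "category"),
    (9.5, 3, "a"),
    (12.0, 1, "b"),
    (7.25, 4, "a"),
    (15.0, 5, "b"),
    (11.0, 2, "a"),
    (8.75, 1, "c"),
]

# newline="" lets the csv module control line endings itself.
with csv_path.open("w", newline="") as handle:
    csv.writer(handle).writerows(rows)

if shutil.which("skimpy") is None:
    print("skimpy CLI not found on PATH; run `pip install skimpy` first.")
else:
    # Same as typing `skimpy <path to the file>` in a terminal.
    subprocess.run(["skimpy", str(csv_path)], check=False)
```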

This makes Skimpy a convenient option for generating quick summaries of any dataset, even without writing any code.


Final thought

Skimpy is a new tool in the Python ecosystem that helps us work with data more easily. Yet it already solves a real problem well: it generates extended summary results with a single command.

You can learn more about it on its GitHub page, where you can also contribute to improving the tool.
