A Better Way to Summarize Pandas Dataframes.

df.describe() is the first function I try on any new dataset. But I have found a better one now.

I replaced it with Skimpy. It’s a small Python package that shows an extended summary of a dataset. You can also run it from a terminal window without entering a Python shell.

You can install it from PyPI using the following command.

pip install skimpy


Why Skimpy?

In a previous post, I shared three Python exploratory data analysis tools. With them, you can generate more complete reports about your datasets in the blink of an eye.

But what if you need a simpler cut?

Whenever I start with a new dataset, I run df.describe() almost every time. It gives you a nice tabular view of the key summary statistics for each numeric column.

The iris data set
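
To make that concrete, here is a minimal sketch of describe() on a tiny frame. The rows are a handful of iris-style measurements typed in by hand for illustration, not the full dataset, and the column names are my own choice.

```python
import pandas as pd

# A few iris-style rows typed in by hand (illustrative, not the full dataset).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": [
            "setosa", "setosa",
            "versicolor", "versicolor",
            "virginica", "virginica",
        ],
    }
)

# By default, describe() summarizes only the numeric columns:
# count, mean, std, min, quartiles, and max.
summary = df.describe()
print(summary)
```

Notice that the species column does not appear in the result. With the default arguments, describe() skips non-numeric columns when the frame also has numeric ones.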

But to study the dataset more closely, I have to create histograms and several other summaries.
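
The extra work usually looks something like the sketch below: missing-value counts, category counts, and histogram bins computed by hand. The data and the choice of four bins are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small hand-typed frame with one missing value (illustrative data).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, None],
        "species": [
            "setosa", "setosa",
            "versicolor", "versicolor",
            "virginica", "virginica",
        ],
    }
)

# Missing values per column; describe() only hints at these through "count".
missing = df.isna().sum()

# Frequency of each category in a non-numeric column.
species_counts = df["species"].value_counts()

# Histogram bin counts for a numeric column, ignoring missing values.
values = df["sepal_length"].dropna()
counts, edges = np.histogram(values, bins=4)

print(missing)
print(species_counts)
print(counts, edges)
```

None of this is hard, but it is several separate calls for information you want on nearly every dataset.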

This is where Skimpy helps us. With a single command, it generates more metrics and histograms about the dataset.

import pandas as pd
from skimpy import skim

df = pd.read_csv("iris.csv")  # assumes a local copy of the iris dataset
skim(df)

The quick dataset summary of Skimpy

The summary above contains more information in a visually organized way.

Each section summarizes variables of the same type. Numerical variables also include histograms. For datetime variables, I find the first and last dates and the frequency details especially handy.
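
For comparison, here is a rough sketch of how you could pull the same kind of datetime details with plain pandas. This is not how Skimpy computes them internally; it only shows what the first date, last date, and frequency correspond to. The date range is made up for illustration.

```python
import pandas as pd

# Ten consecutive days (illustrative data).
dates = pd.Series(pd.date_range("2024-01-01", periods=10, freq="D"))

# First and last timestamps in the column.
first = dates.min()
last = dates.max()

# infer_freq guesses the spacing from the values; it returns None
# when the timestamps are irregular.
freq = pd.infer_freq(pd.DatetimeIndex(dates))

print(first, last, freq)
```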


Summarize datasets in a terminal; you don’t need a Python REPL.

You don’t have to open a Python REPL or a Jupyter Notebook every time you want to use Skimpy. You can run the Skimpy CLI directly on a dataset file to summarize it.

skimpy iris.csv

Running the above command in a terminal prints the same summary in the window and then exits.

This makes Skimpy a convenient way to generate quick summaries of any dataset, even without writing any code.
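
The CLI works on a file, so if your data currently lives in a DataFrame, you need to write it to disk first. The sketch below does that with a small hand-typed frame and a hypothetical file name (iris_sample.csv) in a temporary folder, then calls the skimpy command only if it is available on your PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

import pandas as pd

# Small hand-typed frame (illustrative data).
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4],
        "species": ["setosa", "setosa", "versicolor", "versicolor"],
    }
)

# Write the frame to a CSV file; the file name is a hypothetical example.
csv_path = Path(tempfile.mkdtemp()) / "iris_sample.csv"
df.to_csv(csv_path, index=False)

# Run the CLI on the file, but only if the skimpy command is installed.
if shutil.which("skimpy"):
    subprocess.run(["skimpy", str(csv_path)], check=False)
else:
    print("skimpy command not found; install it with: pip install skimpy")
```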

Final thought

Skimpy is a new tool in the Python ecosystem that helps us work with data more easily. Yet it already does a fantastic job of generating extended summary results.

You can learn more about it from the project’s GitHub page. You can also contribute to improving the tool.

