describe() is the first function I try on any new dataset. But I've found a better one.
I replaced it with Skimpy. It’s a small Python package that shows extended summary results for a dataset. You can also run it in a terminal window without entering a Python shell.
You can install it from PyPI with pip: pip install skimpy
In a previous post, I shared three Python exploratory data analysis tools. With them, you can generate more complete reports about your datasets in the blink of an eye.
But what if you need a simpler cut?
When I start with a new dataset, I almost always run df.describe() first. It gives you a nice tabular view of the important numbers.
But to study the dataset more closely, I have to create histograms and several other summaries.
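To make that manual workflow concrete, here is a rough sketch of what it tends to look like with pandas and NumPy alone. The DataFrame and its column names are made up for illustration; they are not from any dataset in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, used only to illustrate the workflow.
df = pd.DataFrame({
    "price": [12.5, 8.0, 15.25, 9.75, 11.0, 14.0],
    "category": ["a", "b", "a", "c", "b", "a"],
})

# Step 1: the usual tabular summary (numeric columns only by default).
summary = df.describe()
print(summary)

# Step 2: a histogram for a numeric column, computed by hand.
counts, edges = np.histogram(df["price"], bins=4)
print(counts, edges)

# Step 3: the extra summaries describe() leaves out by default.
print(df["category"].value_counts())  # frequency of each category
print(df.isna().sum())                # missing values per column
```

That is three or four separate calls before you even get to plotting, which is the gap a one-shot summary is meant to close.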
This is where Skimpy helps us. With a single command, it generates more metrics, along with histograms, for the dataset.
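Here is a minimal sketch of that single command, assuming Skimpy is installed and exposes the skim function as described in its documentation. The DataFrame is a made-up example, and the fallback to describe() is only there so the snippet still runs if the package is missing.

```python
import pandas as pd

try:
    from skimpy import skim  # assumes `pip install skimpy` has been run
except ImportError:
    skim = None  # fall back to describe() below

# Hypothetical toy dataset with numeric, categorical, and datetime columns.
df = pd.DataFrame({
    "price": [12.5, 8.0, 15.25, 9.75],
    "category": pd.Categorical(["a", "b", "a", "c"]),
    "sold_on": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]
    ),
})

if skim is not None:
    skim(df)  # prints the extended summary to the console
else:
    print(df.describe())
```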
The summary above contains more information in a visually organized way.
Each section summarizes variables of the same type. Numerical variables also include histograms. I find the first and last dates and frequency details about DateTime variables handy.
Summarize datasets in a terminal; you don’t need a Python REPL.
You don’t have to open a Python REPL or a Jupyter Notebook every time you want to use Skimpy. You can point the Skimpy CLI directly at a dataset file by typing skimpy followed by the path to a CSV file.
Running that command in a terminal prints the same summary in the window and then returns to the prompt.
This makes Skimpy a convenient way to generate quick summaries of a dataset, even without writing any code.
Skimpy is a new tool in the Python ecosystem that helps us work with data more easily. Yet it already solves a real problem by generating extended summary results.
You can learn more about it from its GitHub page, where you can also contribute to improving the tool.