Python is already an elegant language to program in. But that doesn’t mean there is no room for improvement.
Pipe is a beautiful package that takes Python’s ability to handle data to the next level. It takes a SQL-like declarative approach to manipulating elements in a collection. It can filter, transform, sort, remove duplicates, perform ‘group by’ operations, and a lot more without needing a gazillion lines of code.
In this little post, let’s discuss simplifying our Python code with Pipe. Most importantly, we’ll construct custom pipe operations that we can reuse across our project.
We’ll cover
- an inspirational example;
- some helpful out-of-the-box pipe operations; and
- how to construct our own pipe operations.
If you’re wondering how you’d set it up, you can install Pipe from PyPI by running pip install pipe.
Start using pipes in Python.
Here’s an example of using Pipe. Suppose we have a list of numbers, and we want to
- remove all the duplicates;
- filter for only the odd numbers;
- square each element in the list; and
- sort the values in ascending order.
Here’s what we’d typically do in plain Python.
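A plain-Python version of these four steps might look like the sketch below. The input list is made-up sample data, and the variable names are only illustrative.

```python
# Illustrative input; any list of integers works.
nums = [5, 3, 8, 3, 1, 6, 5, 9, 2]

# Remove duplicates while keeping first-seen order.
unique_nums = list(dict.fromkeys(nums))

# Keep only the odd numbers.
odd_nums = [n for n in unique_nums if n % 2 == 1]

# Square each remaining number.
squares = [n ** 2 for n in odd_nums]

# Sort the values in ascending order.
result = sorted(squares)

print(result)  # [1, 9, 25, 81]
```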
The above code is pretty readable. But here’s a better way using Pipe.
Both snippets produce the same results. Yet the second one is more intuitive than the first, and it has fewer lines of code as well.
This is how Pipe helps us simplify our code. We can chain operations on a collection without writing separate lines of code.
But there are more operations available in Pipe than the ones we used in the above example. We can also create our own if we need something unique. Let’s first explore some pre-built operations.
Most useful pipe operations
We’ve already seen a couple of pipes in action. But there’s more. In this section, let’s discuss some other useful out-of-the-box operations for data wrangling.
This isn’t the complete list of operations that ship with Pipe. For an extensive inventory, please consult Pipe’s repository on GitHub.
Group by
I believe groupby is the most helpful pipe available for data scientists. We usually prefer doing this in Pandas, and I still like using it. But converting a list to a DataFrame sometimes feels like overkill. In those cases, I can use the groupby pipe operation on the go.
The above code groups our dataset into odd and even numbers. It creates two tuples, one per group. Each tuple holds the name returned by the lambda function and the grouped objects.
We can now perform actions separately for each group we create. Here’s an example that takes elements from each group and squares them.
Chain and Traverse
These two operations make it easy to unfold a nested list and make it flat. Chain flattens one level at a time, while traverse flattens recursively until there is nothing left to unfold.
The following is how the chain works.
As we can see, chain has unfolded the list’s outermost level. 8 and 9 remain inside a nested list because they were nested one level deeper.
Here are the results of using traverse instead.
Traverse unfolded everything it could.
I mostly use list comprehensions to unfold lists. But they get increasingly difficult to read and understand as the nesting grows. Also, it’s difficult to unfold recursively, as the traverse operation did in the above example, when we don’t know how many nested levels there are.
Take_while and Skip_while
These two operations work like the ‘where’ operation we used earlier. The critical difference is that take_while and skip_while stop evaluating the condition as soon as it first fails: take_while stops yielding elements at that point, and skip_while starts yielding them. Where, on the other hand, evaluates every element in the list.
Here’s how both take_while and where work for a simple task of filtering values less than 5.
In the results of the above code, please note that take_while leaves out the final ‘3’ because it stopped at the first value that was not less than 5, whereas ‘where’ includes it.
Skip_while is the mirror image of take_while: it drops elements for as long as the condition holds, then includes everything from the first element that fails it.
As I mentioned earlier, this isn’t the complete list of things you can do with the Pipe library. Please check out the repository for more built-in functions and examples.
Creating a new pipe operation
It’s relatively easy to create new pipe operations. All we need to do is decorate a function with the Pipe class.
In the example below, we convert a regular Python function into a pipe operation. It takes an integer as input and returns its square.
Because we decorated the function with the @Pipe class, it becomes a pipe operation. We can then use it with the | operator to square a single number.
Pipe operations can take extra arguments as well. The first argument is always the output of the previous operation in the chain. We can define any additional arguments and supply them when we use the operation in the chain.
Extra arguments can even be a function.
In the following example, we create a pipe operation that takes an additional argument, and that argument is a function. Our pipe operation transforms every element of the list using the function.
It’s impressive to see how Python can be further improved.
As a practicing data scientist, I find Pipe very helpful in many everyday tasks. We could also use Pandas to do most of these tasks. However, Pipe excels at improving code readability. Even novice programmers can understand the transformations.
One caveat is that I haven’t used Pipe on large-scale projects yet. I have yet to explore how it performs on massive datasets and data pipelines. But I do believe this package could play a significant role in offline analytics.