Standard Python libraries that can report the memory usage and execution time of every line
It’s interesting to see how we’ve improved at measuring algorithm performance in Python. About a decade ago, when I started coding in Python, I stored time values in variables at different points in my code. It was the ugliest way for sure, but at the time, I thought I was smart.
A couple of years later, when I learned to use decorators in Python, I created a function to do the same. I thought I’d gotten smarter.
But the Python ecosystem has grown huge in the last decade. Its applications spread well beyond data science and web app development. Along with this evolution, the ways we do performance audits in Python have improved too.
The need for an accurate measure of resource usage is high in the era of cloud computing. If you’re using AWS, Azure, G-Cloud, or any other cloud infrastructure, you often pay for resource hours.
Also, Python is the prevalent language for data-intensive applications such as machine learning and distributed computing. Thus, understanding profiling and performance auditing is essential for every Python programmer.
In this article, we’ll discuss:
- the quick and dirty way to measure execution time;
- extracting an accurate summary of running durations;
- taking memory snapshots at different points.
Before moving on, let’s also discuss the old-school methods I’ve been using for years.
The old-school methods I’ll never use again.
This was my approach when I first started programming: I stored time values before and after the execution of a function, and the difference is how long the process ran.
The code snippet below counts the prime numbers less than the input value. At the beginning and end of the function, I’ve written code to capture the time and calculate the duration. If I write another function that needs a performance audit, I’ll have to do the same thing again.
from time import time


def count_primes(max_num):
    """Count the prime numbers below the input value.

    Input values are in thousands, i.e., 40 means 40,000.
    """
    t1 = time()
    count = 0
    for num in range(max_num * 1000 + 1):
        if num > 1:
            for i in range(2, num):
                if num % i == 0:
                    break
            else:
                count += 1
    t2 = time()
    print(f"Counting prime numbers took {t2 - t1} seconds")
    return count


print(count_primes(20))
I used this method for several years. The biggest problem was my codebase filling up with lines that snapshot the time. Even in a small project, these repetitive lines are annoying. They reduce the code’s readability and make debugging a nightmare.
I was excited when I learned about decorators. They could make my Python code pretty again; I’d only have to put a decorator on top of each function.
A decorator takes a function, adds some functionality, and returns the modified function. Here is mine, which calculates and prints the execution time.
from time import time


def taimr(func):
    def inner(*args, **kwargs):
        t1 = time()
        res = func(*args, **kwargs)
        t2 = time()
        print(f"Your function execution took {t2 - t1} seconds")
        return res

    return inner


@taimr
def count_primes(max_num):
    count = 0
    for num in range(max_num * 1000 + 1):
        if num > 1:
            for i in range(2, num):
                if num % i == 0:
                    break
            else:
                count += 1
    return count


@taimr
def skwer(n):
    return n ** 2


print(count_primes(20))
print(skwer(20))
In the above code, I created a decorator that captures the time before and after executing a function and prints the duration. I can decorate any function, and it’ll print the duration on every execution.
As you can see, I wrote a second function, skwer. This time I didn’t repeat any time-capturing code; I simply decorated skwer too.
Decorators are great time savers, and with them the code looks tidier. But there’s a caveat with this method of capturing execution times.
@taimr
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(10))
If your script contains a recursive function, one that calls itself, this becomes a mess: the decorator fires on every recursive call and floods the output with printed durations. A workaround I’ve been using for some time is to attach the decorator to a wrapper function instead, as sketched below.
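Here’s a minimal sketch of that workaround, reusing the taimr decorator from above. The name fib_wrapper is just for illustration; the point is that the recursive function stays undecorated, so only the outermost call gets timed.
def fib(n):
    # Plain recursive function, no decorator attached to it.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@taimr
def fib_wrapper(n):
    # The decorator fires once here, not on every recursive call.
    return fib(n)


print(fib_wrapper(10))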
Python has standard libraries that solve these problems conveniently. Two of them that track running duration are timeit and cProfile.
The quickest way to measure execution times.
The standard Python installation includes timeit, a convenient way to measure execution time.
import timeit


def fib(n=20):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(timeit.timeit(fib, number=10))
With timeit, you don’t have to write lines to capture timestamps and do the calculations manually. Also, timeit times the execution of a complete statement, so you don’t have to worry about recursive calls being timed individually.
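If you prefer to time a statement rather than pass a callable, timeit accepts a string too. A minimal sketch; the globals=globals() argument simply makes fib visible to the timed statement:
import timeit


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# Run the statement ten times and return the total duration in seconds.
print(timeit.timeit("fib(20)", globals=globals(), number=10))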
Also, IPython has a great magic function that prints the running duration of a cell. This feature has been super helpful when working in Jupyter notebooks.
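A minimal sketch of the magic, assuming you’re in a Jupyter/IPython cell and fib is already defined:
%%time
fib(30)
The %%time magic must be the first line of the cell and reports the wall and CPU time of a single run; the %timeit line magic (for example, %timeit fib(20)) runs the statement several times and reports an average instead.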

A comprehensive collection of performance statistics.
timeit is a convenient way to collect performance statistics. Yet it doesn’t go deeper and show which parts of your program are the slowest.
Another standard Python library, cProfile, can do that.
import cProfile

...

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


cProfile.run("fib(20)")
Running the above script prints an illustrative summary of every function call.

The Python interpreter made 21894 function calls in six milliseconds to execute the four lines in our script. It spent most of its time on line number three, where we defined our Fibonacci function.
That’s remarkable. In a large-scale application, cProfile would tell us exactly where the bottlenecks are.
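If you want the heaviest calls at the top of the report, run also accepts a sort key. A minimal sketch, sorting by cumulative time:
import cProfile


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


# "cumtime" sorts the report by the cumulative time spent in each function.
cProfile.run("fib(20)", sort="cumtime")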
Still, wrapping my application code in another function and in a string literal is a discomfort. cProfile has a more convenient way to do it; which one to use is a matter of personal preference.
import cProfile

...

with cProfile.Profile() as pr:
    # Your normal script
    print(fib(20))
    print(fib(25))
    print(fib(30))

pr.print_stats()
When auditing with cProfile, I usually prefer the Profile class over the run method. Yes, the run method is very convenient, but I love the Profile class because it doesn’t force me to run my function inside another function or a string. I have the flexibility to do what I need.
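The Profile object also pairs well with the standard pstats module when you want more control over the report. A minimal sketch, sorting by cumulative time and printing only the top ten rows:
import cProfile
import pstats


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


with cProfile.Profile() as pr:
    fib(25)

# Sort by cumulative time and show only the ten most expensive entries.
stats = pstats.Stats(pr)
stats.sort_stats("cumtime").print_stats(10)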
The memory leakage detective.
Both timeit and cProfile simplify a crucial problem Python programmers have: pinpointing where the code spends most of its running time, which hints at further optimization opportunities.
Yet, running time is hardly the correct measure of an algorithm’s performance. Many external factors distort the actual execution time, and often the OS controls it rather than the code itself.
Running time isn’t a measure of performance. It’s only a proxy for resource usage.
Because of these external complexities, we cannot conclude that a long-running function is indeed a bottleneck.
Python’s standard library also has a way to measure memory usage with precision: tracemalloc.
Tracemalloc, which stands for Trace Memory Allocation, is a standard Python library. It allows you to take snapshots of memory usage at different points in your code. Later you can compare one with another.
Here’s a basic example of tracemalloc.
import tracemalloc


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


tracemalloc.start()

for i in range(25, 35):
    print(f"{i}th Fibonacci number is {fib(i)}")

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

print("---------------------------------------------------------")
for stat in top_stats:
    print(stat)
Running the above will output the memory usage of each line: like cProfile, but for memory instead of running time.

The fourth line of the code was the most significant memory consumer. The interpreter went through this line 28 times, using 424B of memory each time.
That amount is small in this example application, but in real-life applications it can be significant and critical.
Further, tracemalloc allows comparisons between snapshots. With this feature, we can even build a map of memory usage across different components.
tracemalloc.start()

fib(30)
snap1 = tracemalloc.take_snapshot()

fib(40)
snap2 = tracemalloc.take_snapshot()

top_stats = snap2.compare_to(snap1, "lineno")
for stat in top_stats:
    print(stat)
The above code prints how much memory each line consumed and how much it increased since the previous snapshot.

In our code, we calculated the 30th Fibonacci number in line 9 and took our first snapshot. Then we ran the calculation for the 40th Fibonacci number and took another. The output says we used 4664B of additional memory and executed line number 5 eleven more times.
Conclusion
A critical aspect of running software successfully is accurately measuring how many resources it uses. This understanding allows engineers to provision the right CPU cores and memory to run the application.
Today, we use Python extensively in many projects. Because of its widespread community and ecosystem, its usage has multiplied in the recent past.
This article focused on tracing execution times and memory usage in a Python program. Python’s standard libraries let us find these metrics at the line level, even in a multi-module application.
We discussed three built-in Python libraries for performance audits. timeit is the most convenient and blends excellently with Jupyter notebooks. cProfile is a comprehensive execution-time recorder. Finally, we discussed tracemalloc, which allows us to take memory snapshots at different points and compare them.
I hope measuring performance in Python is now crystal clear. But how would you make Python run faster? It’s still considered a slow programming language compared to Java and C++. Check out my previous article on boosting the performance of Python scripts.
Related: How to Speed up Python Data Pipelines up to 91X?
Did you like what you read? Consider subscribing to my email newsletter because I post more like this frequently.
Thanks for reading, friend! Say Hi to me on LinkedIn, Twitter, and Medium.